Hotfixes for 2.15
The following is a list of the known and fixed issues for Run:ai V2.15.
Version 2.15.9 - February 5, 2024¶
Fixed issues¶
Internal ID | Description |
---|---|
RUN-15296 | Fixed an issue where the resources parameter was deprecated in the Projects and Departments API. |
Version 2.15.4 - January 5, 2024¶
Fixed issues¶
Internal ID | Description |
---|---|
RUN-15026 | Fixed an issue in workloads that were built on a cluster that does not support the NFS field. |
RUN-14907 | Fixed an issue after an upgrade where the Analytics dashboard was missing the time ranges from before the upgrade. |
RUN-14903 | Fixed an issue where internal operations were exposed to the customer audit log. |
RUN-14062 | Fixed an issue in the Overview dashboard where the content for the Running Workload per Type panel did not fit. |
Version 2.15.2 - February 5, 2024¶
Fixed issues¶
Internal ID | Description |
---|---|
RUN-14434 | Fixed an issue where the Allocated GPUs metric was multiplied by seven. |
Version 2.15.1 - December 17, 2023¶
Release content¶
-
Added environment variables for customizable QPS and burst support.
-
Added the ability to support running multiple Prometheus replicas.
Fixed issues¶
Internal ID | Description |
---|---|
RUN-14292 | Fixed an issue where BCM installations were failing due to missing create cluster permissions. |
RUN-14289 | Fixed an issue where metrics were not working due to an incorrect parameter in the cluster-config file. |
RUN-14198 | Fixed an issue in services where multi nodepool jobs were not scheduled due to an unassigned nodepool status. |
RUN-14191 | Fixed an issue where a consolidation failure would cause unnecessary evictions. |
RUN-14154 | Fixed an issue in the New cluster form, whefre the dropdown listed versions that were incompatible with the installed control plane. |
RUN-13956 | Fixed an issue in the Jobs table where templates were not edited successfully. |
RUN-13891 | Fixed an issue where Ray job statuses were shown as empty. |
RUN-13825 | Fixed an issue where GPU sharing configmaps were not deleted. |
RUN-13628 | Fixed an issue where the pre-install pod failed to run pre-install tasks due to the request being denied (Unauthorized). |
RUN-13550 | Fixed an issue where environments were not recovering from a node restart due to a missing GPU runtime class for containerized nodes. |
RUN-11895 | Fixed an issue where the wrong amount of GPU memory usage was shown (is now MB). |
RUN-11681 | Fixed an issue in OpenShift environments where some metrics were not shown on dashboards when the GPU Operator from the RedHat marketplace was installed. |
Version 2.15.0¶
Fixed issues¶
Internal ID | Description |
---|---|
RUN-13456 | Fixed an issue where the Researcher L1 role did not have permissions to create and manage credentials. |
RUN-13282 | Fixed an issue where Workspace logs crashed unexpectedly after restarting. |
RUN-13121 | Fixed an issue in not being able to launch jobs using the API after an upgrade overrode a change in keycloak for applications which have a custom mapping to an email. |
RUN-13103 | Fixed an issue in the Workspaces and Trainings table where the action buttons were not greyed out for users with only the view role. |
RUN-12993 | Fixed an issue where Prometheus was reporting metrics even though the cluster was disconnected. |
RUN-12978 | Fixed an issue after an upgrade, where permissions fail to sync to a project due to a missing application name in the CRD. |
RUN-12900 | Fixed an issue in the Projects table, when sorting by Allocated GPUs, the projects were displayed alphabetically and not numerically. |
RUN-12846 | Fixed an issue after a control-plane upgrade, where GPU, CPU, and Memory Cost fields (in the Consumption Reports) were missing when not using Grafana. |
RUN-12824 | Fixed an issue where airgapped environments tried to pull an image from gcr.io (Internet). |
RUN-12769 | Fixed an issue where SSO users were unable to see projects in Job Form unless the group they belong to was added directly to the project. |
RUN-12602 | Fixed an issue in the documentation where the WorkloadServices configuration in the runaiconfig file was incorrect. |
RUN-12528 | Fixed an issue where the Workspace duration scheduling rule was suspending workspaces regardless of the configured duration. |
RUN-12298 | Fixed an issue where projects were not shown in the Projects table due to the API not sanitizing the project name at time of creation. |
RUN-12157 | Fixed an issue where querying pods completion time returned a negative number. |
RUN-10560 | Fixed an issue where no Prometheus alerts were sent due to a misconfiguration of the parameter RunaiDaemonSetRolloutStuck . |