Fixed an issue where, after an upgrade to 2.13, distributed PyTorch jobs were unable to run because PVCs were assigned only to worker pods.
RUN-16626
Fixed an issue in SSO environments where Workspaces created from a template were assigned the template creator's UID/GID instead of the Workspace creator's UID/GID.
RUN-16357
Fixed an issue in multi-cluster environments where pressing the Project link in the Jobs screen redirected the view to the Projects of a different cluster.
Fixed an issue where projects appeared with an empty status while waiting for the project controller to update their status. This was caused by the cluster-sync service working faster than the project controller.
Fixed an issue where, when a cluster is not connected, actions in the Workspace and Training pages remained enabled. These actions are now disabled.
Fixed an issue where users were unable to log in after upgrading the control plane from 2.9.16 to 2.13.16. To correct the issue, secrets must be updated manually in Keycloak.
Added validation that prevents selecting tenant or department scopes for credentials, and prevents selecting S3, PVC, and Git data sources if the cluster version does not support them.
Quota management is now enabled by default.
Internal ID
Description
RUN-12923
Fixed an issue where upgrades on air-gapped systems failed in 2.13.19 due to a misconfigured Docker image. The Helm chart contained an error, so the image was not used even though it was packaged as part of the tar file.
RUN-12928, RUN-12968
Fixed an issue where Prometheus upgrades on air-gapped systems failed in 2.13.19 due to a misconfigured image. The Helm chart contained an error, so the image was not used even though it was packaged as part of the tar file.
RUN-12751
Fixed an issue where upgrading from 2.9 to 2.13 resulted in a missing engine-config file.
RUN-12717
Fixed an issue where a user logged in as a Researcher Manager could not see the clusters.
RUN-12642
Fixed an issue where assets-sync could not restart because it failed to obtain a token from the control plane.
RUN-12191
Fixed an issue where the runai_allocated_gpu_count_per_project metric timed out before returning values.
RUN-10474
Fixed an issue where the runai-container-toolkit-exporter DaemonSet failed to start.
Fixed an issue where the GPU ALLOCATION PER NODE analytics panel used an incorrect metric. The panel now shows the correct allocation as a percentage.
RUN-12602
Fixed an issue in runaiconfig where the WorkloadServices spec's memory and CPU requests/limits were overwritten by the system defaults.
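For context, the fix concerns user-set resource requests/limits like the following. This runaiconfig fragment is a hypothetical sketch; the field names and values are illustrative assumptions, not the confirmed schema:

```yaml
# Hypothetical runaiconfig fragment - field names and values are
# illustrative assumptions only.
spec:
  workloadServices:
    resources:
      requests:
        cpu: "250m"      # user-set value previously reset to the system default
        memory: "256Mi"
      limits:
        cpu: "500m"
        memory: "512Mi"
```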
RUN-12585
Fixed an issue where the workload-controller caused a delay in running jobs.
RUN-12031
Fixed an issue where, when upgrading from 2.9 to 2.13, the Scheduler pod failed to upgrade due to a change of ownership.
RUN-11091
Fixed an issue where, when the Departments feature is disabled, non-preemptible jobs could not be scheduled.
Added two configurable parameters, QPS and Burst, to the Run:ai job-controller. They are applied as environment variables in the job-controller Deployment object.
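Parameters like these are typically surfaced on the Deployment's container spec. The sketch below is an assumption for illustration; the actual variable names, values, and Deployment name in Run:ai may differ:

```yaml
# Hypothetical sketch: QPS/Burst exposed as environment variables on the
# job-controller Deployment. Names and values are illustrative only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: job-controller
spec:
  template:
    spec:
      containers:
        - name: job-controller
          env:
            - name: QPS    # assumed name: client-side request rate limit
              value: "50"
            - name: BURST  # assumed name: maximum burst above the QPS rate
              value: "100"
```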
Fixed an issue in OpenShift environments where logging in via SSO with the kubeadmin user resulted in blank pages.
RUN-11119
Fixed an issue where values that belong in the Order of priority column appeared in the wrong column.
RUN-11120
Fixed an issue where the Projects table does not show correct metrics when Run:ai version 2.13 is paired with a Run:ai 2.8 cluster.
RUN-11121
Fixed an issue where the wrong over-quota memory alert was shown in the Quota management pane of the project edit form.
RUN-11272
Fixed an issue in OpenShift environments where the cluster selected in the drop-down in the main UI did not match the cluster selected on the login page.
Fixed an issue where, when creating an environment, commands entered in the Runtime settings pane were not persistent and could not be found in other assets (for example, in a new Training).