Fixed an issue where, after an upgrade to 2.13, distributed PyTorch jobs were unable to run because PVCs were assigned only to worker pods.
RUN-16626
Fixed an issue in SSO environments where Workspaces created from a template were assigned the template creator's UID/GID instead of the Workspace creator's UID/GID.
RUN-16357
Fixed an issue in multi-cluster environments where pressing the Project link in the Jobs screen redirected the view to the Projects of a different cluster.
Fixed an issue where projects appeared with an empty status while waiting for the project controller to update their status. This was caused by the cluster-sync service working faster than the project controller.
Fixed an issue where, when a cluster is not connected, actions in the Workspace and Training pages remained enabled. These actions are now disabled.
Fixed an issue where users were unable to log in after upgrading the control plane from 2.9.16 to 2.13.16. To correct the issue, secrets must be updated manually in Keycloak.
Added validation that prevents selecting tenant or department scopes for credentials, and prevents selecting S3, PVC, and Git data sources if the cluster version does not support them.
Quota management is now enabled by default.
Internal ID
Description
RUN-12923
Fixed an issue where upgrades on air-gapped systems failed in 2.13.19 due to a misconfigured Docker image. The Helm chart contained an error, so the image was not used even though it was packaged as part of the tar file.
RUN-12928, RUN-12968
Fixed an issue where Prometheus upgrades on air-gapped systems failed in 2.13.19 due to a misconfigured image. The Helm chart contained an error, so the image was not used even though it was packaged as part of the tar file.
RUN-12751
Fixed an issue where upgrading from 2.9 to 2.13 resulted in a missing engine-config file.
RUN-12717
Fixed an issue where a user logged in as a Researcher Manager could not see the clusters.
RUN-12642
Fixed an issue where assets-sync could not restart because it failed to obtain a token from the control plane.
RUN-12191
Fixed an issue where the runai_allocated_gpu_count_per_project metric timed out before returning values.
RUN-10474
Fixed an issue where the runai-container-toolkit-exporter DaemonSet failed to start.
Fixed an issue where the GPU ALLOCATION PER NODE analytics panel used an incorrect metric. The panel now shows the correct allocation as a percentage.
RUN-12602
Fixed an issue in runaiconfig where the WorkloadServices spec's memory and CPU requests/limits were overwritten by the system defaults.
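For context, the fix concerns user-set resource requests/limits like the following. This runaiconfig fragment is a hypothetical sketch; the field names and values are illustrative assumptions, not the confirmed schema:

```yaml
# Hypothetical runaiconfig fragment - field names and values are
# illustrative assumptions only.
spec:
  workloadServices:
    resources:
      requests:
        cpu: "250m"      # user-set value previously reset to the system default
        memory: "256Mi"
      limits:
        cpu: "500m"
        memory: "512Mi"
```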
RUN-12585
Fixed an issue where the workload-controller caused a delay in running jobs.
RUN-12031
Fixed an issue where, when upgrading from 2.9 to 2.13, the Scheduler pod failed to upgrade due to a change of ownership.
RUN-11091
Fixed an issue where, when the Departments feature is disabled, non-preemptible jobs could not be scheduled.
Added two configurable parameters, QPS and Burst, to the Run:ai job-controller. They are applied as environment variables in the job-controller Deployment object.
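Parameters like these are typically surfaced on the Deployment's container spec. The sketch below is an assumption for illustration; the actual variable names, values, and Deployment name in Run:ai may differ:

```yaml
# Hypothetical sketch: QPS/Burst exposed as environment variables on the
# job-controller Deployment. Names and values are illustrative only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: job-controller
spec:
  template:
    spec:
      containers:
        - name: job-controller
          env:
            - name: QPS    # assumed name: client-side request rate limit
              value: "50"
            - name: BURST  # assumed name: maximum burst above the QPS rate
              value: "100"
```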
Fixed an issue in OpenShift environments where logging in via SSO with the kubeadmin user resulted in blank pages.
RUN-11119
Fixed an issue where values that belong in the Order of priority column appeared in the wrong column.
RUN-11120
Fixed an issue where the Projects table does not show correct metrics when Run:ai version 2.13 is paired with a Run:ai 2.8 cluster.
RUN-11121
Fixed an issue where the wrong over-quota memory alert was shown in the Quota management pane of the project edit form.
RUN-11272
Fixed an issue in OpenShift environments where the cluster selected in the drop-down in the main UI did not match the cluster selected on the login page.
Fixed an issue where, when creating an environment, commands entered in the Runtime settings pane were not persistent and could not be found in other assets (for example, in a new Training).