Hot Fixes for 2.13

The following is a list of the known and fixed issues for Run:ai V2.13.

Version 2.13.48 - March 14, 2024

Internal ID Description
RUN-16787 Fixed an issue after an upgrade to 2.13 where distributed PyTorch jobs were not able to run due to PVCs being assigned to only worker pods.
RUN-16626 Fixed an issue in SSO environments, where Workspaces created using a template were assigned the template creator's UID/GID and not the Workspace creator's UID/GID.
RUN-16357 Fixed an issue in multi-cluster environments where pressing the Project link in the Jobs screen redirected the view to the Projects of a different cluster.

Version 2.13.43 - February 15, 2024

Internal ID Description
RUN-14946 Fixed an issue where Dashboards are displaying the hidden Grafana path.

Version 2.13.37

Internal ID Description
RUN-13300 Fixed an issue where projects appeared with an empty status while waiting for the project controller to update their status. This happened because cluster-sync works faster than the project controller.

Version 2.13.35 - December 19, 2023

Release content

  • Added the ability to set node affinity for Prometheus.

Fixed issues

Internal ID Description
RUN-14472 Fixed an issue where template updates were not being applied to the workload.
RUN-14434 Fixed an issue where runai_allocated_gpu_count_per_gpu was multiplied by seven.
RUN-13956 Fixed an issue where changing an existing template caused a Promise error on existing job templates.
RUN-13825 Fixed an issue where deleting a job that was allocated a fraction of a GPU did not delete the associated ConfigMap.
RUN-13343 Fixed an issue in pod status calculation.

Version 2.13.31

Internal ID Description
RUN-11367 Fixed an issue where a double click on SSO Users redirects to a blank screen.
RUN-10560 Fixed an issue where the RunaiDaemonSetRolloutStuck alert did not work.

Version 2.13.25

Internal ID Description
RUN-13171 Fixed an issue where the actions in the Workspace and Training pages remained enabled when a cluster was not connected. With this fix, the actions are disabled.

Version 2.13.21

Internal ID Description
RUN-12563 Fixed an issue where users were unable to log in after upgrading the control plane from 2.9.16 to 2.13.16. To correct the issue, secrets need to be upgraded manually in Keycloak.

Version 2.13.20 - September 28, 2023

Release content

  • Selecting tenant or department scopes for credentials, and selecting S3, PVC, and Git data sources, is now prevented if the cluster version does not support them.
  • Quota management is now enabled by default.

Fixed issues

Internal ID Description
RUN-12923 Fixed an upgrade issue caused by a misconfigured Docker image for airgapped systems in 2.13.19. The Helm chart contained an error, and the image was not used even though it was packaged as part of the tar.
RUN-12928, RUN-12968 Fixed an issue in upgrading Prometheus caused by a misconfigured image for airgapped systems in 2.13.19. The Helm chart contained an error, and the image was not used even though it was packaged as part of the tar.
RUN-12751 Fixed an issue where upgrading from 2.9 to 2.13 resulted in a missing engine-config file.
RUN-12717 Fixed an issue where a user logged in as researcher manager could not see the clusters.
RUN-12642 Fixed an issue where assets-sync could not restart because it failed to get a token from the control plane.
RUN-12191 Fixed an issue where there was a timeout while waiting for the runai_allocated_gpu_count_per_project metric to return values.
RUN-10474 Fixed an issue where the runai-container-toolkit-exporter DaemonSet failed to start.

Version 2.13.19 - September 27, 2023

Release content

  • Added the ability to identify Kubeflow notebooks and display them in the Jobs table.
  • Added the ability to schedule Kubeflow workloads.
  • Added functionality that displays only the Jobs that belong to the logged-in user.
  • Added and refined alerts on the state of Run:ai components and on scheduling latency, and added warnings for out-of-memory conditions on Jobs.
  • Added the ability to work with a restricted PSA (Pod Security Admission) policy.

Fixed issues

Internal ID Description
RUN-12650 Fixed an issue where the analytics GPU ALLOCATION PER NODE panel used an incorrect metric. The panel now shows the correct allocation as a percentage.
RUN-12602 Fixed an issue in runaiconfig where memory and CPU requests/limits set in the WorkloadServices spec were overwritten with the system defaults.
RUN-12585 Fixed an issue where the workload-controller creates a delay in running jobs.
RUN-12031 Fixed an issue when upgrading from 2.9 to 2.13 where the Scheduler pod fails to upgrade due to the change of owner.
RUN-11091 Fixed an issue where non-preemptible jobs could not be scheduled when the Departments feature was disabled.

Version 2.13.13

Internal ID Description
RUN-11321 Fixed an issue where metrics always showed CPU Memory Utilization and CPU Compute Utilization as 0.
RUN-11307 Fixed an issue where node affinity might change midway through a job. Node affinity is now calculated only once, at job submission.
RUN-11129 Fixed an issue where CRDs are not automatically upgraded when upgrading from 2.9 to 2.13.

Version 2.13.12 - August 7, 2023

Internal ID Description
RUN-11476 Fixed an issue with the analytics node pool filter in the Allocated GPUs per Project panel.

Version 2.13.11

Internal ID Description
RUN-11408 Added two configurable parameters, QPS and Burst, to the Run:ai job-controller. They are applied as environment variables in the job-controller Deployment object.
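
The release note does not describe how the job-controller uses QPS and Burst. In Kubernetes client libraries such as client-go, parameters with these names usually configure a token-bucket rate limiter for API requests: QPS is the sustained request rate and Burst is the maximum number of requests allowed in a short spike. Assuming the job-controller follows that convention, the behavior can be sketched as below. The class and method names are hypothetical and are not part of Run:ai.

```python
import time


class TokenBucket:
    """Illustrative token-bucket limiter (hypothetical, not Run:ai code).

    Refills at `qps` tokens per second and holds at most `burst` tokens.
    """

    def __init__(self, qps: float, burst: int, clock=time.monotonic):
        if qps <= 0 or burst < 1:
            raise ValueError("qps must be > 0 and burst must be >= 1")
        self.qps = qps
        self.burst = burst
        self._clock = clock
        self._tokens = float(burst)  # start full, so an initial burst is allowed
        self._last = clock()

    def try_acquire(self) -> bool:
        """Consume one token and return True if a request may be sent now."""
        now = self._clock()
        # Refill according to elapsed time, capped at the burst size.
        elapsed = now - self._last
        self._tokens = min(float(self.burst), self._tokens + elapsed * self.qps)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    limiter = TokenBucket(qps=2, burst=3)
    # Count how many of 5 back-to-back requests are admitted.
    print(sum(limiter.try_acquire() for _ in range(5)))
```

Under this model, raising Burst lets the controller absorb short spikes of API calls, while raising QPS increases the sustained rate. Both put more load on the Kubernetes API server.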

Version 2.13.7 - July 2023

Release content

  • Added filters to the historic quota ratio widget on the Quota management dashboard.

Fixed issues

Internal ID Description
RUN-11080 Fixed an issue in OpenShift environments where logging in via SSO with the kubeadmin user resulted in blank pages for every page.
RUN-11119 Fixed an issue where values that should be in the Order of priority column appeared in the wrong column.
RUN-11120 Fixed an issue where the Projects table does not show correct metrics when Run:ai version 2.13 is paired with a Run:ai 2.8 cluster.
RUN-11121 Fixed an issue where the wrong over quota memory alert was shown in the Quota management pane in the project edit form.
RUN-11272 Fixed an issue in OpenShift environments where the selection in the cluster drop down in the main UI does not match the cluster selected on the login page.

Version 2.13.4 - July 2023

Fixed issues

Internal ID Description
RUN-11089 Fixed an issue where, when creating an environment, commands in the Runtime settings pane were not persistent and could not be found in other assets (for example, in a new Training).

Version 2.13.1 - July 2023

Release content

  • Occurrences of labels that are no longer in use are now deleted.

Fixed issues

N/A