What’s New in Version 2.20¶
Release Content¶
The Run:ai v2.20 What's New provides a detailed summary of the latest features, enhancements, and updates introduced in this version. It serves as a guide to help users, administrators, and researchers understand the new capabilities and how to leverage them for improved workload management, resource optimization, and more.
Important
For a complete list of deprecations, see Deprecation notifications. Deprecated features and capabilities will remain available for two versions after the notification.
Researchers¶
Workloads - Workspaces and Training¶
- Stop/run actions for distributed workloads - You can now stop and run distributed workloads from the UI, CLI, and API. Scheduling rules for training workloads also apply to distributed workloads. This enhances control over distributed workloads, enabling greater flexibility and resource management. From cluster v2.20 onward
- Visibility into idle GPU devices - Idle GPU devices are now displayed in the UI and API, showing the number of allocated GPU devices that have been idle for more than 5 minutes. This provides better visibility into resource utilization, enabling more efficient workload management.
- Configurable workload completion with multiple runs - You can now define the number of runs a training workload must complete to be considered finished directly in the UI, API, and CLI v2 (see the sketch below). Running training workloads multiple times improves the reliability and validity of training results. Additionally, you can configure how many runs can be scheduled in parallel, helping to significantly reduce training time and simplifying the process of managing jobs that require multiple runs. See Train models using a standard training workload for more details. From cluster v2.20 onward
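A minimal CLI sketch, assuming CLI v2 exposes completion and parallelism flags for this setting (the flag names are assumptions based on the description above; verify with `runai training submit --help`):

```bash
# Sketch: consider the workload finished after 5 successful runs,
# scheduling at most 2 runs in parallel.
# Flag names are assumptions; verify with 'runai training submit --help'.
runai training submit multi-run-train \
  --image python:3.11 \
  --project team-a \
  --completions 5 \
  --parallelism 2 \
  -- python train.py
```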
- Configurable grace period for workload preemption - You can now set a grace period in the UI, API, and CLI v2, providing a buffer time for standard and distributed training workloads to reach a safe checkpoint before being forcibly preempted. The grace period can be configured between 0 seconds and 5 minutes. This aims to minimize data loss and avoid unnecessary retraining, ensuring the latest checkpoints are saved. From cluster v2.20 onward
- Pod deletion policy for terminal workloads - You can now specify which pods should be deleted when a distributed workload reaches a terminal state (completed/failed) using cleanPodPolicy in CLI v2 and API, as shown in the sketch below. This enhancement provides greater control over resource cleanup and helps maintain a more organized and efficient cluster environment. See cleanPodPolicy for more details.
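A minimal API sketch, assuming a distributed-workload submission endpoint and Kubeflow-style cleanPodPolicy values (None/Running/All); the endpoint path and surrounding fields are assumptions, and only cleanPodPolicy itself is named in the note above:

```bash
# Sketch: delete all pods once the distributed workload completes or fails.
# Endpoint path and surrounding fields are assumptions; see the
# cleanPodPolicy reference for the exact schema.
curl -X POST "https://<COMPANY>.run.ai/api/v1/workloads/distributed" \
  -H "Authorization: Bearer $RUNAI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "name": "dist-train",
        "projectId": "<PROJECT_ID>",
        "spec": { "cleanPodPolicy": "All" }
      }'
```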
Workload Assets¶
- Instructions for environment variables - You can now add instructions to environment variables when creating new environments via the UI and API. In addition, Run:ai's environments now include default instructions. Adding instructions provides guidance, enabling anyone using the environment to set the environment variable values correctly. From cluster v2.20 onward
- Enhanced environments and compute resource management - The action bar now contains "Make a Copy" and "Edit", while the "Rename" option has been removed. A new "Last Updated" column has also been added for easier tracking of asset modifications. From cluster v2.20 onward
- Enhanced data sources and credentials tables - Added a new "Kubernetes name" column to the data sources and credentials tables for visibility into Kubernetes resource associations. The credentials table now includes an "Environments" column displaying the environments associated with the credential. From cluster v2.20 onward
Authentication and authorization¶
- User applications for API authentication - You can now create your own applications for API integrations with Run:ai. Each application includes client credentials, which can be used to obtain an authentication token for subsequent API calls. See User applications for more details. From cluster v2.20 onward
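A minimal sketch of the token exchange, assuming a `/api/v1/token` endpoint with a `client_credentials` grant (verify the exact endpoint and body against the User applications documentation):

```bash
# Sketch: exchange a user application's client credentials for an API token.
# Replace <COMPANY>, <CLIENT_ID> and <CLIENT_SECRET> with your own values.
curl -X POST "https://<COMPANY>.run.ai/api/v1/token" \
  -H "Content-Type: application/json" \
  -d '{
        "grantType": "client_credentials",
        "clientId": "<CLIENT_ID>",
        "clientSecret": "<CLIENT_SECRET>"
      }'
# The response contains an access token to send as an
# 'Authorization: Bearer <token>' header on subsequent API calls.
```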
Scheduler¶
- Support for multiple fractional GPUs in a single workload - Run:ai now supports submitting workloads that utilize multiple fractional GPUs within a single workload using the UI and CLI (see the sketch at the end of this section). This feature enhances GPU utilization, increases the probability of scheduling within shorter timeframes, and allows workloads to consume only the memory they need. It maximizes quota usage and enables more workloads to share the same GPUs effectively. See Multi-GPU fractions and Multi-GPU dynamic fractions for more details. Beta for Dynamic Fractions From cluster v2.20 onward
- Support for GPU memory swap with multiple GPUs per workload - Run:ai now supports GPU memory swap for workloads utilizing multiple GPUs. By leveraging GPU memory swap, you can maximize GPU utilization and serve more workloads using the same hardware. The swap scheduler on each node ensures that all GPUs of a distributed model run simultaneously, maintaining synchronization across GPUs. Workload configurations combine swap settings with multi-GPU dynamic fractions, providing flexibility and efficiency for managing large-scale workloads. See Multi-GPU memory swap. Beta From cluster v2.20 onward
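A minimal CLI sketch for the multi-GPU fractions feature above, assuming flags along the lines of `--gpu-devices-request` and `--gpu-memory-request` (the flag names are assumptions; verify with `runai training submit --help`):

```bash
# Sketch: request 2 GPU devices with 20GB of GPU memory on each, leaving the
# remaining memory of each device available to other workloads.
# Flag names are assumptions based on the feature description above.
runai training submit frac-train \
  --image python:3.11 \
  --project team-a \
  --gpu-devices-request 2 \
  --gpu-memory-request 20G \
  -- python train.py
```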
Command Line Interface (CLI v2)¶
- Support for Windows OS - CLI v2 now supports Windows operating systems, enabling you to leverage the full capabilities of the CLI. From cluster v2.18 onward
- Unified training command structure - Unified the `distributed` command into the `training` command to align with the Run:ai UI. The `training` command now includes a new sub-command to support distributed workloads, ensuring a more consistent and streamlined user experience across both the CLI v2 and UI.
- New command for Kubernetes access - Added a new CLI v2 command, `runai kubconfig set`, allowing users to set the kubeconfig file with the Run:ai authorization token. This enhancement enables users to gain access to the Kubernetes cluster, simplifying authentication and integration with Run:ai-managed environments.
- Added view workload labels - You can now view the labels associated with a workload when using the CLI v2 `runai workload describe` command for all workload types, as shown in the sketch below. This enhancement provides better visibility into workload metadata.
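A minimal sketch of the two commands named above (any arguments beyond the command names themselves are assumptions; check each command's `--help`):

```bash
# Set the kubeconfig file with a Run:ai authorization token.
runai kubconfig set

# Describe a workload; the output now includes its labels.
# The project flag is an assumption; verify with --help.
runai workload describe <WORKLOAD_NAME> --project <PROJECT_NAME>
```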
ML Engineers¶
Workloads - Inference¶
- Enhanced visibility into rolling updates for inference workloads - Run:ai now displays a phase message with detailed insights into the current state of a rolling update when you hover over the workload's status. This helps users monitor and manage updates more effectively. See Rolling inference updates for more details. From cluster v2.20 onward
- Inference serving endpoint configuration - You can now define an inference serving endpoint directly within the environment using the Run:ai UI. From cluster v2.19 onward
- Persistent token management for Hugging Face models - Run:ai allows users to save their Hugging Face tokens persistently as part of their credentials within the Run:ai UI. Once saved, tokens can be easily selected from a list of stored credentials, removing the need to manually enter them each time. This enhancement improves the process of deploying Hugging Face models, making it more efficient and user-friendly. See Deploy inference workloads from Hugging Face for more details. From cluster v2.13 onward
- Deploy and manage NVIDIA NIM models in inference workloads - Run:ai now supports NVIDIA NIM models, enabling you to easily deploy and manage these models when submitting inference workloads. You can select a NIM model and leverage NVIDIA's hardware optimizations directly through the Run:ai UI. This feature also allows you to take advantage of Run:ai capabilities such as autoscaling and GPU fractioning. See Deploy inference workloads with NVIDIA NIM for more details.
- Customizable autoscaling plans for inference workloads - Run:ai allows advanced users practicing autoscaling for inference workloads to fine-tune their autoscaling plans using the Update inference spec API, as shown in the sketch below. This feature enables you to achieve optimal behavior to meet fluctuating request demands. Experimental From cluster v2.20 onward
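A minimal sketch of such an update, assuming an inference-spec endpoint and an autoscaling block with replica bounds (the path and field names are assumptions; see the Update inference spec API reference for the exact schema):

```bash
# Sketch: fine-tune the autoscaling plan of a running inference workload.
# Endpoint path and body fields are assumptions based on the note above.
curl -X PATCH "https://<COMPANY>.run.ai/api/v1/workloads/inferences/<WORKLOAD_ID>/spec" \
  -H "Authorization: Bearer $RUNAI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "autoscaling": { "minReplicas": 1, "maxReplicas": 8 } }'
```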
Platform Administrator¶
Analytics¶
- New Reports view for analytics - The new Reports view enables generating and organizing large volumes of data in a structured, CSV-formatted layout. With this feature, you can monitor resource consumption, identify trends, and make informed decisions to optimize your AI workloads with greater efficiency.
Authorization and authentication¶
- Client credentials for applications - Applications now use client credentials - Client ID and Client secret - to obtain an authentication token, in line with the OAuth standard. See Applications for more details. From cluster v2.20 onward
Node pools¶
- Enhanced metric graphs for node pools - Enhanced the metric graphs in the DETAILS tab for node pools by aligning them with the dashboard and the node pools API. As part of this improvement, the following columns have been removed from the Node pools table:
- Node GPU Allocation
- GPU Utilization Distribution
- GPU Utilization
- GPU Memory Utilization
- CPU Utilization
- CPU Memory Utilization
Organizations - Projects/Departments¶
- Enhanced project deletion - Deleting a project will now attempt to delete the project's associated workloads and assets, allowing better management of your organization's assets. From cluster v2.20 onward
- Enhanced resource prioritization for projects and departments - Run:ai has introduced advanced prioritization capabilities to manage resources between projects or between departments more effectively using the Projects and Departments APIs (see the sketch after this list). From cluster v2.20 onward
This feature allows administrators to:
- Prioritize resource allocation and reclaim between different projects and departments.
- Prioritize projects within the same department.
- Set priorities per node-pool for both projects and departments.
- Implement distinct SLAs by assigning strict priority levels to over-quota resources.
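A minimal API sketch, assuming a project-update endpoint and a priority field (both the path and the field name are assumptions; see the Projects and Departments API reference for the real schema):

```bash
# Sketch: raise a project's resource-reclaim priority within its department.
# Endpoint path and the 'priority' field are assumptions.
curl -X PATCH "https://<COMPANY>.run.ai/api/v1/org-unit/projects/<PROJECT_ID>" \
  -H "Authorization: Bearer $RUNAI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "priority": 80 }'
```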
- Updated over quota naming - Renamed over quota priority to over quota weight to reflect its actual functionality.
Policy¶
- Added policy-based default field values - Administrators can now set default values for fields that are automatically calculated based on the values of other fields using `defaultFrom` (see the sketch after this list). This ensures that critical fields in the workload submission form are populated automatically if not provided by the user. From cluster v2.20 onward
This feature supports various field types:
- Integer fields (e.g., `cpuCoresRequest`)
- Number fields (e.g., `gpuPortionRequest`)
- Quantity fields (e.g., `gpuMemoryRequest`)
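A minimal policy sketch, assuming a workspace-policy endpoint and a `defaultFrom` block that references a sibling field; the endpoint, the surrounding structure, and the factor semantics are all assumptions - only `defaultFrom` and the field names come from the notes above:

```bash
# Sketch: default the CPU request from the requested GPU portion, so workloads
# that omit cpuCoresRequest still receive a proportional value.
# Endpoint path and schema details are assumptions.
curl -X PUT "https://<COMPANY>.run.ai/api/v2/policy/workspaces" \
  -H "Authorization: Bearer $RUNAI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "defaults": {
          "compute": {
            "cpuCoresRequest": {
              "defaultFrom": { "field": "gpuPortionRequest", "factor": 4 }
            }
          }
        }
      }'
```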
Data sources¶
- Improved control over data source and storage class visibility - Run:ai now provides administrators with the ability to control the visibility of data source types and storage classes in the UI. Data source types that are restricted by policy no longer appear during workload submission or when creating new data source assets. Additionally, administrators can configure storage classes as internal using the Storage class configuration API. From cluster v2.20 onward
Email notifications¶
- Added email notifications API - Email notifications can now be configured via API in addition to the UI, enabling integration with external tools. See NotificationChannels API for more details.
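A minimal sketch, assuming a notification-channels endpoint and an email channel type (the path and body fields are assumptions; see the NotificationChannels API reference):

```bash
# Sketch: create an email notification channel via the API.
# Endpoint path and body fields are assumptions.
curl -X POST "https://<COMPANY>.run.ai/api/v1/notification-channels" \
  -H "Authorization: Bearer $RUNAI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "type": "email",
        "name": "ops-alerts",
        "config": { "recipients": ["ops@example.com"] }
      }'
```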
Infrastructure Administrator¶
NVIDIA Data Center GPUs - Grace-Hopper¶
- Support for ARM-based Grace-Hopper Superchip (GH200) - Run:ai now supports the ARM-based Grace-Hopper Superchip (GH200). Due to an ARM64 limitation in version 2.20, the Run:ai control plane services must be scheduled on non-ARM CPU nodes. This limitation will be addressed in a future release. See Self-Hosted installation over Kubernetes for more details. From cluster v2.20 onward
System requirements¶
- Run:ai now supports Kubernetes version 1.32.
- Run:ai now supports OpenShift version 4.17.
- Kubernetes version 1.28 is no longer supported.
- OpenShift versions 4.12 to 4.13 are no longer supported.
Advanced cluster configurations¶
- Exclude nodes in mixed node clusters - Run:ai now allows you to exclude specific nodes in a mixed node cluster using the `nodeSelectorTerms` flag. See Advanced Cluster Configurations for more details. From cluster v2.20 onward
- Advanced configuration options for cluster services - Introduced new cluster configuration options for setting node affinity and tolerations for Run:ai cluster services: `global.affinity`, `global.tolerations`, and `daemonSetsTolerations`. These configurations ensure that the Run:ai cluster services are scheduled on the desired nodes, as shown in the sketch below. See Advanced Cluster Configurations for more details. From cluster v2.20 onward
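A minimal sketch, assuming these options live under `spec` of the `runaiconfig` resource in the `runai` namespace (the resource name, namespace, key placement, and label/toleration values are assumptions; see Advanced Cluster Configurations for the authoritative layout):

```bash
# Sketch: pin Run:ai cluster services to nodes labeled for system workloads
# and tolerate a matching taint. Label key and toleration are placeholders.
kubectl -n runai patch runaiconfig runai --type merge -p '{
  "spec": {
    "global": {
      "affinity": {
        "nodeAffinity": {
          "requiredDuringSchedulingIgnoredDuringExecution": {
            "nodeSelectorTerms": [{
              "matchExpressions": [{
                "key": "node-role.kubernetes.io/runai-system",
                "operator": "Exists"
              }]
            }]
          }
        }
      },
      "tolerations": [{
        "key": "runai-system",
        "operator": "Exists",
        "effect": "NoSchedule"
      }]
    }
  }
}'
```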
- Added Argo Workflows auto-pod grouping - Introduced a new cluster configuration option, `gangScheduleArgoWorkflow`, to modify the default behavior for grouping Argo Workflows pods, allowing you to prevent pods from being grouped into a single pod-group. See Advanced Cluster Configurations for more details. Cluster v2.20 and v2.18
- Added cloud auto-scaling for memory fractions - Run:ai now supports auto-scaling for workloads using memory fractions in cloud environments. The `gpuMemoryToFractionRatio` configuration option allows a failed scheduling attempt for a memory-fractions workload to create Run:ai scaling pods, triggering the auto-scaler. See Advanced Cluster Configurations for more details. From cluster v2.19 onward
- Added stale gang eviction timeout for improved stability - Run:ai has introduced a default timeout of 60 seconds for gang eviction in gang-scheduled workloads using `defaultStalenessGracePeriod`. This timeout gives both the workload controller and the scheduler sufficient time to remediate the workload, improving the stability of large training jobs. See Advanced Cluster Configurations for more details. From cluster v2.18 onward
- Added custom labels for built-in alerts - Administrators can now add their own custom labels to the built-in Prometheus alerts by setting `spec.prometheus.additionalAlertLabels` in their cluster, as shown in the sketch after this list. See Advanced Cluster Configurations for more details. From cluster v2.20 onward
- Enhanced configuration flexibility for cluster replica management - Administrators can now use `spec.global.replicaCount` to manage replicas for Run:ai services. See Advanced Cluster Configurations for more details. From cluster v2.20 onward
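A minimal sketch combining the last two options above on the same `runaiconfig` resource (the resource name, namespace, and example label are assumptions, consistent with the previous sketch):

```bash
# Sketch: add a custom label to built-in Prometheus alerts and raise the
# replica count for Run:ai services.
kubectl -n runai patch runaiconfig runai --type merge -p '{
  "spec": {
    "prometheus": { "additionalAlertLabels": { "team": "ml-infra" } },
    "global": { "replicaCount": 2 }
  }
}'
```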
Run:ai built-in alerts¶
- Added two new Run:ai built-in alerts for Kubernetes nodes hosting GPU workloads. The unknown state alert notifies when the node's health and readiness cannot be determined, and the low memory alert warns when the node has insufficient memory to support current or upcoming workloads. From cluster v2.20 onward
Run:ai Developer¶
Metrics and Telemetry¶
- Additional metrics and telemetry are available via the API (see the sketch after this list). For more details, see Metrics API:
  - Metrics (over time)
    - Project
      - GPU_QUOTA
      - CPU_QUOTA_MILLICORES
      - CPU_MEMORY_QUOTA_MB
      - GPU_ALLOCATION
      - CPU_ALLOCATION_MILLICORES
      - CPU_MEMORY_ALLOCATION_MB
    - Department
      - GPU_QUOTA
      - CPU_QUOTA_MILLICORES
      - CPU_MEMORY_QUOTA_MB
      - GPU_ALLOCATION
      - CPU_ALLOCATION_MILLICORES
      - CPU_MEMORY_ALLOCATION_MB
  - Telemetry (current time)
    - Project
      - GPU_QUOTA
      - CPU_QUOTA
      - MEMORY_QUOTA
      - GPU_ALLOCATION
      - CPU_ALLOCATION
      - MEMORY_ALLOCATION
      - GPU_ALLOCATION_NON_PREEMPTIBLE
      - CPU_ALLOCATION_NON_PREEMPTIBLE
      - MEMORY_ALLOCATION_NON_PREEMPTIBLE
    - Department
      - GPU_QUOTA
      - CPU_QUOTA
      - MEMORY_QUOTA
      - GPU_ALLOCATION
      - CPU_ALLOCATION
      - MEMORY_ALLOCATION
      - GPU_ALLOCATION_NON_PREEMPTIBLE
      - CPU_ALLOCATION_NON_PREEMPTIBLE
      - MEMORY_ALLOCATION_NON_PREEMPTIBLE
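A minimal sketch of a metrics query, assuming a project-scoped metrics endpoint with time-range parameters (the path and parameter names are assumptions; see the Metrics API reference for exact usage):

```bash
# Sketch: fetch a project's GPU allocation over one day, sampled 24 times.
curl -G "https://<COMPANY>.run.ai/api/v1/org-unit/projects/<PROJECT_ID>/metrics" \
  -H "Authorization: Bearer $RUNAI_TOKEN" \
  --data-urlencode "metricType=GPU_ALLOCATION" \
  --data-urlencode "start=2025-01-01T00:00:00Z" \
  --data-urlencode "end=2025-01-02T00:00:00Z" \
  --data-urlencode "numberOfSamples=24"
```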
Deprecation notifications¶
Ongoing Dynamic MIG deprecation process¶
The Dynamic MIG deprecation process started in version 2.19. Run:ai supports standard MIG profiles as detailed in Configuring NVIDIA MIG profiles.
- Before upgrading to version 2.20, workloads submitted with Dynamic MIG and their associated node configurations must be removed.
- In version 2.20, MIG was removed from the Run:ai UI under compute resources.
- In Q2/25, all 'Dynamic MIG' APIs and CLI commands will be fully deprecated.
CLI v1 deprecation¶
CLI v1 is deprecated, and no new features will be developed for it. It will remain available for the next two releases to ensure a smooth transition for all users. We recommend switching to CLI v2, which provides feature parity, backward compatibility, and ongoing support for new enhancements. CLI v2 is designed to deliver a more robust, efficient, and user-friendly experience.
Legacy Jobs view deprecation¶
Starting with version 2.20, the legacy Jobs view will be discontinued in favor of the more advanced Workloads view. The legacy submission form will still be accessible via the Workload manager view for a smoother transition.
appID and appSecret deprecation¶
The appID and appSecret parameters used for requesting an API token are deprecated. They will remain available for the next two releases. To create application tokens, use your client credentials - Client ID and Client secret.