What's New 2021
December 8th, 2021
To comply with organizational policies and enhance the Run:AI platform security, Run:AI now supports Single Sign-On (SSO). This functionality is currently in beta and is available for new customer tenants only. For further details on SSO see Single Sign-On.
To optimize resource management and utilization of Nvidia GPUs based on Ampere architecture, such as A100, Run:AI now supports dynamic creation and allocation of MIG partitions. This functionality is currently in beta. For further details on dynamic allocation of MIG partitions see Dynamic MIG.
Run:AI now supports AI workloads running in containerized clusters based on the VMware Tanzu orchestrator. For further details on supported orchestrators see the prerequisites document.
- A new "Status History" tab has been added to the job details view. The tab shows every status change of a job, helping researchers analyze and improve experiments and giving administrators a tool for analyzing running and historical jobs. In addition, hovering over a job's status in the jobs table now shows the reason for the current status.
- To improve the ability to monitor the Run:AI environment, Run:AI components now expose alerts indicating whether the component is running. For further details on cluster monitoring see Cluster Monitoring.
User Experience (UX) enhancements:
- Run:AI cluster version is now available in the clusters list.
- Researchers can now submit jobs and integrate with Git directly from the user interface.
October 29th, 2021
The Run:AI cluster now enforces the user's access definitions and lists only jobs under permitted projects. For example, runai list jobs shows only jobs from projects to which the researcher has access.
The Run:AI CLI runai list projects option now displays the quota definitions of each project.
The Run:AI CLI port forwarding option now supports any IP address.
The Run:AI CLI binary download is now accompanied by a checksum, allowing customers to verify the integrity of the downloaded CLI, in line with security best practices and standards.
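Verification works with standard checksum tooling. A minimal sketch, assuming a SHA-256 checksum file published alongside the binary (the file names below are placeholders, not the actual download names):

```shell
# Placeholder names: "runai" stands in for the downloaded CLI binary,
# "runai.sha256" for the checksum file published alongside it.
echo "demo binary contents" > runai
sha256sum runai > runai.sha256

# Validate the download against the published checksum;
# prints "runai: OK" when the file is intact.
sha256sum -c runai.sha256
```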
The Run:AI Researcher User Interface now supports setting GPU Memory as well as volumes in NFS servers.
The Run:AI metrics used in the Dashboards are now officially documented and can be accessed via APIs as documented here.
Run:AI now officially supports integration with Seldon Core. For more details read here.
Run:AI now supports VMware Tanzu Kubernetes.
August 30th, 2021
Run:AI now supports a self-hosted installation. With the self-hosted installation the Run:AI control-plane (or backend) which typically resides on the cloud, is deployed at the customer's data center. For further details on supported installation types see Installation Types.
The Run:AI self-hosted installation requires a dedicated license, and has different pricing than the SaaS installation. For more details contact your Run:AI account manager.
NFS volumes can now be mounted directly to containers run by Run:AI when submitting jobs. See the --nfs-server flag of runai submit.
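For illustration, a submission of this shape might mount an NFS export into the container. The job name, server address, paths, and image below are placeholders, and the exact volume syntax should be checked against the runai submit reference:

```shell
# Placeholder server, paths, and image -- check `runai submit --help`
# for the exact volume syntax in your CLI version.
runai submit nfs-job \
  --image python:3.8 \
  --nfs-server nfs.example.com \
  -v /exports/datasets:/data
```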
To ease the manageability of user templates, Run:AI now supports global user templates. Global user templates are user templates that are managed by the Run:AI administrator and are available for all the projects within a specific cluster. The purpose of global user templates is to help define and enforce cross-organization resource policies.
To simplify researchers' job submission via the Run:AI Researcher User Interface (UI), the UI now supports autocomplete, which is based on pre-defined values, as configured by the Administrator using the administrative templates.
Run:AI has extended the use of the cluster name, as defined by the Administrator when configuring clusters. The cluster name now appears in the Run:AI dashboards as well as the Researcher UI.
The original command line used to run a Job is now shown in the Job details, under the General tab.
August 4th, 2021
Researcher User Interface (UI) enhancements:
- Revised user interface and user experience
- Researchers can create templates to ease job submission. Templates can be saved and used at the project level
- Researchers can easily re-submit jobs from the Submit page or directly from the jobs list on the Jobs page
- Administrators can create administrative templates which set cluster-wide defaults and constraints for the submission of Jobs. For further details see Configure Command-Line Interface Templates.
- Different teams can collaborate and share templates by exporting and importing templates in the Submit screen
Researcher Command Line Interface (CLI) enhancements:
- Jobs can now be manually suspended and resumed using new suspend and resume commands
- A new command, runai top job, was added
Kubeflow integration is now supported. The new integration allows building ML pipelines with Kubeflow Pipelines and Kubeflow Notebooks, and running the workloads via the Run:AI platform. For further details see Integrate Run:AI with Kubeflow.
MLflow integration is now supported. For further details see Integrate Run:AI with MLflow.
Run:AI Projects are implemented as Kubernetes namespaces. Run:AI now supports customizable namespace names. For further details see Manual Creation of Namespaces.
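As a hedged sketch of what creating a custom namespace for a project might look like (the runai/queue label key and names below are assumptions; confirm the exact procedure and label in the Manual Creation of Namespaces guide):

```shell
# Assumed label key and names -- verify against the Run:AI documentation.
kubectl create namespace ml-team-a
kubectl label namespace ml-team-a runai/queue=team-a
```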
May 10th, 2021
Usability improvements to the Run:AI command-line interface (CLI): the CLI now supports autocomplete for all options and parameters.
The Administration user interface navigation menu has been improved for easier navigation.
Run:AI can be installed when Kubernetes has Pod Security Policy (PSP) enabled.
April 20th, 2021
Job List and Node list now show the GPU type (e.g. V100).
April 18th, 2021
Inference workloads are now supported. For further details see Inference Overview.
JupyterHub integration is now supported. For further details see JupyterHub Integration.
NVIDIA MIG is now supported. You can use NVIDIA MIG technology to partition A100 GPUs. Each partition is treated as a single GPU by the Run:AI system, and all Run:AI functionality, including GPU Fractions, is supported at the partition level.
April 1st, 2021
Run:AI now supports Kubernetes 1.20.
March 24th, 2021
Job List and Node list now show CPU utilization and CPU memory utilization.
February 14th, 2021
The Job list now shows per-Job graphs for GPU utilization and GPU memory.
The Node list now shows per-Node graphs for GPU utilization and GPU memory.
January 22nd, 2021
A new Analytics dashboard with emphasis on CPU, CPU memory, GPU, and GPU memory, allowing better diagnostics of resource misuse.
January 15th, 2021
A new developer documentation area has been created. In it:
- New documentation for Researcher REST API.
- New documentation for Administration REST API.
- Kubernetes-based API for job creation.
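As an illustrative sketch of the Kubernetes-based path, a job could be created by applying a manifest with kubectl. The apiVersion, kind, and fields below are assumptions, not the confirmed schema; consult the developer documentation for the exact API:

```shell
# Assumed CRD schema -- check the developer documentation for the
# exact apiVersion, kind, and supported fields.
kubectl apply -f - <<'EOF'
apiVersion: run.ai/v1
kind: RunaiJob
metadata:
  name: train-example
  namespace: runai-team-a   # the project's namespace
spec:
  template:
    spec:
      containers:
        - name: train
          image: tensorflow/tensorflow:2.4.1-gpu
          resources:
            limits:
              nvidia.com/gpu: 1
EOF
```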
January 9th, 2021
A new Researcher user interface is now available. See researcher UI setup.
January 2nd, 2021
Run:AI clusters now support Azure Kubernetes Service (AKS).