Run:ai Documentation Library
Reference
For a full reference for the YAML API parameters, see the YAML Reference document.