Run:ai schedules Workloads. A Run:ai Workload consists of:
- The Kubernetes object (Job, Deployment, etc.) used to launch the container in which the data science code runs.
- A set of additional resources required to run the Workload. Examples include a service entry point that allows access to the Job and a persistent volume claim for accessing data on the network.
All of these components are created together and deleted together when the Workload ends.
Run:ai currently supports the following Workload types:
| Workload Type | Kubernetes Name | Description |
|---|---|---|
| Interactive | | Submit an interactive workload |
| Training | | Submit a training workload |
| Distributed Training | | Submit a distributed training workload using TensorFlow, PyTorch or MPI |
| Inference | | Submit an inference workload |
A Workload will typically have a list of values (sometimes called flags), such as name, image, and resources. A full list of values is available in the runai-submit Command-line reference.
How to Submit
A Workload can be submitted via various channels:
- The Run:ai user interface.
- The Run:ai command-line interface, via the runai submit command.
- The Run:ai Cluster API.
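To make the command-line channel concrete, here is a minimal sketch that turns a few Workload values (name, image, GPU count) into a `runai submit` argument list. The `WorkloadValues` class and `build_submit_command` helper are hypothetical names introduced for illustration, and the image path is a placeholder. The `--image`, `--gpu` and `--interactive` flags and the positional name are assumed from common CLI usage and should be checked against the command-line reference for your Run:ai version. The sketch only builds and prints the command; it does not run it.

```python
import shlex
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkloadValues:
    """Hypothetical container for a few common Workload values."""
    name: str
    image: str
    gpus: Optional[int] = None
    interactive: bool = False


def build_submit_command(values: WorkloadValues) -> List[str]:
    """Return the argv list for a `runai submit` call (nothing is executed)."""
    # Flag names are assumptions; verify them in the CLI reference.
    argv = ["runai", "submit", values.name, "--image", values.image]
    if values.gpus is not None:
        argv += ["--gpu", str(values.gpus)]
    if values.interactive:
        argv.append("--interactive")
    return argv


command = build_submit_command(
    WorkloadValues(name="train1", image="example.com/team/train:latest", gpus=1)
)
# Print a shell-safe rendering of the command for review before running it.
print(shlex.join(command))
```

Keeping the values in a small data structure and rendering the command separately makes it easy to review or log a submission before handing it to the CLI.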
As an administrator, you can set Policies on Workloads. Policies allow administrators to impose restrictions and set default values for Researcher Workloads. For more information, see Workload Policies.
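The two ideas behind a Policy, default values and restrictions, can be illustrated with a small conceptual sketch. It does not use the Run:ai policy format: the `POLICY` dictionary, its `defaults` and `max` keys, and the `apply_policy` function are all hypothetical and exist only to show how defaults might be filled in and limits enforced on submitted values.

```python
from typing import Any, Dict

# Hypothetical policy, NOT the Run:ai policy schema: it only models
# "default values" and "restrictions" as described in the text.
POLICY: Dict[str, Dict[str, Any]] = {
    "defaults": {"gpus": 1},
    "max": {"gpus": 4},
}


def apply_policy(
    values: Dict[str, Any], policy: Dict[str, Dict[str, Any]]
) -> Dict[str, Any]:
    """Fill in missing values from defaults, then enforce maximum limits."""
    # Submitted values take precedence over policy defaults.
    result = {**policy["defaults"], **values}
    for key, limit in policy["max"].items():
        if key in result and result[key] > limit:
            raise ValueError(
                f"{key}={result[key]} exceeds policy maximum {limit}"
            )
    return result


# A submission without a GPU count picks up the default.
print(apply_policy({"name": "train1"}, POLICY))
```

A submission that requests more than the allowed maximum (for example `{"name": "train1", "gpus": 8}`) raises `ValueError` in this sketch, which mirrors the idea of a restriction rejecting a Workload.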