Run:ai schedules Workloads. A Run:ai Workload contains:
- The Kubernetes resource (Job, Deployment, etc.) that launches the container in which the data science code runs.
- A set of additional resources that are required to run the Workload. Examples include a service entry point that allows access to the Job and a persistent volume claim for accessing data on the network.
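As a rough mental model, a Workload can be pictured as one primary Kubernetes resource plus a list of supporting resources. The sketch below is illustrative only: the class and field names (`Workload`, `KubernetesResource`, `primary`, `extras`) and the resource names are hypothetical and are not part of any Run:ai API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KubernetesResource:
    """Hypothetical stand-in for any Kubernetes object (not a Run:ai type)."""
    kind: str
    name: str


@dataclass
class Workload:
    """Illustrative model: one primary resource plus supporting resources."""
    primary: KubernetesResource
    extras: List[KubernetesResource] = field(default_factory=list)

    def kinds(self) -> List[str]:
        # Primary resource first, then the additional resources in order.
        return [self.primary.kind] + [r.kind for r in self.extras]


workload = Workload(
    primary=KubernetesResource("Job", "train1"),
    extras=[
        KubernetesResource("Service", "train1-entry"),
        KubernetesResource("PersistentVolumeClaim", "train1-data"),
    ],
)
print(workload.kinds())
```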
Run:ai supports the following Workload types:
- Interactive: submit an interactive workload.
- Training: submit a training workload.
- Distributed training: submit a distributed training workload using TensorFlow, PyTorch or MPI.
- Inference: submit an inference workload.
A Workload will typically have a list of values, such as name, image, and resources. A full list of values is available in the runai-submit Command-line reference.
To find the exact YAML syntax, run:
(and similarly for other Workload types).
To get information on a specific value (e.g. node type), you can also run:
RESOURCE: nodeType <Object>
Specifies nodes (machines) or a group of nodes on which the workload will
run. To use this feature, your Administrator will need to label nodes as
explained in the Group Nodes guide at
https://docs.run.ai/admin/researcher-setup/limit-to-node-group. This flag
can be used in conjunction with Project-based affinity. In this case, the
flag is used to refine the list of allowable node groups set in the
Project. For more information consult the Projects guide at
How to Submit
A Workload can be submitted via various channels:
- The Run:ai user interface.
- The Run:ai command-line interface, via the runai submit command.
- The Run:ai Cluster API.
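For the command-line channel, a submission can be scripted. The sketch below only builds and prints a `runai submit` command line; it does not run it. It assumes that `-i` sets the image and `-g` the number of GPUs, and the job name and image are placeholders. Check the runai submit command-line reference for the flags your CLI version supports.

```python
import shlex
from typing import List


def build_submit_command(name: str, image: str, gpus: int) -> List[str]:
    """Build (but do not run) a `runai submit` argument list.

    Assumption: `-i` sets the container image and `-g` the GPU count.
    """
    if gpus < 0:
        raise ValueError("gpus must be non-negative")
    return ["runai", "submit", name, "-i", image, "-g", str(gpus)]


# Placeholder job name and image, chosen for illustration only.
cmd = build_submit_command("train1", "example.com/team/image:latest", 1)
print(shlex.join(cmd))
```

Building the argument list separately from executing it keeps the values easy to inspect or log before anything is submitted.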
An Administrator can set Policies for Workload submission. Policies serve two purposes:
- To constrain the values a researcher can specify.
- To provide default values.
For example, an administrator can:
- Set a maximum of 5 GPUs per Workload.
- Provide a default value of 1 GPU for each container.
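The two purposes can be illustrated with a small sketch that first fills in defaults and then enforces limits. This is not Run:ai's policy engine or policy syntax: the `Policy` class, its fields, and `apply_policy` are hypothetical names chosen for illustration. For brevity it treats the per-Workload maximum and the per-container default as a single `gpu` value.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Policy:
    """Hypothetical policy: default values plus upper bounds per field."""
    defaults: Dict[str, int] = field(default_factory=dict)
    maximums: Dict[str, int] = field(default_factory=dict)


def apply_policy(policy: Policy, requested: Dict[str, int]) -> Dict[str, int]:
    """Return the effective values, or raise if a limit is exceeded."""
    # Defaults apply only to values the researcher left unset.
    effective = {**policy.defaults, **requested}
    # Constraints reject values above the administrator's limits.
    for key, limit in policy.maximums.items():
        if key in effective and effective[key] > limit:
            raise ValueError(f"{key}={effective[key]} exceeds maximum {limit}")
    return effective


policy = Policy(defaults={"gpu": 1}, maximums={"gpu": 5})
print(apply_policy(policy, {}))          # no GPU requested: default is used
print(apply_policy(policy, {"gpu": 4}))  # explicit request within the limit
```

A request for 6 GPUs would raise `ValueError` in this sketch, which mirrors the idea of a submission being rejected for violating a constraint.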
Each workload type has a matching kind of workload policy. For example, an
InteractiveWorkload has a matching
A Policy of each type can be defined per-project. There is also a global policy that applies to any project that does not have a per-project policy.
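This lookup rule can be sketched as a simple fallback. The function name, the dictionary layout, and the `max_gpu` key are hypothetical; how Run:ai stores and evaluates policies internally is not described here.

```python
from typing import Dict, Optional


def resolve_policy(
    project: str,
    project_policies: Dict[str, dict],
    global_policy: Optional[dict],
) -> Optional[dict]:
    """Pick the per-project policy if one exists, else the global policy."""
    if project in project_policies:
        return project_policies[project]
    return global_policy


# Illustrative data: only "team-a" has its own policy.
project_policies = {"team-a": {"max_gpu": 2}}
global_policy = {"max_gpu": 5}

print(resolve_policy("team-a", project_policies, global_policy))
print(resolve_policy("team-b", project_policies, global_policy))
```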
For further details on policies, see Policies.