# Workloads Overview

## Workloads
Run:ai schedules Workloads. A Run:ai Workload contains:
- The Kubernetes resource (Job, Deployment, etc.) used to launch the container inside which the data science code runs.
- A set of additional resources that are required to run the Workload. Examples: a service entry point that allows access to the Job, a persistent volume claim to access data on the network, and more.
Run:ai supports the following Workload types:
| Workload Type | Kubernetes Name | Description |
|---|---|---|
| Interactive | InteractiveWorkload | Submit an interactive workload |
| Training | TrainingWorkload | Submit a training workload |
| Distributed Training | DistributedWorkload | Submit a distributed training workload using TensorFlow, PyTorch, or MPI |
| Inference | InferenceWorkload | Submit an inference workload |
## Values
A Workload will typically have a list of values, such as name, image, and resources. A full list of values is available in the runai submit command-line reference.
You can also find the exact YAML syntax by running `kubectl explain`:
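For example, for a TrainingWorkload (this assumes the Run:ai CRD is registered under its default name in your cluster):

```shell
kubectl explain trainingworkload
```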
(and similarly for other Workload types).
To get information on a specific value (e.g. `nodeType`), you can also run:
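As a sketch (again assuming the default CRD name, with `nodeType` nested under `spec` as in the output below):

```shell
kubectl explain trainingworkload.spec.nodeType
```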
Result:
```
KIND:     TrainingWorkload
VERSION:  run.ai/v2alpha1

RESOURCE: nodeType <Object>

DESCRIPTION:
     Specifies nodes (machines) or a group of nodes on which the workload will
     run. To use this feature, your Administrator will need to label nodes as
     explained in the Group Nodes guide at
     https://docs.run.ai/admin/researcher-setup/limit-to-node-group. This flag
     can be used in conjunction with Project-based affinity. In this case, the
     flag is used to refine the list of allowable node groups set in the
     Project. For more information consult the Projects guide at
     https://docs.run.ai/admin/admin-ui-setup/project-setup.

FIELDS:
   value <string>
```
## How to Submit
A Workload can be submitted via various channels:
- The Run:ai user interface.
- The Run:ai command-line interface, via the runai submit command (see the example after this list).
- The Run:ai Cluster API.
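As an example of the command-line route, a minimal submission might look like the following; the job name, project, and image are illustrative placeholders:

```shell
# Submit a training workload with one GPU to project "team-a".
# The name, project, and image below are placeholders.
runai submit train1 --project team-a -i gcr.io/run-ai-demo/quickstart -g 1
```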
## Policies
An Administrator can set Policies for Workload submission. Policies serve two purposes:
- To constrain the values a researcher can specify.
- To provide default values.
For example, an Administrator can:
- Set a maximum of 5 GPUs per Workload.
- Provide a default value of 1 GPU for each container.
Each Workload type has a matching Policy kind. For example, an InteractiveWorkload has a matching InteractivePolicy.
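As a sketch, a policy enforcing the limits above might look like the following YAML; the field names here are illustrative assumptions, so consult the Policies guide for the authoritative schema:

```yaml
# Illustrative InteractivePolicy sketch (field names are assumed,
# not taken from the official schema).
apiVersion: run.ai/v2alpha1
kind: InteractivePolicy
metadata:
  name: interactive-policy
  namespace: runai   # assumed namespace
spec:
  gpu:
    rules:
      max: 5         # constrain: at most 5 GPUs per Workload
    value: 1         # default: 1 GPU per container when unspecified
```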
A Policy of each type can be defined per-project. There is also a global policy that applies to any project that does not have a per-project policy.
For further details on policies, see Policies.