
Workloads Overview

Workloads

Run:ai schedules Workloads. A Run:ai Workload contains:

  • The Kubernetes resource (Job, Deployment, etc.) that is used to launch the container inside which the data science code runs.
  • A set of additional resources that are required to run the Workload, for example: a service entry point that allows access to the Job, a persistent volume claim to access data on the network, and more.

Run:ai supports the following Workload types:

Workload Type        | Kubernetes Name     | Description
-------------------- | ------------------- | ------------------------------------------------------------------------
Interactive          | InteractiveWorkload | Submit an interactive workload
Training             | TrainingWorkload    | Submit a training workload
Distributed Training | DistributedWorkload | Submit a distributed training workload using TensorFlow, PyTorch, or MPI
Inference            | InferenceWorkload   | Submit an inference workload
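
Each of these types is a Kubernetes custom resource, so submitted Workloads can be inspected with standard kubectl commands. For example, to list training workloads in a Project's namespace (runai-team-a is a placeholder; Project namespaces are typically named runai-<project>):

kubectl get trainingworkloads -n runai-team-a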

Values

A Workload will typically have a list of values, such as name, image, and resources. A full list of values is available in the runai-submit Command-line reference.
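
For example, submitting a single-GPU training job with the CLI might look like the following (train1, gcr.io/run-ai-demo/quickstart, and team-a are placeholder names):

runai submit train1 -i gcr.io/run-ai-demo/quickstart -g 1 -p team-a

Here -i sets the container image, -g requests GPUs, and -p selects the Project to submit under.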

You can also find the exact YAML syntax by running:

kubectl explain TrainingWorkload.spec

(and similarly for other Workload types).
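
As a minimal sketch of that syntax (assuming the run.ai/v2alpha1 version shown by kubectl explain below; the names, namespace, and image are placeholders), a TrainingWorkload manifest that could be applied with kubectl apply -f would look roughly like:

apiVersion: run.ai/v2alpha1
kind: TrainingWorkload
metadata:
  name: train1               # placeholder workload name
  namespace: runai-team-a    # placeholder Project namespace
spec:
  name:
    value: train1
  image:
    value: gcr.io/run-ai-demo/quickstart   # placeholder image
  gpu:
    value: "1"    # note each spec field wraps its setting in a value key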

To get information on a specific value (e.g. node type), you can also run:

kubectl explain TrainingWorkload.spec.nodeType

Result:

KIND:     TrainingWorkload
VERSION:  run.ai/v2alpha1

RESOURCE: nodeType <Object>

DESCRIPTION:
     Specifies nodes (machines) or a group of nodes on which the workload will
     run. To use this feature, your Administrator will need to label nodes as
     explained in the Group Nodes guide at
     https://docs.run.ai/admin/researcher-setup/limit-to-node-group. This flag
     can be used in conjunction with Project-based affinity. In this case, the
     flag is used to refine the list of allowable node groups set in the
     Project. For more information consult the Projects guide at
     https://docs.run.ai/admin/admin-ui-setup/project-setup.

FIELDS:
   value    <string>
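
Following the FIELDS output above, setting a node group in a Workload's YAML takes a single value string (dgx-a100 is a hypothetical node-group label):

spec:
  nodeType:
    value: dgx-a100   # run only on nodes labeled with this group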

How to Submit

A Workload can be submitted via various channels, such as the Run:ai command-line interface (runai submit, shown above) or by applying the Workload's YAML directly with kubectl.

Policies

An Administrator can set Policies for Workload submission. Policies serve two purposes:

  1. To constrain the values a researcher can specify.
  2. To provide default values.

For example, an administrator can:

  • Set a maximum of 5 GPUs per Workload.
  • Provide a default value of 1 GPU for each container.

Each Workload type has a matching Policy kind. For example, an InteractiveWorkload has a matching InteractivePolicy.

A Policy of each type can be defined per-project. There is also a global policy that applies to any project that does not have a per-project policy.
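
As an illustrative sketch only (the rules/value field names and the namespace convention are assumptions here; the authoritative schema is in the Policies guide), a global TrainingPolicy implementing the GPU limits above might look roughly like:

apiVersion: run.ai/v2alpha1
kind: TrainingPolicy
metadata:
  name: training-policy
  namespace: runai    # assumed location for a global policy
spec:
  gpu:
    rules:
      max: "5"        # constrain: at most 5 GPUs per Workload
    value: "1"        # default: 1 GPU per container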

For further details on policies, see Policies.