Integrate Run:AI with Kubeflow¶
Kubeflow is a platform for data scientists who want to build and experiment with ML pipelines. Kubeflow is also for ML engineers and operational teams who want to deploy ML systems to various environments for development, testing, and production-level serving.
This document describes the process of using Kubeflow in conjunction with Run:AI. Kubeflow submits jobs that are scheduled via Run:AI.
Use the default installation to install Kubeflow.
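For reference, one common route to a default installation is the Kubeflow manifests repository. The sketch below assumes kustomize and kubectl are already installed and pointed at your cluster; the exact command may differ by release, so check the Kubeflow documentation for your version:

```bash
# Minimal sketch of a default Kubeflow install from the manifests repository
# (assumes kustomize and kubectl are installed and configured for the cluster).
git clone https://github.com/kubeflow/manifests.git
cd manifests
# Apply all components, retrying until all CRDs are established.
while ! kustomize build example | kubectl apply -f -; do
  echo "Retrying to apply resources"
  sleep 10
done
```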
Install Run:AI Cluster¶
When installing Run:AI, customize the cluster installation as follows:
- Set … to false, as it conflicts with Kubeflow.
- Set … to false, as Kubeflow uses its own namespace convention.
Create Run:AI Projects¶
Kubeflow uses the namespace convention kubeflow-<username>. Use the 4 steps here to set up Run:AI projects and link them with Kubeflow namespaces.
Verify that the association has worked by running:

```bash
kubectl get rolebindings -n <KUBEFLOW-NAMESPACE>
```

Check that role bindings starting with runai- were created.
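For example, for a hypothetical user whose namespace is kubeflow-jsmith (the name is illustrative), the check might look like this:

```bash
# Filter the role bindings down to the ones created by Run:AI
kubectl get rolebindings -n kubeflow-jsmith | grep runai-
```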
Kubeflow, Users and Kubernetes Namespaces¶
Kubeflow has a multi-user architecture. A user has a Kubeflow profile which maps to a Kubernetes Namespace. This is similar to the Run:AI concept where a Run:AI Project is mapped to a Kubernetes namespace.
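For illustration, a Kubeflow profile is itself a Kubernetes object, and the profile name becomes the user's namespace. A minimal sketch (the namespace and email are hypothetical) looks roughly like this:

```yaml
# Illustrative Kubeflow Profile; the profile name becomes the user's namespace.
apiVersion: kubeflow.org/v1
kind: Profile
metadata:
  name: kubeflow-jsmith          # hypothetical user namespace
spec:
  owner:
    kind: User
    name: jsmith@example.com     # hypothetical user identity
```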
When starting a Kubeflow Notebook, you select a Kubeflow configuration. A Kubeflow configuration allows you to inject additional settings into the notebook, such as environment variables. To use Kubeflow with Run:AI, you will use configurations to inject:
- The name of the Run:AI project
- Allocation of a fraction of a GPU, if required
To use Run:AI with whole GPUs (no fractions), apply the following configuration:
```yaml
apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
  name: runai-non-fractional
  namespace: <KUBEFLOW-NAMESPACE>
spec:
  desc: "Use Run:AI scheduler (whole GPUs)"
  env:
    - name: RUNAI_PROJECT
      value: "<PROJECT>"
  selector:
    matchLabels:
      runai-non-fractional: "true"  # key must be identical to metadata.name
```
<KUBEFLOW-NAMESPACE> is the name of the namespace associated with the Kubeflow user and <PROJECT> is the name of the Run:AI project.
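For example, assuming the manifest above is saved to a file named runai-non-fractional.yaml (the filename is arbitrary), it can be created with:

```bash
# Create the PodDefault in the user's namespace
kubectl apply -f runai-non-fractional.yaml
```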
Within the Kubeflow Notebook creation form, select the new configuration as well as the number of GPUs required.
The Kubeflow Notebook creation form only allows the selection of 1, 2, 4, or 8 GPUs; it is not possible to select a fraction of a GPU (e.g. 0.5). To allocate a GPU fraction, select None in the GPU box within the form and apply the following configuration:
```yaml
apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
  name: runai-half-gpu
  namespace: <KUBEFLOW-NAMESPACE>
spec:
  desc: "Allocate 0.5 GPUs via Run:AI scheduler"
  env:
    - name: RUNAI_PROJECT
      value: "<PROJECT>"
    - name: RUNAI_GPU_FRACTION
      value: "0.5"
  selector:
    matchLabels:
      runai-half-gpu: "true"  # key must be identical to metadata.name
```
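As a quick sanity check, you can list the PodDefault objects in the user's namespace to confirm both configurations exist:

```bash
# Both runai-non-fractional and runai-half-gpu should appear here
kubectl get poddefaults -n <KUBEFLOW-NAMESPACE>
```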
Kubeflow Pipelines¶
Kubeflow Pipelines is a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers.
As with Kubeflow Notebooks, the goal of this section is to run pipeline jobs within the context of Run:AI.
To create a Kubeflow pipeline, you:
- Write code using the Kubeflow Pipeline SDK.
- Package it into a single compressed file.
- Upload the file into Kubeflow and set it up.
The example code provided here shows how to augment pipeline code to use Run:AI. Add the following to the pipeline code:
```python
_training = training_op()
...
_training.add_pod_label('runai', 'true')
_training.add_pod_label('project', '<PROJECT>')
```
<PROJECT> is the Run:AI project name. See example code here.
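For orientation, a minimal, self-contained pipeline carrying these labels might look like the sketch below (KFP SDK v1 style). The container image, command, and pipeline metadata are placeholders, and training_op is assumed to build a single training step:

```python
# Minimal sketch of a one-step pipeline scheduled via Run:AI.
# Image, command, and pipeline metadata are placeholders.
from kfp import dsl


def training_op():
    # Hypothetical training step; replace the image and command with your own.
    return dsl.ContainerOp(
        name='train',
        image='<YOUR-TRAINING-IMAGE>',
        command=['python', 'train.py'],
    )


@dsl.pipeline(
    name='runai-example',
    description='Single training step scheduled by Run:AI',
)
def runai_pipeline():
    _training = training_op()
    # Route the step to the Run:AI scheduler and attach it to a project.
    _training.add_pod_label('runai', 'true')
    _training.add_pod_label('project', '<PROJECT>')
```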
Compile the code by running:
```bash
dsl-compile --py kubeflow-runai-one-gpu.py --output kubeflow-runai-one-gpu.tar.gz
```
To allocate half a GPU, add the following to the pipeline code:
```python
_training = training_op()
...
_training.add_pod_label('runai', 'true')
_training.add_pod_label('project', '<PROJECT>')
_training.add_pod_annotation('gpu-fraction', '0.5')
```
<PROJECT> is the Run:AI project name. See example code here.
Compile the code as described above.