Preparing for a Run:ai OpenShift Installation
This section gives IT the information needed to prepare for a Run:ai installation. It covers the third-party dependencies that must be met and the access control that must be granted to Run:ai components.
Create OpenShift Projects
Run:ai uses three projects: one for the control plane (runai-backend) and two for the cluster itself.
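A project can be created with `oc new-project`. The sketch below wraps that call in a small helper and is an illustration only: the helper name `create_projects` is hypothetical, and only runai-backend is named in the text above, so you must supply the two cluster project names yourself.

```shell
# Create each OpenShift project passed as an argument.
# create_projects is a hypothetical helper name, not part of oc or Run:ai.
create_projects() {
  for project in "$@"; do
    # Stop at the first failure (for example, if the project already exists).
    oc new-project "$project" || return 1
  done
}

# Usage (requires a logged-in oc session); append the two cluster project names:
#   create_projects runai-backend
```

Note that `oc new-project` also switches the current project to the one just created.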
Prepare Run:ai Installation Artifacts
Run:ai Software Files
SSH into a node with oc access (oc is the OpenShift command-line tool) to the cluster and
Run:ai Administration CLI
If Helm v3 is not yet installed on the machine, install it now.
See https://helm.sh/docs/intro/install/ for installation instructions. Run:ai works with Helm version 3 only (not Helm 2).
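To confirm that the installed Helm is version 3, you can inspect the output of `helm version --short`, which on Helm 3 starts with `v3.`. The following is a minimal sketch; the helper name `is_helm3` is hypothetical.

```shell
# Succeed only if the given version string belongs to Helm major version 3.
# is_helm3 is a hypothetical helper name.
is_helm3() {
  case "$1" in
    v3.*) return 0 ;;  # e.g. v3.12.0+gc9f554d
    *)    return 1 ;;  # Helm 2 prints "Client: v2..." and is rejected
  esac
}

# Usage:
#   is_helm3 "$(helm version --short)" && echo "Helm 3 found"
```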
Mark Run:ai System Workers
The Run:ai control plane should be installed on a set of dedicated Run:ai system worker nodes rather than on GPU worker nodes. To set system worker nodes, run:
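Nodes are typically marked with a label via `oc label node`. The sketch below assumes the label key `node-role.kubernetes.io/runai-system`, which is not stated in the text here; confirm it against the Run:ai documentation for your version. The helper name `mark_system_workers` and the node names in the usage line are hypothetical.

```shell
# ASSUMPTION: this label key/value is not given in the surrounding text.
RUNAI_SYSTEM_LABEL="node-role.kubernetes.io/runai-system=true"

# Label each given node as a Run:ai system worker.
# mark_system_workers is a hypothetical helper name.
mark_system_workers() {
  for node in "$@"; do
    oc label node "$node" "$RUNAI_SYSTEM_LABEL" || return 1
  done
}

# Usage (node names are placeholders):
#   mark_system_workers worker-1 worker-2
```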
To avoid single-point-of-failure issues, we recommend assigning more than one node in production environments.
Do not select the Kubernetes master as a runai-system node. Doing so may cause Kubernetes to stop working (specifically if the Kubernetes API server is configured on port 443 instead of the default 6443).
Install NVIDIA Dependencies
You must have Cluster Administrator rights to install these dependencies.
Before installing Run:ai, you must install NVIDIA software on your OpenShift cluster to enable GPUs. NVIDIA has provided detailed documentation. Follow the instructions to install the two operators, Node Feature Discovery and NVIDIA GPU Operator, from the OpenShift web console.
When done, verify that the GPU Operator is installed by running:
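One way to check is to list the pods in the GPU Operator namespace. The sketch below assumes the namespace `nvidia-gpu-operator` as a default and lets you override it; both the default and the helper name `list_gpu_operator_pods` are assumptions.

```shell
# List the pods in the GPU Operator namespace.
# ASSUMPTION: nvidia-gpu-operator is used as the default namespace;
# pass a different namespace as the first argument if yours differs.
list_gpu_operator_pods() {
  oc get pods -n "${1:-nvidia-gpu-operator}"
}

# Usage:
#   list_gpu_operator_pods
#   list_gpu_operator_pods <other-namespace>
```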
(The GPU Operator namespace may differ between operator versions.)
As part of the installation, you will be required to install the control plane and cluster Helm charts. The Helm charts require Kubernetes administrator permissions. You can review the exact permissions requested by running both Helm charts with the --dry-run flag.
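A dry run can be sketched as follows. The release name, chart reference, and namespace are not given in this section, so they are passed in as arguments; the helper name `preview_chart` is hypothetical.

```shell
# Render a chart without installing it, so the generated manifests
# (including any Role/ClusterRole objects) can be reviewed first.
# preview_chart is a hypothetical helper name.
#   $1 = release name, $2 = chart reference, $3 = target namespace
preview_chart() {
  helm install "$1" "$2" --namespace "$3" --dry-run
}

# Usage (all three values are placeholders):
#   preview_chart <release-name> <chart> <namespace>
```

Searching the rendered output for `kind: ClusterRole` and `kind: Role` is a quick way to locate the permissions each chart requests.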
Continue with installing the Run:ai Control Plane.