Install a Cluster
You must have Cluster Administrator rights to install these dependencies.
Before installing Run:ai, you must install NVIDIA software on your OpenShift cluster to enable GPUs. NVIDIA provides detailed documentation. Follow the instructions to install the two operators, Node Feature Discovery and NVIDIA GPU Operator, from the OpenShift web console.
When done, verify that the GPU Operator is installed (note that the GPU Operator namespace may differ between operator versions).
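The verification command is missing from this copy; a minimal sketch, assuming the operator was installed into the `nvidia-gpu-operator` namespace (older operator versions used `gpu-operator-resources`):

```shell
# Check that the GPU Operator pods are running.
# Replace the namespace if your operator version uses a different one.
oc get pods -n nvidia-gpu-operator
```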
Create OpenShift Projects
Run:ai cluster installation uses several namespaces (or projects in OpenShift terminology). Run the following:
The last namespace (`runai-scale-adjust`) is only required if the cluster is a cloud cluster and is configured for auto-scaling.
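The project-creation commands are missing from this copy; a minimal sketch, assuming `runai` is the main cluster namespace (the exact set of projects may differ by Run:ai version):

```shell
# Create the projects (namespaces) used by the Run:ai cluster installation
oc new-project runai
# Only needed for cloud clusters configured for auto-scaling
oc new-project runai-scale-adjust
```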
Perform the cluster installation instructions explained here. When creating a new cluster, select the OpenShift target platform.
To install a specific version, add `--version <version>` to the install command. You can find available versions by running `helm search repo -l runai-cluster`.
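For instance, assuming the Run:ai Helm repository has already been added under the name `runai` and the release is named `runai-cluster` (both names are illustrative):

```shell
# List all published chart versions
helm search repo -l runai-cluster

# Install a specific chart version into the runai namespace
helm install runai-cluster runai/runai-cluster -n runai --version <version>
```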
Perform the cluster installation instructions explained here. When creating a new cluster, select the OpenShift target platform. The `helm install` command should use the `runai-cluster` tar file:
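A minimal sketch of such an invocation, assuming the chart archive was downloaded locally and a values file named `runai-cluster-values.yaml` (both file names are illustrative):

```shell
# Install from the downloaded chart archive instead of a Helm repository
helm install runai-cluster ./runai-cluster-<version>.tgz \
  -n runai -f runai-cluster-values.yaml
```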
Make the following changes to the configuration file you have downloaded:
| Key | Default | Description |
|-----|---------|-------------|
| | | Set to |
| | | Set to |
| | Default is | Allow the use of NVIDIA MPS. MPS is useful with Inference workloads. Requires extra permissions |
| | | Controls the usage of Fractions. Requires extra cluster permissions |
| | Default password already set | [email protected] password. Need to change only if you have changed the password here |
You can add the `--dry-run` flag to the install command to gain an understanding of what is being installed before the actual installation. For more details, see understanding cluster access roles.
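As an illustration, with the repository and release names assumed above:

```shell
# Render the manifests that would be installed without applying anything
helm install runai-cluster runai/runai-cluster -n runai --dry-run
```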
(Optional) Prometheus Adapter for Inference
The Prometheus adapter is required if you are using Inference workloads and require a custom metric for autoscaling. The following additional steps are required for it to work:
- Copy the following ConfigMaps from the `openshift-monitoring` namespace to the `monitoring` namespace:

```shell
kubectl get cm prometheus-adapter-prometheus-config --namespace=openshift-monitoring -o yaml \
  | sed 's/namespace: openshift-monitoring/namespace: monitoring/' \
  | kubectl create -f -
kubectl get cm serving-certs-ca-bundle --namespace=openshift-monitoring -o yaml \
  | sed 's/namespace: openshift-monitoring/namespace: monitoring/' \
  | kubectl create -f -
```
- Allow the Prometheus Adapter `serviceaccount` to create a `SecurityContext` with `RunAsUser` 10001:
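The command itself is missing from this copy; one way to grant this on OpenShift, assuming the serviceaccount is named `prometheus-adapter` and lives in the `monitoring` namespace (both are assumptions), is to bind it to the `anyuid` SCC, which permits arbitrary `RunAsUser` values including 10001:

```shell
# Grant the anyuid SCC so the adapter pod may run as UID 10001
# (serviceaccount and namespace names are assumed)
oc adm policy add-scc-to-user anyuid -z prometheus-adapter -n monitoring
```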
Continue to create Run:ai Projects.