
(Optional) Customize Cluster Installation

The cluster creation wizard in the Run:AI Admin UI requires you to download a Helm values file, runai-<cluster-name>.yaml. You can edit this file to customize the cluster installation.

Configuration Flags

Key | Default | Description
runai-operator.config.global.openshift | false | Set to true with OpenShift
runai-operator.config.init-ca.enabled | true | Set to false with OpenShift
pspEnabled | false | Set to true when using PodSecurityPolicy
runai-operator.config.project-controller.createNamespaces | true | Set to false if unwilling to provide Run:AI the ability to create namespaces. When set to false, an additional manual step is required when creating new Run:AI Projects
runai-operator.config.project-controller.createRoleBindings | true | Set to false when using OpenShift. When set to false, an additional manual step is required when assigning users to Run:AI Projects
runai-operator.config.project-controller.clusterWideSecret | true | Set to false when using PodSecurityPolicy or OpenShift
runai-operator.config.mps-server.enabled | false | Set to true to allow the use of NVIDIA MPS. MPS is useful with Inference workloads
runai-operator.config.runai-container-toolkit.enabled | true | Controls the usage of Fractions
gpu-feature-discovery.enabled | true | Set to false to skip installing GPU Feature Discovery (assumes a prior install outside Run:AI scope)
kube-prometheus-stack.enabled | true | Set to false to skip installing Prometheus (assumes a prior install outside Run:AI scope). Requires additional configuration of Prometheus to add Run:AI related exporter rules
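
As an illustration, the dotted keys above can be read as nested YAML in the values file, which is the usual Helm convention. The sketch below shows how the OpenShift-related flags might look under that assumption. The actual layout of the downloaded runai-<cluster-name>.yaml may differ, so check the existing structure of the file before editing it.

```yaml
# Sketch only: assumes the dotted keys map to nested YAML in the usual Helm way.
runai-operator:
  config:
    global:
      openshift: true             # set to true with OpenShift
    init-ca:
      enabled: false              # set to false with OpenShift
    project-controller:
      createRoleBindings: false   # users are then assigned to Projects manually
      clusterWideSecret: false    # set to false with PodSecurityPolicy or OpenShift
```

Only the keys that differ from their defaults need to be changed. The remaining values can stay as they are in the downloaded file.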

Feature Discovery

By default, the Run:AI Cluster installation installs two prerequisites: Kubernetes Node Feature Discovery (NFD) and NVIDIA GPU Feature Discovery (GFD).

  • If your Kubernetes cluster already has GFD installed, set gpu-feature-discovery.enabled to false.
  • NFD is a prerequisite of GFD. If GFD is not installed, but NFD is already installed, you can disable NFD installation by setting gpu-feature-discovery.nfd.deploy to false.
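
The two cases above might look as follows in the values file. This is a minimal sketch that assumes the dotted keys nest in the usual Helm way. Apply only the case that matches your cluster.

```yaml
# Case 1 (sketch): GFD is already installed on the cluster,
# so skip the GFD installation altogether.
gpu-feature-discovery:
  enabled: false

# Case 2 (sketch): GFD is not installed but NFD is. Keep GFD enabled
# and skip only the NFD deployment. Use this instead of the block above:
#
# gpu-feature-discovery:
#   enabled: true
#   nfd:
#     deploy: false
```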

Prometheus

The Run:AI Cluster installation installs Prometheus by default. If your Kubernetes cluster already has Prometheus installed, set kube-prometheus-stack.enabled to false.

When set to false, an extra configuration step is required to add the Run:AI Prometheus rules and to push metrics to the Run:AI Administration User Interface. For assistance, contact Run:AI Customer Support.
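
For reference, disabling the bundled Prometheus might look like this in the values file, again assuming the dotted key maps to nested YAML. The fragment does not cover the extra Prometheus rules and metrics configuration mentioned above.

```yaml
# Sketch only: skip the bundled Prometheus because one is already installed.
kube-prometheus-stack:
  enabled: false
```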

Understanding Custom Access Roles

To review the access roles created by the Run:AI Cluster installation, see Understanding Access Roles.


Last update: May 14, 2021