(Optional) Customize Cluster Installation

The Run:ai cluster creation wizard requires downloading a Helm values file, runai-<cluster-name>.yaml. This file may be edited to customize the cluster installation.

Configuration Flags

| Key | Default | Description |
|-----|---------|-------------|
| pspEnabled | false | Set to true when using PodSecurityPolicy |
| ingress-nginx.podSecurityPolicy.enabled | | Set to true when using PodSecurityPolicy |
| runai-operator.config.project-controller.createNamespaces | true | Set to false if unwilling to provide Run:ai the ability to create namespaces. When set to false, creating new Run:ai Projects requires an additional manual step |
| runai-operator.config.project-controller.createRoleBindings | true | Set to false when using OpenShift. When set to false, assigning users to Run:ai Projects requires an additional manual step |
| runai-operator.config.project-controller.clusterWideSecret | true | Set to false when using PodSecurityPolicy or OpenShift |
| runai-operator.config.mps-server.enabled | false | Set to true to allow the use of NVIDIA MPS. MPS is useful with inference workloads |
| runai-operator.config.runai-container-toolkit.enabled | true | Controls the usage of Fractions |
| | docker | Defines the container runtime of the cluster (supports docker and containerd). Set to containerd when using Tanzu |
| runai-operator.config.nvidiaDcgmExporter.namespace | gpu-operator | The namespace where dcgm-exporter (or gpu-operator) was installed |
| runai-operator.config.nvidiaDcgmExporter.installedFromGpuOperator | true | Indicates whether dcgm-exporter was installed via gpu-operator |
| gpu-feature-discovery.enabled | true | Set to false to not install GPU Feature Discovery (assumes a prior install outside the Run:ai scope). Flag is only relevant to Run:ai version 2.4 or lower |
| kube-prometheus-stack.enabled | true | Set to false when the cluster has an existing Prometheus installation that is not based on the Prometheus operator. This setting requires Run:ai customer support |
| kube-prometheus-stack.prometheusOperator.enabled | true | Set to false when the cluster has an existing Prometheus installation based on the Prometheus operator and Run:ai should use the existing one rather than install a new one |
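As an illustration, several of the flags above could be set in the downloaded values file as follows. This is a minimal sketch: only the keys shown are changed, and the values chosen (PodSecurityPolicy in use, manual namespace creation, MPS enabled) are examples, not recommendations.

```yaml
# Excerpt of runai-<cluster-name>.yaml (illustrative).
# Keep all other downloaded values as-is; change only the keys you need.
pspEnabled: true                # cluster uses PodSecurityPolicy
ingress-nginx:
  podSecurityPolicy:
    enabled: true               # cluster uses PodSecurityPolicy
runai-operator:
  config:
    project-controller:
      createNamespaces: false   # namespaces will be created manually
    mps-server:
      enabled: true             # allow NVIDIA MPS for inference workloads
```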

Feature Discovery

Note: This section is relevant only to Run:ai version 2.4 or lower.

By default, the Run:ai cluster installation installs two prerequisites: Kubernetes Node Feature Discovery (NFD) and NVIDIA GPU Feature Discovery (GFD).

  • If your Kubernetes cluster already has GFD installed, you will want to set gpu-feature-discovery.enabled to false.
  • NFD is a prerequisite of GFD. If GFD is not installed, but NFD is already installed, you can disable NFD installation by setting gpu-feature-discovery.nfd.deploy to false.
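In values-file terms, the two cases above translate to the following excerpt (a sketch; set only the keys that apply to your cluster):

```yaml
# Excerpt of runai-<cluster-name>.yaml

# Case 1: GFD is already installed in the cluster.
gpu-feature-discovery:
  enabled: false

# Case 2: GFD is not installed, but NFD already is.
# gpu-feature-discovery:
#   nfd:
#     deploy: false
```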


Prometheus

The Run:ai cluster installation uses Prometheus. There are three alternative configurations:

  1. (The default) Run:ai installs Prometheus.
  2. Run:ai uses an existing Prometheus installation based on the Prometheus operator.
  3. Run:ai uses an existing Prometheus installation based on a regular Prometheus installation.

For option 2, set the kube-prometheus-stack.prometheusOperator.enabled flag to false. For option 3, contact Run:ai customer support.
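For option 2, the corresponding values-file change is (sketch):

```yaml
# Excerpt of runai-<cluster-name>.yaml: reuse an existing
# Prometheus-operator-based installation (option 2 above).
kube-prometheus-stack:
  prometheusOperator:
    enabled: false
```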

Understanding Custom Access Roles

To review the access roles created by the Run:ai cluster installation, see Understanding Access Roles.

Manual Creation of Namespaces

Run:ai Projects are implemented as Kubernetes namespaces. By default, the administrator creates a new Project via the Administration user interface, which then triggers the creation of a Kubernetes namespace named runai-<PROJECT-NAME>. There are a couple of use cases in which customers will want to disable this feature:

  • Some organizations prefer to use their internal naming convention for Kubernetes namespaces, rather than Run:ai's default runai-<PROJECT-NAME> convention.
  • When PodSecurityPolicy is enabled, some organizations will not allow Run:ai to automatically create Kubernetes namespaces.

To achieve this, follow this process:

  1. Disable the namespace creation functionality. See the runai-operator.config.project-controller.createNamespaces flag above.
  2. Create a Project using the Run:ai User Interface.
  3. Create the namespace if needed by running: kubectl create ns <NAMESPACE>. The suggested Run:ai default is runai-<PROJECT-NAME>.
  4. Label the namespace to connect it to the Run:ai Project by running kubectl label ns <NAMESPACE> runai/queue=<PROJECT_NAME>

where <PROJECT_NAME> is the name of the project you created in the Run:ai user interface above, and <NAMESPACE> is the name you chose for your namespace.
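Steps 3 and 4 above can be sketched as a short script. Here "team-a" is a hypothetical project name; substitute the project you created in the user interface. The script prints each command before it would run, so it can be reviewed with the actual kubectl calls left commented out:

```shell
# Hypothetical project name created in the Run:ai UI.
PROJECT_NAME="team-a"
# The suggested Run:ai default namespace name; any internal
# naming convention also works.
NAMESPACE="runai-${PROJECT_NAME}"

# Print each command for review before applying it.
for cmd in \
  "kubectl create ns ${NAMESPACE}" \
  "kubectl label ns ${NAMESPACE} runai/queue=${PROJECT_NAME}"
do
  echo "+ ${cmd}"
  # Uncomment to apply against your cluster:
  # ${cmd}
done
```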

Last update: April 17, 2022