(Optional) Customize Cluster Installation

The Run:AI Admin UI cluster creation wizard requires you to download a Helm values file, runai-<cluster-name>.yaml. You can edit this file to customize the cluster installation.

Configuration Flags

| Key | Default | Description |
| --- | --- | --- |
| pspEnabled | false | Set to true when using PodSecurityPolicy. |
| runai-operator.config.project-controller.createNamespaces | true | Set to false if you are unwilling to give Run:AI the ability to create namespaces. When set to false, creating a new Run:AI Project requires an additional manual step. |
| runai-operator.config.project-controller.createRoleBindings | true | Set to false when using OpenShift. When set to false, assigning users to Run:AI Projects requires an additional manual step. |
| runai-operator.config.project-controller.clusterWideSecret | true | Set to false when using PodSecurityPolicy or OpenShift. |
| runai-operator.config.mps-server.enabled | false | Set to true to allow the use of NVIDIA MPS. MPS is useful with Inference workloads. |
| runai-operator.config.runai-container-toolkit.enabled | true | Controls the use of Fractions. |
| gpu-feature-discovery.enabled | true | Set to false to skip installing GPU Feature Discovery (assumes a prior install outside Run:AI scope). |
| kube-prometheus-stack.enabled | true | Set to false when the cluster has an existing Prometheus installation that is not based on the Prometheus operator. This setting requires Run:AI customer support. |
| kube-prometheus-stack.prometheusOperator.enabled | true | Set to false when the cluster has an existing Prometheus installation based on the Prometheus operator and Run:AI should use the existing one rather than install a new one. |
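
The keys are written in dotted notation. In a Helm values file, dotted keys conventionally correspond to nested YAML maps. The fragment below is a minimal sketch of how the settings for a cluster that uses PodSecurityPolicy might look, assuming the downloaded runai-<cluster-name>.yaml follows that usual Helm layout. Check it against the structure of your own file before editing.

```yaml
# Sketch only: assumes dotted keys map to nested YAML maps,
# as is conventional for Helm values files.

# Top-level flag: enable when the cluster uses PodSecurityPolicy.
pspEnabled: true

runai-operator:
  config:
    project-controller:
      # Recommended false when using PodSecurityPolicy or OpenShift.
      clusterWideSecret: false
```

Keys that are not listed keep the defaults shown in the Default column.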

Feature Discovery

By default, the Run:AI Cluster installation installs two prerequisites: Kubernetes Node Feature Discovery (NFD) and NVIDIA GPU Feature Discovery (GFD).

  • If your Kubernetes cluster already has GFD installed, you will want to set gpu-feature-discovery.enabled to false.
  • NFD is a prerequisite of GFD. If GFD is not installed, but NFD is already installed, you can disable NFD installation by setting gpu-feature-discovery.nfd.deploy to false.
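
The two cases above could be expressed in the values file roughly as follows. This is a sketch that assumes the dotted key names map to nested YAML maps in runai-<cluster-name>.yaml. Only one of the two cases applies to a given cluster, so the second is shown commented out.

```yaml
# Case 1: GFD is already installed on the cluster,
# so Run:AI should not install it again.
gpu-feature-discovery:
  enabled: false

# Case 2 (alternative): GFD is not installed, but NFD already is.
# Keep GFD enabled and skip only the NFD deployment.
# gpu-feature-discovery:
#   nfd:
#     deploy: false
```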

Prometheus

The Run:AI Cluster installation uses Prometheus. There are three alternative configurations:

  1. (The default) Run:AI installs Prometheus.
  2. Run:AI uses an existing Prometheus installation based on the Prometheus operator.
  3. Run:AI uses an existing Prometheus installation that is not based on the Prometheus operator.

For option 2, set the flag kube-prometheus-stack.prometheusOperator.enabled to false. For option 3, please contact Run:AI customer support.
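
As an illustration of option 2, the corresponding entry in the values file might look like the sketch below. It assumes the dotted flag name maps to nested YAML maps, which is the usual Helm convention.

```yaml
# Option 2: reuse an existing Prometheus installation that is
# based on the Prometheus operator, instead of installing a new operator.
kube-prometheus-stack:
  prometheusOperator:
    enabled: false
```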

Understanding Custom Access Roles

To review the access roles created by the Run:AI Cluster installation, see Understanding Access Roles.

Manual Creation of Namespaces

Run:AI Projects are implemented as Kubernetes namespaces. By default, the administrator creates a new Project via the Administration user interface, which then triggers the creation of a Kubernetes namespace named runai-<PROJECT-NAME>. There are a couple of cases in which customers will want to disable this feature:

  • Some organizations prefer to use their internal naming convention for Kubernetes namespaces, rather than Run:AI's default runai-<PROJECT-NAME> convention.
  • When PodSecurityPolicy is enabled, some organizations will not allow Run:AI to automatically create Kubernetes namespaces.

To create namespaces manually, follow these steps:

  1. Disable the namespace creation functionality. See the runai-operator.config.project-controller.createNamespaces flag above.
  2. Create a Project using the Administrator User Interface.
  3. Create the namespace if needed by running: kubectl create ns <NAMESPACE>. The suggested Run:AI default is runai-<PROJECT-NAME>.
  4. Label the namespace to connect it to the Run:AI Project by running kubectl label ns <NAMESPACE> runai/queue=<PROJECT_NAME>

where <PROJECT_NAME> is the name of the project you have created in the Administrator UI above and <NAMESPACE> is the name you chose for your namespace.
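
Steps 3 and 4 can also be captured declaratively in a single Kubernetes Namespace manifest instead of two kubectl commands. The sketch below uses hypothetical names: a Project called team-a and a namespace called runai-team-a. Substitute your own values.

```yaml
# Sketch: a namespace pre-labeled for a Run:AI Project.
# "team-a" and "runai-team-a" are hypothetical example names.
apiVersion: v1
kind: Namespace
metadata:
  # The namespace name you chose (Run:AI's suggested default is runai-<PROJECT-NAME>).
  name: runai-team-a
  labels:
    # Connects the namespace to the Run:AI Project created in the Administrator UI.
    runai/queue: team-a
```

Saved to a file (for example, a hypothetical namespace.yaml), the manifest can be applied with kubectl apply -f namespace.yaml.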


Last update: November 17, 2021