(Optional) Customize Cluster Installation

The Run:ai cluster creation wizard requires you to download a Helm values file named `runai-<cluster-name>.yaml`. You can edit this file to customize the cluster installation.

Configuration Flags

Key	Default	Description
`runai-operator.config.project-controller.createNamespaces`	`true`	Set to `false` if you do not want to give Run:ai the ability to create namespaces. When set to `false`, an additional manual step is required when creating new Run:ai Projects
`runai-operator.config.project-controller.clusterWideSecret`	`true`	Set to `false` when using PodSecurityPolicy or OpenShift
`runai-operator.config.mps-server.enabled`	`false`	Set to `true` to allow the use of NVIDIA MPS. MPS is useful with inference workloads
`runai-operator.config.global.runtime`	`docker`	Defines the container runtime of the cluster (supports `docker` and `containerd`). Set to `containerd` when using Tanzu
`runai-operator.config.global.nvidiaDcgmExporter.namespace`	`gpu-operator`	The namespace where dcgm-exporter (or gpu-operator) was installed
`runai-operator.config.global.nvidiaDcgmExporter.installedFromGpuOperator`	`true`	Indicates whether dcgm-exporter was installed via gpu-operator
`spec.prometheus.spec.retention`	`2h`	The length of time that Prometheus retains Run:ai metrics. Prometheus is used only as an intermediary to another metrics storage facility, and metrics are typically moved within tens of seconds, so changing this setting is mostly useful for debugging.
`spec.prometheus.spec.retentionSize`	Not set	The amount of storage allocated for metrics by Prometheus. For more information see Prometheus Storage.
`spec.prometheus.spec.imagePullSecrets`	Not set	An optional list of references to secrets in the runai namespace to use for pulling Prometheus images (relevant for air-gapped installations).
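
As a minimal sketch, and assuming the dotted keys above map to nested YAML mappings in the usual Helm way, disabling namespace creation and switching the runtime to `containerd` might look like the fragment below. The exact layout of your downloaded values file may differ, so adjust the keys that are already present rather than pasting this in as-is.

```yaml
# Illustrative fragment only; assumes standard Helm nesting of the dotted keys.
runai-operator:
  config:
    project-controller:
      createNamespaces: false   # Run:ai will not create namespaces for Projects
    global:
      runtime: containerd       # for example, when using Tanzu
```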

Understanding Custom Access Roles

To review the access roles created by the Run:ai Cluster installation, see Understanding Access Roles.

Manual Creation of Namespaces

Run:ai Projects are implemented as Kubernetes namespaces. By default, the administrator creates a new Project via the Administration user interface, which then triggers the creation of a Kubernetes namespace named `runai-<PROJECT-NAME>`. There are two use cases in which customers may want to disable this feature:

Some organizations prefer to use their internal naming convention for Kubernetes namespaces, rather than Run:ai's default runai-<PROJECT-NAME> convention.
Some organizations will not allow Run:ai to automatically create Kubernetes namespaces.

To create namespaces manually, follow these steps:

Disable the namespace creation functionality. See the runai-operator.config.project-controller.createNamespaces flag above.
Create a Project using the Run:ai User Interface.
Create the namespace, if needed, by running `kubectl create ns <NAMESPACE>`. The suggested Run:ai default is `runai-<PROJECT-NAME>`.
Label the namespace to connect it to the Run:ai Project by running `kubectl label ns <NAMESPACE> runai/queue=<PROJECT_NAME>`, where `<PROJECT_NAME>` is the name of the Project you created in the Run:ai user interface above and `<NAMESPACE>` is the name you chose for your namespace.
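
The last two steps can be sketched as a small shell script. The Project and namespace names here are hypothetical placeholders, and the `run` helper is not part of Run:ai or kubectl; it only prints each command unless `APPLY=1` is set, so the script can be reviewed before it touches a cluster.

```shell
#!/bin/sh
# Hypothetical example values; replace them with your own.
PROJECT_NAME="team-a"    # Project already created in the Run:ai user interface
NAMESPACE="ml-team-a"    # namespace name that follows an internal convention

# Helper (an assumption for this sketch): print the command by default,
# execute it only when APPLY=1 is set in the environment.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    echo "$@"
  fi
}

# Create the namespace (skip this if the namespace already exists).
run kubectl create ns "$NAMESPACE"

# Label the namespace to connect it to the Run:ai Project.
run kubectl label ns "$NAMESPACE" "runai/queue=$PROJECT_NAME"
```

Running the script as-is prints the two `kubectl` commands; running it with `APPLY=1` executes them against the cluster selected by the current kubectl context.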