(Optional) Customize Cluster Installation
The Run:ai cluster creation wizard requires you to download a Helm values file named runai-<cluster-name>.yaml. You can edit this file to customize the cluster installation.
Configuration Flags
Key | Default | Description |
---|---|---|
runai-operator.config.project-controller.createNamespaces | true | Set to false if you do not want to give Run:ai the ability to create namespaces. When set to false, an additional manual step is required when creating new Run:ai Projects |
runai-operator.config.project-controller.clusterWideSecret | true | Set to false when using PodSecurityPolicy or OpenShift |
runai-operator.config.mps-server.enabled | false | Set to true to allow the use of NVIDIA MPS. MPS is useful with Inference workloads |
runai-operator.config.global.runtime | docker | Defines the container runtime of the cluster (supports docker and containerd). Set to containerd when using Tanzu |
runai-operator.config.global.nvidiaDcgmExporter.namespace | gpu-operator | The namespace where dcgm-exporter (or gpu-operator) was installed |
runai-operator.config.global.nvidiaDcgmExporter.installedFromGpuOperator | true | Indicates whether the dcgm-exporter was installed via gpu-operator |
spec.prometheus.spec.retention | 2h | The length of time for which Prometheus retains Run:ai metrics. Prometheus is only used as an intermediary to another metrics storage facility, and metrics are typically moved within tens of seconds, so changing this setting is mostly useful for debugging. |
spec.prometheus.spec.retentionSize | Not set | The amount of storage allocated for metrics by Prometheus. For more information see Prometheus Storage. |
spec.prometheus.spec.imagePullSecrets | Not set | An optional list of references to secrets in the runai namespace to use for pulling Prometheus images (relevant for air-gapped installations). |
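As an illustration of how the dotted keys in the table are usually written in a Helm values file, the sketch below writes a small fragment to a temporary scratch file and prints it. It assumes the common Helm convention that each dot-separated segment of a key is one level of YAML nesting. The scratch file and the chosen values are hypothetical, and the downloaded runai-<cluster-name>.yaml should be checked for its actual layout before editing.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch file; this does NOT modify the downloaded values file.
FRAGMENT="$(mktemp)"

# Assumption: dotted keys from the table map to nested YAML,
# as is conventional for Helm values files.
cat > "$FRAGMENT" <<'EOF'
runai-operator:
  config:
    project-controller:
      createNamespaces: false   # namespaces will be created manually
    global:
      runtime: containerd       # for example, when using Tanzu
EOF

# Print the fragment for review.
cat "$FRAGMENT"
```

Writing to a scratch file keeps the example side-effect free; in practice the same keys would be edited in place in the downloaded values file.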
Understanding Custom Access Roles
To review the access roles created by the Run:ai Cluster installation, see Understanding Access Roles.
Manual Creation of Namespaces
Run:ai Projects are implemented as Kubernetes namespaces. By default, the administrator creates a new Project via the Administration user interface, which then triggers the creation of a Kubernetes namespace named runai-<PROJECT-NAME>. There are two use cases in which customers may want to disable this feature:
- Some organizations prefer to use their internal naming convention for Kubernetes namespaces, rather than Run:ai's default runai-<PROJECT-NAME> convention.
- Some organizations will not allow Run:ai to automatically create Kubernetes namespaces.
Follow these steps to achieve this:
- Disable the namespace creation functionality. See the runai-operator.config.project-controller.createNamespaces flag above.
- Create a Project using the Run:ai User Interface.
- Create the namespace if needed by running: kubectl create ns <NAMESPACE>. The suggested Run:ai default is runai-<PROJECT-NAME>.
- Label the namespace to connect it to the Run:ai Project by running kubectl label ns <NAMESPACE> runai/queue=<PROJECT_NAME>, where <PROJECT_NAME> is the name of the project you have created in the Run:ai user interface above and <NAMESPACE> is the name you chose for your namespace.
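The last two steps above can be sketched as a short shell script. This is a minimal sketch, not an official Run:ai tool: the example values team-a and ml-team-a, the DRY_RUN switch, and the run helper are hypothetical and introduced here only for illustration. Only the two kubectl commands come from the steps above. It assumes the Project already exists in the Run:ai user interface and that kubectl is configured for the target cluster. With DRY_RUN=1 (the default in this sketch), the commands are printed but not executed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical example values; replace with your own.
PROJECT_NAME="${PROJECT_NAME:-team-a}"   # Project name as created in the Run:ai UI
NAMESPACE="${NAMESPACE:-ml-team-a}"      # namespace name following your own convention
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print the commands only, 0 = execute them

# Illustrative helper: print the command, and run it unless in dry-run mode.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# Create the namespace only if needed (in dry-run mode, always show the command).
if [ "$DRY_RUN" = "1" ] || ! kubectl get ns "$NAMESPACE" >/dev/null 2>&1; then
  run kubectl create ns "$NAMESPACE"
fi

# Connect the namespace to the Run:ai Project via the runai/queue label.
run kubectl label ns "$NAMESPACE" "runai/queue=$PROJECT_NAME"
```

To apply the changes for real, run the script with DRY_RUN=0 and your own PROJECT_NAME and NAMESPACE values set in the environment.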