Self-Hosted installation over Kubernetes - Prerequisites

Before proceeding with this document, please review the installation types documentation to understand the difference between air-gapped and connected installations.

Run:ai Components

As part of the installation process you will install:

  • A control-plane managing cluster
  • One or more clusters

Both the control plane and clusters require Kubernetes. Typically the control plane and the first cluster are installed on the same Kubernetes cluster, but this is not required.

Installer machine

The machine running the installation script (typically the Kubernetes master) must have:

  • At least 50GB of free space.
  • Docker installed.
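
The two requirements above can be checked before starting with a short script. This is a minimal sketch, not part of the Run:ai installer: `TARGET_DIR` and `free_gb` are hypothetical names introduced here, and the script assumes the installer unpacks under the current directory unless `TARGET_DIR` is set.

```shell
# Minimum free space in GB, taken from the requirement above.
REQUIRED_GB=50

# Print the free space (whole GB) of the filesystem that holds the given path.
# `df -Pk` reports available space in 1024-byte blocks in column 4.
free_gb() {
  df -Pk "$1" | awk 'NR==2 { print int($4 / 1024 / 1024) }'
}

# TARGET_DIR is a placeholder (assumption): set it to where the installer will unpack.
TARGET_DIR="${TARGET_DIR:-$PWD}"
available="$(free_gb "$TARGET_DIR")"

if [ "$available" -ge "$REQUIRED_GB" ]; then
  echo "OK: ${available}GB free under ${TARGET_DIR}"
else
  echo "WARNING: only ${available}GB free under ${TARGET_DIR}, need ${REQUIRED_GB}GB"
fi

# Docker only needs to be present on the installer machine.
if command -v docker >/dev/null 2>&1; then
  echo "OK: docker found: $(docker --version)"
else
  echo "WARNING: docker not found in PATH"
fi
```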

Helm

Run:ai requires Helm 3.14 or later. To install Helm, see Installing Helm. If you are installing an air-gapped version of Run:ai, the Run:ai tar file contains the Helm binary.
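
To confirm that the installed Helm meets the minimum version, a check along these lines may help. It is a sketch under assumptions: `version_ge` is a helper defined here (not a Helm feature), and it relies on `sort -V` from GNU coreutils being available.

```shell
REQUIRED_HELM="3.14.0"

# Succeeds when version $1 is greater than or equal to version $2.
# Uses `sort -V` (version sort): if $2 sorts first, then $1 >= $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v helm >/dev/null 2>&1; then
  # `helm version --short` prints a string such as v3.14.0+g<commit>;
  # strip the leading "v" and the "+..." build suffix.
  current="$(helm version --short | sed -e 's/^v//' -e 's/+.*$//')"
  if version_ge "$current" "$REQUIRED_HELM"; then
    echo "OK: Helm ${current}"
  else
    echo "WARNING: Helm ${current} is older than ${REQUIRED_HELM}"
  fi
else
  echo "WARNING: helm not found in PATH"
fi
```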

Cluster hardware requirements

The Run:ai control plane services require the following resources:

Component     Required Capacity
CPU           10 cores
Memory        12GB
Disk space    110GB

If the Run:ai cluster will be installed on the same Kubernetes cluster as the Run:ai control plane, the control plane requirements are in addition to the Run:ai cluster hardware requirements.
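
As a rough way to compare these figures with what the nodes actually offer, the allocatable CPU and memory of each node can be listed with kubectl. This is an illustrative sketch: `show_allocatable` is a helper name introduced here, and the output still has to be compared against the requirements by hand.

```shell
# Control plane requirements quoted from the table above.
CONTROL_PLANE_CPU_CORES=10
CONTROL_PLANE_MEMORY_GB=12

# Print allocatable CPU and memory for every node in the current context.
show_allocatable() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory'
}

if command -v kubectl >/dev/null 2>&1; then
  echo "Control plane needs ${CONTROL_PLANE_CPU_CORES} cores and ${CONTROL_PLANE_MEMORY_GB}GB, on top of any Run:ai cluster requirements."
  show_allocatable || echo "WARNING: could not query the cluster"
else
  echo "WARNING: kubectl not found in PATH"
fi
```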

ARM Limitation

The control plane does not support CPU nodes with ARM64k architecture. To schedule the Run:ai control plane services on supported nodes, use the global.affinity configuration parameter as detailed in Additional Run:ai configurations.
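
Before writing an affinity rule, it can help to see which architecture each node reports. The sketch below reads the standard `kubernetes.io/arch` node label. `arm_nodes` is a helper introduced here, and treating every node whose label starts with `arm` as a candidate to exclude is an assumption: the label does not necessarily distinguish the exact unsupported variant.

```shell
# Read `kubectl get nodes --no-headers -L kubernetes.io/arch` output on stdin
# and print the names of nodes whose last column (the arch label) starts with "arm".
arm_nodes() {
  awk '$NF ~ /^arm/ { print $1 }'
}

if command -v kubectl >/dev/null 2>&1; then
  # -L adds the value of the given label as an extra column.
  kubectl get nodes -L kubernetes.io/arch || echo "WARNING: could not query the cluster"
  echo "Nodes reporting an ARM architecture:"
  kubectl get nodes --no-headers -L kubernetes.io/arch 2>/dev/null | arm_nodes
else
  echo "WARNING: kubectl not found in PATH"
fi
```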

Run:ai software requirements

Cluster Nodes

See Run:ai Cluster prerequisites operating system requirements.

Node clocks must be synchronized using NTP (Network Time Protocol) for the system to function properly.
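
On nodes that use systemd, the synchronization state can be read with `timedatectl`. This is a sketch that assumes a systemd version recent enough to provide `timedatectl show`. Nodes running chrony or ntpd without systemd integration need their own tooling, and `check_time_sync` is a name introduced here.

```shell
# Report whether the local clock is marked as NTP-synchronized.
check_time_sync() {
  if ! command -v timedatectl >/dev/null 2>&1; then
    echo "unknown: timedatectl not available on this node"
    return 0
  fi
  # Prints "yes" or "no" when systemd-timedated can be reached.
  synced="$(timedatectl show --property=NTPSynchronized --value 2>/dev/null)"
  case "$synced" in
    yes) echo "OK: system clock is synchronized" ;;
    no)  echo "WARNING: system clock is not synchronized" ;;
    *)   echo "unknown: could not read synchronization state" ;;
  esac
}

# Run this on each node.
check_time_sync
```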

Kubernetes

See Run:ai Cluster prerequisites Kubernetes distribution requirements.

The Run:ai control plane operating system prerequisites are identical.

The Run:ai control plane requires a default storage class to create persistent volume claims for Run:ai storage. The storage class, as per Kubernetes standards, controls the reclaim behavior: whether the Run:ai persistent data is retained or deleted when the Run:ai control plane is deleted.

Note

For a simple (nonproduction) storage class example see Kubernetes Local Storage Class. The storage class will set the directory /opt/local-path-provisioner to be used across all nodes as the path for provisioning persistent volumes.

Then set the new storage class as default:

kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
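
To confirm which storage class is now the default and what its reclaim policy is, the output of `kubectl get storageclass` can be filtered. In that output kubectl marks the default class with `(default)` after its name. The helper `default_classes` is introduced here for illustration and assumes the usual column order (name, marker, provisioner, reclaim policy).

```shell
# Read `kubectl get storageclass --no-headers` output on stdin and print
# "<name> <reclaim policy>" for each class marked as default.
default_classes() {
  awk '$2 == "(default)" { print $1, $4 }'
}

if command -v kubectl >/dev/null 2>&1; then
  kubectl get storageclass --no-headers 2>/dev/null | default_classes
else
  echo "WARNING: kubectl not found in PATH"
fi
```

Exactly one line of output is the expected result. No output means no default class is set, and more than one line means several classes carry the default annotation.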

Install prerequisites

Ingress Controller

The Run:ai control plane installation assumes an existing installation of NGINX as the ingress controller. You can follow the Run:ai Cluster prerequisites Kubernetes ingress controller installation.

NVIDIA GPU Operator

See Run:ai Cluster prerequisites NVIDIA GPU operator requirements.

The Run:ai control plane, when installed without a Run:ai cluster, does not require the NVIDIA prerequisites.

Prometheus

See Run:ai Cluster prerequisites Prometheus requirements.

The Run:ai control plane, when installed without a Run:ai cluster, does not require the Prometheus prerequisites.

Inference (optional)

See Run:ai Cluster prerequisites Inference requirements.

The Run:ai control plane, when installed without a Run:ai cluster, does not require the Inference prerequisites.

External Postgres database (optional)

The Run:ai control plane installation includes a default PostgreSQL database. However, you may opt to use an existing PostgreSQL database if you have specific requirements or preferences. Please ensure that your PostgreSQL database is version 16 or higher.
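
The version of an existing database can be checked with `psql` before installation. The sketch below assumes connection details are supplied through the standard libpq environment variables (`PGHOST`, `PGUSER`, `PGDATABASE`, and a password via `PGPASSWORD` or `~/.pgpass`). `pg_major` is a helper introduced here.

```shell
REQUIRED_PG_MAJOR=16

# Convert server_version_num (for example 160004) to the major version (16).
# Valid for PostgreSQL 10 and later, where the number is major * 10000 + minor.
pg_major() {
  echo $(( $1 / 10000 ))
}

if command -v psql >/dev/null 2>&1 && [ -n "${PGHOST:-}" ]; then
  # -A: unaligned output, -t: rows only, so the result is just the number.
  num="$(psql -At -c 'SHOW server_version_num;')"
  if [ -z "$num" ]; then
    echo "WARNING: could not read the server version"
  elif [ "$(pg_major "$num")" -ge "$REQUIRED_PG_MAJOR" ]; then
    echo "OK: PostgreSQL major version $(pg_major "$num")"
  else
    echo "WARNING: PostgreSQL major version $(pg_major "$num") is below ${REQUIRED_PG_MAJOR}"
  fi
else
  echo "Skipped: psql not found or PGHOST not set"
fi
```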

Next steps

Continue to Preparing for a Run:ai Kubernetes Installation.