Prerequisites

Before proceeding with this document, please review the installation types documentation to understand the difference between air-gapped and connected installations.

Control-plane and clusters

As part of the installation process you will install:

  • A control-plane managing cluster
  • One or more Run:ai clusters

Both the control plane and clusters require Kubernetes. Typically, the control plane and the first cluster are installed on the same Kubernetes cluster, but this is not required.

Hardware Requirements

See Cluster prerequisites hardware requirements.

In addition, the Run:ai control plane installation requires Kubernetes Persistent Volumes with a total size of 110GB.

Run:ai Software

For a connected installation, you should receive the file runai-gcr-secret.yaml from Run:ai Customer Support. The file provides access to the Run:ai container registry.

For an air-gapped installation, you should receive a single file, runai-air-gapped-<version>.tar.gz, from Run:ai Customer Support.

Run:ai Software Prerequisites

Operating System

See Run:ai Cluster prerequisites operating system requirements.

The Run:ai control plane operating system prerequisites are identical.

Kubernetes

See Run:ai Cluster prerequisites Kubernetes requirements.

The Run:ai control plane Kubernetes prerequisites are identical.

The Run:ai control plane requires a default storage class to create persistent volume claims for Run:ai storage. As per Kubernetes standards, the storage class controls the reclaim behavior: whether the Run:ai persistent data is kept or deleted when the Run:ai control plane is deleted.

Note

For a simple (non-production) storage class example, see Kubernetes Local Storage Class. The storage class sets the directory /opt/local-path-provisioner as the path for provisioning persistent volumes on all nodes.

Then set the new storage class as default:

kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
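To confirm that the patch took effect, the output of kubectl get storageclass can be checked for the (default) marker. The following is a minimal sketch, assuming the usual table output in which the default class is listed as "local-path (default)" in the NAME column. The helper name default_storage_classes is hypothetical and not part of Run:ai or kubectl.

```shell
# Print the names of storage classes marked "(default)" in the
# table output of `kubectl get storageclass` (read from stdin).
default_storage_classes() {
  awk 'NR > 1 && $2 == "(default)" { print $1 }'
}

# Run the check only when kubectl is available on this machine.
if command -v kubectl >/dev/null 2>&1; then
  kubectl get storageclass 2>/dev/null | default_storage_classes
fi
```

Exactly one name should be printed. No output suggests that no default storage class is set, and more than one name suggests that several classes carry the default annotation.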

NVIDIA Prerequisites

See Run:ai Cluster prerequisites NVIDIA requirements.

The Run:ai control plane, when installed without a Run:ai cluster, does not require the NVIDIA prerequisites.

Prometheus Prerequisites

See Run:ai Cluster prerequisites Prometheus requirements.

The Run:ai control plane, when installed without a Run:ai cluster, does not require the Prometheus prerequisites.

(Optional) Inference Prerequisites

See Run:ai Cluster prerequisites Inference requirements.

The Run:ai control plane, when installed without a Run:ai cluster, does not require the Inference prerequisites.

Helm

Run:ai requires Helm 3.10 or later. To install Helm, see https://helm.sh/docs/intro/install/. If you are installing an air-gapped version of Run:ai, the Run:ai tar file contains the Helm binary.
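One way to check the installed Helm version against the 3.10 minimum is sketched below. It assumes GNU sort (for the -V version sort) and that helm version --short prints a string such as v3.12.0+gc9f554d. The helper name version_ge is hypothetical.

```shell
# Succeed when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Compare the installed Helm client version against the minimum.
if command -v helm >/dev/null 2>&1; then
  # Strip the leading "v" and the "+g<commit>" build suffix.
  installed=$(helm version --short | sed -e 's/^v//' -e 's/+.*$//')
  if version_ge "$installed" "3.10.0"; then
    echo "Helm $installed meets the 3.10 minimum"
  else
    echo "Helm $installed is older than 3.10" >&2
  fi
fi
```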

Network Requirements

Ingress Controller

The Run:ai control plane installation assumes an existing installation of NGINX as the ingress controller. You can follow the Run:ai Cluster prerequisites ingress controller installation.

Domain name

The Run:ai control plane requires a domain name (FQDN). You must supply a domain name as well as a trusted certificate for that domain.

  • When installing the first Run:ai cluster on the same Kubernetes cluster as the control plane, the Run:ai cluster URL will be the same as the control-plane URL.
  • When installing the Run:ai cluster on a separate Kubernetes cluster, follow the Run:ai domain name requirements.

Installer Machine

The machine running the installation script (typically the Kubernetes master) must have:

  • At least 50GB of free space.
  • Docker installed.
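Both requirements can be checked from a shell before starting. This is a rough sketch that assumes the installation will run from the current directory, so free space is measured on the filesystem holding that directory. The helper name enough_space is hypothetical.

```shell
# Succeed when the available space ($1, in KB) covers the
# required size ($2, in GB).
enough_space() {
  [ "$1" -ge $(( $2 * 1024 * 1024 )) ]
}

# Available space, in KB, on the filesystem holding the current directory.
# `df -Pk` prints POSIX-format output with the value in column 4.
avail_kb=$(df -Pk . | awk 'NR == 2 { print $4 }')

if enough_space "$avail_kb" 50; then
  echo "Free space: OK"
else
  echo "Free space: less than 50GB available" >&2
fi

# Docker only needs to be present on the PATH for this check.
if command -v docker >/dev/null 2>&1; then
  echo "Docker: found"
else
  echo "Docker: not found on PATH" >&2
fi
```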

Other

  • (Air-gapped installation only) Private Docker registry. Run:ai assumes that a Docker registry for images already exists, most likely one installed within the organization. The installation requires the network address and port of the registry (referenced below as <REGISTRY_URL>).
  • (Optional) SAML Integration as described under single sign-on.

Pre-install Script

Once you believe that the Run:ai prerequisites are met, we highly recommend installing and running the Run:ai pre-install diagnostics script. The tool:

  • Tests the above requirements as well as additional failure points related to Kubernetes, NVIDIA, storage, and networking.
  • Looks at additional installed components and analyzes their relevance to a successful Run:ai installation.

To use the script, download the latest version and run:

chmod +x preinstall-diagnostics-<platform>
./preinstall-diagnostics-<platform> --domain <dns-entry>

If the script fails, or if the script succeeds but the Kubernetes system contains components other than Run:ai, locate the file runai-preinstall-diagnostics.txt in the current directory and send it to Run:ai technical support.

For more information on the script including additional command-line flags, see here.