
Prerequisites

Before proceeding with this document, please review the installation types documentation to understand the difference between air-gapped and connected installations.

Control-plane and clusters

As part of the installation process, you will install:

  • A control plane, which manages the clusters
  • One or more Run:ai clusters

Both the control plane and clusters require Kubernetes. Typically, the control plane and the first cluster are installed on the same Kubernetes cluster, but this is not required.

Hardware Requirements

See Run:ai Cluster prerequisites hardware requirements.

In addition, the Run:ai control plane installation requires Kubernetes Persistent Volumes with a total size of 110GB.

Run:ai Software

(Connected installation only) You should receive a file, runai-gcr-secret.yaml, from Run:ai Customer Support. The file provides access to the Run:ai container registry.

(Air-gapped installation only) You should receive a single file, runai-air-gapped-<version>.tar.gz, from Run:ai Customer Support.

Run:ai Software Prerequisites

Operating System

See Run:ai Cluster prerequisites operating system requirements.

The Run:ai control plane operating system prerequisites are identical.

Kubernetes

See Run:ai Cluster prerequisites Kubernetes requirements.

The Run:ai control plane Kubernetes prerequisites are identical.

The Run:ai control plane requires a default storage class to create persistent volume claims for Run:ai storage. As is standard in Kubernetes, the storage class controls the reclaim behavior: whether the Run:ai persistent data is retained or deleted when the Run:ai control plane is deleted.

Note

For a simple (non-production) storage class example, see Kubernetes Local Storage Class. This storage class uses the directory /opt/local-path-provisioner on every node as the path for provisioning persistent volumes.

Then set the new storage class as default:

kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
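To confirm that exactly one storage class is marked as default, and to see its reclaim policy (which decides whether Run:ai data is retained or deleted), one option is to parse the table printed by kubectl get storageclass. The following is a minimal sketch with hypothetical helper names; it assumes kubectl's default column layout, in which the default class appears as "<name> (default)" in the NAME column.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Run:ai). Both read the table printed by
# "kubectl get storageclass" on stdin, whose default columns are:
#   NAME  PROVISIONER  RECLAIMPOLICY  VOLUMEBINDINGMODE  ALLOWVOLUMEEXPANSION  AGE
# A default class shows up as "<name> (default)", which shifts the
# remaining columns one field to the right.

# Print "<name> <reclaim policy>" for every class marked as default.
default_storage_class() {
  awk 'NR > 1 && $2 == "(default)" { print $1, $4 }'
}

# Succeed only if exactly one class is marked as default.
has_single_default() {
  awk 'NR > 1 && $2 == "(default)" { n++ } END { exit (n == 1 ? 0 : 1) }'
}
```

Example usage: kubectl get storageclass | default_storage_class. With the local-path example above, the output would name local-path followed by its reclaim policy; a policy of Delete means the Run:ai persistent data is removed together with its volumes, whereas Retain keeps it.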

(Air-gapped only) Local Certificate Authority

In air-gapped environments, you must prepare the public key of your local certificate authority as described here. It must be installed in Kubernetes for the installation to succeed.

NVIDIA Prerequisites

See Run:ai Cluster prerequisites NVIDIA requirements.

The Run:ai control plane, when installed without a Run:ai cluster, does not require the NVIDIA prerequisites.

Prometheus Prerequisites

See Run:ai Cluster prerequisites Prometheus requirements.

The Run:ai control plane, when installed without a Run:ai cluster, does not require the Prometheus prerequisites.

(Optional) Inference Prerequisites

See Run:ai Cluster prerequisites Inference requirements.

The Run:ai control plane, when installed without a Run:ai cluster, does not require the Inference prerequisites.

Helm

Run:ai requires Helm 3.10 or later. To install Helm, see https://helm.sh/docs/intro/install/. If you are installing an air-gapped version of Run:ai, the Run:ai tar file contains the Helm binary.
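The Helm version requirement can be checked before installing. This is a minimal sketch with a hypothetical function name; it assumes the version string format printed by helm version --short (for example v3.12.3+g3a31588) or by helm version --template '{{.Version}}' (for example v3.12.3).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the version string in $1 is Helm 3.10 or later.
helm_version_ok() {
  ver=${1#v}            # drop the leading "v"
  major=${ver%%.*}      # text before the first dot
  rest=${ver#*.}
  minor=${rest%%.*}     # text between the first and second dots
  # Reject anything that does not parse as numeric major/minor parts.
  case "$major$minor" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 10 ]; }
}
```

Example usage: helm_version_ok "$(helm version --short)" && echo "Helm version OK". Comparing only the major and minor parts is enough here, since the requirement is stated at minor-version granularity.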

Network Requirements

Ingress Controller

The Run:ai control plane installation assumes an existing installation of NGINX as the ingress controller. To install it, follow the ingress controller instructions in the Run:ai Cluster prerequisites.
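One way to confirm that an NGINX ingress controller is present is to look for its IngressClass. This sketch assumes the community ingress-nginx controller, which registers its IngressClass with the controller value k8s.io/ingress-nginx; other NGINX distributions use a different controller value, so adjust the match if yours differs. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads the table printed by "kubectl get ingressclass"
# (columns: NAME CONTROLLER PARAMETERS AGE) on stdin and prints the name of
# every IngressClass served by the community ingress-nginx controller.
ingress_nginx_classes() {
  awk 'NR > 1 && $2 == "k8s.io/ingress-nginx" { print $1 }'
}
```

Example usage: kubectl get ingressclass | ingress_nginx_classes. Empty output suggests that ingress-nginx is not installed, or that it was installed with a different controller value.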

Domain name

The Run:ai control plane requires a domain name (FQDN). You must supply a domain name as well as a trusted certificate for that domain.

  • When installing the first Run:ai cluster on the same Kubernetes cluster as the control plane, the Run:ai cluster URL will be the same as the control-plane URL.
  • When installing the Run:ai cluster on a separate Kubernetes cluster, follow the Run:ai domain name requirements.
  • If your network is air-gapped, you will need to provide the Run:ai control-plane and cluster with information about the local certificate authority.
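Before installing, you can sanity-check the certificate you plan to supply for the domain. This is a minimal sketch using the openssl command-line tool (OpenSSL 1.0.2 or later is assumed, for -checkhost); the function names, file names, and runai.example.com are hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical helpers; certificate and CA files are assumed to be PEM-encoded.

# Succeed if certificate $1 is valid for host name $2 (checks DNS entries in
# subjectAltName, or the subject CN when the certificate has no SAN).
cert_matches_domain() {
  openssl x509 -in "$1" -noout -checkhost "$2" | grep -q ' does match certificate'
}

# Succeed if certificate $1 has not expired yet (-checkend 0).
cert_not_expired() {
  openssl x509 -in "$1" -noout -checkend 0 >/dev/null
}

# Succeed if certificate $1 chains to the CA certificate(s) in $2,
# for example the public key of your local certificate authority.
cert_chains_to_ca() {
  openssl verify -CAfile "$2" "$1" >/dev/null 2>&1
}
```

Example usage: cert_matches_domain cert.pem runai.example.com && cert_not_expired cert.pem && cert_chains_to_ca cert.pem local-ca.pem. In an air-gapped network, the last check shows whether the certificate was issued by the local certificate authority you will provide to the control plane and cluster.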

Installer Machine

The machine running the installation script (typically the Kubernetes master) must have:

  • At least 50GB of free space.
  • Docker installed.
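Both installer machine requirements can be checked from a shell. This is a minimal sketch with hypothetical function names; it interprets "50GB" as 50 GiB, which is an assumption (use 50 * 10^9 bytes if that is what your team means), and it relies on the POSIX output format of df -P, in which the fourth column is the available space in 1024-byte blocks.

```shell
#!/bin/sh
# Minimum free space, assumed here to mean 50 GiB, expressed in KiB.
MIN_FREE_KB=$((50 * 1024 * 1024))

# Print the available space, in KiB, of the filesystem holding directory $1.
free_kb() {
  df -Pk "${1:-.}" | awk 'NR == 2 { print $4 }'
}

# Succeed if $1 (in KiB) meets the minimum.
enough_space() {
  [ "$1" -ge "$MIN_FREE_KB" ]
}

# Report both requirements for directory $1 (default: current directory);
# returns non-zero if either requirement is not met.
check_installer_machine() {
  dir=${1:-.}
  status=0
  avail=$(free_kb "$dir")
  if enough_space "$avail"; then
    echo "Disk: OK ($avail KiB free in $dir)"
  else
    echo "Disk: only $avail KiB free in $dir (need $MIN_FREE_KB KiB)"
    status=1
  fi
  if command -v docker >/dev/null 2>&1; then
    echo "Docker: found at $(command -v docker)"
  else
    echo "Docker: not found on PATH"
    status=1
  fi
  return "$status"
}
```

Example usage: run check_installer_machine from the directory where you plan to unpack and run the installation.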

Other

  • (Air-gapped installation only) Private Docker registry. Run:ai assumes the existence of a Docker registry for images, most likely installed within the organization. The installation requires the network address and port of the registry (referenced below as <REGISTRY_URL>).
  • (Optional) SAML Integration as described under single sign-on.
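For air-gapped installations, a quick way to confirm that the private registry at <REGISTRY_URL> is reachable is to query the base endpoint of the Docker Registry HTTP API V2, /v2/, which answers 200 when no authentication is required and 401 when it is. This is a minimal sketch with hypothetical function names; it assumes the registry is served over HTTPS and that curl is available.

```shell
#!/bin/sh
# Hypothetical helpers for probing a Docker Registry HTTP API V2 endpoint.

# Succeed if HTTP status code $1 means the registry answered:
# 200 = reachable without authentication, 401 = reachable, authentication required.
registry_status_ok() {
  case "$1" in
    200|401) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the HTTP status code of GET https://$1/v2/, where $1 is host:port.
# curl prints 000 if no HTTP response was received at all.
registry_probe() {
  curl -s -o /dev/null -w '%{http_code}' "https://$1/v2/"
}
```

Example usage: registry_status_ok "$(registry_probe "$REGISTRY_URL")" && echo "Registry reachable". If the registry certificate is issued by your local certificate authority, curl may need its --cacert option pointing at that CA's public key.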

Pre-install Script

Once you believe the Run:ai prerequisites are met, we highly recommend downloading and running the Run:ai pre-install diagnostics script. The tool:

  • Tests the requirements above as well as additional failure points related to Kubernetes, NVIDIA, storage, and networking.
  • Looks at additional installed components and analyzes their relevance to a successful Run:ai installation.

To use the script, download its latest version and run:

chmod +x preinstall-diagnostics-<platform>
./preinstall-diagnostics-<platform> --domain <dns-entry>

If the script fails, or if the script succeeds but the Kubernetes system contains components other than Run:ai, locate the file runai-preinstall-diagnostics.txt in the current directory and send it to Run:ai technical support.

For more information on the script including additional command-line flags, see here.