Prerequisites

Before proceeding with this document, please review the installation types documentation to understand the difference between air-gapped and connected installations.

Hardware Requirements

(Production only) Run:ai System Nodes: To reduce downtime and save CPU cycles on expensive GPU machines, we recommend that production deployments contain two or more worker machines designated for Run:ai software. These nodes do not have to be dedicated to Run:ai, but Run:ai needs the following resources:

  • 4 CPUs
  • 8GB of RAM
  • 120GB of Disk space

The Run:ai control plane installation requires Kubernetes Persistent Volumes with a total size of 110GB.

Run:ai Software Prerequisites

You should receive the file runai-gcr-secret.yaml from Run:ai Customer Support. This file provides access to the Run:ai container registry.

You should receive a single file, runai-<version>.tar, from Run:ai Customer Support.

Kubernetes

Run:ai requires Kubernetes. Supported versions are 1.21 through 1.24. Kubernetes 1.25 is not yet supported.
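
The supported range can be checked before installing. The following is a minimal sketch, not part of the Run:ai tooling: the function name k8s_version_supported is hypothetical, and it only compares a version string against the 1.21 through 1.24 range stated above.

```shell
#!/bin/sh
# Hypothetical helper; not part of the Run:ai installer.
# Succeeds if the given Kubernetes version (for example "v1.23.4")
# falls within the supported range 1.21 through 1.24.
k8s_version_supported() {
  ver="${1#v}"                 # strip a leading "v", if any
  major="${ver%%.*}"           # text before the first dot
  rest="${ver#*.}"             # text after the first dot
  minor="${rest%%.*}"          # text before the next dot
  minor="${minor%%[!0-9]*}"    # drop non-numeric suffixes such as "+"
  [ "$major" = "1" ] && [ -n "$minor" ] \
    && [ "$minor" -ge 21 ] && [ "$minor" -le 24 ]
}

# Example:
#   k8s_version_supported "v1.23.4" && echo "supported" || echo "not supported"
```

One way to obtain real versions, assuming kubectl is configured for the target cluster, is kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.kubeletVersion}'. Note that this reports each node's kubelet version rather than the API server version.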

If you are using OpenShift, please refer to our OpenShift installation instructions.

Run:ai supports Kubernetes Pod Security Policy, if used.

NVIDIA Prerequisites

Run:ai requires the installation of NVIDIA software. See installation details here.

Kubernetes Dependencies

Prometheus

The Run:ai Cluster installation installs Prometheus. However, it can also connect to an existing Prometheus installed by the organization. In the latter case, it's important to:

Network

  • Shared Storage. The network address of, and a path to, a folder in a Network File System (NFS).
  • All Kubernetes cluster nodes should be able to mount NFS folders. Usually, this requires the installation of the nfs-common package on all machines (sudo apt install nfs-common or similar)
  • IP Address. An available, internal IP Address that is accessible from Run:ai Users' machines (referenced below as <RUNAI_IP_ADDRESS>)
  • DNS entry. Create a DNS A record such as runai.<company-name> or similar. The A record should point to <RUNAI_IP_ADDRESS>.
  • A certificate for the endpoint. The certificate(s) must be signed by the organization's root CA.
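
The DNS and certificate items above can be verified from a user's machine. The following is a rough sketch under stated assumptions: the function names dns_points_to and cert_signed_by and the example file names are hypothetical, getent is assumed to be the glibc version (which supports the ahostsv4 database), and the openssl command-line tool is assumed to be installed.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the Run:ai installer.

# Succeeds if host name $1 resolves (IPv4) to the expected address $2.
# Assumes glibc getent, which supports the "ahostsv4" database.
dns_points_to() {
  getent ahostsv4 "$1" | awk '{print $1}' | grep -qxF "$2"
}

# Succeeds if the PEM certificate $2 validates against the root CA file $1.
# Assumes the openssl CLI is installed.
cert_signed_by() {
  openssl verify -CAfile "$1" "$2" >/dev/null 2>&1
}

# Example (placeholder values; file names are illustrative only):
#   dns_points_to "runai.<company-name>" "<RUNAI_IP_ADDRESS>" || echo "DNS record mismatch"
#   cert_signed_by root-ca.pem runai-cert.pem || echo "certificate not trusted by root CA"
```

If the certificate is issued through an intermediate CA, openssl verify also needs the intermediate certificates, which can be supplied with its -untrusted option.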

Installer Machine

The machine running the installation script (typically the Kubernetes master) must have:

  • At least 50GB of free space.
  • Docker installed.
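
Both installer-machine requirements can be checked from a shell. This is a minimal sketch, not part of the Run:ai tooling: the function names are hypothetical, and interpreting "50GB" as 50 GiB is an assumption.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the Run:ai installer.

# Prints the available space, in KiB, on the filesystem holding directory $1.
# "df -P" forces the single-line POSIX output format; column 4 is "Available".
free_kb() {
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Succeeds if $1 (in KiB) is at least 50 GiB.
# Treating the documented "50GB" as 50 GiB is an assumption.
has_min_free_space() {
  [ "$1" -ge $((50 * 1024 * 1024)) ]
}

# Succeeds if a docker executable is found on the PATH.
has_docker() {
  command -v docker >/dev/null 2>&1
}

# Example:
#   has_min_free_space "$(free_kb .)" || echo "less than 50GB free"
#   has_docker || echo "Docker not found"
```

Run the free-space check from the directory where the installation files will be extracted, since different filesystems on the same machine may report different amounts of free space.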

Other

  • (Air-gapped installation only) Private Docker Registry. Run:ai assumes the existence of a Docker registry for images, most likely installed within the organization. The installation requires the network address and port for the registry (referenced below as <REGISTRY_URL>).
  • (Optional) SAML Integration.

Pre-install Script

Once you believe that the Run:ai prerequisites are met, we highly recommend installing and running the Run:ai pre-install diagnostics script. The tool:

  • Tests the requirements above as well as additional failure points related to Kubernetes, NVIDIA, storage, and networking.
  • Looks at additional components installed and analyzes their relevancy to a successful Run:ai installation.

To use the script, download the latest version and run:

chmod +x preinstall-diagnostics-<platform>
./preinstall-diagnostics-<platform> --domain <dns-entry>

If the script fails, or if the script succeeds but the Kubernetes system contains components other than Run:ai, locate the file runai-preinstall-diagnostics.txt in the current directory and send it to Run:ai technical support.

For more information on the script including additional command-line flags, see here.


Last update: 2022-11-21
Created: 2021-08-03