
Prerequisites

Before proceeding with this document, please review the installation types documentation to understand the difference between air-gapped and connected installations.

Hardware Requirements

(Production only) Run:AI System Nodes: To reduce downtime and save CPU cycles on expensive GPU machines, we recommend that production deployments include two or more worker machines designated for Run:AI software. These nodes do not have to be dedicated to Run:AI, but Run:AI needs the following resources:

  • 4 CPUs
  • 8GB of RAM
  • 120GB of Disk space
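A quick pre-flight check can catch undersized system nodes before installation. The following is a minimal sketch, assuming the minimums apply per node and that "Disk space" means the total size of the filesystem at a chosen path (default "/", an assumption). The function names are hypothetical, and the RAM reading via `os.sysconf` works on Linux.

```python
import os
import shutil

# Minimums from the list above (Run:AI system nodes, production only).
MIN_CPUS = 4
MIN_RAM_GB = 8
MIN_DISK_GB = 120

def shortfalls(cpus: int, ram_gb: float, disk_gb: float) -> list[str]:
    """Return a list of unmet requirements; an empty list means the node qualifies."""
    problems = []
    if cpus < MIN_CPUS:
        problems.append(f"CPUs: {cpus} < {MIN_CPUS}")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:.1f}GB < {MIN_RAM_GB}GB")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"Disk: {disk_gb:.1f}GB < {MIN_DISK_GB}GB")
    return problems

def local_node_shortfalls(path: str = "/") -> list[str]:
    """Check the machine this script runs on (Linux: RAM is read via sysconf)."""
    cpus = os.cpu_count() or 0
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    disk_bytes = shutil.disk_usage(path).total  # total size, not free space
    return shortfalls(cpus, ram_bytes / 1e9, disk_bytes / 1e9)

if __name__ == "__main__":
    print(local_node_shortfalls() or "Node meets the Run:AI system node minimums")
```

Keeping the comparison in a pure `shortfalls` function makes it easy to reuse with figures gathered from other nodes, for example over SSH.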

The Run:AI backend installation requires Kubernetes Persistent Volumes with a total size of 110GB.
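PersistentVolume capacities are written as Kubernetes quantity strings (for example, `spec.capacity.storage: 50Gi`), so checking the 110GB total means converting and adding them. This is a minimal sketch that handles only a subset of the quantity format. It reads "110GB" as 110Gi, which is an assumption chosen because it is the stricter of the two readings. The helper names are hypothetical.

```python
# Suffixes from the Kubernetes quantity format: binary (Ki, Mi, ...) and decimal (k, M, ...).
SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}

# Assumption: "110GB" is read as 110Gi, the larger (more conservative) interpretation.
REQUIRED_BYTES = 110 * 2**30

def quantity_to_bytes(quantity: str) -> int:
    """Convert a storage quantity such as '50Gi' or '100G' to bytes."""
    for suffix, factor in SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # no suffix: plain byte count

def enough_pv_capacity(capacities: list[str]) -> bool:
    """True if the given PersistentVolume capacities add up to the required total."""
    return sum(quantity_to_bytes(c) for c in capacities) >= REQUIRED_BYTES
```

Note that `100G` plus `10G` falls short under this reading, while `100Gi` plus `10Gi` passes.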

Run:AI Software Prerequisites

You should receive a single file, runai-<version>.tar, from Run:AI Customer Support.

You should also receive a file named runai-gcr-secret.yaml from Run:AI Customer Support. This file provides access to the Run:AI container registry.
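Before starting, it can help to confirm that both files from Customer Support are present in the working directory. The following is a minimal sketch. It assumes the tarball follows the runai-<version>.tar naming pattern shown above, and `check_artifacts` and `check_directory` are hypothetical helpers.

```python
import fnmatch
import os

def check_artifacts(filenames: list[str]) -> list[str]:
    """Report missing or ambiguous Run:AI files in a list of file names."""
    problems = []
    tarballs = fnmatch.filter(filenames, "runai-*.tar")
    if len(tarballs) != 1:
        problems.append(f"expected one runai-<version>.tar, found {len(tarballs)}")
    if "runai-gcr-secret.yaml" not in filenames:
        problems.append("runai-gcr-secret.yaml is missing")
    return problems

def check_directory(directory: str = ".") -> list[str]:
    """Run the check against the files in a directory."""
    return check_artifacts(os.listdir(directory))
```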

Kubernetes

Run:AI requires Kubernetes 1.19 or above. Kubernetes 1.21 is recommended (as of September 2021). Kubernetes 1.22 is not supported.
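The version policy above can be expressed as a small classifier over the server's version string, such as the `gitVersion` reported by `kubectl version`. The following is a minimal sketch. It treats every release from 1.22 onward as unsupported, which is an assumption, since the text only names 1.22. The function name is hypothetical.

```python
import re

MIN_SUPPORTED = (1, 19)
RECOMMENDED = (1, 21)
FIRST_UNSUPPORTED = (1, 22)  # assumption: later releases are also unsupported

def classify_k8s_version(version: str) -> str:
    """Classify strings such as 'v1.21.3' or '1.20.5-gke.1' by major.minor."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if not match:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    if major_minor < MIN_SUPPORTED or major_minor >= FIRST_UNSUPPORTED:
        return "unsupported"
    if major_minor == RECOMMENDED:
        return "recommended"
    return "supported"
```

Comparing `(major, minor)` tuples avoids the string-comparison trap where "1.9" would sort after "1.19".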

If you are using OpenShift, please refer to our OpenShift installation instructions.

Run:AI supports Kubernetes Pod Security Policy (PSP), if your cluster uses it.

NVIDIA Prerequisites

Run:AI requires the installation of NVIDIA software. This can be done in one of two ways:

  • (Recommended) Use the NVIDIA GPU Operator on Kubernetes. To install it, follow the Helm-based installation in the NVIDIA GPU Operator Getting Started guide.
  • On each GPU node in the cluster, install the NVIDIA CUDA drivers as well as the software stack described here.
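One way to sanity-check the result is to see which nodes advertise the `nvidia.com/gpu` resource in the output of `kubectl get nodes -o json`. This assumes the NVIDIA device plugin is running, which the GPU Operator deploys. The following is a minimal sketch, and `gpu_nodes` is a hypothetical helper.

```python
import json

def gpu_nodes(nodes_json: str) -> dict[str, int]:
    """Map node name to allocatable GPU count, from `kubectl get nodes -o json` output."""
    result = {}
    for node in json.loads(nodes_json).get("items", []):
        allocatable = node.get("status", {}).get("allocatable", {})
        count = int(allocatable.get("nvidia.com/gpu", "0"))  # quantities are strings
        if count > 0:
            result[node["metadata"]["name"]] = count
    return result
```

A GPU node that is missing from the result usually means the NVIDIA software on it is not yet working.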

Kubernetes Dependencies

Prometheus

The Run:AI Cluster installation installs Prometheus. However, it can also connect to an existing Prometheus installed by the organization. In the latter case, it's important to:

Feature Discovery

The Run:AI Cluster installation installs Kubernetes Node Feature Discovery (NFD) and NVIDIA GPU Feature Discovery (GFD). If your cluster has these dependencies already installed, you can use installation flags to prevent Run:AI from installing these dependencies.
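To decide whether these dependencies are already present, you can look for the node labels they publish. NFD labels use the `feature.node.kubernetes.io/` prefix, and GFD adds labels such as `nvidia.com/gpu.product`. The following is a minimal heuristic sketch over `kubectl get nodes -o json` output. It does not name the Run:AI installation flags, and `discovery_status` is a hypothetical helper.

```python
import json

NFD_PREFIX = "feature.node.kubernetes.io/"
GFD_LABEL = "nvidia.com/gpu.product"

def discovery_status(nodes_json: str) -> dict[str, bool]:
    """Heuristically detect NFD and GFD from the labels on any node."""
    all_labels = [
        node.get("metadata", {}).get("labels", {})
        for node in json.loads(nodes_json).get("items", [])
    ]
    return {
        "nfd": any(key.startswith(NFD_PREFIX) for labels in all_labels for key in labels),
        "gfd": any(GFD_LABEL in labels for labels in all_labels),
    }
```

Label presence is only a hint. Labels can outlive an uninstalled component, so confirm by checking for the corresponding pods as well.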

Network

  • Shared storage. The network address of a Network File System (NFS) server and the path to a folder on it.
  • All Kubernetes cluster nodes should be able to mount NFS folders. Usually, this requires installing the nfs-common package on all machines (sudo apt install nfs-common or similar).
  • IP address. An available internal IP address that is accessible from Run:AI users' machines (referenced below as <RUNAI_IP_ADDRESS>).
  • DNS entry. Create a DNS A record, such as runai.<company-name>, that points to <RUNAI_IP_ADDRESS>.
  • Certificate. A certificate for the endpoint. The certificate(s) must be signed by the organization's root CA.
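The IP address and DNS items can be verified together. The following is a minimal sketch. It assumes "internal" means private according to Python's `ipaddress` module, and it uses `socket.gethostbyname`, which returns an IPv4 address and so matches an A record. The function name and the example hostname `runai.example.com` are hypothetical. The resolver is injectable so that the check can be tested without DNS.

```python
import ipaddress
import socket
from typing import Callable

def check_endpoint(hostname: str, runai_ip: str,
                   resolve: Callable[[str], str] = socket.gethostbyname) -> list[str]:
    """Check that runai_ip is internal and that hostname's A record points to it."""
    problems = []
    if not ipaddress.ip_address(runai_ip).is_private:
        problems.append(f"{runai_ip} is not a private (internal) address")
    try:
        resolved = resolve(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return problems + [f"{hostname} does not resolve: {exc}"]
    if resolved != runai_ip:
        problems.append(f"{hostname} resolves to {resolved}, expected {runai_ip}")
    return problems
```

For example, `check_endpoint("runai.example.com", "10.0.0.5")` returns an empty list once the A record is in place.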

Installer Machine

The machine running the installation script (typically the Kubernetes master) must have:

  • At least 50GB of free space.
  • Docker installed.
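Both installer-machine requirements can be checked with the standard library. The following is a minimal sketch. It reads 50GB as decimal gigabytes and measures free space at "/" by default. Both choices are assumptions, so point `path` at the filesystem the installation will actually use. The function names are hypothetical.

```python
import shutil
from typing import Optional

MIN_FREE_BYTES = 50 * 10**9  # "at least 50GB", read as decimal gigabytes

def installer_problems(free_bytes: int, docker_path: Optional[str]) -> list[str]:
    """Return unmet installer-machine requirements; empty means ready."""
    problems = []
    if free_bytes < MIN_FREE_BYTES:
        problems.append(f"only {free_bytes / 1e9:.1f}GB free; at least 50GB is required")
    if docker_path is None:
        problems.append("docker was not found on PATH")
    return problems

def check_installer(path: str = "/") -> list[str]:
    """Gather the facts for this machine and run the check."""
    return installer_problems(shutil.disk_usage(path).free, shutil.which("docker"))
```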

Other

  • (Air-gapped installation only) Private Docker registry. Run:AI assumes the existence of a Docker registry for images, most likely installed within the organization. The installation requires the network address and port of the registry (referenced below as <REGISTRY_URL>).
  • (Optional) SAML Integration.
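For air-gapped installations, you can confirm that <REGISTRY_URL> has the expected host:port form and that the port accepts TCP connections from the installer machine. The following is a minimal sketch. It assumes IPv6 addresses are written in brackets, and a successful TCP connection only shows that something is listening, not that it is a working Docker registry. The function names and the example `registry.internal:5000` are hypothetical.

```python
import socket

def parse_registry_url(registry_url: str) -> tuple[str, int]:
    """Split '<host>:<port>' (e.g. 'registry.internal:5000') into host and port."""
    host, sep, port_text = registry_url.rpartition(":")
    if not sep or not host or not port_text.isdigit():
        raise ValueError(f"expected <host>:<port>, got {registry_url!r}")
    port = int(port_text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return host.strip("[]"), port  # strip brackets from IPv6 literals

def registry_reachable(registry_url: str, timeout: float = 3.0) -> bool:
    """True if a TCP connection to the registry's host and port succeeds."""
    host, port = parse_registry_url(registry_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```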

Last update: October 10, 2021