Prerequisites
Before proceeding with this document, please review the installation types documentation to understand the difference between air-gapped and connected installations.
Control plane and clusters
As part of the installation process you will install:
- A control-plane managing cluster
- One or more clusters
Both the control plane and clusters require Kubernetes. Typically, the control plane and the first cluster are installed on the same Kubernetes cluster, but this is not required.
Important
In OpenShift environments, adding a cluster connecting to a remote control plane currently requires the assistance of customer support.
Hardware Requirements
See Cluster prerequisites hardware requirements.
Run:ai Software
Connected installation: you should receive a file, runai-gcr-secret.yaml, from Run:ai customer support. The file provides access to the Run:ai container registry.
Air-gapped installation: you should receive a single file, runai-<version>.tar, from Run:ai customer support.
Run:ai Software Prerequisites
Operating System
OpenShift has specific operating system requirements, which can be found in the Red Hat documentation.
OpenShift
Run:ai supports OpenShift. Supported versions are 4.8 through 4.11.
- OpenShift must be configured with a trusted certificate. Run:ai installation relies on OpenShift to create certificates for subdomains.
- OpenShift must have a configured identity provider.
- OpenShift must have Entitlement. Entitlement is the Red Hat OpenShift licensing mechanism. Without entitlement, you will not be able to install the NVIDIA drivers used by the GPU Operator. For further information see here, or the equivalent NVIDIA documentation. Entitlement is no longer required if you are using OpenShift 4.9.9 or above.
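The version rules in this list can be expressed as a small check. The following is a minimal Python sketch, not part of the Run:ai tooling: the function names are hypothetical, and it assumes the OpenShift version is available as a plain `major.minor.patch` string (pre-release suffixes are not handled).

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split a 'major.minor.patch' string into integers (missing parts default to 0)."""
    parts = [int(p) for p in version.strip().split(".")]
    while len(parts) < 3:
        parts.append(0)
    return parts[0], parts[1], parts[2]


def is_supported(version: str) -> bool:
    """True when the version falls in the supported 4.8 through 4.11 range."""
    major, minor, _ = parse_version(version)
    return major == 4 and 8 <= minor <= 11


def needs_entitlement(version: str) -> bool:
    """True when the version is older than 4.9.9, where entitlement is still required."""
    return parse_version(version) < (4, 9, 9)


if __name__ == "__main__":
    for v in ("4.8.57", "4.9.9", "4.12.0"):
        print(v, "supported:", is_supported(v), "entitlement needed:", needs_entitlement(v))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "4.10" as older than "4.9".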
NVIDIA Prerequisites
See Run:ai Cluster prerequisites installing NVIDIA dependencies in OpenShift.
The Run:ai control plane, when installed without a Run:ai cluster, does not require the NVIDIA prerequisites.
Information on how to download the GPU Operator for air-gapped installation can be found in the NVIDIA GPU Operator pre-requisites.
(Optional) Inference Prerequisites
See Run:ai Cluster prerequisites Inference requirements.
The Run:ai control plane, when installed without a Run:ai cluster, does not require the Inference prerequisites.
Installer Machine
The machine running the installation script (typically the Kubernetes master) must have:
- At least 50GB of free space.
- Docker installed.
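Both installer-machine requirements can be checked before starting. The sketch below is illustrative only and uses the Python standard library; the function names are hypothetical, "50GB" is read conservatively as 50 GiB (an assumption), and finding a `docker` executable on the PATH does not prove that the Docker daemon is running.

```python
import shutil

# Assumption: "50GB" is interpreted as 50 GiB, the stricter reading.
MIN_FREE_BYTES = 50 * 1024**3


def has_enough_space(free_bytes: int, minimum: int = MIN_FREE_BYTES) -> bool:
    """True when the free space meets the minimum requirement."""
    return free_bytes >= minimum


def check_installer_machine(path: str = ".") -> dict[str, bool]:
    """Check free space on the filesystem holding `path` and look for docker on PATH."""
    free_bytes = shutil.disk_usage(path).free
    return {
        "free_space_ok": has_enough_space(free_bytes),
        "docker_found": shutil.which("docker") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_installer_machine().items():
        print(f"{name}: {'ok' if ok else 'NOT MET'}")
```

Pass the directory where the installation files will be unpacked as `path`, since free space is measured per filesystem.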
Other
- (Air-gapped installation only) Private Docker registry. Run:ai assumes that a Docker registry for images exists, most likely installed within the organization. The installation requires the network address and port of the registry (referenced below as <REGISTRY_URL>).
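Before installing, it can help to confirm that the registry address is well formed and reachable from the installer machine. The sketch below is a hedged illustration, not a Run:ai utility: the function names are hypothetical, and it assumes the registry address is written as `host:port` with a hostname or IPv4 address. It only tests that a TCP connection opens; it does not verify that a Docker registry is answering.

```python
import socket


def split_registry_address(address: str) -> tuple[str, int]:
    """Split 'host:port' into (host, port); raise ValueError if malformed."""
    host, sep, port = address.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {address!r}")
    return host, int(port)


def registry_reachable(address: str, timeout: float = 3.0) -> bool:
    """True if a TCP connection to the registry address can be opened."""
    host, port = split_registry_address(address)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical address, for illustration only.
    print(registry_reachable("registry.example.com:5000"))
```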
Pre-install Script
Once you believe that the Run:ai prerequisites are met, we highly recommend installing and running the Run:ai pre-install diagnostics script. The tool:
- Tests the requirements listed on this page as well as additional failure points related to Kubernetes, NVIDIA, storage, and networking.
- Looks at additional components installed and analyzes their relevance to a successful Run:ai installation.
To use the script, download the latest version and run it.
If the script fails, or if the script succeeds but the Kubernetes system contains components other than Run:ai, locate the file runai-preinstall-diagnostics.txt in the current directory and send it to Run:ai technical support.
For more information on the script including additional command-line flags, see here.
Created: 2023-03-26