

Below are the prerequisites for a cluster on which Run:AI is installed.

Software Requirements


Run:AI requires Kubernetes 1.19 or above. Kubernetes 1.21 is recommended (as of July 2021). Kubernetes 1.22 is not supported.
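The supported version range above can be encoded as a small helper for pre-installation checks. This is an illustrative sketch, not a Run:AI tool; the function name is hypothetical:

```python
def is_supported_kubernetes(version: str) -> bool:
    """Return True if a Kubernetes version string falls in the supported
    range described above: >= 1.19 and < 1.22 (1.22 is not supported)."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (1, 19) <= (major, minor) < (1, 22)


# 1.21 is the recommended version (as of July 2021) and passes the check.
print(is_supported_kubernetes("1.21.4"))  # True
print(is_supported_kubernetes("1.22.0"))  # False
```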

If you are using Red Hat OpenShift, the minimal version is OpenShift 4.6.

Run:AI supports Kubernetes Pod Security Policy, if it is in use.
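As an illustration, a minimal Pod Security Policy might look like the following. The policy name and rules below are hypothetical, not a policy shipped by Run:AI:

```yaml
# Hypothetical example of a minimal Pod Security Policy
# (Kubernetes policy/v1beta1, deprecated as of Kubernetes 1.21).
# Not a Run:AI-provided policy; shown only for illustration.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: example-restricted
spec:
  privileged: false
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  volumes:
    - configMap
    - secret
    - persistentVolumeClaim
```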


Kubernetes networking is an add-on rather than a core part of Kubernetes. Different add-ons have different network requirements; consult the documentation of the specific add-on for which ports to open. It is, however, important to note that unless special provisions are made, Kubernetes assumes all cluster nodes can interconnect using all ports.
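To verify that cluster nodes can actually reach each other on a given port, a quick TCP connectivity check can help. This is a generic sketch (the host address and port in the usage comment are placeholders), not a Run:AI utility:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Attempt a TCP connection to host:port; return True on success."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (placeholder node address and port):
# can_connect("10.0.0.12", 10250)
```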


Run:AI requires the installation of NVIDIA software. This can be done in one of two ways: installing the NVIDIA GPU Operator, or installing the NVIDIA software on each node individually. For more information, see the Cluster Installation documentation.


Run:AI requires Prometheus. By default, the Run:AI cluster installation installs Prometheus, but it can also connect to an existing Prometheus installation maintained by the organization. In the latter case, it's important to:

Hardware Requirements


  • (Production only) Run:AI System Nodes: To reduce downtime and save CPU cycles on expensive GPU machines, we recommend that production deployments contain two or more worker machines designated for the Run:AI software. The nodes do not have to be dedicated to Run:AI, but for Run:AI purposes each would need:

    • 4 CPUs
    • 8GB of RAM
    • 50GB of Disk space
  • Shared data volume: Run:AI uses Kubernetes to abstract away the machine on which a container is running:

    • Researcher containers: The Researcher's containers need to be able to access data from any machine in a uniform way, to access training data and code as well as save checkpoints, weights, and other machine-learning-related artifacts.
    • The Run:AI system needs to save data on a storage device that is not dependent on a specific node.

    Typically, this is achieved via a Network File System (NFS) or Network-Attached Storage (NAS). NFS is usually the preferred method for Researchers, who may require multi-read/write capabilities.

  • Docker Registry: With Run:AI, workloads are based on Docker images. For container images to run on any machine, these images must be downloaded from a Docker registry rather than reside on the local machine (though this is also possible). You can use a public registry such as Docker Hub, or set up a local registry on-premises (preferably on a dedicated machine). Run:AI can assist with setting up the registry.

  • Kubernetes: Though out of scope for this document, a production Kubernetes installation requires separate nodes for the Kubernetes master.
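For the shared data volume described above, a common pattern is an NFS-backed PersistentVolume with a matching claim. The server address, export path, names, and capacity below are placeholders, not Run:AI-specific values:

```yaml
# Hypothetical NFS-backed volume for shared training data and checkpoints;
# replace the server and path with your own NFS export.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-data
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany        # multi-read/write, as noted above
  nfs:
    server: nfs.example.com   # placeholder
    path: /exports/data       # placeholder
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 100Gi
```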


User Requirements

Usage of containers and images: The individual Researcher's work should be based on container images.

Network Requirements

The Run:AI user interface runs from the cloud. All container nodes must be able to connect to the Run:AI cloud. Inbound connectivity (connecting from the cloud into nodes) is not required. If outbound connectivity is proxied or limited, the following exceptions should be applied:

During Installation

Run:AI requires an installation on the Kubernetes cluster. The installation accesses the web to download various images from container and package registries. Some organizations place limitations on what can be pulled from the internet. The following list shows the various solution components and their origin:

| Name | Description | URLs | Ports |
| ---- | ----------- | ---- | ----- |
| Run:AI Repository | The Run:AI Package Repository is hosted on Run:AI's account on Google Cloud | | |
| Docker Images Repository | Various Run:AI images | | |
| Docker Images Repository | Various third-party images | | |

Post Installation

In addition, once running, Run:AI will send metrics to two sources:

| Name | Description | URLs | Ports |
| ---- | ----------- | ---- | ----- |
| Grafana Metrics Server | | | |
| Run:AI Cloud instance | | | |
| Authentication Provider | | | |


Last update: September 30, 2021