Walk-through: Launch Workloads with GPU Fractions
Run:AI provides a Fractional GPU sharing system for containerized workloads on Kubernetes. The system supports workloads running CUDA programs and is especially suited for lightweight AI tasks such as inference and model building. The fractional GPU system transparently gives data science and AI engineering teams the ability to run multiple workloads simultaneously on a single GPU, enabling companies to run more workloads such as computer vision, voice recognition and natural language processing on the same hardware, lowering costs.
Run:AI’s fractional GPU system effectively creates virtualized logical GPUs, with their own memory and computing space that containers can use and access as if they were self-contained processors. This enables several workloads to run in containers side-by-side on the same GPU without interfering with each other. The solution is transparent, simple, and portable; it requires no changes to the containers themselves.
A typical use case sees 2 to 8 jobs sharing the same GPU, meaning the same hardware can carry up to eight times as many workloads.
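The arithmetic behind the "2 to 8 jobs" figure can be sketched as follows. This is a minimal illustration, not part of Run:AI: the helper name `max_jobs_per_gpu` is hypothetical, and it assumes every job requests the same fraction and that jobs are packed purely by fraction size.

```python
from fractions import Fraction


def max_jobs_per_gpu(gpu_fraction: str) -> int:
    """Return how many jobs requesting `gpu_fraction` fit on a single GPU.

    Hypothetical helper for illustration only; it assumes equal-sized
    requests and packing by fraction alone.
    """
    share = Fraction(gpu_fraction)  # exact arithmetic, no float rounding
    if not 0 < share <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    return int(1 // share)


print(max_jobs_per_gpu("0.5"))    # 2
print(max_jobs_per_gpu("0.125"))  # 8
```

A fraction of 0.5 gives the lower end of the range (2 jobs) and a fraction of 0.125 gives the upper end (8 jobs).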
To complete this walk-through you must have:
- Run:AI software installed on your Kubernetes cluster. See: Installing Run:AI on an on-premise Kubernetes Cluster
- Run:AI CLI installed on your machine. See: Installing the Run:AI Command-Line Interface
Step by Step Walk-through
- Open the Run:AI user interface at app.run.ai
- Go to "Projects"
- Add a project named "team-a"
- Allocate 1 GPU to the project
At the command-line run:
runai project set team-a
runai submit frac05 -i gcr.io/run-ai-demo/quickstart -g 0.5 --interactive
runai submit frac03 -i gcr.io/run-ai-demo/quickstart -g 0.3
The jobs are based on a sample docker image, gcr.io/run-ai-demo/quickstart. The image contains a startup script that runs a deep learning TensorFlow-based workload.
- We named the jobs frac05 and frac03 respectively.
- Note that fractions may or may not use the --interactive flag. Setting the flag means that the job will not finish automatically; it is the researcher's responsibility to delete the job. Fractions support both interactive and non-interactive jobs.
- The jobs are assigned to team-a with an allocation of a single GPU.
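If you prefer to script the submission, the commands can be assembled programmatically. The sketch below only builds and prints the command lines; it uses the flags shown in this walk-through (`-i`, `-g`, `--interactive`), while the helper name `build_submit_command` and the `IMAGE` constant are illustrative assumptions rather than part of the Run:AI CLI.

```python
import shlex

# Sample image used throughout this walk-through.
IMAGE = "gcr.io/run-ai-demo/quickstart"


def build_submit_command(job_name, gpu_fraction, interactive=False):
    """Build the argv list for a `runai submit` call (hypothetical helper)."""
    argv = ["runai", "submit", job_name, "-i", IMAGE, "-g", str(gpu_fraction)]
    if interactive:
        # Interactive jobs do not finish on their own and must be deleted.
        argv.append("--interactive")
    return argv


commands = [
    ["runai", "project", "set", "team-a"],
    build_submit_command("frac05", 0.5, interactive=True),
    build_submit_command("frac03", 0.3),
]

for argv in commands:
    # Print instead of executing, so the sketch runs without the CLI installed.
    print(shlex.join(argv))
```

On a machine where the Run:AI CLI is installed, each argument list could be passed to `subprocess.run(argv, check=True)` instead of being printed.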
Follow up on the jobs' status by running:
Note that both jobs were allocated to the same node.
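One way to see why the two jobs can share the project's single GPU is to add up the requested fractions. The sketch below is a simplification and does not describe the actual scheduler: it assumes jobs can share a GPU whenever their fractions sum to at most 1, and the function name `remaining_gpu_share` is hypothetical.

```python
from fractions import Fraction


def remaining_gpu_share(requests):
    """Return the share of one GPU left after the given fractional requests.

    A negative result means the requests do not fit on a single GPU.
    Hypothetical helper; real placement decisions are made by the scheduler.
    """
    return 1 - sum(Fraction(r) for r in requests)


left = remaining_gpu_share(["0.5", "0.3"])
print(left >= 0)    # True: both jobs fit on one GPU
print(float(left))  # 0.2 of the GPU is still unrequested
```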
When both jobs are running, bash into one of them:
runai bash frac05
Now, inside the container, run:
- The total memory is circled in red. It should be 50% of the GPU's memory size. In the picture above we see 8 GB, which is half of the 16 GB of a Tesla V100 GPU.
- The script running in the container is limited to 8 GB. In this case TensorFlow, which tends to allocate almost all of the GPU memory, has allocated 7.7 GB (and not close to 16 GB). Allocating more than 8 GB will lead to an out-of-memory exception.
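The memory numbers above follow from multiplying the GPU's total memory by the requested fraction. A minimal sketch of that calculation, using the 16 GB and 0.5 values from this walk-through; the function names are hypothetical and the check is only illustrative, since the actual limit is enforced by Run:AI rather than by user code.

```python
def memory_limit_gb(total_gpu_memory_gb: float, gpu_fraction: float) -> float:
    """Memory visible to a job that requested `gpu_fraction` of the GPU."""
    return total_gpu_memory_gb * gpu_fraction


def fits_in_limit(allocated_gb: float, limit_gb: float) -> bool:
    """True if an allocation stays within the fractional memory limit."""
    return allocated_gb <= limit_gb


limit = memory_limit_gb(16, 0.5)
print(limit)                      # 8.0
print(fits_in_limit(7.7, limit))  # True: TensorFlow's 7.7 GB fits
print(fits_in_limit(9.0, limit))  # False: would hit out-of-memory
```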