Quickstart: Launch Workloads with GPU Fractions¶

Introduction¶

Run:ai provides a fractional GPU sharing system for containerized workloads on Kubernetes. The system supports workloads running CUDA programs and is especially suited for lightweight AI tasks such as inference and model building. The fractional GPU system transparently gives data science and AI engineering teams the ability to run multiple workloads simultaneously on a single GPU. This lets companies run more workloads, such as computer vision, voice recognition, and natural language processing, on the same hardware, lowering costs.

Run:ai’s fractional GPU system effectively creates logical GPUs, with their own memory and computing space that containers can use and access as if they were self-contained processors. This enables several workloads to run in containers side-by-side on the same GPU without interfering with each other. The solution is transparent, simple, and portable; it requires no changes to the containers themselves.

A typical use case is running two or more Workloads on the same GPU, which multiplies the work you can do with the same hardware.

The purpose of this article is to provide a quick ramp-up to running a training Workload with fractions of a GPU.

There are various ways to submit a Workload:

- Run:ai command-line interface (CLI)
- Run:ai user interface
- Run:ai API

Prerequisites¶

To complete this Quickstart, the Platform Administrator will need to provide you with:

- Researcher access to a Run:ai Project named "team-a", with at least 1 GPU assigned to the project.
- A link to the Run:ai Console, e.g. https://acme.run.ai.

To complete this Quickstart via the CLI, you will need the Run:ai CLI installed on your machine. There are two available CLI variants:
- The older V1 CLI. See installation here
- A newer V2 CLI, supported with clusters of version 2.18 and up. See installation here

Step by Step Walkthrough¶

Login¶

CLI V2 / CLI V1 (Deprecated): Run runai login and enter your credentials.

User Interface: Browse to the provided Run:ai user interface and log in with your credentials.

API: To use the API, you will need to obtain a token. Follow the API authentication article.

Run Workload¶

Open a terminal and run the commands for your CLI version.

CLI V2:

runai project set team-a
runai training submit frac05 -i runai.jfrog.io/demo/quickstart --gpu-portion-request 0.5
runai training submit frac05-2 -i runai.jfrog.io/demo/quickstart --gpu-portion-request 0.5
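The two CLI V2 submissions above differ only in the workload name, so they can be scripted. The following is a minimal sketch, not an official Run:ai script: it reuses only the commands and flags shown above, and the `RUNAI` variable and `submit_fractional_workloads` function are hypothetical names introduced here. Setting `RUNAI=echo` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: submit several half-GPU training workloads with the Run:ai V2 CLI.
# RUNAI and submit_fractional_workloads are hypothetical names; set RUNAI=echo
# for a dry run that only prints the commands.
set -euo pipefail

RUNAI="${RUNAI:-runai}"
IMAGE="runai.jfrog.io/demo/quickstart"
PORTION="0.5"

submit_fractional_workloads() {
  # Select the project once, then submit one training workload per name.
  "$RUNAI" project set team-a
  local name
  for name in "$@"; do
    "$RUNAI" training submit "$name" -i "$IMAGE" --gpu-portion-request "$PORTION"
  done
}
```

For example, `submit_fractional_workloads frac05 frac05-2` runs the same three commands as above, and `RUNAI=echo submit_fractional_workloads frac05 frac05-2` only prints them.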

CLI V1 (Deprecated):

runai config project team-a
runai submit frac05 -i runai.jfrog.io/demo/quickstart -g 0.5
runai submit frac05-2 -i runai.jfrog.io/demo/quickstart -g 0.5

User Interface:

1. In the Run:ai UI, select Workloads.
2. Select New Workload and then Training.
3. Cluster, Project, and a start-from-scratch Template should already be selected. Enter frac05 as the name and press CONTINUE.
4. Select NEW ENVIRONMENT. Enter quickstart as the name and runai.jfrog.io/demo/quickstart as the image. Then select CREATE ENVIRONMENT.
5. When the previous screen comes up, select half-gpu under Compute resource.
6. Select CREATE TRAINING.
7. Repeat the process to submit a second workload called frac05-2.

Note

For more information on submitting Workloads and creating Assets via the user interface, see the Workload documentation.

API:

curl -L 'https://<COMPANY-URL>/api/v1/workloads/trainings' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <TOKEN>' \
  -d '{
    "name": "frac05",
    "projectId": "<PROJECT-ID>",
    "clusterId": "<CLUSTER-UUID>",
    "spec": {
      "image": "runai.jfrog.io/demo/quickstart",
      "compute": {
        "gpuRequestType": "portion",
        "gpuPortionRequest": 0.5
      }
    }
  }'

- <COMPANY-URL> is the link to the Run:ai user interface, for example acme.run.ai.
- <TOKEN> is an API access token. See above for how to obtain a valid token.
- <PROJECT-ID> is the ID of the team-a Project. You can get the Project ID via the Get Projects API.
- <CLUSTER-UUID> is the unique identifier of the Cluster. You can get the Cluster UUID by adding the "Cluster ID" column to the Clusters view.

Note

The above API snippet only works with Run:ai clusters of version 2.18 and above. For older clusters, use the now-deprecated Cluster API.
For more information on the Training Submit API, see the API Documentation.
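Hand-edited JSON inside a quoted curl argument is easy to break, so it can help to generate the request body. The following is a minimal sketch using only `printf`; `training_payload` is a hypothetical helper, its fields are exactly those of the curl example above, and it assumes the arguments contain no characters that need JSON escaping (such as double quotes).

```shell
# Sketch: build the JSON body for POST /api/v1/workloads/trainings.
# training_payload is a hypothetical helper; the fields mirror the curl example.
# Arguments: name, project ID, cluster UUID, GPU portion (for example 0.5).
# Values are inserted verbatim, so they must not contain double quotes.
training_payload() {
  local name="$1" project_id="$2" cluster_id="$3" portion="$4"
  printf '{"name": "%s", "projectId": "%s", "clusterId": "%s", ' \
    "$name" "$project_id" "$cluster_id"
  printf '"spec": {"image": "runai.jfrog.io/demo/quickstart", '
  printf '"compute": {"gpuRequestType": "portion", "gpuPortionRequest": %s}}}\n' \
    "$portion"
}
```

The output can replace the inline body in the curl call, for example `-d "$(training_payload frac05 '<PROJECT-ID>' '<CLUSTER-UUID>' 0.5)"`.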

The Workloads are based on the sample Docker image runai.jfrog.io/demo/quickstart. The image contains a startup script that runs a deep learning, TensorFlow-based workload.
We named the Workloads frac05 and frac05-2, respectively.
The Workloads are assigned to team-a with an allocation of half a GPU each.
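Two requests of 0.5 add up to exactly one GPU, which is why both Workloads can share a single device. The following is a minimal sketch of that arithmetic using awk; `fits_on_one_gpu` is a hypothetical helper, and it only sums the requests you pass, so it knows nothing about other workloads already running on the GPU.

```shell
# Sketch: check whether a set of GPU portion requests fits on a single GPU.
# fits_on_one_gpu is a hypothetical helper; it prints the total and returns 0
# if the sum is at most 1.0, otherwise 1. Other workloads are not considered.
fits_on_one_gpu() {
  awk 'BEGIN {
    total = 0
    for (i = 1; i < ARGC; i++) total += ARGV[i]
    printf "total=%.2f\n", total
    if (total <= 1.0 + 1e-9) exit 0
    exit 1
  }' "$@"
}
```

For example, `fits_on_one_gpu 0.5 0.5` prints `total=1.00` and succeeds, while adding a third 0.5 request makes it fail.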

List Workloads¶

Follow up on the Workloads' progress by running:

CLI V2:

runai training list

The result:

Workload    Type        Status      Project     Preemptible      Running/Requested Pods     GPU Allocation
────────────────────────────────────────────────────────────────────────────────────────────────────────
frac05      Training    Running     team-a      Yes              0/1                        0.00
frac05-2    Training    Running     team-a      Yes              0/1                        0.00

CLI V1 (Deprecated):

runai list jobs

The result:

Showing jobs for project team-a
NAME      STATUS   AGE  NODE                  IMAGE                           TYPE   PROJECT  USER   GPUs Allocated (Requested)  PODs Running (Pending)  SERVICE URL(S)
frac05    Running  9s   runai-cluster-worker  runai.jfrog.io/demo/quickstart  Train  team-a   yaron  0.50 (0.50)                 1 (0)
frac05-2  Running  8s   runai-cluster-worker  runai.jfrog.io/demo/quickstart  Train  team-a   yaron  0.50 (0.50)                 1 (0)
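To check the total allocation from a script, the V1 table can be filtered with awk. The following is a minimal sketch that assumes the column layout of the sample output above, with the allocated GPUs in the ninth whitespace-separated field; `sum_allocated_gpus` is a hypothetical helper, and other CLI versions may format the table differently.

```shell
# Sketch: sum the "GPUs Allocated" column of `runai list jobs` (CLI V1) output.
# sum_allocated_gpus is a hypothetical helper. It assumes the layout of the
# sample above (allocated GPUs in field 9) and skips the two header lines.
sum_allocated_gpus() {
  awk '$1 != "NAME" && $1 != "Showing" && NF >= 9 { total += $9 }
       END { printf "%.2f\n", total }'
}
```

Usage: `runai list jobs | sum_allocated_gpus`. For the sample output above, the two half-GPU Workloads add up to 1.00.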

User Interface:

Open the Run:ai user interface. Under Workloads, you can view the two new Training Workloads.

View Partial GPU memory¶

To verify that the Workload sees only part of the GPU memory, run:

CLI V2:

runai training exec frac05 nvidia-smi

CLI V1 (Deprecated):

runai exec frac05 nvidia-smi

The result:

Notes:

The total memory is circled in red. It should be 50% of the GPU's memory size. In the picture above, the total is 8GB, which is half of the 16GB of a Tesla V100 GPU.
The script running in the container is limited to 8GB. TensorFlow, which tends to allocate almost all available GPU memory, has allocated 7.7GB (rather than close to 16GB). Allocating beyond 8GB will lead to an out-of-memory exception.
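The 8GB figure is the device's total memory multiplied by the requested portion. The following is a minimal sketch of that calculation; `expected_memory_mib` is a hypothetical helper, and it treats the V100's 16GB as a nominal 16384 MiB, whereas nvidia-smi may report a slightly smaller usable total.

```shell
# Sketch: GPU memory a workload should see for a given portion request.
# expected_memory_mib is a hypothetical helper. Arguments: the device's total
# memory in MiB and the requested portion; the result is truncated to whole MiB.
expected_memory_mib() {
  awk -v total="$1" -v portion="$2" 'BEGIN { printf "%d\n", total * portion }'
}
```

For a nominal 16GB device, `expected_memory_mib 16384 0.5` gives 8192 MiB (8GB), so TensorFlow's 7.7GB allocation stays within the limit.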

Use Exact GPU Memory¶

Instead of requesting a fraction of the GPU, you can request a specific amount of GPU memory. For example:

CLI V2:

runai training submit -i runai.jfrog.io/demo/quickstart --gpu-memory-request 5G

CLI V1 (Deprecated):

runai submit -i runai.jfrog.io/demo/quickstart --gpu-memory 5G

User Interface: As part of the Workload submission, create a new Compute Resource with 1 GPU device and 5GB of GPU memory per device. See the picture below:

Each of these options provides the Workload with 5GB of GPU memory.
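To relate an exact memory request to the fraction-based requests earlier in this Quickstart, divide the request by the device's total memory. The following is a minimal sketch assuming a 16GB device such as the Tesla V100 mentioned above; `memory_request_summary` is a hypothetical helper, and the "max per GPU" figure considers memory alone, ignoring other workloads and any overhead the scheduler may reserve.

```shell
# Sketch: relate an exact GPU memory request to a device's total memory.
# memory_request_summary is a hypothetical helper; both arguments are in GB.
# max_per_gpu counts memory only; other workloads and overheads are ignored.
memory_request_summary() {
  awk -v request="$1" -v total="$2" 'BEGIN {
    printf "portion=%.4f max_per_gpu=%d\n", request / total, int(total / request)
  }'
}
```

For example, `memory_request_summary 5 16` prints `portion=0.3125 max_per_gpu=3`: a 5GB request is roughly 0.31 of a 16GB GPU, and at most three such requests fit in its memory.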
