Quickstart: Launch Interactive Build Workloads
Introduction
Deep learning workloads can be divided into two generic types:
- Interactive "build" sessions. With these types of workloads, the data scientist opens an interactive session via bash, a Jupyter Notebook, remote PyCharm, or similar, and accesses GPU resources directly.
- Unattended "training" sessions. With these types of workloads, the data scientist prepares a self-running workload and sends it for execution. During the execution, the data scientist can examine the results.
With this Quickstart you will learn how to:
- Use the Run:ai command-line interface (CLI) to start a deep learning Build workload
- Open an ssh session to the Build workload
- Stop the Build workload
It is also possible to open ports to specific services within the container. See "Next Steps" at the end of this article.
Prerequisites
To complete this Quickstart you must have:
- Run:ai software installed on your Kubernetes cluster. See: Installing Run:ai on a Kubernetes Cluster
- Run:ai CLI installed on your machine. See: Installing the Run:ai Command-Line Interface
Step by Step Quickstart
Setup
- Login to the Projects area of the Run:ai user interface.
- Add a Project named "team-a".
- Allocate 2 GPUs to the Project.
Run Workload
At the command-line run:
runai config project team-a
runai submit build1 -i ubuntu -g 1 --interactive -- sleep infinity
- The job is based on a sample docker image, ubuntu.
- We named the job build1.
- Note the --interactive flag, which means the job will not have a start or end. It is the Researcher's responsibility to close the job.
- The job is assigned to team-a with an allocation of a single GPU.
- The command provided is sleep infinity. You must provide a command, or the container will start and then exit immediately. Alternatively, replace these flags with --attach to attach immediately to the session (see the example below).
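For example, here is a minimal variant of the submit command above that attaches to the session immediately instead of running sleep infinity; the job name build1-attach is used purely for illustration:
runai submit build1-attach -i ubuntu -g 1 --interactive --attach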
Follow up on the job's status by running:
runai list jobs
The result lists the job together with its current status. Typical statuses you may see:
- ContainerCreating - The Docker image is being downloaded from the cloud repository
- Pending - The job is waiting to be scheduled
- Running - The job is running
A full list of Job statuses can be found here.
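If you work with more than one Project, you can scope the list to team-a. This assumes the CLI's -p / --project flag, which may differ between CLI versions:
runai list jobs -p team-a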
To get additional status on your job, run:
runai describe job build1
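Once the container is running a real command (rather than sleep infinity), you can also follow its output with the CLI's logs command, for example:
runai logs build1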
Get a Shell to the container
Run:
runai bash build1
This should provide a direct shell into the container.
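As a quick sanity check inside the shell, you can verify that the allocated GPU is visible. This assumes the NVIDIA driver utilities are mounted into the container, which is typical on GPU nodes:
nvidia-smi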
View status on the Run:ai User Interface
- Open the Run:ai user interface.
- Under "Jobs" you can view the new Workload.
Stop Workload
Run the following:
runai delete job build1
This stops the build workload. You can verify this by running runai list jobs again.
Next Steps
- Expose internal ports to your interactive build workload: Quickstart Launch an Interactive Build Workload with Connected Ports.
- Follow the Quickstart document: Launch Unattended Training Workloads.