Environment Variables inside a Run:ai Workload¶
Identifying a Job¶
There may be use cases where your container may need to uniquely identify the Job it is currently running in. A typical use case is for saving Job artifacts under a unique name. Run:ai provides pre-defined environment variables you can use. These variables are guaranteed to be unique even if the Job is preempted or evicted and then runs again.
Run:ai provides the following environment variables:
JOB_NAME- the name of the Job.
JOB_UUID- a unique identifier for the Job.
Note that the Job can be deleted and then recreated with the same name. A Job UUID will be different even if the Job names are the same.
Identifying a Pod¶
With Hyperparameter Optimization, experiments are run as Pods within the Job. Run:ai provides the following environment variables to identify the Pod.
POD_INDEX- An index number (0, 1, 2, 3....) for a specific Pod within the Job. This is useful for Hyperparameter Optimization to allow easy mapping to individual experiments. The Pod index will remain the same if restarted (due to a failure or preemption). Therefore, it can be used by the Researcher to identify experiments.
POD_UUID- a unique identifier for the Pod. if the Pod is restarted, the Pod UUID will change.
Run:ai provides an environment variable, visible inside the container, to help identify the number of GPUs allocated for the container. Use
There may be use cases where your container may need to identify the node it is currently running on. Run:ai provides an environment variable, visible inside the container, to help identify the name of the node on which the pod was scheduled. Use
Usage Example in Python¶