Best Practice: Identifying your Job from within the Container
Motivation
Your container may sometimes need to uniquely identify the Job it is currently running in. A typical use case is saving Job artifacts under a unique name.
Run:AI provides environment variables for this purpose. Their values are guaranteed to be unique even if the Job is preempted or evicted and then runs again.
Identifying a Job
Run:AI provides the following environment variables:
- JOB_NAME - the name of the Job.
- JOB_UUID - a unique identifier for the Job.
Note that a Job can be deleted and then recreated with the same name. In that case the Job UUID will be different, even though the Job name is the same.
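As a minimal sketch of the artifact-naming use case, the two variables above can be combined into a directory name that stays distinct even when a Job name is reused. The helper name `artifact_dir`, the `base` directory, and the fallback values used when the variables are missing (for example, when running outside a Job) are assumptions for illustration, not part of Run:AI.

```python
import os
from pathlib import Path


def artifact_dir(base, env=None):
    """Return a per-Job artifact directory: <base>/<JOB_NAME>-<JOB_UUID>."""
    # Accept an explicit mapping so the helper can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    # JOB_NAME and JOB_UUID are the variables described above. The fallback
    # values are an assumption for runs outside a Job.
    job_name = env.get("JOB_NAME", "local")
    job_uuid = env.get("JOB_UUID", "no-uuid")
    return Path(base) / f"{job_name}-{job_uuid}"
```

Including the UUID in the path, rather than the name alone, means a recreated Job with the same name does not overwrite the artifacts of the earlier one.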
Identifying a Pod
With Hyperparameter Optimization, experiments are run as Pods within the Job. Run:AI provides the following environment variables to identify the Pod.
- POD_INDEX - an index number (0, 1, 2, 3, ...) for a specific Pod within the Job. This is useful for Hyperparameter Optimization because it maps easily to individual experiments. The Pod index remains the same if the Pod is restarted (due to a failure or preemption), so the Researcher can use it to identify experiments.
- POD_UUID - a unique identifier for the Pod. If the Pod is restarted, the Pod UUID changes.
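Because the Pod index is stable across restarts, it can serve as a lookup key into a list of experiment settings. The following is a sketch under stated assumptions: the search grid, the parameter names, and the helper `experiment_params` are hypothetical, and the default of 0 when `POD_INDEX` is missing is a choice made here for local runs.

```python
import itertools
import os

# Hypothetical search space: every (learning_rate, batch_size) combination,
# in a fixed order so that a given index always maps to the same experiment.
GRID = list(itertools.product([0.1, 0.01], [32, 64]))


def experiment_params(env=None):
    """Map this Pod's index to one combination of hyperparameters."""
    env = os.environ if env is None else env
    # POD_INDEX is the variable described above; defaulting to 0 when it is
    # absent is an assumption for runs outside a Job.
    pod_index = int(env.get("POD_INDEX", "0"))
    # Raises IndexError if there are more Pods than grid entries.
    learning_rate, batch_size = GRID[pod_index]
    return {"learning_rate": learning_rate, "batch_size": batch_size}
```

A restarted Pod keeps its index and therefore resumes the same experiment, whereas keying on `POD_UUID` would not give that property since the UUID changes on restart.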
Usage Example in Python
import os

# Read the Job identifiers from the container's environment variables
job_name = os.environ['JOB_NAME']
job_uuid = os.environ['JOB_UUID']
Last update: January 3, 2021