Connecting to TensorBoard¶
Once you launch a Deep Learning workload using Run:ai, you may want to view its progress. A popular tool for viewing progress is TensorBoard.
The document below explains how to use TensorBoard to view the progress of a Run:ai Job.
Submit a Workload¶
The code shows:
- A reference to a log directory:
- A registered Keras callback for TensorBoard:
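The two items above can be sketched as follows. This is a minimal sketch assuming TensorFlow 2.x with Keras; the `logs/fit` base path, the timestamped run name, and the helper names `make_log_dir` and `make_tensorboard_callback` are illustrative assumptions and are not taken from the original training script.

```python
import datetime
import os


def make_log_dir(base="logs/fit", now=None):
    """Return a per-run log directory path such as logs/fit/20240101-120000.

    The base path and timestamp format are illustrative assumptions.
    """
    now = now or datetime.datetime.now()
    return os.path.join(base, now.strftime("%Y%m%d-%H%M%S"))


def make_tensorboard_callback(log_dir):
    """Build a Keras TensorBoard callback that writes event files to log_dir."""
    # Imported lazily so the path helper above works without TensorFlow installed.
    import tensorflow as tf

    return tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)


# Typical use inside a training script (model and data are assumed to exist):
#   log_dir = make_log_dir()
#   model.fit(x_train, y_train, epochs=5,
#             callbacks=[make_tensorboard_callback(log_dir)])
```

Because the path is relative, the event files land under whatever working directory the Job runs in, which is why the working directory should sit on shared storage.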
The logs directory must be saved on a Network File Server so that the TensorBoard Job can access it. For example, by running the Job as follows:
Note the volume flag (-v) and working directory flag (--working-dir). The logs directory will be created on
Submit a TensorBoard Workload¶
There are two ways to submit a TensorBoard Workload: via the command-line interface or via the user interface.
Submit via the User interface¶
- Within the user interface, go to the Job list.
- Select New Job on the top right.
- Select Interactive at the top.
- Add an image that supports TensorBoard. For example:
- Select the
- Add a mounted volume on which TensorBoard logs exist. The example above uses /mnt/nfs_share/john. Map to TensorBoard Logs Directory.
- Submit the Job. When running, select the job and press Connect on the top right.
Submit via the Command-line interface¶
Run the following:
The terminal will show the following:
The job 'tb' has been submitted successfully
You can run `runai describe job tb -p team-a` to check the job status
INFO Waiting for job to start
Waiting for job to start
INFO Job started
Open access point(s) to service from localhost:8888
Forwarding from 127.0.0.1:8888 -> 8888
Forwarding from [::1]:8888 -> 8888
Browse to http://localhost:8888/ to view TensorBoard.
A single TensorBoard Job can be used to view multiple deep learning Jobs, provided it has access to the logs directory for these Jobs.
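One way to check that a shared logs directory is visible to the TensorBoard Job is to list the subdirectories that contain event files. The sketch below assumes each training Job writes into its own subdirectory of a common root and relies on TensorBoard's `events.out.tfevents.` file name prefix; `find_runs` is a hypothetical helper name.

```python
import os


def find_runs(log_root):
    """Return sorted directories (relative to log_root) that hold event files."""
    runs = set()
    for dirpath, _dirnames, filenames in os.walk(log_root):
        # TensorBoard event files are named events.out.tfevents.<suffix>
        if any(name.startswith("events.out.tfevents.") for name in filenames):
            runs.add(os.path.relpath(dirpath, log_root))
    return sorted(runs)
```

Calling `find_runs` on the logs root from inside the TensorBoard Job should list at least one entry per training run. An empty list suggests the volume is not mounted where expected or that no Job has written logs yet.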
You can also submit a TensorBoard Job via the user interface. In that case, instead of portforward, you will need to select a different service type. If the URL to the TensorBoard job includes a path, you may need to use the TensorBoard flag --path_prefix. For example, if your access point is acme.com/tensorboard1 add
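To illustrate how a path prefix relates to an access point, the sketch below derives a `--path_prefix` value from a URL and assembles a TensorBoard command line. It assumes the prefix is simply the path component of the URL; `tensorboard_args` is a hypothetical helper, and the `logs/fit` directory and port 8888 are placeholder values.

```python
from urllib.parse import urlparse


def tensorboard_args(access_url, logdir="logs/fit", port=8888):
    """Build a tensorboard argument list, adding --path_prefix when the URL has a path."""
    # urlparse needs a scheme to split host and path; assume https if none is given.
    if "://" not in access_url:
        access_url = "https://" + access_url
    prefix = urlparse(access_url).path.rstrip("/")
    args = ["tensorboard", "--logdir", logdir, "--port", str(port), "--host", "0.0.0.0"]
    if prefix:
        # Only add the flag when the access point is served under a sub-path.
        args += ["--path_prefix", prefix]
    return args
```

The returned list can be joined with spaces and used as the command of the TensorBoard Job, or passed to `subprocess.run` when testing locally.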