# Inference Setup
> **Warning:** The Inference API is deprecated. See the Cluster API for its replacement.
Inference jobs are an integral part of Run:ai and do not require setup as such. However, running multiple production-grade processes on a single GPU is best done with an NVIDIA technology called Multi-Process Service (MPS).
By default, MPS is not enabled on GPU nodes.
## Enable MPS
To enable the MPS server on all nodes, you must edit the cluster installation values file:
- When installing the Run:ai cluster, edit the values file.
- On an existing installation, use the upgrade cluster instructions to modify the values file.
Use:
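A minimal sketch of the relevant values-file fragment, assuming the cluster chart exposes an `mps.enabled` flag (the exact key path is an assumption and may differ between chart versions):

```yaml
# Hypothetical values-file fragment: enables the Run:ai MPS server
# on all GPU nodes. Key names may differ in your chart version.
mps:
  enabled: true
```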
Wait for the MPS server to start running:
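One way to watch for the server pods, assuming they are deployed into the `runai` namespace with an `app=runai-mps-server` label (both names are assumptions; adjust them to your installation):

```shell
# Watch the MPS server pods until they reach the Running state.
# Namespace and label selector are assumptions.
kubectl get pods -n runai -l app=runai-mps-server -w
```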
When the MPS server pod is running, restart the nvidia-device-plugin pods:
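Deleting the device-plugin pods is enough, since their DaemonSet recreates them immediately. The namespace and label below assume a typical NVIDIA GPU Operator deployment:

```shell
# Restart the NVIDIA device plugin by deleting its pods;
# the DaemonSet recreates them. Namespace and label are assumptions.
kubectl delete pods -n gpu-operator -l app=nvidia-device-plugin-daemonset
```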
To enable the MPS server on selected nodes, please contact Run:ai customer support.
## Verify MPS is Enabled
Run:
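For example, assuming (as above) that the MPS server pods live in the `runai` namespace and carry an `app=runai-mps-server` label:

```shell
# List the MPS server pods together with the nodes they run on.
# Namespace and label selector are assumptions; adjust to your installation.
kubectl get pods -n runai -l app=runai-mps-server -o wide
```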
- Verify that all mps-server pods are in `Running` state.
- Submit a workload with MPS enabled using the `--mps` flag, then list the running pods again (see the first sketch after this list).
- Identify the node on which the workload is running. In the `get pods` output, find the mps-server pod running on the same node and inspect its log (see the second sketch after this list). You should see activity in the log.
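A sketch of the second step; the job name and image are placeholders, and any flag other than `--mps` (which the text above confirms) is an assumption about the runai CLI:

```shell
# Submit a workload with MPS enabled (name and image are placeholders),
# then list all pods together with the nodes they are scheduled on.
runai submit my-inference-job -i <image> --mps
kubectl get pods -o wide
```

And a sketch of the third step, assuming the mps-server pod found above is called `runai-mps-server-xxxxx` (a placeholder) and runs in the `runai` namespace:

```shell
# Follow the log of the mps-server pod that shares a node with the workload.
# Pod name and namespace are assumptions.
kubectl logs -n runai runai-mps-server-xxxxx -f
```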