Inference API is deprecated. See Cluster API for its replacement.
Inference Jobs are an integral part of Run:ai and do not require setting up per se. However, running multiple production-grade processes on a single GPU is best done with an NVIDIA technology called Multi-Process Service (MPS).
By default, MPS is not enabled on GPU nodes.
To enable the MPS server on all nodes, you must edit the cluster installation values file:
- When installing the Run:ai cluster, edit the values file.
- On an existing installation, use the upgrade cluster instructions to modify the values file.
Wait for the MPS server to start running:
When the MPS server pod has started to run, restart the
To enable the MPS server on selected nodes, please contact Run:ai customer support.
Verify MPS is Enabled
Verify that all mps-server pods are in the Running state.
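One way to make this check explicit is to filter the pod listing for mps-server pods whose status is not Running. The following is a minimal sketch, not part of the Run:ai tooling: the function name `not_running_mps_pods` is hypothetical, and the `runai` namespace in the usage comment is an assumption that may differ in your cluster.

```shell
# Hypothetical helper: list mps-server pods that are not yet Running.
# Reads the default `kubectl get pods` table on stdin
# (columns: NAME READY STATUS ...), skipping the header row.
not_running_mps_pods() {
  awk 'NR > 1 && $1 ~ /mps-server/ && $3 != "Running" { print $1 }'
}

# Usage against a live cluster (namespace "runai" is an assumption):
#   kubectl get pods -n runai | not_running_mps_pods
```

An empty result means every mps-server pod in the listing reports Running.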
Submit a workload with MPS enabled using the --mps flag. Then run:
- Identify the node on which the workload is running. In the get pods command above, find the pod running on the same node and then run:
You should see activity in the log.
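The pod lookup described above can also be scripted. This is a sketch under stated assumptions: the helper name `mps_pod_on_node` is hypothetical, and the namespace `runai` and node name `node-1` in the usage comment are placeholders to replace with your own values.

```shell
# Hypothetical helper: print each mps-server pod scheduled on a given node.
# Reads two-column "NAME NODE" lines on stdin; $1 is the node name to match.
mps_pod_on_node() {
  awk -v node="$1" '$1 ~ /mps-server/ && $2 == node { print $1 }'
}

# Usage against a live cluster (namespace and node name are placeholders):
#   kubectl get pods -n runai --no-headers \
#     -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName \
#     | mps_pod_on_node node-1
# Then inspect the log of the pod it prints:
#   kubectl logs -n runai <pod-name>
```

Using `custom-columns` keeps the output to exactly two fields, which avoids depending on the column layout of the default table.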