Submit an inference Workload via YAML¶
The Inference API is deprecated. See the Cluster API for its replacement.
<WORKLOAD-NAME>. The name of the Workload. The name must comply with Kubernetes naming conventions for DNS Label names. With fractional workloads, the name is limited to 18 characters.
<IMAGE-NAME>. The name of the Docker image to use.
<USER-NAME>. The name of the user submitting the Workload. The name is used for display purposes only when Run:ai is installed in unauthenticated mode.
<REQUESTED-GPUs>. An integer number of GPUs to allocate for the Workload. Examples: 1, 2.
<NAMESPACE>. The name of the Project's namespace. This is typically the Project name prefixed with `runai-`.
Submit Inference Workloads Allocating Full GPUs¶
Copy the following into a file while substituting the parameters:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: <WORKLOAD-NAME>
  namespace: <NAMESPACE>
spec:
  replicas: 1
  selector:
    matchLabels:
      app: <WORKLOAD-NAME>
  template:
    metadata:
      labels:
        app: <WORKLOAD-NAME>
      annotations:
        user: <USER-NAME>
    spec:
      schedulerName: runai-scheduler
      containers:
        - image: <IMAGE-NAME>
          name: <WORKLOAD-NAME>
          ports:
            - containerPort: <TARGET-PORT>
          resources:
            limits:
              nvidia.com/gpu: <REQUESTED-GPUs>
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: <WORKLOAD-NAME>
  name: <WORKLOAD-NAME>
spec:
  type: NodePort
  ports:
    - port: <TARGET-PORT>
      targetPort: <TARGET-PORT>
  selector:
    app: <WORKLOAD-NAME>
```
This example also creates a Service, which is used to connect to the inference server. The Service is not mandatory, but most inference cases will need one.
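Because the Service is of type NodePort, Kubernetes assigns it a port on every node. As a sketch (the Workload name, namespace, and endpoint path here are hypothetical placeholders), you could look up the assigned port and reach the server through any node's IP:

```shell
# Find the node port Kubernetes assigned to the Service
# (assumes a Workload named "my-inference" in namespace "runai-team-a"):
kubectl get service my-inference -n runai-team-a \
  -o jsonpath='{.spec.ports[0].nodePort}'

# Then send a request to the inference server through any node's IP on that port:
curl http://<NODE-IP>:<NODE-PORT>/
```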
To submit the Workload, run:
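A minimal sketch, assuming the manifest above was saved to a file named `inference.yaml` (a hypothetical file name):

```shell
# Create the Deployment and Service defined in the manifest:
kubectl apply -f inference.yaml
```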
Submit Inference Workloads Allocating Fractions of a GPU¶
<REQUESTED-GPUs> with a fraction in quotes, e.g. "0.5".
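As a sketch of the fractional variant, assuming your cluster's Run:ai version supports the `gpu-fraction` pod annotation, the pod template's annotations would carry the fraction instead of a whole-GPU resource limit:

```yaml
# Sketch only — assumes the gpu-fraction annotation is supported by your Run:ai version.
template:
  metadata:
    labels:
      app: <WORKLOAD-NAME>
    annotations:
      user: <USER-NAME>
      gpu-fraction: "0.5"   # fraction of a GPU, in quotes
```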
Workloads with NVIDIA MPS require a change in the above YAML.
To use MPS, your administrator must first enable it. See the setup document.
To delete a Run:ai Inference workload, delete the Workload:
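Assuming the same hypothetical `inference.yaml` file used at submission, deleting via the manifest removes both the Deployment and the Service:

```shell
# Delete everything defined in the manifest:
kubectl delete -f inference.yaml
```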