Submit an inference Workload via YAML¶
Warning
The Inference API is deprecated. See the Cluster API for its replacement.
Parameters:
<WORKLOAD-NAME>. The name of the Workload. The name must comply with Kubernetes naming conventions for DNS Label names. With fractional workloads, the name is limited to 18 characters.
<IMAGE-NAME>. The name of the Docker image to use. Example: gcr.io/run-ai-demo/quickstart-inference-marian
<USER-NAME>. The name of the user submitting the Workload. The name is used for display purposes only when Run:ai is installed in an unauthenticated mode.
<REQUESTED-GPUs>. An integer number of GPUs you request to be allocated for the Workload. Examples: 1, 2
<TARGET-PORT>. The port on which the inference server in the container listens. The YAML below uses it for both the container port and the Service.
<NAMESPACE>. The name of the Project's namespace. This is usually runai-<PROJECT-NAME>
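The naming rule for <WORKLOAD-NAME> can be checked before submitting. The sketch below assumes the standard Kubernetes definition of a DNS label (RFC 1123: at most 63 characters, lowercase letters, digits or '-', starting and ending with a letter or digit). `is_dns_label` is a hypothetical helper, not part of Run:ai or kubectl. Its optional second argument lowers the length limit, for example to 18 for fractional workloads.

```shell
# Hypothetical helper: succeeds (exit status 0) when $1 is a valid DNS label
# no longer than $2 characters (default 63, the Kubernetes limit).
is_dns_label() {
    name=$1
    max=${2:-63}
    [ "${#name}" -le "$max" ] || return 1
    # Lowercase letters, digits or '-'; must start and end with a letter or digit.
    printf '%s\n' "$name" | grep -Eq '^[a-z0-9]([-a-z0-9]*[a-z0-9])?$'
}

is_dns_label "inference-marian" && echo "ok"
is_dns_label "Inference_Marian" || echo "rejected"
# Stricter limit for fractional workloads: this name is longer than 18 characters.
is_dns_label "quickstart-inference-marian" 18 || echo "too long for a fractional workload"
```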
Submit Inference Workloads Allocating Full GPUs¶
Copy the following into a file while substituting the parameters:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: <WORKLOAD-NAME>
  namespace: <NAMESPACE>
spec:
  replicas: 1
  selector:
    matchLabels:
      app: <WORKLOAD-NAME>
  template:
    metadata:
      labels:
        app: <WORKLOAD-NAME>
      annotations:
        user: <USER-NAME>
    spec:
      schedulerName: runai-scheduler
      containers:
        - image: <IMAGE-NAME>
          name: <WORKLOAD-NAME>
          ports:
            - containerPort: <TARGET-PORT>
          resources:
            limits:
              nvidia.com/gpu: <REQUESTED-GPUs>
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: <WORKLOAD-NAME>
  name: <WORKLOAD-NAME>
spec:
  type: NodePort
  ports:
    - port: <TARGET-PORT>
      targetPort: <TARGET-PORT>
  selector:
    app: <WORKLOAD-NAME>
Note
This example also creates a Service, which is used to connect to the inference server. The Service is optional, but most inference use cases need one.
To submit the Workload, run:
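A minimal sketch of the submission step, assuming the substituted YAML was saved as `inference.yaml` (a hypothetical file name) and that `kubectl` is configured for the target cluster. The guard is only there so the snippet fails softly when either is missing.

```shell
# Hypothetical file name; use whichever file holds the substituted YAML.
MANIFEST="inference.yaml"

if command -v kubectl >/dev/null 2>&1 && [ -f "$MANIFEST" ]; then
    # Creates (or updates) every object defined in the file:
    # here, the Deployment and the Service.
    kubectl apply -f "$MANIFEST"
else
    echo "kubectl or $MANIFEST not found; nothing submitted" >&2
fi
```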
Submit Inference Workloads Allocating Fractions of a GPU¶
Replace <REQUESTED-GPUs> with a fraction in quotes, for example "0.5".
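The substitution can be sketched as follows. The value 0.5 (half a GPU) is only an illustrative choice, and the one-line `fragment` stands in for the resource limit line of the full-GPU YAML.

```shell
# Stand-in for the GPU limit line of the manifest.
fragment='nvidia.com/gpu: <REQUESTED-GPUs>'

# Substitute a quoted fraction for the placeholder; the quotes are part of
# the value so that YAML reads it as a string rather than a number.
rendered=$(printf '%s\n' "$fragment" | sed 's|<REQUESTED-GPUs>|"0.5"|')

echo "$rendered"
# prints: nvidia.com/gpu: "0.5"
```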
NVIDIA MPS¶
Workloads with NVIDIA MPS require a change in the above YAML.
Important
To use MPS, your administrator must first enable it. See the setup document.
Delete Workloads¶
To delete a Run:ai inference Workload, delete its Deployment (and the Service, if you created one):
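One way to do this, sketched under the assumption that the Workload was submitted from a file named `inference.yaml` (a hypothetical file name), is to delete by file, which removes every object the file defines.

```shell
# Hypothetical file name: the same file that was used to submit the Workload.
MANIFEST="inference.yaml"

if command -v kubectl >/dev/null 2>&1 && [ -f "$MANIFEST" ]; then
    # Deletes every object defined in the file (here, the Deployment and the Service).
    kubectl delete -f "$MANIFEST"
else
    echo "kubectl or $MANIFEST not found; nothing deleted" >&2
fi
```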
Last update: May 10, 2022