Runai distributed submit
runai distributed submit¶
submit distributed
Examples¶
runai distributed submit <distributed_name> -p=<project_name> -i=runai.jfrog.io/demo/quickstart -f XGBoost/PyTorch/TF/MPI
Options¶
--allow-privilege-escalation Allow the job to gain additional privileges after starting
--annotation stringArray Set of annotations to populate into the container running the workspace
--auto-deletion-time-after-completion duration The length of time (like 5s, 2m, or 3h, higher than zero) after which a completed job is automatically deleted (default 0s)
--backoff-limit int The number of times the job will be retried before failing
--capability stringArray The POSIX capabilities to add when running containers. Defaults to the default set of capabilities granted by the container runtime.
-c, --command If true, override the image's entrypoint with the command supplied after '--'
--configmap-map-volume stringArray Mount ConfigMap as a volume. Use the fhe format name=CONFIGMAP_NAME,path=PATH
--cpu-core-limit float CPU core limit (e.g. 0.5, 1)
--cpu-core-request float CPU core request (e.g. 0.5, 1)
--cpu-memory-limit string CPU memory limit to allocate for the job (e.g. 1G, 500M)
--cpu-memory-request string CPU memory to allocate for the job (e.g. 1G, 500M)
--create-home-dir Create a temporary home directory. Defaults to true when --run-as-user is set, false otherwise
-e, --environment stringArray Set environment variables in the container
--existing-pvc stringArray Mount an existing persistent volume. Use the format: claimname=CLAIM_NAME,path=PATH
--extended-resource stringArray Request access to an extended resource. Use the format: resource_name=quantity
--external-url stringArray Expose URL from the job container. Use the format: container=9443,url=https://external.runai.com,authusers=user1,authgroups=group1
-f, --framework string The distributed training framework used in the workload.
--git-sync stringArray Specifies git repositories to mount into the container. Use the format: name=NAME,repository=REPO,path=PATH,secret=SECRET,rev=REVISION
-g, --gpu-devices-request int32 GPU units to allocate for the job (e.g. 1, 2)
--gpu-memory-limit string GPU memory limit to allocate for the job (e.g. 1G, 500M)
--gpu-memory-request string GPU memory to allocate for the job (e.g. 1G, 500M)
--gpu-portion-limit float GPU portion limit, must be no less than the gpu-memory-request (between 0 and 1, e.g. 0.5, 0.2)
--gpu-portion-request float GPU portion request (between 0 and 1, e.g. 0.5, 0.2)
--gpu-request-type string GPU request type (portion|memory|migProfile)
-h, --help help for submit
--host-ipc Whether to enable host IPC. (Default: false)
--host-network Whether to enable host networking. (Default: false)
--host-path stringArray Volumes to mount into the container. Use the format: path=PATH,mount=MOUNT,mount-propagation=None|HostToContainer,readwrite
-i, --image string The image for the workload
--image-pull-policy string Set image pull policy. One of: Always, IfNotPresent, Never. Defaults to Always (default "Always")
--label stringArray Set of labels to populate into the container running the workspace
--large-shm Request large /dev/shm device to mount
--master-args Arguments to pass to the master pod container command. If used together with --master-command, overrides the image's entrypoint of the master pod container with the given command
--master-environment stringArray Set environment variables in the container
--master-extended-resource stringArray Request access to an extended resource. Use the format: resource_name=quantity
--master-gpu-devices-request int32 GPU units to allocate for the job (e.g. 1, 2)
--master-gpu-portion-limit float GPU portion limit, must be no less than the gpu-memory-request (between 0 and 1, e.g. 0.5, 0.2)
--master-gpu-portion-request float GPU portion request (between 0 and 1, e.g. 0.5, 0.2)
--master-no-pvcs Do not mount any persistent volumes in the master pod
--max-replicas int32 Maximum number of replicas for an elastic PyTorch job
--mig-profile string MIG profile to allocate for the job (1g.5gb, 2g.10gb, 3g.20gb, 4g.20gb, 7g.40gb)
--min-replicas int32 Minimum number of replicas for an elastic PyTorch job
--name-prefix string Set defined prefix for the workload name and add index as a suffix
--new-pvc stringArray Mount a persistent volume, create it if it does not exist. Use the format: claimname=CLAIM_NAME,storageclass=STORAGE_CLASS,size=SIZE,path=PATH,accessmode-rwo,accessmode-rom,accessmode-rwm,ro,ephemeral
--nfs stringArray s3 storage details. Use the format: path=PATH,server=SERVER,mountpath=MOUNT_PATH,readwrite
--no-master Do not create a separate pod for the master
--node-pools stringArray List of node pools to use for scheduling the job, ordered by priority
--node-type string Enforce node type affinity by setting a node-type label
--port stringArray Expose ports from the job container. Use the format: service-type=NodePort,container=80,external=8080
--preferred-pod-topology-key string If possible, all pods of this job will be scheduled onto nodes that have a label with this key and identical values
-p, --project string Specify the project to which the command applies. By default, commands apply to the default project. To change the default project use ‘runai config project <project name>’
--required-pod-topology-key string Enforce scheduling pods of this job onto nodes that have a label with this key and identical values
--run-as-group int Run in the context of the current CLI group rather than the root group
--run-as-user int Run in the context of the current CLI user rather than the root user
--s3 stringArray s3 storage details. Use the format: name=NAME,bucket=BUCKET,path=PATH,accesskey=ACCESS_KEY,url=URL
--seccomp-profile string Indicates which kind of seccomp profile will be applied to the container, options: RuntimeDefault|Unconfined|Localhost
--slots-per-worker int32 Number of slots to allocate for each worker
--supplemental-groups string Comma seperated list of groups that the user running the container belongs to, in addition to the group indicated by --run-as-gid
--toleration stringArray Toleration details. Use the format: operator=Equal|Exists,key=KEY,[value=VALUE],[effect=NoSchedule|NoExecute|PreferNoSchedule],[seconds=SECONDS]
--user-group-source string Indicate the way to determine the user and group ids of the container, options: fromTheImage|fromIdpToken|fromIdpToken
--workers int32 the number of workers that will be allocated for running the workload
--working-dir string Set the container's working directory
Options inherited from parent commands¶
--config-file string config file name; can be set by environment variable RUNAI_CLI_CONFIG_FILE (default "config.json")
--config-path string config path; can be set by environment variable RUNAI_CLI_CONFIG_PATH (default "~/.runai/")
-d, --debug enable debug mode
-q, --quiet enable quiet mode, suppress all output except error messages
--verbose enable verbose mode
SEE ALSO¶
- runai distributed - distributed management