Running Spark jobs with Run:AI
Spark has two modes for running jobs on Kubernetes:
- Using a CLI tool called `spark-submit` that submits raw pods.
- Using a Kubernetes operator with CRDs (Custom Resource Definitions).
To run a Spark job on Kubernetes using the CLI:
- Download a pre-built Spark with Hadoop distribution from here.
- Extract the archive, then change into its root directory to submit the jobs.
Ensure that your Kubernetes cluster has a service account with permissions in the namespace that you want to run the jobs in. Use the following commands to launch the Spark demo:
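The original commands were not preserved on this page; the following is a minimal sketch that matches the `spark-demo` namespace and `spark` service account used in the `spark-submit` command below. The `edit` cluster role binding is an assumption:

```bash
# Create a dedicated namespace for the demo
kubectl create namespace spark-demo

# Create the service account Spark authenticates with when creating executor pods
kubectl create serviceaccount spark -n spark-demo

# Assumed: grant the service account edit permissions within the namespace
kubectl create rolebinding spark-role --clusterrole=edit \
  --serviceaccount=spark-demo:spark -n spark-demo
```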
Change the namespace to the one you want to run the jobs in.
Next, build the Spark Docker images and either push them to a public repository or load them into your kind cluster.
To build the images run:
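The build command was lost from this page; a sketch using the `docker-image-tool.sh` script that ships with the Spark distribution (the tag is illustrative and should match your Spark version):

```bash
# Build the Spark image from the root of the Spark distribution
./bin/docker-image-tool.sh -t v3.2.1 build
```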
Then push the docker image to your repository:
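Again a sketch with the same script; `<repo>` is a placeholder for your registry (for kind, use `kind load docker-image` instead of pushing):

```bash
# Push the built image to your repository
./bin/docker-image-tool.sh -r <repo> -t v3.2.1 push
```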
To submit a job:
- Set the value of the API server of the Kubernetes cluster you are working with in the `K8S_SERVER` environment variable. Run `kubectl config view` to find your cluster.
- Copy the value of the `server` field (for example, `https://127.0.0.1:46443`).
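For example (the address is illustrative, taken from the sample above):

```bash
# Point spark-submit at your cluster's API server
export K8S_SERVER=https://127.0.0.1:46443
```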
To run a simple job with the default scheduler, use the following:
```bash
./bin/spark-submit --master k8s://$K8S_SERVER --deploy-mode cluster --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.namespace=spark-demo \
  --conf spark.executor.instances=5 \
  --conf spark.kubernetes.container.image=spark:v3.2.1 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.4.0.jar 10
```
The command first creates a driver pod, which then creates 5 executor (worker) pods that do the actual work of running the job. The executor pods have the driver pod as their Kubernetes owner.
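To watch the driver and executors come up, you can list the pods in the namespace:

```bash
# The driver pod appears first, followed by the executors it creates
kubectl get pods -n spark-demo
```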
Submitting jobs using the Run:AI scheduler

To submit a job with the `runai-scheduler` in project `<project_name>`, add or change these flags:
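The original flag list was lost here; the following is a minimal sketch using Spark's standard Kubernetes configuration keys (`spark.kubernetes.scheduler.name` requires Spark 3.3 or later). The scheduler name, the `project` label, and the `runai-<project_name>` namespace convention are assumptions about the Run:AI setup:

```bash
# Illustrative flags; scheduler name, label, and namespace convention are assumed
--conf spark.kubernetes.scheduler.name=runai-scheduler \
--conf spark.kubernetes.driver.label.project=<project_name> \
--conf spark.kubernetes.executor.label.project=<project_name> \
--conf spark.kubernetes.namespace=runai-<project_name> \
```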
To schedule the executors on GPUs, add the following flags:
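A sketch using Spark's built-in executor resource flags, which map to `nvidia.com/gpu` resource requests on the executor pods (the amount is illustrative):

```bash
--conf spark.executor.resource.gpu.amount=1 \
--conf spark.executor.resource.gpu.vendor=nvidia.com \
```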
To use GPU fractions, add the annotation to the executor pods:
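A sketch using Spark's generic pod-annotation config; the `gpu-fraction` annotation name and the `0.5` value (half a GPU per executor) are assumptions about Run:AI's fractional GPU mechanism:

```bash
# Illustrative: annotate each executor pod with the requested GPU fraction
--conf spark.kubernetes.executor.annotation.gpu-fraction=0.5 \
```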