Skip to content

Email notifications

The notifications service listens to events on the Kubernetes cluster and passes notifications of those events via email. The service can be configured to send the notifications to one or more pre configured email addresses, or to the email address of the user that submitted the workload.

Note

In order to send notifications dynamically to the user who submitted the workload, the user should be logged in to the Run:ai UI or CLI.

The service can also be configured using a regular expression to send notifications only for specific namespaces on the cluster. This enables notifications only for specific Run:ai projects. The default configuration sends notifications for all the namespaces starting with runai-.

Prerequisites

  1. The service should be installed on each cluster used with Run:ai. The installation will be done separately from the Run:ai cluster installation using a new helm chart.
  2. As a part of the installation, the customer should provide their SMTP server address as well as credentials for it.

Available notifications

Configure the notifications service to send events using the relevant kind and event reason. The following Run:ai notifications are available:

Event Kind Reason Description Additional info
Pod scheduled Pod Scheduled a pod was scheduled on a node Pod, Job, Project, Namespace, User
Pod evicted PodGroup Evict a pod was evicted to make room for another pod with higher priority, or to reclaim resources that belong to other project or department Pod, Job, Project, Namespace, User
Pod unschedulable Pod Unschedulable a pod was determined as unschedulable and couldn't be scheduled on any node in the cluster Pod, Job, Project, Namespace, User
Failed scheduling pod Pod FailedScheduling binding a pod to a node failed Pod, Job, Project, Namespace, User

Tip

You can configure the notifications service to send event messages about additional Kubernetes events using the relevant kind and event reason.

Installation

Install the notification service using the following commands:

  1. Set the helm repo to point to the notification service package using the following command:
helm repo add runai-notifications-service https://storage.googleapis.com/runai-notifications-service/

helm repo update
  1. Check for the latest version using the following command
helm repo search runai-notifications-service 
  1. Install the latest version using the following command:
helm install runai-notifications-service/notifications-service --version 0.0.0

Configuration

The notification service is configured using a configmap file. The following is an example of a configmap file. Each of the tables below references a section in the configmap file.

service configuration

This section defines the number of events that will be sent by the service. Use the following table to configure options in the service section of the configmap file.

Component Field Description Default
service service.concurrent_limit maximum number of events the service will handle in parallel 50
service service.cached_events queue size for events before blocking the listener 1000

listener configuration

This section defines the objects and events that the service will listen to and send as notifications. Use the following table to configure options in the listener section of the configmap file.

Component Field Description Default
kubelistener listener.relevant_objects white list of Kubernetes components for notifications relevant_objects:
kind:
Podreasons:UnschedulableScheduled

kind:
PodGroupreasons: - Evict
kubelistener listener.relevant_namespaces white list of namespaces that the service should listen to for events (regex) runai-.*

enrich configuration

Note

This section of the configmap is for internal use only. Keep the default values.

Component Field Default
KubeMetadata enricher.kubeMetadata.lables release: workloadDisplayName

training.kubeflow.org/job-name: workloadDisplayName

serving.knative.dev/service: workloadDisplayName

project: project
KubeMetadata enricher.kubeMetadata.annotations "user": "user"

notify configuration

This section defines the notification configuration of the service and contains the details for the SMTP server and the recipients list. Use the following table to configure options in the notify section of the configmap file.

Component Field Description Default
Email notify.email.smtp_host (M) SMTP server host address Empty
Email notify.email.smtp_port (M) SMTP server port 587
Email notify.email.from_display_name (M) email's "From" display name Run:ai
Email notify.email.from (M) a valid domain source email address test@run.ai
Email notify.email.user (M) SMTP server user login user
Email notify.email.password (M) SMTP server user's password password
Email notify.email.direct_notifications (together with Recipients) when set to true, email notifications will be sent dynamically to the user who submitted the workload false
Email notify.email.recipients (together with Direct Notifications) additional email address recipients list for all the events - broadcast Empty list

(M) = mandatory to include in the configmap file.

Example configmap file

The following file is an example of a configmap file for the notification service.

  service:
    concurrent_limit: 50
    cached_events: 1000
  listener:
    relevant_namespaces:
    - runai.*
    relevant_objects:
    - kind: Pod
      reasons:
        - Unschedulable
        - Scheduled
    - kind: PodGroup
      reasons:
        - Evict
  enrich:
    kubeMetadata:
      labels:
        "release": "workloadDisplayName"
        "training.kubeflow.org/job-name": "workloadDisplayName"
        "serving.knative.dev/service": "workloadDisplayName"
        "project": "project"
      annotations:
        "user": "user"
  notify:
    email:
      template_path: path/email.html  # Internal use only.
      from: [email protected]
      user: smtp_user
      password: smtp_password
      smtp_host: smtp.mail.com
      smtp_port: 587
      from_display_name: Company Name
      direct_notifications: true
      recipients:
      - [email protected]