Monitoring Cluster Health
Introduction¶
This documentation outlines the steps required to set up Alertmanager within the Prometheus Operator ecosystem. It also provides guidance on configuring Prometheus to send alerts to Alertmanager and customizing Alertmanager to trigger alerts based on specific Run.ai conditions.
Prerequisites¶
- A Kubernetes cluster and the permissions necessary to manage its resources.
- The kubectl command-line tool, installed and configured to interact with the cluster.
- Basic knowledge of Kubernetes resources and manifests.
- A running Prometheus Operator.
- A running Run.ai environment.
Validate Prometheus Operator Installed¶
-
Verify that the Prometheus Operator deployment is running:
kubectl get deployment prometheus-operator -n runai
You should see output indicating the deployment's status, including the number of replicas and their current state.
-
Check if Prometheus instances are running:
kubectl get prometheus -n runai
You should see the Prometheus instance(s) listed along with their status.
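The two checks above can also be scripted. The following is a minimal sketch, not an official Run.ai tool: the function name check_prometheus_operator and the 60s default timeout are assumptions made for illustration, while the namespace and deployment name are taken from the commands above.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the operator rollout is complete and at least one
# Prometheus resource exists. The function name and timeout are illustrative.
check_prometheus_operator() {
  local namespace="${1:-runai}"
  local timeout="${2:-60s}"

  # Wait for the operator deployment to finish rolling out, or give up.
  kubectl rollout status deployment/prometheus-operator \
    -n "$namespace" --timeout="$timeout" || return 1

  # Count the Prometheus custom resources in the namespace (one per line).
  local count
  count="$(kubectl get prometheus -n "$namespace" --no-headers 2>/dev/null | wc -l | tr -d ' ')"
  [ "$count" -ge 1 ]
}
```

Calling check_prometheus_operator runai 120s returns a zero exit status when both prerequisites are met, so it can gate the remaining steps in a setup script.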
Enabling Alertmanager¶
-
Create an AlertmanagerConfig resource that triggers alerts on Run.ai events:
cat <<EOF | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: runai
  namespace: runai
  labels:
    alertmanagerConfig: runai
EOF
-
Create the Alertmanager CustomResource to enable Alertmanager:
cat <<EOF | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: runai
  namespace: runai
spec:
  replicas: 1
  alertmanagerConfigSelector:
    matchLabels:
      alertmanagerConfig: runai
EOF
-
Expose the Alertmanager service:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-runai
  namespace: runai
spec:
  type: NodePort
  ports:
  - name: web
    nodePort: 30903
    port: 9093
    protocol: TCP
    targetPort: web
  selector:
    alertmanager: runai
EOF
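Once the service is exposed, it can be useful to confirm that Alertmanager answers on the NodePort. The sketch below assumes a node IP that is reachable from your workstation; the function names alertmanager_url and alertmanager_ready are hypothetical helpers, and /-/ready is Alertmanager's readiness endpoint.

```shell
#!/usr/bin/env bash
# Sketch: build the Alertmanager base URL for the NodePort service above.
# The default port 30903 comes from the Service manifest; the node IP is yours to supply.
alertmanager_url() {
  local node_ip="$1"
  local node_port="${2:-30903}"
  echo "http://${node_ip}:${node_port}"
}

# Return success only if Alertmanager responds on its readiness endpoint.
alertmanager_ready() {
  curl --silent --fail --max-time 5 "$(alertmanager_url "$@")/-/ready" > /dev/null
}
```

For example, alertmanager_ready 10.0.0.5 exits with status 0 when the instance is ready. Node addresses can be listed with kubectl get nodes -o wide.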
Configuring Prometheus to Send Alerts¶
-
Edit the Prometheus configuration:
kubectl edit prometheus runai -n runai
-
Add the following to the alerting section:
alerting:
  alertmanagers:
  - namespace: runai
    name: alertmanager-runai
    port: web
-
Save and exit the editor. The configuration will be automatically reloaded.
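If you prefer not to edit the resource interactively, the same change can likely be applied with kubectl patch. This is a sketch under the assumption that a JSON merge patch is acceptable for your setup; note that a merge patch replaces the whole alertmanagers list rather than appending to it. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print a JSON merge patch that sets spec.alerting.alertmanagers.
# Defaults mirror the values used in the steps above.
build_alerting_patch() {
  local namespace="${1:-runai}"
  local service="${2:-alertmanager-runai}"
  local port="${3:-web}"
  printf '{"spec":{"alerting":{"alertmanagers":[{"namespace":"%s","name":"%s","port":"%s"}]}}}' \
    "$namespace" "$service" "$port"
}

# Apply the patch to the Prometheus resource named "runai".
# Caution: this overwrites any existing alertmanagers entries.
patch_prometheus_alerting() {
  kubectl patch prometheus runai -n runai --type merge \
    -p "$(build_alerting_patch "$@")"
}
```

Running build_alerting_patch on its own prints the patch so it can be reviewed before patch_prometheus_alerting applies it.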
Configuring Alertmanager for Custom Email Alerts¶
-
Add your SMTP password as a Secret:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: smtp-password
  namespace: runai
stringData:
  password: "your_smtp_password"
EOF
-
Edit the Alertmanager configuration:
kubectl edit alertmanagerconfig -n runai
-
In the spec section, add a new receiver that sends alerts via email:
receivers:
- name: 'email'
  emailConfigs:
  - to: '[email protected]'
    from: '[email protected]'
    smarthost: 'smtp.yourmailprovider.com:587'
    authUsername: 'your_username'
    authPassword:
      name: smtp-password
      key: password
Note
Other receiver types can be configured; see the Alertmanager receiver integration settings.
-
In the spec section, add a new route that forwards Run.ai alerts to the email receiver:
route:
  continue: true
  groupBy:
  - alertname
  groupWait: 30s
  groupInterval: 5m
  repeatInterval: 1h
  matchers:
  - matchType: =~
    name: alertname
    value: Runai.*
  receiver: email
-
Save and exit the editor. The configuration will be automatically reloaded.
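To check the email route end to end, one option is to post a synthetic alert to Alertmanager's v2 API. The sketch below is illustrative only: the alert name RunaiTestAlert, the severity label, and the function names are made up for this example. It also assumes the Prometheus Operator's default behavior of scoping AlertmanagerConfig routes to alerts whose namespace label equals the resource's namespace, which is why the payload carries namespace "runai".

```shell
#!/usr/bin/env bash
# Sketch: build a one-alert JSON payload whose alertname matches the Runai.* route.
# The alert name, severity, and summary are placeholder values.
build_test_alert() {
  local alertname="${1:-RunaiTestAlert}"
  local namespace="${2:-runai}"
  printf '[{"labels":{"alertname":"%s","namespace":"%s","severity":"info"},"annotations":{"summary":"Test alert for email routing"}}]' \
    "$alertname" "$namespace"
}

# Post the payload to Alertmanager; base_url is e.g. http://<node-ip>:30903.
send_test_alert() {
  local base_url="$1"
  shift
  curl --silent --fail --request POST \
    --header 'Content-Type: application/json' \
    --data "$(build_test_alert "$@")" \
    "${base_url}/api/v2/alerts"
}
```

With the groupWait of 30s configured above, an email would be expected roughly half a minute after send_test_alert succeeds, provided the SMTP settings are valid.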
Alert Messages¶
Alerts help you troubleshoot your system and give you a better understanding of currently occurring issues that affect performance. For more insight into the meaning of the alert messages, see Prometheus Alerts.