Configuring alerts and notifications for core platform monitoring
You can configure a local or external Alertmanager instance to route alerts from Prometheus to endpoint receivers. You can also attach custom labels to all time series and alerts to add useful metadata.
Configuring external Alertmanager instances
The OpenShift Container Platform monitoring stack includes a local Alertmanager instance that routes alerts from Prometheus.
You can add external Alertmanager instances to route alerts for core OpenShift Container Platform projects.
If you add the same external Alertmanager configuration for multiple clusters and disable the local instance for each cluster, you can then manage alert routing for multiple clusters by using a single external Alertmanager instance.
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have created the cluster-monitoring-config ConfigMap object.
- You have installed the OpenShift CLI (oc).
Procedure
- Edit the cluster-monitoring-config config map in the openshift-monitoring project:

      $ oc -n openshift-monitoring edit configmap cluster-monitoring-config

- Add an additionalAlertmanagerConfigs section with configuration details under data/config.yaml/prometheusK8s:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-monitoring-config
        namespace: openshift-monitoring
      data:
        config.yaml: |
          prometheusK8s:
            additionalAlertmanagerConfigs:
            - <alertmanager_specification>

  Substitute <alertmanager_specification> with authentication and other configuration details for additional Alertmanager instances. Currently supported authentication methods are bearer token (bearerToken) and client TLS (tlsConfig).

  The following sample config map configures an additional Alertmanager for Prometheus by using a bearer token with client TLS authentication:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-monitoring-config
        namespace: openshift-monitoring
      data:
        config.yaml: |
          prometheusK8s:
            additionalAlertmanagerConfigs:
            - scheme: https
              pathPrefix: /
              timeout: "30s"
              apiVersion: v1
              bearerToken:
                name: alertmanager-bearer-token
                key: token
              tlsConfig:
                key:
                  name: alertmanager-tls
                  key: tls.key
                cert:
                  name: alertmanager-tls
                  key: tls.crt
                ca:
                  name: alertmanager-tls
                  key: tls.ca
              staticConfigs:
              - external-alertmanager1-remote.com
              - external-alertmanager1-remote2.com

- Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.
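Before saving the config map, it can help to sanity-check an additionalAlertmanagerConfigs entry. The following Python sketch is a hypothetical helper, not part of OpenShift Container Platform or the Cluster Monitoring Operator. It takes an entry that has already been parsed into a dictionary and reports a few likely mistakes. The field names come from the sample above; the rules themselves (for example, treating an empty staticConfigs list or authentication over a non-https scheme as a problem) are assumptions made for illustration.

```python
# Hypothetical pre-flight check for one additionalAlertmanagerConfigs entry.
# Field names follow the sample config map; the rules are illustrative
# assumptions, not the validation performed by the monitoring stack.

SUPPORTED_AUTH_FIELDS = ("bearerToken", "tlsConfig")
SECRET_REF_KEYS = {"name", "key"}


def check_alertmanager_spec(spec: dict) -> list[str]:
    """Return a list of human-readable problems found in the entry."""
    problems = []

    # Assumption: an entry without target hosts is a mistake.
    if not spec.get("staticConfigs"):
        problems.append("staticConfigs should list at least one Alertmanager host")

    # bearerToken refers to a secret by name and key.
    token = spec.get("bearerToken")
    if token is not None and not SECRET_REF_KEYS <= set(token):
        problems.append("bearerToken needs both 'name' and 'key'")

    # The key, cert, and ca members of tlsConfig are secret references too.
    tls = spec.get("tlsConfig") or {}
    for field in ("key", "cert", "ca"):
        ref = tls.get(field)
        if ref is not None and not SECRET_REF_KEYS <= set(ref):
            problems.append(f"tlsConfig.{field} needs both 'name' and 'key'")

    # Assumption: credentials sent without https are probably unintended.
    uses_auth = any(field in spec for field in SUPPORTED_AUTH_FIELDS)
    if uses_auth and spec.get("scheme", "http") != "https":
        problems.append("authentication is configured but scheme is not https")

    return problems


if __name__ == "__main__":
    sample = {
        "scheme": "https",
        "bearerToken": {"name": "alertmanager-bearer-token", "key": "token"},
        "staticConfigs": ["external-alertmanager1-remote.com"],
    }
    print(check_alertmanager_spec(sample))
```

An empty result means only that none of these assumed checks failed; it does not guarantee that the cluster accepts the configuration.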
Disabling the local Alertmanager
A local Alertmanager that routes alerts from Prometheus instances is enabled by default in the openshift-monitoring project of the OpenShift Container Platform monitoring stack.
If you do not need the local Alertmanager, you can disable it by configuring the cluster-monitoring-config config map in the openshift-monitoring project.
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have created the cluster-monitoring-config config map.
- You have installed the OpenShift CLI (oc).
Procedure
- Edit the cluster-monitoring-config config map in the openshift-monitoring project:

      $ oc -n openshift-monitoring edit configmap cluster-monitoring-config

- Add enabled: false for the alertmanagerMain component under data/config.yaml:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-monitoring-config
        namespace: openshift-monitoring
      data:
        config.yaml: |
          alertmanagerMain:
            enabled: false

- Save the file to apply the changes. The Alertmanager instance is disabled automatically when you apply the change.
Configuring secrets for Alertmanager
The OpenShift Container Platform monitoring stack includes Alertmanager, which routes alerts from Prometheus to endpoint receivers. If you need to authenticate with a receiver so that Alertmanager can send alerts to it, you can configure Alertmanager to use a secret that contains authentication credentials for the receiver.
For example, you can configure Alertmanager to use a secret to authenticate with an endpoint receiver that requires a certificate issued by a private Certificate Authority (CA).
You can also configure Alertmanager to use a secret to authenticate with a receiver that requires a password file for Basic HTTP authentication.
In either case, authentication details are contained in the Secret object rather than in the ConfigMap object.
Adding a secret to the Alertmanager configuration
You can add secrets to the Alertmanager configuration by editing the cluster-monitoring-config config map in the openshift-monitoring project.
After you add a secret to the config map, the secret is mounted as a volume at /etc/alertmanager/secrets/<secret_name> within the alertmanager container for the Alertmanager pods.
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have created the cluster-monitoring-config config map.
- You have created the secret to be configured in Alertmanager in the openshift-monitoring project.
- You have installed the OpenShift CLI (oc).
Procedure
- Edit the cluster-monitoring-config config map in the openshift-monitoring project:

      $ oc -n openshift-monitoring edit configmap cluster-monitoring-config

- Add a secrets: section under data/config.yaml/alertmanagerMain with the following configuration:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-monitoring-config
        namespace: openshift-monitoring
      data:
        config.yaml: |
          alertmanagerMain:
            secrets:
            - <secret_name_1>
            - <secret_name_2>

  - secrets: This section contains the secrets to be mounted into Alertmanager. The secrets must be located within the same namespace as the Alertmanager object.
  - <secret_name_1>, <secret_name_2>: The name of the Secret object that contains authentication credentials for the receiver. If you add multiple secrets, place each one on a new line.

  The following sample config map settings configure Alertmanager to use two Secret objects named test-secret-basic-auth and test-secret-api-token:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-monitoring-config
        namespace: openshift-monitoring
      data:
        config.yaml: |
          alertmanagerMain:
            secrets:
            - test-secret-basic-auth
            - test-secret-api-token

- Save the file to apply the changes. The new configuration is applied automatically.
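Once a secret is mounted, receiver settings that take a file path (for example, a password file for Basic HTTP authentication) need the path of a specific file inside the mount. The following Python sketch shows how that path can be derived. It assumes standard Kubernetes secret-volume behavior, where each key of the Secret object appears as a file named after the key. The function name and the key name password are hypothetical.

```python
from pathlib import PurePosixPath

# Mount root stated in this section.
SECRETS_ROOT = PurePosixPath("/etc/alertmanager/secrets")


def secret_file_path(secret_name: str, key: str) -> str:
    """Return the in-container path of one key of a mounted secret.

    Assumes each key of the Secret object is exposed as a file named
    after the key, which is the usual secret-volume behavior.
    """
    for part in (secret_name, key):
        # Reject empty names and anything that would escape the mount.
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return str(SECRETS_ROOT / secret_name / key)


if __name__ == "__main__":
    # "password" is a hypothetical key inside test-secret-basic-auth.
    print(secret_file_path("test-secret-basic-auth", "password"))
```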
Attaching additional labels to your time series and alerts
You can attach custom labels to all time series and alerts leaving Prometheus by using the external labels feature of Prometheus.
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have created the cluster-monitoring-config ConfigMap object.
- You have installed the OpenShift CLI (oc).
Procedure
- Edit the cluster-monitoring-config config map in the openshift-monitoring project:

      $ oc -n openshift-monitoring edit configmap cluster-monitoring-config

- Define the labels that you want to add to every metric under data/config.yaml:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-monitoring-config
        namespace: openshift-monitoring
      data:
        config.yaml: |
          prometheusK8s:
            externalLabels:
              <key>: <value>

  Substitute <key>: <value> with key-value pairs where <key> is a unique name for the new label and <value> is its value.

  Warning
  - Do not use prometheus or prometheus_replica as key names, because they are reserved and will be overwritten.
  - Do not use cluster as a key name. Using it can cause issues where you are unable to see data in the developer dashboards.

  For example, the following configuration adds metadata about the region and environment to all time series and alerts:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-monitoring-config
        namespace: openshift-monitoring
      data:
        config.yaml: |
          prometheusK8s:
            externalLabels:
              region: eu
              environment: prod

- Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.
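The key-name warnings in this procedure are easy to overlook when labels are generated by scripts. The following Python sketch is a hypothetical checker that flags the key names this section warns against. It also checks keys against the classic Prometheus label-name pattern, on the assumption that UTF-8 label names are not in use.

```python
import re

# Key names that this section says are reserved and overwritten.
RESERVED_KEYS = {"prometheus", "prometheus_replica"}
# Key name that this section says can hide data in the developer dashboards.
DISCOURAGED_KEYS = {"cluster"}
# Classic Prometheus label-name syntax (assumption: no UTF-8 label names).
LABEL_NAME = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def check_external_labels(labels: dict) -> list[str]:
    """Return one message per problematic external label key."""
    problems = []
    for key in labels:
        if key in RESERVED_KEYS:
            problems.append(f"{key}: reserved, the value will be overwritten")
        elif key in DISCOURAGED_KEYS:
            problems.append(f"{key}: can hide data in the developer dashboards")
        elif not LABEL_NAME.fullmatch(key):
            problems.append(f"{key}: not a valid classic label name")
    return problems


if __name__ == "__main__":
    print(check_external_labels({"region": "eu", "environment": "prod"}))
    print(check_external_labels({"prometheus": "a", "cluster": "b"}))
```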
Configuring alert notifications
In OpenShift Container Platform 4.19, you can view firing alerts in the Alerting UI. You can configure Alertmanager to send notifications about default platform alerts by configuring alert receivers.
Important
Alertmanager does not send notifications by default. It is strongly recommended that you configure alert receivers, either through the web console or through the alertmanager-main secret, so that Alertmanager sends notifications.
Configuring alert routing for default platform alerts
You can configure Alertmanager to send notifications so that you receive important alerts coming from your cluster. Customize where and how Alertmanager sends notifications about default platform alerts by editing the default configuration in the alertmanager-main secret in the openshift-monitoring namespace.
Note
All features of a supported version of upstream Alertmanager are also supported in an OpenShift Container Platform Alertmanager configuration. To check all the configuration options of a supported version of upstream Alertmanager, see Alertmanager configuration (Prometheus documentation).
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have installed the OpenShift CLI (oc).
Procedure
- Extract the currently active Alertmanager configuration from the alertmanager-main secret and save it as a local alertmanager.yaml file:

      $ oc -n openshift-monitoring get secret alertmanager-main --template='{{ index .data "alertmanager.yaml" }}' | base64 --decode > alertmanager.yaml

- Open the alertmanager.yaml file.
- Edit the Alertmanager configuration:
  - Optional: Change the default Alertmanager configuration:

    Example of the default Alertmanager secret YAML

        global:
          resolve_timeout: 5m
          http_config:
            proxy_from_environment: true
        route:
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: default
          routes:
          - matchers:
            - "alertname=Watchdog"
            repeat_interval: 2m
            receiver: watchdog
        receivers:
        - name: default
        - name: watchdog

    - proxy_from_environment: If you configured an HTTP cluster-wide proxy, set the proxy_from_environment parameter to true to enable proxying for all alert receivers.
    - group_wait: Specify how long Alertmanager waits while collecting initial alerts for a group of alerts before sending a notification.
    - group_interval: Specify how much time must elapse before Alertmanager sends a notification about new alerts added to a group of alerts for which an initial notification was already sent.
    - repeat_interval: Specify the minimum amount of time that must pass before an alert notification is repeated. If you want a notification to repeat at each group interval, set the repeat_interval value to less than the group_interval value. The repeated notification can still be delayed, for example, when certain Alertmanager pods are restarted or rescheduled.
  - Add your alert receiver configuration:

        # ...
        receivers:
        - name: default
        - name: watchdog
        - name: <receiver>
          <receiver_configuration>
        # ...

    - <receiver>: The name of the receiver.
    - <receiver_configuration>: The receiver configuration. The supported receivers are PagerDuty, webhook, email, Slack, and Microsoft Teams.

    Example of configuring PagerDuty as an alert receiver

        # ...
        receivers:
        - name: default
        - name: watchdog
        - name: team-frontend-page
          pagerduty_configs:
          - routing_key: xxxxxxxxxx
            http_config:
              proxy_from_environment: true
              authorization:
                credentials: xxxxxxxxxx
        # ...

    - routing_key: Defines the PagerDuty integration key.
    - http_config: Optional: Add the custom HTTP configuration for a specific receiver. That receiver does not inherit the global HTTP configuration settings.

    Example of configuring email as an alert receiver

        # ...
        receivers:
        - name: default
        - name: watchdog
        - name: team-frontend-page
          email_configs:
          - to: myemail@example.com
            from: alertmanager@example.com
            smarthost: 'smtp.example.com:587'
            auth_username: alertmanager@example.com
            auth_password: password
            hello: alertmanager
        # ...

    - to: Specify an email address to send notifications to.
    - from: Specify an email address to send notifications from.
    - smarthost: Specify the SMTP server address used for sending emails, including the port number.
    - auth_username, auth_password: Specify the authentication credentials that Alertmanager uses to connect to the SMTP server. This example uses username and password.
    - hello: Specify the hostname to identify to the SMTP server. If you do not include this parameter, the hostname defaults to localhost.

    Important
    Alertmanager requires an external SMTP server to send email alerts. To configure email alert receivers, ensure you have the necessary connection details for an external SMTP server.
  - Add the routing configuration:

        # ...
        route:
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: default
          routes:
          - matchers:
            - "alertname=Watchdog"
            repeat_interval: 2m
            receiver: watchdog
          - matchers:
            - "<your_matching_rules>"
            receiver: <receiver>
        # ...

    - matchers: Use the matchers key name to specify the matching rules that an alert has to fulfill to match the node. If you define inhibition rules, use the target_matchers key name for target matchers and the source_matchers key name for source matchers.
    - <your_matching_rules>: Specify labels to match your alerts.
    - <receiver>: Specify the name of the receiver to use for the alerts.

    Warning
    Do not use the match, match_re, target_match, target_match_re, source_match, and source_match_re key names, which are deprecated and planned for removal in a future release.

    Example of alert routing

        # ...
        route:
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: default
          routes:
          - matchers:
            - "alertname=Watchdog"
            repeat_interval: 2m
            receiver: watchdog
          - matchers:
            - "service=example-app"
            routes:
            - matchers:
              - "severity=critical"
              receiver: team-frontend-page
        # ...

    - "service=example-app": This example matches alerts from the example-app service.
    - routes: You can create routes within other routes for more complex alert routing. The previous example routes alerts of critical severity that are fired by the example-app service to the team-frontend-page receiver. Typically, these types of alerts are paged to an individual or a critical response team.
- Apply the new configuration in the file:

      $ oc -n openshift-monitoring create secret generic alertmanager-main --from-file=alertmanager.yaml --dry-run=client -o=yaml | oc -n openshift-monitoring replace secret --filename=-

- Verify your routing configuration by visualizing the routing tree:

      $ oc exec alertmanager-main-0 -n openshift-monitoring -- amtool config routes show --alertmanager.url http://localhost:9093

  Example output

      Routing tree:
      .
      └── default-route  receiver: default
          ├── {alertname="Watchdog"}  receiver: watchdog
          └── {service="example-app"}  receiver: default
              └── {severity="critical"}  receiver: team-frontend-page
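To reason about which receiver a given alert reaches, the routing tree can be modeled in a few lines. The following Python sketch is a simplified illustration, not Alertmanager's implementation. It assumes that only equality matchers are used, that the first matching child route wins (no continue setting), and that a route without its own receiver inherits the receiver of its parent. The function names are hypothetical, and the tree mirrors the alert routing example in this procedure.

```python
def parse_matcher(text: str) -> tuple[str, str]:
    """Split an equality matcher such as 'severity=critical' into (label, value)."""
    name, _, value = text.partition("=")
    return name.strip(), value.strip().strip('"')


def matches(route: dict, labels: dict) -> bool:
    """A route matches when every one of its matchers is satisfied."""
    return all(
        labels.get(name, "") == value
        for name, value in map(parse_matcher, route.get("matchers", []))
    )


def resolve_receiver(route: dict, labels: dict, inherited: str = "") -> str:
    """Walk the tree depth-first and return the receiver of the deepest match."""
    receiver = route.get("receiver", inherited)
    for child in route.get("routes", []):
        if matches(child, labels):
            # First matching child wins (assumes 'continue' is not set).
            return resolve_receiver(child, labels, receiver)
    return receiver


# The route tree from the alert routing example, as a parsed dictionary.
ROUTE = {
    "receiver": "default",
    "routes": [
        {"matchers": ["alertname=Watchdog"], "receiver": "watchdog"},
        {
            "matchers": ["service=example-app"],
            "routes": [
                {"matchers": ["severity=critical"], "receiver": "team-frontend-page"},
            ],
        },
    ],
}

if __name__ == "__main__":
    alert = {"service": "example-app", "severity": "critical"}
    print(resolve_receiver(ROUTE, alert))
```

Under these assumptions, an alert from example-app that is not critical falls back to the default receiver, which agrees with the receiver shown for the {service="example-app"} node in the routing tree output.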
Configuring alert routing with the OpenShift Container Platform web console
You can configure alert routing through the OpenShift Container Platform web console to ensure that you learn about important issues with your cluster.
Note
The OpenShift Container Platform web console provides fewer settings to configure alert routing than the alertmanager-main secret. To configure alert routing with access to more configuration settings, see "Configuring alert routing for default platform alerts".
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role.
Procedure
- In the OpenShift Container Platform web console, go to Administration → Cluster Settings → Configuration → Alertmanager.

  Note
  Alternatively, you can go to the same page through the notification drawer. Select the bell icon at the top right of the OpenShift Container Platform web console and choose Configure in the AlertmanagerReceiverNotConfigured alert.

- Click Create Receiver in the Receivers section of the page.
- In the Create Receiver form, add a Receiver name and choose a Receiver type from the list.
- Edit the receiver configuration:
  - For PagerDuty receivers:
    - Choose an integration type and add a PagerDuty integration key.
    - Add the URL of your PagerDuty installation.
    - Click Show advanced configuration if you want to edit the client and incident details or the severity specification.
  - For webhook receivers:
    - Add the endpoint to send HTTP POST requests to.
    - Click Show advanced configuration if you want to edit the default option to send resolved alerts to the receiver.
  - For email receivers:
    - Add the email address to send notifications to.
    - Add SMTP configuration details, including the address to send notifications from, the smarthost and port number used for sending emails, the hostname of the SMTP server, and authentication details.

      Important
      Alertmanager requires an external SMTP server to send email alerts. To configure email alert receivers, ensure you have the necessary connection details for an external SMTP server.

    - Select whether TLS is required.
    - Click Show advanced configuration if you want to edit the default option not to send resolved alerts to the receiver or edit the body of email notifications configuration.
  - For Slack receivers:
    - Add the URL of the Slack webhook.
    - Add the Slack channel or user name to send notifications to.
    - Select Show advanced configuration if you want to edit the default option not to send resolved alerts to the receiver or edit the icon and username configuration. You can also choose whether to find and link channel names and usernames.
- By default, firing alerts with labels that match all of the selectors are sent to the receiver. If you want label values for firing alerts to be matched exactly before they are sent to the receiver, perform the following steps:
  - Add routing label names and values in the Routing labels section of the form.
  - Click Add label to add further routing labels.
- Click Create to create the receiver.
Configuring different alert receivers for default platform alerts and user-defined alerts
You can configure different alert receivers for default platform alerts and user-defined alerts to ensure the following results:
- All default platform alerts are sent to a receiver owned by the team in charge of these alerts.
- All user-defined alerts are sent to another receiver so that the team can focus only on platform alerts.
You can achieve this by using the openshift_io_alert_source="platform" label that is added by the Cluster Monitoring Operator to all platform alerts:
- Use the openshift_io_alert_source="platform" matcher to match default platform alerts.
- Use the openshift_io_alert_source!="platform" or 'openshift_io_alert_source=""' matcher to match user-defined alerts.
Note
This configuration does not apply if you have enabled a separate instance of Alertmanager dedicated to user-defined alerts.
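The two matchers for user-defined alerts behave slightly differently, which the following Python sketch illustrates. It relies on the Alertmanager convention that a missing label is treated as an empty value, so openshift_io_alert_source="" matches alerts that do not carry the label at all, while openshift_io_alert_source!="platform" also matches alerts that carry the label with any other value. The function names and the receiver names platform-team and app-team are hypothetical.

```python
SOURCE_LABEL = "openshift_io_alert_source"


def is_platform_alert(labels: dict) -> bool:
    """Mirror the matcher openshift_io_alert_source="platform"."""
    return labels.get(SOURCE_LABEL, "") == "platform"


def has_empty_source(labels: dict) -> bool:
    """Mirror the matcher openshift_io_alert_source="" (missing counts as empty)."""
    return labels.get(SOURCE_LABEL, "") == ""


def pick_receiver(labels: dict, platform_receiver: str, user_receiver: str) -> str:
    """Send platform alerts to one receiver and all other alerts to another."""
    return platform_receiver if is_platform_alert(labels) else user_receiver


if __name__ == "__main__":
    platform_alert = {"alertname": "Watchdog", SOURCE_LABEL: "platform"}
    user_alert = {"alertname": "MyAppDown"}  # hypothetical user-defined alert
    print(pick_receiver(platform_alert, "platform-team", "app-team"))
    print(pick_receiver(user_alert, "platform-team", "app-team"))
```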