OADP monitoring
By using the OpenShift Container Platform monitoring stack, users and administrators can effectively perform the following tasks:
- Monitor and manage clusters
- Analyze the workload performance of user applications
- Monitor services running on the clusters
- Receive alerts if an event occurs
OADP monitoring setup
The OADP Operator uses OpenShift User Workload Monitoring, which is provided by the OpenShift Monitoring Stack, to retrieve metrics from the Velero service endpoint. With the monitoring stack, you can create user-defined alerting rules or query metrics by using the OpenShift Metrics query front end.
With User Workload Monitoring enabled, you can configure and use any Prometheus-compatible third-party UI, such as Grafana, to visualize Velero metrics.
To monitor metrics, you must enable monitoring for user-defined projects and create a ServiceMonitor resource that scrapes those metrics from the already enabled OADP service endpoint in the openshift-adp namespace.
Note
The OADP support for Prometheus metrics is offered on a best-effort basis and is not fully supported.
For more information about setting up the monitoring stack, see Configuring user workload monitoring.
- You have access to an OpenShift Container Platform cluster using an account with cluster-admin permissions.
- You have created a cluster monitoring config map.
- Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring namespace by using the following command:

      $ oc edit configmap cluster-monitoring-config -n openshift-monitoring

- Add or enable the enableUserWorkload option in the config.yaml field of the data section, as shown in the following example:

      apiVersion: v1
      kind: ConfigMap
      data:
        config.yaml: |
          enableUserWorkload: true
      metadata:
      # ...

  Add the enableUserWorkload option or set it to true.

- Wait a short period, and then verify the User Workload Monitoring setup by checking that the following components are up and running in the openshift-user-workload-monitoring namespace:

      $ oc get pods -n openshift-user-workload-monitoring

  Example output

      NAME                                   READY   STATUS    RESTARTS   AGE
      prometheus-operator-6844b4b99c-b57j9   2/2     Running   0          43s
      prometheus-user-workload-0             5/5     Running   0          32s
      prometheus-user-workload-1             5/5     Running   0          32s
      thanos-ruler-user-workload-0           3/3     Running   0          32s
      thanos-ruler-user-workload-1           3/3     Running   0          32s

- Verify that the user-workload-monitoring-config ConfigMap exists in the openshift-user-workload-monitoring namespace. If it exists, skip the remaining steps in this procedure.

      $ oc get configmap user-workload-monitoring-config -n openshift-user-workload-monitoring

  Example output

      Error from server (NotFound): configmaps "user-workload-monitoring-config" not found

- Create a user-workload-monitoring-config ConfigMap object for User Workload Monitoring, and save it under the 2_configure_user_workload_monitoring.yaml file name:

  Example ConfigMap object

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: user-workload-monitoring-config
        namespace: openshift-user-workload-monitoring
      data:
        config.yaml: |

- Apply the 2_configure_user_workload_monitoring.yaml file by using the following command:

      $ oc apply -f 2_configure_user_workload_monitoring.yaml

  Example output

      configmap/user-workload-monitoring-config created
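The pod check in the procedure above can be scripted instead of read by eye. The following is a minimal sketch, assuming the default table layout of `oc get pods` (NAME, READY, STATUS as the first three columns). The helper name `check_uwm_pods` is hypothetical and is not part of `oc`.

```shell
# check_uwm_pods (hypothetical helper): reads `oc get pods` table output on
# stdin and prints the name of every pod that is not Running or whose READY
# count (for example 2/3) is incomplete.
# Returns 1 if any such pod is found or if no pods are listed at all.
check_uwm_pods() {
  awk '
    NR == 1 { next }                 # skip the header row
    NF == 0 { next }                 # skip blank lines
    {
      seen = 1
      split($2, ready, "/")          # READY column, e.g. "5/5"
      if ($3 != "Running" || ready[1] != ready[2]) {
        print $1
        bad = 1
      }
    }
    END { if (bad || !seen) exit 1 }
  '
}

# Usage (requires a logged-in oc session):
#   oc get pods -n openshift-user-workload-monitoring | check_uwm_pods
```

An empty result with exit status 0 means that every listed pod is Running and fully ready. Any printed name identifies a pod that still needs time or attention.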
Creating OADP service monitor
OADP provides an openshift-adp-velero-metrics-svc service, which is created when the Data Protection Application (DPA) is configured. The user workload monitoring service monitor must point to the defined service.
To get details about the service, complete the following steps.
- Ensure that the openshift-adp-velero-metrics-svc service exists. It should contain the app.kubernetes.io/name=velero label, which is used as the selector for the ServiceMonitor object.

      $ oc get svc -n openshift-adp -l app.kubernetes.io/name=velero

  Example output

      NAME                               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
      openshift-adp-velero-metrics-svc   ClusterIP   172.30.38.244   <none>        8085/TCP   1h

- Create a ServiceMonitor YAML file that matches the existing service label, and save the file as 3_create_oadp_service_monitor.yaml. The service monitor is created in the openshift-adp namespace where the openshift-adp-velero-metrics-svc service resides.

  Example ServiceMonitor object

      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        labels:
          app: oadp-service-monitor
        name: oadp-service-monitor
        namespace: openshift-adp
      spec:
        endpoints:
        - interval: 30s
          path: /metrics
          targetPort: 8085
          scheme: http
        selector:
          matchLabels:
            app.kubernetes.io/name: "velero"

- Apply the 3_create_oadp_service_monitor.yaml file:

      $ oc apply -f 3_create_oadp_service_monitor.yaml

  Example output

      servicemonitor.monitoring.coreos.com/oadp-service-monitor created
- Confirm that the new service monitor is in an Up state by using the Administrator perspective of the OpenShift Container Platform web console. Wait a few minutes for the service monitor to reach the Up state.
  - Navigate to the Observe → Targets page.
  - Ensure that the Filter is unselected or that the User source is selected, and type openshift-adp in the Text search field.
  - Verify that the Status for the service monitor is Up.

  Figure 1. OADP metrics targets
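Because the ServiceMonitor selects its target by label, it can help to confirm from a script that the label really matches the metrics service before applying the file. The following is a minimal sketch. The helper name `velero_metrics_svc_exists` is hypothetical, and the sketch assumes that `oc get ... -o name` prints services as `service/<name>`.

```shell
# velero_metrics_svc_exists (hypothetical helper): succeeds only if the label
# selector used by the ServiceMonitor matches the
# openshift-adp-velero-metrics-svc service in the openshift-adp namespace.
velero_metrics_svc_exists() {
  oc get svc -n openshift-adp -l app.kubernetes.io/name=velero -o name |
    grep -qx 'service/openshift-adp-velero-metrics-svc'
}

# Usage (requires a logged-in oc session):
#   if velero_metrics_svc_exists; then
#     oc apply -f 3_create_oadp_service_monitor.yaml
#   else
#     echo "Velero metrics service not found; check the DPA configuration" >&2
#   fi
```

If the function fails, the service monitor would have nothing to scrape, so the target would not appear on the Observe → Targets page.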
Creating an alerting rule
The OpenShift Container Platform monitoring stack receives alerts that are configured by using alerting rules. To create an alerting rule for the OADP project, use one of the metrics that are scraped by user workload monitoring.
- Create a PrometheusRule YAML file with the sample OADPBackupFailing alert and save it as 4_create_oadp_alert_rule.yaml:

  Sample OADPBackupFailing alert

      apiVersion: monitoring.coreos.com/v1
      kind: PrometheusRule
      metadata:
        name: sample-oadp-alert
        namespace: openshift-adp
      spec:
        groups:
        - name: sample-oadp-backup-alert
          rules:
          - alert: OADPBackupFailing
            annotations:
              description: 'OADP had {{$value | humanize}} backup failures over the last 2 hours.'
              summary: OADP has issues creating backups
            expr: |
              increase(velero_backup_failure_total{job="openshift-adp-velero-metrics-svc"}[2h]) > 0
            for: 5m
            labels:
              severity: warning

  In this sample, the alert displays under the following conditions:
  - During the last 2 hours, the number of new failing backups was greater than 0 and the state persisted for at least 5 minutes.
  - If less than 5 minutes have passed since the first increase, the alert is in a Pending state. After that, it moves to a Firing state.

- Apply the 4_create_oadp_alert_rule.yaml file, which creates the PrometheusRule object in the openshift-adp namespace:

      $ oc apply -f 4_create_oadp_alert_rule.yaml

  Example output

      prometheusrule.monitoring.coreos.com/sample-oadp-alert created
- After the alert is triggered, you can view it in the following ways:
  - In the Developer perspective, select the Observe menu.
  - In the Administrator perspective under the Observe → Alerting menu, select User in the Filter box. Otherwise, by default only the Platform alerts are displayed.

  Figure 2. OADP backup failing alert
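The interaction between the rule expression and the `for: 5m` clause in the sample alert can be sketched as a small state function. This is an illustration of the Pending and Firing behavior described above, not how Prometheus is implemented. The helper name `alert_state` is hypothetical, and the value is simplified to a whole number, whereas `increase()` can return fractional values.

```shell
# alert_state (hypothetical helper): mimics how a rule with a `for` clause
# moves between states. Arguments:
#   $1  current value of the expression (new backup failures in the window),
#       simplified here to an integer
#   $2  seconds for which the expression has been continuously above 0
#   $3  the `for` duration in seconds (300 for the sample rule)
alert_state() {
  value=$1
  active_seconds=$2
  for_seconds=$3
  if [ "$value" -le 0 ]; then
    echo Inactive
  elif [ "$active_seconds" -lt "$for_seconds" ]; then
    echo Pending
  else
    echo Firing
  fi
}

alert_state 0 0 300     # prints Inactive
alert_state 2 120 300   # prints Pending
alert_state 2 300 300   # prints Firing
```

With the sample rule, a single failed backup therefore shows as Pending for the first 5 minutes and as Firing afterward, until the failure falls outside the 2-hour window.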
List of available metrics
The following table lists the Velero metrics provided by OADP, together with their types:
| Metric name | Description | Type |
|---|---|---|
| | Size, in bytes, of a backup | Gauge |
| | Current number of existent backups | Gauge |
| | Total number of attempted backups | Counter |
| | Total number of successful backups | Counter |
| | Total number of partially failed backups | Counter |
| | Total number of failed backups | Counter |
| | Total number of validation failed backups | Counter |
| | Time taken to complete backup, in seconds | Histogram |
| | Total count of observations for a bucket in the histogram for the metric | Counter |
| | Total count of observations for the metric | Counter |
| | Total sum of observations for the metric | Counter |
| | Total number of attempted backup deletions | Counter |
| | Total number of successful backup deletions | Counter |
| | Total number of failed backup deletions | Counter |
| | Last time a backup ran successfully, Unix timestamp in seconds | Gauge |
| | Total number of items backed up | Gauge |
| | Total number of errors encountered during backup | Gauge |
| | Total number of warned backups | Counter |
| | Last status of the backup. A value of 1 is success, 0 is failure | Gauge |
| | Current number of existent restores | Gauge |
| | Total number of attempted restores | Counter |
| | Total number of failed restores failing validations | Counter |
| | Total number of successful restores | Counter |
| | Total number of partially failed restores | Counter |
| | Total number of failed restores | Counter |
| | Total number of attempted volume snapshots | Counter |
| | Total number of successful volume snapshots | Counter |
| | Total number of failed volume snapshots | Counter |
| | Total number of CSI attempted volume snapshots | Counter |
| | Total number of CSI successful volume snapshots | Counter |
| | Total number of CSI failed volume snapshots | Counter |
Viewing metrics using the Observe UI
You can view metrics in the OpenShift Container Platform web console from the Administrator or Developer perspective, which must have access to the openshift-adp project.
- Navigate to the Observe → Metrics page:
  - If you are using the Developer perspective, follow these steps:
    - Select Custom query, or click the Show PromQL link.
    - Type the query and press Enter.
  - If you are using the Administrator perspective, type the expression in the text field and select Run Queries.

  Figure 3. OADP metrics query
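As a starting point for the query field, the expression from the sample alerting rule can be reused with a different time window. The following is a minimal sketch. The helper name `backup_failure_query` is hypothetical, and the `job` label value is taken from the sample rule and assumes the default OADP metrics service name.

```shell
# backup_failure_query (hypothetical helper): prints a PromQL expression for
# the number of new failed backups over a window. $1 is a PromQL duration
# such as 2h or 24h and defaults to 2h, the window used by the sample alert.
backup_failure_query() {
  window=${1:-2h}
  printf 'increase(velero_backup_failure_total{job="openshift-adp-velero-metrics-svc"}[%s])\n' "$window"
}

backup_failure_query 24h
# prints: increase(velero_backup_failure_total{job="openshift-adp-velero-metrics-svc"}[24h])
```

Paste the printed expression into the Metrics page and run it as described in the steps above.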