Monitoring Zero Trust Workload Identity Manager
By default, the SPIRE Server and SPIRE Agent components of the Zero Trust Workload Identity Manager emit metrics. You can configure OpenShift Monitoring to collect these metrics by using the Prometheus Operator format.
Enabling user workload monitoring
You can enable monitoring for user-defined projects by configuring user workload monitoring in the cluster.
- You have access to the cluster as a user with the cluster-admin cluster role.
- Create the cluster-monitoring-config.yaml file to define and configure the ConfigMap:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        enableUserWorkload: true

- Apply the ConfigMap by running the following command:

    $ oc apply -f cluster-monitoring-config.yaml

- Verify that the monitoring components for user workloads are running in the openshift-user-workload-monitoring namespace:

    $ oc -n openshift-user-workload-monitoring get pod

  Example output

    NAME                                   READY   STATUS    RESTARTS   AGE
    prometheus-operator-6cb6bd9588-dtzxq   2/2     Running   0          50s
    prometheus-user-workload-0             6/6     Running   0          48s
    prometheus-user-workload-1             6/6     Running   0          48s
    thanos-ruler-user-workload-0           4/4     Running   0          42s
    thanos-ruler-user-workload-1           4/4     Running   0          42s

The status of the pods such as prometheus-operator, prometheus-user-workload, and thanos-ruler-user-workload must be Running.
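
The status check above can also be scripted. The following is a minimal Python sketch, not part of the product, that parses the text printed by `oc -n openshift-user-workload-monitoring get pod` and lists any pod whose STATUS is not Running. The function name `pods_not_running` is hypothetical, and the sketch assumes the default column layout shown in the example output.

```python
def pods_not_running(oc_output):
    """Return the names of pods whose STATUS column is not 'Running'.

    Assumes the default `oc get pod` table: a header row that contains
    NAME and STATUS, followed by one whitespace-separated row per pod.
    """
    lines = [line for line in oc_output.strip().splitlines() if line.strip()]
    if not lines:
        return []  # nothing was printed, so there is nothing to report
    header = lines[0].split()
    name_idx = header.index("NAME")
    status_idx = header.index("STATUS")
    bad = []
    for line in lines[1:]:
        cols = line.split()
        if cols[status_idx] != "Running":
            bad.append(cols[name_idx])
    return bad


# Example output copied from the verification step above.
EXAMPLE_OUTPUT = """\
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-operator-6cb6bd9588-dtzxq   2/2     Running   0          50s
prometheus-user-workload-0             6/6     Running   0          48s
prometheus-user-workload-1             6/6     Running   0          48s
thanos-ruler-user-workload-0           4/4     Running   0          42s
thanos-ruler-user-workload-1           4/4     Running   0          42s
"""

if __name__ == "__main__":
    print(pods_not_running(EXAMPLE_OUTPUT))
```

An empty list means that every listed pod reports the Running status.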
Configuring metrics collection for SPIRE Server by using a ServiceMonitor
The SPIRE Server operand exposes metrics by default on port 9402 at the /metrics endpoint. To collect these metrics, create a ServiceMonitor custom resource (CR). The ServiceMonitor CR enables the Prometheus Operator to scrape the endpoint, which helps you monitor your SPIRE deployment.
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have installed the Zero Trust Workload Identity Manager.
- You have deployed the SPIRE Server operand in the cluster.
- You have enabled user workload monitoring.
- Create the ServiceMonitor CR:
  - Create the YAML file that defines the ServiceMonitor CR:

    Example servicemonitor-spire-server.yaml file

      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        labels:
          app.kubernetes.io/name: server
          app.kubernetes.io/instance: spire
        name: spire-server-metrics
        namespace: zero-trust-workload-identity-manager
      spec:
        endpoints:
        - port: metrics
          interval: 30s
          path: /metrics
        selector:
          matchLabels:
            app.kubernetes.io/name: server
            app.kubernetes.io/instance: spire
        namespaceSelector:
          matchNames:
          - zero-trust-workload-identity-manager

  - Create the ServiceMonitor CR by running the following command:

      $ oc create -f servicemonitor-spire-server.yaml

    After the ServiceMonitor CR is created, the user workload Prometheus instance begins metrics collection from the SPIRE Server. The collected metrics are labeled with job="spire-server".
- In the OpenShift Container Platform web console, navigate to Observe → Targets.
- In the Label filter field, enter the following label to filter the metrics targets:

    service=spire-server

- Confirm that the Status column shows Up for the spire-server-metrics entry.
Configuring metrics collection for SPIRE Agent by using a ServiceMonitor
The SPIRE Agent operand exposes metrics by default on port 9402 at the /metrics endpoint. You can configure metrics collection for the SPIRE Agent by creating a ServiceMonitor custom resource (CR), which enables the Prometheus Operator to collect custom metrics.
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have installed the Zero Trust Workload Identity Manager.
- You have deployed the SPIRE Agent operand in the cluster.
- You have enabled user workload monitoring.
- Create the ServiceMonitor CR:
  - Create the YAML file that defines the ServiceMonitor CR:

    Example servicemonitor-spire-agent.yaml file

      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        labels:
          app.kubernetes.io/name: agent
          app.kubernetes.io/instance: spire
        name: spire-agent-metrics
        namespace: zero-trust-workload-identity-manager
      spec:
        endpoints:
        - port: metrics
          interval: 30s
          path: /metrics
        selector:
          matchLabels:
            app.kubernetes.io/name: agent
            app.kubernetes.io/instance: spire
        namespaceSelector:
          matchNames:
          - zero-trust-workload-identity-manager

  - Create the ServiceMonitor CR by running the following command:

      $ oc create -f servicemonitor-spire-agent.yaml

    After the ServiceMonitor CR is created, the user workload Prometheus instance begins metrics collection from the SPIRE Agent. The collected metrics are labeled with job="spire-agent".
- In the OpenShift Container Platform web console, navigate to Observe → Targets.
- In the Label filter field, enter the following label to filter the metrics targets:

    service=spire-agent

- Confirm that the Status column shows Up for the spire-agent-metrics entry.
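
The SPIRE Server and SPIRE Agent ServiceMonitor definitions differ only in the component name. As an illustration, the following Python sketch generates either manifest from that one parameter. The function name `spire_service_monitor` is hypothetical; the field values are copied from the two example files in this document, and the output is JSON, which `oc create -f` accepts in the same way as YAML.

```python
import json

NAMESPACE = "zero-trust-workload-identity-manager"


def spire_service_monitor(component):
    """Build the ServiceMonitor manifest for a SPIRE component.

    `component` is "server" or "agent", matching the
    app.kubernetes.io/name label used in the example files.
    """
    labels = {
        "app.kubernetes.io/name": component,
        "app.kubernetes.io/instance": "spire",
    }
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {
            "labels": dict(labels),
            "name": "spire-%s-metrics" % component,
            "namespace": NAMESPACE,
        },
        "spec": {
            "endpoints": [
                {"port": "metrics", "interval": "30s", "path": "/metrics"}
            ],
            "selector": {"matchLabels": dict(labels)},
            "namespaceSelector": {"matchNames": [NAMESPACE]},
        },
    }


if __name__ == "__main__":
    # Print the agent manifest; pass "server" for the SPIRE Server one.
    print(json.dumps(spire_service_monitor("agent"), indent=2))
```

Generating both manifests from one function keeps the selector labels and the metadata labels in step, which is the detail most likely to drift when the files are edited by hand.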
Configuring metrics collection for the Operator by using a ServiceMonitor
The Zero Trust Workload Identity Manager exposes metrics by default on port 8443 at the /metrics service endpoint. You can configure metrics collection for the Operator by creating a ServiceMonitor custom resource (CR) that enables the Prometheus Operator to collect custom metrics. For more information, see "Configuring user workload monitoring".
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have installed the Zero Trust Workload Identity Manager.
- You have enabled user workload monitoring.
- Configure the Operator to use the HTTP or HTTPS protocol for the metrics server.
  - Update the subscription object for Zero Trust Workload Identity Manager to configure the HTTP protocol by running the following command:

      $ oc -n zero-trust-workload-identity-manager patch subscription zero-trust-workload-identity-manager-subscription --type='merge' -p '{"spec":{"config":{"env":[{"name":"METRICS_BIND_ADDRESS","value":":8080"}, {"name": "METRICS_SECURE", "value": "false"}]}}}'

  - Verify that the Zero Trust Workload Identity Manager pod is redeployed and that the configured values for METRICS_BIND_ADDRESS and METRICS_SECURE are updated by running the following command:

      $ oc set env --list deployment/zero-trust-workload-identity-manager-controller-manager -n zero-trust-workload-identity-manager | grep -e METRICS_BIND_ADDRESS -e METRICS_SECURE -e container

    Example output

      deployments/zero-trust-workload-identity-manager-controller-manager, container manager
      METRICS_BIND_ADDRESS=:8080
      METRICS_SECURE=false
- Create the Secret resource with the kubernetes.io/service-account.name annotation to inject the token required for authenticating with the metrics server.
  - Create the secret-zero-trust-workload-identity-manager.yaml YAML file:

      apiVersion: v1
      kind: Secret
      metadata:
        labels:
          name: zero-trust-workload-identity-manager
        name: zero-trust-workload-identity-manager-metrics-auth
        namespace: zero-trust-workload-identity-manager
        annotations:
          kubernetes.io/service-account.name: zero-trust-workload-identity-manager-controller-manager
      type: kubernetes.io/service-account-token

  - Create the Secret resource by running the following command:

      $ oc apply -f secret-zero-trust-workload-identity-manager.yaml

- Create the ClusterRoleBinding resource required for granting permissions to access the metrics.
  - Create the clusterrolebinding-zero-trust-workload-identity-manager.yaml YAML file:

      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRoleBinding
      metadata:
        labels:
          name: zero-trust-workload-identity-manager
        name: zero-trust-workload-identity-manager-allow-metrics-access
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: ClusterRole
        name: zero-trust-workload-identity-manager-metrics-reader
      subjects:
      - kind: ServiceAccount
        name: zero-trust-workload-identity-manager-controller-manager
        namespace: zero-trust-workload-identity-manager

  - Create the ClusterRoleBinding resource by running the following command:

      $ oc apply -f clusterrolebinding-zero-trust-workload-identity-manager.yaml
- Create the following ServiceMonitor CR if the metrics server is configured to use http.
  - Create the servicemonitor-zero-trust-workload-identity-manager-http.yaml YAML file:

      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        labels:
          name: zero-trust-workload-identity-manager
        name: zero-trust-workload-identity-manager-metrics-monitor
        namespace: zero-trust-workload-identity-manager
      spec:
        endpoints:
        - authorization:
            credentials:
              name: zero-trust-workload-identity-manager-metrics-auth
              key: token
            type: Bearer
          interval: 60s
          path: /metrics
          port: metrics-http
          scheme: http
          scrapeTimeout: 30s
        namespaceSelector:
          matchNames:
          - zero-trust-workload-identity-manager
        selector:
          matchLabels:
            name: zero-trust-workload-identity-manager

  - Create the ServiceMonitor CR by running the following command:

      $ oc apply -f servicemonitor-zero-trust-workload-identity-manager-http.yaml

- Create the following ServiceMonitor CR if the metrics server is configured to use https.
  - Create the servicemonitor-zero-trust-workload-identity-manager-https.yaml YAML file:

      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        labels:
          name: zero-trust-workload-identity-manager
        name: zero-trust-workload-identity-manager-metrics-monitor
        namespace: zero-trust-workload-identity-manager
      spec:
        endpoints:
        - authorization:
            credentials:
              name: zero-trust-workload-identity-manager-metrics-auth
              key: token
            type: Bearer
          interval: 60s
          path: /metrics
          port: metrics-https
          scheme: https
          scrapeTimeout: 30s
          tlsConfig:
            ca:
              configMap:
                name: openshift-service-ca.crt
                key: service-ca.crt
            serverName: zero-trust-workload-identity-manager-metrics-service.zero-trust-workload-identity-manager.svc.cluster.local
        namespaceSelector:
          matchNames:
          - zero-trust-workload-identity-manager
        selector:
          matchLabels:
            name: zero-trust-workload-identity-manager

  - Create the ServiceMonitor CR by running the following command:

      $ oc apply -f servicemonitor-zero-trust-workload-identity-manager-https.yaml

    After the ServiceMonitor CR is created, the user workload Prometheus instance begins metrics collection from the Operator. The collected metrics are labeled with job="zero-trust-workload-identity-manager-metrics-service".
- In the OpenShift Container Platform web console, navigate to Observe → Targets.
- In the Label filter field, enter the following label to filter the metrics targets:

    service=zero-trust-workload-identity-manager-metrics-service

- Confirm that the Status column shows Up for the zero-trust-workload-identity-manager entry.
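
The subscription patch in this procedure is a JSON merge document that sets two environment variables. The following Python sketch builds that document for either protocol. The function name `metrics_env_patch` is hypothetical. The HTTP values (":8080" and "false") come from the procedure; the HTTPS values (":8443" and "true") are an assumption inferred from the default metrics port and are not confirmed by this document.

```python
import json


def metrics_env_patch(secure):
    """Return the JSON merge patch for the Operator subscription.

    secure=False reproduces the HTTP patch from the procedure.
    secure=True uses ":8443" and "true", which are ASSUMED values
    based on the default metrics port, not taken from the procedure.
    """
    bind_address = ":8443" if secure else ":8080"
    patch = {
        "spec": {
            "config": {
                "env": [
                    {"name": "METRICS_BIND_ADDRESS", "value": bind_address},
                    {"name": "METRICS_SECURE", "value": "true" if secure else "false"},
                ]
            }
        }
    }
    return json.dumps(patch)


if __name__ == "__main__":
    # Pass the printed string to the -p option of `oc patch --type=merge`.
    print(metrics_env_patch(secure=False))
```

Building the patch in code avoids quoting mistakes in the inline JSON, and the environment variable values stay strings, as the subscription configuration expects.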
Querying metrics for the Zero Trust Workload Identity Manager
As a cluster administrator, or as a user with view access to all namespaces, you can query SPIRE Agent and SPIRE Server metrics by using the OpenShift Container Platform web console or the command line. The query retrieves all the metrics collected from the SPIRE components that match the specified job labels.
- You have access to the cluster as a user with the cluster-admin role.
- You have installed the Zero Trust Workload Identity Manager.
- You have deployed the SPIRE Server and SPIRE Agent operands in the cluster.
- You have enabled monitoring and metrics collection by creating ServiceMonitor objects.
- In the OpenShift Container Platform web console, navigate to Observe → Metrics.
- In the query field, enter the following PromQL expression to query SPIRE Server metrics:

    {job="spire-server"}

- In the query field, enter the following PromQL expression to query SPIRE Agent metrics:

    {job="spire-agent"}
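
For command-line use, the same selectors can be sent to the Prometheus HTTP API instant-query endpoint, /api/v1/query. The following Python sketch only builds the request URL with correct URL encoding. The function name `instant_query_url` and the host name `thanos-querier.example.com` are hypothetical placeholders; substitute the query route of your cluster. On OpenShift Container Platform the request typically also needs a bearer token, which this sketch leaves out.

```python
from urllib.parse import urlencode


def instant_query_url(host, job):
    """Build a Prometheus HTTP API instant-query URL for one job label."""
    selector = '{job="%s"}' % job  # same selector as in the console steps
    return "https://%s/api/v1/query?%s" % (host, urlencode({"query": selector}))


if __name__ == "__main__":
    # Hypothetical host name; replace it with the route host of your cluster.
    for job in ("spire-server", "spire-agent"):
        print(instant_query_url("thanos-querier.example.com", job))
```

The braces, equals sign, and quotation marks in the selector are not valid in a raw query string, so `urlencode` is used instead of string concatenation.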
Available metrics for Zero Trust Workload Identity Manager monitoring
Monitor the health and performance of Zero Trust Workload Identity Manager components by reviewing exposed metrics. This reference describes controller, certificate, and runtime metrics that help you maintain system health and troubleshoot errors.
The Zero Trust Workload Identity Manager exposes the following metrics:
- Controller runtime metrics
  - controller_runtime_active_workers: Number of currently used workers per controller
  - controller_runtime_max_concurrent_reconciles: Maximum number of concurrent reconciles per controller
  - controller_runtime_reconcile_errors_total: Total number of reconciliation errors per controller
  - controller_runtime_reconcile_time_seconds: Length of time per reconciliation per controller
  - controller_runtime_reconcile_total: Total number of reconciliations per controller
- Certificate watcher metrics
  - certwatcher_read_certificate_errors_total: Total number of certificate read errors
  - certwatcher_read_certificate_total: Total number of certificates read
- Go runtime metrics
  Standard Go runtime metrics including:
  - go_gc_duration_seconds: Garbage collection duration
  - go_goroutines: Number of goroutines
  - go_memstats_*: Memory statistics
  - process_*: Process statistics
- Custom Operator metrics
  The Operator also exposes custom metrics related to:
  - SPIRE Server status and health
  - SPIRE Agent deployment status
  - SPIFFE CSI Driver status
  - OIDC Discovery Provider status
  - Workload identity management operations
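
As an illustration of how the controller runtime metrics listed above can be combined, the following Python sketch assembles a few PromQL expressions scoped to the Operator job label. The function name `operator_queries`, the dictionary keys, and the 5m rate window are hypothetical choices. The sketch assumes that controller_runtime_reconcile_time_seconds is a histogram with _sum and _count series, as in upstream controller-runtime.

```python
OPERATOR_JOB = "zero-trust-workload-identity-manager-metrics-service"


def operator_queries(job=OPERATOR_JOB, window="5m"):
    """Return example PromQL expressions keyed by a short description."""
    sel = '{job="%s"}' % job
    rng = "[%s]" % window
    return {
        # Share of reconciliations that ended in an error, per controller.
        "reconcile_error_ratio": (
            "sum by (controller) (rate(controller_runtime_reconcile_errors_total%s%s))"
            " / sum by (controller) (rate(controller_runtime_reconcile_total%s%s))"
            % (sel, rng, sel, rng)
        ),
        # Mean reconciliation duration in seconds (ASSUMES histogram series).
        "mean_reconcile_seconds": (
            "rate(controller_runtime_reconcile_time_seconds_sum%s%s)"
            " / rate(controller_runtime_reconcile_time_seconds_count%s%s)"
            % (sel, rng, sel, rng)
        ),
        # Fraction of the configured workers that are currently busy.
        "worker_saturation": (
            "controller_runtime_active_workers%s"
            " / controller_runtime_max_concurrent_reconciles%s" % (sel, sel)
        ),
    }


if __name__ == "__main__":
    for name, expr in operator_queries().items():
        print("%s: %s" % (name, expr))
```

Each printed expression can be pasted into the query field under Observe → Metrics in the web console.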