Monitoring Zero Trust Workload Identity Manager

By default, the SPIRE Server and SPIRE Agent components of the Zero Trust Workload Identity Manager emit metrics. You can configure OpenShift Monitoring to collect these metrics by using the Prometheus Operator format.

Enabling user workload monitoring

You can enable monitoring for user-defined projects by configuring user workload monitoring in the cluster.

Prerequisites
  • You have access to the cluster as a user with the cluster-admin cluster role.

Procedure
  1. Create the cluster-monitoring-config.yaml file to define and configure the ConfigMap:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        enableUserWorkload: true
  2. Apply the ConfigMap by running the following command:

    $ oc apply -f cluster-monitoring-config.yaml
Verification
  • Verify that the monitoring components for user workloads are running in the openshift-user-workload-monitoring namespace:

    $ oc -n openshift-user-workload-monitoring get pod
    Example output
    NAME                                   READY   STATUS    RESTARTS   AGE
    prometheus-operator-6cb6bd9588-dtzxq   2/2     Running   0          50s
    prometheus-user-workload-0             6/6     Running   0          48s
    prometheus-user-workload-1             6/6     Running   0          48s
    thanos-ruler-user-workload-0           4/4     Running   0          42s
    thanos-ruler-user-workload-1           4/4     Running   0          42s

The prometheus-operator, prometheus-user-workload, and thanos-ruler-user-workload pods must all show a status of Running.
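
The same check can be scripted instead of reading the pod list by eye. The following is a minimal sketch that relies on the standard oc wait subcommand; the function name wait_for_uwm_pods and the 120-second default timeout are illustrative assumptions, not part of the product.

```shell
# Hypothetical helper: block until every pod in the user workload
# monitoring namespace reports the Ready condition, or time out.
wait_for_uwm_pods() {
  local timeout="${1:-120s}"   # assumed default; adjust for your cluster
  oc -n openshift-user-workload-monitoring wait pod --all \
    --for=condition=Ready --timeout="${timeout}"
}

# Usage:
#   wait_for_uwm_pods 180s
```

If the pods have not been created yet, oc wait reports that no matching resources were found and exits with an error, so run the helper a few seconds after applying the ConfigMap.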

Configuring metrics collection for SPIRE Server by using a ServiceMonitor

To collect custom metrics from the SPIRE Server, create a ServiceMonitor custom resource (CR). This configuration enables the Prometheus Operator to scrape metrics from the default endpoint, which helps you monitor your SPIRE deployment.

The SPIRE Server operand exposes metrics by default on port 9402 at the /metrics endpoint. You can configure metrics collection for the SPIRE Server by creating a ServiceMonitor custom resource (CR) that enables the Prometheus Operator to collect custom metrics.

Prerequisites
  • You have access to the cluster as a user with the cluster-admin cluster role.

  • You have installed the Zero Trust Workload Identity Manager.

  • You have deployed the SPIRE Server operand in the cluster.

  • You have enabled the user workload monitoring.

Procedure
  1. Create the ServiceMonitor CR:

    1. Create the YAML file that defines the ServiceMonitor CR:

      Example servicemonitor-spire-server.yaml file
      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        labels:
          app.kubernetes.io/name: server
          app.kubernetes.io/instance: spire
        name: spire-server-metrics
        namespace: zero-trust-workload-identity-manager
      spec:
        endpoints:
        - port: metrics
          interval: 30s
          path: /metrics
        selector:
          matchLabels:
            app.kubernetes.io/name: server
            app.kubernetes.io/instance: spire
        namespaceSelector:
          matchNames:
          - zero-trust-workload-identity-manager
    2. Create the ServiceMonitor CR by running the following command:

      $ oc create -f servicemonitor-spire-server.yaml

      After the ServiceMonitor CR is created, the user workload Prometheus instance begins metrics collection from the SPIRE Server. The collected metrics are labeled with job="spire-server".

Verification
  1. In the OpenShift Container Platform web console, navigate to Observe → Targets.

  2. In the Label filter field, enter the following label to filter the metrics targets:

    service=spire-server
  3. Confirm that the Status column shows Up for the spire-server-metrics entry.
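
If the target does not appear, one common cause is that no Service matches the ServiceMonitor selector, or that the matching Service has no port named metrics. The following is a minimal sketch of a command-line check, assuming the labels and namespace from the example ServiceMonitor above; the function name has_metrics_port is hypothetical.

```shell
# Hypothetical helper: list the port names of every Service that the
# ServiceMonitor selector would match, and succeed only if one of
# them is named "metrics".
has_metrics_port() {
  local component="$1"   # "server" or "agent"
  local ports
  ports=$(oc -n zero-trust-workload-identity-manager get service \
    -l "app.kubernetes.io/name=${component},app.kubernetes.io/instance=spire" \
    -o jsonpath='{.items[*].spec.ports[*].name}') || return 1
  # jsonpath prints the names separated by spaces; match a whole word.
  case " ${ports} " in
    *" metrics "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   has_metrics_port server && echo "Service exposes a metrics port"
```

The same helper applies to the SPIRE Agent by passing agent as the argument, because both ServiceMonitor examples select on the same pair of labels.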

Configuring metrics collection for SPIRE Agent by using a ServiceMonitor

The SPIRE Agent operand exposes metrics by default on port 9402 at the /metrics endpoint. You can configure metrics collection for the SPIRE Agent by creating a ServiceMonitor custom resource (CR), which enables the Prometheus Operator to collect custom metrics.

Prerequisites
  • You have access to the cluster as a user with the cluster-admin cluster role.

  • You have installed the Zero Trust Workload Identity Manager.

  • You have deployed the SPIRE Agent operand in the cluster.

  • You have enabled the user workload monitoring.

Procedure
  1. Create the ServiceMonitor CR:

    1. Create the YAML file that defines the ServiceMonitor CR:

      Example servicemonitor-spire-agent.yaml file
      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        labels:
          app.kubernetes.io/name: agent
          app.kubernetes.io/instance: spire
        name: spire-agent-metrics
        namespace: zero-trust-workload-identity-manager
      spec:
        endpoints:
        - port: metrics
          interval: 30s
          path: /metrics
        selector:
          matchLabels:
            app.kubernetes.io/name: agent
            app.kubernetes.io/instance: spire
        namespaceSelector:
          matchNames:
          - zero-trust-workload-identity-manager
    2. Create the ServiceMonitor CR by running the following command:

      $ oc create -f servicemonitor-spire-agent.yaml

      After the ServiceMonitor CR is created, the user workload Prometheus instance begins metrics collection from the SPIRE Agent. The collected metrics are labeled with job="spire-agent".

Verification
  1. In the OpenShift Container Platform web console, navigate to Observe → Targets.

  2. In the Label filter field, enter the following label to filter the metrics targets:

    service=spire-agent
  3. Confirm that the Status column shows Up for the spire-agent-metrics entry.

Configuring metrics collection for the Operator by using a ServiceMonitor

The Zero Trust Workload Identity Manager exposes metrics by default on port 8443 at the /metrics service endpoint. You can configure metrics collection for the Operator by creating a ServiceMonitor custom resource (CR) that enables the Prometheus Operator to collect custom metrics. For more information, see "Configuring user workload monitoring".

Prerequisites
  • You have access to the cluster as a user with the cluster-admin cluster role.

  • You have installed the Zero Trust Workload Identity Manager.

  • You have enabled the user workload monitoring.

Procedure
  1. Configure the Operator to use HTTP or HTTPS protocols for the metrics server.

    1. Update the subscription object for Zero Trust Workload Identity Manager to configure the HTTP protocol by running the following command:

      $ oc -n zero-trust-workload-identity-manager patch subscription zero-trust-workload-identity-manager-subscription --type='merge' -p '{"spec":{"config":{"env":[{"name":"METRICS_BIND_ADDRESS","value":":8080"}, {"name": "METRICS_SECURE", "value": "false"}]}}}'
    2. Verify that the Zero Trust Workload Identity Manager pod is redeployed and that the configured values for METRICS_BIND_ADDRESS and METRICS_SECURE are updated by running the following command:

      $ oc set env --list deployment/zero-trust-workload-identity-manager-controller-manager -n zero-trust-workload-identity-manager | grep -e METRICS_BIND_ADDRESS -e METRICS_SECURE -e container
      Example output
      deployments/zero-trust-workload-identity-manager-controller-manager, container manager
      METRICS_BIND_ADDRESS=:8080
      METRICS_SECURE=false
  2. Create the Secret resource with the kubernetes.io/service-account.name annotation to inject the token required for authenticating with the metrics server.

    1. Create the secret-zero-trust-workload-identity-manager.yaml YAML file:

      apiVersion: v1
      kind: Secret
      metadata:
       labels:
         name: zero-trust-workload-identity-manager
       name: zero-trust-workload-identity-manager-metrics-auth
       namespace: zero-trust-workload-identity-manager
       annotations:
         kubernetes.io/service-account.name: zero-trust-workload-identity-manager-controller-manager
      type: kubernetes.io/service-account-token
    2. Create the Secret resource by running the following command:

      $ oc apply -f secret-zero-trust-workload-identity-manager.yaml
  3. Create the ClusterRoleBinding resource required for granting permissions to access the metrics.

    1. Create the clusterrolebinding-zero-trust-workload-identity-manager.yaml YAML file:

      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRoleBinding
      metadata:
       labels:
         name: zero-trust-workload-identity-manager
       name: zero-trust-workload-identity-manager-allow-metrics-access
      roleRef:
       apiGroup: rbac.authorization.k8s.io
       kind: ClusterRole
       name: zero-trust-workload-identity-manager-metrics-reader
      subjects:
      - kind: ServiceAccount
        name: zero-trust-workload-identity-manager-controller-manager
        namespace: zero-trust-workload-identity-manager
    2. Create the ClusterRoleBinding resource by running the following command:

      $ oc apply -f clusterrolebinding-zero-trust-workload-identity-manager.yaml
  4. Create the following ServiceMonitor CR if the metrics server is configured to use HTTP.

    1. Create the servicemonitor-zero-trust-workload-identity-manager-http.yaml YAML file:

      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        labels:
          name: zero-trust-workload-identity-manager
        name: zero-trust-workload-identity-manager-metrics-monitor
        namespace: zero-trust-workload-identity-manager
      spec:
        endpoints:
          - authorization:
              credentials:
                name: zero-trust-workload-identity-manager-metrics-auth
                key: token
              type: Bearer
            interval: 60s
            path: /metrics
            port: metrics-http
            scheme: http
            scrapeTimeout: 30s
        namespaceSelector:
          matchNames:
            - zero-trust-workload-identity-manager
        selector:
          matchLabels:
            name: zero-trust-workload-identity-manager
    2. Create the ServiceMonitor CR by running the following command:

      $ oc apply -f servicemonitor-zero-trust-workload-identity-manager-http.yaml
  5. Create the following ServiceMonitor CR if the metrics server is configured to use HTTPS.

    1. Create the servicemonitor-zero-trust-workload-identity-manager-https.yaml YAML file:

      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        labels:
          name: zero-trust-workload-identity-manager
        name: zero-trust-workload-identity-manager-metrics-monitor
        namespace: zero-trust-workload-identity-manager
      spec:
        endpoints:
          - authorization:
              credentials:
                name: zero-trust-workload-identity-manager-metrics-auth
                key: token
              type: Bearer
            interval: 60s
            path: /metrics
            port: metrics-https
            scheme: https
            scrapeTimeout: 30s
            tlsConfig:
              ca:
                configMap:
                  name: openshift-service-ca.crt
                  key: service-ca.crt
              serverName: zero-trust-workload-identity-manager-metrics-service.zero-trust-workload-identity-manager.svc.cluster.local
        namespaceSelector:
          matchNames:
            - zero-trust-workload-identity-manager
        selector:
          matchLabels:
            name: zero-trust-workload-identity-manager
    2. Create the ServiceMonitor CR by running the following command:

      $ oc apply -f servicemonitor-zero-trust-workload-identity-manager-https.yaml

      After the ServiceMonitor CR is created, the user workload Prometheus instance begins metrics collection from the Operator. The collected metrics are labeled with job="zero-trust-workload-identity-manager-metrics-service".

Verification
  1. In the OpenShift Container Platform web console, navigate to Observe → Targets.

  2. In the Label filter field, enter the following label to filter the metrics targets:

    service=zero-trust-workload-identity-manager-metrics-service
  3. Confirm that the Status column shows Up for the zero-trust-workload-identity-manager entry.
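
If the target shows Down with an authorization error, it can help to confirm that the Secret from step 2 of the procedure actually holds a token. Kubernetes populates the token key of a kubernetes.io/service-account-token Secret asynchronously, so the key can be empty for a short time after creation. The following is a minimal sketch of that check; the function name metrics_token_present is hypothetical.

```shell
# Hypothetical helper: succeed only if the "token" key of the Secret
# created in the procedure has been populated.
metrics_token_present() {
  local token
  token=$(oc -n zero-trust-workload-identity-manager get secret \
    zero-trust-workload-identity-manager-metrics-auth \
    -o jsonpath='{.data.token}') || return 1
  # An empty string means the key is missing or not yet filled in.
  [ -n "${token}" ]
}

# Usage:
#   metrics_token_present || echo "token not populated yet"
```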

Querying metrics for the Zero Trust Workload Identity Manager

As a cluster administrator, or as a user with view access to all namespaces, you can query SPIRE Agent and SPIRE Server metrics by using the OpenShift Container Platform web console or the command line. The query retrieves all the metrics collected from the SPIRE components that match the specified job labels.

Prerequisites
  • You have access to the cluster as a user with the cluster-admin role.

  • You have installed the Zero Trust Workload Identity Manager.

  • You have deployed the SPIRE Server and SPIRE Agent operands in the cluster.

  • You have enabled monitoring and metrics collection by creating ServiceMonitor objects.

Procedure
  1. In the OpenShift Container Platform web console, navigate to Observe → Metrics.

  2. In the query field, enter the following PromQL expression to query SPIRE Server metrics:

    {job="spire-server"}
  3. In the query field, enter the following PromQL expression to query SPIRE Agent metrics:

    {job="spire-agent"}
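
The same PromQL expressions can also be run from the command line. The following is a minimal sketch, assuming that the cluster exposes the thanos-querier route in the openshift-monitoring namespace and that the logged-in user has a bearer token available through oc whoami -t; the function name query_metrics is hypothetical.

```shell
# Hypothetical helper: run an instant PromQL query through the
# thanos-querier route and print the raw JSON response.
query_metrics() {
  local query="$1"
  local token host
  token=$(oc whoami -t) || return 1
  host=$(oc -n openshift-monitoring get route thanos-querier \
    -o jsonpath='{.spec.host}') || return 1
  # -G sends the URL-encoded query as a GET parameter.
  curl -sS -G -H "Authorization: Bearer ${token}" \
    --data-urlencode "query=${query}" \
    "https://${host}/api/v1/query"
}

# Usage:
#   query_metrics '{job="spire-server"}'
#   query_metrics '{job="spire-agent"}'
```

If the route certificate is not trusted by the local machine, curl rejects the connection; in that case, point curl at the appropriate CA bundle rather than disabling verification.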

Zero Trust Workload Identity Manager monitoring available metrics

Monitor the health and performance of Zero Trust Workload Identity Manager components by reviewing exposed metrics. This reference describes controller, certificate, and runtime metrics that help you maintain system health and troubleshoot errors.

The Zero Trust Workload Identity Manager exposes the following metrics:

Controller runtime metrics
  • controller_runtime_active_workers: Number of currently used workers per controller

  • controller_runtime_max_concurrent_reconciles: Maximum number of concurrent reconciles per controller

  • controller_runtime_reconcile_errors_total: Total number of reconciliation errors per controller

  • controller_runtime_reconcile_time_seconds: Length of time per reconciliation per controller

  • controller_runtime_reconcile_total: Total number of reconciliations per controller
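
As an illustration of how these metrics can be combined, the following sketch builds two PromQL expressions as shell variables so that they can be pasted into the query field or passed to a query tool. It assumes the job label zero-trust-workload-identity-manager-metrics-service from the Operator ServiceMonitor procedure and the controller label that controller-runtime attaches to these metrics; the variable names are illustrative.

```shell
# Assumed job label, taken from the Operator ServiceMonitor procedure.
JOB='zero-trust-workload-identity-manager-metrics-service'

# Reconciliation errors per second for each controller, averaged over 5 minutes.
RECONCILE_ERROR_RATE="sum by (controller) (rate(controller_runtime_reconcile_errors_total{job=\"${JOB}\"}[5m]))"

# Fraction of the configured worker pool that each controller is using.
WORKER_SATURATION="controller_runtime_active_workers{job=\"${JOB}\"} / controller_runtime_max_concurrent_reconciles{job=\"${JOB}\"}"

printf '%s\n' "${RECONCILE_ERROR_RATE}" "${WORKER_SATURATION}"
```

A sustained non-zero error rate, or a saturation value that stays close to 1, suggests that a controller is failing or falling behind and is worth investigating.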

Certificate watcher metrics
  • certwatcher_read_certificate_errors_total: Total number of certificate read errors

  • certwatcher_read_certificate_total: Total number of certificates read

Go runtime metrics

Standard Go runtime metrics including:

  • go_gc_duration_seconds: Garbage collection duration

  • go_goroutines: Number of goroutines

  • go_memstats_*: Memory statistics

  • process_*: Process statistics

Custom Operator metrics

The Operator also exposes custom metrics related to:

  • SPIRE Server status and health

  • SPIRE Agent deployment status

  • SPIFFE CSI Driver status

  • OIDC Discovery Provider status

  • Workload identity management operations