Skip to content

Exposing custom metrics for virtual machines

OpenShift Container Platform includes a preconfigured, preinstalled, and self-updating monitoring stack that provides monitoring for core platform components. This monitoring stack is based on the Prometheus monitoring system. Prometheus is a time-series database and a rule evaluation engine for metrics.

In addition to using the OpenShift Container Platform monitoring stack, you can enable monitoring for user-defined projects by using the CLI and query custom metrics that are exposed for virtual machines through the node-exporter service.

Configuring the node exporter service

The node-exporter agent is deployed on every virtual machine in the cluster from which you want to collect metrics. Configure the node-exporter agent as a service to expose internal metrics and processes that are associated with virtual machines.

Prerequisites
  • Install the OpenShift CLI (oc).

  • Log in to the cluster as a user with cluster-admin privileges.

  • Create the cluster-monitoring-config ConfigMap object in the openshift-monitoring project.

  • Configure the user-workload-monitoring-config ConfigMap object in the openshift-user-workload-monitoring project by setting enableUserWorkload to true.

Procedure
  1. Create the Service YAML file. In the following example, the file is called node-exporter-service.yaml.

    kind: Service
    apiVersion: v1
    metadata:
      name: node-exporter-service 
      namespace: dynamation 
      labels:
        servicetype: metrics 
    spec:
      ports:
        - name: exmet 
          protocol: TCP
          port: 9100 
          targetPort: 9100 
      type: ClusterIP
      selector:
        monitor: metrics 
    1. The node-exporter service that exposes the metrics from the virtual machines.
    2. The namespace where the service is created.
    3. The label for the service. The ServiceMonitor uses this label to match this service.
    4. The name given to the port that exposes metrics on port 9100 for the ClusterIP service.
    5. The target port used by node-exporter-service to listen for requests.
    6. The TCP port number of the virtual machine that is configured with the monitor label.
    7. The label used to match the virtual machine’s pods. In this example, any virtual machine’s pod with the label monitor and a value of metrics will be matched.
  2. Create the node-exporter service:

    $ oc create -f node-exporter-service.yaml

Configuring a virtual machine with the node exporter service

Download the node-exporter file on to the virtual machine. Then, create a systemd service that runs the node-exporter service when the virtual machine boots.

Prerequisites
  • The pods for the component are running in the openshift-user-workload-monitoring project.

  • Grant the monitoring-edit role to users who need to monitor this user-defined project.

Procedure
  1. Log on to the virtual machine.

  2. Download the node-exporter file on to the virtual machine by using the directory path that applies to the version of node-exporter file.

    $ wget https://github.com/prometheus/node_exporter/releases/download/<version>/node_exporter-<version>.linux-<architecture>.tar.gz
  3. Extract the executable and place it in the /usr/bin directory.

    $ sudo tar xvf node_exporter-<version>.linux-<architecture>.tar.gz \
        --directory /usr/bin --strip 1 "*/node_exporter"
  4. Create a node_exporter.service file in this directory path: /etc/systemd/system. This systemd service file runs the node-exporter service when the virtual machine reboots.

    [Unit]
    Description=Prometheus Metrics Exporter
    After=network.target
    StartLimitIntervalSec=0
    
    [Service]
    Type=simple
    Restart=always
    RestartSec=1
    User=root
    ExecStart=/usr/bin/node_exporter
    
    [Install]
    WantedBy=multi-user.target
  5. Enable and start the systemd service.

    $ sudo systemctl enable node_exporter.service
    $ sudo systemctl start node_exporter.service
Verification
  • Verify that the node-exporter agent is reporting metrics from the virtual machine.

    $ curl http://localhost:9100/metrics

    Example output:

    go_gc_duration_seconds{quantile="0"} 1.5244e-05
    go_gc_duration_seconds{quantile="0.25"} 3.0449e-05
    go_gc_duration_seconds{quantile="0.5"} 3.7913e-05

Creating a custom monitoring label for virtual machines

To enable queries to multiple virtual machines from a single service, you can add a custom label in the virtual machine’s YAML file.

Prerequisites
  • Install the OpenShift CLI (oc).

  • Log in as a user with cluster-admin privileges.

  • Access to the web console for stop and restart a virtual machine.

Procedure
  1. Edit the template spec of your virtual machine configuration file. In this example, the label monitor has the value metrics.

    spec:
      template:
        metadata:
          labels:
            monitor: metrics
  2. Stop and restart the virtual machine to create a new pod with the label name given to the monitor label.

Querying the node-exporter service for metrics

Metrics are exposed for virtual machines through an HTTP service endpoint under the /metrics canonical name. When you query for metrics, Prometheus directly scrapes the metrics from the metrics endpoint exposed by the virtual machines and presents these metrics for viewing.

Prerequisites
  • You have access to the cluster as a user with cluster-admin privileges or the monitoring-edit role.

  • You have enabled monitoring for the user-defined project by configuring the node-exporter service.

  • You have installed the OpenShift CLI (oc).

Procedure
  1. Obtain the HTTP service endpoint by specifying the namespace for the service:

    $ oc get service -n <namespace> <node-exporter-service>
  2. To list all available metrics for the node-exporter service, query the metrics resource.

    $ curl http://<172.30.226.162:9100>/metrics | grep -vE "^#|^$"

    Example output:

    node_arp_entries{device="eth0"} 1
    node_boot_time_seconds 1.643153218e+09
    node_context_switches_total 4.4938158e+07
    node_cooling_device_cur_state{name="0",type="Processor"} 0
    node_cooling_device_max_state{name="0",type="Processor"} 0
    node_cpu_guest_seconds_total{cpu="0",mode="nice"} 0
    node_cpu_guest_seconds_total{cpu="0",mode="user"} 0
    node_cpu_seconds_total{cpu="0",mode="idle"} 1.10586485e+06
    node_cpu_seconds_total{cpu="0",mode="iowait"} 37.61
    node_cpu_seconds_total{cpu="0",mode="irq"} 233.91
    node_cpu_seconds_total{cpu="0",mode="nice"} 551.47
    node_cpu_seconds_total{cpu="0",mode="softirq"} 87.3
    node_cpu_seconds_total{cpu="0",mode="steal"} 86.12
    node_cpu_seconds_total{cpu="0",mode="system"} 464.15
    node_cpu_seconds_total{cpu="0",mode="user"} 1075.2
    node_disk_discard_time_seconds_total{device="vda"} 0
    node_disk_discard_time_seconds_total{device="vdb"} 0
    node_disk_discarded_sectors_total{device="vda"} 0
    node_disk_discarded_sectors_total{device="vdb"} 0
    node_disk_discards_completed_total{device="vda"} 0
    node_disk_discards_completed_total{device="vdb"} 0
    node_disk_discards_merged_total{device="vda"} 0
    node_disk_discards_merged_total{device="vdb"} 0
    node_disk_info{device="vda",major="252",minor="0"} 1
    node_disk_info{device="vdb",major="252",minor="16"} 1
    node_disk_io_now{device="vda"} 0
    node_disk_io_now{device="vdb"} 0
    node_disk_io_time_seconds_total{device="vda"} 174
    node_disk_io_time_seconds_total{device="vdb"} 0.054
    node_disk_io_time_weighted_seconds_total{device="vda"} 259.79200000000003
    node_disk_io_time_weighted_seconds_total{device="vdb"} 0.039
    node_disk_read_bytes_total{device="vda"} 3.71867136e+08
    node_disk_read_bytes_total{device="vdb"} 366592
    node_disk_read_time_seconds_total{device="vda"} 19.128
    node_disk_read_time_seconds_total{device="vdb"} 0.039
    node_disk_reads_completed_total{device="vda"} 5619
    node_disk_reads_completed_total{device="vdb"} 96
    node_disk_reads_merged_total{device="vda"} 5
    node_disk_reads_merged_total{device="vdb"} 0
    node_disk_write_time_seconds_total{device="vda"} 240.66400000000002
    node_disk_write_time_seconds_total{device="vdb"} 0
    node_disk_writes_completed_total{device="vda"} 71584
    node_disk_writes_completed_total{device="vdb"} 0
    node_disk_writes_merged_total{device="vda"} 19761
    node_disk_writes_merged_total{device="vdb"} 0
    node_disk_written_bytes_total{device="vda"} 2.007924224e+09
    node_disk_written_bytes_total{device="vdb"} 0

Creating a ServiceMonitor resource for the node exporter service

You can use a Prometheus client library and scrape metrics from the /metrics endpoint to access and view the metrics exposed by the node-exporter service. Use a ServiceMonitor custom resource definition (CRD) to monitor the node exporter service.

Prerequisites
  • You have access to the cluster as a user with cluster-admin privileges or the monitoring-edit role.

  • You have enabled monitoring for the user-defined project by configuring the node-exporter service.

  • You have installed the OpenShift CLI (oc).

Procedure
  1. Create a YAML file for the ServiceMonitor resource configuration. In this example, the service monitor matches any service with the label metrics and queries the exmet port every 30 seconds.

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        k8s-app: node-exporter-metrics-monitor
      name: node-exporter-metrics-monitor 
      namespace: dynamation 
    spec:
      endpoints:
      - interval: 30s 
        port: exmet 
        scheme: http
      selector:
        matchLabels:
          servicetype: metrics
    1. The name of the ServiceMonitor.
    2. The namespace where the ServiceMonitor is created.
    3. The interval at which the port will be queried.
    4. The name of the port that is queried every 30 seconds
  2. Create the ServiceMonitor configuration for the node-exporter service.

    $ oc create -f node-exporter-metrics-monitor.yaml

Accessing the node exporter service outside the cluster

You can access the node-exporter service outside the cluster and view the exposed metrics.

Prerequisites
  • You have access to the cluster as a user with cluster-admin privileges or the monitoring-edit role.

  • You have enabled monitoring for the user-defined project by configuring the node-exporter service.

  • You have installed the OpenShift CLI (oc).

Procedure
  1. Expose the node-exporter service.

    $ oc expose service -n <namespace> <node_exporter_service_name>
  2. Obtain the FQDN (Fully Qualified Domain Name) for the route.

    $ oc get route -o=custom-columns=NAME:.metadata.name,DNS:.spec.host

    Example output:

    NAME                    DNS
    node-exporter-service   node-exporter-service-dynamation.apps.cluster.example.org
  3. Use the curl command to display metrics for the node-exporter service.

    $ curl -s http://node-exporter-service-dynamation.apps.cluster.example.org/metrics

    Example output:

    go_gc_duration_seconds{quantile="0"} 1.5382e-05
    go_gc_duration_seconds{quantile="0.25"} 3.1163e-05
    go_gc_duration_seconds{quantile="0.5"} 3.8546e-05
    go_gc_duration_seconds{quantile="0.75"} 4.9139e-05
    go_gc_duration_seconds{quantile="1"} 0.000189423