Exposing custom metrics for virtual machines
OpenShift Container Platform includes a preconfigured, preinstalled, and self-updating monitoring stack that provides monitoring for core platform components. This monitoring stack is based on the Prometheus monitoring system. Prometheus is a time-series database and a rule evaluation engine for metrics.
In addition to using the OpenShift Container Platform monitoring stack, you can enable monitoring for user-defined projects by using the CLI and query custom metrics that are exposed for virtual machines through the node-exporter service.
Configuring the node exporter service
The node-exporter agent is deployed on every virtual machine in the cluster from which you want to collect metrics. Configure the node-exporter agent as a service to expose internal metrics and processes that are associated with virtual machines.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in to the cluster as a user with cluster-admin privileges.
- Create the cluster-monitoring-config ConfigMap object in the openshift-monitoring project.
- Configure the user-workload-monitoring-config ConfigMap object in the openshift-user-workload-monitoring project by setting enableUserWorkload to true.
Procedure
- Create the Service YAML file. In the following example, the file is called node-exporter-service.yaml.

    kind: Service
    apiVersion: v1
    metadata:
      name: node-exporter-service
      namespace: dynamation
      labels:
        servicetype: metrics
    spec:
      ports:
        - name: exmet
          protocol: TCP
          port: 9100
          targetPort: 9100
      type: ClusterIP
      selector:
        monitor: metrics

  The fields in this example have the following meanings:
  - metadata.name: The node-exporter service that exposes the metrics from the virtual machines.
  - metadata.namespace: The namespace where the service is created.
  - metadata.labels.servicetype: The label for the service. The ServiceMonitor uses this label to match this service.
  - spec.ports.name: The name given to the port that exposes metrics on port 9100 for the ClusterIP service.
  - spec.ports.port: The target port used by node-exporter-service to listen for requests.
  - spec.ports.targetPort: The TCP port number of the virtual machine that is configured with the monitor label.
  - spec.selector.monitor: The label used to match the virtual machine's pods. In this example, any virtual machine's pod with the label monitor and a value of metrics will be matched.
- Create the node-exporter service:
    $ oc create -f node-exporter-service.yaml
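If the service is needed in more than one project, the manifest can be rendered from a small shell function instead of being edited by hand. The following is a minimal sketch: the function name render_node_exporter_service is hypothetical, and every value other than the namespace is copied from the example above.

```shell
#!/bin/sh
# Hypothetical helper: print the example Service manifest for a given namespace.
# Only the namespace varies; all other values mirror the example in this section.
render_node_exporter_service() {
  namespace=$1
  cat <<EOF
kind: Service
apiVersion: v1
metadata:
  name: node-exporter-service
  namespace: ${namespace}
  labels:
    servicetype: metrics
spec:
  ports:
    - name: exmet
      protocol: TCP
      port: 9100
      targetPort: 9100
  type: ClusterIP
  selector:
    monitor: metrics
EOF
}
```

A possible use, assuming a logged-in oc session, is to write the output to a file and create the object from it: render_node_exporter_service dynamation > node-exporter-service.yaml, followed by oc create -f node-exporter-service.yaml.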
Configuring a virtual machine with the node exporter service
Download the node-exporter file on to the virtual machine. Then, create a systemd service that runs the node-exporter service when the virtual machine boots.
Prerequisites
- The pods for the component are running in the openshift-user-workload-monitoring project.
- Grant the monitoring-edit role to users who need to monitor this user-defined project.
Procedure
- Log on to the virtual machine.
- Download the node-exporter file onto the virtual machine by using the directory path that applies to the version of the node-exporter file.
    $ wget https://github.com/prometheus/node_exporter/releases/download/<version>/node_exporter-<version>.linux-<architecture>.tar.gz
- Extract the executable and place it in the /usr/bin directory. The --wildcards option lets GNU tar treat "*/node_exporter" as a pattern when it selects the archive member to extract.
    $ sudo tar xvf node_exporter-<version>.linux-<architecture>.tar.gz \
        --directory /usr/bin --strip 1 --wildcards "*/node_exporter"
- Create a node_exporter.service file in this directory path: /etc/systemd/system. This systemd service file runs the node-exporter service when the virtual machine reboots.

    [Unit]
    Description=Prometheus Metrics Exporter
    After=network.target
    StartLimitIntervalSec=0

    [Service]
    Type=simple
    Restart=always
    RestartSec=1
    User=root
    ExecStart=/usr/bin/node_exporter

    [Install]
    WantedBy=multi-user.target

- Enable and start the systemd service.
    $ sudo systemctl enable node_exporter.service
    $ sudo systemctl start node_exporter.service
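The download URL in the procedure depends on both the version and the machine architecture. The sketch below shows one way to assemble it. The function names are hypothetical. It also assumes that release tags carry a leading "v" while the archive file name does not, and that x86_64 and aarch64 map to the amd64 and arm64 suffixes; check the project's releases page before relying on either assumption.

```shell
#!/bin/sh
# Hypothetical helper: map `uname -m` output to the suffix assumed in release file names.
node_exporter_arch() {
  case "$1" in
    x86_64)  echo amd64 ;;
    aarch64) echo arm64 ;;
    *)       echo "$1" ;;   # pass anything else through unchanged
  esac
}

# Hypothetical helper: build the download URL for a version such as 1.3.1.
# Assumption: the release tag is "v<version>", the file name uses "<version>".
node_exporter_url() {
  version=$1
  arch=$2
  echo "https://github.com/prometheus/node_exporter/releases/download/v${version}/node_exporter-${version}.linux-${arch}.tar.gz"
}
```

For example, wget "$(node_exporter_url 1.3.1 "$(node_exporter_arch "$(uname -m)")")" would fetch the archive for the local architecture, where 1.3.1 is only an illustrative version number.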
Verification
- Verify that the node-exporter agent is reporting metrics from the virtual machine.
    $ curl http://localhost:9100/metrics
  Example output:
    go_gc_duration_seconds{quantile="0"} 1.5244e-05
    go_gc_duration_seconds{quantile="0.25"} 3.0449e-05
    go_gc_duration_seconds{quantile="0.5"} 3.7913e-05
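The verification step can also be scripted so that it fails when an expected metric is missing. This is a minimal sketch: has_metric is a hypothetical helper that reads metrics text on standard input and succeeds only if a series with exactly the given metric name is present.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the metrics text on stdin contains
# a sample for the metric named in $1 (with or without labels).
has_metric() {
  metric=$1
  # Drop comment lines, then require the exact name followed by "{" or a space.
  grep -v '^#' | grep -qE "^${metric}(\{| )"
}
```

On the virtual machine, a check such as curl -s http://localhost:9100/metrics | has_metric go_gc_duration_seconds returns a zero exit status when the agent is reporting that metric.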
Creating a custom monitoring label for virtual machines
To enable queries to multiple virtual machines from a single service, you can add a custom label in the virtual machine’s YAML file.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
- You have access to the web console to stop and restart a virtual machine.
Procedure
- Edit the template spec of your virtual machine configuration file. In this example, the label monitor has the value metrics.

    spec:
      template:
        metadata:
          labels:
            monitor: metrics

- Stop and restart the virtual machine to create a new pod that carries the monitor label.
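After the restart, it can be useful to confirm that the new pod actually carries the label. The following sketch wraps a label-selector query. The function name list_labeled_pods is hypothetical, the default selector is the monitor=metrics pair from the example above, and a logged-in oc session is assumed.

```shell
#!/bin/sh
# Hypothetical helper: list pods in a namespace that match a label selector.
# $1 = namespace, $2 = selector (defaults to the example label monitor=metrics).
list_labeled_pods() {
  namespace=$1
  selector=${2:-monitor=metrics}
  oc get pods -n "$namespace" -l "$selector" -o name
}
```

For example, list_labeled_pods dynamation prints one line per matching pod. Empty output suggests that the virtual machine has not been restarted since the label was added.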
Querying the node-exporter service for metrics
Metrics are exposed for virtual machines through an HTTP service endpoint under the /metrics canonical name. When you query for metrics, Prometheus directly scrapes the metrics from the metrics endpoint exposed by the virtual machines and presents these metrics for viewing.
Prerequisites
- You have access to the cluster as a user with cluster-admin privileges or the monitoring-edit role.
- You have enabled monitoring for the user-defined project by configuring the node-exporter service.
- You have installed the OpenShift CLI (oc).
Procedure
- Obtain the HTTP service endpoint by specifying the namespace for the service:
    $ oc get service -n <namespace> <node-exporter-service>
- To list all available metrics for the node-exporter service, query the metrics resource.
    $ curl http://<172.30.226.162:9100>/metrics | grep -vE "^#|^$"
  Example output:
    node_arp_entries{device="eth0"} 1
    node_boot_time_seconds 1.643153218e+09
    node_context_switches_total 4.4938158e+07
    node_cooling_device_cur_state{name="0",type="Processor"} 0
    node_cooling_device_max_state{name="0",type="Processor"} 0
    node_cpu_guest_seconds_total{cpu="0",mode="nice"} 0
    node_cpu_guest_seconds_total{cpu="0",mode="user"} 0
    node_cpu_seconds_total{cpu="0",mode="idle"} 1.10586485e+06
    node_cpu_seconds_total{cpu="0",mode="iowait"} 37.61
    node_cpu_seconds_total{cpu="0",mode="irq"} 233.91
    node_cpu_seconds_total{cpu="0",mode="nice"} 551.47
    node_cpu_seconds_total{cpu="0",mode="softirq"} 87.3
    node_cpu_seconds_total{cpu="0",mode="steal"} 86.12
    node_cpu_seconds_total{cpu="0",mode="system"} 464.15
    node_cpu_seconds_total{cpu="0",mode="user"} 1075.2
    node_disk_discard_time_seconds_total{device="vda"} 0
    node_disk_discard_time_seconds_total{device="vdb"} 0
    node_disk_discarded_sectors_total{device="vda"} 0
    node_disk_discarded_sectors_total{device="vdb"} 0
    node_disk_discards_completed_total{device="vda"} 0
    node_disk_discards_completed_total{device="vdb"} 0
    node_disk_discards_merged_total{device="vda"} 0
    node_disk_discards_merged_total{device="vdb"} 0
    node_disk_info{device="vda",major="252",minor="0"} 1
    node_disk_info{device="vdb",major="252",minor="16"} 1
    node_disk_io_now{device="vda"} 0
    node_disk_io_now{device="vdb"} 0
    node_disk_io_time_seconds_total{device="vda"} 174
    node_disk_io_time_seconds_total{device="vdb"} 0.054
    node_disk_io_time_weighted_seconds_total{device="vda"} 259.79200000000003
    node_disk_io_time_weighted_seconds_total{device="vdb"} 0.039
    node_disk_read_bytes_total{device="vda"} 3.71867136e+08
    node_disk_read_bytes_total{device="vdb"} 366592
    node_disk_read_time_seconds_total{device="vda"} 19.128
    node_disk_read_time_seconds_total{device="vdb"} 0.039
    node_disk_reads_completed_total{device="vda"} 5619
    node_disk_reads_completed_total{device="vdb"} 96
    node_disk_reads_merged_total{device="vda"} 5
    node_disk_reads_merged_total{device="vdb"} 0
    node_disk_write_time_seconds_total{device="vda"} 240.66400000000002
    node_disk_write_time_seconds_total{device="vdb"} 0
    node_disk_writes_completed_total{device="vda"} 71584
    node_disk_writes_completed_total{device="vdb"} 0
    node_disk_writes_merged_total{device="vda"} 19761
    node_disk_writes_merged_total{device="vdb"} 0
    node_disk_written_bytes_total{device="vda"} 2.007924224e+09
    node_disk_written_bytes_total{device="vdb"} 0
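The two steps above, looking up the service endpoint and then querying it, can be combined. The sketch below reads the cluster IP of the service with a JSONPath expression and prints the metrics URL. The function name service_metrics_url is hypothetical, port 9100 is taken from the Service example earlier in this document, and a logged-in oc session is assumed.

```shell
#!/bin/sh
# Hypothetical helper: print the in-cluster metrics URL for a service.
# $1 = namespace, $2 = service name, $3 = port (defaults to 9100).
service_metrics_url() {
  namespace=$1
  service=$2
  port=${3:-9100}
  # Read the ClusterIP assigned to the service.
  ip=$(oc get service -n "$namespace" "$service" -o jsonpath='{.spec.clusterIP}') || return 1
  [ -n "$ip" ] || return 1
  echo "http://${ip}:${port}/metrics"
}
```

From a host that can reach the cluster network, curl -s "$(service_metrics_url dynamation node-exporter-service)" | grep -vE "^#|^$" then lists the metrics without comment or blank lines.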
Creating a ServiceMonitor resource for the node exporter service
You can use a Prometheus client library and scrape metrics from the /metrics endpoint to access and view the metrics exposed by the node-exporter service. Use a ServiceMonitor custom resource definition (CRD) to monitor the node exporter service.
Prerequisites
- You have access to the cluster as a user with cluster-admin privileges or the monitoring-edit role.
- You have enabled monitoring for the user-defined project by configuring the node-exporter service.
- You have installed the OpenShift CLI (oc).
Procedure
- Create a YAML file for the ServiceMonitor resource configuration. In this example, the file is called node-exporter-metrics-monitor.yaml. The service monitor matches any service with the label servicetype: metrics and queries the exmet port every 30 seconds.

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        k8s-app: node-exporter-metrics-monitor
      name: node-exporter-metrics-monitor
      namespace: dynamation
    spec:
      endpoints:
        - interval: 30s
          port: exmet
          scheme: http
      selector:
        matchLabels:
          servicetype: metrics

  The fields in this example have the following meanings:
  - metadata.name: The name of the ServiceMonitor.
  - metadata.namespace: The namespace where the ServiceMonitor is created.
  - spec.endpoints.interval: The interval at which the port will be queried.
  - spec.endpoints.port: The name of the port that is queried every 30 seconds.
  - The name of the
- Create the ServiceMonitor configuration for the node-exporter service.
    $ oc create -f node-exporter-metrics-monitor.yaml
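A ServiceMonitor only scrapes a service when its selector matches the service's labels, so a quick consistency check after creating both objects can save debugging time. The following sketch compares the servicetype label on the two objects. The function name monitor_selects_service is hypothetical, the label key servicetype comes from the examples in this document, and a logged-in oc session is assumed.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the ServiceMonitor's matchLabels value for
# "servicetype" equals the same label on the Service.
# $1 = namespace, $2 = service name, $3 = ServiceMonitor name.
monitor_selects_service() {
  namespace=$1
  service=$2
  monitor=$3
  svc_label=$(oc get service -n "$namespace" "$service" \
    -o jsonpath='{.metadata.labels.servicetype}') || return 1
  mon_label=$(oc get servicemonitor -n "$namespace" "$monitor" \
    -o jsonpath='{.spec.selector.matchLabels.servicetype}') || return 1
  # Both values must be present and identical.
  [ -n "$svc_label" ] && [ "$svc_label" = "$mon_label" ]
}
```

For the objects in this document, monitor_selects_service dynamation node-exporter-service node-exporter-metrics-monitor should return a zero exit status. The check covers only this one label, not the port name or other selector fields.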
Accessing the node exporter service outside the cluster
You can access the node-exporter service outside the cluster and view the exposed metrics.
Prerequisites
- You have access to the cluster as a user with cluster-admin privileges or the monitoring-edit role.
- You have enabled monitoring for the user-defined project by configuring the node-exporter service.
- You have installed the OpenShift CLI (oc).
Procedure
- Expose the node-exporter service.
    $ oc expose service -n <namespace> <node_exporter_service_name>
- Obtain the FQDN (Fully Qualified Domain Name) for the route.
    $ oc get route -o=custom-columns=NAME:.metadata.name,DNS:.spec.host
  Example output:
    NAME                    DNS
    node-exporter-service   node-exporter-service-dynamation.apps.cluster.example.org
- Use the curl command to display metrics for the node-exporter service.
    $ curl -s http://node-exporter-service-dynamation.apps.cluster.example.org/metrics
  Example output:
    go_gc_duration_seconds{quantile="0"} 1.5382e-05
    go_gc_duration_seconds{quantile="0.25"} 3.1163e-05
    go_gc_duration_seconds{quantile="0.5"} 3.8546e-05
    go_gc_duration_seconds{quantile="0.75"} 4.9139e-05
    go_gc_duration_seconds{quantile="1"} 0.000189423
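The route lookup and the curl call above can be chained so that the host name does not have to be copied by hand. This is a minimal sketch: route_metrics_url is a hypothetical helper, the plain http scheme matches the example command above, and a logged-in oc session is assumed.

```shell
#!/bin/sh
# Hypothetical helper: print the external metrics URL for an exposed route.
# $1 = namespace, $2 = route name.
route_metrics_url() {
  namespace=$1
  route=$2
  # Read the host name that the router assigned to the route.
  host=$(oc get route -n "$namespace" "$route" -o jsonpath='{.spec.host}') || return 1
  [ -n "$host" ] || return 1
  echo "http://${host}/metrics"
}
```

For example, curl -s "$(route_metrics_url dynamation node-exporter-service)" displays the same metrics as the command in the last procedure step.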