Configuring the Distributed Tracing Platform
The Tempo Operator uses a custom resource definition (CRD) file that defines the architecture and configuration settings for creating and deploying the Distributed Tracing Platform resources. You can install the default configuration or modify the file.
Introduction to TempoStack configuration parameters
The TempoStack custom resource (CR) defines the architecture and settings for creating the Distributed Tracing Platform resources. You can modify these parameters to customize your implementation to your business needs.
TempoStack CR

    apiVersion: tempo.grafana.com/v1alpha1
    kind: TempoStack
    metadata:
      name: <name>
    spec:
      storage: {}
      resources: {}
      replicationFactor: 1
      retention:
        global:
          traces: 48h
        perTenant: {}
      template:
        distributor: {}
        ingester: {}
        compactor: {}
        querier: {}
        queryFrontend: {}
        gateway: {}
      limits:
        global:
          ingestion: {}
          query: {}
      observability:
        grafana: {}
        metrics: {}
        tracing: {}
      search: {}
      managementState: managed

- apiVersion: API version to use when creating the object.
- kind: Defines the kind of Kubernetes object to create.
- metadata: Data that uniquely identifies the object, including a name string, UID, and optional namespace. OpenShift Container Platform automatically generates the UID and completes the namespace with the name of the project where the object is created.
- metadata.name: Name of the TempoStack instance.
- spec: Contains all of the configuration parameters of the TempoStack instance. When a common definition for all Tempo components is required, define it in the spec section. When the definition relates to an individual component, place it in the spec.template.<component> section.
- storage: Storage is specified at instance deployment. See the installation page for information about storage options for the instance.
- resources: Defines the compute resources for the Tempo container.
- replicationFactor: Integer value for the number of ingesters that must acknowledge the data from the distributors before accepting a span.
- retention: Configuration options for retention of traces. The default value is 48h.
- template.distributor: Configuration options for the Tempo distributor component.
- template.ingester: Configuration options for the Tempo ingester component.
- template.compactor: Configuration options for the Tempo compactor component.
- template.querier: Configuration options for the Tempo querier component.
- template.queryFrontend: Configuration options for the Tempo query-frontend component.
- template.gateway: Configuration options for the Tempo gateway component.
- limits: Limits ingestion and query rates.
- limits.global.ingestion: Defines ingestion rate limits.
- limits.global.query: Defines query rate limits.
- observability: Configures operands to handle telemetry data.
- search: Configures search capabilities.
- managementState: Defines whether or not this CR is managed by the Operator. The default value is managed.
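
As an illustration of how the common and retention parameters described above might be filled in, the following is a minimal sketch rather than a recommended configuration. The instance name, the secret name, the dev tenant, and the 24h value are hypothetical, and the sketch assumes that perTenant maps a tenant name to the same traces setting that global accepts.

```yaml
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: example              # hypothetical instance name
spec:
  storage:
    secret:
      name: minio            # assumed: an existing object storage secret
      type: s3
  replicationFactor: 1       # ingesters that must acknowledge a span
  retention:
    global:
      traces: 48h            # the default retention stated above
    perTenant:
      dev:                   # hypothetical tenant name
        traces: 24h          # illustrative per-tenant override (assumed shape)
```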
| Parameter | Description | Values | Default value |
|---|---|---|---|
| apiVersion | API version to use when creating the object. | | |
| kind | Defines the kind of the Kubernetes object to create. | | |
| metadata | Data that uniquely identifies the object, including a name string, UID, and optional namespace. | OpenShift Container Platform automatically generates the UID and completes the namespace with the name of the project where the object is created. | |
| name | Name for the object. | Name of your TempoStack instance. | |
| spec | Specification for the object to be created. | Contains all of the configuration parameters for your TempoStack instance. When a common definition for all Tempo components is required, it is defined under the spec section. When the definition relates to an individual component, it is placed under the spec.template.<component> section. | N/A |
| resources | Resources assigned to the TempoStack instance. | | |
| storageSize | Storage size for ingester PVCs. | | |
| replicationFactor | Configuration for the replication factor. | | |
| retention | Configuration options for retention of traces. | | |
| storage | Configuration options that define the storage. | | |
| template.distributor | Configuration options for the Tempo distributor. | | |
| template.ingester | Configuration options for the Tempo ingester. | | |
| template.compactor | Configuration options for the Tempo compactor. | | |
| template.querier | Configuration options for the Tempo querier. | | |
| template.queryFrontend | Configuration options for the Tempo query frontend. | | |
| template.gateway | Configuration options for the Tempo gateway. | | |
Query configuration options
Two components of the Distributed Tracing Platform, the querier and query frontend, manage queries. You can configure both of these components.
The querier component finds the requested trace ID in the ingesters or back-end storage. Depending on the set parameters, the querier component can query the ingesters and also pull bloom filters or indexes from the back end to search blocks in object storage. The querier component exposes an HTTP endpoint at GET /querier/api/traces/<trace_id>, but it is not expected to be used directly. Queries must be sent to the query frontend.
| Parameter | Description | Values |
|---|---|---|
| nodeSelector | The simple form of the node-selection constraint. | type: object |
| replicas | The number of replicas to be created for the component. | type: integer; format: int32 |
| tolerations | Component-specific pod tolerations. | type: array |
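
To make the querier parameters in the table concrete, here is a hedged sketch of a TempoStack fragment that sets all three. It assumes the fields sit directly under spec.template.querier. The replica count and the infra node label and taint are illustrative assumptions, not values taken from this document.

```yaml
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: <name>
spec:
  # ... storage and other settings omitted
  template:
    querier:
      replicas: 2                            # illustrative replica count
      nodeSelector:
        node-role.kubernetes.io/infra: ""    # assumed node label
      tolerations:
        - key: node-role.kubernetes.io/infra # assumed taint on those nodes
          operator: Exists
          effect: NoSchedule
```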
The query frontend component is responsible for sharding the search space for an incoming query. The query frontend exposes traces via a simple HTTP endpoint: GET /api/traces/<trace_id>. Internally, the query frontend component splits the blockID space into a configurable number of shards and then queues these requests. The querier component connects to the query frontend component via a streaming gRPC connection to process these sharded queries.
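| Parameter | Description | Values |
|---|---|---|
| template.queryFrontend | Configuration of the query frontend component. | type: object |
| template.queryFrontend.component.nodeSelector | The simple form of the node selection constraint. | type: object |
| template.queryFrontend.component.replicas | The number of replicas to be created for the query frontend component. | type: integer; format: int32 |
| template.queryFrontend.component.tolerations | Pod tolerations specific to the query frontend component. | type: array |
| template.queryFrontend.jaegerQuery | The options specific to the Jaeger Query component. | type: object |
| template.queryFrontend.jaegerQuery.enabled | When enabled, creates the Jaeger Query component. | type: boolean |
| template.queryFrontend.jaegerQuery.ingress | The options for the Jaeger Query ingress. | type: object |
| template.queryFrontend.jaegerQuery.ingress.annotations | The annotations of the ingress object. | type: object |
| template.queryFrontend.jaegerQuery.ingress.host | The hostname of the ingress object. | type: string |
| template.queryFrontend.jaegerQuery.ingress.ingressClassName | The name of an IngressClass cluster resource. Defines which ingress controller serves this ingress resource. | type: string |
| template.queryFrontend.jaegerQuery.ingress.route | The options for the OpenShift route. | type: object |
| template.queryFrontend.jaegerQuery.ingress.route.termination | The termination type. The default is edge. | type: string (enum: insecure, edge, passthrough, reencrypt) |
| template.queryFrontend.jaegerQuery.ingress.type | The type of ingress for the Jaeger Query UI. The supported types are ingress and route. | type: string (enum: ingress, route) |
| template.queryFrontend.jaegerQuery.monitorTab | The monitor tab configuration. | type: object |
| template.queryFrontend.jaegerQuery.monitorTab.enabled | Enables the monitor tab in the Jaeger console. The Prometheus endpoint must be configured. | type: boolean |
| template.queryFrontend.jaegerQuery.monitorTab.prometheusEndpoint | The endpoint to the Prometheus instance that contains the span rate, error, and duration (RED) metrics. For example, https://thanos-querier.openshift-monitoring.svc.cluster.local:9092. | type: string |
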
TempoStack CR

    apiVersion: tempo.grafana.com/v1alpha1
    kind: TempoStack
    metadata:
      name: simplest
    spec:
      storage:
        secret:
          name: minio
          type: s3
      storageSize: 200M
      resources:
        total:
          limits:
            memory: 2Gi
            cpu: 2000m
      template:
        queryFrontend:
          jaegerQuery:
            enabled: true
            ingress:
              route:
                termination: edge
              type: route

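
The example above exposes the Jaeger Query UI through an OpenShift route. For comparison, the following sketch shows how the ingress-related parameters from the table might be combined when the ingress type is used instead. The hostname and the annotation are hypothetical, and <ingress_class> stands for the name of an IngressClass resource in your cluster.

```yaml
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: <name>
spec:
  # ... storage and other settings omitted
  template:
    queryFrontend:
      jaegerQuery:
        enabled: true
        ingress:
          type: ingress                        # one of the enum values: ingress, route
          host: jaeger.apps.example.com        # hypothetical hostname
          ingressClassName: <ingress_class>    # existing IngressClass resource
          annotations:
            example.com/owner: tracing-team    # hypothetical annotation
```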
Configuring the UI
You can use the distributed tracing UI plugin of the Cluster Observability Operator (COO) as the user interface (UI) for the Red Hat OpenShift Distributed Tracing Platform. For more information about installing and using the distributed tracing UI plugin, see "Distributed tracing UI plugin" in Cluster Observability Operator.
Configuring the Monitor tab in Jaeger UI
You can have the request rate, error, and duration (RED) metrics extracted from traces and visualized through the Jaeger Console in the Monitor tab of the OpenShift Container Platform web console. The metrics are derived from spans in the OpenTelemetry Collector, and Prometheus, which you can deploy in your user-workload monitoring stack, scrapes them from the Collector. The Jaeger UI queries these metrics from the Prometheus endpoint and visualizes them.
Prerequisites

- You have configured the permissions and tenants for the Distributed Tracing Platform. For more information, see "Configuring the permissions and tenants".

Procedure

- In the OpenTelemetryCollector custom resource of the OpenTelemetry Collector, enable the Spanmetrics Connector (spanmetrics), which derives metrics from traces and exports the metrics in the Prometheus format.

  Example OpenTelemetryCollector custom resource for span RED

      apiVersion: opentelemetry.io/v1beta1
      kind: OpenTelemetryCollector
      metadata:
        name: otel
      spec:
        mode: deployment
        observability:
          metrics:
            enableMetrics: true
        config: |
          connectors:
            spanmetrics:
              metrics_flush_interval: 15s

          receivers:
            otlp:
              protocols:
                grpc:
                http:

          exporters:
            prometheus:
              endpoint: 0.0.0.0:8889
              add_metric_suffixes: false
              resource_to_telemetry_conversion:
                enabled: true

            otlp:
              auth:
                authenticator: bearertokenauth
              endpoint: tempo-redmetrics-gateway.mynamespace.svc.cluster.local:8090
              headers:
                X-Scope-OrgID: dev
              tls:
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
                insecure: false

          extensions:
            bearertokenauth:
              filename: /var/run/secrets/kubernetes.io/serviceaccount/token

          service:
            extensions:
              - bearertokenauth
            pipelines:
              traces:
                receivers: [otlp]
                exporters: [otlp, spanmetrics]
              metrics:
                receivers: [spanmetrics]
                exporters: [prometheus]

      # ...

  - spec.observability.metrics.enableMetrics: Creates the ServiceMonitor custom resource to enable scraping of the Prometheus exporter.
  - connectors.spanmetrics: The Spanmetrics connector receives traces and exports metrics.
  - receivers.otlp: The OTLP receiver receives spans in the OpenTelemetry protocol.
  - exporters.prometheus: The Prometheus exporter exports metrics in the Prometheus format.
  - resource_to_telemetry_conversion: Converts the resource attributes to metric labels. The resource attributes are dropped by default.
  - pipelines.traces.exporters: The Spanmetrics connector is configured as an exporter in the traces pipeline.
  - pipelines.metrics.receivers: The Spanmetrics connector is configured as a receiver in the metrics pipeline.

- In the TempoStack custom resource, enable the Monitor tab and set the Prometheus endpoint to the Thanos querier service to query the data from your user-defined monitoring stack.

  Example TempoStack custom resource with the enabled Monitor tab

      apiVersion: tempo.grafana.com/v1alpha1
      kind: TempoStack
      metadata:
        name: redmetrics
      spec:
        storage:
          secret:
            name: minio-test
            type: s3
        storageSize: 1Gi
        tenants:
          mode: openshift
          authentication:
            - tenantName: dev
              tenantId: "1610b0c3-c509-4592-a256-a1871353dbfa"
        template:
          gateway:
            enabled: true
          queryFrontend:
            jaegerQuery:
              monitorTab:
                enabled: true
                prometheusEndpoint: https://thanos-querier.openshift-monitoring.svc.cluster.local:9092
                redMetricsNamespace: ""

      # ...

  - monitorTab.enabled: Enables the Monitor tab in the Jaeger console.
  - monitorTab.prometheusEndpoint: The service name for Thanos Querier from user-workload monitoring.
  - monitorTab.redMetricsNamespace: Optional: The metrics namespace on which the Jaeger query retrieves the Prometheus metrics. Include this line only if you are using an OpenTelemetry Collector version earlier than 0.109.0. If you are using an OpenTelemetry Collector version 0.109.0 or later, omit this line.

- Optional: Use the span RED metrics generated by the spanmetrics connector with alerting rules. For example, to alert about a slow service or to define service level objectives (SLOs), you can use the duration_bucket histogram and the calls counter metric that the connector creates. These metrics have labels that identify the service, API name, operation type, and other attributes.

  Table 4. Labels of the metrics created in the spanmetrics connector

  | Label | Description | Values |
  |---|---|---|
  | service_name | Service name set by the OTEL_SERVICE_NAME environment variable. | frontend |
  | span_name | Name of the operation. | /, /customer |
  | span_kind | Identifies the server, client, messaging, or internal operation. | SPAN_KIND_SERVER, SPAN_KIND_CLIENT, SPAN_KIND_PRODUCER, SPAN_KIND_CONSUMER, SPAN_KIND_INTERNAL |

  Example PrometheusRule custom resource that defines an alerting rule for SLO when not serving 95% of requests within 2000ms on the front-end service

      apiVersion: monitoring.coreos.com/v1
      kind: PrometheusRule
      metadata:
        name: span-red
      spec:
        groups:
        - name: server-side-latency
          rules:
          - alert: SpanREDFrontendAPIRequestLatency
            expr: histogram_quantile(0.95, sum(rate(duration_bucket{service_name="frontend", span_kind="SPAN_KIND_SERVER"}[5m])) by (le, service_name, span_name)) > 2000
            labels:
              severity: Warning
            annotations:
              summary: "High request latency on {{$labels.service_name}} and {{$labels.span_name}}"
              description: "{{$labels.instance}} has 95th percentile request latency above 2s (current value: {{$value}}ms)"

  - expr: The expression for checking if 95% of the front-end server response time values are below 2000 ms. The time range ([5m]) must be at least four times the scrape interval and long enough to accommodate a change in the metric.

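
The latency rule above uses the duration_bucket histogram. As a complementary sketch, the calls counter could drive an error-rate alert. This example rests on assumptions that the document does not state: that the calls metric carries a status_code label whose error value is STATUS_CODE_ERROR, and that a 5% threshold suits your service. The rule and alert names are hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: span-red-errors                    # hypothetical name
spec:
  groups:
  - name: server-side-errors
    rules:
    - alert: SpanREDFrontendAPIErrorRate   # hypothetical alert name
      # Ratio of failed server spans to all server spans over 5 minutes.
      # Assumes the status_code label and the STATUS_CODE_ERROR value.
      expr: |
        sum(rate(calls{service_name="frontend", span_kind="SPAN_KIND_SERVER", status_code="STATUS_CODE_ERROR"}[5m])) by (service_name, span_name)
          /
        sum(rate(calls{service_name="frontend", span_kind="SPAN_KIND_SERVER"}[5m])) by (service_name, span_name)
          > 0.05
      labels:
        severity: Warning
      annotations:
        summary: "High error rate on {{$labels.service_name}} and {{$labels.span_name}}"
```

Check the label names against the metrics that your Collector actually exports before relying on a rule like this.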
Configuring the receiver TLS
The custom resource of your TempoStack or TempoMonolithic instance supports configuring the TLS for receivers by using user-provided certificates or OpenShift’s service serving certificates.
Receiver TLS configuration for a TempoStack instance
You can provide a TLS certificate in a secret or use the service serving certificates that are generated by OpenShift Container Platform.
- To provide a TLS certificate in a secret, configure it in the TempoStack custom resource.

  Note
  This feature is not supported with the enabled Tempo Gateway.

  TLS for receivers and using a user-provided certificate in a secret

      apiVersion: tempo.grafana.com/v1alpha1
      kind: TempoStack
      # ...
      spec:
      # ...
        template:
          distributor:
            tls:
              enabled: true
              certName: <tls_secret>
              caName: <ca_name>
      # ...

  - tls.enabled: TLS enabled at the Tempo Distributor.
  - tls.certName: Secret containing a tls.key key and tls.crt certificate that you apply in advance.
  - tls.caName: Optional: CA in a config map to enable mutual TLS authentication (mTLS).

- Alternatively, you can use the service serving certificates that are generated by OpenShift Container Platform.

  Note
  Mutual TLS authentication (mTLS) is not supported with this feature.

  TLS for receivers and using the service serving certificates that are generated by OpenShift Container Platform

      apiVersion: tempo.grafana.com/v1alpha1
      kind: TempoStack
      # ...
      spec:
      # ...
        template:
          distributor:
            tls:
              enabled: true
      # ...

  - tls.enabled: Sufficient configuration for the TLS at the Tempo Distributor.

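
For the user-provided certificate option, the secret that certName refers to must exist before you apply the TempoStack custom resource. The following is a minimal sketch of such a secret using the standard Kubernetes TLS secret type. The placeholder values are assumptions to replace with your own data, and the sketch assumes the secret is created in the same project as the TempoStack instance.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: <tls_secret>                        # same name as certName in the TempoStack CR
  namespace: <project_of_tempostack_instance>
type: kubernetes.io/tls
data:
  tls.crt: <base64_encoded_certificate>     # server certificate, base64 encoded
  tls.key: <base64_encoded_private_key>     # private key, base64 encoded
```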
Receiver TLS configuration for a TempoMonolithic instance
You can provide a TLS certificate in a secret or use the service serving certificates that are generated by OpenShift Container Platform.
- To provide a TLS certificate in a secret, configure it in the TempoMonolithic custom resource.

  Note
  This feature is not supported with the enabled Tempo Gateway.

  TLS for receivers and using a user-provided certificate in a secret

      apiVersion: tempo.grafana.com/v1alpha1
      kind: TempoMonolithic
      # ...
      spec:
      # ...
        ingestion:
          otlp:
            grpc:
              tls:
                enabled: true
                certName: <tls_secret>
                caName: <ca_name>
      # ...

  - tls.enabled: TLS enabled at the Tempo Distributor.
  - tls.certName: Secret containing a tls.key key and tls.crt certificate that you apply in advance.
  - tls.caName: Optional: CA in a config map to enable mutual TLS authentication (mTLS).

- Alternatively, you can use the service serving certificates that are generated by OpenShift Container Platform.

  Note
  Mutual TLS authentication (mTLS) is not supported with this feature.

  TLS for receivers and using the service serving certificates that are generated by OpenShift Container Platform

      apiVersion: tempo.grafana.com/v1alpha1
      kind: TempoMonolithic
      # ...
      spec:
      # ...
        ingestion:
          otlp:
            grpc:
              tls:
                enabled: true
            http:
              tls:
                enabled: true
      # ...

  - tls.enabled: Minimal configuration for the TLS at the Tempo Distributor.

Configuring the query RBAC
As an administrator, you can set up the query role-based access control (RBAC) to filter the span attributes for your users by the namespaces for which you granted them permissions.
Note
When you enable the query RBAC, users can still access traces from all namespaces, and the service.name and k8s.namespace.name attributes are also visible to all users.
Prerequisites

- An active OpenShift CLI (oc) session by a cluster administrator with the cluster-admin role.

  Tip

  - Ensure that your OpenShift CLI (oc) version is up to date and matches your OpenShift Container Platform version.
  - Run oc login:

        $ oc login --username=<your_username>

Enable multitenancy and query RBAC in the
TempoStackcustom resource (CR), for example:apiVersion: tempo.grafana.com/v1alpha1 kind: TempoStack metadata: name: simplest namespace: chainsaw-multitenancy spec: storage: secret: name: minio type: s3 storageSize: 1Gi resources: total: limits: memory: 2Gi cpu: 2000m tenants: mode: openshift authentication: - tenantName: dev tenantId: "1610b0c3-c509-4592-a256-a1871353dbfb" template: gateway: enabled: true rbac: enabled: true queryFrontend: jaegerQuery: enabled: false- Always set to
true. - Always set to
true. - Always set to
false.
- Always set to
-
Create a cluster role and cluster role binding to grant the target users the permissions to access the tenant that you specified in the
TempoStackCR, for example:apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: name: tempo-dev-read rules: - apiGroups: [tempo.grafana.com] resources: [dev] resourceNames: [traces] verbs: [get] --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: tempo-dev-read roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: tempo-dev-read subjects: - kind: Group apiGroup: rbac.authorization.k8s.io name: system:authenticated- Tenant name in the
TempoStackCR. - Means all authenticated OpenShift users.
- Tenant name in the
- Grant the target users the permissions to read attributes for the project. You can do this by running the following command:

      $ oc adm policy add-role-to-user view <username> -n <project>

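
If you manage permissions declaratively, the grant in the last step can be sketched as a role binding to the built-in view cluster role. This is an approximation of what the command above creates, not its exact output. The binding name is hypothetical.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tempo-attributes-view          # hypothetical binding name
  namespace: <project>                 # project whose span attributes the user can read
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view                           # built-in read-only cluster role
subjects:
  - kind: User
    apiGroup: rbac.authorization.k8s.io
    name: <username>
```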
Using taints and tolerations
To schedule the TempoStack pods on dedicated nodes, see How to deploy the different TempoStack components on infra nodes using nodeSelector and tolerations in OpenShift 4.
Configuring monitoring and alerts
The Tempo Operator supports monitoring and alerts about each TempoStack component such as distributor, ingester, and so on, and exposes upgrade and operational metrics about the Operator itself.
Configuring the TempoStack metrics and alerts
You can enable metrics and alerts of TempoStack instances.
Prerequisites

- Monitoring for user-defined projects is enabled in the cluster.

Procedure

- To enable metrics of a TempoStack instance, set the spec.observability.metrics.createServiceMonitors field to true:

      apiVersion: tempo.grafana.com/v1alpha1
      kind: TempoStack
      metadata:
        name: <name>
      spec:
        observability:
          metrics:
            createServiceMonitors: true

- To enable alerts for a TempoStack instance, set the spec.observability.metrics.createPrometheusRules field to true:

      apiVersion: tempo.grafana.com/v1alpha1
      kind: TempoStack
      metadata:
        name: <name>
      spec:
        observability:
          metrics:
            createPrometheusRules: true

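
Because both fields belong to the same spec.observability.metrics section, you can likely set them together in one custom resource. The following sketch combines the two settings shown above and adds nothing else.

```yaml
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: <name>
spec:
  observability:
    metrics:
      createServiceMonitors: true   # enables metrics of the TempoStack instance
      createPrometheusRules: true   # enables alerts for the TempoStack instance
```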
You can use the Administrator view of the web console to verify successful configuration:
- Go to Observe → Targets, filter for Source: User, and check that ServiceMonitors in the format tempo-<instance_name>-<component> have the Up status.
- To verify that alerts are set up correctly, go to Observe → Alerting → Alerting rules, filter for Source: User, and check that the Alert rules for the TempoStack instance components are available.
Configuring the Tempo Operator metrics and alerts
When installing the Tempo Operator from the web console, you can select the Enable Operator recommended cluster monitoring on this Namespace checkbox, which enables creating metrics and alerts of the Tempo Operator.
If the checkbox was not selected during installation, you can manually enable metrics and alerts even after installing the Tempo Operator.
- Add the openshift.io/cluster-monitoring: "true" label in the project where the Tempo Operator is installed, which is openshift-tempo-operator by default.
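
As a sketch of where this label ends up, the following fragment shows the metadata of the default Operator project with the label applied. It assumes the default openshift-tempo-operator project name and is meant to illustrate the result, not to replace the existing namespace object.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-tempo-operator             # default project of the Tempo Operator
  labels:
    openshift.io/cluster-monitoring: "true"  # must be a quoted string, not a boolean
```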
You can use the Administrator view of the web console to verify successful configuration:
- Go to Observe → Targets, filter for Source: Platform, and search for tempo-operator, which must have the Up status.
- To verify that alerts are set up correctly, go to Observe → Alerting → Alerting rules, filter for Source: Platform, and locate the Alert rules for the Tempo Operator.