Configuring the SR-IOV Network Operator
To manage SR-IOV network devices and network attachments in your cluster, configure the Single Root I/O Virtualization (SR-IOV) Network Operator. You can then deploy the Operator components to your cluster.
- Create a SriovOperatorConfig custom resource (CR). The following example creates a file named sriovOperatorConfig.yaml:

  apiVersion: sriovnetwork.openshift.io/v1
  kind: SriovOperatorConfig
  metadata:
    name: default
    namespace: openshift-sriov-network-operator
  spec:
    disableDrain: false
    enableInjector: true
    enableOperatorWebhook: true
    logLevel: 2
    featureGates:
      metricsExporter: false
  # ...

  where:
metadata.name
Specifies the name of the SR-IOV Network Operator instance. The only valid name for the SriovOperatorConfig resource is default, and the resource must be created in the namespace where the Operator is deployed.

spec.enableInjector
Specifies whether network-resources-injector pods can run in the namespace. If this field is not specified in the CR, or is explicitly set to false, no network-resources-injector pods run in the namespace. The recommended setting is true.

spec.enableOperatorWebhook
Specifies whether operator-webhook pods can run in the namespace. If this field is not specified in the CR, or is explicitly set to false, no operator-webhook pods run in the namespace. The recommended setting is true.
- Apply the resource to your cluster by running the following command:

  $ oc apply -f sriovOperatorConfig.yaml
SR-IOV Network Operator config custom resource
To customize the SR-IOV Network Operator, configure the SriovOperatorConfig custom resource (CR). The reference lists the parameters available for controlling the global settings and deployment behavior of the Operator.
The following table describes the SriovOperatorConfig CR fields:
| Field | Type | Description |
|---|---|---|
| metadata.name | string | Specifies the name of the SR-IOV Network Operator instance. The default value is default. |
| metadata.namespace | string | Specifies the namespace of the SR-IOV Network Operator instance. The default value is openshift-sriov-network-operator. |
| spec.configDaemonNodeSelector | string | Specifies the node selection to control scheduling the SR-IOV Network Config Daemon on selected nodes. By default, this field is not set and the Operator deploys the SR-IOV Network Config daemon set on compute nodes. |
| spec.disableDrain | boolean | Specifies whether to disable the node draining process or enable the node draining process when you apply a new policy to configure the NIC on a node. Setting this field to true is useful on single-node installations, where a drained node has nowhere to reschedule its workloads. The default value is false. |
| spec.enableInjector | boolean | Specifies whether to enable or disable the Network Resources Injector daemon set. |
| spec.enableOperatorWebhook | boolean | Specifies whether to enable or disable the Operator Admission Controller webhook daemon set. |
| spec.logLevel | integer | Specifies the log verbosity level of the Operator. By default, this field is set to 2. Set this field to 0 to show only basic logs. |
| spec.featureGates | map[string]bool | Specifies whether to enable or disable the optional features. For example, metricsExporter. |
| spec.featureGates.metricsExporter | boolean | Specifies whether to enable or disable the SR-IOV Network Operator metrics. By default, this field is set to false. |
| spec.featureGates.mellanoxFirmwareReset | boolean | Specifies whether to reset the firmware on virtual function (VF) changes in the SR-IOV Network Operator. Some chipsets, such as the Intel C740 Series, do not completely power off the PCI-E devices, which is required to configure VFs on NVIDIA/Mellanox NICs. By default, this field is set to false. Important: The mellanoxFirmwareReset feature gate is a Technology Preview feature only. For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope. |
About the Network Resources Injector
To automate network configuration for your workloads, use the Network Resources Injector. This Kubernetes Dynamic Admission Controller intercepts pod creation requests to automatically inject the necessary network resources and parameters defined for your cluster.
The Network Resources Injector provides the following capabilities:
- Mutation of resource requests and limits in a pod specification to add an SR-IOV resource name according to an SR-IOV network attachment definition annotation.
- Mutation of a pod specification with a Downward API volume to expose pod annotations, labels, and huge pages requests and limits. Containers that run in the pod can access the exposed information as files under the /etc/podnetinfo path.
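As a sketch of the first capability, consider a pod that requests a secondary SR-IOV network through an annotation. The names in this fragment are illustrative: the annotation value refers to a hypothetical SriovNetwork attachment, and the injected resource name depends on the resourceName in your SriovNetworkNodePolicy and the annotation on the network attachment definition.

```yaml
# Pod as submitted: only the secondary network annotation is present.
apiVersion: v1
kind: Pod
metadata:
  name: sample-pod                               # illustrative name
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-network   # illustrative attachment name
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest       # illustrative image
    resources:
      requests: {}
      limits: {}
# After admission, the injector adds the SR-IOV resource that the
# network attachment definition advertises, for example:
#    resources:
#      requests:
#        openshift.io/example_resource: "1"
#      limits:
#        openshift.io/example_resource: "1"
```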
The SR-IOV Network Operator enables the Network Resources Injector when the enableInjector field is set to true in the SriovOperatorConfig CR. The network-resources-injector pods run as a daemon set on all control plane nodes. The following is an example of Network Resources Injector pods running in a cluster with three control plane nodes:
$ oc get pods -n openshift-sriov-network-operator
NAME READY STATUS RESTARTS AGE
network-resources-injector-5cz5p 1/1 Running 0 10m
network-resources-injector-dwqpx 1/1 Running 0 10m
network-resources-injector-lktz5 1/1 Running 0 10m
By default, the failurePolicy field in the Network Resources Injector webhook is set to Ignore. This default setting prevents pod creation from being blocked if the webhook is unavailable.
If you set the failurePolicy field to Fail, and the Network Resources Injector webhook is unavailable, the webhook attempts to mutate all pod creation and update requests. This behavior can block pod creation and disrupt normal cluster operations. To prevent such issues, you can enable the featureGates.resourceInjectorMatchCondition feature in the SriovOperatorConfig object to limit the scope of the Network Resources Injector webhook. If this feature is enabled, the webhook applies only to pods with the secondary network annotation k8s.v1.cni.cncf.io/networks.
If you set the failurePolicy field to Fail after enabling the resourceInjectorMatchCondition feature, the webhook applies only to pods with the secondary network annotation k8s.v1.cni.cncf.io/networks. If the webhook is unavailable, the cluster still deploys pods without this annotation; this prevents unnecessary disruptions to cluster operations.
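The failurePolicy field lives in the webhook configuration object that the Operator manages for the injector. The following fragment is a sketch only; the object and webhook names are illustrative, and you do not edit this object directly:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: network-resources-injector-config   # illustrative name; managed by the Operator
webhooks:
- name: network-resources-injector-config.k8s.io   # illustrative
  failurePolicy: Ignore   # default: pod creation proceeds if the webhook is unavailable
  # With failurePolicy: Fail, requests are rejected when the webhook is
  # unavailable, unless resourceInjectorMatchCondition narrows the scope to
  # pods that carry the k8s.v1.cni.cncf.io/networks annotation.
  # ... (clientConfig, rules, and other fields managed by the Operator)
```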
The featureGates.resourceInjectorMatchCondition feature is disabled by default. To enable this feature, set the featureGates.resourceInjectorMatchCondition field to true in the SriovOperatorConfig object.
SriovOperatorConfig object configuration

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
spec:
  # ...
  featureGates:
    resourceInjectorMatchCondition: true
# ...
Disabling or enabling the Network Resources Injector
To control the automatic configuration of your cluster workloads, enable or disable the Network Resources Injector. By adjusting these settings you can better manage whether the Kubernetes Dynamic Admission Controller automatically injects network resources and parameters into pods during their creation.
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
- You must have installed the SR-IOV Network Operator.
- Set the enableInjector field. Replace <value> with false to disable the feature or true to enable the feature.

  $ oc patch sriovoperatorconfig default \
    --type=merge -n openshift-sriov-network-operator \
    --patch '{ "spec": { "enableInjector": <value> } }'

Tip
You can alternatively apply the following YAML to update the Operator:
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
spec:
  enableInjector: <value>
# ...
About the SR-IOV Network Operator admission controller webhook
To validate network configurations and prevent errors, rely on the SR-IOV Network Operator admission controller webhook. This Kubernetes Dynamic Admission Controller intercepts API requests to ensure that your SR-IOV resource definitions and pod specifications comply with cluster policies.
The SR-IOV Network Operator Admission Controller webhook provides the following capabilities:
- Validation of the SriovNetworkNodePolicy CR when it is created or updated.
- Mutation of the SriovNetworkNodePolicy CR by setting the default value for the priority and deviceType fields when the CR is created or updated.
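As a sketch of the mutation behavior, consider a minimal SriovNetworkNodePolicy that omits both fields. The policy name, resource name, and NIC selector below are illustrative, and the defaults shown in the comments are the values applied in current Operator releases; verify them against your installed version:

```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: example-policy            # illustrative name
  namespace: openshift-sriov-network-operator
spec:
  resourceName: example_resource  # illustrative resource name
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 4
  nicSelector:
    pfNames: ["ens1f0"]           # illustrative physical function name
  # The webhook fills in omitted defaults on admission, for example:
  # priority: 99
  # deviceType: netdevice
```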
The SR-IOV Network Operator Admission Controller webhook is enabled by the Operator when the enableOperatorWebhook field is set to true in the SriovOperatorConfig CR. The operator-webhook pods run as a daemon set on all control plane nodes.
Note
Use caution when disabling the SR-IOV Network Operator Admission Controller webhook. You can disable the webhook under specific circumstances, such as troubleshooting, or if you want to use unsupported devices. For information about configuring unsupported devices, see "Configuring the SR-IOV Network Operator to use an unsupported NIC".
The following is an example of the Operator Admission Controller webhook pods running in a cluster with three control plane nodes:
$ oc get pods -n openshift-sriov-network-operator
NAME READY STATUS RESTARTS AGE
operator-webhook-9jkw6 1/1 Running 0 16m
operator-webhook-kbr5p 1/1 Running 0 16m
operator-webhook-rpfrl 1/1 Running 0 16m
Disabling or enabling the SR-IOV Network Operator admission controller webhook
To manage validation of your network configurations, enable or disable the SR-IOV Network Operator admission controller webhook. When enabled, the Operator automatically verifies SR-IOV resource definitions and pod specifications against cluster policies.
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
- You must have installed the SR-IOV Network Operator.
- Set the enableOperatorWebhook field. Replace <value> with false to disable the feature or true to enable it:

  $ oc patch sriovoperatorconfig default --type=merge \
    -n openshift-sriov-network-operator \
    --patch '{ "spec": { "enableOperatorWebhook": <value> } }'

Tip
You can alternatively apply the following YAML to update the Operator:
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
spec:
  enableOperatorWebhook: <value>
# ...
Configuring a custom NodeSelector for the SR-IOV Network Config daemon
To specify which nodes host the SR-IOV Network Config daemon, configure a custom node selector by using node labels. By completing this task, you can restrict deployment to specific nodes instead of the default compute nodes.
The SR-IOV Network Config daemon discovers and configures the SR-IOV network devices on cluster nodes. By default, the daemon is deployed to all the compute nodes in the cluster.
Important
When you update the configDaemonNodeSelector field, the SR-IOV Network Config daemon is recreated on each selected node.
While the daemon is recreated, cluster users are unable to apply any new SR-IOV Network node policy or create new SR-IOV pods.
- To update the node selector for the Operator, enter the following command:

  $ oc patch sriovoperatorconfig default --type=json \
    -n openshift-sriov-network-operator \
    --patch '[{ "op": "replace", "path": "/spec/configDaemonNodeSelector", "value": {<node_label>} }]'

  Replace <node_label> with a label to apply as in the following example: "node-role.kubernetes.io/worker": "".

Tip
You can alternatively apply the following YAML to update the Operator:
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
spec:
  configDaemonNodeSelector:
    <node_label>
# ...
Configuring the SR-IOV Network Operator for single node installations
By default, the SR-IOV Network Operator drains workloads from a node before every policy change to ensure that no workloads are using the virtual functions during the reconfiguration. On a single-node installation, there are no other nodes to receive the workloads, so you must configure the Operator not to drain workloads from the node.
Important
After performing the following procedure to disable draining workloads, you must remove any workload that uses an SR-IOV network interface before you change any SR-IOV network node policy.
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
- You must have installed the SR-IOV Network Operator.
- To set the disableDrain field to true and the configDaemonNodeSelector field to node-role.kubernetes.io/master: "", enter the following command:

  $ oc patch sriovoperatorconfig default --type=merge \
    -n openshift-sriov-network-operator \
    --patch '{ "spec": { "disableDrain": true, "configDaemonNodeSelector": { "node-role.kubernetes.io/master": "" } } }'

Tip
You can alternatively apply the following YAML to update the Operator:
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
spec:
  disableDrain: true
  configDaemonNodeSelector:
    node-role.kubernetes.io/master: ""
# ...
Deploying the SR-IOV Operator for hosted control planes
After you configure and deploy your hosting service cluster, you can create a subscription to the SR-IOV Operator on a hosted cluster. The SR-IOV pod runs on worker machines rather than the control plane.
You must configure and deploy the hosted cluster on AWS.
- Create a namespace and an Operator group:

  apiVersion: v1
  kind: Namespace
  metadata:
    name: openshift-sriov-network-operator
  ---
  apiVersion: operators.coreos.com/v1
  kind: OperatorGroup
  metadata:
    name: sriov-network-operators
    namespace: openshift-sriov-network-operator
  spec:
    targetNamespaces:
    - openshift-sriov-network-operator

- Create a subscription to the SR-IOV Operator:

  apiVersion: operators.coreos.com/v1alpha1
  kind: Subscription
  metadata:
    name: sriov-network-operator-subscription
    namespace: openshift-sriov-network-operator
  spec:
    channel: stable
    name: sriov-network-operator
    config:
      nodeSelector:
        node-role.kubernetes.io/worker: ""
    source: redhat-operators
    sourceNamespace: openshift-marketplace
- To verify that the SR-IOV Operator is ready, run the following command and view the resulting output:

  $ oc get csv -n openshift-sriov-network-operator

  Example output

  NAME                                         DISPLAY                   VERSION               REPLACES                                     PHASE
  sriov-network-operator.4.19.0-202211021237   SR-IOV Network Operator   4.19.0-202211021237   sriov-network-operator.4.19.0-202210290517   Succeeded

- To verify that the SR-IOV pods are deployed, run the following command:

  $ oc get pods -n openshift-sriov-network-operator
About the SR-IOV network metrics exporter
To monitor the networking activity of SR-IOV pods, enable the SR-IOV network metrics exporter. This tool exposes metrics for SR-IOV virtual functions (VFs) in Prometheus format, so that you can query and visualize data through the OpenShift Container Platform web console.
When you query the SR-IOV VF metrics by using the web console, the SR-IOV network metrics exporter fetches and returns the VF network statistics along with the name and namespace of the pod that the VF is attached to.
The following table describes the SR-IOV VF metrics that the metrics exporter reads and exposes in Prometheus format:
| Metric | Description | Example PromQL query to examine the VF metric |
|---|---|---|
| sriov_vf_rx_bytes | Received bytes per virtual function. | rate(sriov_vf_rx_bytes[5m]) |
| sriov_vf_tx_bytes | Transmitted bytes per virtual function. | rate(sriov_vf_tx_bytes[5m]) |
| sriov_vf_rx_packets | Received packets per virtual function. | rate(sriov_vf_rx_packets[5m]) |
| sriov_vf_tx_packets | Transmitted packets per virtual function. | rate(sriov_vf_tx_packets[5m]) |
| sriov_vf_rx_dropped | Dropped packets upon receipt per virtual function. | rate(sriov_vf_rx_dropped[5m]) |
| sriov_vf_tx_dropped | Dropped packets during transmission per virtual function. | rate(sriov_vf_tx_dropped[5m]) |
| sriov_vf_rx_multicast | Received multicast packets per virtual function. | rate(sriov_vf_rx_multicast[5m]) |
| sriov_vf_rx_broadcast | Received broadcast packets per virtual function. | rate(sriov_vf_rx_broadcast[5m]) |
| sriov_kubepoddevice | Virtual functions linked to active pods. | - |
You can also combine these queries by using the kube-state-metrics tool to get more information about the SR-IOV pods. For example, you can use the following query to get the VF network statistics along with the application name from the standard Kubernetes pod label:
(sriov_vf_tx_packets * on (pciAddr,node) group_left(pod,namespace) sriov_kubepoddevice) * on (pod,namespace) group_left (label_app_kubernetes_io_name) kube_pod_labels
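To see why the join in the previous query works, it helps to look at the label sets involved. The series below are illustrative (the label values are made up); the key point, which the query relies on, is that the VF counters carry pciAddr and node labels, while sriov_kubepoddevice additionally carries the pod and namespace of the attached pod:

```text
sriov_vf_tx_packets{pciAddr="0000:5e:01.2",node="worker-0"}  134250
sriov_kubepoddevice{pciAddr="0000:5e:01.2",node="worker-0",pod="testpod",namespace="default"}  1
```

The first `* on (pciAddr,node) group_left(pod,namespace)` match attaches the pod and namespace labels to each counter; the second match then joins against kube_pod_labels on those labels.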
Enabling the SR-IOV network metrics exporter
To enable the SR-IOV network metrics exporter, set the spec.featureGates.metricsExporter field to true. Because the exporter is disabled by default, you must explicitly activate this feature gate to start exposing metrics for your SR-IOV devices.
Important
When the metrics exporter is enabled, the SR-IOV Network Operator deploys the metrics exporter only on nodes with SR-IOV capabilities.
- You have installed the OpenShift CLI (oc).
- You have logged in as a user with cluster-admin privileges.
- You have installed the SR-IOV Network Operator.
- Enable cluster monitoring by running the following command:

  $ oc label ns/openshift-sriov-network-operator openshift.io/cluster-monitoring=true

  To enable cluster monitoring, you must add the openshift.io/cluster-monitoring=true label in the namespace where you have installed the SR-IOV Network Operator.

- Set the spec.featureGates.metricsExporter field to true by running the following command:

  $ oc patch -n openshift-sriov-network-operator sriovoperatorconfig/default \
    --type='merge' -p='{"spec": {"featureGates": {"metricsExporter": true}}}'
- Check that the SR-IOV network metrics exporter is enabled by running the following command:

  $ oc get pods -n openshift-sriov-network-operator

  Example output

  NAME                                     READY   STATUS    RESTARTS   AGE
  operator-webhook-hzfg4                   1/1     Running   0          5d22h
  sriov-network-config-daemon-tr54m        1/1     Running   0          5d22h
  sriov-network-metrics-exporter-z5d7t     1/1     Running   0          10s
  sriov-network-operator-cc6fd88bc-9bsmt   1/1     Running   0          5d22h

  Ensure that the sriov-network-metrics-exporter pod is in the READY state.

- Optional: Examine the SR-IOV virtual function (VF) metrics by using the OpenShift Container Platform web console. For more information, see "Querying metrics".