Troubleshooting network issues
Use the following sections to troubleshoot network issues.
How the network interface is selected
For installations on bare metal or with virtual machines that have more than one network interface controller (NIC), the NIC that OpenShift Container Platform uses for communication with the Kubernetes API server is determined by the nodeip-configuration.service service unit that is run by systemd when the node boots. The nodeip-configuration.service selects the IP from the interface associated with the default route.
After the nodeip-configuration.service service determines the correct NIC, the service creates the /etc/systemd/system/kubelet.service.d/20-nodenet.conf file. The 20-nodenet.conf file sets the KUBELET_NODE_IP environment variable to the IP address that the service selected.
When the kubelet service starts, it reads the value of the environment variable from the 20-nodenet.conf file and sets the IP address as the value of the --node-ip kubelet command-line argument. As a result, the kubelet service uses the selected IP address as the node IP address.
If hardware or networking is reconfigured after installation, or if there is a networking layout where the node IP should not come from the default route interface, it is possible for the nodeip-configuration.service service to select a different NIC after a reboot. In some cases, you might be able to detect that a different NIC is selected by reviewing the INTERNAL-IP column in the output from the oc get nodes -o wide command.
If network communication is disrupted or misconfigured because a different NIC is selected, you might receive the following error: EtcdCertSignerControllerDegraded. You can create a hint file that includes the NODEIP_HINT variable to override the default IP selection logic. For more information, see Optional: Overriding the default node IP selection logic.
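The INTERNAL-IP comparison described above can be scripted. The following is a minimal sketch, assuming you keep two snapshot files of node names and internal IP addresses, one taken before and one after a reboot. The function name `changed_node_ips` and the snapshot file names are hypothetical.

```shell
# Sketch only: compare two "name<TAB>ip" snapshots and report nodes whose
# internal IP changed. One way to capture a snapshot (requires cluster access):
#   oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}' > before.txt
changed_node_ips() {
    # $1 = earlier snapshot file, $2 = later snapshot file (both hypothetical).
    # The first pass stores the earlier IP per node; the second pass prints
    # every node whose IP differs. An empty first file is not handled.
    awk 'NR == FNR { before[$1] = $2; next }
         ($1 in before) && before[$1] != $2 {
             printf "%s: %s -> %s\n", $1, before[$1], $2
         }' "$1" "$2"
}
```

A node that appears in the output is a candidate for the NIC selection problem described in this section. The output alone does not confirm the cause.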
Optional: Overriding the default node IP selection logic
To override the default IP selection logic, create a hint file that includes the NODEIP_HINT variable. The hint file lets you select the node IP address from the interface that is in the same subnet as the IP address specified in the NODEIP_HINT variable.
For example, if a node has two interfaces, eth0 with an address of 10.0.0.10/24 and eth1 with an address of 192.0.2.5/24, and the default route points to eth0 (10.0.0.10), the node would normally use 10.0.0.10 as its node IP address.
You can set the NODEIP_HINT variable to a known IP address in the other subnet, for example, a subnet gateway such as 192.0.2.1, so that the 192.0.2.0/24 subnet is selected. As a result, the 192.0.2.5 IP address on eth1 is used for the node.
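The subnet matching in this example can be illustrated with a short sketch. It only models the idea of choosing the address whose subnet contains the hint, using the interface values from the example above. It is not the implementation that nodeip-configuration.service uses, and the function names are hypothetical.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into four octets
    IFS=$old_ifs
    echo $(( $1 * 16777216 + $2 * 65536 + $3 * 256 + $4 ))
}

# Succeed when address $1 lies in the subnet of $2 (address/prefix notation).
in_subnet() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    prefix=${2#*/}
    # Integer division by the subnet size drops the host bits.
    size=$(( 1 << (32 - $prefix) ))
    [ $(( $addr / $size )) -eq $(( $net / $size )) ]
}

# Read "<interface> <address>/<prefix>" lines on stdin and print the first
# address whose subnet contains the hint address given as $1.
pick_node_ip() {
    hint=$1
    while read -r iface cidr; do
        if in_subnet "$hint" "$cidr"; then
            echo "${cidr%/*}"
            return 0
        fi
    done
    return 1
}

# The two interfaces from the example, with the gateway 192.0.2.1 as the hint.
printf '%s\n' 'eth0 10.0.0.10/24' 'eth1 192.0.2.5/24' | pick_node_ip 192.0.2.1
# prints 192.0.2.5
```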
The following procedure shows how to override the default node IP selection logic.
- Create the content for the /etc/default/nodeip-configuration hint file, for example:

    NODEIP_HINT=192.0.2.1

  Important
  - Do not use the exact IP address of a node as a hint, for example, 192.0.2.5. Using the exact IP address of a node causes the node that uses the hint IP address to fail to configure correctly.
  - The IP address in the hint file is only used to determine the correct subnet. It does not receive traffic as a result of appearing in the hint file.

- Generate the base64-encoded content by running the following command:

    $ echo -n 'NODEIP_HINT=192.0.2.1' | base64 -w0

  Example output

    Tk9ERUlQX0hJTlQ9MTkyLjAuMCxxxx==

- Activate the hint by creating a machine config manifest for both the master and worker roles before deploying the cluster:

  99-nodeip-hint-master.yaml

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: master
      name: 99-nodeip-hint-master
    spec:
      config:
        ignition:
          version: 3.2.0
        storage:
          files:
          - contents:
              source: data:text/plain;charset=utf-8;base64,<encoded_content>
            mode: 0644
            overwrite: true
            path: /etc/default/nodeip-configuration

  99-nodeip-hint-worker.yaml

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker
      name: 99-nodeip-hint-worker
    spec:
      config:
        ignition:
          version: 3.2.0
        storage:
          files:
          - contents:
              source: data:text/plain;charset=utf-8;base64,<encoded_content>
            mode: 0644
            overwrite: true
            path: /etc/default/nodeip-configuration

  where:

  spec.config.storage.files.contents.source.<encoded_content>
    Replace this placeholder with the base64-encoded content of the /etc/default/nodeip-configuration file, for example, Tk9ERUlQX0hJTlQ9MTkyLjAuMCxxxx==. Do not insert a space between the comma and the encoded content.

- Save the manifests to the directory where you store your cluster configuration, for example, ~/clusterconfigs.
- Deploy the cluster.
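The encoding and manifest steps above can be combined in a small script. The following is a minimal sketch, assuming GNU `base64` and a scratch output directory. The function name `render_nodeip_hint` and the `OUT_DIR` variable are hypothetical. The manifest text mirrors the example in the procedure.

```shell
# Sketch: print the hint MachineConfig for one role.
# $1 = role (master or worker), $2 = hint IP address.
render_nodeip_hint() {
    hint_role=$1
    # base64 -w0 (GNU coreutils) keeps the encoded value on a single line.
    hint_encoded=$(printf 'NODEIP_HINT=%s' "$2" | base64 -w0)
    cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: ${hint_role}
  name: 99-nodeip-hint-${hint_role}
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,${hint_encoded}
        mode: 0644
        overwrite: true
        path: /etc/default/nodeip-configuration
EOF
}

# Write one manifest per role into a scratch directory (assumed location).
out_dir=${OUT_DIR:-$(mktemp -d)}
for r in master worker; do
    render_nodeip_hint "$r" 192.0.2.1 > "$out_dir/99-nodeip-hint-$r.yaml"
done
echo "manifests written to $out_dir"
```

Because the script concatenates the prefix and the encoded value directly, no space can slip in after the comma. Review the generated files before you copy them to your cluster configuration directory.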
Configuring OVN-Kubernetes to use a secondary OVS bridge
You can create an additional or secondary Open vSwitch (OVS) bridge, br-ex1, that OVN-Kubernetes manages and the Multiple External Gateways (MEG) implementation uses for defining external gateways for an OpenShift Container Platform node. You can define a MEG in an AdminPolicyBasedExternalRoute custom resource (CR). The MEG implementation provides a pod with access to multiple gateways, equal-cost multipath (ECMP) routes, and the Bidirectional Forwarding Detection (BFD) implementation.
Consider a use case where pods are affected by the Multiple External Gateways (MEG) feature and you want their egress traffic to use a different interface on a node, for example, br-ex1. Egress traffic for pods that are not affected by MEG is routed to the default OVS br-ex bridge.
Important
Currently, MEG is unsupported for use with other egress features, such as egress IP, egress firewalls, or egress routers. Attempting to use MEG with egress features like egress IP can result in routing and traffic flow conflicts. These conflicts occur because of how OVN-Kubernetes handles routing and source network address translation (SNAT). The result is inconsistent routing, which might break connections in some environments where the return path must match the incoming path.
You must define the additional bridge in an interface definition of a machine configuration manifest file. The Machine Config Operator uses the manifest to create a new file at /etc/ovnk/extra_bridge on the host. The new file includes the name of the network interface that the additional OVS bridge configures for a node.
Important
Do not use the nmstate API to make configuration changes to the secondary interface that is defined in the /etc/ovnk/extra_bridge file. The configure-ovs.sh configuration script creates and manages OVS bridge interfaces, so any disruptive changes to these interfaces by the nmstate API can lead to network configuration instability.
After you create and edit the manifest file, the Machine Config Operator completes tasks in the following order:
- Drains nodes one at a time based on the selected machine configuration pool.
- Injects Ignition configuration files into each node, so that each node receives the additional br-ex1 bridge network configuration.
- Verifies that the br-ex MAC address matches the MAC address of the interface that br-ex uses for the network connection.
- Runs the configure-ovs.sh shell script that references the new interface definition.
- Adds br-ex and br-ex1 to the host node.
- Uncordons the nodes.
Note
After all the nodes return to the Ready state and the OVN-Kubernetes Operator detects and configures br-ex and br-ex1, the Operator applies the k8s.ovn.org/l3-gateway-config annotation to each node.
For more information about useful situations for the additional br-ex1 bridge and a situation that always requires the default br-ex bridge, see "Configuration for a localnet topology".
- Optional: Create an interface connection that your additional bridge, br-ex1, can use by completing the following steps. The example steps show the creation of a new bond and its dependent interfaces, which are all defined in a machine configuration manifest file. The additional bridge uses the MachineConfig object to form an additional bond interface.

  Important
  Do not use the Kubernetes NMState Operator or a NodeNetworkConfigurationPolicy (NNCP) manifest file to define the additional interface. When you define a bond interface, ensure that the additional interface or its sub-interfaces are not used by an existing br-ex OVN-Kubernetes network deployment.
  You cannot make configuration changes to the br-ex bridge or its underlying interfaces as a postinstallation task. As a workaround, use a secondary network interface connected to your host or switch.

  - Create the following interface definition files. These files get added to a machine configuration manifest file so that host nodes can access the definition files.

    Example of the first interface definition file that is named eno1.config

      [connection]
      id=eno1
      type=ethernet
      interface-name=eno1
      master=bond1
      slave-type=bond
      autoconnect=true
      autoconnect-priority=20

    Example of the second interface definition file that is named eno2.config

      [connection]
      id=eno2
      type=ethernet
      interface-name=eno2
      master=bond1
      slave-type=bond
      autoconnect=true
      autoconnect-priority=20

    Example of the bond interface definition file that is named bond1.config

      [connection]
      id=bond1
      type=bond
      interface-name=bond1
      autoconnect=true
      connection.autoconnect-slaves=1
      autoconnect-priority=20

      [bond]
      mode=802.3ad
      miimon=100
      xmit_hash_policy="layer3+4"

      [ipv4]
      method=auto

  - Convert the definition files to Base64-encoded strings by running the following commands:

      $ base64 <directory_path>/eno1.config
      $ base64 <directory_path>/eno2.config
      $ base64 <directory_path>/bond1.config
- Prepare the environment variables. Replace <machine_role> with the node role, such as worker, and replace <interface_name> with the name of your additional br-ex bridge.

    $ export ROLE=<machine_role>

- Define each interface definition in a machine configuration manifest file:

  Example of a machine configuration file with definitions added for bond1, eno1, and eno2

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: ${ROLE}
      name: 12-${ROLE}-sec-bridge-cni
    spec:
      config:
        ignition:
          version: 3.2.0
        storage:
          files:
          - contents:
              source: data:;base64,<base-64-encoded-contents-for-bond1.conf>
            path: /etc/NetworkManager/system-connections/bond1.nmconnection
            filesystem: root
            mode: 0600
          - contents:
              source: data:;base64,<base-64-encoded-contents-for-eno1.conf>
            path: /etc/NetworkManager/system-connections/eno1.nmconnection
            filesystem: root
            mode: 0600
          - contents:
              source: data:;base64,<base-64-encoded-contents-for-eno2.conf>
            path: /etc/NetworkManager/system-connections/eno2.nmconnection
            filesystem: root
            mode: 0600
    # ...

- Create the machine configuration object for configuring the network plugin by entering the following command in your terminal:

    $ oc create -f <machine_config_file_name>

- Create an Open vSwitch (OVS) bridge, br-ex1, on nodes by using the OVN-Kubernetes network plugin to create an extra_bridge file. Ensure that you save the file in the /etc/ovnk/extra_bridge path of the host. The file must state the name of the interface that supports the additional bridge, not the default interface that supports br-ex and holds the primary IP address of the node.

  Example configuration for the extra_bridge file, /etc/ovnk/extra_bridge, that references an additional interface

    bond1

- Create a machine configuration manifest file that defines the existing static interface that hosts br-ex1 on any nodes restarted on your cluster:

  Example of a machine configuration file that defines bond1 as the interface for hosting br-ex1

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker
      name: 12-worker-extra-bridge
    spec:
      config:
        ignition:
          version: 3.2.0
        storage:
          files:
            - path: /etc/ovnk/extra_bridge
              mode: 0420
              overwrite: true
              contents:
                source: data:text/plain;charset=utf-8,bond1
              filesystem: root

- Apply the machine configuration to your selected nodes:

    $ oc create -f <machine_config_file_name>

- Optional: You can override the br-ex selection logic for nodes by creating a machine configuration file that in turn creates a /var/lib/ovnk/iface_default_hint resource.

  Note
  The resource lists the name of the interface that br-ex selects for your cluster. By default, br-ex selects the primary interface for a node based on boot order and the IP address subnet in the machine network. Certain machine network configurations might require that br-ex continues to select the default interfaces or bonds for a host node.

  - Create a machine configuration file on the host node to override the default interface.

    Important
    Only create this machine configuration file for the purposes of changing the br-ex selection logic. Using this file to change the IP addresses of existing nodes in your cluster is not supported.

    Example of a machine configuration file that overrides the default interface

      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: worker
        name: 12-worker-br-ex-override
      spec:
        config:
          ignition:
            version: 3.2.0
          storage:
            files:
              - path: /var/lib/ovnk/iface_default_hint
                mode: 0420
                overwrite: true
                contents:
                  source: data:text/plain;charset=utf-8,bond0
                filesystem: root

    Ensure that bond0 exists on the node before you apply the machine configuration file to the node.

  - Before you apply the configuration to all new nodes in your cluster, reboot the host node to verify that br-ex selects the intended interface and does not conflict with the new interfaces that you defined on br-ex1.
  - Apply the machine configuration file to all new nodes in your cluster:

      $ oc create -f <machine_config_file_name>
- Identify the IP addresses of nodes with the exgw-ip-addresses label in your cluster to verify that the nodes use the additional bridge instead of the default bridge:

    $ oc get nodes -o json | grep --color exgw-ip-addresses

  Example output

    "k8s.ovn.org/l3-gateway-config": \"exgw-ip-address\":\"172.xx.xx.yy/24\",\"next-hops\":[\"xx.xx.xx.xx\"],

- Observe that the additional bridge exists on target nodes by reviewing the network interface names on the host node:

    $ oc debug node/<node_name> -- chroot /host sh -c "ip a | grep mtu | grep br-ex"

  Example output

    Starting pod/worker-1-debug ...
    To use host binaries, run `chroot /host`
    # ...
    5: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    6: br-ex1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000

- Optional: If you use /var/lib/ovnk/iface_default_hint, check that the MAC address of br-ex matches the MAC address of the primary selected interface:

    $ oc debug node/<node_name> -- chroot /host sh -c "ip a | grep -A1 -E 'br-ex|bond0'"

  Example output that shows the primary interface for br-ex as bond0

    Starting pod/worker-1-debug ...
    To use host binaries, run `chroot /host`
    # ...
    sh-5.1# ip a | grep -A1 -E 'br-ex|bond0'
    2: bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovs-system state UP group default qlen 1000
        link/ether fa:16:3e:47:99:98 brd ff:ff:ff:ff:ff:ff
    --
    5: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
        link/ether fa:16:3e:47:99:98 brd ff:ff:ff:ff:ff:ff
        inet 10.xx.xx.xx/21 brd 10.xx.xx.255 scope global dynamic noprefixroute br-ex
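The MAC address comparison in the last verification step can be automated instead of read by eye. The following is a minimal sketch, assuming input in the format that `ip a` prints. The function names `iface_macs` and `same_mac` are hypothetical.

```shell
# Read `ip a` style output on stdin and print "<interface> <mac>" pairs.
iface_macs() {
    awk '/^[0-9]+: / {
             name = $2
             sub(/:$/, "", name)    # drop the trailing colon
             sub(/@.*/, "", name)   # drop any "@parent" suffix
         }
         $1 == "link/ether" { print name, $2 }'
}

# Succeed when interfaces $1 and $2 both appear and report the same MAC.
same_mac() {
    iface_macs | awk -v a="$1" -v b="$2" '
        $1 == a { mac_a = $2 }
        $1 == b { mac_b = $2 }
        END { exit !(mac_a != "" && mac_a == mac_b) }'
}
```

For example, piping the output of `oc debug node/<node_name> -- chroot /host ip a` into `same_mac br-ex bond0` returns a zero exit status only when both interfaces are present and share one MAC address.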
Troubleshooting Open vSwitch issues
To troubleshoot some Open vSwitch (OVS) issues, you might need to configure the log level to include more information.
If you modify the log level on a node temporarily, be aware that you can receive log messages from the machine config daemon on the node like the following example:
E0514 12:47:17.998892 2281 daemon.go:1350] content mismatch for file /etc/systemd/system/ovs-vswitchd.service: [Unit]
To avoid the log messages related to the mismatch, revert the log level change after you complete your troubleshooting.
Configuring the Open vSwitch log level temporarily
For short-term troubleshooting, you can configure the Open vSwitch (OVS) log level temporarily. The following procedure does not require rebooting the node. In addition, the configuration change does not persist after you reboot the node.
After you perform this procedure to change the log level, you can receive log messages from the machine config daemon that indicate a content mismatch for the ovs-vswitchd.service.
To avoid the log messages, repeat this procedure and set the log level to the original value.
- You have access to the cluster as a user with the cluster-admin role.
- You have installed the OpenShift CLI (oc).
- Start a debug pod for a node:

    $ oc debug node/<node_name>

- Set /host as the root directory within the debug shell. The debug pod mounts the root file system from the host in /host within the pod. By changing the root directory to /host, you can run binaries from the host file system:

    # chroot /host

- View the current syslog level for OVS modules:

    # ovs-appctl vlog/list

  The following example output shows the log level for syslog set to info.

  Example output

                     console    syslog    file
                     -------    ------    ------
    backtrace          OFF       INFO      INFO
    bfd                OFF       INFO      INFO
    bond               OFF       INFO      INFO
    bridge             OFF       INFO      INFO
    bundle             OFF       INFO      INFO
    bundles            OFF       INFO      INFO
    cfm                OFF       INFO      INFO
    collectors         OFF       INFO      INFO
    command_line       OFF       INFO      INFO
    connmgr            OFF       INFO      INFO
    conntrack          OFF       INFO      INFO
    conntrack_tp       OFF       INFO      INFO
    coverage           OFF       INFO      INFO
    ct_dpif            OFF       INFO      INFO
    daemon             OFF       INFO      INFO
    daemon_unix        OFF       INFO      INFO
    dns_resolve        OFF       INFO      INFO
    dpdk               OFF       INFO      INFO
    ...

- Specify the log level in the /etc/systemd/system/ovs-vswitchd.service.d/10-ovs-vswitchd-restart.conf file:

    Restart=always
    ExecStartPre=-/bin/sh -c '/usr/bin/chown -R :$${OVS_USER_ID##*:} /var/lib/openvswitch'
    ExecStartPre=-/bin/sh -c '/usr/bin/chown -R :$${OVS_USER_ID##*:} /etc/openvswitch'
    ExecStartPre=-/bin/sh -c '/usr/bin/chown -R :$${OVS_USER_ID##*:} /run/openvswitch'
    ExecStartPost=-/usr/bin/ovs-appctl vlog/set syslog:dbg
    ExecReload=-/usr/bin/ovs-appctl vlog/set syslog:dbg

  In the preceding example, the log level is set to dbg. Change the last two lines by setting syslog:<log_level> to off, emer, err, warn, info, or dbg. The off log level filters out all log messages.

- Restart the service:

    # systemctl daemon-reload
    # systemctl restart ovs-vswitchd
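Because you repeat this procedure to restore the original level, it can help to script the edit of the two `vlog/set` lines. The following is a minimal sketch that validates the level against the values listed above and rewrites drop-in text read from standard input. The function names are hypothetical, and the sketch does not touch any file on the node by itself.

```shell
# Succeed when $1 is one of the log levels listed in the procedure.
valid_vlog_level() {
    case $1 in
        off|emer|err|warn|info|dbg) return 0 ;;
        *) return 1 ;;
    esac
}

# Rewrite every "vlog/set syslog:<level>" on stdin so that it uses level $1.
# All other lines pass through unchanged.
set_syslog_level() {
    valid_vlog_level "$1" || { echo "unknown log level: $1" >&2; return 1; }
    sed -E "s#(vlog/set syslog:)[a-z]+#\\1$1#"
}
```

One possible use is to pipe the current drop-in file through `set_syslog_level info`, review the result, and then replace the file before you run the `systemctl` commands.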
Configuring the Open vSwitch log level permanently
For long-term changes to the Open vSwitch (OVS) log level, you can change the log level permanently.
- You have access to the cluster as a user with the cluster-admin role.
- You have installed the OpenShift CLI (oc).
- Create a file, such as 99-change-ovs-loglevel.yaml, with a MachineConfig object like the following example:

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: master
      name: 99-change-ovs-loglevel
    spec:
      config:
        ignition:
          version: 3.2.0
        systemd:
          units:
          - dropins:
            - contents: |
                [Service]
                ExecStartPost=-/usr/bin/ovs-appctl vlog/set syslog:dbg
                ExecReload=-/usr/bin/ovs-appctl vlog/set syslog:dbg
              name: 20-ovs-vswitchd-restart.conf
            name: ovs-vswitchd.service

  where:

  metadata.labels.machineconfiguration.openshift.io/role
    After you perform this procedure to configure control plane nodes, repeat the procedure and set the role to worker to configure worker nodes.
  spec.config.systemd.units.dropins.contents
    Set the syslog:<log_level> value in the ExecStartPost and ExecReload lines. Log levels are off, emer, err, warn, info, or dbg. Setting the value to off filters out all log messages.

- Apply the machine config:

    $ oc apply -f 99-change-ovs-loglevel.yaml
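Because the procedure is repeated for the worker role, the manifest can be generated per role. The following is a minimal sketch with a hypothetical function name. It also assumes a role-specific object name, `99-<role>-change-ovs-loglevel`, so that the two objects do not overwrite each other. That name is not taken from the procedure above.

```shell
# Sketch: print a MachineConfig that sets the OVS syslog level for one role.
# $1 = role (master or worker), $2 = log level.
render_ovs_loglevel() {
    ovs_role=$1
    ovs_level=$2
    case $ovs_level in
        off|emer|err|warn|info|dbg) ;;
        *) echo "unknown log level: $ovs_level" >&2; return 1 ;;
    esac
    # The object name below is an assumption made for this sketch.
    cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: ${ovs_role}
  name: 99-${ovs_role}-change-ovs-loglevel
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
      - dropins:
        - contents: |
            [Service]
            ExecStartPost=-/usr/bin/ovs-appctl vlog/set syslog:${ovs_level}
            ExecReload=-/usr/bin/ovs-appctl vlog/set syslog:${ovs_level}
          name: 20-ovs-vswitchd-restart.conf
        name: ovs-vswitchd.service
EOF
}
```

For example, `render_ovs_loglevel worker dbg > 99-worker-change-ovs-loglevel.yaml` writes a file that you can review and then apply with `oc apply -f`.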
Displaying Open vSwitch logs
Use the following procedure to display Open vSwitch (OVS) logs.
- You have access to the cluster as a user with the cluster-admin role.
- You have installed the OpenShift CLI (oc).
- Run one of the following commands:

  - Display the logs by using the oc command from outside the cluster:

      $ oc adm node-logs <node_name> -u ovs-vswitchd

  - Display the logs after logging on to a node in the cluster:

      # journalctl -b -f -u ovs-vswitchd.service

    One way to log on to a node is by using the oc debug node/<node_name> command.
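With the log level raised to dbg, the output of either command can be large. The following is a minimal sketch of a filter that keeps only warnings and more severe messages. It assumes that each message carries the usual OVS `|<module>|<LEVEL>|` fields. If your log lines use a different layout, adjust the pattern. The function name `ovs_warnings` is hypothetical.

```shell
# Keep only WARN, ERR, and EMER messages from ovs-vswitchd log text on stdin.
# Assumes the severity appears between pipe characters, as in "|bridge|WARN|".
ovs_warnings() {
    grep -E '\|(WARN|ERR|EMER)\|'
}
```

For example, `oc adm node-logs <node_name> -u ovs-vswitchd | ovs_warnings` prints only the matching lines. `grep` exits with a nonzero status when nothing matches.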