Troubleshooting the control plane machine set
Use the information in this section to understand and recover from issues you might encounter.
Checking the control plane machine set custom resource state
You can verify the existence and state of the ControlPlaneMachineSet custom resource (CR).
- Determine the state of the CR by running the following command:

  $ oc get controlplanemachineset.machine.openshift.io cluster \
      --namespace openshift-machine-api

  - A result of Active indicates that the ControlPlaneMachineSet CR exists and is activated. No administrator action is required.
  - A result of Inactive indicates that a ControlPlaneMachineSet CR exists but is not activated.
  - A result of NotFound indicates that there is no existing ControlPlaneMachineSet CR.
- To use the control plane machine set, you must ensure that a ControlPlaneMachineSet CR with the correct settings for your cluster exists.
- If your cluster has an existing CR, you must verify that the configuration in the CR is correct for your cluster.
- If your cluster does not have an existing CR, you must create one with the correct configuration for your cluster.
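If you must create the CR, the following sketch outlines the overall shape of the resource. The replicas, selector labels, and template structure shown are standard for this resource type, but the provider-specific template contents (elided here) depend on your platform and cluster, so treat this as an outline rather than a complete manifest:

```yaml
apiVersion: machine.openshift.io/v1
kind: ControlPlaneMachineSet
metadata:
  name: cluster
  namespace: openshift-machine-api
spec:
  state: Active
  replicas: 3
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machine-role: master
      machine.openshift.io/cluster-api-machine-type: master
  template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      metadata:
        labels:
          machine.openshift.io/cluster-api-cluster: <cluster_id>
          machine.openshift.io/cluster-api-machine-role: master
          machine.openshift.io/cluster-api-machine-type: master
      spec:
        providerSpec:
          # Platform-specific provider configuration goes here.
```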
Adding a missing Azure internal load balancer
The internalLoadBalancer parameter is required in both the ControlPlaneMachineSet and control plane Machine custom resources (CRs) for Azure. If this parameter is not preconfigured on your cluster, you must add it to both CRs.
For more information about where this parameter is located in the Azure provider specification, see the sample Azure provider specification. The placement in the control plane Machine CR is similar.
- List the control plane machines in your cluster by running the following command:

  $ oc get machines \
      -l machine.openshift.io/cluster-api-machine-role==master \
      -n openshift-machine-api

- For each control plane machine, edit the CR by running the following command:

  $ oc edit machine <control_plane_machine_name>

- Add the internalLoadBalancer parameter with the correct details for your cluster and save your changes.
- Edit your control plane machine set CR by running the following command:

  $ oc edit controlplanemachineset.machine.openshift.io cluster \
      -n openshift-machine-api

- Add the internalLoadBalancer parameter with the correct details for your cluster and save your changes.
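As a rough sketch, the parameter sits alongside the public load balancer reference in the Azure provider spec. The load balancer names below are hypothetical; substitute the names from your own cluster:

```yaml
providerSpec:
  value:
    # ...other Azure provider spec fields...
    internalLoadBalancer: mycluster-abc12-internal  # hypothetical name
    publicLoadBalancer: mycluster-abc12             # hypothetical name
```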
- For clusters that use the default RollingUpdate update strategy, the Operator automatically propagates the changes to your control plane configuration.
- For clusters that are configured to use the OnDelete update strategy, you must replace your control plane machines manually.
Recovering a degraded etcd Operator
Certain situations can cause the etcd Operator to become degraded.
For example, while performing remediation, the machine health check might delete a control plane machine that is hosting etcd. If the etcd member is not reachable at that time, the etcd Operator becomes degraded.
When the etcd Operator is degraded, manual intervention is required to force the Operator to remove the failed member and restore the cluster state.
- List the control plane machines in your cluster by running the following command:

  $ oc get machines \
      -l machine.openshift.io/cluster-api-machine-role==master \
      -n openshift-machine-api \
      -o wide

  Any of the following conditions might indicate a failed control plane machine:

  - The STATE value is stopped.
  - The PHASE value is Failed.
  - The PHASE value is Deleting for more than ten minutes.
Important
Before continuing, ensure that your cluster has two healthy control plane machines. Performing the actions in this procedure on more than one control plane machine risks losing etcd quorum and can cause data loss.
If you have lost the majority of your control plane hosts, leading to etcd quorum loss, then you must follow the disaster recovery procedure "Restoring to a previous cluster state" instead of this procedure.
- Edit the machine CR for the failed control plane machine by running the following command:

  $ oc edit machine <control_plane_machine_name>

- Remove the contents of the lifecycleHooks parameter from the failed control plane machine and save your changes.

  The etcd Operator removes the failed machine from the cluster and can then safely add new etcd members.
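For illustration, the etcd Operator typically registers a preDrain lifecycle hook on each control plane machine, so the stanza you remove might look like the following sketch. The hook name and owner shown here are typical values, but verify them against your own machine CR before editing:

```yaml
spec:
  lifecycleHooks:
    preDrain:
    - name: EtcdQuorumOperator
      owner: clusteroperator/etcd
```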
Upgrading clusters that run on RHOSP
For clusters that run on Red Hat OpenStack Platform (RHOSP) that were created with OpenShift Container Platform 4.13 or earlier, you might have to perform post-upgrade tasks before you can use control plane machine sets.
Configuring RHOSP clusters that have machines with root volume availability zones after an upgrade
For some clusters that run on Red Hat OpenStack Platform (RHOSP) that you upgrade, you must manually update machine resources before you can use control plane machine sets if the following configurations are true:
- The upgraded cluster was created with OpenShift Container Platform 4.13 or earlier.
- The cluster infrastructure is installer-provisioned.
- Machines were distributed across multiple availability zones.
- Machines were configured to use root volumes for which block storage availability zones were not defined.
To understand why this procedure is necessary, see Solution #7024383.
- Edit the provider spec for all control plane machines that match the environment. For example, to edit the machine master-0, enter the following command:

  $ oc edit machine/<cluster_id>-master-0 -n openshift-machine-api

  where:

  <cluster_id>
      Specifies the ID of the upgraded cluster.

- In the provider spec, set the value of the rootVolume.availabilityZone property to the volume availability zone that you want to use.

  An example RHOSP provider spec:

  providerSpec:
    value:
      apiVersion: machine.openshift.io/v1alpha1
      availabilityZone: az0
      cloudName: openstack
      cloudsSecret:
        name: openstack-cloud-credentials
        namespace: openshift-machine-api
      flavor: m1.xlarge
      image: rhcos-4.14
      kind: OpenstackProviderSpec
      metadata:
        creationTimestamp: null
      networks:
      - filter: {}
        subnets:
        - filter:
            name: refarch-lv7q9-nodes
            tags: openshiftClusterID=refarch-lv7q9
      rootVolume:
        availabilityZone: nova # Set the zone name as this value.
        diskSize: 30
        sourceUUID: rhcos-4.12
        volumeType: fast-0
      securityGroups:
      - filter: {}
        name: refarch-lv7q9-master
      serverGroupName: refarch-lv7q9-master
      serverMetadata:
        Name: refarch-lv7q9-master
        openshiftClusterID: refarch-lv7q9
      tags:
      - openshiftClusterID=refarch-lv7q9
      trunk: true
      userDataSecret:
        name: master-user-data

  In your RHOSP cluster, find the availability zone of the root volumes for your machines and use that as the value.

  Note: If you edited or recreated machine resources after your initial cluster deployment, you might have to adapt these steps for your configuration.
- Run the following command to retrieve information about the control plane machine set resource:

  $ oc describe controlplanemachineset.machine.openshift.io/cluster --namespace openshift-machine-api

- Run the following command to edit the resource:

  $ oc edit controlplanemachineset.machine.openshift.io/cluster --namespace openshift-machine-api

- For that resource, set the value of the spec.state property to Active to activate control plane machine sets for your cluster.
Your control plane is ready to be managed by the Cluster Control Plane Machine Set Operator.
Configuring RHOSP clusters that have control plane machines with availability zones after an upgrade
For some clusters that run on Red Hat OpenStack Platform (RHOSP) that you upgrade, you must manually update machine resources before you can use control plane machine sets if the following configurations are true:
- The upgraded cluster was created with OpenShift Container Platform 4.13 or earlier.
- The cluster infrastructure is installer-provisioned.
- Control plane machines were distributed across multiple compute availability zones.
To understand why this procedure is necessary, see Solution #7013893.
- For the master-1 and master-2 control plane machines, open the provider specs for editing. For example, to edit the first machine, enter the following command:

  $ oc edit machine/<cluster_id>-master-1 -n openshift-machine-api

  where:

  <cluster_id>
      Specifies the ID of the upgraded cluster.

- For the master-1 and master-2 control plane machines, edit the value of the serverGroupName property in their provider specs to match that of the machine master-0.

  An example RHOSP provider spec:

  providerSpec:
    value:
      apiVersion: machine.openshift.io/v1alpha1
      availabilityZone: az0
      cloudName: openstack
      cloudsSecret:
        name: openstack-cloud-credentials
        namespace: openshift-machine-api
      flavor: m1.xlarge
      image: rhcos-4.19
      kind: OpenstackProviderSpec
      metadata:
        creationTimestamp: null
      networks:
      - filter: {}
        subnets:
        - filter:
            name: refarch-lv7q9-nodes
            tags: openshiftClusterID=refarch-lv7q9
      securityGroups:
      - filter: {}
        name: refarch-lv7q9-master
      serverGroupName: refarch-lv7q9-master-az0 # This value must match for machines master-0, master-1, and master-2.
      serverMetadata:
        Name: refarch-lv7q9-master
        openshiftClusterID: refarch-lv7q9
      tags:
      - openshiftClusterID=refarch-lv7q9
      trunk: true
      userDataSecret:
        name: master-user-data

  In your RHOSP cluster, find the server group that your control plane instances are in and use that as the value.

  Note: If you edited or recreated machine resources after your initial cluster deployment, you might have to adapt these steps for your configuration.
- Run the following command to retrieve information about the control plane machine set resource:

  $ oc describe controlplanemachineset.machine.openshift.io/cluster --namespace openshift-machine-api

- Run the following command to edit the resource:

  $ oc edit controlplanemachineset.machine.openshift.io/cluster --namespace openshift-machine-api

- For that resource, set the value of the spec.state property to Active to activate control plane machine sets for your cluster.
Your control plane is ready to be managed by the Cluster Control Plane Machine Set Operator.