Manually scaling control plane machines
When installing a cluster on bare-metal infrastructure, you can manually scale the control plane up to four or five nodes. Consider this approach when you need to recover the cluster from a degraded state, perform deep-level debugging, or ensure the stability and security of the control plane in complex scenarios.
Important
Red Hat supports clusters with four or five control plane nodes only on bare-metal infrastructure.
Adding a control plane node to your cluster
When installing a cluster on bare-metal infrastructure, you can manually scale the control plane up to four or five nodes. The example in this procedure uses node-5 as the new control plane node.
Prerequisites

- You have installed a healthy cluster with at least three control plane nodes.
- You have created a single control plane node that you intend to add to your cluster as a postinstallation task.
Procedure

- Retrieve pending Certificate Signing Requests (CSRs) for the new control plane node by entering the following command:

  $ oc get csr | grep Pending

- Approve all pending CSRs for the control plane node by entering the following command:

  $ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve

  Important

  You must approve the CSRs to complete the installation.
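  New control plane nodes typically issue CSRs in two rounds: a client CSR first, and a serving CSR that appears only after the client CSR is approved. If CSRs remain pending after a single approval pass, you can repeat the approval until none are left. The following loop is a minimal sketch, not part of the documented procedure; the retry count and sleep interval are arbitrary assumptions:

  for attempt in {1..30}; do
      # Stop once no CSRs are reported as Pending
      oc get csr | grep -q Pending || break
      # Approve every CSR that has no status yet, as in the step above
      oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' \
          | xargs --no-run-if-empty oc adm certificate approve
      sleep 10
  done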
- Confirm that the control plane node is in the Ready status by entering the following command:

  $ oc get nodes

  Note

  On installer-provisioned infrastructure, the etcd Operator relies on the Machine API to manage the control plane and ensure etcd quorum. The Machine API then uses Machine CRs to represent and manage the underlying control plane nodes.
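  Before creating the CRs in the next step, it can help to list the Machine CRs that already represent the existing control plane nodes and use one as a reference. This listing is a suggestion rather than a required step; the label selector matches the role label shown in the Machine CR example below:

  $ oc get machines.machine.openshift.io -n openshift-machine-api -l machine.openshift.io/cluster-api-machine-role=master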
- Create the BareMetalHost and Machine CRs and link them to the Node CR of the control plane node.

  - Create the BareMetalHost CR with a unique .metadata.name value, as demonstrated in the following example:

    apiVersion: metal3.io/v1alpha1
    kind: BareMetalHost
    metadata:
      name: node-5
      namespace: openshift-machine-api
    spec:
      automatedCleaningMode: metadata
      bootMACAddress: 00:00:00:00:00:02
      bootMode: UEFI
      customDeploy:
        method: install_coreos
      externallyProvisioned: true
      online: true
      userData:
        name: master-user-data-managed
        namespace: openshift-machine-api
    # ...
  - Apply the BareMetalHost CR by entering the following command:

    $ oc apply -f <filename>

    Replace <filename> with the name of the file that contains the BareMetalHost CR.
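    Optionally, confirm that the host registered before continuing; bmh is the built-in short name for baremetalhosts, and a host created with externallyProvisioned: true is expected to report an externally provisioned state. This check is a suggested verification, not part of the documented procedure:

    $ oc get bmh -n openshift-machine-api node-5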
  - Create the Machine CR by using the unique .metadata.name value, as demonstrated in the following example:

    apiVersion: machine.openshift.io/v1beta1
    kind: Machine
    metadata:
      annotations:
        machine.openshift.io/instance-state: externally provisioned
        metal3.io/BareMetalHost: openshift-machine-api/node-5
      finalizers:
      - machine.machine.openshift.io
      labels:
        machine.openshift.io/cluster-api-cluster: <cluster_name>
        machine.openshift.io/cluster-api-machine-role: master
        machine.openshift.io/cluster-api-machine-type: master
      name: node-5
      namespace: openshift-machine-api
    spec:
      metadata: {}
      providerSpec:
        value:
          apiVersion: baremetal.cluster.k8s.io/v1alpha1
          customDeploy:
            method: install_coreos
          hostSelector: {}
          image:
            checksum: ""
            url: ""
          kind: BareMetalMachineProviderSpec
          metadata:
            creationTimestamp: null
          userData:
            name: master-user-data-managed
    # ...

    Replace <cluster_name> with the name of the specific cluster, for example, test-day2-1-6qv96.
    Get the cluster name by running the following command:

    $ oc get infrastructure cluster -o=jsonpath='{.status.infrastructureName}{"\n"}'
  - Apply the Machine CR by entering the following command:

    $ oc apply -f <filename>

    Replace <filename> with the name of the file that contains the Machine CR.
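    Optionally, confirm that the Machine CR was created and carries the master role labels. This verification is a suggested sanity check rather than part of the documented procedure:

    $ oc get machines.machine.openshift.io -n openshift-machine-api node-5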
  - Link the BareMetalHost, Machine, and Node objects by running the link-machine-and-node.sh script:

    - Copy the following link-machine-and-node.sh script to a local machine:

#!/bin/bash
# Credit goes to
# https://bugzilla.redhat.com/show_bug.cgi?id=1801238.
# This script will link Machine object
# and Node object. This is needed
# in order to have IP address of
# the Node present in the status of the Machine.

set -e

machine="$1"
node="$2"

if [ -z "$machine" ] || [ -z "$node" ]; then
    echo "Usage: $0 MACHINE NODE"
    exit 1
fi

node_name=$(echo "${node}" | cut -f2 -d':')

oc proxy &
proxy_pid=$!
function kill_proxy {
    kill $proxy_pid
}
trap kill_proxy EXIT SIGINT

HOST_PROXY_API_PATH="http://localhost:8001/apis/metal3.io/v1alpha1/namespaces/openshift-machine-api/baremetalhosts"

function print_nics() {
    local ips
    local eob
    declare -a ips
    readarray -t ips < <(echo "${1}" \
        | jq '.[] | select(. | .type == "InternalIP") | .address' \
        | sed 's/"//g')

    eob=','
    for (( i=0; i<${#ips[@]}; i++ )); do
        if [ $((i+1)) -eq ${#ips[@]} ]; then
            eob=""
        fi
        cat <<- EOF
          {
            "ip": "${ips[$i]}",
            "mac": "00:00:00:00:00:00",
            "model": "unknown",
            "speedGbps": 10,
            "vlanId": 0,
            "pxe": true,
            "name": "eth1"
          }${eob}
EOF
    done
}

function wait_for_json() {
    local name
    local url
    local curl_opts
    local timeout

    local start_time
    local curr_time
    local time_diff

    name="$1"
    url="$2"
    timeout="$3"
    shift 3
    curl_opts="$@"
    echo -n "Waiting for $name to respond"
    start_time=$(date +%s)
    until curl -g -X GET "$url" "${curl_opts[@]}" 2> /dev/null | jq '.' 2> /dev/null > /dev/null; do
        echo -n "."
        curr_time=$(date +%s)
        time_diff=$((curr_time - start_time))
        if [[ $time_diff -gt $timeout ]]; then
            printf '\nTimed out waiting for %s' "${name}"
            return 1
        fi
        sleep 5
    done
    echo " Success!"
    return 0
}

wait_for_json oc_proxy "${HOST_PROXY_API_PATH}" 10 -H "Accept: application/json" -H "Content-Type: application/json"

addresses=$(oc get node -n openshift-machine-api "${node_name}" -o json | jq -c '.status.addresses')

machine_data=$(oc get machines.machine.openshift.io -n openshift-machine-api -o json "${machine}")
host=$(echo "$machine_data" | jq '.metadata.annotations["metal3.io/BareMetalHost"]' | cut -f2 -d/ | sed 's/"//g')

if [ -z "$host" ]; then
    echo "Machine $machine is not linked to a host yet." 1>&2
    exit 1
fi

# The address structure on the host doesn't match the node, so extract
# the values we want into separate variables so we can build the patch
# we need.
hostname=$(echo "${addresses}" | jq '.[] | select(. | .type == "Hostname") | .address' | sed 's/"//g')

set +e
read -r -d '' host_patch << EOF
{
  "status": {
    "hardware": {
      "hostname": "${hostname}",
      "nics": [
$(print_nics "${addresses}")
      ],
      "systemVendor": {
        "manufacturer": "Red Hat",
        "productName": "product name",
        "serialNumber": ""
      },
      "firmware": {
        "bios": {
          "date": "04/01/2014",
          "vendor": "SeaBIOS",
          "version": "1.11.0-2.el7"
        }
      },
      "ramMebibytes": 0,
      "storage": [],
      "cpu": {
        "arch": "x86_64",
        "model": "Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz",
        "clockMegahertz": 2199.998,
        "count": 4,
        "flags": []
      }
    }
  }
}
EOF
set -e

echo "PATCHING HOST"
echo "${host_patch}" | jq .

curl -s \
    -X PATCH \
    "${HOST_PROXY_API_PATH}/${host}/status" \
    -H "Content-type: application/merge-patch+json" \
    -d "${host_patch}"

oc get baremetalhost -n openshift-machine-api -o yaml "${host}"
    - Make the script executable by entering the following command:

      $ chmod +x link-machine-and-node.sh

    - Run the script by entering the following command:

      $ bash link-machine-and-node.sh node-5 node-5

      Note

      The first node-5 instance represents the machine, and the second instance represents the node.
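  Because the script's stated purpose is to make the node's IP addresses appear in the machine status, you can verify the result by reading that field back. This check is a suggested follow-up, not part of the documented procedure:

  $ oc get machines.machine.openshift.io -n openshift-machine-api node-5 -o jsonpath='{.status.addresses}{"\n"}'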
- Confirm the members of etcd by connecting to one of the pre-existing control plane nodes:

  - Open a remote shell session to the control plane node by entering the following command:

    $ oc rsh -n openshift-etcd etcd-node-0

  - List the etcd members by entering the following command:

    # etcdctl member list -w table
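    From the same remote shell, you can also inspect per-endpoint status, which reports the leader flag and raft index for each member and helps confirm that the new member is in sync. This status check is an optional addition to the documented procedure:

    # etcdctl endpoint status --cluster -w table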
- Monitor the etcd Operator configuration process until completion by entering the following command. The expected output shows False under the PROGRESSING column.

  $ oc get clusteroperator etcd
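  Instead of polling by hand, you can block until the Operator stops progressing. The following oc wait invocation is a convenience sketch; the 20-minute timeout is an arbitrary assumption:

  $ oc wait --for=condition=Progressing=False clusteroperator/etcd --timeout=20m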
- Confirm etcd health by running the following commands:

  - Open a remote shell session to the control plane node by entering the following command:

    $ oc rsh -n openshift-etcd etcd-node-0

  - Check endpoint health by entering the following command. The expected output shows is healthy for the endpoint.

    # etcdctl endpoint health
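    To check every member rather than only the default endpoint, you can query the full member list. This variant is an optional addition to the documented check:

    # etcdctl endpoint health --cluster -w table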
- Verify that all nodes are ready by entering the following command. The expected output shows the Ready status beside each node entry.

  $ oc get nodes
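  If you prefer to wait on the new node specifically instead of scanning the full list, the following sketch blocks until node-5 reports Ready; the 10-minute timeout is an arbitrary assumption:

  $ oc wait --for=condition=Ready node/node-5 --timeout=10m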
- Verify that the cluster Operators are all available by entering the following command. The expected output lists each Operator and shows the available status as True beside each listed Operator.

  $ oc get ClusterOperators
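  With several dozen Operators in the list, filtering for anything that is not fully settled can save scanning. The following one-liner is a rough sketch that assumes the default column order of NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED; an empty result means every Operator is available and stable:

  $ oc get clusteroperators --no-headers | awk '$3 != "True" || $4 != "False" || $5 != "False"'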
- Verify that the cluster version is correct by entering the following command:

  $ oc get ClusterVersion

  Example output

  NAME      VERSION             AVAILABLE   PROGRESSING   SINCE   STATUS
  version   <cluster_version>   True        False         5h57m   Cluster version is <cluster_version>
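  If you need the version string alone, for scripting or comparison, you can extract it with a JSONPath query. This is an optional sketch, not part of the documented procedure:

  $ oc get clusterversion version -o jsonpath='{.status.desired.version}{"\n"}'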