Handling machine configuration for hosted control planes
In a standalone OpenShift Container Platform cluster, a machine config pool manages a set of nodes. You can handle a machine configuration by using the MachineConfigPool custom resource (CR).
Tip
You can reference any machineconfiguration.openshift.io resources in the nodepool.spec.config field of the NodePool CR.
In hosted control planes, the MachineConfigPool CR does not exist. A node pool contains a set of compute nodes. You can handle a machine configuration by using node pools.
You can manage your workloads in your hosted cluster by using the cluster autoscaler.
Note
In OpenShift Container Platform 4.18 or later, the default container runtime for worker nodes is changed from runC to crun.
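For example, if a workload still depends on runC, one approach is to wrap a ContainerRuntimeConfig object in a config map and reference it from the node pool, following the same config map pattern that the next sections describe. This is a minimal sketch, assuming the standard ContainerRuntimeConfig API; the resource names are illustrative:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: custom-runtime-config        # illustrative name
        namespace: clusters
      data:
        config: |
          apiVersion: machineconfiguration.openshift.io/v1
          kind: ContainerRuntimeConfig
          metadata:
            name: custom-runtime-config    # illustrative name
          spec:
            containerRuntimeConfig:
              defaultRuntime: runc         # assumption: keeps runC as the default runtime

You would then reference this config map in the nodepool.spec.config field, as shown in the following sections.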
Configuring node pools for hosted control planes
On hosted control planes, you can configure node pools by creating a MachineConfig object inside of a config map in the management cluster.
- To create a MachineConfig object inside of a config map in the management cluster, enter the following information:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: <configmap_name>
        namespace: clusters
      data:
        config: |
          apiVersion: machineconfiguration.openshift.io/v1
          kind: MachineConfig
          metadata:
            labels:
              machineconfiguration.openshift.io/role: worker
            name: <machineconfig_name>
          spec:
            config:
              ignition:
                version: 3.2.0
              storage:
                files:
                - contents:
                    source: data:...
                  mode: 420
                  overwrite: true
                  path: ${PATH}

  The path field sets the path on the node where the MachineConfig object is stored.

- After you add the object to the config map, you can apply the config map to the node pool as follows:

      $ oc edit nodepool <nodepool_name> --namespace <hosted_cluster_namespace>

      apiVersion: hypershift.openshift.io/v1alpha1
      kind: NodePool
      metadata:
        # ...
        name: nodepool-1
        namespace: clusters
        # ...
      spec:
        config:
        - name: <configmap_name>
      # ...

  Replace <configmap_name> with the name of your config map.
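To confirm that the machine configuration reached the nodes, you can start a debug shell on a compute node in the hosted cluster and read the file at the path that you set in the MachineConfig object. This is a sketch; the node name is an example, and the path must match the value that you used for ${PATH}:

      $ oc --kubeconfig <hosted_cluster_name>.kubeconfig \
        debug node/<node_name> -- chroot /host cat <path_from_machineconfig>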
Referencing the kubelet configuration in node pools
To reference your kubelet configuration in node pools, you add the kubelet configuration in a config map and then apply the config map in the NodePool resource.
- Add the kubelet configuration inside of a config map in the management cluster by entering the following information:

  Example ConfigMap object with the kubelet configuration

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: <configmap_name>
        namespace: clusters
      data:
        config: |
          apiVersion: machineconfiguration.openshift.io/v1
          kind: KubeletConfig
          metadata:
            name: <kubeletconfig_name>
          spec:
            kubeletConfig:
              registerWithTaints:
              - key: "example.sh/unregistered"
                value: "true"
                effect: "NoExecute"

  Replace <configmap_name> with the name of your config map and <kubeletconfig_name> with the name of the KubeletConfig resource.

- Apply the config map to the node pool by entering the following command:

      $ oc edit nodepool <nodepool_name> --namespace clusters

  Replace <nodepool_name> with the name of your node pool.

  Example NodePool resource configuration

      apiVersion: hypershift.openshift.io/v1alpha1
      kind: NodePool
      metadata:
        # ...
        name: nodepool-1
        namespace: clusters
        # ...
      spec:
        config:
        - name: <configmap_name>
      # ...

  Replace <configmap_name> with the name of your config map.
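Because registerWithTaints applies only when a node first registers, you can check newly added nodes in the hosted cluster for the taint from the example. This is a sketch that lists each node name with its taints; it assumes you have the hosted cluster kubeconfig:

      $ oc --kubeconfig <hosted_cluster_name>.kubeconfig get nodes \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'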
Configuring node tuning in a hosted cluster
To set node-level tuning on the nodes in your hosted cluster, you can use the Node Tuning Operator. In hosted control planes, you can configure node tuning by creating config maps that contain Tuned objects and referencing those config maps in your node pools.
- Create a config map that contains a valid Tuned manifest, and reference the manifest in a node pool. In the following example, a Tuned manifest defines a profile that sets vm.dirty_ratio to 55 on nodes that contain the tuned-1-node-label node label with any value. Save the following ConfigMap manifest in a file named tuned-1.yaml:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: tuned-1
        namespace: clusters
      data:
        tuning: |
          apiVersion: tuned.openshift.io/v1
          kind: Tuned
          metadata:
            name: tuned-1
            namespace: openshift-cluster-node-tuning-operator
          spec:
            profile:
            - data: |
                [main]
                summary=Custom OpenShift profile
                include=openshift-node
                [sysctl]
                vm.dirty_ratio="55"
              name: tuned-1-profile
            recommend:
            - priority: 20
              profile: tuned-1-profile

  Note
  If you do not add any labels to an entry in the spec.recommend section of the Tuned spec, node-pool-based matching is assumed, so the highest priority profile in the spec.recommend section is applied to nodes in the pool. Although you can achieve more fine-grained node-label-based matching by setting a label value in the Tuned .spec.recommend.match section, node labels will not persist during an upgrade unless you set the .spec.management.upgradeType value of the node pool to InPlace (see the sketch after this procedure).

- Create the ConfigMap object in the management cluster:

      $ oc --kubeconfig="$MGMT_KUBECONFIG" create -f tuned-1.yaml

- Reference the ConfigMap object in the spec.tuningConfig field of the node pool, either by editing a node pool or creating one. In this example, assume that you have only one NodePool, named nodepool-1, which contains 2 nodes.

      apiVersion: hypershift.openshift.io/v1alpha1
      kind: NodePool
      metadata:
        ...
        name: nodepool-1
        namespace: clusters
        ...
      spec:
        ...
        tuningConfig:
        - name: tuned-1
      status:
        ...

  Note
  You can reference the same config map in multiple node pools. In hosted control planes, the Node Tuning Operator appends a hash of the node pool name and namespace to the name of the Tuned CRs to distinguish them. Outside of this case, do not create multiple TuneD profiles of the same name in different Tuned CRs for the same hosted cluster.
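If you rely on node-label-based matching as described in the previous note, the following sketch shows where the upgrade type is set on the node pool. The upgradeType field is part of the NodePool management spec; the other values are illustrative:

      apiVersion: hypershift.openshift.io/v1alpha1
      kind: NodePool
      metadata:
        name: nodepool-1
        namespace: clusters
      spec:
        management:
          upgradeType: InPlace   # keeps node labels across upgrades, per the note above
        # ...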
Now that you have created the ConfigMap object that contains a Tuned manifest and referenced it in a NodePool, the Node Tuning Operator syncs the Tuned objects into the hosted cluster. You can verify which Tuned objects are defined and which TuneD profiles are applied to each node.
- List the Tuned objects in the hosted cluster:

      $ oc --kubeconfig="$HC_KUBECONFIG" get tuned.tuned.openshift.io \
        -n openshift-cluster-node-tuning-operator

  Example output

      NAME       AGE
      default    7m36s
      rendered   7m36s
      tuned-1    65s

- List the Profile objects in the hosted cluster:

      $ oc --kubeconfig="$HC_KUBECONFIG" get profile.tuned.openshift.io \
        -n openshift-cluster-node-tuning-operator

  Example output

      NAME                  TUNED             APPLIED   DEGRADED   AGE
      nodepool-1-worker-1   tuned-1-profile   True      False      7m43s
      nodepool-1-worker-2   tuned-1-profile   True      False      7m14s

  Note
  If no custom profiles are created, the openshift-node profile is applied by default.

- To confirm that the tuning was applied correctly, start a debug shell on a node and check the sysctl values:

      $ oc --kubeconfig="$HC_KUBECONFIG" \
        debug node/nodepool-1-worker-1 -- chroot /host sysctl vm.dirty_ratio

  Example output

      vm.dirty_ratio = 55
Deploying the SR-IOV Operator for hosted control planes
After you configure and deploy your hosting service cluster, you can create a subscription to the SR-IOV Operator on a hosted cluster. The SR-IOV pod runs on worker machines rather than the control plane.
You must configure and deploy the hosted cluster on AWS.
- Create a namespace and an Operator group:

      apiVersion: v1
      kind: Namespace
      metadata:
        name: openshift-sriov-network-operator
      ---
      apiVersion: operators.coreos.com/v1
      kind: OperatorGroup
      metadata:
        name: sriov-network-operators
        namespace: openshift-sriov-network-operator
      spec:
        targetNamespaces:
        - openshift-sriov-network-operator

- Create a subscription to the SR-IOV Operator:

      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: sriov-network-operator-subscription
        namespace: openshift-sriov-network-operator
      spec:
        channel: stable
        name: sriov-network-operator
        config:
          nodeSelector:
            node-role.kubernetes.io/worker: ""
        source: redhat-operators
        sourceNamespace: openshift-marketplace
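Both manifests can be created against the hosted cluster in the usual way. For example, assuming you saved them together in a file named sriov-operator.yaml (an illustrative name) and have the hosted cluster kubeconfig:

      $ oc --kubeconfig <hosted_cluster_name>.kubeconfig apply -f sriov-operator.yaml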
- To verify that the SR-IOV Operator is ready, run the following command and view the resulting output:

      $ oc get csv -n openshift-sriov-network-operator

  Example output

      NAME                                         DISPLAY                   VERSION               REPLACES                                     PHASE
      sriov-network-operator.4.19.0-202211021237   SR-IOV Network Operator   4.19.0-202211021237   sriov-network-operator.4.19.0-202210290517   Succeeded

- To verify that the SR-IOV pods are deployed, run the following command:

      $ oc get pods -n openshift-sriov-network-operator
Configuring the NTP server for hosted clusters
You can configure the Network Time Protocol (NTP) server for your hosted clusters by using Butane.
- Create a Butane config file, 99-worker-chrony.bu, that includes the contents of the chrony.conf file. For more information about Butane, see "Creating machine configs with Butane".

  Example 99-worker-chrony.bu configuration

      # ...
      variant: openshift
      version: 4.19.0
      metadata:
        name: 99-worker-chrony
        labels:
          machineconfiguration.openshift.io/role: worker
      storage:
        files:
        - path: /etc/chrony.conf
          mode: 0644
          overwrite: true
          contents:
            inline: |
              pool 0.rhel.pool.ntp.org iburst
              driftfile /var/lib/chrony/drift
              makestep 1.0 3
              rtcsync
              logdir /var/log/chrony
      # ...

  Specify an octal value for the mode field in the machine config file. After you create the file and apply the changes, the mode field is converted to a decimal value. For the pool entry, specify any valid, reachable time source, such as the one provided by your Dynamic Host Configuration Protocol (DHCP) server.

  Note
  For machine-to-machine communication, NTP uses User Datagram Protocol (UDP) port 123. If you configure an external NTP time server, you must open UDP port 123.
- Use Butane to generate a MachineConfig object file, 99-worker-chrony.yaml, that contains the configuration to be delivered to the nodes. Run the following command:

      $ butane 99-worker-chrony.bu -o 99-worker-chrony.yaml

  Example 99-worker-chrony.yaml configuration

      # Generated by Butane; do not edit
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: worker
        name: <machineconfig_name>
      spec:
        config:
          ignition:
            version: 3.2.0
          storage:
            files:
            - contents:
                source: data:...
              mode: 420
              overwrite: true
              path: /example/path

- Add the contents of the 99-worker-chrony.yaml file inside of a config map in the management cluster:

  Example config map

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: <configmap_name>
        namespace: <namespace>
      data:
        config: |
          apiVersion: machineconfiguration.openshift.io/v1
          kind: MachineConfig
          metadata:
            labels:
              machineconfiguration.openshift.io/role: worker
            name: <machineconfig_name>
          spec:
            config:
              ignition:
                version: 3.2.0
              storage:
                files:
                - contents:
                    source: data:...
                  mode: 420
                  overwrite: true
                  path: /example/path
      # ...

  Replace <namespace> with the name of the namespace where you created the node pool, such as clusters.
- Apply the config map to your node pool by running the following command:

      $ oc edit nodepool <nodepool_name> --namespace <hosted_cluster_namespace>

  Example NodePool configuration

      apiVersion: hypershift.openshift.io/v1alpha1
      kind: NodePool
      metadata:
        # ...
        name: nodepool-1
        namespace: clusters
        # ...
      spec:
        config:
        - name: <configmap_name>
      # ...

  Replace <configmap_name> with the name of your config map.
- Add the list of your NTP servers in the infra-env.yaml file, which defines the InfraEnv custom resource (CR):

  Example infra-env.yaml file

      apiVersion: agent-install.openshift.io/v1beta1
      kind: InfraEnv
      # ...
      spec:
        additionalNTPSources:
        - <ntp_server>
        - <ntp_server1>
        - <ntp_server2>
      # ...

  Replace <ntp_server> and the other entries with the names of your NTP servers. For more details about creating a host inventory and the InfraEnv CR, see "Creating a host inventory".

- Apply the InfraEnv CR by running the following command:

      $ oc apply -f infra-env.yaml
- Check the following fields to know the status of your host inventory:

  - conditions: The standard Kubernetes conditions indicating if the image was created successfully.
  - isoDownloadURL: The URL to download the Discovery Image.
  - createdTime: The time at which the image was last created. If you modify the InfraEnv CR, ensure that you have updated the timestamp before downloading a new image.

  Verify that your host inventory is created by running the following command:

      $ oc describe infraenv <infraenv_resource_name> -n <infraenv_namespace>

  Note
  If you modify the InfraEnv CR, confirm that the InfraEnv CR has created a new Discovery Image by looking at the createdTime field. If you already booted hosts, boot them again with the latest Discovery Image.
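To confirm that the chrony configuration reached the compute nodes, you can start a debug shell on a node in the hosted cluster and inspect the file that the machine config wrote. This is a sketch; the node name is an example:

      $ oc --kubeconfig <hosted_cluster_name>.kubeconfig \
        debug node/<node_name> -- chroot /host cat /etc/chrony.conf

You can also run chronyc sources in the same debug shell to check that the configured NTP servers are reachable.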
Scaling up and down workloads in a hosted cluster
To scale the compute nodes up and down as the workloads in your hosted cluster change, you can use the ScaleUpAndScaleDown behavior. The compute nodes scale up when you add workloads and scale down when you delete workloads.
- You have created the HostedCluster and NodePool resources.
- Enable cluster autoscaling for your hosted cluster by setting the scaling behavior to ScaleUpAndScaleDown. Run the following command:

      $ oc patch -n <hosted_cluster_namespace> \
        hostedcluster <hosted_cluster_name> \
        --type=merge \
        --patch='{"spec": {"autoscaling": {"scaling": "ScaleUpAndScaleDown", "maxPodGracePeriod": 60, "scaleDown": {"utilizationThresholdPercent": 50}}}}'

- Remove the spec.replicas field from the NodePool resource to allow the cluster autoscaler to manage the node count. Run the following command:

      $ oc patch -n <hosted_cluster_namespace> \
        nodepool <node_pool_name> \
        --type=json \
        --patch='[{"op": "remove", "path": "/spec/replicas"}]'

- Enable cluster autoscaling to configure the minimum and maximum node counts for your node pools. Run the following command:

      $ oc patch -n <hosted_cluster_namespace> \
        nodepool <nodepool_name> \
        --type=merge --patch='{"spec": {"autoScaling": {"max": 3, "min": 1}}}'
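To exercise the autoscaler, you can create a workload in the hosted cluster whose resource requests exceed the current capacity of the node pool. This is a minimal sketch; the namespace, image, replica count, and request sizes are illustrative:

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: reserve-resources          # illustrative name
        namespace: default
      spec:
        replicas: 4                      # enough replicas to exceed current capacity
        selector:
          matchLabels:
            app: reserve-resources
        template:
          metadata:
            labels:
              app: reserve-resources
          spec:
            containers:
            - name: sleep
              image: registry.k8s.io/pause:3.9   # illustrative image
              resources:
                requests:
                  cpu: "1"               # sized to force new nodes; adjust to your instance type
                  memory: 2Gi

Apply the workload with the hosted cluster kubeconfig and watch the node count grow; deleting the workload later should trigger scale-down once node utilization drops below the configured threshold.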
- To verify that all compute nodes are in the Ready status, run the following command:

      $ oc --kubeconfig <hosted_cluster_name>.kubeconfig get nodes
Scaling up workloads in a hosted cluster
To scale up the compute nodes in your hosted cluster as you add workloads, without scaling them down, you can use the ScaleUpOnly behavior.
- You have created the HostedCluster and NodePool resources.
- Enable cluster autoscaling for your hosted cluster by setting the scaling behavior to ScaleUpOnly. Run the following command:

      $ oc patch -n <hosted_cluster_namespace> hostedcluster <hosted_cluster_name> \
        --type=merge \
        --patch='{"spec": {"autoscaling": {"scaling": "ScaleUpOnly", "maxPodGracePeriod": 60}}}'

- Remove the spec.replicas field from the NodePool resource to allow the cluster autoscaler to manage the node count. Run the following command:

      $ oc patch -n clusters nodepool <node_pool_name> \
        --type=json \
        --patch='[{"op": "remove", "path": "/spec/replicas"}]'

- Enable cluster autoscaling to configure the minimum and maximum node counts for your node pools. Run the following command:

      $ oc patch -n <hosted_cluster_namespace> nodepool <nodepool_name> \
        --type=merge --patch='{"spec": {"autoScaling": {"max": 3, "min": 1}}}'
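Before adding workloads, you can confirm that the autoscaling range was recorded on the node pool. This sketch prints the autoScaling stanza, which should show the min and max values that you set:

      $ oc get nodepool <nodepool_name> -n <hosted_cluster_namespace> -o jsonpath='{.spec.autoScaling}'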
- Verify that all compute nodes are in the Ready status by running the following command:

      $ oc --kubeconfig <hosted_cluster_name>.kubeconfig get nodes

- Verify that the compute nodes are scaled up successfully by checking the node count for your node pools. Run the following command:

      $ oc --kubeconfig <hosted_cluster_name>.kubeconfig get nodes -l 'hypershift.openshift.io/nodePool=<node_pool_name>'
Setting the priority expander in a hosted cluster
You can define the priority for your node pools so that high-priority machines are created before low-priority machines by using the priority expander in your hosted cluster.
- You have created the HostedCluster and NodePool resources.
- To define the priority for your node pools, create a config map manifest for your hosted cluster in a file named priority-expander-configmap.yaml. Node pools with low numbers receive high priority. See the following example configuration:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-autoscaler-priority-expander
        namespace: kube-system
      # ...
      data:
        priorities: |-
          10:
          - ".*<node_pool_name1>.*"
          100:
          - ".*<node_pool_name2>.*"
      # ...

- Generate the kubeconfig file by running the following command:

      $ hcp create kubeconfig --name <hosted_cluster_name> --namespace <hosted_cluster_namespace> > nested.config

- Create the ConfigMap object by running the following command:

      $ oc --kubeconfig nested.config create -f priority-expander-configmap.yaml

- Enable cluster autoscaling by setting the priority expander for your hosted cluster. Run the following command:

      $ oc patch -n <hosted_cluster_namespace> \
        hostedcluster <hosted_cluster_name> \
        --type=merge \
        --patch='{"spec": {"autoscaling": {"scaling": "ScaleUpOnly", "maxPodGracePeriod": 60, "expanders": ["Priority"]}}}'

- Remove the spec.replicas field from the NodePool resource to allow the cluster autoscaler to manage the node count. Run the following command:

      $ oc patch -n <hosted_cluster_namespace> \
        nodepool <node_pool_name> \
        --type=json --patch='[{"op": "remove", "path": "/spec/replicas"}]'

- Enable cluster autoscaling to configure the minimum and maximum node counts for your node pools. Run the following command:

      $ oc patch -n <hosted_cluster_namespace> \
        nodepool <nodepool_name> \
        --type=merge --patch='{"spec": {"autoScaling": {"max": 3, "min": 1}}}'
- After you apply new workloads, verify that the compute nodes associated with the high-priority node pool are scaled up first. Run the following command to check the status of the compute nodes:

      $ oc --kubeconfig nested.config get nodes -l 'hypershift.openshift.io/nodePool=<node_pool_name>'
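To compare how the two pools grew, you can count the nodes that each pool provisioned; the pool that you assigned the higher priority should scale up first. A sketch, with the pool names as placeholders:

      $ oc --kubeconfig nested.config get nodes \
        -l 'hypershift.openshift.io/nodePool=<node_pool_name1>' --no-headers | wc -l
      $ oc --kubeconfig nested.config get nodes \
        -l 'hypershift.openshift.io/nodePool=<node_pool_name2>' --no-headers | wc -l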
Balancing ignored labels in a hosted cluster
When the cluster autoscaler scales up your node pools, you can use the balancingIgnoredLabels option to distribute the machines evenly across node pools.
- You have created the HostedCluster and NodePool resources.
- Add the node.group.balancing.ignored label to each of the relevant node pools by using the same label value. Run the following command:

      $ oc patch -n <hosted_cluster_namespace> \
        nodepool <node_pool_name> \
        --type=merge \
        --patch='{"spec": {"nodeLabels": {"node.group.balancing.ignored": "<label_name>"}}}'

- Enable cluster autoscaling for your hosted cluster by running the following command:

      $ oc patch -n <hosted_cluster_namespace> \
        hostedcluster <hosted_cluster_name> \
        --type=merge \
        --patch='{"spec": {"autoscaling": {"balancingIgnoredLabels": ["node.group.balancing.ignored"]}}}'

- Remove the spec.replicas field from the NodePool resource to allow the cluster autoscaler to manage the node count. Run the following command:

      $ oc patch -n <hosted_cluster_namespace> \
        nodepool <node_pool_name> \
        --type=json \
        --patch='[{"op": "remove", "path": "/spec/replicas"}]'

- Enable cluster autoscaling to configure the minimum and maximum node counts for your node pools. Run the following command:

      $ oc patch -n <hosted_cluster_namespace> \
        nodepool <nodepool_name> \
        --type=merge --patch='{"spec": {"autoScaling": {"max": 3, "min": 1}}}'

- Generate the kubeconfig file by running the following command:

      $ hcp create kubeconfig \
        --name <hosted_cluster_name> \
        --namespace <hosted_cluster_namespace> > nested.config

- After scaling up the node pools, check that all compute nodes are in the Ready status by running the following command:

      $ oc --kubeconfig nested.config get nodes -l 'hypershift.openshift.io/nodePool=<node_pool_name>'

- Confirm that the new nodes contain the node.group.balancing.ignored label by running the following command:

      $ oc --kubeconfig nested.config get nodes \
        -l 'hypershift.openshift.io/nodePool=<node_pool_name>' \
        -o jsonpath='{.items[*].metadata.labels}' | grep "node.group.balancing.ignored"
- Verify that the number of nodes provisioned by each node pool is evenly distributed. For example, if you created three node pools with the same label value, the node counts might be 3, 2, and 3. Run the following command:

      $ oc --kubeconfig nested.config get nodes -l 'hypershift.openshift.io/nodePool=<node_pool_name>'
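To check the distribution across all pools at once, you can loop over the node pool names and count the nodes that each one provisioned. A sketch, assuming three pools with placeholder names:

      $ for pool in <node_pool_name1> <node_pool_name2> <node_pool_name3>; do
          echo -n "${pool}: "
          oc --kubeconfig nested.config get nodes \
            -l "hypershift.openshift.io/nodePool=${pool}" --no-headers | wc -l
        done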