Managing a cluster with multi-architecture compute machines
Managing a cluster that has nodes with multiple architectures requires you to consider node architecture as you monitor the cluster and manage your workloads. You must take additional factors into account when you configure cluster resource requirements and behavior, or when you schedule workloads in a multi-architecture cluster.
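Because a pod can run only on a node whose architecture matches its images, it is often useful to first see which architectures are present in the cluster. The following commands are a quick, optional sketch of that check; they use only the standard kubernetes.io/arch node label and the .status.nodeInfo.architecture field that the kubelet reports.
$ oc get nodes -L kubernetes.io/arch
$ oc get nodes -o custom-columns=NAME:.metadata.name,ARCH:.status.nodeInfo.architecture
Either command shows each node name together with its CPU architecture, for example amd64 or arm64.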
Scheduling workloads on clusters with multi-architecture compute machines
When you deploy workloads on a cluster with compute nodes that use different architectures, you must align pod architecture with the architecture of the underlying node. Depending on the underlying node architecture, your workload might also require additional configuration for particular resources.
You can use the Multiarch Tuning Operator to enable architecture-aware scheduling of workloads on clusters with multi-architecture compute machines. At pod creation time, the Multiarch Tuning Operator adds scheduler predicates to the pod specification based on the architectures that the pod can support.
For information about the Multiarch Tuning Operator, see Managing workloads on multi-architecture clusters by using the Multiarch Tuning Operator.
Sample multi-architecture node workload deployments
Scheduling a workload to an appropriate node based on architecture works in the same way as scheduling based on any other node characteristic. Consider the following options when determining how to schedule your workloads.
- Using nodeAffinity to schedule nodes with specific architectures
To allow a workload to be scheduled on only a set of nodes with architectures supported by its images, set the spec.affinity.nodeAffinity field in your pod's template specification.
Example deployment with node affinity set:
apiVersion: apps/v1
kind: Deployment
metadata:
# ...
spec:
# ...
  template:
# ...
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                - amd64
                - arm64
In the values list, specify the supported architectures. Valid values include amd64, arm64, or both values.
- Tainting each node for a specific architecture
You can taint a node to prevent it from scheduling workloads that are incompatible with its architecture. When your cluster uses a MachineSet object, you can add parameters to the .spec.template.spec.taints field to avoid workloads being scheduled on nodes with non-supported architectures.
Before you add a taint to a node, you must scale down the MachineSet object or remove existing available machines. For more information, see Modifying a compute machine set. You can verify applied taints and architecture labels by using the commands in the sketch that follows this list.
Example machine set with taint set:
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
# ...
spec:
# ...
  template:
# ...
    spec:
# ...
      taints:
      - effect: NoSchedule
        key: multiarch.openshift.io/arch
        value: arm64
You can also set a taint on a specific node by running the following command:
$ oc adm taint nodes <node-name> multiarch.openshift.io/arch=arm64:NoSchedule
- Creating a default toleration in a namespace
When a node or machine set has a taint, only workloads that tolerate that taint can be scheduled. You can annotate a namespace so that all of the workloads in it get the same default toleration by running the following command:
Example default toleration set on a namespace:
$ oc annotate namespace my-namespace \
    'scheduler.alpha.kubernetes.io/defaultTolerations'='[{"operator": "Exists", "effect": "NoSchedule", "key": "multiarch.openshift.io/arch"}]'
- Tolerating architecture taints in workloads
When a node or machine set has a taint, only workloads that tolerate that taint can be scheduled. You can configure your workload with a toleration so that it is scheduled on nodes with specific architecture taints.
Example deployment with toleration set:
apiVersion: apps/v1
kind: Deployment
metadata:
# ...
spec:
# ...
  template:
# ...
    spec:
      tolerations:
      - key: "multiarch.openshift.io/arch"
        value: "arm64"
        operator: "Equal"
        effect: "NoSchedule"
This example deployment can be scheduled on nodes and machine sets that have the multiarch.openshift.io/arch=arm64 taint specified.
- Using node affinity with taints and tolerations
When a scheduler computes the set of nodes to schedule a pod, tolerations can broaden the set while node affinity restricts the set. If you set a taint on nodes that have a specific architecture, you must also add a toleration to the workloads that you want to be scheduled there.
Example deployment with node affinity and toleration set:
apiVersion: apps/v1
kind: Deployment
metadata:
# ...
spec:
# ...
  template:
# ...
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                - amd64
                - arm64
      tolerations:
      - key: "multiarch.openshift.io/arch"
        value: "arm64"
        operator: "Equal"
        effect: "NoSchedule"
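After you apply architecture labels, taints, and namespace annotations as described in the preceding options, you can spot-check the results from the command line. The following commands are a minimal verification sketch rather than part of a documented procedure; they rely only on standard node fields, the <node-name> placeholder used above, and the my-namespace example namespace.
$ oc get nodes -l kubernetes.io/arch=arm64
$ oc get node <node-name> -o jsonpath='{.spec.taints}'
$ oc get namespace my-namespace -o jsonpath='{.metadata.annotations}'
The first command lists the arm64 nodes, the second prints any taints set on a node, and the third shows the default toleration annotation on the namespace.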
Enabling 64k pages on the Red Hat Enterprise Linux CoreOS (RHCOS) kernel
You can enable the 64k memory page size in the Red Hat Enterprise Linux CoreOS (RHCOS) kernel on the 64-bit ARM compute machines in your cluster. The 64k page size kernel specification can be used for large GPU or high memory workloads. You enable it by using the Machine Config Operator (MCO), which uses a machine config pool to update the kernel. To enable 64k page sizes, you must dedicate a machine config pool to your 64-bit ARM compute nodes and enable the 64k-pages kernel on that pool.
Important
Using 64k pages is exclusive to 64-bit ARM architecture compute nodes or clusters installed on 64-bit ARM machines. If you configure the 64k pages kernel on a machine config pool using 64-bit x86 machines, the machine config pool and MCO will degrade.
- You installed the OpenShift CLI (oc).
- You created a cluster with compute nodes of different architecture on one of the supported platforms.
- Label the nodes where you want to run the 64k page size kernel:
$ oc label node <node_name> <label>
Example command:
$ oc label node worker-arm64-01 node-role.kubernetes.io/worker-64k-pages=
- Create a machine config pool that contains the worker role that uses the ARM64 architecture and the worker-64k-pages role:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-64k-pages
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values:
        - worker
        - worker-64k-pages
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-64k-pages: ""
      kubernetes.io/arch: arm64
- Create a machine config on your compute node to enable 64k-pages with the 64k-pages parameter.
$ oc create -f <filename>.yaml
Example MachineConfig:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: "worker-64k-pages"
  name: 99-worker-64kpages
spec:
  kernelType: 64k-pages
In the machineconfiguration.openshift.io/role label, specify the value of the label in the custom machine config pool. The example MachineConfig uses the worker-64k-pages label to enable 64k pages in the worker-64k-pages pool.
In kernelType, specify your desired kernel type. Valid values are 64k-pages and default.
Note: The 64k-pages type is supported on only 64-bit ARM architecture based compute nodes. The realtime type is supported on only 64-bit x86 architecture based compute nodes.
- To view your new worker-64k-pages machine config pool, run the following command:
$ oc get mcp
Example output:
NAME               CONFIG                                                        UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master             rendered-master-9d55ac9a91127c36314e1efe7d77fbf8              True      False      False      3              3                   3                     0                      361d
worker             rendered-worker-e7b61751c4a5b7ff995d64b967c421ff              True      False      False      7              7                   7                     0                      361d
worker-64k-pages   rendered-worker-64k-pages-e7b61751c4a5b7ff995d64b967c421ff    True      False      False      2              2                   2                     0                      35m
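After the worker-64k-pages pool reports as updated, you can optionally confirm the page size directly on one of the labeled nodes. This is a quick sketch rather than a documented requirement; it assumes that you can start a debug pod on the node and that 65536 is the value expected for a 64k page size.
$ oc debug node/<node_name> -- chroot /host getconf PAGESIZE
If the 64k-pages kernel is active, the command prints 65536 instead of the default 4096.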
Importing manifest lists in image streams on your multi-architecture compute machines
On an OpenShift Container Platform 4.19 cluster with multi-architecture compute machines, the image streams in the cluster do not import manifest lists automatically. You must manually change the default importMode option to the PreserveOriginal option in order to import the manifest list.
- You installed the OpenShift Container Platform CLI (oc).
- The following example command shows how to patch the ImageStream cli-artifacts so that the cli-artifacts:latest image stream tag is imported as a manifest list.
$ oc patch is/cli-artifacts -n openshift -p '{"spec":{"tags":[{"name":"latest","importPolicy":{"importMode":"PreserveOriginal"}}]}}'
- You can check that the manifest lists imported properly by inspecting the image stream tag. The following command will list the individual architecture manifests for a particular tag.
$ oc get istag cli-artifacts:latest -n openshift -oyaml
If the dockerImageManifests object is present, then the manifest list import was successful.
Example output of the dockerImageManifests object:
dockerImageManifests:
  - architecture: amd64
    digest: sha256:16d4c96c52923a9968fbfa69425ec703aff711f1db822e4e9788bf5d2bee5d77
    manifestSize: 1252
    mediaType: application/vnd.docker.distribution.manifest.v2+json
    os: linux
  - architecture: arm64
    digest: sha256:6ec8ad0d897bcdf727531f7d0b716931728999492709d19d8b09f0d90d57f626
    manifestSize: 1252
    mediaType: application/vnd.docker.distribution.manifest.v2+json
    os: linux
  - architecture: ppc64le
    digest: sha256:65949e3a80349cdc42acd8c5b34cde6ebc3241eae8daaeea458498fedb359a6a
    manifestSize: 1252
    mediaType: application/vnd.docker.distribution.manifest.v2+json
    os: linux
  - architecture: s390x
    digest: sha256:75f4fa21224b5d5d511bea8f92dfa8e1c00231e5c81ab95e83c3013d245d1719
    manifestSize: 1252
    mediaType: application/vnd.docker.distribution.manifest.v2+json
    os: linux
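If you only need the architectures rather than the full YAML, a jsonpath query can extract them. This is a convenience sketch, not part of the documented procedure, and it assumes that the dockerImageManifests list is nested under the image field of the ImageStreamTag object, as it is in the full output of the previous command.
$ oc get istag cli-artifacts:latest -n openshift -o jsonpath='{.image.dockerImageManifests[*].architecture}'
A successful manifest list import prints one entry per architecture, for example amd64 arm64 ppc64le s390x.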