Placing pods relative to other pods using affinity and anti-affinity rules

Affinity is a property of pods that controls the nodes on which they prefer to be scheduled. Anti-affinity is a property of pods that prevents a pod from being scheduled on a node.

In OpenShift Container Platform, pod affinity and pod anti-affinity allow you to constrain which nodes your pod is eligible to be scheduled on based on the key-value labels on other pods.

Understanding pod affinity

Pod affinity and pod anti-affinity allow you to constrain which nodes your pod is eligible to be scheduled on based on the key/value labels on other pods.

Pod affinity can tell the scheduler to locate a new pod on the same node as other pods if the label selector on the new pod matches the label on the current pod.
Pod anti-affinity can prevent the scheduler from locating a new pod on the same node as pods with the same labels if the label selector on the new pod matches the label on the current pod.

For example, using affinity rules, you could spread or pack pods within a service or relative to pods in other services. Anti-affinity rules allow you to prevent pods of a particular service from scheduling on the same nodes as pods of another service that are known to interfere with the performance of the pods of the first service. Or, you could spread the pods of a service across nodes, availability zones, or availability sets to reduce correlated failures.

Note

A label selector might match pods with multiple pod deployments. Use unique combinations of labels when configuring anti-affinity rules to avoid matching pods.

There are two types of pod affinity rules: required and preferred.

Required rules must be met before a pod can be scheduled on a node. Preferred rules specify that, if the rule is met, the scheduler tries to enforce the rules, but does not guarantee enforcement.

Note

Depending on your pod priority and preemption settings, the scheduler might not be able to find an appropriate node for a pod without violating affinity requirements. If so, a pod might not be scheduled.

To prevent this situation, carefully configure pod affinity with equal-priority pods.

You configure pod affinity/anti-affinity through the Pod spec files. You can specify a required rule, a preferred rule, or both. If you specify both, the node must first meet the required rule, then attempts to meet the preferred rule.

The following example shows a Pod spec configured for pod affinity and anti-affinity.

In this example, the pod affinity rule indicates that the pod can schedule onto a node only if that node has at least one already-running pod with a label that has the key security and value S1. The pod anti-affinity rule says that the pod prefers to not schedule onto a node if that node is already running a pod with label having key security and value S2.

Sample Pod config file with pod affinity

apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  affinity:
    podAffinity: 
      requiredDuringSchedulingIgnoredDuringExecution: 
      - labelSelector:
          matchExpressions:
          - key: security 
            operator: In 
            values:
            - S1 
        topologyKey: topology.kubernetes.io/zone
  containers:
  - name: with-pod-affinity
    image: docker.io/ocpqe/hello-pod
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: [ALL]

Stanza to configure pod affinity.
Defines a required rule.
The key and value (label) that must be matched to apply the rule.
The operator represents the relationship between the label on the existing pod and the set of values in the matchExpression parameters in the specification for the new pod. Can be In, NotIn, Exists, or DoesNotExist.

Sample Pod config file with pod anti-affinity

apiVersion: v1
kind: Pod
metadata:
  name: with-pod-antiaffinity
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  affinity:
    podAntiAffinity: 
      preferredDuringSchedulingIgnoredDuringExecution: 
      - weight: 100  
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security 
              operator: In 
              values:
              - S2
          topologyKey: kubernetes.io/hostname
  containers:
  - name: with-pod-affinity
    image: docker.io/ocpqe/hello-pod
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: [ALL]

Stanza to configure pod anti-affinity.
Defines a preferred rule.
Specifies a weight for a preferred rule. The node with the highest weight is preferred.
Description of the pod label that determines when the anti-affinity rule applies. Specify a key and value for the label.
The operator represents the relationship between the label on the existing pod and the set of values in the matchExpression parameters in the specification for the new pod. Can be In, NotIn, Exists, or DoesNotExist.

Note

If labels on a node change at runtime such that the affinity rules on a pod are no longer met, the pod continues to run on the node.

Configuring a pod affinity rule

The following steps demonstrate a simple two-pod configuration that creates pod with a label and a pod that uses affinity to allow scheduling with that pod.

Note

You cannot add an affinity directly to a scheduled pod.

Procedure

Create a pod with a specific label in the pod spec:

Create a YAML file with the following content:

apiVersion: v1
kind: Pod
metadata:
  name: security-s1
  labels:
    security: S1
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: security-s1
    image: docker.io/ocpqe/hello-pod
    securityContext:
      runAsNonRoot: true
      seccompProfile:
        type: RuntimeDefault

Create the pod.
```
$ oc create -f <pod-spec>.yaml
```

When creating other pods, configure the following parameters to add the affinity:
1. Create a YAML file with the following content:
  apiVersion: v1 kind: Pod metadata: name: security-s1-east # ... spec: affinity: podAffinity: requiredDuringSchedulingIgnoredDuringExecution: - labelSelector: matchExpressions: - key: security values: - S1 operator: In topologyKey: topology.kubernetes.io/zone # ...
  Adds a pod affinity.
  Configures the requiredDuringSchedulingIgnoredDuringExecution parameter or the preferredDuringSchedulingIgnoredDuringExecution parameter.
  Specifies the key and values that must be met. If you want the new pod to be scheduled with the other pod, use the same key and values parameters as the label on the first pod.
  Specifies an operator. The operator can be In, NotIn, Exists, or DoesNotExist. For example, use the operator In to require the label to be in the node.
  Specify a topologyKey, which is a prepopulated Kubernetes label that the system uses to denote such a topology domain.
2. Create the pod.
  $ oc create -f <pod-spec>.yaml

Configuring a pod anti-affinity rule

The following steps demonstrate a simple two-pod configuration that creates pod with a label and a pod that uses an anti-affinity preferred rule to attempt to prevent scheduling with that pod.

Note

You cannot add an affinity directly to a scheduled pod.

Procedure

Create a pod with a specific label in the pod spec:

Create a YAML file with the following content:

apiVersion: v1
kind: Pod
metadata:
  name: security-s1
  labels:
    security: S1
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: security-s1
    image: docker.io/ocpqe/hello-pod
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: [ALL]

Create the pod.
```
$ oc create -f <pod-spec>.yaml
```

When creating other pods, configure the following parameters:
1. Create a YAML file with the following content:
  apiVersion: v1 kind: Pod metadata: name: security-s2-east # ... spec: # ... affinity: podAntiAffinity: preferredDuringSchedulingIgnoredDuringExecution: - weight: 100 podAffinityTerm: labelSelector: matchExpressions: - key: security values: - S1 operator: In topologyKey: kubernetes.io/hostname # ...
  1. Adds a pod anti-affinity.
  2. Configures the requiredDuringSchedulingIgnoredDuringExecution parameter or the preferredDuringSchedulingIgnoredDuringExecution parameter.
  3. For a preferred rule, specifies a weight for the node, 1-100. The node that with highest weight is preferred.
  4. Specifies the key and values that must be met. If you want the new pod to not be scheduled with the other pod, use the same key and values parameters as the label on the first pod.
  5. Specifies an operator. The operator can be In, NotIn, Exists, or DoesNotExist. For example, use the operator In to require the label to be in the node.
  6. Specifies a topologyKey, which is a prepopulated Kubernetes label that the system uses to denote such a topology domain.
2. Create the pod.
  $ oc create -f <pod-spec>.yaml

Sample pod affinity and anti-affinity rules

The following examples demonstrate pod affinity and pod anti-affinity.

Pod Affinity

The following example demonstrates pod affinity for pods with matching labels and label selectors.

The pod team4 has the label team:4.

apiVersion: v1
kind: Pod
metadata:
  name: team4
  labels:
     team: "4"
# ...
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: ocp
    image: docker.io/ocpqe/hello-pod
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: [ALL]
# ...

The pod team4a has the label selector team:4 under podAffinity.

apiVersion: v1
kind: Pod
metadata:
  name: team4a
# ...
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: team
            operator: In
            values:
            - "4"
        topologyKey: kubernetes.io/hostname
  containers:
  - name: pod-affinity
    image: docker.io/ocpqe/hello-pod
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: [ALL]
# ...

The team4a pod is scheduled on the same node as the team4 pod.

Pod Anti-affinity

The following example demonstrates pod anti-affinity for pods with matching labels and label selectors.

The pod pod-s1 has the label security:s1.

apiVersion: v1
kind: Pod
metadata:
  name: pod-s1
  labels:
    security: s1
# ...
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: ocp
    image: docker.io/ocpqe/hello-pod
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: [ALL]
# ...

The pod pod-s2 has the label selector security:s1 under podAntiAffinity.

apiVersion: v1
kind: Pod
metadata:
  name: pod-s2
# ...
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - s1
        topologyKey: kubernetes.io/hostname
  containers:
  - name: pod-antiaffinity
    image: docker.io/ocpqe/hello-pod
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: [ALL]
# ...

The pod pod-s2 cannot be scheduled on the same node as pod-s1.

Pod Affinity with no Matching Labels

The following example demonstrates pod affinity for pods without matching labels and label selectors.

The pod pod-s1 has the label security:s1.

apiVersion: v1
kind: Pod
metadata:
  name: pod-s1
  labels:
    security: s1
# ...
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: ocp
    image: docker.io/ocpqe/hello-pod
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: [ALL]
# ...

The pod pod-s2 has the label selector security:s2.

apiVersion: v1
kind: Pod
metadata:
  name: pod-s2
# ...
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - s2
        topologyKey: kubernetes.io/hostname
  containers:
  - name: pod-affinity
    image: docker.io/ocpqe/hello-pod
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: [ALL]
# ...

The pod pod-s2 is not scheduled unless there is a node with a pod that has the security:s2 label. If there is no other pod with that label, the new pod remains in a pending state:
Example output
```
NAME      READY     STATUS    RESTARTS   AGE       IP        NODE
pod-s2    0/1       Pending   0          32s       <none>
```

Using pod affinity and anti-affinity to control where an Operator is installed

By default, when you install an Operator, OpenShift Container Platform installs the Operator pod to one of your worker nodes randomly. However, there might be situations where you want that pod scheduled on a specific node or set of nodes.

The following examples describe situations where you might want to schedule an Operator pod to a specific node or set of nodes:

If an Operator requires a particular platform, such as amd64 or arm64
If an Operator requires a particular operating system, such as Linux or Windows
If you want Operators that work together scheduled on the same host or on hosts located on the same rack
If you want Operators dispersed throughout the infrastructure to avoid downtime due to network or hardware issues

You can control where an Operator pod is installed by adding a pod affinity or anti-affinity to the Operator’s Subscription object.

The following example shows how to use pod anti-affinity to prevent the installation the Custom Metrics Autoscaler Operator from any node that has pods with a specific label:

Pod affinity example that places the Operator pod on one or more specific nodes

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openshift-custom-metrics-autoscaler-operator
  namespace: openshift-keda
spec:
  name: my-package
  source: my-operators
  sourceNamespace: operator-registries
  config:
    affinity:
      podAffinity: 
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - test
          topologyKey: kubernetes.io/hostname
#...

A pod affinity that places the Operator’s pod on a node that has pods with the app=test label.

Pod anti-affinity example that prevents the Operator pod from one or more specific nodes

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openshift-custom-metrics-autoscaler-operator
  namespace: openshift-keda
spec:
  name: my-package
  source: my-operators
  sourceNamespace: operator-registries
  config:
    affinity:
      podAntiAffinity: 
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: cpu
              operator: In
              values:
              - high
          topologyKey: kubernetes.io/hostname
#...

A pod anti-affinity that prevents the Operator’s pod from being scheduled on a node that has pods with the cpu=high label.

Procedure

To control the placement of an Operator pod, complete the following steps:

Install the Operator as usual.
If needed, ensure that your nodes are labeled to properly respond to the affinity.

Edit the Operator Subscription object to add an affinity:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openshift-custom-metrics-autoscaler-operator
  namespace: openshift-keda
spec:
  name: my-package
  source: my-operators
  sourceNamespace: operator-registries
  config:
    affinity:
      podAntiAffinity: 
        requiredDuringSchedulingIgnoredDuringExecution:
          podAffinityTerm:
            labelSelector:
              matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - ip-10-0-185-229.ec2.internal
            topologyKey: topology.kubernetes.io/zone
#...

Add a podAffinity or podAntiAffinity.

Verification

To ensure that the pod is deployed on the specific node, run the following command:

$ oc get pods -o wide

Example output

NAME                                                  READY   STATUS    RESTARTS   AGE   IP            NODE                           NOMINATED NODE   READINESS GATES
custom-metrics-autoscaler-operator-5dcc45d656-bhshg   1/1     Running   0          50s   10.131.0.20   ip-10-0-185-229.ec2.internal   <none>           <none>