Compliance Operator scans

The ScanSetting and ScanSettingBinding APIs are recommended to run compliance scans with the Compliance Operator. For more information on these API objects, run:

$ oc explain scansettings

or

$ oc explain scansettingbindings

Running compliance scans

You can run a scan using the Center for Internet Security (CIS) profiles. For convenience, the Compliance Operator creates a ScanSetting object with reasonable defaults on startup. This ScanSetting object is named default.

Note

For all-in-one control plane and worker nodes, the compliance scan runs twice on the worker and control plane nodes. The compliance scan might generate inconsistent scan results. You can avoid inconsistent results by defining only a single role in the ScanSetting object.

Important

Compliance Operator scans report INCONSISTENT on clusters with multi-architecture compute machines whether the control plane uses aarch64 or x86 CPUs. This is due to the same rule behaving differently on different architectures. This should only be applicable for node scans, where the Compliance Operator aggregates results from multiple nodes into a single result.

For more information about inconsistent scan results, see Compliance Operator shows INCONSISTENT scan result with worker node.

Procedure

Inspect the ScanSetting object by running the following command:

$ oc describe scansettings default -n openshift-compliance

Example output

Name:                  default
Namespace:             openshift-compliance
Labels:                <none>
Annotations:           <none>
API Version:           compliance.openshift.io/v1alpha1
Kind:                  ScanSetting
Max Retry On Timeout:  3
Metadata:
  Creation Timestamp:  2024-07-16T14:56:42Z
  Generation:          2
  Resource Version:    91655682
  UID:                 50358cf1-57a8-4f69-ac50-5c7a5938e402
Raw Result Storage:
  Node Selector:
    node-role.kubernetes.io/master:
  Pv Access Modes:
    ReadWriteOnce 
  Rotation:            3 
  Size:                1Gi 
  Storage Class Name:  standard 
  Tolerations:
    Effect:              NoSchedule
    Key:                 node-role.kubernetes.io/master
    Operator:            Exists
    Effect:              NoExecute
    Key:                 node.kubernetes.io/not-ready
    Operator:            Exists
    Toleration Seconds:  300
    Effect:              NoExecute
    Key:                 node.kubernetes.io/unreachable
    Operator:            Exists
    Toleration Seconds:  300
    Effect:              NoSchedule
    Key:                 node.kubernetes.io/memory-pressure
    Operator:            Exists
Roles:
  master 
  worker 
Scan Tolerations: 
  Operator:           Exists
Schedule:             0 1 * * * 
Show Not Applicable:  false
Strict Node Scan:     true
Suspend:              false
Timeout:              30m
Events:               <none>

The Compliance Operator creates a persistent volume (PV) that contains the results of the scans. By default, the PV will use access mode ReadWriteOnce because the Compliance Operator cannot make any assumptions about the storage classes configured on the cluster. Additionally, ReadWriteOnce access mode is available on most clusters. If you need to fetch the scan results, you can do so by using a helper pod, which also binds the volume. Volumes that use the ReadWriteOnce access mode can be mounted by only one pod at time, so it is important to remember to delete the helper pods. Otherwise, the Compliance Operator will not be able to reuse the volume for subsequent scans.
The Compliance Operator keeps results of three subsequent scans in the volume; older scans are rotated.
The Compliance Operator will allocate one GB of storage for the scan results.
The scansetting.rawResultStorage.storageClassName field specifies the storageClassName value to use when creating the PersistentVolumeClaim object to store the raw results. The default value is null, which will attempt to use the default storage class configured in the cluster. If there is no default class specified, then you must set a default class.
If the scan setting uses any profiles that scan cluster nodes, scan these node roles.
The default scan setting object scans all the nodes.

The default scan setting object runs scans at 01:00 each day.

As an alternative to the default scan setting, you can use default-auto-apply, which has the following settings:

Name:                      default-auto-apply
Namespace:                 openshift-compliance
Labels:                    <none>
Annotations:               <none>
API Version:               compliance.openshift.io/v1alpha1
Auto Apply Remediations:   true 
Auto Update Remediations:  true 
Kind:                      ScanSetting
Metadata:
  Creation Timestamp:  2022-10-18T20:21:00Z
  Generation:          1
  Managed Fields:
    API Version:  compliance.openshift.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:autoApplyRemediations:
      f:autoUpdateRemediations:
      f:rawResultStorage:
        .:
        f:nodeSelector:
          .:
          f:node-role.kubernetes.io/master:
        f:pvAccessModes:
        f:rotation:
        f:size:
        f:tolerations:
      f:roles:
      f:scanTolerations:
      f:schedule:
      f:showNotApplicable:
      f:strictNodeScan:
    Manager:         compliance-operator
    Operation:       Update
    Time:            2022-10-18T20:21:00Z
  Resource Version:  38840
  UID:               8cb0967d-05e0-4d7a-ac1c-08a7f7e89e84
Raw Result Storage:
  Node Selector:
    node-role.kubernetes.io/master:
  Pv Access Modes:
    ReadWriteOnce
  Rotation:  3
  Size:      1Gi
  Tolerations:
    Effect:              NoSchedule
    Key:                 node-role.kubernetes.io/master
    Operator:            Exists
    Effect:              NoExecute
    Key:                 node.kubernetes.io/not-ready
    Operator:            Exists
    Toleration Seconds:  300
    Effect:              NoExecute
    Key:                 node.kubernetes.io/unreachable
    Operator:            Exists
    Toleration Seconds:  300
    Effect:              NoSchedule
    Key:                 node.kubernetes.io/memory-pressure
    Operator:            Exists
Roles:
  master
  worker
Scan Tolerations:
  Operator:           Exists
Schedule:             0 1 * * *
Show Not Applicable:  false
Strict Node Scan:     true
Events:               <none>

Setting autoUpdateRemediations and autoApplyRemediations flags to true allows you to easily create ScanSetting objects that auto-remediate without extra steps.

Create a ScanSettingBinding object that binds to the default ScanSetting object and scans the cluster using the cis and cis-node profiles. For example:

apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSettingBinding
metadata:
  name: cis-compliance
  namespace: openshift-compliance
profiles:
  - name: ocp4-cis-node
    kind: Profile
    apiGroup: compliance.openshift.io/v1alpha1
  - name: ocp4-cis
    kind: Profile
    apiGroup: compliance.openshift.io/v1alpha1
settingsRef:
  name: default
  kind: ScanSetting
  apiGroup: compliance.openshift.io/v1alpha1

Create the ScanSettingBinding object by running:
```
$ oc create -f <file-name>.yaml -n openshift-compliance
```
At this point in the process, the ScanSettingBinding object is reconciled and based on the Binding and the Bound settings. The Compliance Operator creates a ComplianceSuite object and the associated ComplianceScan objects.
Follow the compliance scan progress by running:
```
$ oc get compliancescan -w -n openshift-compliance
```
The scans progress through the scanning phases and eventually reach the DONE phase when complete. In most cases, the result of the scan is NON-COMPLIANT. You can review the scan results and start applying remediations to make the cluster compliant. See Managing Compliance Operator remediation for more information.

Setting custom storage size for results

While the custom resources such as ComplianceCheckResult represent an aggregated result of one check across all scanned nodes, it can be useful to review the raw results as produced by the scanner. The raw results are produced in the ARF format and can be large (tens of megabytes per node), it is impractical to store them in a Kubernetes resource backed by the etcd key-value store. Instead, every scan creates a persistent volume (PV) which defaults to 1GB size. Depending on your environment, you may want to increase the PV size accordingly. This is done using the rawResultStorage.size attribute that is exposed in both the ScanSetting and ComplianceScan resources.

A related parameter is rawResultStorage.rotation which controls how many scans are retained in the PV before the older scans are rotated. The default value is 3, setting the rotation policy to 0 disables the rotation. Given the default rotation policy and an estimate of 100MB per a raw ARF scan report, you can calculate the right PV size for your environment.

Using custom result storage values

Because OpenShift Container Platform can be deployed in a variety of public clouds or bare metal, the Compliance Operator cannot determine available storage configurations. By default, the Compliance Operator will try to create the PV for storing results using the default storage class of the cluster, but a custom storage class can be configured using the rawResultStorage.StorageClassName attribute.

Important

If your cluster does not specify a default storage class, this attribute must be set.

Configure the ScanSetting custom resource to use a standard storage class and create persistent volumes that are 10GB in size and keep the last 10 results:

Example ScanSetting CR

apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSetting
metadata:
  name: default
  namespace: openshift-compliance
rawResultStorage:
  storageClassName: standard
  rotation: 10
  size: 10Gi
roles:
- worker
- master
scanTolerations:
- effect: NoSchedule
  key: node-role.kubernetes.io/master
  operator: Exists
schedule: '0 1 * * *'

Scheduling the result server pod on a worker node

The result server pod mounts the persistent volume (PV) that stores the raw Asset Reporting Format (ARF) scan results. The nodeSelector and tolerations attributes enable you to configure the location of the result server pod.

This is helpful for those environments where control plane nodes are not permitted to mount persistent volumes.

Procedure

Create a ScanSetting custom resource (CR) for the Compliance Operator:

Define the ScanSetting CR, and save the YAML file, for example, rs-workers.yaml:

apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSetting
metadata:
  name: rs-on-workers
  namespace: openshift-compliance
rawResultStorage:
  nodeSelector:
    node-role.kubernetes.io/worker: "" 
  pvAccessModes:
  - ReadWriteOnce
  rotation: 3
  size: 1Gi
  tolerations:
  - operator: Exists 
roles:
- worker
- master
scanTolerations:
  - operator: Exists
schedule: 0 1 * * *

The Compliance Operator uses this node to store scan results in ARF format.
The result server pod tolerates all taints.

To create the ScanSetting CR, run the following command:
```
$ oc create -f rs-workers.yaml
```

Verification

To verify that the ScanSetting object is created, run the following command:

$ oc get scansettings rs-on-workers -n openshift-compliance -o yaml

Example output

apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSetting
metadata:
  creationTimestamp: "2021-11-19T19:36:36Z"
  generation: 1
  name: rs-on-workers
  namespace: openshift-compliance
  resourceVersion: "48305"
  uid: 43fdfc5f-15a7-445a-8bbc-0e4a160cd46e
rawResultStorage:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  pvAccessModes:
  - ReadWriteOnce
  rotation: 3
  size: 1Gi
  tolerations:
  - operator: Exists
roles:
- worker
- master
scanTolerations:
- operator: Exists
schedule: 0 1 * * *
strictNodeScan: true

`ScanSetting` Custom Resource

The ScanSetting Custom Resource now allows you to override the default CPU and memory limits of scanner pods through the scan limits attribute. The Compliance Operator will use defaults of 500Mi memory, 100m CPU for the scanner container, and 200Mi memory with 100m CPU for the api-resource-collector container. To set the memory limits of the Operator, modify the Subscription object if installed through OLM or the Operator deployment itself.

To increase the default CPU and memory limits of the Compliance Operator, see Increasing Compliance Operator resource limits.

Important

Increasing the memory limit for the Compliance Operator or the scanner pods is needed if the default limits are not sufficient and the Operator or scanner pods are ended by the Out Of Memory (OOM) process. For more information, see Increasing Compliance Operator resource limits.

Configuring the hosted control planes management cluster

If you are hosting your own Hosted control plane or Hypershift environment and want to scan a Hosted Cluster from the management cluster, you will need to set the name and prefix namespace for the target Hosted Cluster. You can achieve this by creating a TailoredProfile.

Important

This procedure only applies to users managing their own hosted control planes environment.

Note

Only ocp4-cis and ocp4-pci-dss profiles are supported in hosted control planes management clusters.

Prerequisites

The Compliance Operator is installed in the management cluster.

Procedure

Obtain the name and namespace of the hosted cluster to be scanned by running the following command:

$ oc get hostedcluster -A

Example output

NAMESPACE       NAME                   VERSION   KUBECONFIG                              PROGRESS    AVAILABLE   PROGRESSING   MESSAGE
local-cluster   79136a1bdb84b3c13217   4.13.5    79136a1bdb84b3c13217-admin-kubeconfig   Completed   True        False         The hosted control plane is available

In the management cluster, create a TailoredProfile extending the scan Profile and define the name and namespace of the Hosted Cluster to be scanned:

Example management-tailoredprofile.yaml

apiVersion: compliance.openshift.io/v1alpha1
kind: TailoredProfile
metadata:
  name: hypershift-cisk57aw88gry
  namespace: openshift-compliance
spec:
  description: This profile test required rules
  extends: ocp4-cis 
  title: Management namespace profile
  setValues:
  - name: ocp4-hypershift-cluster
    rationale: This value is used for HyperShift version detection
    value: 79136a1bdb84b3c13217 
  - name: ocp4-hypershift-namespace-prefix
    rationale: This value is used for HyperShift control plane namespace detection
    value: local-cluster

Variable. Only ocp4-cis and ocp4-pci-dss profiles are supported in hosted control planes management clusters.
The value is the NAME from the output in the previous step.
The value is the NAMESPACE from the output in the previous step.

Create the TailoredProfile:

$ oc create -n openshift-compliance -f mgmt-tp.yaml

Applying resource requests and limits

When the kubelet starts a container as part of a Pod, the kubelet passes that container’s requests and limits for memory and CPU to the container runtime. In Linux, the container runtime configures the kernel cgroups that apply and enforce the limits you defined.

The CPU limit defines how much CPU time the container can use. During each scheduling interval, the Linux kernel checks to see if this limit is exceeded. If so, the kernel waits before allowing the cgroup to resume execution.

If several different containers (cgroups) want to run on a contended system, workloads with larger CPU requests are allocated more CPU time than workloads with small requests. The memory request is used during Pod scheduling. On a node that uses cgroups v2, the container runtime might use the memory request as a hint to set memory.min and memory.low values.

If a container attempts to allocate more memory than this limit, the Linux kernel out-of-memory subsystem activates and intervenes by stopping one of the processes in the container that tried to allocate memory. The memory limit for the Pod or container can also apply to pages in memory-backed volumes, such as an emptyDir.

The kubelet tracks tmpfs emptyDir volumes as container memory is used, rather than as local ephemeral storage. If a container exceeds its memory request and the node that it runs on becomes short of memory overall, the Pod’s container might be evicted.

Important

A container may not exceed its CPU limit for extended periods. Container run times do not stop Pods or containers for excessive CPU usage. To determine whether a container cannot be scheduled or is being killed due to resource limits, see Troubleshooting the Compliance Operator.

Scheduling Pods with container resource requests

When a Pod is created, the scheduler selects a Node for the Pod to run on. Each node has a maximum capacity for each resource type in the amount of CPU and memory it can provide for the Pods. The scheduler ensures that the sum of the resource requests of the scheduled containers is less than the capacity nodes for each resource type.

Although memory or CPU resource usage on nodes is very low, the scheduler might still refuse to place a Pod on a node if the capacity check fails to protect against a resource shortage on a node.

For each container, you can specify the following resource limits and request:

spec.containers[].resources.limits.cpu
spec.containers[].resources.limits.memory
spec.containers[].resources.limits.hugepages-<size>
spec.containers[].resources.requests.cpu
spec.containers[].resources.requests.memory
spec.containers[].resources.requests.hugepages-<size>

Although you can specify requests and limits for only individual containers, it is also useful to consider the overall resource requests and limits for a pod. For a particular resource, a container resource request or limit is the sum of the resource requests or limits of that type for each container in the pod.

Example container resource requests and limits

apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: images.my-company.example/app:v4
    resources:
      requests: 
        memory: "64Mi"
        cpu: "250m"
      limits: 
        memory: "128Mi"
        cpu: "500m"
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: [ALL]
  - name: log-aggregator
    image: images.my-company.example/log-aggregator:v6
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: [ALL]

The container is requesting 64 Mi of memory and 250 m CPU.
The container’s limits are 128 Mi of memory and 500 m CPU.