Understanding the File Integrity Operator

The File Integrity Operator is an OpenShift Container Platform Operator that continually runs file integrity checks on the cluster nodes. It deploys a daemon set that initializes and runs privileged advanced intrusion detection environment (AIDE) containers on each node, providing a status object with a log of files that are modified during the initial run of the daemon set pods.

Important

Currently, only Red Hat Enterprise Linux CoreOS (RHCOS) nodes are supported.

Creating the FileIntegrity custom resource

An instance of a FileIntegrity custom resource (CR) represents a set of continuous file integrity scans for one or more nodes.

Each FileIntegrity CR is backed by a daemon set running AIDE on the nodes matching the FileIntegrity CR specification.

Procedure

Create the following example FileIntegrity CR named worker-fileintegrity.yaml to enable scans on worker nodes:

Example FileIntegrity CR

apiVersion: fileintegrity.openshift.io/v1alpha1
kind: FileIntegrity
metadata:
  name: worker-fileintegrity
  namespace: openshift-file-integrity
spec:
  nodeSelector: 
      node-role.kubernetes.io/worker: ""
  tolerations: 
  - key: "myNode"
    operator: "Exists"
    effect: "NoSchedule"
  config: 
    name: "myconfig"
    namespace: "openshift-file-integrity"
    key: "config"
    gracePeriod: 20 
    maxBackups: 5 
    initialDelay: 60 
  debug: false
status:
  phase: Active

Defines the selector for scheduling node scans.
Specify tolerations to schedule on nodes with custom taints. When not specified, a default toleration allowing running on main and infra nodes is applied.
Define a ConfigMap containing an AIDE configuration to use.
The number of seconds to pause in between AIDE integrity checks. Frequent AIDE checks on a node might be resource intensive, so it can be useful to specify a longer interval. Default is 900 seconds (15 minutes).
The maximum number of AIDE database and log backups (leftover from the re-init process) to keep on a node. Older backups beyond this number are automatically pruned by the daemon. Default is set to 5.
The number of seconds to wait before starting the first AIDE integrity check. Default is set to 0.

The running status of the FileIntegrity instance. Statuses are Initializing, Pending, or Active.

`Initializing`	The `FileIntegrity` object is currently initializing or re-initializing the AIDE database.
`Pending`	The `FileIntegrity` deployment is still being created.
`Active`	The scans are active and ongoing.

The FileIntegrity object is currently initializing or re-initializing the AIDE database.
The FileIntegrity deployment is still being created.
The scans are active and ongoing.

Apply the YAML file to the openshift-file-integrity namespace:

$ oc apply -f worker-fileintegrity.yaml -n openshift-file-integrity

Verification

Confirm the FileIntegrity object was created successfully by running the following command:

$ oc get fileintegrities -n openshift-file-integrity

Example output

NAME                   AGE
worker-fileintegrity   14s

Checking the FileIntegrity custom resource status

The FileIntegrity custom resource (CR) reports its status through the .status.phase subresource.

Procedure

To query the FileIntegrity CR status, run:

$ oc get fileintegrities/worker-fileintegrity  -o jsonpath="{ .status.phase }"

Example output

Active

FileIntegrity custom resource phases

Pending - The phase after the custom resource (CR) is created.
Active - The phase when the backing daemon set is up and running.
Initializing - The phase when the AIDE database is being reinitialized.

Understanding the FileIntegrityNodeStatuses object

The scan results of the FileIntegrity CR are reported in another object called FileIntegrityNodeStatuses.

$ oc get fileintegritynodestatuses

Example output

NAME                                                AGE
worker-fileintegrity-ip-10-0-130-192.ec2.internal   101s
worker-fileintegrity-ip-10-0-147-133.ec2.internal   109s
worker-fileintegrity-ip-10-0-165-160.ec2.internal   102s

Note

It might take some time for the FileIntegrityNodeStatus object results to be available.

There is one result object per node. The nodeName attribute of each FileIntegrityNodeStatus object corresponds to the node being scanned. The status of the file integrity scan is represented in the results array, which holds scan conditions.

$ oc get fileintegritynodestatuses.fileintegrity.openshift.io -ojsonpath='{.items[*].results}' | jq

The fileintegritynodestatus object reports the latest status of an AIDE run and exposes the status as Failed, Succeeded, or Errored in a status field.

$ oc get fileintegritynodestatuses -w

Example output

NAME                                                               NODE                                         STATUS
example-fileintegrity-ip-10-0-134-186.us-east-2.compute.internal   ip-10-0-134-186.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-150-230.us-east-2.compute.internal   ip-10-0-150-230.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-169-137.us-east-2.compute.internal   ip-10-0-169-137.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-180-200.us-east-2.compute.internal   ip-10-0-180-200.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-194-66.us-east-2.compute.internal    ip-10-0-194-66.us-east-2.compute.internal    Failed
example-fileintegrity-ip-10-0-222-188.us-east-2.compute.internal   ip-10-0-222-188.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-134-186.us-east-2.compute.internal   ip-10-0-134-186.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-222-188.us-east-2.compute.internal   ip-10-0-222-188.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-194-66.us-east-2.compute.internal    ip-10-0-194-66.us-east-2.compute.internal    Failed
example-fileintegrity-ip-10-0-150-230.us-east-2.compute.internal   ip-10-0-150-230.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-180-200.us-east-2.compute.internal   ip-10-0-180-200.us-east-2.compute.internal   Succeeded

FileIntegrityNodeStatus CR status types

These conditions are reported in the results array of the corresponding FileIntegrityNodeStatus CR status:

Succeeded - The integrity check passed; the files and directories covered by the AIDE check have not been modified since the database was last initialized.
Failed - The integrity check failed; some files or directories covered by the AIDE check have been modified since the database was last initialized.
Errored - The AIDE scanner encountered an internal error.

FileIntegrityNodeStatus CR success example

Example output of a condition with a success status

[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:45:57Z"
  }
]
[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:46:03Z"
  }
]
[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:45:48Z"
  }
]

In this case, all three scans succeeded and so far there are no other conditions.

FileIntegrityNodeStatus CR failure status example

To simulate a failure condition, modify one of the files AIDE tracks. For example, modify /etc/resolv.conf on one of the worker nodes:

$ oc debug node/ip-10-0-130-192.ec2.internal

Example output

Creating debug namespace/openshift-debug-node-ldfbj ...
Starting pod/ip-10-0-130-192ec2internal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.130.192
If you don't see a command prompt, try pressing enter.
sh-4.2# echo "# integrity test" >> /host/etc/resolv.conf
sh-4.2# exit

Removing debug pod ...
Removing debug namespace/openshift-debug-node-ldfbj ...

After some time, the Failed condition is reported in the results array of the corresponding FileIntegrityNodeStatus object. The previous Succeeded condition is retained, which allows you to pinpoint the time the check failed.

$ oc get fileintegritynodestatuses.fileintegrity.openshift.io/worker-fileintegrity-ip-10-0-130-192.ec2.internal -ojsonpath='{.results}' | jq -r

Alternatively, if you are not mentioning the object name, run:

$ oc get fileintegritynodestatuses.fileintegrity.openshift.io -ojsonpath='{.items[*].results}' | jq

Example output

[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:54:14Z"
  },
  {
    "condition": "Failed",
    "filesChanged": 1,
    "lastProbeTime": "2020-09-15T12:57:20Z",
    "resultConfigMapName": "aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed",
    "resultConfigMapNamespace": "openshift-file-integrity"
  }
]

The Failed condition points to a config map that gives more details about what exactly failed and why:

$ oc describe cm aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed

Example output

Name:         aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed
Namespace:    openshift-file-integrity
Labels:       file-integrity.openshift.io/node=ip-10-0-130-192.ec2.internal
              file-integrity.openshift.io/owner=worker-fileintegrity
              file-integrity.openshift.io/result-log=
Annotations:  file-integrity.openshift.io/files-added: 0
              file-integrity.openshift.io/files-changed: 1
              file-integrity.openshift.io/files-removed: 0

Data

integritylog:
------
AIDE 0.15.1 found differences between database and filesystem!!
Start timestamp: 2020-09-15 12:58:15

Summary:
  Total number of files:  31553
  Added files:                0
  Removed files:            0
  Changed files:            1


---------------------------------------------------
Changed files:
---------------------------------------------------

changed: /hostroot/etc/resolv.conf

---------------------------------------------------
Detailed information about changes:
---------------------------------------------------


File: /hostroot/etc/resolv.conf
 SHA512   : sTQYpB/AL7FeoGtu/1g7opv6C+KT1CBJ , qAeM+a8yTgHPnIHMaRlS+so61EN8VOpg

Events:  <none>

Due to the config map data size limit, AIDE logs over 1 MB are added to the failure config map as a base64-encoded gzip archive. Use the following command to extract the log:

$ oc get cm <failure-cm-name> -o json | jq -r '.data.integritylog' | base64 -d | gunzip

Note

Compressed logs are indicated by the presence of a file-integrity.openshift.io/compressed annotation key in the config map.

Understanding events

Transitions in the status of the FileIntegrity and FileIntegrityNodeStatus objects are logged by events. The creation time of the event reflects the latest transition, such as Initializing to Active, and not necessarily the latest scan result. However, the newest event always reflects the most recent status.

$ oc get events --field-selector reason=FileIntegrityStatus

Example output

LAST SEEN   TYPE     REASON                OBJECT                                MESSAGE
97s         Normal   FileIntegrityStatus   fileintegrity/example-fileintegrity   Pending
67s         Normal   FileIntegrityStatus   fileintegrity/example-fileintegrity   Initializing
37s         Normal   FileIntegrityStatus   fileintegrity/example-fileintegrity   Active

When a node scan fails, an event is created with the add/changed/removed and config map information.

$ oc get events --field-selector reason=NodeIntegrityStatus

Example output

LAST SEEN   TYPE      REASON                OBJECT                                MESSAGE
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-134-173.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-168-238.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-169-175.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-152-92.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-158-144.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-131-30.ec2.internal
87m         Warning   NodeIntegrityStatus   fileintegrity/example-fileintegrity   node ip-10-0-152-92.ec2.internal has changed! a:1,c:1,r:0 \ log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed

Changes to the number of added, changed, or removed files results in a new event, even if the status of the node has not transitioned.

$ oc get events --field-selector reason=NodeIntegrityStatus

Example output

LAST SEEN   TYPE      REASON                OBJECT                                MESSAGE
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-134-173.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-168-238.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-169-175.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-152-92.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-158-144.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-131-30.ec2.internal
87m         Warning   NodeIntegrityStatus   fileintegrity/example-fileintegrity   node ip-10-0-152-92.ec2.internal has changed! a:1,c:1,r:0 \ log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed
40m         Warning   NodeIntegrityStatus   fileintegrity/example-fileintegrity   node ip-10-0-152-92.ec2.internal has changed! a:3,c:1,r:0 \ log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed