Understanding the File Integrity Operator
The File Integrity Operator is an OpenShift Container Platform Operator that continually runs file integrity checks on the cluster nodes. It deploys a daemon set that initializes and runs privileged advanced intrusion detection environment (AIDE) containers on each node, providing a status object with a log of files that are modified during the initial run of the daemon set pods.
Important
Currently, only Red Hat Enterprise Linux CoreOS (RHCOS) nodes are supported.
Creating the FileIntegrity custom resource
An instance of a FileIntegrity custom resource (CR) represents a set of continuous file integrity scans for one or more nodes.
Each FileIntegrity CR is backed by a daemon set running AIDE on the nodes matching the FileIntegrity CR specification.
-
Create the following example
FileIntegrityCR namedworker-fileintegrity.yamlto enable scans on worker nodes:Example FileIntegrity CRapiVersion: fileintegrity.openshift.io/v1alpha1 kind: FileIntegrity metadata: name: worker-fileintegrity namespace: openshift-file-integrity spec: nodeSelector: node-role.kubernetes.io/worker: "" tolerations: - key: "myNode" operator: "Exists" effect: "NoSchedule" config: name: "myconfig" namespace: "openshift-file-integrity" key: "config" gracePeriod: 20 maxBackups: 5 initialDelay: 60 debug: false status: phase: Active- Defines the selector for scheduling node scans.
- Specify
tolerationsto schedule on nodes with custom taints. When not specified, a default toleration allowing running on main and infra nodes is applied. - Define a
ConfigMapcontaining an AIDE configuration to use. - The number of seconds to pause in between AIDE integrity checks. Frequent AIDE checks on a node might be resource intensive, so it can be useful to specify a longer interval. Default is 900 seconds (15 minutes).
- The maximum number of AIDE database and log backups (leftover from the re-init process) to keep on a node. Older backups beyond this number are automatically pruned by the daemon. Default is set to 5.
- The number of seconds to wait before starting the first AIDE integrity check. Default is set to 0.
- The running status of the
FileIntegrityinstance. Statuses areInitializing,Pending, orActive.InitializingThe
FileIntegrityobject is currently initializing or re-initializing the AIDE database.PendingThe
FileIntegritydeployment is still being created.ActiveThe scans are active and ongoing.
The
FileIntegrityobject is currently initializing or re-initializing the AIDE database.The
FileIntegritydeployment is still being created.The scans are active and ongoing.
-
Apply the YAML file to the
openshift-file-integritynamespace:$ oc apply -f worker-fileintegrity.yaml -n openshift-file-integrity
-
Confirm the
FileIntegrityobject was created successfully by running the following command:$ oc get fileintegrities -n openshift-file-integrityExample outputNAME AGE worker-fileintegrity 14s
Checking the FileIntegrity custom resource status
The FileIntegrity custom resource (CR) reports its status through the .status.phase subresource.
-
To query the
FileIntegrityCR status, run:$ oc get fileintegrities/worker-fileintegrity -o jsonpath="{ .status.phase }"Example outputActive
FileIntegrity custom resource phases
-
Pending- The phase after the custom resource (CR) is created. -
Active- The phase when the backing daemon set is up and running. -
Initializing- The phase when the AIDE database is being reinitialized.
Understanding the FileIntegrityNodeStatuses object
The scan results of the FileIntegrity CR are reported in another object called FileIntegrityNodeStatuses.
$ oc get fileintegritynodestatuses
NAME AGE
worker-fileintegrity-ip-10-0-130-192.ec2.internal 101s
worker-fileintegrity-ip-10-0-147-133.ec2.internal 109s
worker-fileintegrity-ip-10-0-165-160.ec2.internal 102s
Note
It might take some time for the FileIntegrityNodeStatus object results to be available.
There is one result object per node. The nodeName attribute of each FileIntegrityNodeStatus object corresponds to the node being scanned. The
status of the file integrity scan is represented in the results array, which holds scan conditions.
$ oc get fileintegritynodestatuses.fileintegrity.openshift.io -ojsonpath='{.items[*].results}' | jq
The fileintegritynodestatus object reports the latest status of an AIDE run and exposes the status as Failed, Succeeded, or Errored in a status field.
$ oc get fileintegritynodestatuses -w
NAME NODE STATUS
example-fileintegrity-ip-10-0-134-186.us-east-2.compute.internal ip-10-0-134-186.us-east-2.compute.internal Succeeded
example-fileintegrity-ip-10-0-150-230.us-east-2.compute.internal ip-10-0-150-230.us-east-2.compute.internal Succeeded
example-fileintegrity-ip-10-0-169-137.us-east-2.compute.internal ip-10-0-169-137.us-east-2.compute.internal Succeeded
example-fileintegrity-ip-10-0-180-200.us-east-2.compute.internal ip-10-0-180-200.us-east-2.compute.internal Succeeded
example-fileintegrity-ip-10-0-194-66.us-east-2.compute.internal ip-10-0-194-66.us-east-2.compute.internal Failed
example-fileintegrity-ip-10-0-222-188.us-east-2.compute.internal ip-10-0-222-188.us-east-2.compute.internal Succeeded
example-fileintegrity-ip-10-0-134-186.us-east-2.compute.internal ip-10-0-134-186.us-east-2.compute.internal Succeeded
example-fileintegrity-ip-10-0-222-188.us-east-2.compute.internal ip-10-0-222-188.us-east-2.compute.internal Succeeded
example-fileintegrity-ip-10-0-194-66.us-east-2.compute.internal ip-10-0-194-66.us-east-2.compute.internal Failed
example-fileintegrity-ip-10-0-150-230.us-east-2.compute.internal ip-10-0-150-230.us-east-2.compute.internal Succeeded
example-fileintegrity-ip-10-0-180-200.us-east-2.compute.internal ip-10-0-180-200.us-east-2.compute.internal Succeeded
FileIntegrityNodeStatus CR status types
These conditions are reported in the results array of the corresponding FileIntegrityNodeStatus CR status:
-
Succeeded- The integrity check passed; the files and directories covered by the AIDE check have not been modified since the database was last initialized. -
Failed- The integrity check failed; some files or directories covered by the AIDE check have been modified since the database was last initialized. -
Errored- The AIDE scanner encountered an internal error.
FileIntegrityNodeStatus CR success example
[
{
"condition": "Succeeded",
"lastProbeTime": "2020-09-15T12:45:57Z"
}
]
[
{
"condition": "Succeeded",
"lastProbeTime": "2020-09-15T12:46:03Z"
}
]
[
{
"condition": "Succeeded",
"lastProbeTime": "2020-09-15T12:45:48Z"
}
]
In this case, all three scans succeeded and so far there are no other conditions.
FileIntegrityNodeStatus CR failure status example
To simulate a failure condition, modify one of the files AIDE tracks. For example, modify /etc/resolv.conf on one of the worker nodes:
$ oc debug node/ip-10-0-130-192.ec2.internal
Creating debug namespace/openshift-debug-node-ldfbj ...
Starting pod/ip-10-0-130-192ec2internal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.130.192
If you don't see a command prompt, try pressing enter.
sh-4.2# echo "# integrity test" >> /host/etc/resolv.conf
sh-4.2# exit
Removing debug pod ...
Removing debug namespace/openshift-debug-node-ldfbj ...
After some time, the Failed condition is reported in the results array of the corresponding FileIntegrityNodeStatus object. The previous Succeeded condition is retained, which allows you to pinpoint the time the check failed.
$ oc get fileintegritynodestatuses.fileintegrity.openshift.io/worker-fileintegrity-ip-10-0-130-192.ec2.internal -ojsonpath='{.results}' | jq -r
Alternatively, if you are not mentioning the object name, run:
$ oc get fileintegritynodestatuses.fileintegrity.openshift.io -ojsonpath='{.items[*].results}' | jq
[
{
"condition": "Succeeded",
"lastProbeTime": "2020-09-15T12:54:14Z"
},
{
"condition": "Failed",
"filesChanged": 1,
"lastProbeTime": "2020-09-15T12:57:20Z",
"resultConfigMapName": "aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed",
"resultConfigMapNamespace": "openshift-file-integrity"
}
]
The Failed condition points to a config map that gives more details about what exactly failed and why:
$ oc describe cm aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed
Name: aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed
Namespace: openshift-file-integrity
Labels: file-integrity.openshift.io/node=ip-10-0-130-192.ec2.internal
file-integrity.openshift.io/owner=worker-fileintegrity
file-integrity.openshift.io/result-log=
Annotations: file-integrity.openshift.io/files-added: 0
file-integrity.openshift.io/files-changed: 1
file-integrity.openshift.io/files-removed: 0
Data
integritylog:
------
AIDE 0.15.1 found differences between database and filesystem!!
Start timestamp: 2020-09-15 12:58:15
Summary:
Total number of files: 31553
Added files: 0
Removed files: 0
Changed files: 1
---------------------------------------------------
Changed files:
---------------------------------------------------
changed: /hostroot/etc/resolv.conf
---------------------------------------------------
Detailed information about changes:
---------------------------------------------------
File: /hostroot/etc/resolv.conf
SHA512 : sTQYpB/AL7FeoGtu/1g7opv6C+KT1CBJ , qAeM+a8yTgHPnIHMaRlS+so61EN8VOpg
Events: <none>
Due to the config map data size limit, AIDE logs over 1 MB are added to the failure config map as a base64-encoded gzip archive. Use the following command to extract the log:
$ oc get cm <failure-cm-name> -o json | jq -r '.data.integritylog' | base64 -d | gunzip
Note
Compressed logs are indicated by the presence of a file-integrity.openshift.io/compressed annotation key in the config map.
Understanding events
Transitions in the status of the FileIntegrity and FileIntegrityNodeStatus objects are logged by events. The creation time of the event reflects the latest transition, such as Initializing to Active, and not necessarily the latest scan result. However, the newest event always reflects the most recent status.
$ oc get events --field-selector reason=FileIntegrityStatus
LAST SEEN TYPE REASON OBJECT MESSAGE
97s Normal FileIntegrityStatus fileintegrity/example-fileintegrity Pending
67s Normal FileIntegrityStatus fileintegrity/example-fileintegrity Initializing
37s Normal FileIntegrityStatus fileintegrity/example-fileintegrity Active
When a node scan fails, an event is created with the add/changed/removed and config map information.
$ oc get events --field-selector reason=NodeIntegrityStatus
LAST SEEN TYPE REASON OBJECT MESSAGE
114m Normal NodeIntegrityStatus fileintegrity/example-fileintegrity no changes to node ip-10-0-134-173.ec2.internal
114m Normal NodeIntegrityStatus fileintegrity/example-fileintegrity no changes to node ip-10-0-168-238.ec2.internal
114m Normal NodeIntegrityStatus fileintegrity/example-fileintegrity no changes to node ip-10-0-169-175.ec2.internal
114m Normal NodeIntegrityStatus fileintegrity/example-fileintegrity no changes to node ip-10-0-152-92.ec2.internal
114m Normal NodeIntegrityStatus fileintegrity/example-fileintegrity no changes to node ip-10-0-158-144.ec2.internal
114m Normal NodeIntegrityStatus fileintegrity/example-fileintegrity no changes to node ip-10-0-131-30.ec2.internal
87m Warning NodeIntegrityStatus fileintegrity/example-fileintegrity node ip-10-0-152-92.ec2.internal has changed! a:1,c:1,r:0 \ log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed
Changes to the number of added, changed, or removed files results in a new event, even if the status of the node has not transitioned.
$ oc get events --field-selector reason=NodeIntegrityStatus
LAST SEEN TYPE REASON OBJECT MESSAGE
114m Normal NodeIntegrityStatus fileintegrity/example-fileintegrity no changes to node ip-10-0-134-173.ec2.internal
114m Normal NodeIntegrityStatus fileintegrity/example-fileintegrity no changes to node ip-10-0-168-238.ec2.internal
114m Normal NodeIntegrityStatus fileintegrity/example-fileintegrity no changes to node ip-10-0-169-175.ec2.internal
114m Normal NodeIntegrityStatus fileintegrity/example-fileintegrity no changes to node ip-10-0-152-92.ec2.internal
114m Normal NodeIntegrityStatus fileintegrity/example-fileintegrity no changes to node ip-10-0-158-144.ec2.internal
114m Normal NodeIntegrityStatus fileintegrity/example-fileintegrity no changes to node ip-10-0-131-30.ec2.internal
87m Warning NodeIntegrityStatus fileintegrity/example-fileintegrity node ip-10-0-152-92.ec2.internal has changed! a:1,c:1,r:0 \ log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed
40m Warning NodeIntegrityStatus fileintegrity/example-fileintegrity node ip-10-0-152-92.ec2.internal has changed! a:3,c:1,r:0 \ log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed