Configuring managed cluster policies by using PolicyGenerator resources

You can customize how Red Hat Advanced Cluster Management (RHACM) uses PolicyGenerator CRs to generate Policy CRs that configure the managed clusters that you provision.

Using RHACM and PolicyGenerator CRs is the recommended approach for managing policies and deploying them to managed clusters. This replaces the use of PolicyGenTemplate CRs for this purpose. For more information about PolicyGenerator resources, see the RHACM Policy Generator documentation.

Comparing RHACM PolicyGenerator and PolicyGenTemplate resource patching

PolicyGenerator custom resources (CRs) and PolicyGenTemplate CRs can be used in GitOps ZTP to generate RHACM policies for managed clusters.

There are advantages to using PolicyGenerator CRs over PolicyGenTemplate CRs when it comes to patching OpenShift Container Platform resources with GitOps ZTP. Using the RHACM PolicyGenerator API provides a generic way of patching resources which is not possible with PolicyGenTemplate resources.

The PolicyGenerator API is a part of the Open Cluster Management standard, while the PolicyGenTemplate API is not. A comparison of PolicyGenerator and PolicyGenTemplate resource patching and placement strategies are described in the following table.

Important

Using PolicyGenTemplate CRs to manage and deploy policies to managed clusters will be deprecated in an upcoming OpenShift Container Platform release. Equivalent and improved functionality is available using Red Hat Advanced Cluster Management (RHACM) and PolicyGenerator CRs.

For more information about PolicyGenerator resources, see the RHACM Integrating Policy Generator documentation.

Table 1. Comparison of RHACM PolicyGenerator and PolicyGenTemplate patching
PolicyGenerator patching	PolicyGenTemplate patching
Uses Kustomize strategic merges for merging resources. For more information see Declarative Management of Kubernetes Objects Using Kustomize.	Works by replacing variables with their values as defined by the patch. This is less flexible than Kustomize merge strategies.
Supports `ManagedClusterSet` and `Binding` resources.	Does not support `ManagedClusterSet` and `Binding` resources.
Relies only on patching, no embedded variable substitution is required.	Overwrites variable values defined in the patch.
Does not support merging lists in merge patches. Replacing a list in a merge patch is supported.	Merging and replacing lists is supported in a limited fashion - you can only merge one object in the list.
Does not currently support the OpenAPI specification for resource patching. This means that additional directives are required in the patch to merge content that does not follow a schema, for example, `PtpConfig` resources.	Works by replacing fields and values with values as defined by the patch.
Requires additional directives, for example, `$patch: replace` in the patch to merge content that does not follow a schema.	Substitutes fields and values defined in the source CR with values defined in the patch, for example `$name`.
Can patch the `Name` and `Namespace` fields defined in the reference source CR, but only if the CR file has a single object.	Can patch the `Name` and `Namespace` fields defined in the reference source CR.

About the PolicyGenerator CRD

The PolicyGenerator custom resource definition (CRD) tells the PolicyGen policy generator what custom resources (CRs) to include in the cluster configuration, how to combine the CRs into the generated policies, and what items in those CRs need to be updated with overlay content.

The following example shows a PolicyGenerator CR (acm-common-du-ranGen.yaml) extracted from the ztp-site-generate reference container. The acm-common-du-ranGen.yaml file defines two Red Hat Advanced Cluster Management (RHACM) policies. The policies manage a collection of configuration CRs, one for each unique value of policyName in the CR. acm-common-du-ranGen.yaml creates a single placement binding and a placement rule to bind the policies to clusters based on the labels listed in the policyDefaults.placement.labelSelector section.

Example PolicyGenerator CR - acm-common-ranGen.yaml

apiVersion: policy.open-cluster-management.io/v1
kind: PolicyGenerator
metadata:
    name: common-latest
placementBindingDefaults:
    name: common-latest-placement-binding 
policyDefaults:
    namespace: ztp-common
    placement:
        labelSelector:
            matchExpressions:
                - key: common
                  operator: In
                  values:
                    - "true"
                - key: du-profile
                  operator: In
                  values:
                    - latest
    remediationAction: inform
    severity: low
    namespaceSelector:
        exclude:
            - kube-*
        include:
            - '*'
    evaluationInterval:
        compliant: 10m
        noncompliant: 10s
policies:
    - name: common-latest-config-policy
      policyAnnotations:
        ran.openshift.io/ztp-deploy-wave: "1"
      manifests:
        - path: source-crs/ReduceMonitoringFootprint.yaml
        - path: source-crs/DefaultCatsrc.yaml 
          patches:
            - metadata:
                name: redhat-operators-disconnected
              spec:
                displayName: disconnected-redhat-operators
                image: registry.example.com:5000/disconnected-redhat-operators/disconnected-redhat-operator-index:v4.9
        - path: source-crs/DisconnectedICSP.yaml
          patches:
            - spec:
                repositoryDigestMirrors:
                    - mirrors:
                        - registry.example.com:5000
                      source: registry.redhat.io
    - name: common-latest-subscriptions-policy
      policyAnnotations:
        ran.openshift.io/ztp-deploy-wave: "2"
      manifests: 
        - path: source-crs/SriovSubscriptionNS.yaml
        - path: source-crs/SriovSubscriptionOperGroup.yaml
        - path: source-crs/SriovSubscription.yaml
        - path: source-crs/SriovOperatorStatus.yaml
        - path: source-crs/PtpSubscriptionNS.yaml
        - path: source-crs/PtpSubscriptionOperGroup.yaml
        - path: source-crs/PtpSubscription.yaml
        - path: source-crs/PtpOperatorStatus.yaml
        - path: source-crs/ClusterLogNS.yaml
        - path: source-crs/ClusterLogOperGroup.yaml
        - path: source-crs/ClusterLogSubscription.yaml
        - path: source-crs/ClusterLogOperatorStatus.yaml
        - path: source-crs/StorageNS.yaml
        - path: source-crs/StorageOperGroup.yaml
        - path: source-crs/StorageSubscription.yaml
        - path: source-crs/StorageOperatorStatus.yaml

Applies the policies to all clusters with this label.
The DefaultCatsrc.yaml file contains the catalog source for the disconnected registry and related registry configuration details.
Files listed under policies.manifests create the Operator policies for installed clusters.

A PolicyGenerator CR can be constructed with any number of included CRs. Apply the following example CR in the hub cluster to generate a policy containing a single CR:

apiVersion: policy.open-cluster-management.io/v1
kind: PolicyGenerator
metadata:
  name: group-du-sno
placementBindingDefaults:
  name: group-du-sno-placement-binding
policyDefaults:
  namespace: ztp-group
  placement:
    labelSelector:
      matchExpressions:
        - key: group-du-sno
          operator: Exists
  remediationAction: inform
  severity: low
  namespaceSelector:
    exclude:
      - kube-*
    include:
      - '*'
  evaluationInterval:
    compliant: 10m
    noncompliant: 10s
policies:
  - name: group-du-sno-config-policy
    policyAnnotations:
      ran.openshift.io/ztp-deploy-wave: '10'
    manifests:
      - path: source-crs/PtpConfigSlave-MCP-master.yaml
        patches:
          - metadata: null
            name: du-ptp-slave
            namespace: openshift-ptp
            annotations:
              ran.openshift.io/ztp-deploy-wave: '10'
            spec:
              profile:
                - name: slave
                  interface: $interface
                  ptp4lOpts: '-2 -s'
                  phc2sysOpts: '-a -r -n 24'
                  ptpSchedulingPolicy: SCHED_FIFO
                  ptpSchedulingPriority: 10
                  ptpSettings:
                    logReduce: 'true'
                  ptp4lConf: |
                    [global]
                    #
                    # Default Data Set
                    #
                    twoStepFlag 1
                    slaveOnly 1
                    priority1 128
                    priority2 128
                    domainNumber 24
                    #utc_offset 37
                    clockClass 255
                    clockAccuracy 0xFE
                    offsetScaledLogVariance 0xFFFF
                    free_running 0
                    freq_est_interval 1
                    dscp_event 0
                    dscp_general 0
                    dataset_comparison G.8275.x
                    G.8275.defaultDS.localPriority 128
                    #
                    # Port Data Set
                    #
                    logAnnounceInterval -3
                    logSyncInterval -4
                    logMinDelayReqInterval -4
                    logMinPdelayReqInterval -4
                    announceReceiptTimeout 3
                    syncReceiptTimeout 0
                    delayAsymmetry 0
                    fault_reset_interval -4
                    neighborPropDelayThresh 20000000
                    masterOnly 0
                    G.8275.portDS.localPriority 128
                    #
                    # Run time options
                    #
                    assume_two_step 0
                    logging_level 6
                    path_trace_enabled 0
                    follow_up_info 0
                    hybrid_e2e 0
                    inhibit_multicast_service 0
                    net_sync_monitor 0
                    tc_spanning_tree 0
                    tx_timestamp_timeout 50
                    unicast_listen 0
                    unicast_master_table 0
                    unicast_req_duration 3600
                    use_syslog 1
                    verbose 0
                    summary_interval 0
                    kernel_leap 1
                    check_fup_sync 0
                    clock_class_threshold 7
                    #
                    # Servo Options
                    #
                    pi_proportional_const 0.0
                    pi_integral_const 0.0
                    pi_proportional_scale 0.0
                    pi_proportional_exponent -0.3
                    pi_proportional_norm_max 0.7
                    pi_integral_scale 0.0
                    pi_integral_exponent 0.4
                    pi_integral_norm_max 0.3
                    step_threshold 2.0
                    first_step_threshold 0.00002
                    max_frequency 900000000
                    clock_servo pi
                    sanity_freq_limit 200000000
                    ntpshm_segment 0
                    #
                    # Transport options
                    #
                    transportSpecific 0x0
                    ptp_dst_mac 01:1B:19:00:00:00
                    p2p_dst_mac 01:80:C2:00:00:0E
                    udp_ttl 1
                    udp6_scope 0x0E
                    uds_address /var/run/ptp4l
                    #
                    # Default interface options
                    #
                    clock_type OC
                    network_transport L2
                    delay_mechanism E2E
                    time_stamping hardware
                    tsproc_mode filter
                    delay_filter moving_median
                    delay_filter_length 10
                    egressLatency 0
                    ingressLatency 0
                    boundary_clock_jbod 0
                    #
                    # Clock description
                    #
                    productDescription ;;
                    revisionData ;;
                    manufacturerIdentity 00:00:00
                    userDescription ;
                    timeSource 0xA0
              recommend:
                - profile: slave
                  priority: 4
                  match:
                    - nodeLabel: node-role.kubernetes.io/master

Using the source file PtpConfigSlave.yaml as an example, the file defines a PtpConfig CR. The generated policy for the PtpConfigSlave example is named group-du-sno-config-policy. The PtpConfig CR defined in the generated group-du-sno-config-policy is named du-ptp-slave. The spec defined in PtpConfigSlave.yaml is placed under du-ptp-slave along with the other spec items defined under the source file.

The following example shows the group-du-sno-config-policy CR:

---
apiVersion: policy.open-cluster-management.io/v1
kind: PolicyGenerator
metadata:
    name: du-upgrade
placementBindingDefaults:
    name: du-upgrade-placement-binding
policyDefaults:
    namespace: ztp-group-du-sno
    placement:
        labelSelector:
            matchExpressions:
                - key: group-du-sno
                  operator: Exists
    remediationAction: inform
    severity: low
    namespaceSelector:
        exclude:
            - kube-*
        include:
            - '*'
    evaluationInterval:
        compliant: 10m
        noncompliant: 10s
policies:
    - name: du-upgrade-operator-catsrc-policy
      policyAnnotations:
        ran.openshift.io/ztp-deploy-wave: "1"
      manifests:
        - path: source-crs/DefaultCatsrc.yaml
          patches:
            - metadata:
                name: redhat-operators
              spec:
                displayName: Red Hat Operators Catalog
                image: registry.example.com:5000/olm/redhat-operators:v4.14
                updateStrategy:
                    registryPoll:
                        interval: 1h
              status:
                connectionState:
                    lastObservedState: READY

Recommendations when customizing PolicyGenerator CRs

Consider the following best practices when customizing site configuration PolicyGenerator custom resources (CRs):

Use as few policies as are necessary. Using fewer policies requires less resources. Each additional policy creates increased CPU load for the hub cluster and the deployed managed cluster. CRs are combined into policies based on the policyName field in the PolicyGenerator CR. CRs in the same PolicyGenerator which have the same value for policyName are managed under a single policy.
In disconnected environments, use a single catalog source for all Operators by configuring the registry as a single index containing all Operators. Each additional CatalogSource CR on the managed clusters increases CPU usage.
Reduce the overall time taken until the cluster is ready to deploy applications by including MachineConfig CRs as extra manifests in the installation. To do this, package MachineConfig CRs in a ConfigMap CR. Reference the ConfigMap CRs in the extraManifestsRefs field in the ClusterInstance CR.
PolicyGenerator CRs should override the channel field to explicitly identify the desired version. This ensures that changes in the source CR during upgrades does not update the generated subscription.
The default setting for policyDefaults.consolidateManifests is true. This is the recommended setting for DU profile. Setting it to false might impact large scale deployments.
The default setting for policyDefaults.orderPolicies is false. This is the recommended setting for DU profile. After the cluster installation is complete and a cluster becomes Ready, TALM creates a ClusterGroupUpgrade CR corresponding to this cluster. The ClusterGroupUpgrade CR contains a list of ordered policies defined by the ran.openshift.io/ztp-deploy-wave annotation. If you use the PolicyGenerator CR to change the order of the policies, conflicts might occur and the configuration might not be applied.

Additional resources

For recommendations about scaling clusters with RHACM, see Performance and scalability.

Note

When managing large numbers of spoke clusters on the hub cluster, minimize the number of policies to reduce resource consumption.

Grouping multiple configuration CRs into a single or limited number of policies is one way to reduce the overall number of policies on the hub cluster. When using the common, group, and site hierarchy of policies for managing site configuration, it is especially important to combine site-specific configuration into a single policy.

PolicyGenerator CRs for RAN deployments

Use PolicyGenerator custom resources (CRs) to customize the configuration applied to the cluster by using the GitOps Zero Touch Provisioning (ZTP) pipeline. The PolicyGenerator CR allows you to generate one or more policies to manage the set of configuration CRs on your fleet of clusters. The PolicyGenerator CR identifies the set of managed CRs, bundles them into policies, builds the policy wrapping around those CRs, and associates the policies with clusters by using label binding rules.

The reference configuration, obtained from the GitOps ZTP container, is designed to provide a set of critical features and node tuning settings that ensure the cluster can support the stringent performance and resource utilization constraints typical of RAN (Radio Access Network) Distributed Unit (DU) applications. Changes or omissions from the baseline configuration can affect feature availability, performance, and resource utilization. Use the reference PolicyGenerator CRs as the basis to create a hierarchy of configuration files tailored to your specific site requirements.

The baseline PolicyGenerator CRs that are defined for RAN DU cluster configuration can be extracted from the GitOps ZTP ztp-site-generate container. See "Preparing the GitOps ZTP site configuration repository" for further details.

The PolicyGenerator CRs can be found in the ./out/argocd/example/acmpolicygenerator/ folder. The reference architecture has common, group, and site-specific configuration CRs. Each PolicyGenerator CR refers to other CRs that can be found in the ./out/source-crs folder.

The PolicyGenerator CRs relevant to RAN cluster configuration are described below. Variants are provided for the group PolicyGenerator CRs to account for differences in single-node, three-node compact, and standard cluster configurations. Similarly, site-specific configuration variants are provided for single-node clusters and multi-node (compact or standard) clusters. Use the group and site-specific configuration variants that are relevant for your deployment.

Table 2. PolicyGenerator CRs for RAN deployments
PolicyGenerator CR	Description
`acm-example-multinode-site.yaml`	Contains a set of CRs that get applied to multi-node clusters. These CRs configure SR-IOV features typical for RAN installations.
`acm-example-sno-site.yaml`	Contains a set of CRs that get applied to single-node OpenShift clusters. These CRs configure SR-IOV features typical for RAN installations.
`acm-common-mno-ranGen.yaml`	Contains a set of common RAN policy configuration that get applied to multi-node clusters.
`acm-common-ranGen.yaml`	Contains a set of common RAN CRs that get applied to all clusters. These CRs subscribe to a set of operators providing cluster features typical for RAN as well as baseline cluster tuning.
`acm-group-du-3node-ranGen.yaml`	Contains the RAN policies for three-node clusters only.
`acm-group-du-sno-ranGen.yaml`	Contains the RAN policies for single-node clusters only.
`acm-group-du-standard-ranGen.yaml`	Contains the RAN policies for standard three control-plane clusters.
`acm-group-du-3node-validator-ranGen.yaml`	`PolicyGenerator` CR used to generate the various policies required for three-node clusters.
`acm-group-du-standard-validator-ranGen.yaml`	`PolicyGenerator` CR used to generate the various policies required for standard clusters.
`acm-group-du-sno-validator-ranGen.yaml`	`PolicyGenerator` CR used to generate the various policies required for single-node OpenShift clusters.

Additional resources

Preparing the GitOps ZTP site configuration repository

Customizing a managed cluster with PolicyGenerator CRs

Use the following procedure to customize the policies that get applied to the managed cluster that you provision using the GitOps Zero Touch Provisioning (ZTP) pipeline.

Prerequisites

You have installed the OpenShift CLI (oc).
You have logged in to the hub cluster as a user with cluster-admin privileges.
You configured the hub cluster for generating the required installation and policy CRs.
You created a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for the Argo CD application.

Procedure

Create a PolicyGenerator CR for site-specific configuration CRs.
1. Choose the appropriate example for your CR from the out/argocd/example/acmpolicygenerator/ folder, for example, acm-example-sno-site.yaml or acm-example-multinode-site.yaml.
2. Change the policyDefaults.placement.labelSelector field in the example file to match the site-specific label included in the ClusterInstance CR. In the example ClusterInstance file, the site-specific label is sites: example-sno.
  
  Note
  
  Ensure that the labels defined in your PolicyGenerator policyDefaults.placement.labelSelector field correspond to the labels that are defined in the related managed clusters ClusterInstance CR.
3. Change the content in the example file to match the desired configuration.
Optional: Create a PolicyGenerator CR for any common configuration CRs that apply to the entire fleet of clusters.
1. Select the appropriate example for your CR from the out/argocd/example/acmpolicygenerator/ folder, for example, acm-common-ranGen.yaml.
2. Change the content in the example file to match the required configuration.
Optional: Create a PolicyGenerator CR for any group configuration CRs that apply to the certain groups of clusters in the fleet.

Ensure that the content of the overlaid spec files matches your required end state. As a reference, the out/source-crs directory contains the full list of source-crs available to be included and overlaid by your PolicyGenerator templates.

Note

Depending on the specific requirements of your clusters, you might need more than a single group policy per cluster type, especially considering that the example group policies each have a single PerformancePolicy.yaml file that can only be shared across a set of clusters if those clusters consist of identical hardware configurations.
1. Select the appropriate example for your CR from the out/argocd/example/acmpolicygenerator/ folder, for example, acm-group-du-sno-ranGen.yaml.
2. Change the content in the example file to match the required configuration.
Optional. Create a validator inform policy PolicyGenerator CR to signal when the GitOps ZTP installation and configuration of the deployed cluster is complete. For more information, see "Creating a validator inform policy".
Define all the policy namespaces in a YAML file similar to the example out/argocd/example/acmpolicygenerator//ns.yaml file.

Important

Do not include the Namespace CR in the same file with the PolicyGenerator CR.
Add the PolicyGenerator CRs and Namespace CR to the kustomization.yaml file in the generators section, similar to the example shown in out/argocd/example/acmpolicygenerator/kustomization.yaml.
Commit the PolicyGenerator CRs, Namespace CR, and associated kustomization.yaml file in your Git repository and push the changes.

The ArgoCD pipeline detects the changes and begins the managed cluster deployment. You can push the changes to the ClusterInstance CR and the PolicyGenerator CR simultaneously.

Additional resources

Signalling GitOps ZTP cluster deployment completion with validator inform policies

Monitoring managed cluster policy deployment progress

The ArgoCD pipeline uses PolicyGenerator CRs in Git to generate the RHACM policies and then sync them to the hub cluster. You can monitor the progress of the managed cluster policy synchronization after the assisted service installs OpenShift Container Platform on the managed cluster.

Prerequisites

You have installed the OpenShift CLI (oc).
You have logged in to the hub cluster as a user with cluster-admin privileges.

Procedure

The Topology Aware Lifecycle Manager (TALM) applies the configuration policies that are bound to the cluster.

After the cluster installation is complete and the cluster becomes Ready, a ClusterGroupUpgrade CR corresponding to this cluster, with a list of ordered policies defined by the ran.openshift.io/ztp-deploy-wave annotations, is automatically created by the TALM. The cluster’s policies are applied in the order listed in ClusterGroupUpgrade CR.

You can monitor the high-level progress of configuration policy reconciliation by using the following commands:
```
$ export CLUSTER=<clusterName>
```
```
$ oc get clustergroupupgrades -n ztp-install $CLUSTER -o jsonpath='{.status.conditions[-1:]}' | jq
```
Example output
```
{
  "lastTransitionTime": "2022-11-09T07:28:09Z",
  "message": "The ClusterGroupUpgrade CR has upgrade policies that are still non compliant",
  "reason": "InProgress",
  "status": "True",
  "type": "Progressing"
}
```

You can monitor the detailed cluster policy compliance status by using the RHACM dashboard or the command line.

To check policy compliance by using oc, run the following command:

$ oc get policies -n $CLUSTER

Example output

NAME                                                     REMEDIATION ACTION   COMPLIANCE STATE   AGE
ztp-common.common-config-policy                          inform               Compliant          3h42m
ztp-common.common-subscriptions-policy                   inform               NonCompliant       3h42m
ztp-group.group-du-sno-config-policy                     inform               NonCompliant       3h42m
ztp-group.group-du-sno-validator-du-policy               inform               NonCompliant       3h42m
ztp-install.example1-common-config-policy-pjz9s          enforce              Compliant          167m
ztp-install.example1-common-subscriptions-policy-zzd9k   enforce              NonCompliant       164m
ztp-site.example1-config-policy                          inform               NonCompliant       3h42m
ztp-site.example1-perf-policy                            inform               NonCompliant       3h42m

To check policy status from the RHACM web console, perform the following actions:
1. Click Governance → Find policies.
2. Click on a cluster policy to check its status.

When all of the cluster policies become compliant, GitOps ZTP installation and configuration for the cluster is complete. The ztp-done label is added to the cluster.

In the reference configuration, the final policy that becomes compliant is the one defined in the *-du-validator-policy policy. This policy, when compliant on a cluster, ensures that all cluster configuration, Operator installation, and Operator configuration is complete.

Coordinating reboots for configuration changes

You can use Topology Aware Lifecycle Manager (TALM) to coordinate reboots across a fleet of spoke clusters when configuration changes require a reboot, such as deferred tuning changes. TALM reboots all nodes in the targeted MachineConfigPool on the selected clusters when the reboot policy is applied.

Instead of rebooting nodes after each individual change, you can apply all configuration updates through policies and then trigger a single, coordinated reboot.

Prerequisites

You have installed the OpenShift CLI (oc).
You have logged in to the hub cluster as a user with cluster-admin privileges.
You have deployed and configured TALM.

Procedure

Generate the configuration policies by creating a PolicyGenerator custom resource (CR). You can use one of the following sample manifests:
- out/argocd/example/acmpolicygenerator/acm-example-sno-reboot
- out/argocd/example/acmpolicygenerator/acm-example-multinode-reboot
Update the policyDefaults.placement.labelSelector field in the PolicyGenerator CR to target the clusters that you want to reboot. Modify other fields as necessary for your use case.

If you are coordinating a reboot to apply a deferred tuning change, ensure the MachineConfigPool in the reboot policy matches the value specified in the spec.recommend field in the Tuned object.
Apply the PolicyGenerator CR to generate and apply the configuration policies. For detailed steps, see "Customizing a managed cluster with PolicyGenerator CRs".
After ArgoCD completes syncing the policies, create and apply the ClusterGroupUpgrade (CGU) CR.
Example CGU custom resource configuration
```
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: reboot
  namespace: default
spec:
  clusterLabelSelectors:
  - matchLabels: 
# ...
  enable: true
  managedPolicies: 
  - example-reboot
  remediationStrategy:
    timeout: 300 
    maxConcurrency: 10
# ...
```
1. Configure the labels that match the clusters you want to reboot.
2. Add all required configuration policies before the reboot policy. TALM applies the configuration changes as specified in the policies, in the order they are listed.
3. Specify the timeout in seconds for the entire upgrade across all selected clusters. Set this field by considering the worst-case scenario.
After you apply the CGU custom resource, TALM rolls out the configuration policies in order. Once all policies are compliant, it applies the reboot policy and triggers a reboot of all nodes in the specified MachineConfigPool.

Verification

Monitor the CGU rollout status.

You can monitor the rollout of the CGU custom resource on the hub by checking the status. Verify the successful rollout of the reboot by running the following command:
```
oc get cgu -A
```
Example output
```
NAMESPACE   NAME     AGE   STATE       DETAILS
default     reboot   1d    Completed   All clusters are compliant with all the managed policies
```

Verify successful reboot on a specific node.

To confirm that the reboot was successful on a specific node, check the status of the MachineConfigPool (MCP) for the node by running the following command:

oc get mcp master

Example output

NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-be5785c3b98eb7a1ec902fef2b81e865   True      False      False      3              3                   3                     0                      72d

Additional resources

Customizing a managed cluster with PolicyGenerator CRs

Validating the generation of configuration policy CRs

Policy custom resources (CRs) are generated in the same namespace as the PolicyGenerator from which they are created. The same troubleshooting flow applies to all policy CRs generated from a PolicyGenerator regardless of whether they are ztp-common, ztp-group, or ztp-site based, as shown using the following commands:

$ export NS=<namespace>

$ oc get policy -n $NS

The expected set of policy-wrapped CRs should be displayed.

If the policies failed synchronization, use the following troubleshooting steps.

Procedure

To display detailed information about the policies, run the following command:
```
$ oc describe -n openshift-gitops application policies
```

Check for Status: Conditions: to show the error logs. For example, setting an invalid sourceFile entry to fileName: generates the error shown below:

Status:
  Conditions:
    Last Transition Time:  2021-11-26T17:21:39Z
    Message:               rpc error: code = Unknown desc = `kustomize build /tmp/https___git.com/ran-sites/policies/ --enable-alpha-plugins` failed exit status 1: 2021/11/26 17:21:40 Error could not find test.yaml under source-crs/: no such file or directory Error: failure in plugin configured via /tmp/kust-plugin-config-52463179; exit status 1: exit status 1
    Type:  ComparisonError

Check for Status: Sync:. If there are log errors at Status: Conditions:, the Status: Sync: shows Unknown or Error:

Status:
  Sync:
    Compared To:
      Destination:
        Namespace:  policies-sub
        Server:     https://kubernetes.default.svc
      Source:
        Path:             policies
        Repo URL:         https://git.com/ran-sites/policies/.git
        Target Revision:  master
    Status:               Error

When Red Hat Advanced Cluster Management (RHACM) recognizes that policies apply to a ManagedCluster object, the policy CR objects are applied to the cluster namespace. Check to see if the policies were copied to the cluster namespace:

$ oc get policy -n $CLUSTER

Example output

NAME                                         REMEDIATION ACTION   COMPLIANCE STATE   AGE
ztp-common.common-config-policy              inform               Compliant          13d
ztp-common.common-subscriptions-policy       inform               Compliant          13d
ztp-group.group-du-sno-config-policy         inform               Compliant          13d
ztp-group.group-du-sno-validator-du-policy   inform               Compliant          13d
ztp-site.example-sno-config-policy           inform               Compliant          13d

RHACM copies all applicable policies into the cluster namespace. The copied policy names have the format: <PolicyGenerator.Namespace>.<PolicyGenerator.Name>-<policyName>.

Check the placement rule for any policies not copied to the cluster namespace. The matchSelector in the Placement for those policies should match labels on the ManagedCluster object:
```
$ oc get Placement -n $NS
```
Note the Placement name appropriate for the missing policy, common, group, or site, using the following command:
```
$ oc get Placement -n $NS <placement_rule_name> -o yaml
```
- The status-decisions should include your cluster name.
- The key-value pair of the matchSelector in the spec must match the labels on your managed cluster.
Check the labels on the ManagedCluster object by using the following command:
```
$ oc get ManagedCluster $CLUSTER -o jsonpath='{.metadata.labels}' | jq
```
Check to see what policies are compliant by using the following command:
```
$ oc get policy -n $CLUSTER
```
If the Namespace, OperatorGroup, and Subscription policies are compliant but the Operator configuration policies are not, it is likely that the Operators did not install on the managed cluster. This causes the Operator configuration policies to fail to apply because the CRD is not yet applied to the spoke.

Restarting policy reconciliation

You can restart policy reconciliation when unexpected compliance issues occur, for example, when the ClusterGroupUpgrade custom resource (CR) has timed out.

Procedure

A ClusterGroupUpgrade CR is generated in the namespace ztp-install by the Topology Aware Lifecycle Manager after the managed cluster becomes Ready:
```
$ export CLUSTER=<clusterName>
```
```
$ oc get clustergroupupgrades -n ztp-install $CLUSTER
```
If there are unexpected issues and the policies fail to become complaint within the configured timeout (the default is 4 hours), the status of the ClusterGroupUpgrade CR shows UpgradeTimedOut:
```
$ oc get clustergroupupgrades -n ztp-install $CLUSTER -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'
```
A ClusterGroupUpgrade CR in the UpgradeTimedOut state automatically restarts its policy reconciliation every hour. If you have changed your policies, you can start a retry immediately by deleting the existing ClusterGroupUpgrade CR. This triggers the automatic creation of a new ClusterGroupUpgrade CR that begins reconciling the policies immediately:
```
$ oc delete clustergroupupgrades -n ztp-install $CLUSTER
```

Note that when the ClusterGroupUpgrade CR completes with status UpgradeCompleted and the managed cluster has the label ztp-done applied, you can make additional configuration changes by using PolicyGenerator. Deleting the existing ClusterGroupUpgrade CR will not make the TALM generate a new CR.

At this point, GitOps ZTP has completed its interaction with the cluster and any further interactions should be treated as an update and a new ClusterGroupUpgrade CR created for remediation of the policies.

Additional resources

For information about using Topology Aware Lifecycle Manager (TALM) to construct your own ClusterGroupUpgrade CR, see About the ClusterGroupUpgrade CR.

Changing applied managed cluster CRs using policies

You can remove content from a custom resource (CR) that is deployed in a managed cluster through a policy.

By default, all Policy CRs created from a PolicyGenerator CR have the complianceType field set to musthave. A musthave policy without the removed content is still compliant because the CR on the managed cluster has all the specified content. With this configuration, when you remove content from a CR, TALM removes the content from the policy but the content is not removed from the CR on the managed cluster.

With the complianceType field to mustonlyhave, the policy ensures that the CR on the cluster is an exact match of what is specified in the policy.

Prerequisites

You have installed the OpenShift CLI (oc).
You have logged in to the hub cluster as a user with cluster-admin privileges.
You have deployed a managed cluster from a hub cluster running RHACM.
You have installed Topology Aware Lifecycle Manager on the hub cluster.

Procedure

Remove the content that you no longer need from the affected CRs. In this example, the disableDrain: false line was removed from the SriovOperatorConfig CR.

Example CR

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
spec:
  configDaemonNodeSelector:
    "node-role.kubernetes.io/$mcp": ""
  disableDrain: true
  enableInjector: true
  enableOperatorWebhook: true

Change the complianceType of the affected policies to mustonlyhave in the acm-group-du-sno-ranGen.yaml file.

Example YAML

# ...
policyDefaults:
  complianceType: "mustonlyhave"
# ...
policies:
  - name: config-policy
    policyAnnotations:
      ran.openshift.io/ztp-deploy-wave: ""
    manifests:
      - path: source-crs/SriovOperatorConfig.yaml

Create a ClusterGroupUpdates CR and specify the clusters that must receive the CR changes::

Example ClusterGroupUpdates CR

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-remove
  namespace: default
spec:
  managedPolicies:
    - ztp-group.group-du-sno-config-policy
  enable: false
  clusters:
  - spoke1
  - spoke2
  remediationStrategy:
    maxConcurrency: 2
    timeout: 240
  batchTimeoutAction:

Create the ClusterGroupUpgrade CR by running the following command:
```
$ oc create -f cgu-remove.yaml
```
When you are ready to apply the changes, for example, during an appropriate maintenance window, change the value of the spec.enable field to true by running the following command:
```
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-remove \
--patch '{"spec":{"enable":true}}' --type=merge
```

Verification

Check the status of the policies by running the following command:

$ oc get <kind> <changed_cr_name>

Example output

NAMESPACE   NAME                                                   REMEDIATION ACTION   COMPLIANCE STATE   AGE
default     cgu-ztp-group.group-du-sno-config-policy               enforce                                 17m
default     ztp-group.group-du-sno-config-policy                   inform               NonCompliant       15h

When the COMPLIANCE STATE of the policy is Compliant, it means that the CR is updated and the unwanted content is removed.

Check that the policies are removed from the targeted clusters by running the following command on the managed clusters:
```
$ oc get <kind> <changed_cr_name>
```
If there are no results, the CR is removed from the managed cluster.

Indication of done for GitOps ZTP installations

GitOps Zero Touch Provisioning (ZTP) simplifies the process of checking the GitOps ZTP installation status for a cluster. The GitOps ZTP status moves through three phases: cluster installation, cluster configuration, and GitOps ZTP done.

Cluster installation phase

The cluster installation phase is shown by the ManagedClusterJoined and ManagedClusterAvailable conditions in the ManagedCluster CR . If the ManagedCluster CR does not have these conditions, or the condition is set to False, the cluster is still in the installation phase. Additional details about installation are available from the AgentClusterInstall and ClusterDeployment CRs. For more information, see "Troubleshooting GitOps ZTP".

Cluster configuration phase

The cluster configuration phase is shown by a ztp-running label applied the ManagedCluster CR for the cluster.

GitOps ZTP done

Cluster installation and configuration is complete in the GitOps ZTP done phase. This is shown by the removal of the ztp-running label and addition of the ztp-done label to the ManagedCluster CR. The ztp-done label shows that the configuration has been applied and the baseline DU configuration has completed cluster tuning.

The change to the GitOps ZTP done state is conditional on the compliant state of a Red Hat Advanced Cluster Management (RHACM) validator inform policy. This policy captures the existing criteria for a completed installation and validates that it moves to a compliant state only when GitOps ZTP provisioning of the managed cluster is complete.

The validator inform policy ensures the configuration of the cluster is fully applied and Operators have completed their initialization. The policy validates the following:

The target MachineConfigPool contains the expected entries and has finished updating. All nodes are available and not degraded.
The SR-IOV Operator has completed initialization as indicated by at least one SriovNetworkNodeState with syncStatus: Succeeded.
The PTP Operator daemon set exists.

Configuring an OpenAPI schema for patching list fields by using the PolicyGenerator CR

You can configure an OpenAPI schema in the PolicyGenerator custom resource (CR) to control how list fields are merged when patching non-core Kubernetes objects.

By default, patching list fields can replace entire lists when the resource does not define merge behavior. An OpenAPI schema defines how list items are uniquely identified and merged during policy generation.

Prerequisites

You have created a PolicyGenerator CR.
You have access to a running cluster if you need to generate a schema.

Procedure

Obtain an OpenAPI schema for the resources that you want to patch:
1. If an OpenAPI schema is available for the custom resource that you want to patch, use that schema file.
2. If a schema is not available, generate it from an active cluster by running the following command:
  kustomize openapi fetch
Edit the generated schema file to keep only the resource definitions that you need to patch.

Removing unrelated definitions simplifies the schema and reduces maintenance effort.
Define merge behavior for list fields that you want to patch. For each list of objects that you want to patch, add fields that specify how list items are uniquely identified and merged. For example:
```
"x-kubernetes-patch-merge-key": "name"
"x-kubernetes-patch-strategy": "merge"
```
- x-kubernetes-patch-merge-key specifies the field that uniquely identifies an object in the list. For example, setting this field to name uses the name field to identify list items.
- x-kubernetes-patch-strategy specifies how the patch is applied to the identified list item. The following are the supported values:
  - merge: Merges the fields from the patch into the existing list item.
  - replace: Replaces the entire list item identified by the merge key with the patch content.
Save the schema file in the directory that contains the kustomization.yaml file.
Reference the OpenAPI schema in the kustomization.yaml file:
```
openapi:
  path: schema.json
```

Configure the OpenAPI schema path in the PolicyGenerator CR:

Example PolicyGenerator CR for patching list fields by using an OpenAPI schema

apiVersion: policy.open-cluster-management.io/v1
kind: PolicyGenerator
metadata:
  name: policy-generator-example
policies:
  - name: myapp
    manifests:
      - path: input-kustomize/
        patches: []
        openapi:
          path: schema.json

Generate or apply the policies by using the policy generator.

The policy generator passes the OpenAPI schema to Kustomize to control how list fields are patched.