Understanding the image-based upgrade for single-node OpenShift clusters
From OpenShift Container Platform 4.14.13, the Lifecycle Agent provides you with an alternative way to upgrade the platform version of a single-node OpenShift cluster. The image-based upgrade is faster than the standard upgrade method and allows you to directly upgrade from OpenShift Container Platform <4.y> to <4.y+2>, and <4.y.z> to <4.y.z+n>.
This upgrade method utilizes a generated OCI image from a dedicated seed cluster that is installed on the target single-node OpenShift cluster as a new ostree stateroot.
A seed cluster is a single-node OpenShift cluster deployed with the target OpenShift Container Platform version, Day 2 Operators, and configurations that are common to all target clusters.
You can use the seed image, which is generated from the seed cluster, to upgrade the platform version on any single-node OpenShift cluster that has the same combination of hardware, Day 2 Operators, and cluster configuration as the seed cluster.
Important
The image-based upgrade uses custom images that are specific to the hardware platform that the clusters are running on. Each different hardware platform requires a separate seed image.
The Lifecycle Agent uses two custom resources (CRs) on the participating clusters to orchestrate the upgrade:
-
On the seed cluster, the
SeedGeneratorCR allows for the seed image generation. This CR specifies the repository to push the seed image to. -
On the target cluster, the
ImageBasedUpgradeCR specifies the seed image for the upgrade of the target cluster and the backup configurations for your workloads.
apiVersion: lca.openshift.io/v1
kind: SeedGenerator
metadata:
name: seedimage
spec:
seedImage: <seed_image>
apiVersion: lca.openshift.io/v1
kind: ImageBasedUpgrade
metadata:
name: upgrade
spec:
stage: Idle
seedImageRef:
version: <target_version>
image: <seed_container_image>
pullSecretRef:
name: <seed_pull_secret>
autoRollbackOnFailure: {}
# initMonitorTimeoutSeconds: 1800
extraManifests:
- name: example-extra-manifests
namespace: openshift-lifecycle-agent
oadpContent:
- name: oadp-cm-example
namespace: openshift-adp
- Stage of the
ImageBasedUpgradeCR. The value can beIdle,Prep,Upgrade, orRollback. - Target platform version, seed image to be used, and the secret required to access the image.
- Optional: Time frame in seconds to roll back when the upgrade does not complete within that time frame after the first reboot. If not defined or set to
0, the default value of1800seconds (30 minutes) is used. - Optional: List of
ConfigMapresources that contain your custom catalog sources to retain after the upgrade, and your extra manifests to apply to the target cluster that are not part of the seed image. - List of
ConfigMapresources that contain the OADPBackupandRestoreCRs.
Stages of the image-based upgrade
After generating the seed image on the seed cluster, you can move through the stages on the target cluster by setting the spec.stage field to one of the following values in the ImageBasedUpgrade CR:
-
Idle -
Prep -
Upgrade -
Rollback(Optional)
Idle stage
The Lifecycle Agent creates an ImageBasedUpgrade CR set to stage: Idle when the Operator is first deployed.
This is the default stage.
There is no ongoing upgrade and the cluster is ready to move to the Prep stage.
You also move to the Idle stage to do one of the following steps:
-
Finalize a successful upgrade
-
Finalize a rollback
-
Cancel an ongoing upgrade until the pre-pivot phase in the
Upgradestage
Moving to the Idle stage ensures that the Lifecycle Agent cleans up resources, so that the cluster is ready for upgrades again.
Important
If using RHACM when you cancel an upgrade, you must remove the import.open-cluster-management.io/disable-auto-import annotation from the target managed cluster to re-enable the automatic import of the cluster.
Prep stage
Note
You can complete this stage before a scheduled maintenance window.
For the Prep stage, you specify the following upgrade details in the ImageBasedUpgrade CR:
-
seed image to use
-
resources to back up
-
extra manifests to apply and custom catalog sources to retain after the upgrade, if any
Then, based on what you specify, the Lifecycle Agent prepares for the upgrade without impacting the current running version.
During this stage, the Lifecycle Agent ensures that the target cluster is ready to proceed to the Upgrade stage by checking if it meets certain conditions.
The Operator pulls the seed image to the target cluster with additional container images specified in the seed image.
The Lifecycle Agent checks if there is enough space on the container storage disk and if necessary, the Operator deletes unpinned images until the disk usage is below the specified threshold.
For more information about how to configure or disable the cleaning up of the container storage disk, see "Configuring the automatic image cleanup of the container storage disk".
You also prepare backup resources with the OADP Operator’s Backup and Restore CRs.
These CRs are used in the Upgrade stage to reconfigure the cluster, register the cluster with RHACM, and restore application artifacts.
In addition to the OADP Operator, the Lifecycle Agent uses the ostree versioning system to create a backup, which allows complete cluster reconfiguration after both upgrade and rollback.
After the Prep stage finishes, you can cancel the upgrade process by moving to the Idle stage or you can start the upgrade by moving to the Upgrade stage in the ImageBasedUpgrade CR.
If you cancel the upgrade, the Operator performs cleanup operations.
Upgrade stage
The Upgrade stage consists of two phases:
- pre-pivot
-
Just before pivoting to the new stateroot, the Lifecycle Agent collects the required cluster specific artifacts and stores them in the new stateroot. The backup of your cluster resources specified in the
Prepstage are created on a compatible Object storage solution. The Lifecycle Agent exports CRs specified in theextraManifestsfield in theImageBasedUpgradeCR or the CRs described in the ZTP policies that are bound to the target cluster. After pre-pivot phase has completed, the Lifecycle Agent sets the new stateroot deployment as the default boot entry and reboots the node. - post-pivot
-
After booting from the new stateroot, the Lifecycle Agent also regenerates the seed image’s cluster cryptography. This ensures that each single-node OpenShift cluster upgraded with the same seed image has unique and valid cryptographic objects. The Operator then reconfigures the cluster by applying cluster-specific artifacts that were collected in the pre-pivot phase. The Operator applies all saved CRs, and restores the backups.
After the upgrade has completed and you are satisfied with the changes, you can finalize the upgrade by moving to the Idle stage.
Important
When you finalize the upgrade, you cannot roll back to the original release.
If you want to cancel the upgrade, you can do so until the pre-pivot phase of the Upgrade stage.
If you encounter issues after the upgrade, you can move to the Rollback stage for a manual rollback.
Rollback stage
The Rollback stage can be initiated manually or automatically upon failure.
During the Rollback stage, the Lifecycle Agent sets the original ostree stateroot deployment as default.
Then, the node reboots with the previous release of OpenShift Container Platform and application configurations.
Warning
If you move to the Idle stage after a rollback, the Lifecycle Agent cleans up resources that can be used to troubleshoot a failed upgrade.
The Lifecycle Agent initiates an automatic rollback if the upgrade does not complete within a specified time limit. For more information about the automatic rollback, see the "Moving to the Rollback stage with Lifecycle Agent" or "Moving to the Rollback stage with Lifecycle Agent and GitOps ZTP" sections.
Guidelines for the image-based upgrade
For a successful image-based upgrade, your deployments must meet certain requirements.
There are different deployment methods in which you can perform the image-based upgrade:
- GitOps ZTP
-
You use the GitOps Zero Touch Provisioning (ZTP) to deploy and configure your clusters.
- Non-GitOps
-
You manually deploy and configure your clusters.
You can perform an image-based upgrade in disconnected environments. For more information about how to mirror images for a disconnected environment, see "Mirroring images for a disconnected installation".
Minimum software version of components
Depending on your deployment method, the image-based upgrade requires the following minimum software versions.
| Component | Software version | Required |
|---|---|---|
Lifecycle Agent |
4.16 |
Yes |
OADP Operator |
1.4.1 |
Yes |
Managed cluster version |
4.14.13 |
Yes |
Hub cluster version |
4.16 |
No |
RHACM |
2.10.2 |
No |
GitOps ZTP plugin |
4.16 |
Only for GitOps ZTP deployment method |
Red Hat OpenShift GitOps |
1.12 |
Only for GitOps ZTP deployment method |
Topology Aware Lifecycle Manager (TALM) |
4.16 |
Only for GitOps ZTP deployment method |
Local Storage Operator [1] |
4.14 |
Yes |
Logical Volume Manager (LVM) Storage [1] |
4.14.2 |
Yes |
-
The persistent storage must be provided by either the LVM Storage or the Local Storage Operator, not both.
Hub cluster guidelines
If you are using Red Hat Advanced Cluster Management (RHACM), your hub cluster needs to meet the following conditions:
-
To avoid including any RHACM resources in your seed image, you need to disable all optional RHACM add-ons before generating the seed image.
-
Your hub cluster must be upgraded to at least the target version before performing an image-based upgrade on a target single-node OpenShift cluster.
Seed image guidelines
The seed image targets a set of single-node OpenShift clusters with the same hardware and similar configuration. This means that the seed cluster must match the configuration of the target clusters for the following items:
-
CPU topology
-
Number of CPU cores
-
Tuned performance configuration, such as number of reserved CPUs
-
-
MachineConfigresources for the target cluster -
IP version configuration, either IPv4, IPv6, or dual-stack networking
-
Set of Day 2 Operators, including the Lifecycle Agent and the OADP Operator
-
Disconnected registry
-
FIPS configuration
The following configurations only have to partially match on the participating clusters:
-
If the target cluster has a proxy configuration, the seed cluster must have a proxy configuration too but the configuration does not have to be the same.
-
A dedicated partition on the primary disk for container storage is required on all participating clusters. However, the size and start of the partition does not have to be the same. Only the
spec.config.storage.disks.partitions.label: varlibcontainerslabel in theMachineConfigCR must match on both the seed and target clusters. For more information about how to create the disk partition, see "Configuring a shared container partition between ostree stateroots" or "Configuring a shared container partition between ostree stateroots when using GitOps ZTP".
For more information about what to include in the seed image, see "Seed image configuration" and "Seed image configuration using the RAN DU profile".
OADP backup and restore guidelines
With the OADP Operator, you can back up and restore your applications on your target clusters by using Backup and Restore CRs wrapped in ConfigMap objects.
The application must work on the current and the target OpenShift Container Platform versions so that they can be restored after the upgrade.
The backups must include resources that were initially created.
The following resources must be excluded from the backup:
-
pods -
endpoints -
controllerrevision -
podmetrics -
packagemanifest -
replicaset -
localvolume, if using Local Storage Operator (LSO)
There are two local storage implementations for single-node OpenShift:
- Local Storage Operator (LSO)
-
The Lifecycle Agent automatically backs up and restores the required artifacts, including
localvolumeresources and their associatedStorageClassresources. You must exclude thepersistentvolumesresource in the applicationBackupCR. - LVM Storage
-
You must create the
BackupandRestoreCRs for LVM Storage artifacts. You must include thepersistentVolumesresource in the applicationBackupCR.
For the image-based upgrade, only one Operator is supported on a given target cluster.
Important
For both Operators, you must not apply the Operator CRs as extra manifests through the ImageBasedUpgrade CR.
The persistent volume contents are preserved and used after the pivot.
When you are configuring the DataProtectionApplication CR, you must ensure that the .spec.configuration.restic.enable is set to false for an image-based upgrade.
This disables Container Storage Interface integration.
lca.openshift.io/apply-wave guidelines
The lca.openshift.io/apply-wave annotation determines the apply order of Backup or Restore CRs.
The value of the annotation must be a string number.
If you define the lca.openshift.io/apply-wave annotation in the Backup or Restore CRs, they are applied in increasing order based on the annotation value.
If you do not define the annotation, they are applied together.
The lca.openshift.io/apply-wave annotation must be numerically lower in your platform Restore CRs, for example RHACM and LVM Storage artifacts, than that of the application.
This way, the platform artifacts are restored before your applications.
If your application includes cluster-scoped resources, you must create separate Backup and Restore CRs to scope the backup to the specific cluster-scoped resources created by the application.
The Restore CR for the cluster-scoped resources must be restored before the remaining application Restore CR(s).
lca.openshift.io/apply-label guidelines
You can back up specific resources exclusively with the lca.openshift.io/apply-label annotation.
Based on which resources you define in the annotation, the Lifecycle Agent applies the lca.openshift.io/backup: <backup_name> label and adds the labelSelector.matchLabels.lca.openshift.io/backup: <backup_name> label selector to the specified resources when creating the Backup CRs.
To use the lca.openshift.io/apply-label annotation for backing up specific resources, the resources listed in the annotation must also be included in the spec section.
If the lca.openshift.io/apply-label annotation is used in the Backup CR, only the resources listed in the annotation are backed up, even if other resource types are specified in the spec section or not.
apiVersion: velero.io/v1
kind: Backup
metadata:
name: acm-klusterlet
namespace: openshift-adp
annotations:
lca.openshift.io/apply-label: rbac.authorization.k8s.io/v1/clusterroles/klusterlet,apps/v1/deployments/open-cluster-management-agent/klusterlet
labels:
velero.io/storage-location: default
spec:
includedNamespaces:
- open-cluster-management-agent
includedClusterScopedResources:
- clusterroles
includedNamespaceScopedResources:
- deployments
- The value must be a list of comma-separated objects in
group/version/resource/nameformat for cluster-scoped resources orgroup/version/resource/namespace/nameformat for namespace-scoped resources, and it must be attached to the relatedBackupCR.
Extra manifest guidelines
The Lifecycle Agent uses extra manifests to restore your target clusters after rebooting with the new stateroot deployment and before restoring application artifacts.
Different deployment methods require a different way to apply the extra manifests:
- GitOps ZTP
-
You use the
lca.openshift.io/target-ocp-version: <target_ocp_version>label to mark the extra manifests that the Lifecycle Agent must extract and apply after the pivot. You can specify the number of manifests labeled withlca.openshift.io/target-ocp-versionby using thelca.openshift.io/target-ocp-version-manifest-countannotation in theImageBasedUpgradeCR. If specified, the Lifecycle Agent verifies that the number of manifests extracted from policies matches the number provided in the annotation during the prep and upgrade stages.Example for the lca.openshift.io/target-ocp-version-manifest-count annotationapiVersion: lca.openshift.io/v1 kind: ImageBasedUpgrade metadata: annotations: lca.openshift.io/target-ocp-version-manifest-count: "5" name: upgrade - Non-Gitops
-
You mark your extra manifests with the
lca.openshift.io/apply-waveannotation to determine the apply order. The labeled extra manifests are wrapped inConfigMapobjects and referenced in theImageBasedUpgradeCR that the Lifecycle Agent uses after the pivot.
If the target cluster uses custom catalog sources, you must include them as extra manifests that point to the correct release version.
Important
You cannot apply the following items as extra manifests:
-
MachineConfigobjects -
OLM Operator subscriptions