OADP timeouts
Extending a timeout allows complex or resource-intensive processes to complete successfully without premature termination. This configuration can reduce errors, retries, or failures.
Ensure that you balance timeout extensions in a logical manner so that you do not configure excessively long timeouts that might hide underlying issues in the process. Consider and monitor an appropriate timeout value that meets the needs of the process and the overall system performance.
The following OADP timeouts show instructions of how and when to implement these parameters:
Implementing restic timeout
The spec.configuration.nodeAgent.timeout parameter defines the Restic timeout. The default value is 1h.
Use the Restic timeout parameter in the nodeAgent section for the following scenarios:
-
For Restic backups with total PV data usage that is greater than 500GB.
-
If backups are timing out with the following error:
level=error msg="Error backing up item" backup=velero/monitoring error="timed out waiting for all PodVolumeBackups to complete"
-
Edit the values in the
spec.configuration.nodeAgent.timeoutblock of theDataProtectionApplicationcustom resource (CR) manifest, as shown in the following example:apiVersion: oadp.openshift.io/v1alpha1 kind: DataProtectionApplication metadata: name: <dpa_name> spec: configuration: nodeAgent: enable: true uploaderType: restic timeout: 1h # ...
Implementing velero resource timeout
resourceTimeout defines how long to wait for several Velero resources before timeout occurs, such as Velero custom resource definition (CRD) availability, volumeSnapshot deletion, and repository availability. The default is 10m.
Use the resourceTimeout for the following scenarios:
-
For backups with total PV data usage that is greater than 1 TB. This parameter is used as a timeout value when Velero tries to clean up or delete the Container Storage Interface (CSI) snapshots, before marking the backup as complete.
-
A sub-task of this cleanup tries to patch VSC, and this timeout can be used for that task.
-
-
To create or ensure a backup repository is ready for filesystem based backups for Restic or Kopia.
-
To check if the Velero CRD is available in the cluster before restoring the custom resource (CR) or resource from the backup.
-
Edit the values in the
spec.configuration.velero.resourceTimeoutblock of theDataProtectionApplicationCR manifest, as shown in the following example:apiVersion: oadp.openshift.io/v1alpha1 kind: DataProtectionApplication metadata: name: <dpa_name> spec: configuration: velero: resourceTimeout: 10m # ...
Implementing velero default item operation timeout
The defaultItemOperationTimeout setting defines how long to wait on asynchronous BackupItemActions and RestoreItemActions to complete before timing out. The default value is 1h.
Use the defaultItemOperationTimeout for the following scenarios:
-
Only with Data Mover 1.2.x.
-
To specify the amount of time a particular backup or restore should wait for the Asynchronous actions to complete. In the context of OADP features, this value is used for the Asynchronous actions involved in the Container Storage Interface (CSI) Data Mover feature.
-
When
defaultItemOperationTimeoutis defined in the Data Protection Application (DPA) using thedefaultItemOperationTimeout, it applies to both backup and restore operations. You can useitemOperationTimeoutto define only the backup or only the restore of those CRs, as described in the following "Item operation timeout - restore", and "Item operation timeout - backup" sections.
-
Edit the values in the
spec.configuration.velero.defaultItemOperationTimeoutblock of theDataProtectionApplicationCR manifest, as shown in the following example:apiVersion: oadp.openshift.io/v1alpha1 kind: DataProtectionApplication metadata: name: <dpa_name> spec: configuration: velero: defaultItemOperationTimeout: 1h # ...
Implementing Data Mover timeout
timeout is a user-supplied timeout to complete VolumeSnapshotBackup and VolumeSnapshotRestore. The default value is 10m.
Use the Data Mover timeout for the following scenarios:
-
If creation of
VolumeSnapshotBackups(VSBs) andVolumeSnapshotRestores(VSRs), times out after 10 minutes. -
For large scale environments with total PV data usage that is greater than 500GB. Set the timeout for
1h. -
With the
VolumeSnapshotMover(VSM) plugin. -
Only with OADP 1.1.x.
-
Edit the values in the
spec.features.dataMover.timeoutblock of theDataProtectionApplicationCR manifest, as shown in the following example:apiVersion: oadp.openshift.io/v1alpha1 kind: DataProtectionApplication metadata: name: <dpa_name> spec: features: dataMover: timeout: 10m # ...
Implementing CSI snapshot timeout
CSISnapshotTimeout specifies the time during creation to wait until the CSI VolumeSnapshot status becomes ReadyToUse, before returning error as timeout. The default value is 10m.
Use the CSISnapshotTimeout for the following scenarios:
-
With the CSI plugin.
-
For very large storage volumes that may take longer than 10 minutes to snapshot. Adjust this timeout if timeouts are found in the logs.
Note
Typically, the default value for CSISnapshotTimeout does not require adjustment, because the default setting can accommodate large storage volumes.
-
Edit the values in the
spec.csiSnapshotTimeoutblock of theBackupCR manifest, as shown in the following example:apiVersion: velero.io/v1 kind: Backup metadata: name: <backup_name> spec: csiSnapshotTimeout: 10m # ...
Implementing item operation timeout - restore
The ItemOperationTimeout setting specifies the time that is used to wait for RestoreItemAction operations. The default value is 1h.
Use the restore ItemOperationTimeout for the following scenarios:
-
Only with Data Mover 1.2.x.
-
For Data Mover uploads and downloads to or from the
BackupStorageLocation. If the restore action is not completed when the timeout is reached, it will be marked as failed. If Data Mover operations are failing due to timeout issues, because of large storage volume sizes, then this timeout setting may need to be increased.
-
Edit the values in the
Restore.spec.itemOperationTimeoutblock of theRestoreCR manifest, as shown in the following example:apiVersion: velero.io/v1 kind: Restore metadata: name: <restore_name> spec: itemOperationTimeout: 1h # ...
Implementing item operation timeout - backup
The ItemOperationTimeout setting specifies the time used to wait for asynchronous
BackupItemAction operations. The default value is 1h.
Use the backup ItemOperationTimeout for the following scenarios:
-
Only with Data Mover 1.2.x.
-
For Data Mover uploads and downloads to or from the
BackupStorageLocation. If the backup action is not completed when the timeout is reached, it will be marked as failed. If Data Mover operations are failing due to timeout issues, because of large storage volume sizes, then this timeout setting may need to be increased.
-
Edit the values in the
Backup.spec.itemOperationTimeoutblock of theBackupCR manifest, as shown in the following example:apiVersion: velero.io/v1 kind: Backup metadata: name: <backup_name> spec: itemOperationTimeout: 1h # ...