Restic issues
You might encounter the following issues when you back up applications with Restic:
-
Restic permission error for NFS data volumes with the
root_squashresource/parameter enabled -
Restic
BackupCR cannot be recreated after bucket is emptied -
Restic restore partially failing on OpenShift Container Platform 4.14 due to changed pod security admission (PSA) policy
Troubleshooting Restic permission errors for NFS data volumes
If your NFS data volumes have the root_squash parameter enabled, Restic maps set to the nfsnobody value, and do not have permission to create backups, the Restic` pod log displays the following error message:
controller=pod-volume-backup error="fork/exec/usr/bin/restic: permission denied".
You can resolve this issue by creating a supplemental group for Restic and adding the group ID to the DataProtectionApplication manifest.
-
Create a supplemental group for
Resticon the NFS data volume. -
Set the
setgidbit on the NFS directories so that group ownership is inherited. -
Add the
spec.configuration.nodeAgent.supplementalGroupsparameter and the group ID to theDataProtectionApplicationmanifest, as shown in the following example:apiVersion: oadp.openshift.io/v1alpha1 kind: DataProtectionApplication # ... spec: configuration: nodeAgent: enable: true uploaderType: restic supplementalGroups: - <group_id> # ...- Specify the supplemental group ID.
-
Wait for the
Resticpods to restart so that the changes are applied.
Troubleshooting Restic Backup CR issue that cannot be re-created after bucket is emptied
Velero does not re-create or update the Restic repository from the ResticRepository manifest if the Restic directories are deleted from object storage. For more information, see Velero issue 4421.
If you create a Restic Backup CR for a namespace, empty the object storage bucket, and then re-create the Backup CR for the same namespace, the re-created Backup CR fails. In this case, the velero pod log displays the following error message:
+ .Sample error
stderr=Fatal: unable to open config file: Stat: The specified key does not exist.\nIs there a repository at the following location?
-
Remove the related Restic repository from the namespace by running the following command:
$ oc delete resticrepository openshift-adp <name_of_the_restic_repository>In the following error log,
mysql-persistentis the problematic Restic repository. The name of the repository appears in italics for clarity.time="2021-12-29T18:29:14Z" level=info msg="1 errors encountered backup up item" backup=velero/backup65 logSource="pkg/backup/backup.go:431" name=mysql-7d99fc949-qbkds time="2021-12-29T18:29:14Z" level=error msg="Error backing up item" backup=velero/backup65 error="pod volume backup failed: error running restic backup, stderr=Fatal: unable to open config file: Stat: The specified key does not exist.\nIs there a repository at the following location?\ns3:http://minio-minio.apps.mayap-oadp- veleo-1234.qe.devcluster.openshift.com/mayapvelerooadp2/velero1/ restic/mysql-persistent\n: exit status 1" error.file="/remote-source/ src/github.com/vmware-tanzu/velero/pkg/restic/backupper.go:184" error.function="github.com/vmware-tanzu/velero/ pkg/restic.(*backupper).BackupPodVolumes" logSource="pkg/backup/backup.go:435" name=mysql-7d99fc949-qbkds
Troubleshooting restic restore partially failed issue on OpenShift Container Platform 4.14 due to changed PSA policy
OpenShift Container Platform 4.14 enforces a Pod Security Admission (PSA) policy that can hinder the readiness of pods during a Restic restore process.
If a SecurityContextConstraints (SCC) resource is not found when a pod is created, and the PSA policy on the pod is not set up to meet the required standards, pod admission is denied.
This issue arises due to the resource restore order of Velero.
\"level=error\" in line#2273: time=\"2023-06-12T06:50:04Z\"
level=error msg=\"error restoring mysql-869f9f44f6-tp5lv: pods\\\
"mysql-869f9f44f6-tp5lv\\\" is forbidden: violates PodSecurity\\\
"restricted:v1.24\\\": privil eged (container \\\"mysql\\\
" must not set securityContext.privileged=true),
allowPrivilegeEscalation != false (containers \\\
"restic-wait\\\", \\\"mysql\\\" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers \\\
"restic-wait\\\", \\\"mysql\\\" must set securityContext.capabilities.drop=[\\\"ALL\\\"]), seccompProfile (pod or containers \\\
"restic-wait\\\", \\\"mysql\\\" must set securityContext.seccompProfile.type to \\\
"RuntimeDefault\\\" or \\\"Localhost\\\")\" logSource=\"/remote-source/velero/app/pkg/restore/restore.go:1388\" restore=openshift-adp/todolist-backup-0780518c-08ed-11ee-805c-0a580a80e92c\n
velero container contains \"level=error\" in line#2447: time=\"2023-06-12T06:50:05Z\"
level=error msg=\"Namespace todolist-mariadb,
resource restore error: error restoring pods/todolist-mariadb/mysql-869f9f44f6-tp5lv: pods \\\
"mysql-869f9f44f6-tp5lv\\\" is forbidden: violates PodSecurity \\\"restricted:v1.24\\\": privileged (container \\\
"mysql\\\" must not set securityContext.privileged=true),
allowPrivilegeEscalation != false (containers \\\
"restic-wait\\\",\\\"mysql\\\" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers \\\
"restic-wait\\\", \\\"mysql\\\" must set securityContext.capabilities.drop=[\\\"ALL\\\"]), seccompProfile (pod or containers \\\
"restic-wait\\\", \\\"mysql\\\" must set securityContext.seccompProfile.type to \\\
"RuntimeDefault\\\" or \\\"Localhost\\\")\"
logSource=\"/remote-source/velero/app/pkg/controller/restore_controller.go:510\"
restore=openshift-adp/todolist-backup-0780518c-08ed-11ee-805c-0a580a80e92c\n]",
-
In your DPA custom resource (CR), check or set the
restore-resource-prioritiesfield on the Velero server to ensure thatsecuritycontextconstraintsis listed in order beforepodsin the list of resources:$ oc get dpa -o yamlExample DPA CR# ... configuration: restic: enable: true velero: args: restore-resource-priorities: 'securitycontextconstraints,customresourcedefinitions,namespaces,storageclasses,volumesnapshotclass.snapshot.storage.k8s.io,volumesnapshotcontents.snapshot.storage.k8s.io,volumesnapshots.snapshot.storage.k8s.io,datauploads.velero.io,persistentvolumes,persistentvolumeclaims,serviceaccounts,secrets,configmaps,limitranges,pods,replicasets.apps,clusterclasses.cluster.x-k8s.io,endpoints,services,-,clusterbootstraps.run.tanzu.vmware.com,clusters.cluster.x-k8s.io,clusterresourcesets.addons.cluster.x-k8s.io' defaultPlugins: - gcp - openshift- If you have an existing restore resource priority list, ensure you combine that existing list with the complete list.
-
Ensure that the security standards for the application pods are aligned, as provided in Fixing PodSecurity Admission warnings for deployments, to prevent deployment warnings. If the application is not aligned with security standards, an error can occur regardless of the SCC.
Note
This solution is temporary, and ongoing discussions are in progress to address it.