Restic issues

You might encounter the following issues when you back up applications with Restic:

Restic permission error for NFS data volumes with the root_squash resource/parameter enabled
Restic Backup CR cannot be recreated after bucket is emptied
Restic restore partially failing on OpenShift Container Platform 4.14 due to changed pod security admission (PSA) policy

Troubleshooting Restic permission errors for NFS data volumes

If your NFS data volumes have the root_squash parameter enabled, Restic maps set to the nfsnobody value, and do not have permission to create backups, the Restic` pod log displays the following error message:

Sample error

controller=pod-volume-backup error="fork/exec/usr/bin/restic: permission denied".

You can resolve this issue by creating a supplemental group for Restic and adding the group ID to the DataProtectionApplication manifest.

Procedure

Create a supplemental group for Restic on the NFS data volume.
Set the setgid bit on the NFS directories so that group ownership is inherited.

Add the spec.configuration.nodeAgent.supplementalGroups parameter and the group ID to the DataProtectionApplication manifest, as shown in the following example:

apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
# ...
spec:
  configuration:
    nodeAgent:
      enable: true
      uploaderType: restic
      supplementalGroups:
      - <group_id> 
# ...

Specify the supplemental group ID.

Wait for the Restic pods to restart so that the changes are applied.

Troubleshooting Restic Backup CR issue that cannot be re-created after bucket is emptied

Velero does not re-create or update the Restic repository from the ResticRepository manifest if the Restic directories are deleted from object storage. For more information, see Velero issue 4421.

If you create a Restic Backup CR for a namespace, empty the object storage bucket, and then re-create the Backup CR for the same namespace, the re-created Backup CR fails. In this case, the velero pod log displays the following error message:

+ .Sample error

stderr=Fatal: unable to open config file: Stat: The specified key does not exist.\nIs there a repository at the following location?

Procedure

Remove the related Restic repository from the namespace by running the following command:

$ oc delete resticrepository openshift-adp <name_of_the_restic_repository>

In the following error log, mysql-persistent is the problematic Restic repository. The name of the repository appears in italics for clarity.

 time="2021-12-29T18:29:14Z" level=info msg="1 errors
 encountered backup up item" backup=velero/backup65
 logSource="pkg/backup/backup.go:431" name=mysql-7d99fc949-qbkds
 time="2021-12-29T18:29:14Z" level=error msg="Error backing up item"
 backup=velero/backup65 error="pod volume backup failed: error running
 restic backup, stderr=Fatal: unable to open config file: Stat: The
 specified key does not exist.\nIs there a repository at the following
 location?\ns3:http://minio-minio.apps.mayap-oadp-
 veleo-1234.qe.devcluster.openshift.com/mayapvelerooadp2/velero1/
 restic/mysql-persistent\n: exit status 1" error.file="/remote-source/
 src/github.com/vmware-tanzu/velero/pkg/restic/backupper.go:184"
 error.function="github.com/vmware-tanzu/velero/
 pkg/restic.(*backupper).BackupPodVolumes"
 logSource="pkg/backup/backup.go:435" name=mysql-7d99fc949-qbkds

Troubleshooting restic restore partially failed issue on OpenShift Container Platform 4.14 due to changed PSA policy

OpenShift Container Platform 4.14 enforces a Pod Security Admission (PSA) policy that can hinder the readiness of pods during a Restic restore process.

If a SecurityContextConstraints (SCC) resource is not found when a pod is created, and the PSA policy on the pod is not set up to meet the required standards, pod admission is denied.

This issue arises due to the resource restore order of Velero.

Sample error

\"level=error\" in line#2273: time=\"2023-06-12T06:50:04Z\"
level=error msg=\"error restoring mysql-869f9f44f6-tp5lv: pods\\\
"mysql-869f9f44f6-tp5lv\\\" is forbidden: violates PodSecurity\\\
"restricted:v1.24\\\": privil eged (container \\\"mysql\\\
" must not set securityContext.privileged=true),
allowPrivilegeEscalation != false (containers \\\
"restic-wait\\\", \\\"mysql\\\" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers \\\
"restic-wait\\\", \\\"mysql\\\" must set securityContext.capabilities.drop=[\\\"ALL\\\"]), seccompProfile (pod or containers \\\
"restic-wait\\\", \\\"mysql\\\" must set securityContext.seccompProfile.type to \\\
"RuntimeDefault\\\" or \\\"Localhost\\\")\" logSource=\"/remote-source/velero/app/pkg/restore/restore.go:1388\" restore=openshift-adp/todolist-backup-0780518c-08ed-11ee-805c-0a580a80e92c\n
velero container contains \"level=error\" in line#2447: time=\"2023-06-12T06:50:05Z\"
level=error msg=\"Namespace todolist-mariadb,
resource restore error: error restoring pods/todolist-mariadb/mysql-869f9f44f6-tp5lv: pods \\\
"mysql-869f9f44f6-tp5lv\\\" is forbidden: violates PodSecurity \\\"restricted:v1.24\\\": privileged (container \\\
"mysql\\\" must not set securityContext.privileged=true),
allowPrivilegeEscalation != false (containers \\\
"restic-wait\\\",\\\"mysql\\\" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers \\\
"restic-wait\\\", \\\"mysql\\\" must set securityContext.capabilities.drop=[\\\"ALL\\\"]), seccompProfile (pod or containers \\\
"restic-wait\\\", \\\"mysql\\\" must set securityContext.seccompProfile.type to \\\
"RuntimeDefault\\\" or \\\"Localhost\\\")\"
logSource=\"/remote-source/velero/app/pkg/controller/restore_controller.go:510\"
restore=openshift-adp/todolist-backup-0780518c-08ed-11ee-805c-0a580a80e92c\n]",

Procedure

In your DPA custom resource (CR), check or set the restore-resource-priorities field on the Velero server to ensure that securitycontextconstraints is listed in order before pods in the list of resources:

$ oc get dpa -o yaml

Example DPA CR

# ...
configuration:
  restic:
    enable: true
  velero:
    args:
      restore-resource-priorities: 'securitycontextconstraints,customresourcedefinitions,namespaces,storageclasses,volumesnapshotclass.snapshot.storage.k8s.io,volumesnapshotcontents.snapshot.storage.k8s.io,volumesnapshots.snapshot.storage.k8s.io,datauploads.velero.io,persistentvolumes,persistentvolumeclaims,serviceaccounts,secrets,configmaps,limitranges,pods,replicasets.apps,clusterclasses.cluster.x-k8s.io,endpoints,services,-,clusterbootstraps.run.tanzu.vmware.com,clusters.cluster.x-k8s.io,clusterresourcesets.addons.cluster.x-k8s.io' 
    defaultPlugins:
    - gcp
    - openshift

If you have an existing restore resource priority list, ensure you combine that existing list with the complete list.

Ensure that the security standards for the application pods are aligned, as provided in Fixing PodSecurity Admission warnings for deployments, to prevent deployment warnings. If the application is not aligned with security standards, an error can occur regardless of the SCC.

Note

This solution is temporary, and ongoing discussions are in progress to address it.

Additional resources

Fixing PodSecurity Admission warnings for deployments