Simple note-to-self about restoring a volume from a Longhorn backup. 

Should be straightforward, but I have made a mess of it a few times, so here is a short 'note-to-self' about restoring a Longhorn backup. Unlike snapshots, Longhorn backups are stored remotely, on AWS S3 or NFS.

Requirements: Longhorn installed, an existing backup, and a correctly working Longhorn backup target.

Disclaimer: I am not a Longhorn/K3s expert. Use your own judgment.

Backups have already been created. I have not figured out how to create Longhorn backups from kubectl, so backups are made directly from the Longhorn GUI or by backup jobs. Both are very simple and well made.

Existing backups:

kubectl get backups.longhorn.io -n longhorn-system
NAME                      SNAPSHOTNAME                           SNAPSHOTSIZE   SNAPSHOTCREATEDAT      STATE       LASTSYNCEDAT
backup-485e03e3292146d9   e5533596-6028-4f30-ae87-c486d9e0b6de   385875968      2023-04-05T14:14:13Z   Completed   2023-04-05T14:14:17Z

And the volume from which backup backup-485e03e3292146d9 was taken:

kubectl get backupvolumes.longhorn.io -n longhorn-system
NAME                                       CREATEDAT              LASTBACKUPNAME            LASTBACKUPAT           LASTSYNCEDAT
pvc-d735df79-6107-42ff-be45-4a8e4336df66   2023-04-05T14:13:46Z   backup-485e03e3292146d9   2023-04-05T14:14:13Z   2023-04-05T15:05:01Z

Notice that Longhorn backups keep the 'old' volume name (pvc-d735df79-6107-42ff-be45-4a8e4336df66).

The PVC, connecting the PVC name (wp-pv-claim) to the volume (pvc-d735df79-6107-42ff-be45-4a8e4336df66):

kubectl get persistentvolumeclaims -n wordpress
NAME             STATUS        VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
wp-pv-claim      Bound         pvc-d735df79-6107-42ff-be45-4a8e4336df66   5Gi        RWO            longhorn       109m

PVC 'wp-pv-claim' has created a volume 'pvc-d735df79-6107-42ff-be45-4a8e4336df66'. For this volume there exists at least one backup: 'backup-485e03e3292146d9'. All of the above data is also available from the Rancher/Longhorn GUI.
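The chain above (PVC name to volume name to latest backup) can also be looked up without reading the tables by eye. A minimal sketch, with two assumptions: that the LASTBACKUPNAME column is backed by the field .status.lastBackupName on the BackupVolume object, and that BackupVolume objects are named after the volume, as in the output shown here. Check with 'kubectl get backupvolumes.longhorn.io <volume> -n longhorn-system -o yaml'. The function names are my own.

```shell
# Print the volume (PV) name bound to a PVC.
# usage: pvc_volume <pvc-name> <namespace>
pvc_volume() (
  kubectl get pvc "$1" -n "$2" -o jsonpath='{.spec.volumeName}'
)

# Print the name of the latest backup Longhorn has recorded for a volume.
# ASSUMPTION: the LASTBACKUPNAME column maps to .status.lastBackupName.
# usage: volume_last_backup <volume-name>
volume_last_backup() (
  kubectl get backupvolumes.longhorn.io "$1" -n longhorn-system \
    -o jsonpath='{.status.lastBackupName}'
)

# Example (not run here):
#   vol=$(pvc_volume wp-pv-claim wordpress)
#   volume_last_backup "$vol"
```

If the second function prints nothing, either the field name differs in your Longhorn version or no backup exists for that volume.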

Restore backup

Longhorn cannot do in-place restore jobs. I.e., the existing volume needs to be removed, or the backup needs to be restored to a different volume name. Not a colossal problem, as Longhorn keeps all relevant data for the volume, such as the volume name and namespace.

I have two types of restore jobs: A) Pods are running fine, but the data is garbage, and B) All is gone. In case B, the pods/workloads attached to the PVC have been deleted or are failing; it does not matter whether the data has been deleted or corrupted. You are left with the original helm charts/deployment files and the Longhorn backup. Basically, you start over from the namespace.

For A) Restoring a PV while the workloads are still intact is straightforward: delete the PVC, select the relevant backup set and run 'Restore Latest Backup'.

Make absolutely sure the correct PVC/PV is deleted. In most cases you would know which workloads are using which PVCs, but to make sure, the data is available from the Longhorn GUI or by running kubectl describe pod pod-name -n namespace. When the PVC/PV is removed, the workload will stop working.

Find the relevant pod:

kubectl get pods -n wordpress
NAME                               READY   STATUS    RESTARTS   AGE
wordpress-5c8796574f-ltxj9         1/1     Running   0          17h
wordpress-mysql-5696494775-trv5s   1/1     Running   0          18h

Pod resources (snip from output). 

kubectl describe pod wordpress-5c8796574f-ltxj9 -n wordpress
Name:             wordpress-5c8796574f-ltxj9
...
Volumes:
  wordpress-persistent-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  wp-pv-claim
    ReadOnly:   false

The PVC for pod 'wordpress-5c8796574f-ltxj9' is 'wp-pv-claim'. Make sure the relevant backups are available for 'wp-pv-claim'.
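To go the other way (from a PVC to every pod that mounts it) without describing pods one by one, something like the following could work. It is a sketch only: pods_using_pvc is a name I made up, and it relies on kubectl printing an empty string for volumes that are not PVC-backed.

```shell
# List the pods in a namespace that mount a given PVC.
# usage: pods_using_pvc <pvc-name> <namespace>
pods_using_pvc() (
  kubectl get pods -n "$2" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .spec.volumes[*]}{.persistentVolumeClaim.claimName}{" "}{end}{"\n"}{end}' |
    awk -F '\t' -v pvc="$1" '
      {
        # Field 2 holds the space-separated claim names of one pod.
        n = split($2, claims, " ")
        for (i = 1; i <= n; i++) if (claims[i] == pvc) { print $1; break }
      }'
)

# Example (not run here):
#   pods_using_pvc wp-pv-claim wordpress
```

Every pod name it prints is a workload that will stop working once the PVC is deleted.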

PVC and volume name.

kubectl get persistentvolumeclaims -n wordpress
NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
wp-pv-claim      Bound    pvc-59addab3-11d1-44cc-b81d-8315053b1df1   5Gi        RWO            longhorn-static   10m

Volume and backup name.

kubectl get backupvolumes.longhorn.io -n longhorn-system
NAME                                       CREATEDAT              LASTBACKUPNAME            LASTBACKUPAT           LASTSYNCEDAT
pvc-59addab3-11d1-44cc-b81d-8315053b1df1   2023-04-05T14:13:30Z   backup-b54cb5cebd8b4091   2023-04-05T14:14:06Z   2023-04-06T09:34:00Z

A backup exists for PVC 'wp-pv-claim' and it is called: backup-b54cb5cebd8b4091. The PVC can therefore be deleted now.
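Before deleting anything, it may be worth checking that the backup actually finished. A small guard, assuming the STATE column shown by 'kubectl get backups.longhorn.io' is backed by .status.state on the Backup object (an assumption; verify with '-o yaml'). The function name is hypothetical.

```shell
# Succeed only if the named Longhorn backup reports state "Completed".
# ASSUMPTION: the STATE column maps to .status.state on the Backup object.
# usage: backup_is_completed <backup-name>
backup_is_completed() (
  state=$(kubectl get backups.longhorn.io "$1" -n longhorn-system \
    -o jsonpath='{.status.state}') || exit 1
  [ "$state" = "Completed" ]
)

# Example (not run here): only delete the PVC when the backup is usable.
#   backup_is_completed backup-b54cb5cebd8b4091 &&
#     kubectl delete pvc wp-pv-claim -n wordpress
```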

Deleting PVC 'wp-pv-claim'. 

kubectl delete pvc wp-pv-claim -n wordpress     
persistentvolumeclaim "wp-pv-claim" deleted

The PVC will hang in the 'Terminating' state until the workload is re-deployed or the relevant pod is deleted. Check whether the PVC is actually gone:

kubectl get pvc -n wordpress
NAME             STATUS        VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
wp-pv-claim      Terminating   pvc-59addab3-11d1-44cc-b81d-8315053b1df1   5Gi        RWO            longhorn-static   18h
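The 'Terminating' state is Kubernetes protecting a PVC that a pod still mounts (the kubernetes.io/pvc-protection finalizer); the delete completes once no pod uses the claim. As an alternative to deleting the pod by hand, the workload can be scaled to zero and the deletion awaited. A sketch, assuming the Deployment is called 'wordpress' (inferred from the pod name, not confirmed above); release_pvc is a made-up helper.

```shell
# Stop the pods that hold a PVC, then wait until the PVC is really gone.
# Run this after 'kubectl delete pvc ...' has been issued.
# usage: release_pvc <deployment> <pvc-name> <namespace>
release_pvc() (
  kubectl scale deployment "$1" -n "$3" --replicas=0 || exit 1
  # Depending on the kubectl version, this may report an error if the PVC
  # has already disappeared by the time the wait starts.
  kubectl wait --for=delete "pvc/$2" -n "$3" --timeout=120s
)

# Example (not run here):
#   release_pvc wordpress wp-pv-claim wordpress
# Scale back up once the restored PV/PVC exist:
#   kubectl scale deployment wordpress -n wordpress --replicas=1
```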

I have not figured out how to run a restore job from the command line, so the restore job has to be done from the Longhorn GUI.
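One route that might avoid the GUI, untested in this note: Longhorn's StorageClass accepts a 'fromBackup' parameter, so a PVC created from such a class is provisioned from the given backup. The sketch below only prints a manifest. The class name is hypothetical, the backup URL must be copied from Longhorn for the backup in question, and newer Longhorn versions may require further parameters. Note that a volume provisioned this way gets a new name, so it is not a drop-in replacement for the 'Use previous Name' flow described below.

```shell
# Print a StorageClass manifest whose volumes are provisioned from a backup.
# usage: render_restore_storageclass <class-name> <backup-url> <replicas>
render_restore_storageclass() (
  cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: $1
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "$3"
  fromBackup: "$2"
EOF
)

# Example (not run here; <backup-url> is a placeholder):
#   render_restore_storageclass longhorn-restore-wp "<backup-url>" 2 | kubectl apply -f -
```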

The screenshot below shows all available backups in Longhorn. To restore a backup, simply click 'Restore Latest Backup' (the menu item is uncomfortably close to 'Delete All Backups').

The restore menu offers a few options. 'Use previous Name': the new volume will inherit the original volume name. As mentioned before, it is also possible to restore to a new volume (name). 'Number of replicas' and 'Access Mode' must be set for your environment.

 

[Screenshot: Longhorn]

[Screenshot: Longhorn - Restore backup]

The restored volume is 'detached' and not bound to any PVC, so it is currently not available to the cluster. We need to re-create the PV/PVC for the volume.

 

(Re-)creating the PV and PVC for the restored volume.

 

Creating the PVC/PV. The menu shows the values saved by Longhorn; these are all pre-filled. Again, you can create a PV/PVC with a different name and still use it with the relevant workloads.
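What the GUI does at this step corresponds roughly to a statically provisioned PV pointing at the existing Longhorn volume, plus a PVC that names that PV. A minimal sketch of such manifests, assuming an ext4 filesystem, ReadWriteOnce access and reclaim policy Retain (all assumptions; the GUI's pre-filled values are the authority). The 'longhorn-static' storage class matches what kubectl reports for restored volumes in this note; render_pv_pvc is a hypothetical helper.

```shell
# Print PV and PVC manifests that bind an existing Longhorn volume.
# usage: render_pv_pvc <volume-name> <pvc-name> <namespace> <size>
render_pv_pvc() (
  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: $1
spec:
  capacity:
    storage: $4
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: longhorn-static
  csi:
    driver: driver.longhorn.io
    fsType: ext4
    volumeHandle: $1
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $2
  namespace: $3
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn-static
  volumeName: $1
  resources:
    requests:
      storage: $4
EOF
)

# Example (not run here):
#   render_pv_pvc pvc-59addab3-11d1-44cc-b81d-8315053b1df1 wp-pv-claim wordpress 5Gi | kubectl apply -f -
```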

 

If you have re-used the old volume name, the workload should have picked up the restored volume.

kubectl get persistentvolumeclaims -n wordpress
NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
mysql-pv-claim   Bound    pvc-2a530ed0-2dfc-4f6e-a731-0eae79f246a7   5Gi        RWO            longhorn-static   19h
wp-pv-claim      Bound    pvc-59addab3-11d1-44cc-b81d-8315053b1df1   5Gi        RWO            longhorn-static   2m5s

Snip of 'kubectl describe pod'. The pod is again using the relevant PV:

kubectl describe pods wordpress-55c9ff4b54-rfnb7 -n wordpress
...
Volumes:
  wordpress-persistent-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  wp-pv-claim
    ReadOnly:   false

 

For B) only the backups and the relevant helm chart or deployment, service and PVC files are available.

After restoring the namespace, secrets, service, PVC and deployments, the workload is running again. But it is just a blank/default website; for a WordPress site, as in my case, this is the default 'Install WordPress' page. I need to restore all the stuff I have added: configurations, templates, images, stories.

I have two volumes: pvc-59addab3-11d1-44cc-b81d-8315053b1df1 (WordPress) and pvc-d735df79-6107-42ff-be45-4a8e4336df66 (MySQL).

kubectl get backupvolumes.longhorn.io -n longhorn-system
NAME                                       CREATEDAT              LASTBACKUPNAME            LASTBACKUPAT           LASTSYNCEDAT
pvc-59addab3-11d1-44cc-b81d-8315053b1df1   2023-04-05T14:13:30Z   backup-b54cb5cebd8b4091   2023-04-05T14:14:06Z   2023-04-05T14:15:01Z
pvc-d735df79-6107-42ff-be45-4a8e4336df66   2023-04-05T14:13:46Z   backup-485e03e3292146d9   2023-04-05T14:14:13Z   2023-04-05T14:15:01Z

View from Longhorn: volume 'pvc-59addab3-11d1-44cc-b81d-8315053b1df1' and pod 'wordpress-5d958b88df-cmqbt' do not exist anymore, but Longhorn stores the names of the volume and pod from which the backup was taken. It is not a problem.

The new workload has created new PVCs/PVs. The PVC names are identical (wp-pv-claim and mysql-pv-claim), but the volumes have gotten new names.

 
kubectl get pvc -n wordpress 
NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mysql-pv-claim   Bound    pvc-2a530ed0-2dfc-4f6e-a731-0eae79f246a7   5Gi        RWO            longhorn       82s
wp-pv-claim      Bound    pvc-b3da03f1-8e27-4c40-8f8d-f67c1fd3fa73   5Gi        RWO            longhorn       61s

So volume pvc-2a530ed0-2dfc-4f6e-a731-0eae79f246a7 is MySQL and pvc-b3da03f1-8e27-4c40-8f8d-f67c1fd3fa73 is WordPress.

My WordPress backup is bound to a (deleted) pod: wordpress-5d958b88df-cmqbt and a volume name: pvc-59addab3-11d1-44cc-b81d-8315053b1df1. So, I need to restore my WordPress backup as: pvc-b3da03f1-8e27-4c40-8f8d-f67c1fd3fa73 (the newly created volume name for the WordPress workload). 

When you delete the volume, your workload will fail until the backup is in place.

1) Delete volume: pvc-b3da03f1-8e27-4c40-8f8d-f67c1fd3fa73
2) Restore the backup of pvc-59addab3-11d1-44cc-b81d-8315053b1df1 and create a volume called: pvc-b3da03f1-8e27-4c40-8f8d-f67c1fd3fa73

Again, deleting the PVCs will not achieve much before the workloads are re-deployed. Both PVCs are stuck in 'Terminating'.

kubectl get pvc -n wordpress
NAME             STATUS        VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mysql-pv-claim   Terminating   pvc-2a530ed0-2dfc-4f6e-a731-0eae79f246a7   5Gi        RWO            longhorn       12m
wp-pv-claim      Terminating   pvc-b3da03f1-8e27-4c40-8f8d-f67c1fd3fa73   5Gi        RWO            longhorn       11m

Now restore the backup to a volume with a new name: pvc-b3da03f1-8e27-4c40-8f8d-f67c1fd3fa73, the volume created by the latest deployment. This time you cannot use the option 'Use previous Name' (the previous name points to a different volume than the one the deployment is actually using).
Instead use:

Name: pvc-b3da03f1-8e27-4c40-8f8d-f67c1fd3fa73 (volume name for the new deployment.)
'Number of replicas' and 'Access Mode' must also be set.

For this specific WordPress volume: Name: pvc-b3da03f1-8e27-4c40-8f8d-f67c1fd3fa73. Number of replicas: 2. Access Mode: ReadWriteOnce.

After restoring the backups, the volumes match the current deployment of the workloads, i.e., my WordPress/MySQL pods are using these volumes. (The pod names shown under 'Attached to' are nonsense: these are the deleted pods, and it is not a problem.)

Now create the PV/PVC for the volumes (do not change the suggested/saved values). The process can take a bit of time, 2-3 minutes.
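Rather than re-running 'kubectl get pvc' during those minutes, kubectl can wait for the claim to become Bound. A small sketch; it needs a kubectl recent enough to support '--for=jsonpath' (1.23 or later, as far as I know), and wait_pvc_bound is a made-up name.

```shell
# Block until a PVC reports phase "Bound", or fail after the timeout.
# usage: wait_pvc_bound <pvc-name> <namespace> [timeout]
wait_pvc_bound() (
  kubectl wait --for=jsonpath='{.status.phase}'=Bound "pvc/$1" -n "$2" \
    --timeout="${3:-300s}"
)

# Example (not run here):
#   wait_pvc_bound wp-pv-claim wordpress
#   wait_pvc_bound mysql-pv-claim wordpress 600s
```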

Notice that the volumes are attached to the correct workloads again.

Note: Longhorn backups are considered 'bound' to non-existing pods. I.e., after workloads have been deleted, the backups will still point to these pods. It is not a problem.

 

 

Lost, but not for long
Heart and mind in search of hope
Found, and we're whole again