Backup and restore is a core building block of any organisation's disaster recovery policy. Containerised workloads are no longer short-lived workloads that run only in development environments; they have become almost the standard for running production applications, so it is now crucial to design and implement a solid backup and restore solution for Kubernetes.
The challenge is that traditional backup and restore tools are built to handle physical servers and virtual machines and have no mechanisms to deal with Kubernetes, containers, persistent volumes and the other building blocks of modern containerised workloads. In this blog post I am going to discuss how organisations can implement and use Velero as a backup/restore solution for their Tanzu Kubernetes workloads, and I will summarise the different options available (along with their differences) when using Velero.
Velero (previously known as Heptio Ark) is an open-source project that provides organisations with tools to back up and restore Kubernetes cluster resources and persistent volumes. Velero can be used with cloud providers offering a Kubernetes platform for your organisation or can be implemented as part of an on-prem deployment. Some of Velero's use cases are:
Velero consists of a server component which runs as a deployment in your Kubernetes cluster and a command-line tool which is used to interact with the Velero server APIs.
Velero can be used to back up VMware Tanzu Kubernetes Grid Service guest clusters (TKGS) as well as Tanzu Kubernetes Grid workload clusters (TKGm); however, there are major differences in how backup and restore work and what they offer in each case. Below I list some of the major differences and key takeaways if you decide to use Velero to back up your Tanzu clusters.
Customers who are running Tanzu Kubernetes Grid Service workloads (aka vSphere with Tanzu/Workload Management) can use Velero to back up their Tanzu workloads using one of the following methods:
This method requires customers to have Workload Management (the supervisor cluster) enabled on top of NSX networking, since in this mode the Velero pods are deployed as vSphere Pods, which are only possible when vSphere with Tanzu is enabled on top of NSX networking and NOT VDS networking.
The advantage of using the Velero vSphere plugin is that customers can perform backup and restore operations for:
The gotchas in this scenario are:
If the above conditions are not satisfied in your Tanzu deployment, you need to look into the following option.
Customers who do not have NSX as the networking provider for their Tanzu clusters can still leverage Velero to perform Tanzu workload backups and restores using the restic open-source backup module. When you deploy Velero and instruct it to use restic as the backup module, Velero performs file system level backups of the persistent volumes present in the cluster/namespace that you want to back up.
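As a rough illustration (the pod and volume names below are made up and not from this post), restic in its default opt-in mode only backs up volumes that are explicitly listed in an annotation on the pod; alternatively, Velero can be installed with --default-volumes-to-restic to back up all pod volumes without annotations:

kubectl -n microservices annotate pod/my-app-0 backup.velero.io/backup-volumes=data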
As with everything in life, file system backups bring advantages but also disadvantages. Let's start with the advantages of using standalone Velero and restic file system backups to back up your Tanzu Kubernetes Grid clusters:
Now for the disadvantages, you might need to consider the following:
For Tanzu Kubernetes Grid clusters (management and workload clusters), the only possibility is to leverage standalone Velero and restic to perform backup and restore operations; however, in this scenario the Tanzu management cluster can also be backed up using Velero modules.
For software versions, I used the following:
For virtual host and appliance sizing, I used the following specs:
Velero requires an S3-like object store in which to store backups; this does not mean that you need to use AWS S3, any S3-compatible object store will do. In my setup I used MinIO, which provides an S3-compatible API. You can deploy MinIO using binaries directly on a server/local machine or as a container image in Docker or Podman; details of how to install MinIO can be found HERE.
In my lab I deployed MinIO as a Docker container using the following Docker Compose YAML (you will need to have Docker and docker-compose installed prior to running the file below):
version: '3'
services:
  minio:
    image: minio/minio
    ports:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - minio_storage:/home/nsxbaas/minio/data # this is a pre-created directory which is used by MinIO to store backups
    environment:
      MINIO_ROOT_USER: nsxbaas
      MINIO_ROOT_PASSWORD: password_of_your_choice
    command: server --console-address ":9001" /home/nsxbaas/minio/data
volumes:
  minio_storage: {}

Once I deployed the above with Docker Compose, I was able to access the MinIO UI using the credentials defined above.
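For completeness, assuming the compose file above is saved as docker-compose.yml, bringing the stack up is a single command:

docker-compose up -d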
After that you will need to create a read/write bucket (in my setup I created a bucket called tanzu).
You will also need to generate access keys, which Velero will use to authenticate to MinIO to perform backup and restore operations. You can create your access key ID/secret key pair from the Access Keys option in the left pane under User.
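If you prefer the command line over the UI, the bucket can also be created with the MinIO client (mc); the alias name below is a placeholder and the endpoint/credentials are the ones from my lab setup above:

mc alias set homelab http://services-linux.nsxbaas.homelab:9000 nsxbaas password_of_your_choice
mc mb homelab/tanzu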
At this point, MinIO is ready to be used as our object store.
Download Velero from the Tanzu Kubernetes Grid product download page on the VMware Customer Connect portal, then upload the gzip archive to a Linux machine (in my setup I use a Linux jumpbox) and follow the instructions in the VMware documentation to extract and copy the Velero CLI to your local PATH.
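The exact archive name depends on the TKG release you download, so treat the filename below as an assumption; the steps themselves are just extract, make executable and move onto the PATH:

gunzip velero-linux-v1.9.7+vmware.1.gz
chmod +x velero-linux-v1.9.7+vmware.1
sudo mv velero-linux-v1.9.7+vmware.1 /usr/local/bin/velero
velero version --client-only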
If the Velero CLI is installed correctly, you should see output similar to the below.
For this step, you need to create a file called credentials-minio (the name must be exactly this) and paste into it the S3 access key ID/secret key pair we generated earlier in the MinIO UI:
[default]
aws_access_key_id = 1K7OupGVfoaWYfmHs23i
aws_secret_access_key = by4mECTMyMRei1yukHjPUfXcuChINYNp80tlHVn3
At this point we are ready to start deploying Velero in our TKGS guest cluster. Please note that the command syntax below is valid up to Velero version 1.9.7; starting from Velero 1.10.x the syntax has changed, especially for the restic part of the command. You can reference the Velero open-source documentation for the most recent command syntax.
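As a side note (not verified against this lab), in Velero 1.10 and later the restic integration was folded into the generic "file system backup" feature and the relevant install flags were renamed, roughly as follows:

--use-restic                  becomes  --use-node-agent
--default-volumes-to-restic   becomes  --default-volumes-to-fs-backup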
To start the Velero deployment, log in to your TKGS guest cluster and run the following command after modifying the URL to match your environment:
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.0.0 \
  --bucket tanzu \
  --secret-file ./credentials-minio \
  --use-volume-snapshots=false \
  --use-restic \
  --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://services-linux.nsxbaas.homelab:9000,publicUrl=http://services-linux.nsxbaas.homelab:9000
This will kick off the Velero deployment and you should see output similar to the below.
As you can see, the Velero and restic pods are all in the Running state (ignore the one in Pending; this is due to insufficient resources in my lab and has no effect on backup or restore operations).
Before we go straight to the command line, it may be interesting to learn a bit about how Velero performs a backup operation for a namespace or a cluster. Velero can perform on-demand and/or scheduled backups of your Kubernetes resources; the workflow is the same for both and is triggered either by the velero backup command (on-demand) or by a pre-configured schedule (a schedule example is shown after the steps below). In both cases the workflow can be summarised as follows:
The velero CLI backup command makes an API call to the Kubernetes API server to create a backup object.
The Velero backup controller (module) notices the new backup object and performs validation.
Velero then begins the backup process. It collects the data to back up by querying the API server for resources.
The Velero backup module makes a call to the object store (MinIO in our case) to upload the backup file.
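For the scheduled variant mentioned above, a daily backup of the same namespace could be created along these lines (the schedule name and cron expression are placeholders of my choosing):

velero schedule create microservices-daily --schedule="0 1 * * *" --include-namespaces microservices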
I will perform a backup operation for a namespace called microservices using the following command:
velero backup create velero-backup01 --include-namespaces microservices
If the backup operation is successful, you should see output similar to the below when you use the command velero backup describe <backup job name>.
From the above output you can see that the backup job has completed successfully.
To simulate a DR scenario, I will delete the microservices namespace and then use Velero to restore it from backup.
In the above screenshot, notice that the AGE of the microservices namespace is 88 days; after I perform the restore operation it will show an AGE of seconds, since I deleted the namespace and restored it from backup, as shown in the screenshot below.
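The restore itself is driven from the backup created earlier; with the backup name used above, the command is along the lines of the following, after which velero restore describe <restore name> can be used to track progress:

velero restore create --from-backup velero-backup01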
The above output was taken after the restore operation; you can see that the microservices namespace was recreated 87 seconds ago and that the pods/containers are being created and deployed.
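For anyone reproducing this without the screenshots, the verification behind them is just standard kubectl output, for example:

kubectl get ns microservices
kubectl get pods -n microservices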
Hope you have found this blog post useful!