Disaster Recovery for VMs on Kubernetes
Tue, 24 Sep 2024 13:08:08 -0000
Introduction
The world of virtualization is undergoing a seismic shift. In the past few years, we have seen the acquisition of VMware by Broadcom, the rise of cloud-native and serverless workloads, and the introduction of new hypervisors such as Nutanix AHV, prompting organizations to reevaluate traditional virtual machine (VM) management.
One approach gaining popularity is to use Kubernetes to manage virtual machines. The KubeVirt project allows VM-based workloads to run alongside containers in Kubernetes. A shared environment for application containers and virtual machines is highly convenient, especially if your organization is adopting Kubernetes but still has VM-based workloads that cannot be easily containerized. The high-level concepts of KubeVirt are explained in this lightboard video.
Dell Container Storage Modules (CSM) simplifies storage management for Kubernetes workloads. Let’s take a look at how you can provision, protect, and manage storage for VMs running on Kubernetes clusters using Dell CSM.
GitOps Disaster Recovery
Disaster Recovery for an application involves recovering the application on a different site along with its data. Mature Kubernetes organizations and teams often use GitOps to manage their cloud-native workloads, infrastructure, and application configurations. The entire state of the system is stored in Git repositories. To effect a change in the infrastructure or application configuration, a pull request with the required changes must be merged into the Git repository. GitOps tools like Argo CD are configured to automatically apply the changes from Git repositories to the infrastructure or application.
Recovering a GitOps-managed application post-disaster should be a simple pull request to change where the application runs.
Configure two Kubernetes clusters for CSM Replication
CSM Replication brings the Replication and Disaster Recovery capabilities of Dell Storage Arrays to Kubernetes Clusters. We will leverage CSM Replication to replicate the VM disks to the secondary cluster.
Setting up the two clusters involves the following steps:
- Configure a pair of arrays for replication.
- Configure the CSI driver on the primary cluster to communicate with the primary array, and the CSI driver on the secondary cluster to communicate with the secondary array.
- Install CSM Replication on both clusters.
- Configure replicated storage classes on both clusters (csi-replicated-sc on the primary cluster and on the secondary cluster).
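As a rough illustration of the last step, the sketch below builds a replication-enabled StorageClass manifest as a Python dictionary. It assumes the parameter keys use the `replication.storage.dell.com/` prefix described in the CSM Replication documentation. The provisioner name, the remote cluster ID, and the remote storage class name are placeholders, and the array-specific parameters (remote system, RPO, and so on) are omitted, so treat it as a starting point to check against your driver's documentation rather than a complete definition.

```python
import json

# Assumption: CSM Replication parameters share this prefix; verify the exact
# keys and values against the documentation for your driver version.
REPLICATION_PREFIX = "replication.storage.dell.com"


def replicated_storage_class(name, provisioner, remote_sc_name, remote_cluster_id):
    """Build a StorageClass manifest (as a dict) with replication enabled."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        "parameters": {
            f"{REPLICATION_PREFIX}/isReplicationEnabled": "true",
            f"{REPLICATION_PREFIX}/remoteStorageClassName": remote_sc_name,
            f"{REPLICATION_PREFIX}/remoteClusterID": remote_cluster_id,
            # Array-specific parameters are intentionally left out of this sketch.
        },
    }


primary_sc = replicated_storage_class(
    name="csi-replicated-sc",
    provisioner="csi-powerstore.dellemc.com",  # placeholder: depends on the array
    remote_sc_name="csi-replicated-sc",        # placeholder remote class name
    remote_cluster_id="secondary",             # placeholder cluster ID
)

# JSON is valid YAML, so the output can be piped to `kubectl apply -f -`.
print(json.dumps(primary_sc, indent=2))
```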
Configure Argo CD to deploy the VM to both clusters
The VM-based application should be deployable via Argo CD to either of the clusters. Kustomize or Helm values are typically used to override the application's default configurations when deploying to different environments. In this case, we will use Kustomize to specify environment-specific configurations.
Prepare the application
- In the Git repo, use Kustomize overlays with a separate kustomization.yaml file for the primary cluster and for the secondary cluster.
- In the Kustomize configuration for the primary cluster:
  - Use the replication source storage class (csi-replicated-sc) as the storage class of the Persistent Volume Claim (PVC) that backs the VM disk.
  - If the VM disk image needs to be pre-populated, add the CDI (Containerized Data Importer) annotations to the PVC to specify the source from which to fetch the image.
  - Patch the VM's running field to true.
- In the Kustomize configuration for the secondary cluster:
  - Patch the PVC to remove any CDI annotations.
  - Patch the VM's running field to false.
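The two overlays described above might look roughly like the following, expressed here as Python dictionaries that carry JSON 6902 patches. The VM name `demo-vm`, the PVC name `demo-vm-disk`, the `../../base` directory, and the image URL are all hypothetical. The sketch also assumes that the base PVC already declares the CDI import annotation, because a JSON patch `remove` operation fails when its target path does not exist.

```python
import json

# Hypothetical names for illustration: VM "demo-vm", PVC "demo-vm-disk",
# base directory "../../base", and the disk image URL below.
VM_TARGET = {"group": "kubevirt.io", "version": "v1",
             "kind": "VirtualMachine", "name": "demo-vm"}
PVC_TARGET = {"version": "v1", "kind": "PersistentVolumeClaim",
              "name": "demo-vm-disk"}

# CDI annotation naming the HTTP source of the disk image. Inside a JSON
# Pointer path (RFC 6901), "~" is escaped as "~0" and "/" as "~1".
CDI_ENDPOINT = "cdi.kubevirt.io/storage.import.endpoint"
CDI_PATH = "/metadata/annotations/" + (
    CDI_ENDPOINT.replace("~", "~0").replace("/", "~1")
)


def overlay(running, pvc_ops):
    """Build one cluster's kustomization as a dict of JSON 6902 patches."""
    vm_ops = [{"op": "add", "path": "/spec/running", "value": running}]
    return {
        "apiVersion": "kustomize.config.k8s.io/v1beta1",
        "kind": "Kustomization",
        "resources": ["../../base"],
        "patches": [
            # Each patch is stored as a JSON string, which is also valid YAML.
            {"target": VM_TARGET, "patch": json.dumps(vm_ops)},
            {"target": PVC_TARGET, "patch": json.dumps(pvc_ops)},
        ],
    }


# Primary: replicated storage class, image source set, VM running.
primary = overlay(True, [
    {"op": "add", "path": "/spec/storageClassName", "value": "csi-replicated-sc"},
    {"op": "add", "path": CDI_PATH, "value": "https://example.com/disk.img"},
])

# Secondary: no CDI import (the disk content arrives via replication), VM stopped.
secondary = overlay(False, [{"op": "remove", "path": CDI_PATH}])

print(json.dumps(primary, indent=2))
print(json.dumps(secondary, indent=2))
```

Using `add` instead of `replace` for `/spec/running` and `/spec/storageClassName` keeps the patch valid whether or not the base manifest already sets those fields.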
Configure Argo CD to deploy the VM
- Register both clusters with Argo CD.
- Create an ApplicationSet for the VM in Argo CD:
  - Use the list generator to specify the primary and secondary clusters as the destinations to deploy the application to. A complete example is available here: https://github.com/kumarp20/gitops-sample/blob/main/applicationsets/demo-vm/applicationset-demo-vm.yaml
  - Specify the Git repo and the Kustomize path to use for each cluster.
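To make the shape of such an ApplicationSet concrete, here is a minimal sketch that assembles one as a Python dictionary. It is not a copy of the linked example: the repository URL, cluster API URLs, overlay paths, namespaces, and the `demo-vm` name are all assumed for illustration. The list generator emits one set of parameters per element, and the template references them with `{{cluster}}`, `{{url}}`, and `{{path}}`.

```python
import json


def vm_application_set(repo_url, clusters):
    """Build an ApplicationSet manifest (as a dict) using the list generator.

    `clusters` is a list of dicts with the keys "cluster", "url", and "path";
    each element produces one Argo CD Application from the template.
    """
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "ApplicationSet",
        "metadata": {"name": "demo-vm", "namespace": "argocd"},
        "spec": {
            "generators": [{"list": {"elements": clusters}}],
            "template": {
                "metadata": {"name": "demo-vm-{{cluster}}"},
                "spec": {
                    "project": "default",
                    "source": {
                        "repoURL": repo_url,
                        "targetRevision": "HEAD",
                        "path": "{{path}}",  # per-cluster Kustomize overlay
                    },
                    "destination": {"server": "{{url}}", "namespace": "demo-vm"},
                },
            },
        },
    }


# All values below are placeholders, not taken from the linked repository.
app_set = vm_application_set(
    repo_url="https://example.com/org/gitops-repo.git",
    clusters=[
        {"cluster": "primary", "url": "https://primary.example.com:6443",
         "path": "apps/demo-vm/overlays/primary"},
        {"cluster": "secondary", "url": "https://secondary.example.com:6443",
         "path": "apps/demo-vm/overlays/secondary"},
    ],
)

print(json.dumps(app_set, indent=2))
```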
Argo CD generates two applications from the ApplicationSet, one for the primary cluster and one for the secondary cluster. Sync both applications to their clusters.
After the sync, the VM is in the Running state on the primary cluster. The Persistent Volume created on the primary array for the VM is replicated to a replica volume on the secondary array. On the secondary cluster, a Persistent Volume in read-only mode corresponds to the replica volume.
Disaster Recovery
If the primary cluster loses connectivity to the primary array, or if the primary cluster and array are both hit by a disaster, the application can be recovered at the secondary site, on the secondary cluster and array.
- Initiate failover of the data using CSM Replication.
  - This can be done with the repctl command-line tool or by editing the DellCSIReplicationGroup corresponding to the volume used by the VM. More details here: Disaster Recovery | Dell Technologies
  - The failover changes the Persistent Volume on the secondary cluster to read-write mode.
- Update the kustomizations for the two clusters in the Git repository through a pull request, so that the VM's running field is false on the primary cluster and true on the secondary cluster.
- Once the applications are synced, the VM runs on the secondary cluster using the replica volume. Any data stored on the VM while it was running on the primary cluster is retained after recovery on the secondary cluster.
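The option of editing the DellCSIReplicationGroup can also be scripted. The sketch below only assembles a `kubectl patch` command and does not run it. It assumes that the failover is requested by setting `spec.action` on the replication group, that the action names match those listed in the CSM Replication documentation, and that the resource is addressable as `dellcsireplicationgroups.replication.storage.dell.com`. The replication group name is hypothetical. Check all of these against the documentation for your CSM version before using anything like this.

```python
import json

# Assumption: action names as listed in the CSM Replication documentation.
KNOWN_ACTIONS = {"FAILOVER_REMOTE", "UNPLANNED_FAILOVER_LOCAL", "REPROTECT_LOCAL"}

# Assumption: plural resource name and API group of the replication group CRD.
RG_RESOURCE = "dellcsireplicationgroups.replication.storage.dell.com"


def failover_patch_command(rg_name, action):
    """Return the kubectl argument list that sets spec.action on a group."""
    if action not in KNOWN_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    patch = {"spec": {"action": action}}
    return [
        "kubectl", "patch", RG_RESOURCE, rg_name,
        "--type", "merge",  # custom resources need a merge (or JSON) patch
        "--patch", json.dumps(patch),
    ]


# Hypothetical group name. With the primary site down, the unplanned action
# would be issued against the secondary cluster, which is "local" to it.
cmd = failover_patch_command("rg-demo-vm", "UNPLANNED_FAILOVER_LOCAL")
print(" ".join(cmd))
```

Returning the argument list instead of executing it keeps the sketch free of side effects; the list could be handed to `subprocess.run` once the kubeconfig context points at the intended cluster.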
Conclusion
Using Kubernetes to manage virtual machines offers unique benefits, such as the ability to mix cloud-native and VM workloads, leverage GitOps processes, and enable multi-cloud architectures. However, with over 20 years of existence, virtualization infrastructures have established patterns that must be adapted to fit the Kubernetes world, and there's more to discuss about network management, data protection, and other topics.
Stay tuned for further updates on KubeVirt 🤖☁️💻.
Authors: Pooja Prasannakumar & Florian Coulombel