VDI Data Protection - Part 2 : A Deep Dive into VMware Horizon 7 Multi-Site Scenarios
Mon, 12 Dec 2022 21:26:47 -0000
|Read Time: 0 minutes
In the first part of this blog series – VDI Data Protection - Part 1: Protecting Your VDI Environment - What You Need to Consider, we discussed major components of virtual desktop infrastructure (VDI) data protection and the parameters involved in selecting a data protection plan that is right for your VDI environment. As discussed in that blog post, disaster recovery and operational backup are two significant aspects of data protection.
A VDI outage directly impacts user productivity in ways that are usually measurable by the organization. Even if your mission-critical application servers and databases are online, users still need access to their desktops to be productive. So, it is important to have an adequately formulated Business Continuity Plan or Disaster Recovery (DR) Plan for VDI that aligns with the organization’s business objectives.
A complete risk assessment of the likely disaster scenarios and business impact should be done to formulate an optimal DR plan for your VDI environment. Disaster recovery protection and execution comes with a cost, so you should strike the right balance between cost and the availability of VDI services required by the organization. You should:
- Divide the VDI user groups into segments based on the criticality of the applications they use.
- Establish an optimal Recovery Point Objective (RPO) and Recovery Time Objective (RTO) for each desktop group so that a disaster will have the least impact on your business while also conforming with the organization’s budget. (RTO is the elapsed time until virtual desktops are available or recovered after an incident. RPO is the acceptable time duration (in minutes or hours) of data loss from a VDI environment in the event of a disaster.)
In this blog, we will deep dive into the different multi-site approaches used in a VMware Horizon 7 VDI Disaster Recovery plan.
Multi-Site Horizon 7 VDI with Cloud Pod Architecture
Cloud Pod Architecture (CPA) is the foundation for VMware’s Horizon 7 VDI Disaster Recovery. A pod is made up of a group of interconnected Connection Servers that broker connections to desktops or published applications. A pod consists of multiple blocks, and a block is a collection of one or more vSphere clusters hosting pools of desktops or applications. Each block has a dedicated vCenter Server.
With CPA, you can have multiple pods connected in a federation to improve reliability. For example, you can have pods in each of your CPA sites, and all of them can be connected via the CPA to form a federation. In a CPA environment, a site is a collection of well-connected pods in the same physical location, typically in a single data center. Connection Server instances in a pod federation use a data-sharing approach known as the Global Data Layer to replicate information about pod federation topology, user/group assignments, policies, and other CPA configuration settings.
With a global entitlement, you can publish desktop icons from Horizon pools from any of the pods in the federation. You can create a global entitlement to publish the VDI desktop icon, then assign users or user groups to the global entitlement, which gives users access to desktops across multiple sites. Each site has a minimum of one pod.
Figure 1 shows a basic CPA architecture involving two Horizon 7 sites or data centers. For more information regarding the architecture and configuration of Horizon 7 CPA, see the VMware documentation.
Figure 1: VMware Horizon 7 Multi-Site Deployment with Cloud Pod Architecture
When a user launches a VDI desktop icon, the following occurs:
- The request goes to a Global Server Load Balancer (for example, VMware NSX Advanced Load Balancer).
- The Global Server Load Balancer redirects the request to a Horizon Connection Server in one of the pods.
- The Horizon Connection Server brokers the virtual desktops.
Home Site and Scope policies defined in a global entitlement determine the scope of the search to identify a desktop for the user. A Home Site is a relationship between a user or group and a CPA site. Typically, a user’s data and profile reside in the site that is configured as a Home Site. With the Home Site option enabled, the preference will be to launch the desktop from the Home Site, irrespective of the user’s location. If the Home Site is not configured, unavailable, or does not have resources to satisfy the user's request, Horizon continues searching other sites according to the Scope policy set for the global entitlement. The Scope policy options available in Horizon 7 global entitlement, are ‘All Sites,’ ‘Within Site,’ or ‘Within Pod.’
Horizon CPA architecture is typically used with non-persistent desktop models based on instant clones or linked clones, where we can de-couple the user data from the rest of the desktop. A similar desktop virtual machine (VM), without user data, is also provisioned in other sites. The user data can then be replicated to the other sites to provide a consistent experience while accessing these desktops.
You can have an active/active or active/passive approach for your VDI DR plan based on Horizon CPA. When providing DR for Horizon-based services, it is important to consider the active or passive approach from the perspective of the user:
- With an active/active approach, services delivered to a particular user will be available on all sites.
- With an active/passive approach, only the primary site will be active, and in the event of a disaster, services will be available to the user on the secondary site.
Home Site and Scope policies defined in the global entitlement should align with the DR approach devised for your VDI environment. For example, in a VDI environment based on an active/active approach, the global entitlement policies are configured to search desktops from all the sites in the pod federation. Even if one of the sites is not available, the desktop can be searched and launched from the pods in the other active sites. In a VDI environment based on an active/passive DR approach, global entitlement policies should be set to launch desktops only from the pods in the active site. If there is an outage in the primary active site, services are enabled in the secondary passive site, with manual intervention, and desktops are launched from the secondary site. For more information about configuring global entitlement, see the VMware documentation.
Even though Horizon 7 CPA supports both approaches, the decision concerning the DR approach will depend on a variety of factors, including the availability requirements of business-critical applications, the distance between sites, the cost of the DR infrastructure, replication of data, and network connectivity.
In an active/active multi-site scenario, user data should be replicated synchronously across all sites in real-time to maintain application responsiveness and the user experience. It could be a challenge to replicate large data files across a Wide Area Network (WAN) without impacting the user experience. For example, consider a user who works with an application that processes large files. If the user is redirected to different active sites each time he logs in, he may either experience slowness or the application may not work properly because the associated data is not available in real-time in all sites. The speed of replication plays a more significant role under this combination of requirements - it depends on the distance between sites, the type of WAN network, available bandwidth, and so on. Real-time replication traffic can also consume a large chunk of your network bandwidth and may affect other traffic, including production. So, you must consider these implications before deciding on an active/active approach.
Alternatively, you can have a partial active/active DR approach, with 50% of users homed to one of the active sites. The other site will still be active, serving the other 50% of users. However, both data centers should have the capacity to run at 100% capacity, in the event of a disaster in one of the sites. With this approach, you can avoid the replication challenges described above. You can schedule the replication out of business hours on a daily or weekly basis without impacting critical production traffic. The user also gets a better experience as their desktops launch at the Home Site, where their data and profile reside.
An active/passive DR solution is the simplest approach in Horizon 7 CPA multi-site deployments because only one site is active at any given point of time. Users are assigned to global entitlements and are homed to the primary active site, while the secondary site remains passive. In this approach, you do not have to worry about the replication challenges because the desktop always launches on the Home Site, which is the primary or active site for users. However, data is replicated to the passive site based on the RPO requirements. In the event of a disaster, services in the primary site will be failed-over to the secondary passive site – with manual intervention, while global entitlement launches desktops on the secondary site.
Table 1 shows the limits of various components in CPA architecture for Horizon version 7.8 or later. For more information regarding limits refer to VMware Horizon 7 sizing limits and recommendations.
Table 1: Cloud Pod Limits in Horizon 7 CPA
Multi-Site Horizon 7 VDI with vSAN stretched clusters
Architectures based on VMware’s vSAN technology support active/passive multi-site deployment for Horizon 7 VDI. This is a truly active/passive approach that leverages a stretched cluster: one that extends a vSAN cluster across two sites or data centers. The solution builds on the VMware vSAN replication technology that replicates data across the sites involved in a stretched cluster and the VMware vSphere HA feature that provides high-availability across the hosts in a cluster. Horizon 7 management servers and desktop VMs are pinned to the active site using VMware vSphere Storage DRS, VM DRS, host DRS groups, and VM-Host affinity rules. In the event of a disaster in the active site, the VMs are failed-over and restarted in the passive site using VMware HA. The pinning of Horizon management and desktop VMs to storage and compute resources on a single site mean that these VMs only reside on a single site at any given time – either the originally active site in normal operation or the originally passive site after a DR event has occurred.
A vSAN stretched architecture consists of three fault domains called preferred (active site), secondary (passive site), and witness (third site). The latency between data sites should not be more than 5 ms round-trip-time (RTT). The latency between the data sites and the witness site should not be more than 200 ms RTT. vSAN communication between the data sites can be overstretched L2 or L3 networks, and vSAN communication between data sites and the witness site can be routed over L3. For details on a vSAN stretched, cluster-based DR approach see the VMware documentation.
A typical use-case when leveraging a vSAN stretched cluster DR approach is full-clone persistent desktop pools, where user data is tightly integrated with the desktop. vSAN storage cluster technology replicates VDI VMs across the sites in the stretched cluster in real-time. The architecture only supports data centers which are near to each other and connected via good network bandwidth with low latencies. In this DR approach, RTO is higher compared to a CPA-based DR solution because the management and desktop VMs need to be restarted at the passive site in the event of a disaster.
Dell EMC VxRail hyper-converged appliances based on VMware vSAN technology can run a multi-site Horizon 7 VDI environment using the vSAN stretched cluster architecture discussed above. For more details on deploying a VMware Horizon 7 VDI on VxRail appliances, see the Design Guide under "Designs for VMware Horizon on VxRail and vSAN Ready Nodes" on our VDI Info Hub for Ready Solutions website.
Summary
Getting your users back to being productive with a minimum loss of time and data is of vital importance in the event of a DR. With technologies where we can de-couple user data from a virtual desktop, Horizon 7 VDI disaster recovery boils down to how you handle the replication of user data across your sites while providing an equivalent set of VMs in other sites. For non-persistent desktop use-cases where user data is usually decoupled from the rest of the desktop, a Horizon CPA-based DR approach is recommended. If you are planning for an active/active DR solution, you should consider the replication challenges discussed earlier in this blog. For use-cases where full-clone persistent desktops cannot be converted to a non-persistent model for various business reasons, an active/passive DR approach based on vSAN stretched cluster architecture is the best fit.
In the next blog in this series, we will discuss the operational backup aspects of VDI data protection based on testing done by the Dell EMC Ready Solutions for VDI Engineering team. So, stay tuned!
Published By
Anand Johnson
Principal Engineer at Dell EMC, Technical Marketing ,Ready Solutions for VDI
Check out this article to learn about VDI - VMware Horizon 7 multi-site approaches. #iwork4dell #dellemc #vmware #vxrail