PowerOne and SRDF (Part 2)
Thu, 25 Jun 2020 17:55:46 -0000
PowerOne with SRDF use cases and associated topologies
In my previous blog post (PowerOne and SRDF Part I), I introduced the business context and technologies involved in a PowerOne with SRDF scenario. This second post describes two data center use cases and their associated topologies:
- Two sites with SRDF Synchronous (SRDF/S) or Asynchronous (SRDF/A) data replication with Remote Restart (disaster restart protection for virtual infrastructure)
- Two sites in a stretched cluster configuration with synchronous data mirroring (SRDF/Metro). This use case relies on a third site to perform the witness role.
Protected clusters with SRDF/S and SRDF/A
In this topology, vSphere clusters built using traditional configuration techniques on PowerOne systems are recovered at a defined secondary site on a per-cluster basis. The recovery process can take up to several minutes, depending on the number of servers involved.
The functional requirements for this approach are:
- Deploy two PowerOne Systems with PowerMax, one at the primary site and one at the recovery site. These systems must be licensed for SRDF and VMware SRM.
- Create, modify, or delete a cluster at the primary site. Add volumes to the replication set. Select the protection mode based on the RPO: synchronous for zero data loss, or asynchronous for a bounded amount of data loss (from a few seconds to minutes, depending on the RPO).
- Add stretched application VLAN(s), typically called overlay networks, such as VMware VXLAN using NSX, so that IP addressing works at either the primary or the recovery site. Alternatively, any mechanism that re-IPs the VMs and updates DNS entries as needed at the secondary site is also valid.
- Create, modify or delete remote array connections, called SRDF Groups. Links must be scalable so you can add links to increase throughput.
- Invoke failover/failback to prove all technologies and operational processes are functioning as expected.
- To attach the replicated storage volumes during failover and failback, create storage-less clusters and add them to the vCenter instance at the secondary site.
- For non-disruptive failover testing, SRM provides an orchestrated recovery validation mechanism called Bubble. Bubble provides a recovery area using SnapVX clones of R2 volumes without impacting the replication process. Isolate the Bubble recovery to specific VLANs or subnets (so that it does not overlap with production or recovery networks).
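The isolation requirement for the Bubble network can be checked before a recovery test. The following is a minimal sketch that uses only the Python standard library; the function name `find_overlaps` and the address plan are hypothetical and are not part of SRM or PowerOne.

```python
from ipaddress import ip_network


def find_overlaps(bubble_subnets, protected_subnets):
    """Return (bubble, other) pairs of CIDR strings whose address ranges overlap."""
    overlaps = []
    for bubble in bubble_subnets:
        for other in protected_subnets:
            if ip_network(bubble).overlaps(ip_network(other)):
                overlaps.append((bubble, other))
    return overlaps


# Hypothetical address plan, for illustration only.
production = ["10.10.0.0/16"]
recovery = ["10.20.0.0/16"]
bubble = ["192.168.50.0/24", "10.20.5.0/24"]

conflicts = find_overlaps(bubble, production + recovery)
print(conflicts)  # the second Bubble subnet collides with the recovery network
```

An empty result means that the planned Bubble subnets are disjoint from the production and recovery networks in this address plan.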
This approach describes a typical disaster recovery scenario. Through the integration of PowerOne, SRDF/S or SRDF/A, and VMware SRM, we can create an automated DR architecture that, in case of a site failure, fails over the production CRGs to the secondary site. VMware SRM automates this multi-step recovery of virtual machines. For more details about prerequisites, supported devices, and configurations, see Implementing Dell EMC SRDF SRA with VMware SRM.
Stretched clusters with SRDF/Metro
In this topology, PowerOne clusters are always on. If one site goes down, VMs are restarted on surviving servers. Application architecture determines recovery time. For example, monolithic applications that require all VMs to be restarted incur a wait time, whereas distributed or cloud-native applications continue to run without interruption at lower capacity levels until the failed VMs restart.
Figure 1: PowerOne with SRDF Metro architecture
The functional requirements for this approach are:
- Deploy two PowerOne Systems, with PowerMax licensed for SRDF/Metro, within metro distance. Create a low-latency communication channel between them to avoid disk write delays.
- Create, modify, or delete remote array connections, called SRDF Groups, which require redundant ports and replication adapters using Ethernet or Fibre Channel protocols. Links must be scalable so you can add links to increase throughput.
- Create, modify, or delete VMware Metro Storage Cluster(s) using SRDF device pairing. Split servers 50/50 across both systems so that storage is bidirectionally mirrored across both sites.
- Add stretched application VLAN(s), such as VMware VXLAN using NSX, so that IP addressing works regardless of which half of the cluster runs the application VM.
- (Optional) Implement any fine-grained workload migration controls to address restarting the workloads if application-specific needs or dependencies arise.
- Create, delete, or modify SRDF pairs. Suspend and deactivate SRDF/Metro failure recovery controls.
- Implement either a bias or a witness mechanism to prevent data inconsistencies when both sites have write access to the same devices. The witness is an external arbitrator reachable by both sites. The witness can be a Virtual Witness (vWitness) or a physical array acting as a witness.
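To make the bias and witness requirement more concrete, here is a toy model of the arbitration decision after the replication link between the two arrays is lost. It is a sketch under stated assumptions, not the algorithm that the arrays implement: the names `Site` and `surviving_side` are hypothetical, and the tie-break rule (fall back to the bias side when both sites or neither site can reach the witness) is an assumption made for illustration.

```python
from enum import Enum


class Site(Enum):
    A = "site-a"
    B = "site-b"


def surviving_side(bias_side, witness_reachable_from=None):
    """Pick the side that stays write-enabled after the replication link is lost.

    bias_side: the Site configured as the bias side.
    witness_reachable_from: set of Sites that can still reach the witness,
        or None to model a bias-only configuration (no witness).
    """
    if witness_reachable_from is None:
        # Bias-only: the winner is predetermined.
        return bias_side
    if len(witness_reachable_from) == 1:
        # Only one site can still talk to the witness, so that site wins.
        return next(iter(witness_reachable_from))
    # Assumption of this sketch: with both or neither site reaching the
    # witness, fall back to the configured bias side.
    return bias_side


print(surviving_side(Site.A))            # bias-only configuration
print(surviving_side(Site.A, {Site.B}))  # only site B reaches the witness
```

The point of the model is that exactly one side is chosen in every case, which is what prevents both halves of the stretched cluster from accepting conflicting writes.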
In the third and final part of this blog series, we will explore the network architecture, best practices, and recommendations for the different PowerOne with SRDF scenarios we have presented.
Related Blog Posts
PowerOne and SRDF (Part I)
Tue, 23 Jun 2020 18:03:50 -0000
Planning for disaster recovery is essential for IT organizations that are designing their environments to support business-critical applications. Each new application or instance must be deployed with enough resiliency to overcome common hazards such as floods, fire, power failures, and human error.
To design a resilient IT infrastructure, we must first consider the datacenter site, or sites. If there is a single datacenter site, simply duplicating the IT infrastructure will not provide the desired resiliency if the failure event impacts the entire site. If we deploy our business applications across more than one site, we would need mechanisms to replicate the information across sites. In the event of a site loss, even with information replicated, we would still need to introduce processes or tools that would help during the subsequent failover and failback operations.
All industries and geographies share a need for a resilient IT architecture. Proposals for such an architecture consider factors such as:
- Site Distance – Depending on the distance between sites, the required technologies will vary, and the recovery scenarios will be different. Relatively short distances (under 100 km with Round Trip Time under 10 ms) allow the use of more powerful tools in order to minimize the following two factors (RTO and RPO).
- Recovery Time Objective (RTO) – Every business or application may allow for a different length of time during which to recover when a failure occurs. Some will only support a few seconds of application downtime or no downtime at all, while others may be able to withstand minutes or even hours. This factor greatly influences the architectural requirements.
- Recovery Point Objective (RPO) – Another key factor when defining a solution is the amount of data a business can afford to lose in the event of an application outage or site loss. In some cases, a business might be able to withstand recovery of its applications to a data state that existed minutes or even hours before the failure; in other cases, the business could not withstand the loss of a single transaction.
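The three factors above can be combined into a rough first-pass selection of a replication mode. The sketch below is a simplification for illustration only. The 100 km and 10 ms limits come from the site distance factor above; `RESTART_WINDOW_SECONDS`, the `Requirements` class, and the mapping of requirements to modes are hypothetical assumptions, not sizing guidance.

```python
from dataclasses import dataclass

# Hypothetical threshold: an RTO shorter than this is assumed to rule out
# a restart-based recovery at the secondary site.
RESTART_WINDOW_SECONDS = 300


@dataclass
class Requirements:
    distance_km: float   # distance between the two sites
    rtt_ms: float        # network round-trip time between the sites
    rpo_seconds: float   # tolerated data loss
    rto_seconds: float   # tolerated downtime


def suggest_mode(req):
    """Return a candidate SRDF mode, or None when no mode fits this model."""
    short_distance = req.distance_km < 100 and req.rtt_ms < 10
    if short_distance and req.rto_seconds < RESTART_WINDOW_SECONDS:
        return "SRDF/Metro"  # stretched cluster, always on
    if short_distance and req.rpo_seconds == 0:
        return "SRDF/S"      # synchronous, zero data loss
    if req.rpo_seconds > 0:
        return "SRDF/A"      # asynchronous, some data loss tolerated
    return None              # zero RPO over a long distance: no fit here


print(suggest_mode(Requirements(distance_km=40, rtt_ms=2, rpo_seconds=0, rto_seconds=30)))
print(suggest_mode(Requirements(distance_km=800, rtt_ms=25, rpo_seconds=60, rto_seconds=3600)))
```

A real design exercise weighs many more inputs, such as link bandwidth, application consistency, and licensing, but the ordering of the checks mirrors how distance constrains which RTO and RPO targets are reachable.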
In this context, we propose a solution to address this business need with a highly effective and function-rich architecture, featuring Dell EMC PowerOne with Dell EMC Symmetrix Remote Data Facility (SRDF), and VMware Site Recovery Manager.
This blog post is part one of a three-part series. In this first installment, we present the business context and the technologies involved. In part two, we cover the main use cases that this series addresses. In the third and last blog post, we will share some technology recommendations and best practices.
PowerOne and SRDF technology overview
Dell EMC PowerOne combines compute, storage, and networking in a fully engineered and highly automated converged infrastructure that provides autonomous operations, all-in-one simplicity, and flexible consumption options. With PowerOne, IT organizations can start moving from traditional operations to modern cloud outcomes.
Based on vSphere clusters, PowerOne delivers business outcomes. During daily tasks, such as provisioning workloads, the customer is never required to specify low-level details about IP stack configuration parameters, storage array configuration object names, and so on. Instead, the customer is asked only to identify the capacity required to support the target workload. All other information required to deliver the desired outcome is derived from system standards and best practices.
Dell EMC Symmetrix Remote Data Facility (SRDF) solutions provide near real-time copies of application data from a production storage array to one or more remote storage arrays. The main use cases are:
- Disaster recovery
- High availability
- Data migration
In a traditional SRDF device pair relationship, the secondary device (“R2”) is read-only, with writes disabled. Only the primary device (“R1”) is enabled for read and write activity. With SRDF/Metro, the R2 is also write-enabled and accessible by the host or application. The R2 takes on the personality of the R1, including the World Wide Name (WWN). A host therefore sees both the R1 and the R2 as the same device.
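The difference between the two device pair relationships can be summarized in a small model. This is an illustrative sketch only: the `Device` class, the `host_view` function, and the WWN strings are hypothetical, and the model ignores every device state other than the two described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    role: str  # "R1" or "R2"
    wwn: str   # the device's native World Wide Name


def host_view(r1, r2, metro):
    """Return {role: (presented WWN, access)} as a host would see the pair."""
    if metro:
        # SRDF/Metro: the R2 is write-enabled and presents the R1's WWN,
        # so the host sees one device reachable through both arrays.
        return {"R1": (r1.wwn, "read-write"), "R2": (r1.wwn, "read-write")}
    # Traditional pair: only the R1 accepts writes.
    return {"R1": (r1.wwn, "read-write"), "R2": (r2.wwn, "read-only")}


# Placeholder WWN values, for illustration only.
r1 = Device("R1", "wwn-0001")
r2 = Device("R2", "wwn-0002")

print(host_view(r1, r2, metro=False))
print(host_view(r1, r2, metro=True))
```

In the Metro case both entries carry the same WWN, which is why multipathing software on the host can treat the two arrays as paths to a single device.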
When SRDF/Metro is used in conjunction with VMware vSphere across various hosts in two sites, a VMware vSphere Metro Storage Cluster (vMSC) is formed. A VMware vMSC infrastructure is a stretched cluster -- an architecture that extends local network and storage configuration across remote sites, enabling on-demand and nonintrusive workload mobility.
VMware Site Recovery Manager (SRM) is another technology that can play a key role in simplifying operations in multi-site architectures. VMware SRM provides workflow and process management for business continuity and disaster restart of VMware vSphere workloads. For the SRDF/Metro use case, SRM is not required: because we can build a vMSC, vSphere workloads perceive the multi-site deployment as a single stretched site. However, SRM is a mainstream technology for handling failover and failback operations with SRDF/S and SRDF/A. In a second use case documented in the white paper Protecting Business-Critical Workloads with Dell EMC SRDF and PowerOne, VMware SRM leverages SRDF replication to protect PowerOne Cluster Resource Groups (CRGs).
The integration of VMware SRM with SRDF automates storage-based disaster restart operations on PowerOne systems. In the white paper, we focus on the availability and disaster recovery scenarios made possible by PowerOne.
Figure 1: PowerOne with SRDF basic architecture
In my next blog post, we will explore two use cases and their associated technologies:
- Two data center sites with SRDF Synchronous or Asynchronous (SRDF/S or SRDF/A)
- Two sites in a stretched cluster configuration with synchronous data mirroring (SRDF/Metro)
Extending Dell Technologies Cloud Platform Availability for Mission Critical Applications
Mon, 29 Jun 2020 14:48:57 -0000
Reference Architecture Validation Whitepaper Now Available!
Many of us here at Dell Technologies regularly have conversations with customers and talk about what we refer to as the “Power of the Portfolio.” What does this mean exactly? It is essentially a reference to the fact that, as Dell Technologies, we have a robust and broad portfolio of modern IT infrastructure products and solutions across storage, networking, compute, virtualization, data protection, security, and more! At first glance, it can seem overwhelming to many. Some even say it could be considered complex to sort through. But we, as Dell Technologies, on the other hand, see it as an advantage. It allows us to solve a vast majority of our customers’ technical needs and support them as a strategic technology partner.
It is one thing to have the quality and quantity of products and tools to get the job done -- it’s another to leverage this portfolio of products to deliver on what customers want most: business outcomes.
As Dell Technologies continues to innovate, we are making the best use of the technologies we have and are developing ways to use them together seamlessly in order to deliver better business outcomes for our customers. The conversations we have are not about this product OR that product but instead they are about bringing together this set of products AND that set of products to deliver a SOLUTION giving our customers the best of everything Dell Technologies has to offer without compromise and with reduced risk.
Figure 1: Cloud Foundation on VxRail Platform Components
The Dell Technologies Cloud Platform is an example of one of these solutions. A newly published reference architecture white paper, Extending Dell Technologies Cloud Platform Availability for Mission Critical Applications, illustrates how to take advantage of the “Power of the Portfolio.” It validates the use of the Dell EMC PowerMax system with SRDF/Metro in a Dell Technologies Cloud Platform (VMware Cloud Foundation on Dell EMC VxRail) multi-site stretched-cluster deployment. This configuration provides the highest levels of application availability for customers who are running mission-critical workloads in their Cloud Foundation on VxRail private cloud, levels that would otherwise not be possible with the core Dell Technologies Cloud Platform (DTCP) alone.
Let’s briefly review some of the components used in the reference architecture and how they were configured and tested.
Using external storage with VCF on VxRail
Customers commonly ask whether they can use external storage in Cloud Foundation on VxRail deployments. The answer is yes! This helps customers ease into the transition to a software-defined architecture from an operational perspective. It also helps customers leverage the investments in their existing infrastructure for the many different workloads that might still require external storage services.
External storage and Cloud Foundation have two important use cases: principal storage and supplemental storage.
- Principal storage - SDDC Manager provisions a workload domain that uses vSAN, NFS, or Fibre Channel (FC) storage for a workload domain cluster’s principal storage (the initial shared storage that is used to create a cluster). By default, VCF uses vSAN storage as the principal storage for a cluster. The option to use NFS and FC-connected external storage is also available. This option enables administrators to create a workload domain cluster whose principal storage can be a previously provisioned NFS datastore or an FC-based VMFS datastore instead of vSAN. External storage as principal storage is supported only on VI Workload Domains, because vSAN is the required principal storage for the management domain in VCF.
- Supplemental storage - This involves mounting previously provisioned external NFS, iSCSI, vVols, or FC storage to a Cloud Foundation workload domain cluster that is using vSAN as the principal storage. Supporting external storage for these workload domain clusters is comparable to the experience of administrators using standard vSphere clusters who want to attach secondary datastores to those clusters.
At the time of writing, Cloud Foundation on VxRail supports supplemental storage use cases only. This is how external storage was used in the reference architecture solution configuration.
The Dell EMC PowerMax is the first Dell EMC hardware platform that uses an end-to-end Non-Volatile Memory Express (NVMe) architecture for customer data. NVMe is a set of standards that define a PCI Express (PCIe) interface used to efficiently access data storage volumes based on Non-Volatile Memory (NVM) media, which includes modern NAND-based flash along with higher-performing Storage Class Memory (SCM) media technologies. The NVMe-based PowerMax array fully unlocks the bandwidth, IOPS, and latency performance benefits that NVM media and multi-core CPUs offer to host-based applications—benefits that are unattainable using the previous generation of all-flash storage arrays. For a more detailed technical overview of the PowerMax Family, please check out the whitepaper Dell EMC PowerMax: Family Overview.
The following figure shows the PowerMax 2000 and PowerMax 8000 models.
Figure 2: PowerMax product family
The Symmetrix Remote Data Facility (SRDF) maintains real-time (or near real-time) copies of data on a PowerMax production storage array at one or more remote PowerMax storage arrays. SRDF has three primary applications:
- Disaster recovery
- High availability
- Data migration
In the case of this reference architecture, SRDF/Metro was used to provide enhanced levels of high availability across two availability zone sites. For a complete technical overview of SRDF, please check out this great SRDF whitepaper: Dell EMC SRDF.
Now that we are familiar with the components used in the solution, let’s discuss the details of the solution architecture that was used.
This overall solution design provides enhanced levels of flexibility and availability that extend the core capabilities of the VCF on VxRail cloud platform. The VCF on VxRail solution natively supports a stretched-cluster configuration for the management domain and a VI workload domain between two availability zones by using vSAN stretched clusters. A PowerMax SRDF/Metro with vSphere Metro Storage Cluster (vMSC) configuration is added to protect VI workload domain workloads by providing supplemental storage for the workloads that are running on them.
Two types of vMSC configurations are verified with stretched Cloud Foundation on VxRail: uniform and non-uniform.
- Uniform host access configuration - vSphere hosts at both sites are connected to the storage nodes of the storage cluster at all sites. Paths presented to vSphere hosts are stretched across a distance.
- Non-uniform host access configuration - vSphere hosts at each site are connected only to storage nodes at the same site. Paths presented to vSphere hosts from storage nodes are limited to the local site.
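The two host access configurations differ only in which storage arrays each host has paths to. The following sketch illustrates that difference. The function `storage_paths` and the host and array names are hypothetical, and real path layouts also depend on zoning, masking, and multipathing policy.

```python
def storage_paths(hosts, arrays, uniform):
    """Map each host to the arrays it has paths to.

    hosts and arrays are dicts that map a name to the site where it lives.
    """
    paths = {}
    for host, host_site in hosts.items():
        paths[host] = sorted(
            array
            for array, array_site in arrays.items()
            # Uniform: every array at every site. Non-uniform: local site only.
            if uniform or array_site == host_site
        )
    return paths


# Hypothetical inventory, for illustration only.
hosts = {"esx-a1": "site-a", "esx-b1": "site-b"}
arrays = {"powermax-a": "site-a", "powermax-b": "site-b"}

print(storage_paths(hosts, arrays, uniform=True))
print(storage_paths(hosts, arrays, uniform=False))
```

In the uniform case every host lists both arrays, so some of its paths cross the inter-site link. In the non-uniform case each host lists only the array at its own site.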
The following figure shows the topology used in the reference architecture of the Cloud Foundation uniform stretched-cluster configuration with PowerMax SRDF/Metro.
Figure 3: Cloud Foundation on VxRail uniform stretched-cluster config with PowerMax SRDF/Metro
The following figure shows the topology used in the reference architecture of the Cloud Foundation on VxRail non-uniform stretched cluster configuration with PowerMax SRDF/Metro.
Figure 4: Cloud Foundation on VxRail non-uniform stretched-cluster config with PowerMax SRDF/Metro
Solution Validation Testing Methodology
We completed solution validation testing across the following major categories for both iSCSI-connected and FC-connected devices:
- Functional Verification Tests - This testing addresses the basic operations that are performed when PowerMax is used as supplemental storage with VCF on VxRail.
- High Availability Tests - HA testing helps validate the capability of the solution to avoid a single point of failure, from the hardware component port level up to the IDC site level.
- Reliability Tests - Reliability testing validates whether the components and the whole system remain reliable while running under a certain level of stress.
For complete details on all of the individual validation test scenarios that were performed, and the pass/fail results, check out the whitepaper.
To summarize, this white paper describes how Dell EMC engineers integrated VMware Cloud Foundation on VxRail with PowerMax SRDF/Metro and provides the design configuration steps that they took to automatically provision PowerMax storage by using the PowerMax vRO plug-in. The paper validates that the Cloud Foundation on VxRail solution functions as expected in both a PowerMax uniform vMSC configuration and a non-uniform vMSC configuration by passing all the designed test cases. This reference architecture validation demonstrates the power of the Dell Technologies portfolio to provide customers with modern cloud infrastructure technologies that deliver the highest levels of application availability for business-critical and mission-critical applications running in their private clouds.
Find the link to the white paper below along with other VCF on VxRail resources and see how you can leverage the “Power of the Portfolio” to support your business!
Twitter - @vwhippersnapper