Testing and validation
The solution went through a testing and validation cycle to verify the functionality of the deployment. The test cases encompassed general network tests, equipment failure scenarios, and a simulated availability zone failure. The following table lists each test scenario and its expected behavior.
Number | Passed | Scenario | Expected Behavior
------ | ------ | -------- | ------------------
1 | Yes | Multiple ESXi host failure(s) – Power off | VMware HA restarts the virtual machines on any of the surviving ESXi hosts within the VMware HA cluster.
2 | Yes | Multiple ESXi host failure(s) – Network disconnect | HA continues to exchange cluster heartbeats through the shared datastore. No virtual machine failovers occur.
3 | Yes | ESXi host experiences APD (All Paths Down) – encountered when the ESXi host loses access to its storage volumes (in this case, VPLEX volumes) | In an APD scenario, the ESXi host must be rebooted to recover. When the host is restarted, VMware HA restarts the failed virtual machines on the surviving ESXi hosts within the VMware HA cluster.
4 | Yes | VPLEX cluster failure (The VPLEX at either site-A or site-B has failed, but ESXi and other LAN/WAN/SAN components are intact.) | The I/O continues to be served on all the volumes on the surviving site. |
5 | Yes | VPLEX storage volume is unavailable (for example, it is accidentally removed from the storage view, or the ESXi initiators are accidentally removed from the storage view) | VPLEX continues to serve I/O on the other site where the volume is available.
6 | Yes | VPLEX inter-site link failure; vSphere cluster management network intact | VPLEX transitions Distributed Virtual Volumes on the non-preferred site to the I/O failure state. On the preferred site, the Distributed Virtual Volumes continue to provide access. |
7 | Yes | VPLEX Cluster Witness failure | VPLEX continues to serve I/O at both sites. |
8 | Yes | VPLEX inter-site WAN link failure and simultaneous Cluster Witness to site-B link failure | The VPLEX fails I/O on the Distributed Virtual Volumes at site-B and continues to serve I/O on site-A. |
9 | Yes | VPLEX inter-site WAN link failure and simultaneous Cluster Witness to site-A link failure | The VPLEX fails I/O on the Distributed Virtual Volumes at site-A and continues to serve I/O on site-B. |
10 | Yes | Multi AZ – Power down Edge hosts in AZ1 | Edge VMs move to AZ2 hosts and are reachable after HA powers them up.
11 | Yes | Multi AZ – Bring down ToR 9k switches in AZ1 | Traffic should come up on the ToR switches in AZ2.
12 | Yes | Multi AZ – Verify BGP peering in AZ2 after failover (example verification commands follow this table) | BGP was successful.
13 | Yes | Multi AZ – Test ECMP functionality in AZ2 | ECMP is functioning correctly.
14 | Yes | Functional test: Multi AZ – Verify East/West network connectivity after failover | Ping and traceroute were successful.
15 | Yes | Functional test: Multi AZ – Verify North/South network connectivity after failover | Ping and traceroute were successful.
16 | Yes | Multi AZ – Power up Edge hosts in AZ1 | Edge VMs should be moved by vSphere HA and powered up on AZ1 hosts.
17 | Yes | Multi AZ – Verify BGP peering in AZ1 after the Edge VMs are back online in AZ1 | BGP was successful.
18 | Yes | Multi AZ – BGP connectivity tests to ToR switches | Routing and BGP connectivity should be up.
19 | Yes | Using the CIMC, simultaneously power down all hosts in AZ2 | VMs running on the failed site are powered off by vSAN. Non-Stretch VMs will not be restarted. Stretch VMs running in AZ2 will be restarted in AZ1. |
20 | Yes | Complete site-B failure (The failure includes all ESXi hosts and the VPLEX cluster at site-B.) | VPLEX continues to serve I/O on the surviving site (site-A). When the VPLEX at site-B is restored, the Distributed Virtual Volumes are synchronized automatically from the active site (site-A). |
21 | Yes | Using the CIMC, simultaneously power down all hosts in AZ1 | VMs running on the failed site are powered off by vSAN. Non-Stretch VMs will not be restarted. Stretch VMs running in AZ1 will be restarted in AZ2. |
22 | Yes | Complete site-A failure (The failure includes all ESXi hosts and the VPLEX cluster at site-A.) | VPLEX continues to serve I/O on the surviving site (site-B). When the VPLEX at the failed site (site-A) is restored, the Distributed Virtual Volumes are synchronized automatically from the active site (site-B). |
23 | Yes | Using the CIMC, power down all hosts in AZ2 and verify the must-run affinity rules for AZ2 | Verify that the dur-m01-cl01-secondary-az-nonstretch-vms-must-run rule (srvs 1-4) is enforced and that no applicable VMs restart in AZ1.
24 | Yes | Using the CIMC, power down all hosts in AZ1 and verify the must-run affinity rules for AZ1 | Verify that the dur-m01-cl01-primary-az-nonstretch-vms-must-run rule (srvs 1-4) is enforced and that no applicable VMs restart in AZ2.
25 | Yes | On the vSAN Witness Appliance ToR switches A & B, configure an access list on each switch to block all IP traffic between the vSAN Witness Appliance vmk0 address and the AZ1 management host vmk0 IP addresses (an example ACL sketch follows this table). Upon test completion, remove the ACL. | Management VMs should continue to run from their host location.
26 | Yes | On the vSAN Witness Appliance ToR switches A & B, configure an access list on each switch to block all IP traffic between the vSAN Witness Appliance vmk0 address and the AZ2 management host vmk0 IP addresses. Upon test completion, remove the ACL. | Management VMs should continue to run from their host location.
27 | Yes | On the Witness Appliance, verify that vmk0 is set to MTU 9000 and that a vmkping test with jumbo frames and the DF bit set passes to the AZ1 and AZ2 hosts (see the example commands after this table). | A jumbo frame ping test with the DF bit set should pass between the vSAN Witness vmk0 and the vSAN vmk IP addresses of the AZ1 and AZ2 hosts.
28 | Yes | On the Witness Appliance, verify that vmk0 is set to MTU 1500 and that a vmkping test with jumbo frames and the DF bit cleared passes to the AZ1 and AZ2 hosts. | A jumbo frame ping test with the DF bit cleared should pass between the vSAN Witness vmk0 and the vSAN vmk IP addresses of the AZ1 and AZ2 hosts.
29 | Yes | In AZ1, place each host sequentially into maintenance mode using the vSAN data migration option "Ensure accessibility" and verify HA failover and restart for the VMs. Note: Only two hosts can be placed into maintenance mode. | Components on that host will be marked as absent. HA will restart any VMs running on that host.
30 | Yes | In AZ2, place a host into maintenance mode using the vSAN data migration option "Ensure accessibility" and verify HA failover and restart for the VMs. Note: Only two hosts can be placed into maintenance mode. | Components on that host will be marked as absent. HA will restart any VMs running on that host.
31 | Yes | In AZ1, identify the hosts for NSX Manager A and NSX Manager B. Sequentially place each host into maintenance mode. | vSphere HA will restart one NSX Manager on AZ1 hosts and the second NSX Manager on AZ2 hosts. This occurs due to the VM separation affinity rules.
32 | Yes | In AZ1, using the CIMC, power down a host and verify that HA restarts the VMs on another host in AZ1 | Components on that host will be marked as absent. HA will restart any VMs running on that host.
33 | Yes | In AZ2, using the CIMC, power down a host and verify that HA restarts the VMs on another host in AZ2 | Components on that host will be marked as absent. HA will restart any VMs running on that host.
34 | Yes | Select an EM VM that can run in both AZ1 and AZ2. If necessary, vMotion the VM to ensure that it is not running on the two hosts to be powered down. Using the CIMC, power down both hosts. Verify that the VM is still functioning. | Components on those hosts will be marked as absent. The VM should continue to run on the same host without an HA restart. If necessary, the VM should be able to access object components in AZ2.
35 | Yes | Select an EM VM that can only run in AZ2. If necessary, vMotion the VM to ensure that it is not running on the two hosts to be powered down. Using the CIMC, power down both hosts. Verify that the VM is still functioning. | Components on those hosts will be marked as absent. The VM should continue to run on the same host without an HA restart.
36 | Yes | General communication from AZ1 to AZ2 | Hosts in AZ1 should be able to ping hosts in AZ2. |
37 | Yes | Datastore failover between AZ1 and AZ2 | The datastore from the VPLEX in AZ1 should fail over to AZ2, and vice versa.
38 | Yes | Shut down the SVI interfaces of the 4th highest priority ToR switch | No impact
39 | Yes | Shut down the SVI interfaces of the 3rd highest priority ToR switch | No impact
40 | Yes | Shut down the SVI interfaces of the 2nd highest priority ToR switch | No impact
41 | Yes | Shut down the SVI interfaces of the highest priority ToR switch | No impact
42 | Yes | Reboot the AZ1 ToR B switch to simulate a switch down scenario | No impact |
43 | Yes | Reboot the AZ1 ToR A switch to simulate a switch down scenario | No impact |
44 | Yes | Reboot both the AZ1 ToR A & B switches to simulate a switch down scenario | No impact
45 | Yes | Reboot the AZ2 ToR B switch to simulate a switch down scenario | No impact |
46 | Yes | Reboot the AZ2 ToR A switch to simulate a switch down scenario | No impact |
47 | Yes | Reboot both the AZ2 ToR A & B switches to simulate a switch down scenario | No impact
48 | Yes | Reboot the AZ1 Mgmt B switch to simulate a switch down scenario | No impact |
49 | Yes | Reboot the AZ1 Mgmt A switch to simulate a switch down scenario | No impact |
50 | Yes | Reboot both the AZ1 Mgmt A & B switches to simulate a switch down scenario | No impact
51 | Yes | Test Management cluster anti-affinity rules and host enters maintenance mode | All NSX Manager appliances should reside on unique hosts, that is, no two NSX Manager appliances should reside on the same AMP host.
52 | Yes | Test Management cluster anti-affinity rules and single host failure | HA powers on the VM on a different host so that no two NSX Manager appliances reside on the same AMP host.
53 | Yes | Test Management cluster anti-affinity rules and double host failure | HA powers on the VMs on different hosts so that no two NSX Manager appliances reside on the same AMP host.
54 | Yes | Test Management Edge VM anti-affinity rules and host enters maintenance mode | All management Edge VMs should reside on unique hosts, that is, no two NSX Edge VMs should reside on the same AMP host.
55 | Yes | Test Management Edge VM anti-affinity rules and single host failure | HA powers on the VMs. All management Edge VMs should reside on unique hosts, that is, no two NSX Edge VMs should reside on the same AMP host.
56 | Yes | Test Management Edge VM anti-affinity rules and double host failure | HA powers on the VMs. All management Edge VMs should reside on unique hosts, that is, no two NSX Edge VMs should reside on the same AMP host.
57 | Yes | Test Management domain ECMP functionality | ECMP was using different N/S traffic paths.
58 | Yes | Failure testing – Management Edge BFD failover | BFD failed over within the expected parameters.
59 | Yes | Test Management domain BGP functionality | BGP was successful. |
60 | Yes | General zone communication through the Edge | Ping continued with one packet lost.
61 | Yes | Verify the NSX Manager Appliance cluster is fully operational | Cluster was stable. |
62 | Yes | Bring down the host with the active Tier-0 Edge node VM and observe failover | Network connectivity stayed up with no packets lost.
63 | Yes | Test Workload ECMP functionality | ECMP is functioning. |
64 | Yes | Bring down an active transport node and observe failover | Ping continued with one packet lost.
65 | Yes | Test Workload NSX Manager cluster anti-affinity rules and host enters maintenance mode | NSX Manager node migrated to a different AMP host using DRS. |
66 | Yes | Test Workload NSX Manager cluster anti-affinity rules and single host failure | NSX Manager node migrated to a different AMP host using DRS. |
67 | Yes | Test Workload NSX Manager cluster anti-affinity rules and host enters maintenance mode | NSX Manager node migrated to a different AMP host using DRS. |
68 | Yes | Test Workload NSX Manager cluster anti-affinity rules and single host failure | NSX Manager node migrated to a different AMP host using DRS. |
69 | Yes | Test Workload NSX Manager cluster anti-affinity rules and double host failure | The NSX Manager node was disconnected until the host was back online.
70 | Yes | Failure testing – Workload Edge BFD failover | BFD failed over within the expected parameters.
71 | Yes | Verify that ECMP is using multiple paths for N/S traffic flows | ECMP was using different N/S traffic paths.
72 | Yes | Test Workload Edge VM anti-affinity rules and host enters maintenance mode | NSX Edge VM moved to a different host. |
73 | Yes | Test Workload Edge VM anti-affinity rules and single host failure | NSX Edge VM moved to a different host.
74 | Yes | Test Workload Edge VM anti-affinity rules and double host failure | HA moved one NSX Edge VM to an AZ2 host.
75 | Yes | Functional testing Workload VM East/West same segment different host | Ping and traceroute were successful. |
76 | Yes | Functional testing Workload VM East/West different segment different host | Ping and traceroute were successful. |
77 | Yes | Functional testing Workload VM East/West same segment same host | Ping and traceroute were successful. |
78 | Yes | Functional testing Workload VM East/West different segment same host | Ping and traceroute were successful. |
79 | Yes | Functional testing Workload VM North/South Segment 1 | Ping and traceroute were successful. |
80 | Yes | Functional testing Workload VM North/South Segment 2 | Ping and traceroute were successful. |
81 | Yes | Test Workload domain BGP functionality | BGP was successful. |
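
Several of the tests above (for example, tests 12, 13, 59, and 71) verify BGP peering and ECMP behavior after a failover. On the Cisco Nexus 9000 ToR switches, these checks can be performed with standard NX-OS show commands; the workload prefix below is a hypothetical placeholder, not a value from the validated environment.

```
! Verify that the BGP sessions to the NSX Edge nodes are in the Established state
show ip bgp summary

! Multiple equal-cost next hops for a workload prefix indicate that ECMP is in use
show ip route 172.16.10.0/24
```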
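
Tests 25 and 26 isolate the vSAN Witness Appliance from the management hosts in one AZ by applying a temporary ACL on the witness ToR switches. The following NX-OS sketch illustrates the AZ1 variant; the ACL name, VLAN, and IP addresses are hypothetical placeholders, and the interface and direction to which the ACL is applied depend on the actual deployment.

```
! Block traffic in both directions between the witness vmk0 (placeholder 192.168.30.10)
! and the AZ1 management host vmk0 subnet (placeholder 192.168.10.0/24)
ip access-list BLOCK-WITNESS-AZ1
  10 deny ip host 192.168.30.10 192.168.10.0/24
  20 deny ip 192.168.10.0/24 host 192.168.30.10
  30 permit ip any any

interface Vlan300
  ip access-group BLOCK-WITNESS-AZ1 in
```

After the test, removing the ACL from the interface and deleting the access list restores normal witness connectivity, as the test description requires.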
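
Tests 27 and 28 can be validated from the ESXi shell of the vSAN Witness Appliance with vmkping. The commands below are a minimal sketch: the destination addresses are hypothetical placeholders for the vSAN vmk IP addresses of the AZ1 and AZ2 hosts, and the 8972-byte payload allows for the 28 bytes of ICMP and IP headers within a 9000-byte frame.

```
# Test 27: vmk0 set to MTU 9000, DF bit set (-d), jumbo-sized payload (-s 8972)
vmkping -I vmk0 -d -s 8972 192.168.10.11
vmkping -I vmk0 -d -s 8972 192.168.20.11

# Test 28: vmk0 set to MTU 1500, DF bit not set, so the jumbo-sized payload is
# fragmented in transit and the ping still succeeds
vmkping -I vmk0 -s 8972 192.168.10.11
vmkping -I vmk0 -s 8972 192.168.20.11
```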