Short articles related to Microsoft HCI Solutions from Dell Technologies
Microsoft HCI Solutions from Dell Technologies: Designed for extreme resilient performance
Wed, 02 Jun 2021 02:31:13 -0000
Dell EMC Integrated System for Microsoft Azure Stack HCI (Azure Stack HCI) is a fully productized HCI solution built on our flexible AX node family.
Before I get into some exciting performance test results, let me set the stage. Azure Stack HCI combines the software-defined compute, storage, and networking features of the Microsoft Azure Stack HCI OS with AX nodes from Dell Technologies to deliver the perfect balance of performant, resilient, and cost-effective software-defined infrastructure.
Figure 1 illustrates our broad portfolio of AX node configurations with a wide range of component options to meet the requirements of nearly any use case – from the smallest remote or branch office to the most demanding database workloads.
Figure 1: current platforms supporting our Microsoft HCI Solutions from Dell Technologies
Every chassis, drive, processor, DIMM, and network adapter, along with the associated BIOS, firmware, and driver versions, has been carefully selected and tested by the Dell Technologies Engineering team to optimize the performance and resiliency of Microsoft HCI Solutions from Dell Technologies. Our Integrated Systems are designed for 99.9999% hardware availability*.
* Based on Bellcore component reliability modeling for AX-740xd nodes and S5248S-ON switches a) in 2- to 4-node clusters configured with N + 1 redundancy, and b) in 4- to 16-node clusters configured with N + 2 redundancy, March 2021.
Comprehensive management with Dell EMC OpenManage Integration with Windows Admin Center, rapid time to value with Dell EMC ProDeploy options, and solution-level Dell EMC ProSupport complete this modern portfolio.
You'll notice in Figure 1 that we have a new addition: the AX-7525, a dual-socket, AMD-based platform designed for extreme performance and high scalability.
The AX-7525 features direct-attach NVMe drives with no PCIe switch, which provides full Gen4 PCIe potential to each storage device, resulting in massive IOPS and throughput at minimal latency.
To get an idea of how performant and resilient this platform is, our Dell Technologies experts put a 4-node AX-7525 cluster to the test. Each node had the following configuration:
The easy headline would be that this setup consistently delivered nearly 6M IOPS at sub-1ms latency. One might suspect that we doctored these performance tests to achieve such impressive figures with just a 4-node cluster!
The reality is that we sought to establish the ‘hero numbers’ as a baseline – ensuring that our cluster was configured optimally. However, we didn’t stop there. We wanted to find out how this configuration would perform with real-world IO patterns. This blog won’t get into the fine-grained details of the white paper, but we’ll review the test methodology for those different scenarios and explain the performance results.
Figure 2 shows the 4-node cluster and fully converged network topology that we built for the lab:
Figure 2: Lab setup
We performed two distinct sets of tests in this environment:
We chose the three-way mirror resiliency type for the volumes we created with VMFleet because of its superior performance versus erasure coding options in Storage Spaces Direct.
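For reference, creating such a volume is a one-liner in PowerShell. Here is a minimal sketch, assuming an enabled Storage Spaces Direct pool; the volume name and size are placeholders rather than our lab values:

```powershell
# Create a three-way mirror Cluster Shared Volume on an S2D pool.
# PhysicalDiskRedundancy = 2 means the data survives two simultaneous failures.
New-Volume -StoragePoolFriendlyName "S2D*" `
           -FriendlyName "VMFleet-CSV01" `
           -FileSystem CSVFS_ReFS `
           -ResiliencySettingName Mirror `
           -PhysicalDiskRedundancy 2 `
           -Size 1TB
```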
Now that we have a clearer idea of the lab setup and the testing methodology, let’s move on to the results for the four tests.
Here are the details of the workload profile and the performance we obtained:
100% random read
100% random write
100% sequential read
100% sequential write
* The slightly higher latency occurs because we are pushing too many outstanding IOs after performance has already plateaued. We noticed that even with 32 VMs we hit the same IOPS; from that point on, adding more load a) drives no additional IOs and b) only adds latency.
This test sets the bar for the limits and maximum performance we can obtain from this 4-node cluster: almost 6 million read IOPS, 700K write IOPS, and bandwidth of 105 GB/s for reads and 8 GB/s for writes.
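For readers who want to approximate these profiles themselves, a DiskSpd invocation of roughly this shape exercises the same 4K 100% random read pattern. The file path, file size, duration, and warm-up are illustrative assumptions, not our exact lab parameters:

```powershell
# 4K block (-b4K), 2 threads (-t2), 32 outstanding IOs (-o32),
# random (-r), 0% writes (-w0), caches disabled (-Sh), latency captured (-L).
.\diskspd.exe -b4K -t2 -o32 -r -w0 -d120 -W30 -Sh -L `
    -c64G C:\ClusterStorage\Volume1\testfile.dat
```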
The IO profiles for this test encompass a broad range of real-life scenarios:
The following figure shows the details and results we obtained for all the different tested IO patterns:
Figure 3: Test 2 results
These results are super impressive. Note (on the left) the 1.6 million IOPS at 1.2 ms average latency for the typical OLTP IO profile of 8 KB block size and 30% random write. Even at a 32 KB block size and 50% write ratio, we measured 400,000 IOPS at under 7 ms latency.
Also remarkable is the extreme throughput we witnessed across all the tests, especially the incredible 29.65 GB/s with an IO profile of 512 KB block size and 20% write ratio.
To simulate a one-node failure (Test 3), we shut down node 4, which caused node 2 to take additional ownership of the 32 restarted VMs from node 4, for a total of 64 VMs on node 2.
Similarly, to simulate a two-node failure (Test 4), we shut down nodes 3 and 4, leading to a VM reallocation process from node 3 to node 1, and from node 4 to node 2. Nodes 1 and 2 ended up with 64 VMs each.
The cluster environment continued to produce impressive results even in this degraded state. The table below compares the testing scenarios that used IO profiles aimed at identifying the maximum thresholds.
One node failure
Two node failure
Figure 4 illustrates the test results for real-life workload scenarios for the healthy cluster and for the one-node and two-node degraded states.
Figure 4: Test 3 and 4 results
Once more, we continued to see outstanding performance results from an IO, latency, and throughput perspective, even with one or two nodes failing.
One important observation: for the 4 KB and 8 KB block sizes, IOPS decrease and latency increases as one would expect, whereas for the 32 KB and larger block sizes we realized that:
There are two reasons for this:
We are happy to share these figures demonstrating the extremely resilient performance our integrated systems deliver, both during normal operations and in the event of failures.
Dell EMC Integrated System for Microsoft Azure Stack HCI, especially with the AX-7525 platform, is an outstanding solution for customers struggling to support their organization’s increasingly heavy demand for resource intensive workloads and to maintain or improve their corresponding service level agreements (SLAs).
Azure Stack HCI Stretch Clustering: because automatic disaster recovery matters
Mon, 29 Mar 2021 18:19:31 -0000
If history has taught us anything, it’s that disasters are always around the corner and tend to appear in any shape or form when they’re least expected.
To overcome these circumstances, we need the appropriate tools and technologies that can guarantee resuming operations back to normal in a secure, automatic, and timely manner.
Traditional disaster recovery (DR) processes are often complex and require a significant infrastructure investment. They are also labor intensive and prone to human error.
Since December 2020, the situation has changed. Thanks to the new release of Microsoft Azure Stack HCI, version 20H2, we can leverage the new Azure Stack HCI stretched cluster feature on Dell EMC Integrated System for Microsoft Azure Stack HCI (Azure Stack HCI).
The integrated system is built on our flexible AX node family and combines Dell Technologies full stack life cycle management with the Microsoft Azure Stack HCI operating system.
It is important to note that this technology is only available for the integrated system offering under the certified Azure Stack HCI catalog.
Azure Stack HCI stretch clustering provides an easy and automatic solution (with no human interaction, if desired) that assures transparent failover of disaster-impacted production workloads to a safe secondary site.
It can also be leveraged for planned operations (such as an entire site migration or disaster avoidance) that, until now, required labor-intensive and error-prone human effort to execute.
Stretch clustering is one type of Storage Replica configuration. It allows customers to split a single cluster between two locations—rooms, buildings, cities, or regions. It provides synchronous or asynchronous replication of Storage Spaces Direct volumes to provide automatic VM failover if a site disaster occurs.
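Under the hood, the two locations are described to the cluster as site fault domains so that Storage Spaces Direct and Storage Replica become site-aware. A minimal PowerShell sketch, with hypothetical node and site names:

```powershell
# Define the two sites and assign the cluster nodes to them.
New-ClusterFaultDomain -Name "Site1" -Type Site -Description "Primary"
New-ClusterFaultDomain -Name "Site2" -Type Site -Description "Secondary"
Set-ClusterFaultDomain -Name "Node1" -Parent "Site1"
Set-ClusterFaultDomain -Name "Node2" -Parent "Site1"
Set-ClusterFaultDomain -Name "Node3" -Parent "Site2"
Set-ClusterFaultDomain -Name "Node4" -Parent "Site2"
```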
There are two different topologies:
Azure Stack HCI stretch clustering topologies: Active-Passive and Active-Active
To be truly cost-effective, the best data protection strategies incorporate a combination of different technologies (deduplicated backup, archive, data replication, business continuity, and workload mobility) to deliver the right level of data protection for each business application.
The following diagram highlights that only a reduced data set holds the most valuable information. This is the sweet spot for stretch clustering.
For a real-life experience, our Dell Technologies experts put Azure Stack HCI stretched clustering to the test in the following lab setup:
Test lab cluster network topology
Note these key considerations regarding the lab network architecture:
For all the details, see this white paper: Adding Flexibility to DR Plans with Stretch Clustering for Azure Stack HCI.
In this blog, though, I want to focus only on summarizing the results we obtained in our labs for the following four scenarios:
Scenario 1: Unplanned node failure
Simulated failure or maintenance event: Power-down of Node 1 in Site 1
Expected result: Impacted VMs should fail over to another local node
In around 5 minutes, all 10 VMs from Node 1 in Site 1 had fully restarted on Node 2 in Site 1.
This is expected behavior, since Site 1 has been configured as the preferred site; otherwise, the active volume could have been moved to Site 2 and the VMs restarted on a cluster node in Site 2.
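The preferred site itself is simply a cluster-wide property. A one-line sketch, with an assumed site name:

```powershell
# Make Site1 the preferred site so VMs fail over locally first and Site1
# wins quorum arbitration if the link between sites is severed.
(Get-Cluster).PreferredSite = "Site1"
```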
Scenario 2: Outage in Site 1
Simulated failure or maintenance event: Simultaneous power-down of Nodes 1 and 2 in Site 1
Expected result: Impacted VMs should fail over to nodes on the secondary site
In 25 minutes, all VMs were restarted, and the included web application was fully responsive.
The volumes owned by the nodes in Site 2 remained online throughout this failure scenario.
The replica volumes remained offline until Site 1 was restored to full health.
Once Site 1 was back online, synchronous replication began again from the source volumes in Site 2 to their destination replica partners in Site 1.
Scenario 3: Planned volume move between sites
Simulated failure or maintenance event: Switch Direction operation on a volume from Windows Admin Center
Expected result: Selected VMs and workloads should transparently move to the secondary site
Within 0 to 3 minutes, the application hosted by the affected VMs was reachable without service interruption (the time depends on whether IP reassignment is required).
First, the owner node for the volumes changed to Node 2 in Site 2, and owner node for the replica volumes changed to Node 2 in Site 1. No service interruption.
At this time, the test VM was running in Site 1, but its virtual disk that resided on the volume was running in Site 2. Performance problems can result because I/O is traversing the replication links across sites. After approximately 10 minutes, a Live Migration of the test VM would occur automatically (if not manually initiated earlier) so that the VM would be on the same node as its virtual disk.
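The same operation can be driven from PowerShell. In this sketch, the node, replication group, and VM names are hypothetical; Set-SRPartnership reverses the replication direction, and Move-ClusterVirtualMachineRole performs the live migration without waiting for the automatic one:

```powershell
# Reverse replication so the Site 2 copy of the volume becomes the source.
Set-SRPartnership -NewSourceComputerName "Node2-S2" -SourceRGName "RG02" `
                  -DestinationComputerName "Node2-S1" -DestinationRGName "RG01"

# Live migrate the VM to the node that now owns its virtual disk.
Move-ClusterVirtualMachineRole -Name "TestVM" -Node "Node2-S2" -MigrationType Live
```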
Scenario 4: Full stack Cluster Aware Updating
Simulated failure or maintenance event: Update all nodes in the cluster by using single-click full stack Cluster Aware Updating (CAU) in Windows Admin Center
Expected result: The stretched cluster and CAU should work seamlessly together, providing a full stack cluster update without service interruption and with local-only workload mobility for the live-migrated VMs
The total process of applying the operating system and firmware updates to the stretched cluster took approximately 3 hours, and the process had no application impact.
Each node was drained, and its VMs were live migrated to the other node in the same site.
The intersite links between Site 1 and Site 2 were never used during update operations. In addition, the process required only a single reboot per node.
This behavior was consistent throughout the update of all the nodes in the stretched cluster.
To sum up, Azure Stack HCI Stretch Clustering has been shown to work as expected under difficult circumstances. It can easily be leveraged to cover a wide range of data protection scenarios, such as:
This technology can make the difference in enabling businesses to automatically get back on their feet after disaster strikes, a total game changer in the automatic disaster recovery landscape.
Thank you for taking the time to read this blog, and don't forget to check out the full white paper!
Dell EMC OpenManage Integration with Microsoft Windows Admin Center v2.0 Technical Walkthrough
Thu, 18 Mar 2021 19:29:13 -0000
Dell EMC Integrated System for Microsoft Azure Stack HCI is a fully integrated HCI system for hybrid cloud environments that delivers a modern, cloud-like operational experience on-premises from a mature market leader.
The integrated system is based on our flexible AX node family as the foundation, and combines Dell Technologies full stack life cycle management with the Microsoft Azure Stack HCI operating system.
This blog focuses on one of the most important and critical parts of Azure Stack HCI: the management layer. Check this blog for additional background.
We will show how, at Dell Technologies, we make the good, Microsoft Windows Admin Center (WAC), even better through our OpenManage Integration with Microsoft Windows Admin Center v2.0 (OMIMSWAC).
The following diagram illustrates a typical Dell Technologies Azure Stack HCI setup:
To learn more about Microsoft HCI Solutions from Dell Technologies and get details on each of the different components, check out this video where our Dell Technologies experts examine the solution thoroughly from the bottom up.
WAC provides the option to leverage easy-to-use workflows to perform many tasks, including automatic deployments (coming soon) and updates.
Dell Technologies has developed specialized snap-ins that integrate OpenManage with WAC to further extend the capabilities of Microsoft’s WAC extensions.
The following table describes the three key elements highlighted in the previous diagram as (1), (2), and (3). We examine each in detail in the next three sections.
|Item|Type|Integrates with|Developed by|Description|
|---|---|---|---|---|
|Cluster Aware Updating extension|WAC extension|Microsoft Failover Cluster Tool Extension, 1.250.0.nupkg release (minimum version validated)|Microsoft|WAC workflow to apply cluster-aware OS updates|
|Dell EMC Integrated Full Stack Cluster Aware Updating|Snap-in|Microsoft CAU extension|Dell Technologies|Integration snap-in to the main CAU workflow that provides BIOS, firmware, and driver updates while performing OS updates|
|OMIMSWAC v2.0|Standalone extension|WAC|Dell Technologies|OpenManage WAC extension for infrastructure life cycle management, plus cluster monitoring, inventory, and troubleshooting|
|Cluster Creation extension|WAC extension|Microsoft Cluster Creation Extension (minimum version validated)|Microsoft|WAC workflow to create Azure Stack HCI clusters|
|Integrated Deployment and Update (coming soon)|Snap-in|Microsoft IDU extension|Dell Technologies|Integration snap-in to the main Cluster Creation workflow that provides BIOS, firmware, and driver updates during the cluster creation process|
Windows Admin Center extensions and integrations
You can install the Microsoft Cluster Aware Updating extension within WAC by selecting the gear icon in the top right corner and then, under "Gateway", navigating to "Extensions". Under "Available extensions", find the desired extension and select "Install". For details, see the install guide, and refer to the extension's product documentation for the latest updates.
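Extensions can also be managed from PowerShell through the ExtensionTools module that ships with WAC. A sketch, in which the gateway URL and the extension ID are assumptions you would adapt to your environment:

```powershell
# Load the extension management module installed with Windows Admin Center.
Import-Module "$env:ProgramFiles\Windows Admin Center\PowerShell\Modules\ExtensionTools"

# List the extensions the gateway can see, then install one by its ID
# (the ID below is a placeholder, not a verified identifier).
Get-Extension -GatewayEndpoint "https://wac.contoso.com"
Install-Extension -GatewayEndpoint "https://wac.contoso.com" -ExtensionId "<extension id>"
```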
To get to the Microsoft WAC Azure Stack HCI Cluster Aware Updating extension, log in to WAC and follow these steps:
It is important to note that you can either run one operation at a time (skipping the other) or run both in a single process with one reboot.
If available, select any operating system update and click "Next: Hardware updates".
This takes us to the second step of the sequence, Hardware updates, a key phase for the automated end-to-end cluster aware update process.
This is where the Dell Technologies snap-in integrates with Microsoft’s original workflow, allowing us to seamlessly provide automated BIOS, firmware, and driver updates (and OS updates if also selected) to all the nodes in the cluster with a single reboot. Let’s look at this process in detail in the next section.
Once you click "Next: Hardware updates" in Microsoft's original Azure Stack HCI Cluster Aware Updating workflow, you are taken to the Dell EMC Cluster Aware Updating integration.
If the integration is not installed, there is an option to install it from inside the workflow.
Click “Get updates”.
Our snap-in for Cluster Aware Updating (CAU) takes us through the following sequence of five steps.
1. Prerequisites
A validation process occurs, checking that all AX nodes are:
Click “Next: Update source”.
2. Update source
Here we can select the source for our BIOS, firmware, and driver repository: either online [Update Catalog for Microsoft HCI Solutions] or offline, for edge or disconnected sites [Dell EMC Repository Manager Catalog]. Dell Technologies created these solution catalogs and keeps them updated.
Click “Next: Compliance report”.
3. Compliance report
Now we can check how compliant our nodes are and select for BIOS, firmware, and/or driver remediation. All the recommended components are selected by default.
The compliance operation runs in parallel for all nodes, and the report is shown consolidated across nodes.
Click “Next: Summary”.
4. Summary
All selections from all nodes are shown in the summary for review before we click "Next: Download updates".
5. Download updates
This window provides the statistics regarding the download process (start time, download status).
When all downloads are completed, we can click “Next: Install”, which takes us back again to Step 3 of the main workflow (“Install”), to begin the installation process of OS and hardware updates (if both were selected) on the target nodes.
If any of the updates requires a restart, servers will be rebooted one at a time, moving cluster roles such as VMs between servers to prevent downtime and guaranteeing business continuity.
Once the process is finished for all the nodes, we can go back to “Updates” to check for the latest update status and/or Update history for previous updates.
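The same status information is available outside of WAC through the built-in CAU cmdlets; a quick sketch, with an assumed cluster name:

```powershell
# Show the state of an in-progress Cluster-Aware Updating run.
Get-CauRun -ClusterName "HCICluster01"

# After completion, pull the most recent report with per-node detail.
Get-CauReport -ClusterName "HCICluster01" -Last -Detailed
```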
It is important to note that the Cluster Aware Updating extension is supported only for Dell EMC Integrated System for Microsoft Azure Stack HCI.
The standalone extension applies to Windows Server HCI and Azure Stack HCI, and continues to provide monitoring, inventory, troubleshooting, and hardware updates with CAU.
New to OMIMSWAC 2.0 is the option to schedule updates during a programmed maintenance window for greater flexibility and control during the update process.
It is important to note that the OMIMSWAC standalone version provides the Cluster Aware Updating feature for the hardware (BIOS, firmware, drivers) in a single reboot, although this process is not integrated with operating system updates; it provides full lifecycle management for the hardware only, not the OS layer.
Another key takeaway is that the OMIMSWAC standalone version fully supports Dell EMC HCI Solutions for Microsoft Windows Server and even certain qualified previous solutions (Dell EMC Storage Spaces Direct Ready Nodes).
Dell Technologies has developed OMIMSWAC to make integrated systems’ lifecycle management a seamless and easy process. It can fully guarantee controlled end-to-end cluster hardware and software update processes during the lifespan of the service.
The Dell EMC OMIMSWAC automated and programmatic approach provides obvious benefits: mitigating the risk of human error, significantly fewer steps to update clusters, and significantly less focused attention time for IT administrators. In small 4-node cluster deployments, this can mean up to 80% fewer steps and up to 90% less focused attention from an IT operator.
Full details on the benefits of performing these operations automatically through OMIMSWAC versus doing it manually are explained in this white paper.
Thank you for reading this far and stay tuned for more blog updates in this space!
Boost Performance on Dell EMC HCI Solutions for Microsoft Server using Intel Optane Persistent Memory
Tue, 14 Jul 2020 13:09:24 -0000
Modern IT applications have a broad range of performance requirements. Some of the most demanding applications use Online Transactional Processing (OLTP) database technology. Typical organizations have many mission-critical business services reliant on workloads powered by these databases. Examples of such services include online banking in the financial sector and online shopping in the retail sector. If the response time of these systems is slow, customers will likely suffer a poor user experience and may take their business to competitors. Dissatisfied customers may also express their frustration through social media outlets, resulting in incalculable damage to a company's reputation.
The challenge in maintaining an exceptional consumer experience is providing databases with performant infrastructure while also balancing capacity and cost. Traditionally, there have been few cost-effective options that cache database workloads, which would greatly improve end-user response times. Intel Optane persistent memory (Intel Optane PM) offers an innovative path to accelerating database workloads. Intel Optane PM performs almost as well as DRAM, and the data is preserved after a power cycle. We were interested in quantifying these claims in our labs with Dell EMC HCI Solutions for Microsoft Windows Server.
Windows Server HCI running Microsoft Windows Server 2019 provides industry-leading virtual machine performance with Microsoft Hyper-V and Microsoft Storage Spaces Direct technology. The platform supports Non-Volatile Memory Express (NVMe), Intel Optane PM, and Remote Direct Memory Access (RDMA) networking. Windows Server HCI is a fully productized, validated, and supported HCI solution that enables enterprises to modernize their infrastructure for improved application uptime and performance, simplified management and operations, and lower total cost of ownership. AX nodes from Dell EMC, powered by industry-leading PowerEdge server platforms, offer a high-performance, scalable, and secure foundation on which to build a software-defined infrastructure.
In our lab testing, we wanted to observe the impact on performance when Intel Optane PM was added as a caching tier to a Windows Server HCI cluster. We set up two clusters to compare. One cluster was configured as a two-tier storage subsystem with Intel Optane PM in the caching tier and SATA Read-Intensive SSDs in the capacity tier. We inserted 12 x 128 GB Intel Optane PM modules into this cluster for a total of 1.5 TB per node. The other cluster’s storage subsystem was configured as a single-tier of SATA Read-Intensive SSDs. With respect to CPU selection, memory, and Ethernet adapters, the two clusters were configured identically.
Only the Dell EMC AX-640 nodes currently accommodate Intel Optane PM. The clusters were configured as follows:
|Cluster configuration|Without Intel Optane PM|With Intel Optane PM|
|---|---|---|
|Number of nodes|4|4|
|CPUs per node|2 x Intel 6248 CPU @ 2.50 GHz (3.90 GHz with Turbo Boost)|2 x Intel 6248 CPU @ 2.50 GHz (3.90 GHz with Turbo Boost)|
|Memory per node|384 GB RAM|384 GB RAM|
|Capacity drives per node|10 x 2.5 in. 1.92 TB Intel S4510 RI SATA SSD|10 x 2.5 in. 1.92 TB Intel S4510 RI SATA SSD|
|Network adapter|Mellanox ConnectX-5 EX Dual Port 100 GbE|Mellanox ConnectX-5 EX Dual Port 100 GbE|
|Caching tier|None (single tier)|12 x 128 GB Intel Optane PM per node|
Volumes were created using three-way mirroring for the best balance between performance and resiliency. Three-way mirroring protects data by enabling the cluster to safely tolerate two hardware failures. For example, data on a volume would be successfully preserved even after the simultaneous loss of an entire node and a drive in another node.
Intel Optane PM has two operating modes: Memory Mode and App Direct Mode. Our tests used App Direct Mode, in which the operating system uses Intel Optane PM as persistent memory distinct from DRAM. This mode enables extremely high-performing storage that is byte-addressable, memory coherent, and cache coherent. Cache coherence is important because it ensures that data is a uniformly shared resource across all nodes. In the four-node Windows Server HCI cluster, cache coherence ensured that when data was read or written on one node, the same data was available across all nodes.
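Windows Server 2019 exposes App Direct Mode devices through built-in persistent memory cmdlets, which is a handy way to verify the configuration on each node. A short sketch; the commented cache-pinning line is an assumption shown only for illustration, with the model string left as a placeholder:

```powershell
# List the physical Intel Optane PM modules and any namespaces
# ("PMem disks") created on top of them.
Get-PmemPhysicalDevice
Get-PmemDisk

# When enabling S2D, cache devices can be pinned by model string, e.g.:
# Enable-ClusterStorageSpacesDirect -CacheDeviceModel "<model string>"
```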
VMFleet is a storage load generation tool designed to perform I/O and capture performance metrics for Microsoft failover clusters. For the small block test, we used VMFleet to generate 100 percent reads at a 4K block size. The baseline configuration without Intel Optane PM sustained 2,103,412 IOPS at 1.5-millisecond (ms) average read latency. These baseline performance metrics demonstrated outstanding performance. However, OLTP databases target 1 ms or less latency for reads.
Comparatively, the Intel Optane PM cluster delivered 43 percent more IOPS and 53 percent lower latency. Overall, this cluster sustained slightly over 3 million IOPS at 0.7 ms average latency. Benefits include:
When exploring storage responsiveness, testing large block read and write requests is also important. Data warehouses and decision-support systems are examples of workloads that read larger blocks of data. For this testing, we used 512 KB block sizes and sequential reads as part of the VMFleet testing. This test provided insight into the ability of Intel Optane PM cache to improve storage system throughput.
The cluster populated with Intel Optane PM was 109% faster than the baseline system. Our comparisons of 512 KB sequential reads found total throughput of 11 GB/s for the system without Intel Optane PM and 23 GB/s for the system with Intel Optane PM caching. Benefits include:
Overall, the VMFleet tests were impressive. Both Windows Server HCI configurations had 40 SSDs across the four nodes for approximately 76 TB of performant storage. To accelerate the entire cluster required 12 Intel Optane PM 128 GB modules per server for a total of 48 modules across the four nodes. Test results show that both OLTP and data-warehouse type workloads would exhibit significant performance improvements.
Testing 100 percent reads of 4K blocks showed:
Testing 512 KB sequential reads showed:
The configuration presented in this lab testing scenario will not be appropriate for every application. Any Windows Server HCI solution must be properly scoped and sized to meet or exceed the performance and capacity requirements of its intended workloads. Work with your Dell Technologies account team to ensure that your system is correctly configured for today’s business challenges and ready for expansion in the future. To learn more about Microsoft HCI Solutions from Dell Technologies, visit our Info Hub page.
Value Optimized AX-6515 for ROBO Use Cases
Tue, 14 Jul 2020 13:09:24 -0000
Small offices and remote branch office (ROBO) use cases present special challenges for IT organizations. The issues tend to revolve around how to implement a scalable, resilient, secure, and highly performant platform at an affordable TCO. The infrastructure must be capable enough to efficiently run a highly diversified portfolio of applications and services and yet be simple to deploy, update, and support by a local IT generalist. Dell Technologies and Microsoft help you accelerate business outcomes in these unique ROBO environments with our Dell EMC Solutions for Microsoft Azure Stack HCI.
In this blog post, we share VMFleet results observed in the Dell Technologies labs for our newest AX-6515 two-node configuration, ideal for ROBO environments. Optimized for value, the small but powerful AX-6515 node packs a dense, single-socket 2nd Gen AMD EPYC processor into a 1U chassis, delivering peak performance and excellent TCO. We also included the Dell EMC PowerSwitch S5212F-ON in our testing to provide 25GbE network connectivity for storage, management, and VM traffic in a small form factor. We followed the Dell EMC Solutions for Azure Stack HCI Deployment Guide to construct the test lab; the guide applies only to infrastructure built with validated and certified AX nodes from Dell Technologies running Microsoft Windows Server 2019 Datacenter.
We were quite impressed with the VMFleet results. First, we stressed the cluster’s storage subsystem to its limits using scenarios aimed at identifying maximum IOPS, latency, and throughput. Then, we adjusted the test parameters to be more representative of real-world workloads. The following summary of findings indicated to us that this two-node, AMD-based, all-flash cluster could meet or exceed the performance requirements of workload profiles often found in ROBO environments:
The following diagram illustrates the environment created in the Dell Technologies labs for the VMFleet testing. Ancillary services required for cluster operations such as DNS, Active Directory, and a file server for cluster quorum are not depicted.
Figure 1 Network topology
Table 1 Cluster configuration
|Cluster Design Elements|Details|
|---|---|
|Number of cluster nodes|2|
|Cluster node model|AX-6515|
|Network switches for RDMA and TCP/IP traffic|Dell EMC PowerSwitch S5212F-ON, in a fully converged network configuration; RDMA and TCP/IP traffic traverses 2 x 25GbE network connections from each host|
|Network switch for OOB management|Dell EMC PowerSwitch S3048-ON|
|Usable storage capacity|Approximately 12 TB|
Table 2 Cluster node resources
|Resources per Cluster Node|Details|
|---|---|
|CPU|Single-socket AMD EPYC 7702P 64-core processor|
|Memory|256 GB DDR4 RAM|
|Storage controller for OS|BOSS-S1 adapter card|
|Physical drives for OS|2 x Intel 240 GB M.2 SATA drives configured as RAID 1|
|Drives for Storage Spaces Direct (S2D)|8 x 1.92 TB Mixed Use KIOXIA SAS SSDs|
|Network adapter|Mellanox ConnectX-5 Dual Port 10/25GbE SFP28 Adapter|
|Operating system|Windows Server 2019 Datacenter|
The architectures of Azure Stack HCI solutions are highly opinionated and prescriptive. Each design is extensively tested and validated by Dell Technologies Engineering. Here is a summary of the key quality attributes that define these architectures followed by a section devoted to our performance findings.
We leveraged VMFleet to benchmark the storage subsystem of our 2-node cluster. Many Microsoft customers and partners rely on this tool to help them stress test their Azure Stack HCI clusters. VMFleet consists of a set of PowerShell scripts that deploy virtual machines to a Hyper-V cluster and execute Microsoft’s DiskSpd within those VMs to generate IO. The following table presents the range of VMFleet and DiskSpd parameters used during our testing in the Dell Technologies labs.
Table 3 Test parameters
|VMFleet and DiskSpd Parameters|Values|
|---|---|
|Number of VMs running per node| |
|vCPUs per VM| |
|Memory per VM| |
|VHDX size per VM| |
|VM operating system|Windows Server 2019|
|Block sizes (B)|4k – 512k|
|Thread count (T)|2|
|Outstanding IOs (O)|8 – 32|
|Write percentages (W)|0, 20, 50, 100|
|IO patterns (P)|Random, sequential|
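Mechanically, VMFleet drives these parameters through its PowerShell scripts. The following sketch shows the shape of such a run using the parameter names from the original VMFleet scripts; the VM count, credentials, and sweep values are illustrative rather than our lab settings:

```powershell
# Deploy the fleet of load-generating VMs onto the cluster's CSVs.
.\create-vmfleet.ps1 -BaseVHD "C:\ClusterStorage\Collect\base.vhdx" -VMs 16 `
                     -AdminPass "<password>" -ConnectUser "<domain\user>" -ConnectPass "<password>"

# Start the VMs, then sweep DiskSpd: 4 KB blocks, 2 threads, 32 outstanding
# IOs, 0% writes, random pattern, 60-second duration.
.\start-vmfleet.ps1
.\start-sweep.ps1 -b 4 -t 2 -o 32 -w 0 -p r -d 60

# Watch live IOPS, latency, and bandwidth across the cluster.
.\watch-cluster.ps1
```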
We first selected DiskSpd scenarios aimed at identifying the maximum IOPS, latency, and throughput thresholds of the cluster. By pushing the limits of the storage subsystem, we confirmed that the networking, compute, operating systems, and virtualization layer were configured correctly according to our Deployment Guide and Network Integration and Host Network Configuration Options guide. This also ensured that no misconfiguration occurred during initial deployment that could skew the real-world storage performance results. Our results are depicted in Table 4.
Table 4 Maximums test results
|Test|Parameter Values|Results|
|---|---|---|
|Maximum random read IOPS|Block size: 4k; Thread count: 2; Outstanding IO: 32; IO pattern: 100% random read|Read latency: 245 microseconds; CPU utilization: 48%|
|Maximum random write IOPS|Block size: 4k; Thread count: 2; Outstanding IO: 32; IO pattern: 100% random write|Write latency: 4 milliseconds; CPU utilization: 25%|
|Maximum sequential read throughput|Block size: 512k; Thread count: 2; Outstanding IO: 8; IO pattern: 100% sequential read|Throughput: 12 GB/s|
|Maximum sequential write throughput|Block size: 512k; Thread count: 2; Outstanding IO: 8; IO pattern: 100% sequential write|Throughput: 6 GB/s|
We then stressed the storage subsystem using IO patterns more reflective of the types of workloads found in a ROBO use case. These applications are typically characterized by smaller block sizes, random I/O patterns, and a variety of read/write ratios. Examples include general enterprise and small office LOB applications and OLTP workloads. The testing results in Figure 2 below indicate that the cluster has the potential to accelerate OLTP workloads and make enterprise applications highly responsive to end users.
Figure 2 Performance results with smaller block sizes
Other services like backups, streaming video, and large dataset scans have larger block sizes and sequential IO patterns. With these workloads, throughput becomes the key performance indicator to analyze. The results shown in the following graph indicate an impressive sustained throughput that can greatly benefit this category of IT services and applications.
Figure 3 Performance results with larger block sizes
Customers could make modifications to the lab configuration to accommodate different requirements in the ROBO use case. For example, Dell Technologies fully supports a dual-link full mesh topology for Azure Stack HCI. This non-converged, storage-switchless topology eliminates the need for network switches for storage communications and enables you to use existing infrastructure for management and VM traffic. This approach can result in similar or improved performance metrics versus those mentioned in this blog due to the 2 x 25 GbE direct connections between the nodes and the isolation of storage traffic on these dedicated connections.
Figure 4 Two-node back-to-back architecture option
There may be situations in ROBO scenarios where there are no IT generalists near the site to address hardware failures. When a drive or entire node fails, it may take days or weeks before someone can service the nodes and return the cluster to full functionality. Consider nested resiliency instead of two-way mirroring to handle multiple failures on a two-node cluster. Inspired by RAID 5 + 1 technology, workloads remain online and accessible even in the following circumstances:
Figure 5 Nested resiliency option
Be aware that there is a capacity efficiency cost to nested resiliency. Two-way mirroring is 50% efficient, meaning 1 TB of data takes up 2 TB of physical storage capacity. Depending on the type of nested resiliency you configure, capacity efficiency ranges between 25% and 40%, so ensure you have an adequate amount of raw storage capacity if you intend to use this technology. Performance is also affected, especially for workloads with a higher percentage of write IO, because more copies of the data must be maintained on the cluster.
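For reference, nested resiliency volumes are created by defining the two nested tiers and then carving a volume from them, following Microsoft's documented approach. In this sketch, the pool name, tier sizes, and volume name are assumptions:

```powershell
# Nested two-way mirror: four copies of the data (two per node).
New-StorageTier -StoragePoolFriendlyName "S2D*" -FriendlyName "NestedMirror" `
    -ResiliencySettingName Mirror -MediaType SSD -NumberOfDataCopies 4

# Nested mirror-accelerated parity: parity within each node, mirrored across nodes.
New-StorageTier -StoragePoolFriendlyName "S2D*" -FriendlyName "NestedParity" `
    -ResiliencySettingName Parity -MediaType SSD -NumberOfDataCopies 2 `
    -PhysicalDiskRedundancy 1 -NumberOfGroups 2 `
    -FaultDomainAwareness StorageScaleUnit -ColumnIsolation PhysicalDisk

# A volume that keeps hot writes in the nested mirror tier and colder data in parity.
New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "Volume01" `
    -StorageTierFriendlyNames "NestedMirror", "NestedParity" `
    -StorageTierSizes 200GB, 800GB
```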
If you need greater flexibility in cluster resources, Dell Technologies offers Azure Stack HCI configurations to meet any workload profile and business requirement. The table below shows the different resource options available for each AX node. To find more detailed specifications about these configurations, please review the detailed product specifications on our product page.
Table 5 Azure Stack HCI configuration options
Visit our website for more details on Dell EMC Solutions for Azure Stack HCI.
Dell EMC Solutions for Azure Stack HCI Furthers Customer Value
Mon, 23 Mar 2020 22:39:10 -0000
As customers retire Microsoft Windows Server 2008 and move to software-defined infrastructure using Windows Server 2019, the core tenets of hyperconverged infrastructure (HCI) and hybrid cloud enablement remain desired goals. Many customers, however, are unsure how to best leverage their investments in Windows Server to modernize their datacenters and take advantage of software-defined infrastructure.
At Dell Technologies, we have leadership positions in converged, hyperconverged, and cloud infrastructures covering several platforms, including being a founding launch partner for Microsoft's Azure Stack HCI solution. Drawing on more than three decades of partnership with Microsoft, we bring the insights and expertise to help our customers with their IT transformation utilizing the software-defined features of Windows Server 2019, the foundational platform for Azure Stack HCI.
Built on globally available and supported Storage Spaces Direct (S2D) Ready Nodes, Dell EMC offers a wide range of Azure Stack HCI solutions that provide an excellent value proposition for customers who have standardized on Microsoft Hyper-V and are looking to modernize IT infrastructure while utilizing their existing investments and expertise in Windows Server.
As we head to Microsoft’s largest customer event – Microsoft Ignite 2019 – we are delighted to share some new enhancements and offerings to our Azure Stack HCI solution portfolio.
Simplifying Managing Azure Stack HCI via Windows Admin Center (WAC)
With the goal of simplifying Azure Stack HCI management, we have integrated monitoring of S2D Ready Nodes into the Windows Admin Center (WAC) console. The Dell EMC OpenManage Extension for WAC allows our customers to manage Azure Stack HCI clusters from a single pane of glass. The current integration provides health monitoring, hardware inventory, and firmware compliance reporting of S2D Ready Nodes, the core building block of our Azure Stack HCI solution. By using this extension, infrastructure administrators can monitor all their clusters in real time and check whether the nodes are compliant with Dell EMC recommended firmware and driver versions. Customers wanting to leverage the Azure public cloud to extend or protect their on-prem applications can do so within the WAC console, utilizing services such as Azure Backup, Azure Site Recovery, and Azure Monitor.
Here is what Greg Altman, IT Infrastructure Manager at Swiff-Train and one of our early customers, had to say about our OpenManage integration with WAC:
"The Dell EMC OpenManage Integration with Microsoft Windows Admin Center gives us full visibility to Dell EMC Solutions for Microsoft Azure Stack HCI, enabling us to more easily respond to situations before they become critical. With the new OpenManage integration, we can also manage Microsoft Azure Stack HCI from anywhere, even simultaneously managing our clusters located in different cities."
New HCI Node optimized for Edge and ROBO Use Cases
Customers looking to modernize infrastructure at edge, remote, or small office locations now have the option of utilizing the new Dell EMC R440 S2D Ready Node, which provides both hybrid and all-flash configurations. A 2-node Azure Stack HCI cluster is a great solution for such use cases, which need limited hardware infrastructure yet superior performance, availability, and ease of remote management.
The dual-socket R440 S2D Ready Node is shallower (27.26 in. deep) than a typical rack server and comes in 8- or 10-drive 2.5" configurations, providing up to 76.6 TB of all-flash capacity in a single 1U node.
The table below summarizes our S2D Ready Node portfolio.
| |R440 S2D RN|R640 S2D RN|R740xd S2D RN|R740xd2 S2D RN|
|---|---|---|---|---|
|Target use case|Edge/ROBO and space (depth) constrained locations|Density-optimized node for applications needing a balance of high-performance storage and compute|Capacity- and performance-optimized node for applications needing a balance of compute and storage|Capacity-optimized node for data-intensive applications and use cases such as backup and archive|
|Storage options|Hybrid and all-flash|Hybrid, all-flash, and all-NVMe, including Intel Optane DC Persistent Memory|Hybrid, all-flash, and all-NVMe|Hybrid with SSDs and 3.5" HDDs|
For detailed node specifications, please refer to our website.
Stepping up the Performance Capabilities
With applications and growing data analysis needs increasingly driving lower latency and higher capacity requirements, it's imperative that the underlying infrastructure not create performance bottlenecks. The latest refresh of our solution includes several updates to scale infrastructure performance:
As we drove new hardware enhancements to our Azure Stack HCI portfolio, we also put a representative configuration to the test. With just a four-node Azure Stack HCI cluster of R640 S2D Ready Nodes configured with all-NVMe drives and 100Gb Ethernet, we observed:
Yes, you got it right. The solution is not only compact and easy to manage but also provides tremendous performance capability.
Read our detailed blog for more information on our lab performance test results.
Overall, we are very excited to bring so many new capabilities to our customers. We invite you to come meet us at Microsoft Ignite 2019 at Booth 1547, talk to Dell EMC experts and see live demos. Besides the show floor, Dell EMC experts will also be available at Hyatt Regency Hotel, Level 3, Discovery 43 Suite for detailed conversations. Register here for time with our experts.
Evaluating Performance Capabilities of Dell EMC Solutions for Azure Stack HCI
Mon, 23 Mar 2020 22:39:11 -0000
Just the facts:
User experience is everything. In today’s world, fast and intuitive applications are a necessity, and anything less might be labeled slow and not very useful. Once an application is labeled slow, it’s hard to change that impression with end users. Thus, architecting a system for performance is a key consideration in ensuring a good application experience.
In this blog, we explore a Dell EMC Storage Spaces Direct solution that delivered amazing performance in our internal tests. Storage Spaces Direct is part of Azure Stack HCI and enables customers to use industry-standard servers with locally attached drives to create high-performance and high-availability storage. Azure Stack HCI enables the IT organization to run virtual machines with cloud services on-premises. Benefits include:
Database and other storage-intensive applications could benefit from the faster NVMe drives. NVMe is an open logical device specification that has been designed for low latency and internal parallelism of solid-state storage devices. The result is a significant boost in storage performance because data can be accessed faster and with less I/O overhead.
In our labs, we created a Storage Spaces Direct performance cluster consisting of four Dell EMC PowerEdge R640 nodes. Each storage node had two Intel 6248 Cascade Lake processors, ten P4510 Intel NVMe drives, and one Mellanox CX5 dual-port 100 GbE adapter. Networking between the nodes consisted of a Dell EMC S5232 switch that supports up to thirty-two 100 GbE ports. Our goal was to drive simplicity in the configuration while showing performance value.
We used Storage Spaces Direct three-way mirroring because this configuration offers the greatest performance and protection. Protection does have a cost in terms of capacity. The capacity efficiency of a three-way mirror is 33 percent, meaning 3 TB equates to 1 TB of usable storage space. The data protection benefit with three-way mirroring is that the storage cluster can safely tolerate at least two hardware problems—for example, the loss of a drive and server at the same time. The following diagram is a simple representation of the four-node performance configuration of the Storage Spaces Direct cluster.
Figure 1: Storage Spaces Direct Cluster with four PowerEdge R640 nodes
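Incidentally, the capacity efficiency described above can be observed directly on a live cluster, because each virtual disk reports both its usable size and its physical footprint in the pool. A one-line sketch:

```powershell
# Compare usable size to the physical footprint (3x for a three-way mirror).
Get-VirtualDisk | Select-Object FriendlyName, ResiliencySettingName, Size, FootprintOnPool
```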
We ran VM Fleet on the storage cluster to test performance, and the results were impressive! Here is the first test configuration:
Thus, this VM Fleet test used 4 KB block sizes, 100 percent reads, and a random-access pattern. This Storage Spaces Direct configuration achieved 2,953,095 IOPS with an average read latency of 242 microseconds. A microsecond is equal to one-millionth of a second. This is the kind of performance that can really accelerate online transaction processing (OLTP) workloads and make enterprise applications highly responsive to the end users.
We also tested a 100 percent random-write workload on the storage cluster. All the VM Fleet configuration settings remained the same, except the write ratio was 100. With 100 percent writes, the storage cluster achieved 818,982 IOPS at an average write latency of 4 milliseconds. We could have been less aggressive in our internal tests and delivered even lower write latency, but the goal was to push the storage cluster in terms of performance. Both these tests were done internally in our Dell EMC labs, and it’s important to note that results will vary.
Figure 2: Summary of internal test findings for 100 percent read and write workloads for IOPS and latency
Some applications, such as business intelligence and decision support systems, and some analytical workloads are more dependent on throughput. Throughput is defined by the amount of data that is delivered over a fixed period. The greater the throughput the more data that can be read and the faster the analysis or report. Our labs used the following VM Fleet configuration to test throughput:
The throughput test configuration uses larger blocks at 512 KB, 100 percent reads, and a sequential read pattern similar to scanning large datasets. The storage cluster sustained 63 gigabytes per second (GB/s). This throughput could enable faster analytics for the business and provide the capability to make timely decisions.
We also ran the same test with 100 percent writes, which simulates a data load activity such as streaming data from an IoT gateway to an internal database. In this test case, the storage cluster sustained a throughput of 9 GB/s for writes. Both the read and write throughput tests show the strength of this all-NVMe configuration from Dell EMC.
Figure 3: Summary of internal test findings for 100 percent read and write workloads for throughput
If performance is what you need, then Dell EMC can use NVMe technology to accelerate your applications. But flexibility is another factor that can be equally important. Not every application requires high IOPS and very low latencies. Dell EMC offers an expanded portfolio of Storage Spaces Direct nodes that can meet most any business requirements. A great resource for reviewing the Dell EMC Storage Spaces Direct options is the Azure Stack HCI certification pages. The following table summarizes all the Dell EMC options but doesn’t contain CPU, RAM, and other details that can be found on the certification pages.
Intel Optane SSD cache + SSD
NVMe + HDD
NVMe (AIC) + HDD
SSD + HDD
Start with a minimal configuration using the R440 Ready Nodes, which can have up to 44 cores, 1 TB of RAM, and 19.2 TB of storage. Or go big with the R740xd2 hybrid with up to 44 cores, 384 GB of RAM, and 240 TB of storage capacity. The range of options provides you with the flexibility to configure a Storage Spaces Direct solution to meet your business needs.
The Dell EMC Ready Nodes have been configured to work with Windows Server 2019, so they are future-ready. For example, the Ready Nodes integrate with Windows Admin Center, so you can tier storage, implement resiliency, provision VMs and storage, configure networking, and monitor health and performance, all with just a few clicks. With your Windows Server 2019 Datacenter licenses, no separate hypervisor license is needed for VMs. You can create unlimited VMs, achieve high-availability clusters, and secure your tenants or applications with shielded VMs.
Dell EMC Storage Spaces Direct nodes have been designed to make storage in your Azure Stack HCI easy. If you are interested in learning more, see Dell EMC Cloud for Microsoft Azure Stack HCI and contact a Dell EMC expert.