The cluster configuration that we deployed for the DBaaS platform included hardware components, BIOS, firmware, and drivers that our engineering teams deliberately selected and validated to balance the solution's design goals.
The integrated system consisted of four AX-740xd nodes from Dell Technologies, two Dell EMC PowerSwitch S5248F-ON Top of Rack (ToR) network switches, and one Dell EMC PowerSwitch S3048-ON for out-of-band (OOB) management. Our engineering organization validated all AX nodes and network topologies. The AX nodes were factory pre-installed with Azure Stack HCI, version 20H2.
We set up a four-node cluster for the best possible resiliency, allowing us to sustain a failure of two entire nodes without impacting the running workloads. To ensure that we could use larger VM sizes to deploy the AKS-HCI workload clusters, we populated the cluster with 192 CPU cores and 1.5 TB of RAM. Larger AKS-HCI workload clusters allowed us to provision more Azure Arc-enabled SQL Managed Instances and Microsoft SQL databases for functional and performance testing.
Note: Azure Stack HCI, version 21H2, introduces dynamic processor compatibility mode. However, Dell Technologies continues to strongly recommend that all nodes in an Azure Stack HCI cluster be homogeneous – identical CPUs, memory, disk drives, and Ethernet adapters. All nodes should also run identical BIOS, firmware, and driver revisions on the components that are provided by the Azure Stack HCI solutions catalog from Dell Technologies.
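Where mixed CPU generations cannot be avoided, the dynamic processor compatibility mode mentioned above is enabled per VM with the standard Hyper-V cmdlets. A minimal sketch, assuming a hypothetical VM named SQLMI-VM01:

```powershell
# Processor compatibility mode lets the VM live-migrate between hosts
# with different CPU generations. The VM must be off to change it.
Stop-VM -Name "SQLMI-VM01"
Set-VMProcessor -VMName "SQLMI-VM01" -CompatibilityForMigrationEnabled $true
Start-VM -Name "SQLMI-VM01"
```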
For the storage subsystem, we selected a single-tier, all-flash SSD configuration because of its high performance and ease of configuration and maintenance. We chose the three-way mirror resiliency type for the volumes to maximize database performance compared with parity options in Storage Spaces Direct. The following tables detail the cluster configuration and AX-740xd node specifications that we built in the lab.
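As an illustration, a three-way mirror volume like the ones we used can be created with the Storage Spaces Direct cmdlets; the friendly names and size below are hypothetical:

```powershell
# Create a three-way mirror Cluster Shared Volume on the S2D pool.
# -PhysicalDiskRedundancy 2 keeps three copies of the data, so the
# volume can survive two simultaneous drive or node failures.
New-Volume -StoragePoolFriendlyName "S2D*" `
    -FriendlyName "SQLData01" `
    -FileSystem CSVFS_ReFS `
    -ResiliencySettingName Mirror `
    -PhysicalDiskRedundancy 2 `
    -Size 2TB
```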
Cluster design elements
Cluster node model: Dell EMC Integrated System AX-740xd
Number of cluster nodes: 4
Network switch model: Dell EMC PowerSwitch S5248F-ON featuring 48 x 25 GbE SFP28 + 2 x 200 GbE QSFP28-DD + 4 x 100 GbE QSFP28 ports
Number of ToR network switches: 2
Usable storage capacity: Approximately 16 TB
Resources per cluster node
Processors: Dual-socket Intel Xeon Gold 6248R 3.0G, 24C/48T
Storage controller for operating system: BOSS-S1 controller card
Physical drives for operating system: 2 x M.2 240 GB SATA drives configured as RAID 1
Storage controller for Storage Spaces Direct: HBA330 Controller Adapter, Low Profile
Physical drives for Storage Spaces Direct: 8 x 1.92 TB SSD SAS Mixed Use
Network adapter for management and VM traffic: Intel X710 Dual Port 10 GbE SFP+ rNDC
Network adapter for storage traffic: 1 x QLogic FastLinQ 41262 Dual Port 10/25GbE SFP28 Adapter
Operating system: Microsoft Azure Stack HCI, version 20H2
Solution update catalog version for deployment
We followed the HCI Deployment Guide to prepare and deploy the integrated system. The guide provides the PowerShell commands necessary to automate significant portions of the deployment. We followed the post-deployment procedures in the Operations Guide, such as Azure onboarding for Azure Stack HCI, version 20H2, creating virtual disks, and managing and monitoring with Windows Admin Center.
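The core of that automation follows the standard failover-cluster cmdlets. A simplified sketch, with hypothetical node names and a hypothetical cluster IP address:

```powershell
# Validate the four AX-740xd nodes before clustering.
$nodes = "AXNode1", "AXNode2", "AXNode3", "AXNode4"
Test-Cluster -Node $nodes `
    -Include "Storage Spaces Direct", "Inventory", "Network", "System Configuration"

# Create the cluster without claiming storage, using a static address.
New-Cluster -Name "ASHCLUSTER" -Node $nodes `
    -StaticAddress "192.168.100.10" -NoStorage

# Claim all eligible drives and create the storage pool.
Enable-ClusterStorageSpacesDirect
```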
The following figures depict the integrated system’s networking configuration. We employed a scalable, nonconverged network topology and the S5248F-ON ToR switches for their high port count density. This ensured we could scale to the maximum cluster size of 16 AX nodes as database and application demands grew. Management and VM traffic traversed a dual port rNDC configured in Hyper-V as a Switch Embedded Team (SET). Storage traffic passed through a dedicated dual port adapter with no teaming in Hyper-V. Storage networking was configured to use iWARP for RDMA.
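The SET configuration for management and VM traffic can be sketched with the Hyper-V cmdlets; the switch and adapter names below are illustrative:

```powershell
# Team the two rNDC ports into a Switch Embedded Team (SET) vSwitch
# that carries management and VM traffic.
New-VMSwitch -Name "ManagementSwitch" `
    -NetAdapterName "NIC1", "NIC2" `
    -EnableEmbeddedTeaming $true `
    -AllowManagementOS $true

# The dedicated storage ports are intentionally left unteamed; each
# port carries its own storage traffic directly.
```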
Note: In the test configuration, we found that iWARP was a better option when compared to RoCE, because iWARP did not require any additional configuration steps on the ToR switches. To enable the functionality on the AX nodes, iWARP only required BIOS and driver setting changes, which are noted in the HCI Deployment Guide.
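Once the BIOS and driver settings are in place, enabling and verifying RDMA on the storage ports from the operating system can be sketched as follows (the adapter names are assumptions):

```powershell
# Enable RDMA on the dedicated storage ports (QLogic FastLinQ in
# iWARP mode, set via BIOS and driver per the deployment guide).
Enable-NetAdapterRdma -Name "Storage1", "Storage2"

# Confirm that RDMA is operational on the storage interfaces.
Get-NetAdapterRdma | Where-Object Enabled
Get-SmbClientNetworkInterface | Where-Object RdmaCapable
```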
We followed the HCI Deployment Guide to implement this scalable, nonconverged network topology. The guide provides the PowerShell commands that are required to configure the networking on each AX node. For the S5248F-ON switch configurations, we leveraged the sample switch configurations that Dell Technologies publishes for this topology.
The following table lists the tools that we used to perform the deployment, life cycle management, and monitoring of the infrastructure layer.
Microsoft Windows Admin Center
Dell EMC OpenManage Integration with Microsoft Windows Admin Center, version 2.1
PowerShell 5.1 or PowerShell Core
After initial deployment of ASHCLUSTER, we added it to Microsoft Windows Admin Center. We used Windows Admin Center throughout the testing scenarios to monitor processor, memory, network, and storage performance and capacity at the physical infrastructure and VM levels. We also used it for some troubleshooting and maintenance tasks. The following figure shows the integrated system added to Windows Admin Center.
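The same health and capacity data that Windows Admin Center surfaces can also be pulled ad hoc with PowerShell from any cluster node, for example:

```powershell
# Quick cluster health snapshot.
Get-ClusterNode | Select-Object Name, State
Get-StoragePool -IsPrimordial $false |
    Select-Object FriendlyName, HealthStatus
Get-VirtualDisk |
    Select-Object FriendlyName, HealthStatus, ResiliencySettingName
Get-StorageSubSystem -FriendlyName "*Cluster*" | Get-StorageHealthReport
```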
We installed the OpenManage Integration extension in Windows Admin Center using the instructions in the Operations Guide. We used the stand-alone extension to verify the health of the ASHCLUSTER hardware before we performed administrative tasks and to prepare for cluster expansion. To orchestrate operating system, BIOS, firmware, and driver updates in a single workflow with no interruption to running workloads, the extension also installed a snap-in to the Microsoft Cluster-Aware Updating extension. The operational tasks are covered in more detail in the testing scenarios section. The following figure shows the first step in the 1-click full stack life cycle management using Cluster-Aware Updating workflow.
Note: We checked the OpenManage Integration with Microsoft Windows Admin Center v2.1 compatibility matrix to ensure we were using the correct versions of all supported software. For example, we had to update the Microsoft Failover Cluster Tool Extension to the 1.128.0.nupkg release to ensure that Cluster-Aware Updating would function as expected.