The cluster configuration that we deployed in our lab included hardware components, BIOS, firmware, and drivers that our engineering teams deliberately selected and validated to balance our design goals.
The Dell Integrated System for Microsoft Azure Stack HCI consisted of three AX-7525 nodes from Dell, two Dell PowerSwitch S5248F-ON Top of Rack (ToR) network switches, and one Dell PowerSwitch N3248 network switch for Out-Of-Band (OOB) management. The AX nodes were factory preinstalled with Azure Stack HCI, version 21H2, and updated to 22H2 using Cluster-Aware Updating before we attempted any performance or functional testing.
We set up a three-node cluster and configured the cluster volumes with three-way mirror resiliency, allowing the cluster to safely tolerate two simultaneous hardware failures (drive or server). In addition, three-way mirroring maximized database performance compared with the dual-parity resiliency type. A single-tier, all-flash NVMe storage subsystem contributed significantly to consistently high performance and operational efficiency.
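As an illustration of how such a volume is typically created, the following PowerShell sketch shows a three-way mirrored Cluster Shared Volume on a Storage Spaces Direct pool. The pool name, volume name, and size are placeholders rather than the exact values we used.

```powershell
# Sketch only: create a three-way mirrored CSV on a Storage Spaces Direct cluster.
# The pool name, volume name, and size below are placeholders, not our lab values.
New-Volume -StoragePoolFriendlyName "S2D on a7525r06c01" `
           -FriendlyName "MirrorVolume01" `
           -FileSystem CSVFS_ReFS `
           -ResiliencySettingName Mirror `
           -PhysicalDiskRedundancy 2 `
           -Size 2TB
```

With Mirror resiliency and a physical disk redundancy of 2, each extent is written to three nodes, which is what allows two simultaneous failures to be tolerated.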
We initially populated the cluster with 144 CPU cores and 1.5 TB of RAM. Dell OpenManage Integration with Microsoft Windows Admin Center provided features that simplified scalability. For example, dynamic CPU core management allowed us to adjust the number of active processor cores to balance workload performance against host-based subscription costs. The extension also provided recommendations when preparing to expand the storage subsystem so that the required number of drives for storage capacity and performance was maintained. To automate adding cluster nodes while keeping the cluster symmetrical and aligned with Dell best practices, we relied on the Expand Cluster feature.
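For a quick sanity check of the aggregate compute figures above (144 cores and 1.5 TB of RAM across three nodes), a short inventory pass over the nodes can be run from PowerShell; this is a convenience sketch, and the cluster name is the only value taken from our lab.

```powershell
# Sketch: tally physical cores and installed memory on each cluster node.
$nodes = (Get-ClusterNode -Cluster "a7525r06c01").Name
Invoke-Command -ComputerName $nodes -ScriptBlock {
    [pscustomobject]@{
        Node     = $env:COMPUTERNAME
        Cores    = (Get-CimInstance Win32_Processor | Measure-Object NumberOfCores -Sum).Sum
        MemoryGB = [math]::Round((Get-CimInstance Win32_ComputerSystem).TotalPhysicalMemory / 1GB)
    }
} | Sort-Object Node | Format-Table Node, Cores, MemoryGB
```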
Note: Azure Stack HCI, version 21H2, introduced dynamic processor compatibility mode. However, Dell Technologies continues to strongly recommend that all nodes in an Azure Stack HCI cluster be homogeneous, with identical CPUs, memory, disk drives, and network adapters. The BIOS, firmware, and driver revisions should also be identical across nodes and selected from the Support Matrix for Microsoft HCI Solutions.
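A lightweight way to spot-check that symmetry is to compare BIOS versions and network adapter driver versions across the nodes, as in the sketch below; this is an informal convenience check and does not replace the Support Matrix or the OpenManage Integration compliance report.

```powershell
# Sketch: compare BIOS and NIC driver versions across nodes to spot asymmetry.
$nodes = (Get-ClusterNode -Cluster "a7525r06c01").Name
Invoke-Command -ComputerName $nodes -ScriptBlock {
    [pscustomobject]@{
        Node       = $env:COMPUTERNAME
        Bios       = (Get-CimInstance Win32_BIOS).SMBIOSBIOSVersion
        NicDrivers = (Get-NetAdapter -Physical | Sort-Object Name |
                      ForEach-Object { "$($_.InterfaceDescription): $($_.DriverVersionString)" }) -join '; '
    }
} | Sort-Object Node | Format-Table -Wrap
```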
The following tables detail the cluster configuration and AX-7525 node specifications that we built in the lab.
Table 1. a7525r06c01 cluster configuration
| Cluster design elements | Description |
|---|---|
| Cluster node model | AX-7525 |
| Number of cluster nodes | 3 |
| Network switch model | Dell PowerSwitch S5248F-ON featuring 48 x 25 GbE SFP28 + 2 x 200 GbE QSFP28-DD + 4 x 100 GbE QSFP28 ports |
| Number of ToR network switches | 2 |
| Network topology | Non-converged network configuration |
| Volume resiliency | Three-way mirror |
| Usable storage capacity | Approximately 20 TB |
Table 2. Cluster node specifications
| Resources per cluster node | Description |
|---|---|
| CPU | Dual-socket AMD EPYC 7413 2.65 GHz, 24C/48T processor |
| Memory | 512 GB |
| Storage controller for operating system | BOSS-S1 controller card |
| Physical drives for operating system | 2 x M.2 240 GB SATA drives configured as RAID 1 |
| Physical drives for Storage Spaces Direct | 8 x 3.2 TB NVMe Mixed Use |
| Network adapter for management and VM traffic | Intel X710 Dual Port 10 GbE SFP+ OCP |
| Network adapter for storage traffic | 1 x Mellanox ConnectX-5 Dual Port 10/25 GbE SFP28 adapter |
| Operating system | Microsoft Azure Stack HCI, version 22H2 |
| Solution update catalog version for deployment | December 2022 |
We followed the End-to-End Deployment Guide to prepare and deploy the integrated system. The guide provided the PowerShell commands necessary to automate significant portions of the deployment. We also followed the post-deployment procedures, such as Azure registration, creating virtual disks, and managing and monitoring with Windows Admin Center.
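The guide's full procedure is considerably longer, but the core cluster-creation flow it automates can be sketched roughly as follows; node names, the cluster IP address, and the subscription ID are placeholders, and the validation, networking, and witness details are abbreviated.

```powershell
# Rough sketch of the core deployment steps automated by the guide's scripts.
# Node names, the static IP, and the subscription ID are placeholders.
$nodes = "ax7525-n1", "ax7525-n2", "ax7525-n3"

# Validate the nodes before forming the cluster
Test-Cluster -Node $nodes -Include "Storage Spaces Direct", "Inventory", "Network", "System Configuration"

# Create the cluster without adding eligible storage to it
New-Cluster -Name "a7525r06c01" -Node $nodes -NoStorage -StaticAddress "192.168.100.10"

# Pool the NVMe drives from all nodes into Storage Spaces Direct
Enable-ClusterStorageSpacesDirect -CimSession "a7525r06c01"

# Post-deployment: register the cluster with Azure
Register-AzStackHCI -SubscriptionId "<subscription-id>" -ComputerName "ax7525-n1"
```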
The following figures depict the integrated system’s networking configuration. We employed a scalable, non-converged network topology and chose the S5248F-ON ToR switches for their high port density, ensuring that we could scale to the maximum cluster size of 16 AX nodes as database and application demands grew. Management and VM traffic traversed the dual-port OCP adapter, configured in Hyper-V as a Switch Embedded Team (SET). Storage traffic passed through a dedicated dual-port adapter with no teaming configured in Hyper-V. Storage networking was configured to use RoCE for RDMA.
Figure 3. Physical network connectivity
Figure 4. Hyper-V virtual networking configuration
The End-to-End Deployment Guide also included all the steps required to implement the scalable, non-converged network topology. The guide contained the PowerShell commands required to configure the networking on each AX node. We leveraged the sample Switch Configurations - RoCE Only (Mellanox Cards) for the S5248F-ON network switches.
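In the spirit of those commands, the following per-node sketch shows how the non-converged layout is typically expressed: a Switch Embedded Team over the two Intel OCP ports for management and VM traffic, and RDMA plus the DCB settings that RoCE requires on the un-teamed Mellanox ports for storage. Adapter names, the PFC priority, and the bandwidth reservation are placeholders rather than our exact lab values.

```powershell
# Sketch: per-node networking for the non-converged topology.
# Adapter names, the PFC priority, and bandwidth values are placeholders.

# Management and VM traffic: SET team over the dual-port Intel X710 OCP adapter
New-VMSwitch -Name "ManagementSwitch" `
             -NetAdapterName "NIC1", "NIC2" `
             -EnableEmbeddedTeaming $true `
             -AllowManagementOS $true

# Storage traffic: dedicated Mellanox ConnectX-5 ports, no teaming, RoCE for RDMA
Enable-NetAdapterRDMA -Name "Storage1", "Storage2"

# DCB settings that RoCE depends on: PFC and an ETS reservation for SMB traffic
New-NetQosPolicy -Name "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
Enable-NetQosFlowControl -Priority 3
New-NetQosTrafficClass -Name "SMB" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS
Enable-NetAdapterQos -Name "Storage1", "Storage2"
```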
The following table lists the tools that we used to perform the deployment, life cycle management, and monitoring of the infrastructure layer.
Table 3. Inventory of tools at the infrastructure layer
| Tool | Version |
|---|---|
| Microsoft Windows Admin Center | 2211 |
| Dell OpenManage Integration with Microsoft Windows Admin Center | 3.0 |
| Azure Stack HCI Insights in Azure Monitor | N/A |
| Grafana | Server version: 9.0.6 |
| Azure Arc Connected Machine Agent | 1.24.02147.651 |
| PowerShell | PowerShell Core |
After initial deployment of a7525r06c01, we added it to Microsoft Windows Admin Center. We used Windows Admin Center throughout the testing scenarios to perform basic health checks and monitoring at the physical infrastructure and VM levels. We also used it for some troubleshooting and maintenance tasks.
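Equivalent spot checks can also be run from PowerShell when the Windows Admin Center dashboard is not at hand; the sketch below assumes the FailoverClusters and Storage modules and is illustrative rather than a record of our exact procedure.

```powershell
# Sketch: quick health checks comparable to the Windows Admin Center dashboard views.
Get-ClusterNode -Cluster "a7525r06c01" | Format-Table Name, State
Get-VirtualDisk -CimSession "a7525r06c01" | Format-Table FriendlyName, HealthStatus, OperationalStatus
Get-PhysicalDisk -CimSession "a7525r06c01" | Group-Object HealthStatus | Format-Table Name, Count
Get-StorageJob -CimSession "a7525r06c01"   # shows any repair or rebalance jobs in flight
```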
Figure 5. a7525r06c01 in Windows Admin Center dashboard
For performance monitoring during the load testing at this layer, we leveraged Grafana.
Figure 6. a7525r06c01 added to Grafana dashboard with no workload
We installed the Dell OpenManage Integration extension in Windows Admin Center using the instructions in the End-to-End Deployment Guide. Then, we used the stand-alone extension to verify cluster hardware health. Figure 7 shows that multiple updates were needed in the Hardware Compliance tab before a7525r06c01 was considered fully compliant.
Figure 7. Overall health status of a7525r06c01
The AX-7525 nodes shipped from the factory with the Azure Stack HCI operating system, version 21H2, preinstalled. Before proceeding any further with our lab efforts, we wanted a7525r06c01 running the latest feature, quality, and security updates, and we wanted the cluster hardware to be fully compliant.
First, we used Microsoft’s Updates extension in Windows Admin Center to apply the 22H2 feature update. Then, we used the same extension to orchestrate the installation of operating system, BIOS, firmware, and driver updates in a single workflow; the hardware updates were integrated into the workflow by a snap-in installed with the Dell OpenManage Integration. In the functional testing section of this white paper, we documented our experience running full-stack life cycle management with Cluster-Aware Updating and no interruption to the SQL databases.
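For readers who prefer PowerShell, a Cluster-Aware Updating run can also be started and reviewed from the command line, as in the hedged sketch below; our runs were orchestrated from the Updates extension in Windows Admin Center, and the parameter values here are illustrative.

```powershell
# Sketch: drive a Cluster-Aware Updating run from PowerShell instead of the
# Windows Admin Center Updates extension. Parameter values are illustrative.
Import-Module ClusterAwareUpdating

# Start an updating run against the cluster
Invoke-CauRun -ClusterName "a7525r06c01" `
              -MaxFailedNodes 0 `
              -MaxRetriesPerNode 2 `
              -RequireAllNodesOnline `
              -Force

# Review the results of the most recent run
Get-CauReport -ClusterName "a7525r06c01" -Last -Detailed
```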
Figure 8. Cluster-Aware Updating feature update
Figure 9. Update checklist for hardware updates in Updates extension