Infrastructure setup
The cluster configuration that we deployed in our lab included hardware components, BIOS, firmware, and drivers that our engineering teams deliberately selected and validated to balance the requirements of the following design goals:
The Dell Integrated System for Microsoft Azure Stack HCI consisted of three Dell AX-7525 nodes, two Dell PowerSwitch S5248F-ON top-of-rack (ToR) network switches, and one Dell PowerSwitch N3248 network switch for out-of-band (OOB) management. The AX nodes were factory-preinstalled with Azure Stack HCI, version 21H2, and we updated them to version 22H2 using Cluster-Aware Updating before attempting any performance or functional testing.
We set up a three-node cluster and configured the cluster volumes with three-way mirror resiliency, which allows the cluster to safely tolerate at least two simultaneous hardware failures (drive or server). Three-way mirroring also maximized database performance compared with the dual-parity resiliency type. A single-tier, all-flash NVMe storage subsystem contributed significantly to consistently high performance and operational efficiency.
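To make the resiliency trade-off concrete, the following minimal Python sketch models an N-way mirror as keeping each slab of data on N different servers. It is a simplified illustration, not how Storage Spaces Direct is implemented: it shows why three copies guarantee tolerance of two simultaneous failures, and why the price is a storage efficiency of one third. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MirrorResiliency:
    """Simplified N-way mirror: every slab is stored `copies` times, each on a different server."""
    copies: int

    @property
    def guaranteed_fault_tolerance(self) -> int:
        # Losing copies - 1 drives or servers still leaves one intact copy of every slab.
        return self.copies - 1

    @property
    def storage_efficiency(self) -> float:
        # Only one of the N copies counts as usable capacity.
        return 1 / self.copies

    def volume_footprint_tb(self, volume_size_tb: float) -> float:
        # Raw pool capacity consumed by a volume of the given usable size.
        return volume_size_tb * self.copies


three_way = MirrorResiliency(copies=3)
print(three_way.guaranteed_fault_tolerance)     # 2
print(f"{three_way.storage_efficiency:.1%}")    # 33.3%
print(three_way.volume_footprint_tb(1.0))       # 3.0
```

In other words, every terabyte of mirrored database volume consumes three terabytes of raw NVMe capacity in exchange for the two-failure guarantee.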
We initially populated the cluster with 144 CPU cores and 1.5 TB of RAM. Dell OpenManage Integration with Microsoft Windows Admin Center provided features that simplified scaling. For example, dynamic CPU core management let us adjust the number of active processor cores to balance workload performance against host-based subscription costs. The extension also provided recommendations when we prepared to expand the storage subsystem, so that we could maintain the number of drives required for storage capacity and performance. To automate adding cluster nodes while keeping the cluster symmetrical and in line with Dell best practices, we relied on the Expand Cluster feature.
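The headline compute figures follow directly from the per-node specifications listed in the node table. The sketch below reproduces that arithmetic and shows how reducing the active cores per socket would change the physical core count. It assumes, as the text implies, that the host-based subscription counts only active physical cores; the 16-core setting is a hypothetical value for illustration, not a configuration described in this paper.

```python
# Per-node figures from the AX-7525 specification table.
NODES = 3
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 24        # AMD EPYC 7413
MEMORY_PER_NODE_GB = 512


def cluster_cores(active_cores_per_socket: int = CORES_PER_SOCKET) -> int:
    """Physical cores across the cluster for a given number of active cores per socket."""
    if not 0 < active_cores_per_socket <= CORES_PER_SOCKET:
        raise ValueError("active cores must be between 1 and the socket's core count")
    return NODES * SOCKETS_PER_NODE * active_cores_per_socket


# Memory is conventionally quoted in binary units, so 1,536 GB is reported as 1.5 TB.
total_memory_tb = NODES * MEMORY_PER_NODE_GB / 1024

print(cluster_cores())       # 144
print(total_memory_tb)       # 1.5
print(cluster_cores(16))     # 96 (hypothetical reduced-core setting)
```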
Note: Azure Stack HCI, version 21H2, introduced dynamic processor compatibility mode. However, Dell Technologies continues to strongly recommend that all nodes in an Azure Stack HCI cluster be homogeneous, with identical CPUs, memory, disk drives, and Ethernet adapters. All nodes should also run identical BIOS, firmware, and driver revisions, selected from the Support Matrix for Microsoft HCI Solutions.
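One way to act on this recommendation is to compare node inventories attribute by attribute. The following sketch flags any attribute whose value differs across nodes. The inventory records, including the BIOS-A and BIOS-B placeholder versions, are hypothetical; in practice the data would come from a hardware inventory tool such as the OpenManage extension, which this sketch does not call.

```python
from collections import defaultdict

# Hypothetical inventory records; values are placeholders, not real firmware versions.
inventory = {
    "node1": {"cpu": "AMD EPYC 7413", "memory_gb": 512, "nvme_drives": 8, "bios": "BIOS-A"},
    "node2": {"cpu": "AMD EPYC 7413", "memory_gb": 512, "nvme_drives": 8, "bios": "BIOS-A"},
    "node3": {"cpu": "AMD EPYC 7413", "memory_gb": 512, "nvme_drives": 8, "bios": "BIOS-B"},
}


def asymmetric_attributes(inventory: dict[str, dict[str, object]]) -> dict[str, dict[object, list[str]]]:
    """Return each attribute whose value differs across nodes, mapped to value -> nodes."""
    attributes = {key for record in inventory.values() for key in record}
    mismatches = {}
    for attribute in sorted(attributes):
        by_value: dict[object, list[str]] = defaultdict(list)
        for node, record in inventory.items():
            # A missing attribute is recorded as None, so it also counts as a mismatch.
            by_value[record.get(attribute)].append(node)
        if len(by_value) > 1:
            mismatches[attribute] = dict(by_value)
    return mismatches


print(asymmetric_attributes(inventory))
# {'bios': {'BIOS-A': ['node1', 'node2'], 'BIOS-B': ['node3']}}
```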
The following tables detail the cluster configuration and AX-7525 node specifications that we built in the lab.
Cluster design elements | Description |
Cluster node model | AX-7525 |
Number of cluster nodes | 3 |
Network switch model | Dell PowerSwitch S5248F-ON featuring 48 x 25 GbE SFP28 + 2 x 200 GbE QSFP28-DD + 4 x 100 GbE QSFP28 ports |
Number of ToR network switches | 2 |
Network topology | Non-Converged Network Configuration |
Volume resiliency | Three-way mirror |
Usable storage capacity | Approximately 20 TB |
Resources per cluster node | Description |
CPU | Dual-socket AMD EPYC 7413 2.65 GHz, 24C/48T Processor |
Memory | 512 GB |
Storage controller for operating system | BOSS-S1 controller card |
Physical drives for operating system | 2 x M.2 240 GB SATA drives configured as RAID 1 |
Physical drives for Storage Spaces Direct | 8 x 3.2 TB NVMe Mixed Use |
Network adapter for management and VM traffic | Intel X710 Dual Port 10 GbE SFP+ OCP |
Network adapter for storage traffic | 1 x Mellanox ConnectX-5 Dual Port 10/25GbE SFP28 Adapter |
Operating system | Microsoft Azure Stack HCI, version 22H2 |
Solution update catalog version for deployment | December 2022 |
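The approximate 20 TB of usable capacity can be roughly reconstructed from the node table. This sketch assumes Microsoft's guidance of leaving reserve capacity equivalent to one capacity drive per server (up to four drives) unallocated for in-place repairs, divides the remainder by three for the three-way mirror, and converts the vendor's decimal terabytes into the binary units that Windows reports. It ignores metadata and other overheads, so treat the result as an estimate.

```python
NODES = 3
DRIVES_PER_NODE = 8
DRIVE_TB = 3.2        # decimal terabytes, as NVMe drives are marketed
MIRROR_COPIES = 3     # three-way mirror


def usable_capacity_tb(nodes: int, drives_per_node: int, drive_tb: float, copies: int) -> float:
    raw_tb = nodes * drives_per_node * drive_tb
    # Assumption: reserve one capacity drive per server (at most four drives) for repairs.
    reserve_tb = min(nodes, 4) * drive_tb
    return (raw_tb - reserve_tb) / copies


usable_tb = usable_capacity_tb(NODES, DRIVES_PER_NODE, DRIVE_TB, MIRROR_COPIES)
usable_tib = usable_tb * 1e12 / 2**40   # Windows reports capacity in binary units
print(f"raw {NODES * DRIVES_PER_NODE * DRIVE_TB:.1f} TB, usable {usable_tb:.1f} TB (~{usable_tib:.1f} TiB)")
# raw 76.8 TB, usable 22.4 TB (~20.4 TiB)
```

Under these assumptions, the estimate lands close to the approximately 20 TB listed in the cluster design table.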
We followed the End-to-End Deployment Guide to prepare and deploy the integrated system. The guide provided the PowerShell commands necessary to automate significant portions of the deployment. We also followed the post-deployment procedures, such as Azure registration, creating virtual disks, and managing and monitoring with Windows Admin Center.
The following figures depict the integrated system's networking configuration. We employed a scalable, non-converged network topology, and we chose the S5248F-ON ToR switches for their high port density. This ensured that we could scale to the maximum cluster size of 16 AX nodes as database and application demands grew. Management and VM traffic traversed the dual-port OCP adapter, teamed in Hyper-V with Switch Embedded Teaming (SET). Storage traffic passed through a dedicated dual-port adapter with no Hyper-V teaming configured. Storage networking used RDMA over Converged Ethernet (RoCE).
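High port density matters because every added node consumes ports on both ToR switches. The sketch below estimates SFP28 port consumption per switch. It rests on illustrative assumptions that this section does not state: each dual-port adapter is split across the two switches (one port per adapter per switch), the 10 GbE SFP+ management links can use the SFP28 ports, and inter-switch and uplink connections use the separate 100 GbE and 200 GbE ports.

```python
SFP28_PORTS_PER_SWITCH = 48   # S5248F-ON: 48 x 25 GbE SFP28 ports (from the design table)
TOR_SWITCHES = 2
MAX_NODES = 16

# Ports per adapter on each AX node. Assumption: each dual-port adapter is split
# across the two ToR switches, one port per switch.
ADAPTER_PORTS = {"management_vm_set": 2, "storage_rdma": 2}


def ports_per_switch(nodes: int) -> int:
    """SFP28 ports consumed on each ToR switch by the given number of nodes."""
    ports_per_node = sum(ADAPTER_PORTS.values())
    return nodes * ports_per_node // TOR_SWITCHES


for nodes in (3, MAX_NODES):
    used = ports_per_switch(nodes)
    print(f"{nodes:>2} nodes: {used}/{SFP28_PORTS_PER_SWITCH} SFP28 ports used per switch")
```

Under these assumptions, the three-node lab cluster uses 6 ports per switch, and the 16-node maximum uses 32 of the 48 SFP28 ports on each switch.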
The End-to-End Deployment Guide also included all the steps required to implement the scalable, non-converged network topology. The guide contained the PowerShell commands required to configure the networking on each AX node. We leveraged the sample Switch Configurations - RoCE Only (Mellanox Cards) for the S5248F-ON network switches.
The following table lists the tools that we used to perform the deployment, life cycle management, and monitoring of the infrastructure layer.
Tool | Version |
Microsoft Windows Admin Center | 2211 |
Dell OpenManage Integration with Microsoft Windows Admin Center | 3.0 |
Azure Stack HCI Insights in Azure Monitor | N/A |
Grafana | Server version: 9.0.6 |
Azure Arc Connected Machine Agent | 1.24.02147.651 |
PowerShell | PowerShell Core |
After the initial deployment of the cluster, a7525r06c01, we added it to Microsoft Windows Admin Center. We used Windows Admin Center throughout the testing scenarios to perform basic health checks and monitoring at the physical infrastructure and VM levels. We also used it for some troubleshooting and maintenance tasks.
We installed the Dell OpenManage Integration extension in Windows Admin Center using the instructions in the End-to-End Deployment Guide. Then, we used the stand-alone extension to verify cluster hardware health. Figure 7 shows the Hardware Compliance tab, which listed multiple updates that were needed before a7525r06c01 was considered fully compliant.
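Conceptually, a compliance report compares each installed component version with a baseline from the solution update catalog. The following sketch is a simplified model of that comparison, not the OpenManage extension's actual logic: it treats a component as non-compliant when its installed version is older than the catalog version, and it assumes dotted numeric version strings. All component names and versions are placeholders.

```python
from typing import NamedTuple


class Component(NamedTuple):
    name: str
    installed: str
    catalog: str


def parse_version(version: str) -> tuple[int, ...]:
    # Assumes dotted numeric versions such as "2.9.3"; real firmware version strings vary.
    return tuple(int(part) for part in version.split("."))


def non_compliant(components: list[Component]) -> list[str]:
    """Names of components whose installed version is older than the catalog baseline."""
    return [c.name for c in components if parse_version(c.installed) < parse_version(c.catalog)]


# Hypothetical inventory for one node; names and versions are placeholders.
node_inventory = [
    Component("BIOS", "2.8.2", "2.9.3"),
    Component("NVMe firmware", "1.1.0", "1.1.0"),
    Component("NIC driver", "22.1.0", "22.4.1"),
]
print(non_compliant(node_inventory))   # ['BIOS', 'NIC driver']
```

Comparing parsed integer tuples rather than raw strings avoids the trap where "10.0" would sort before "9.9" as text.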
The AX-7525 nodes were shipped from the factory with the Azure Stack HCI operating system, version 21H2, preinstalled. Before proceeding any further with our lab efforts, we wanted a7525r06c01 running the latest feature, quality, and security updates. We also wanted the cluster hardware to be fully compliant.
First, we used Microsoft's Updates extension in Windows Admin Center to apply the 22H2 feature update. Then, we used the same extension to orchestrate the installation of operating system, BIOS, firmware, and driver updates in a single workflow. The hardware updates were integrated into the workflow through a snap-in installed with the Dell OpenManage Integration. The functional testing section of this white paper documents our experience running full-stack life cycle management with Cluster-Aware Updating without interrupting the SQL databases.
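Cluster-Aware Updating can patch the cluster without interrupting the SQL databases because it updates one node at a time. The following sketch models that rolling sequence under the worst-case assumption that the paused node holds one copy of every slab. With a three-way mirror, two copies stay online and the cluster can still absorb one more failure during each node's maintenance window. It is an illustration, not the CAU implementation, and the node names are hypothetical.

```python
def rolling_update_plan(node_names: list[str], mirror_copies: int = 3) -> list[str]:
    """Simplified CAU-style run: drain, update, reboot, and resume one node at a time."""
    plan = []
    for name in node_names:
        # Worst case: the paused node held one copy of every slab.
        online_copies = mirror_copies - 1
        if online_copies < 1:
            raise RuntimeError("pausing a node would take volumes offline")
        # Additional simultaneous failures the cluster can absorb while this node is down.
        spare_tolerance = online_copies - 1
        plan.append(
            f"{name}: drain roles, install updates, reboot, resume "
            f"({online_copies} copies online, spare fault tolerance {spare_tolerance})"
        )
        # A real run also waits for storage resync to finish before pausing the next node;
        # that wait is not modeled here.
    return plan


for step in rolling_update_plan(["node1", "node2", "node3"]):
    print(step)
```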