The Evolution of Azure Stack HCI Lifecycle Management
Wed, 24 Apr 2024 15:39:15 -0000
Today, Dell Technologies announced the general availability of Dell APEX Cloud Platform for Microsoft Azure. This on-premises, turnkey infrastructure platform is collaboratively engineered with Microsoft to optimize the Azure hybrid cloud experience.
It is the first offer in Premier Solutions for Microsoft Azure Stack HCI, a new category in the Azure Stack HCI catalog reserved for key partners with the greatest levels of engagement with Microsoft and deepest integrations into familiar Microsoft management tools.
The secret sauce
Dell APEX Cloud Platform for Microsoft Azure comes bundled with fully automated management and orchestration, delivered by Dell APEX Cloud Platform Foundation Software. This software runs in a virtual appliance on each cluster and functions as the brains of the solution stack. The Cloud Platform Manager VM communicates with the underlying infrastructure and injects automation workflows into Microsoft Windows Admin Center via the Dell APEX Cloud Platform extension, as depicted in the following diagram.
Features that deliver breakthrough operational efficiency from Day 1 through Day 2/N include:
- Deployment and cluster creation automation – Fastest path to Azure hybrid cloud providing an 88% reduction in steps versus a manual deployment approach.
- Physical hardware views – Intuitive user interface for rapid identification of MC node components and cluster health.
- Integrations with Dell ProSupport – Accelerates time to issue resolution with log collection, remote support, and phone home capabilities.
- Intrinsic infrastructure security management – Toggle Dell Infrastructure Lock to prevent unauthorized changes to configuration settings and to block updates to the platform. Secured-core server establishes a hardware root of trust and provides firmware protection and virtualization-based security.
- End-to-end cluster expansion – Scale out a cluster in a highly efficient and fully automated manner using a guided wizard-driven workflow.
In this blog, we will focus on one of the most compelling and highly anticipated features of Dell APEX Cloud Platform Foundation Software – next generation full stack lifecycle management (LCM).
Our latest approach to LCM keeps Dell APEX Cloud Platform for Microsoft Azure operating in a Continuously Validated State (CVS), advancing from one Known Good State (KGS) to the next across hardware, operating system, and systems management software. It also dramatically accelerates time to value: new Microsoft updates become available within just four hours of their release.
The following graphic depicts the journey of an update from development to installation.
History lesson
Dell Technologies is no stranger to efficiently applying updates to Azure Stack HCI clusters. We have done so since 2019 using a fully automated, cluster-aware approach with no impact to running workloads.
We first introduced this automation in our Dell OpenManage Integration with Microsoft Windows Admin Center v1.1. That release could generate a compliance report within our standalone extension that compared the currently running BIOS, firmware, and driver versions with an engineering-validated solution baseline catalog. Administrators simply chose between targeting an online catalog or creating an offline catalog using Dell Repository Manager, and our standalone extension then orchestrated the updates using Cluster-Aware Updating.
Version 2.0 of our OpenManage Integration extension went a step further to deliver our first foray into full stack cluster-aware updating through a snap-in developed for Microsoft’s Updates extension.
Using this snap-in, Azure Stack HCI operating system updates and Dell hardware updates (BIOS, firmware, and drivers) were applied in a single, consolidated workflow. This workflow required only one reboot per cluster node and was completely non-disruptive to running workloads. Once again, IT administrators could view a compliance report and select either an online catalog or an offline catalog created with Dell Repository Manager for the Dell updates.
Maintaining a Continuously Validated State
We’ve developed an entirely new Windows Admin Center extension with integrated Dell APEX Cloud Platform Foundation Software workflow automation. We continue to build on the pedigree we’ve established over the last four years with our OpenManage Integration extension, improving further by now incorporating proven and market-leading intellectual property (IP) from our other hyperconverged infrastructure (HCI) and software defined storage (SDS) offerings. Some of this innovative IP is derived from our highly successful VxRail HCI System software and results in a new standard for lifecycle management in a turnkey infrastructure platform.
When freshly deployed, Dell APEX Cloud Platform for Microsoft Azure runs at peak performance and resiliency to support your current workloads. The platform also comes secure by default with the following protection:
- BIOS and operating system settings are configured correctly to enable secured-core server. Secured-core server establishes a hardware backed root of trust, provides defense against firmware level attacks, and enables virtualization-based security.
- Data-at-rest encryption is enabled on all volumes using BitLocker.
- Microsoft Defender Antivirus is built into Azure Stack HCI and provides real-time always-on antivirus protection with automatic definition updates.
- Azure Stack HCI has more than 200 security settings enabled out of the box. These settings provide a consistent security baseline. For example, security posture is improved by disabling legacy protocols and ciphers.
- Windows Defender Application Control (WDAC) is a software-based security layer that reduces the attack surface by enforcing an explicit list of software that is allowed to run. Dell APEX Cloud Platform for Microsoft Azure comes with WDAC enabled and enforced by default.
This pristine operating environment is known as the platform’s current Known Good State (KGS). Rest assured that the entire platform is running in a condition that is collaboratively validated by Dell and Microsoft engineering. To maintain the robust default security posture and optimal performance and resiliency, you need to keep the platform in a Continuously Validated State (CVS). Comprehensively advancing the end-to-end platform from one KGS to the next is accomplished with zero interruption to running workloads. The following graphic shows an example of a quarterly update that includes multiple software and hardware update components.
(Note: The release versions in this graphic are examples only and do not align with any official Dell APEX Cloud Platform for Microsoft Azure planned releases.)
Release terminology
The following table summarizes the different platform components that must be routinely updated to be compliant with the current or target KGS.
Component | Provider | Description | Example versioning |
Azure Stack HCI Solution | Microsoft | This contains OS quality and security updates, feature updates, emergency patches, and the Azure Stack HCI supplemental package | 10.2306.1.11 |
Dell APEX Cloud Platform Foundation Software | Dell Technologies | All software and services running inside the Cloud Platform Manager virtual machine | 01.00.00.00 |
Solution Builder Extension (SBE) | Dell Technologies | Package that contains all hardware updates for BIOS, iDRAC, firmware and drivers | 4.0.2307.1 |
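To make the Known Good State idea concrete, the following Python sketch compares installed component versions against a target KGS and flags anything that does not match. The target versions mirror the example versioning column above; the installed versions, the dictionary-based data model, and the exact-match compliance rule are assumptions for illustration, not how Dell APEX Cloud Platform Foundation Software actually implements its compliance check.

```python
from __future__ import annotations

from dataclasses import dataclass


def parse_version(version: str) -> tuple[int, ...]:
    """Split a dotted version such as "10.2306.1.11" into integers, so "01.00" equals "1.0"."""
    return tuple(int(part) for part in version.split("."))


@dataclass
class ComponentStatus:
    name: str
    installed: str | None  # None means the component was not reported at all
    target: str

    @property
    def compliant(self) -> bool:
        # Assumption: a component complies only when it exactly matches the target KGS version.
        return self.installed is not None and parse_version(self.installed) == parse_version(self.target)


def compliance_report(installed: dict[str, str], target_kgs: dict[str, str]) -> list[ComponentStatus]:
    """Compare installed versions against every component listed in the target KGS."""
    return [ComponentStatus(name, installed.get(name), target) for name, target in target_kgs.items()]


target_kgs = {
    "Azure Stack HCI Solution": "10.2306.1.11",
    "Dell APEX Cloud Platform Foundation Software": "01.00.00.00",
    "Solution Builder Extension (SBE)": "4.0.2307.1",
}
# Hypothetical installed versions: the SBE entry is an older, made-up version.
installed = {
    "Azure Stack HCI Solution": "10.2306.1.11",
    "Dell APEX Cloud Platform Foundation Software": "01.00.00.00",
    "Solution Builder Extension (SBE)": "4.0.2306.1",
}
for status in compliance_report(installed, target_kgs):
    print(f"{status.name}: {'compliant' if status.compliant else 'update required'}")
```

Normalizing versions to integer tuples avoids false mismatches caused by zero padding, such as "01.00.00.00" versus "1.0.0.0".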
The Azure Stack HCI Solution component follows the Modern Lifecycle policy, which defines the products and services that are continuously serviced and supported. To keep your Azure Stack HCI service in a supported state, you have up to six months to install updates. Dell and Microsoft recommend installing all updates as they are released to capitalize on the rapid pace of innovation and inclusion of new features. To learn more, see Azure Stack HCI release information.
Dell and Microsoft release the following types of updates for this platform:
Update type | Description | Typical cadence |
Baseline updates | Baseline updates include new features and improvements. They typically require host system reboots and might take longer. | Quarterly |
Patch updates | Patch updates primarily contain quality and reliability improvements. They might include OS latest cumulative updates (LCUs) or hot patches. Some patches require host system reboots, while others don't. To fix critical or security issues, patches might be released sooner than monthly. | Monthly |
Hotfixes | Hotfixes address blocking issues that could prevent regular patch or baseline updates. | On-demand |
Microsoft Azure and Dell update sites are periodically queried to discover applicable updates. These updates are listed in the Updates tab in the Dell APEX Cloud Platform extension in Windows Admin Center.
All updates, even emergency patches from Microsoft that address critical security vulnerabilities or bugs, appear in the Dell extension within just four hours of being released. This near-immediate availability of patches is unprecedented in a turnkey, on-premises infrastructure platform. And whether the updates come from Microsoft, Dell, or both, you never need to leave the Dell APEX Cloud Platform extension interface to apply them.
Engineering rigor produces stress-free LCM
In the past, Dell validated hardware updates and Microsoft validated operating system updates independently. With our enhanced lifecycle management approach, every update discovered by Dell APEX Cloud Platform Foundation Software has been jointly tested and validated by Dell and Microsoft. We incorporate new periodic builds of hardware, OS, and systems management components into our respective validation CI/CD pipelines. This raises the bar to an entirely new level of confidence and peace of mind for IT administrators.
In the relentless pursuit of delivering worry-free updates, the full stack lifecycle management workflow performs extensive prechecks before any update operations are initiated. For example, all platform components are checked to ensure they comply with the current KGS. If Dell Infrastructure Lock is enabled on the platform, a dialog box informs you that it will be temporarily disabled to allow updates and re-enabled after the update workflow is complete to maintain a strong security posture.
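The precheck and Infrastructure Lock behavior described above follows a familiar pattern: validate first, then suspend a protection only for the duration of the change and restore it afterwards. The Python sketch below models that pattern with a context manager. The `Platform` class and the `run_prechecks`, `lock_suspended`, and `update_platform` functions are hypothetical stand-ins, not Dell APIs, and restoring the lock when an update fails part-way is an assumption the text does not state.

```python
from contextlib import contextmanager


class PrecheckError(RuntimeError):
    """Raised when a precheck fails; no update work is attempted."""


class Platform:
    """Hypothetical stand-in for platform state; not a Dell API."""

    def __init__(self, infrastructure_lock: bool, matches_current_kgs: bool):
        self.infrastructure_lock = infrastructure_lock
        self.matches_current_kgs = matches_current_kgs


def run_prechecks(platform: Platform) -> None:
    # Example precheck: all components must comply with the current KGS before updating.
    if not platform.matches_current_kgs:
        raise PrecheckError("platform components do not match the current KGS")


@contextmanager
def lock_suspended(platform: Platform):
    """Disable the lock for the duration of the block and restore its prior state afterwards."""
    was_locked = platform.infrastructure_lock
    platform.infrastructure_lock = False
    try:
        yield platform
    finally:
        # Assumption: the lock is restored even if the update fails part-way through.
        platform.infrastructure_lock = was_locked


def update_platform(platform: Platform, apply_updates) -> None:
    run_prechecks(platform)  # fail fast before touching anything
    with lock_suspended(platform):
        apply_updates(platform)


demo = Platform(infrastructure_lock=True, matches_current_kgs=True)
update_platform(demo, lambda plat: print("updating, lock enabled:", plat.infrastructure_lock))
print("after update, lock enabled:", demo.infrastructure_lock)
```

Putting the restore step in a `finally` clause is a deliberate design choice: it keeps the security posture intact regardless of how the update step exits.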
The entire update process relies heavily on Azure Stack HCI’s Lifecycle Manager feature, which employs Cluster-Aware Updating (CAU) to ensure no workloads are interrupted. One cluster node is placed into maintenance mode at a time, which triggers the Live Migration of VMs. CAU installs the updates, restarts the node if required, returns the node to a fully functional state, and proceeds to the next node in the cluster. When the LCM workflow is complete, a new compliance check is triggered to confirm that the platform has fully transitioned to the new target KGS.
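The rolling, one-node-at-a-time sequence that Cluster-Aware Updating performs can be modeled in a few lines of Python. This is a toy simulation under stated assumptions: the `Node` class and `rolling_update` function are hypothetical, live migration is reduced to moving VM names to another running node, and the final loop stands in for the post-update compliance check. It is not the CAU or Lifecycle Manager API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node: its name, hosted VMs, state, and installed build."""
    name: str
    vms: list[str] = field(default_factory=list)
    paused: bool = False
    build: str = "current-kgs"


def rolling_update(nodes: list[Node], target_build: str, reboot_required: bool = True) -> list[str]:
    """Update nodes one at a time so VMs always run on a node that is not paused."""
    if len(nodes) < 2:
        raise RuntimeError("a rolling update needs at least two nodes to host VMs")
    log = []
    for node in nodes:
        # 1. Pick a running node, pause this one (maintenance mode), and move its VMs.
        destination = next(n for n in nodes if n is not node and not n.paused)
        node.paused = True
        moved = len(node.vms)
        destination.vms.extend(node.vms)
        node.vms.clear()
        log.append(f"{node.name}: paused, {moved} VM(s) moved to {destination.name}")
        # 2. Install the updates and restart only if the update requires it.
        node.build = target_build
        if reboot_required:
            log.append(f"{node.name}: restarted")
        # 3. Resume the node so it can host VMs again, then continue with the next node.
        node.paused = False
        log.append(f"{node.name}: resumed")
    # 4. Post-update check: every node must now report the target build.
    if any(n.build != target_build for n in nodes):
        raise RuntimeError("cluster did not reach the target state")
    return log


cluster = [Node("node1", ["vm-a"]), Node("node2", ["vm-b"]), Node("node3")]
for line in rolling_update(cluster, "next-kgs"):
    print(line)
```

Because only one node is paused at any moment, every VM always has a running host, which is the property that lets the update proceed without interrupting workloads.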
Seeing is believing
The best way to summarize the benefits of our evolved LCM approach is with a demo. Experience for yourself how stress-free LCM can be in this short video vignette.
Resources
We have tons of great content to help you deep-dive into Dell APEX Cloud Platform for Microsoft Azure powered by Dell APEX Cloud Platform Foundation Software.
- InfoHub (White Papers, Blogs, Interactive Journey, and more) – https://infohub.delltechnologies.com/t/cloud-platforms/
- YouTube playlist with educational and demo videos – https://www.youtube.com/playlist?list=PL2nlzNk2-VMEkNM7E8m0ia_lLHWlOuT5h
- Main product page with spec sheets, solution briefs, infographics, and other great collateral – https://www.dell.com/azure
- Dell Support site with administrator guides – https://www.dell.com/support/home/en/product-support/product/apex-cloud-pf-ms-azure/docs
And as always, please reach out to your Dell account team if you would like to have more in-depth discussions about the Dell APEX Cloud Platforms family. If you don’t currently have a Dell contact, we’re here to help on our corporate website.
Author: Michael Lamia, Engineering Technologist at Dell Technologies
Follow me on Twitter: @Evolving_Techie
LinkedIn: https://www.linkedin.com/in/michaellamia/
Email: michael.lamia@dell.com
Related Blog Posts
Experts Recommend Automation for a Healthier Lifestyle
Wed, 20 Oct 2021 19:59:25 -0000
Like any good techie, I can get a little obsessed with gadgets that improve my quality of life. Take, for example, my recent discovery of wearable technology that eases the symptoms of motion sickness. For most of my life, I’ve had to take over-the-counter or prescription medicine when boating, flying, and going on road trips. Then, I stumbled across a device that I could wear around my wrist that promised to solve the problem without the side effects. Hesitantly, I bought the device and asked a friend to drive like a maniac around town while I sat in the back seat. It actually worked – no headache, no nausea, and no grogginess from meds! Needless to say, I never leave home without my trusty gizmo to keep motion sickness at bay.
Throughout my career in managing IT infrastructure, stress has affected my quality of life almost as much as motion sickness. There is one responsibility that has always caused more angst than anything else: lifecycle management (LCM). To narrow that down a bit, I’m specifically talking about patching and updating IT systems under my control. I have sometimes been derelict in my duties because of annoying manual steps that distract me from working on the fun, highly visible projects. It’s these manual steps that can cause the dreaded DU/DL (data unavailable or data loss) to rear its ugly head. Can you say insomnia?
Innovative technology to the rescue once again! While creating a demo video last year for our Dell EMC OpenManage Integration with Microsoft Windows Admin Center (OMIMSWAC), I was blown away by how easy we made the BIOS, firmware, and driver updates on clusters. The video did a pretty good job of showing the power of the Cluster-Aware Updating (CAU) feature, but it didn't go far enough. I needed to quantify its full potential to change an IT professional's life by pitting OMIMSWAC's automated CAU approach against a manual, node-based approach. I captured the results of the bake off in Dell EMC HCI Solutions for Microsoft Windows Server: Lifecycle Management Approach Comparison.
Automation Triumphs!
For this white paper to really stand the test of time, I knew I had to be careful to make an apples-to-apples comparison. First, I referred to HCI Operations Guide—Managing and Monitoring the Solution Infrastructure Life Cycle, which detailed the hardware updating procedures for both the CAU and node-based approaches. Then, I built a 4-node Dell EMC HCI Solutions for Windows Server 2019 cluster, performed both update scenarios, and recorded the task durations. We all know that automation is king, but I didn't expect the final tally to be quite this good:
- The automated approach reduced the number of steps in the process by 82%.
- The automated approach required 90% less of my focused attention. In other words, I was able to attend to other duties while the updates were installing.
- If I was in a production environment, the maintenance window approved by the change control board would have been cut in half.
- The automated process left almost no opportunity for human error.
As you can see from the following charts taken from the paper, these numbers only improved as I extrapolated them out to the maximum Windows Server HCI cluster size of 16 nodes.
I thought these results were too good to be true, so I checked my steps about 10 times. In fact, I even debated with my Marketing and Product Management counterparts about sharing these claims with the public! I could hear our customers saying, “Oh, yeah, right! These are just marketecture hero numbers.” But in this case, I collected the hard data myself. I am still confident that these results will stand up to any scrutiny. This is reality – not dreamland!
Just when I thought it couldn’t get any better
So why am I blogging about a project I did last year? Just when I thought the testing results in the white paper couldn't possibly get any better, Dell EMC Integrated System for Microsoft Azure Stack HCI came along. Azure Stack HCI is Microsoft's purpose-built operating system delivered as an Azure service. The current release at the time of writing was Azure Stack HCI, version 20H2. Our Solution Brief provides a great overview of our all-in-one validated HCI system, which delivers efficient operations, flexible consumption models, and end-to-end enterprise support and services. But what I'm most excited about are two lifecycle management enhancements, 1-click full stack LCM and Kernel Soft Reboot, that will put an end to the old adage, "If it looks too good to be true, it probably is."
Let’s invite OS updates to the party
OMIMSWAC was at version 1.1 when I did my testing last year. In that version, the CAU feature focused on the hardware: BIOS, firmware, and drivers. In OMIMSWAC v2.0, we developed an exclusive snap-in to Microsoft's Failover Cluster Tool Extension to create 1-click full stack LCM. Available only for clusters running Azure Stack HCI, this simple workflow in Windows Admin Center automates not only the hardware updates but also the operating system updates. How do I see this feature lowering my blood pressure?
- Applying the OS and hardware updates can typically require multiple server reboots. With 1-click full stack LCM, reboots are delayed until all updates are installed. A single reboot per node in the cluster results in greater time savings and shorter maintenance windows.
- I won’t have to use multiple tools to patch different aspects of my infrastructure. The more I can consolidate the number of management tools in my environment, the better.
- A simple, guided workflow that tightly integrates the Microsoft extension and OMIMSWAC snap-in ensures that I won’t miss any steps and provides one view to monitor update progress.
- The OMIMSWAC snap-in provides necessary node validation at the beginning of the hardware updates phase of the workflow. These checks verify that my cluster is running validated AX nodes from Dell Technologies and that all the nodes are homogeneous. This gives me peace of mind knowing that my updates will be applied successfully. I can also rest assured that there will be no interruption to the workloads running in my VMs and containers since this feature leverages CAU.
- The hardware updates leverage the Microsoft HCI solution catalog from Dell Technologies. Each BIOS, firmware, and driver in this catalog is validated by our engineering team to optimize the Azure Stack HCI experience.
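The reboot savings in the first bullet above come down to simple counting. The Python sketch below compares reboots when every restart-requiring update reboots the node immediately versus when restarts are deferred until all updates are installed. Which updates need a restart is an assumption supplied by the caller; the example cluster is hypothetical and not measured data.

```python
from __future__ import annotations


def reboots_required(updates_per_node: list[list[bool]], defer_reboots: bool) -> int:
    """Count node reboots for one update run.

    Each inner list describes one node; each boolean says whether that update
    needs a restart (an input assumption, not data from the white paper).
    """
    total = 0
    for node_updates in updates_per_node:
        if defer_reboots:
            # Deferred: at most one reboot per node, after all updates are installed.
            total += 1 if any(node_updates) else 0
        else:
            # Immediate: every restart-requiring update triggers its own reboot.
            total += sum(node_updates)
    return total


# Hypothetical 4-node cluster: OS, BIOS, and firmware updates each need a restart; a fourth update does not.
cluster = [[True, True, True, False] for _ in range(4)]
print(reboots_required(cluster, defer_reboots=False))  # 12
print(reboots_required(cluster, defer_reboots=True))   # 4
```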
The following screen shots were taken from the full stack CAU workflow. The first step indicates which OS updates are available for the cluster nodes.
Node validation is performed first before moving forward with hardware updates.
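A minimal sketch of the kind of node validation described earlier, assuming each node reports a name and a model: it flags nodes that are not on an allow-list of validated AX models and flags clusters whose nodes are not all the same model. The allow-list contents, the dictionary fields, and the use of the model name alone as the homogeneity test are assumptions for illustration; the actual OMIMSWAC checks may inspect much more.

```python
from __future__ import annotations

# Hypothetical allow-list for illustration; not the official list of validated nodes.
VALIDATED_AX_MODELS = {"AX-650", "AX-750"}


def validate_nodes(nodes: list[dict[str, str]]) -> list[str]:
    """Return a list of validation problems; an empty list means the checks passed."""
    problems = []
    for node in nodes:
        if node["model"] not in VALIDATED_AX_MODELS:
            problems.append(f"{node['name']}: {node['model']} is not a validated AX node")
    # Simplified homogeneity check: every node must report the same model.
    if len({node["model"] for node in nodes}) > 1:
        problems.append("cluster nodes are not homogeneous")
    return problems


cluster = [{"name": "node1", "model": "AX-650"}, {"name": "node2", "model": "AX-750"}]
for problem in validate_nodes(cluster) or ["all nodes passed validation"]:
    print(problem)
```

Returning every problem rather than stopping at the first one mirrors the idea of a report that an administrator reviews before the hardware updates begin.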
If the Windows Admin Center host is connected to the Internet, the online update source approach obtains all the systems management utilities and the engineering validated solution catalog automatically. If operating in an edge or disconnected environment, the solution catalog can be created with Dell EMC Repository Manager and placed on a file server share accessible from the cluster nodes.
The following image shows a generated compliance report. All non-compliant components are selected by default for updating. After this point, all the OS and non-compliant hardware components will be updated together with only a single reboot per node in the cluster and with no impact to running workloads.
Life is too short to wait for server reboots
Speaking of reboots, Kernel Soft Reboot (KSR) is a new feature coming in Azure Stack HCI, version 21H2 that also has the potential to make my white paper claims even more jaw dropping. KSR will give me the ability to perform a “software-only restart” on my servers – sparing me from watching the paint dry during those long physical server reboots. Initially, the types of updates in scope will be OS quality and security hotfixes since these don’t require BIOS/firmware initialization. Dell Technologies is also working on leveraging KSR for the infrastructure updates in a future release of OMIMSWAC.
KSR will be especially beneficial when using Microsoft's CAU extension in Windows Admin Center. The overall time savings from KSR multiply for clusters because faster restarts mean less resyncing of data after CAU resumes each cluster node. Each node should reboot at Mach speed if there are only Azure Stack HCI OS hotfixes and Dell EMC Integrated System infrastructure updates that do not require a full reboot. I will definitely be hounding my Product Managers and Engineering team to deliver KSR for infrastructure updates in our OMIMSWAC extension ASAP.
Bake off rematch
I decided to hold off on doing a new bakeoff until Azure Stack HCI, version 21H2 is released with KSR. I also want to wait until we bring the benefits of KSR to OMIMSWAC for infrastructure updates. The combination of OMIMSWAC 1-click full stack CAU and KSR will continue to make OMIMSWAC unbeatable for seamless lifecycle management. This means better outcomes for our organizations, improved blood pressure and quality of life for IT pros, and more motion-sickness-free adventure vacations. I’m also looking forward to spending more time learning exciting new technologies and less time with routine administrative tasks.
If you’d like to get hands-on with all the different features in OMIMSWAC, check out the Interactive Demo in Dell Technologies Demo Center. Also, check out my other white papers, blogs, and videos in the Dell Technologies Info Hub.
What’s New in Dell APEX Cloud Platform for Microsoft Azure: Dell Software Defined Storage with Dell PowerFlex
Tue, 23 Apr 2024 17:10:13 -0000
A few weeks ago, Dell Technologies announced a new release of Dell APEX Cloud Platform for Microsoft Azure that introduced a multitude of new features as well as support for Microsoft Azure Stack HCI version 23H2. Today, we are announcing the availability of a new architectural option for Dell APEX Cloud Platform for Microsoft Azure customers: extending the Azure Stack HCI storage fabric to Dell PowerFlex software-defined storage.
Dell APEX Cloud Platform for Microsoft Azure has an architecture designed around common building blocks for compute, software-defined storage, and automated management and operations.
Figure 1. Dell APEX Cloud Platforms common infrastructure
The new capability offers customers the unique flexibility to use Microsoft Storage Spaces Direct (S2D) alongside Dell PowerFlex, Dell Technologies' enterprise-class software-defined storage option. APEX Cloud Platform for Azure is the only offer in the Microsoft Premier Solutions for Azure Stack HCI category that supports linear scaling of storage independently from Azure Stack HCI resources. APEX Cloud Platform for Azure with PowerFlex uses PowerFlex storage nodes for block storage. Any of the Dell PowerFlex product variants (rack, appliance, or custom node) can be used to build the solution. PowerFlex provides a multitude of essential features:
- Ability to host a wide range of business workloads, including:
  - Mission-critical databases and applications that require high transactional performance and low latency
  - Applications that require consistent I/O performance, such as those used for streaming, ingestion, and reporting transactions within AI and data analytics environments
- Highly available platform at six 9s (99.9999%), which translates to just 31.5 seconds of platform downtime a year. On top of that resiliency, the platform incorporates exceptionally fast rebuild and rebalance operations.
- Modular scale-out architecture, built to support exponential data growth
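The six-nines figure in the list above is easy to verify: unavailability is 1 - 0.999999 = 10^-6, and a 365-day year has 31,536,000 seconds, so expected downtime is about 31.5 seconds a year. The short Python sketch below generalizes that arithmetic to any number of nines; it ignores leap years and treats availability as a simple annual average.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds; leap years ignored


def annual_downtime_seconds(nines: int) -> float:
    """Expected downtime per year for an availability of `nines` nines (e.g. 6 -> 99.9999%)."""
    unavailability = 10.0 ** -nines
    return SECONDS_PER_YEAR * unavailability


for nines in range(3, 7):
    print(f"{nines} nines: {annual_downtime_seconds(nines):,.1f} seconds of downtime per year")
```

Each additional nine cuts the downtime budget by a factor of ten, which is why the jump from five to six nines matters for mission-critical workloads.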
This new storage option, designed to coexist with Microsoft S2D rather than replace it, has been tested in a proof of concept by both Dell Technologies and Microsoft engineering teams, ensuring that the solution supports the use cases and features described previously.
“As we continue to innovate and empower our customers with cutting-edge solutions, the partnership with Dell and Microsoft is paramount. With the Dell APEX Cloud Platform for Microsoft Azure, customers gain great flexibility and simplicity by leveraging Microsoft Storage Spaces Direct alongside Dell’s enterprise-class software defined block storage. By integrating Microsoft Azure Stack HCI with Dell software-defined block storage, this solution gives businesses the autonomy to deploy workloads across on-premises and in the Azure cloud.” – Meena Gowdar, Group Principal PM Manager, Azure Edge and Platform, at Microsoft.
The architecture of the tested solution is illustrated in the following figure:
Figure 2. Dell APEX Cloud Platform for Microsoft Azure with Dell PowerFlex high-level architecture
As you can see in Figure 2, the components of a Microsoft S2D-only option overlap completely with the option integrating Dell PowerFlex custom nodes.
This architecture has been designed with the following principles in mind:
- High availability – designed to avoid single points of failure in the compute, storage, and networking domains
- Scalability – enabling you to size the infrastructure accordingly to meet initial deployment demands while also allowing infrastructure assets to grow as demand scales up
- Manageability – leveraging a set of familiar tools to manage the environment
- Simplicity – providing a seamless experience for presenting this new external storage to Azure Stack HCI for IT administrators already familiar with S2D
Tested configuration
The Dell Technologies engineering team performed the testing and validation of the PowerFlex storage configuration using the following building blocks:
- Eight MC-760 HCI nodes
- Eight Dell PowerFlex R760 custom nodes
This was deployed and connected as shown in the following figure:
Figure 3. Dell SDS validated configuration
Note: A fully supported Dell Validated Design will be available at a later date, providing a more complete set of guidance for configuration, deployment, and design.
To learn more, read the detailed architectural white paper, Enabling Mission Critical Workloads on APEX Cloud Platform for Azure with Dell PowerFlex.
For more information on Dell APEX Cloud Platform for Microsoft Azure, please check the Resources section.
Resources
- Enabling Mission Critical Workloads on APEX Cloud Platform for Azure with Dell PowerFlex
- YouTube Playlists about Dell APEX Cloud Platform for Microsoft Azure:
- White papers, blogs, videos, and interactive demos about APEX Cloud Platforms
- Dell APEX Cloud Platform for Microsoft Azure main product page
- Dell APEX Cloud Platform for Microsoft Azure Support site with official user manuals and guides
Author: William Leslie, Director, Cloud Platform Technical Marketing
X: @williamleslie8