Ignacio Borrero

Senior Principal Engineering Technologist at Dell Technologies



Linkedin: https://www.linkedin.com/in/inakiborrero/           


X (formerly Twitter): https://twitter.com/virtualpeli


VMware VxBlock CI

Dell EMC VxBlock 1000: Some history and What’s New

Ignacio Borrero

Wed, 24 Apr 2024 12:57:21 -0000


First, the “some history” part! Converged Infrastructure (CI) is not a new concept; it has been with us for more than 10 years. Hey, one could even consider us the “inventors” of CI, going back to when we publicly announced it on November 3, 2009 (Press Release).

Figure 1. EMC Joe Tucci (center) unveils the Virtual Computing Environment coalition with VMware's Paul Maritz (left) and Cisco's John Chambers (right).

Nowadays the CI concept is well understood, but in 2009 it was groundbreaking because this approach had never been taken before.

“All datacenter requirements in just one system? How could that be possible?”

Those were the days of separated, disconnected, and siloed domains (compute, storage, networks) and CI was a new disruptive technology solution that would require a complete transformation in how IT would architect and consume datacenter infrastructure.

To better understand how Dell Technologies thinks about CI, then and now, consider how we define our end-to-end engineered, turnkey CI system:

VxBlock 1000, Industry-leading Converged Infrastructure, simplifies all aspects of IT by seamlessly integrating all the compute, network, storage and data protection and cloud management technologies you need into one engineered system. It is an all-in-one, “data center in a box.” You can offload the complexities and risks associated with managing enterprise-grade data center infrastructure so that your IT teams can confidently focus on higher-value activities. (from Top Reasons Why Organizations Choose VxBlock 1000 Converged Infrastructure).

For those new to VxBlock 1000, here are some of the most important values VxBlock has provided:

  • High availability & data protection
    • No single point of failure
    • High availability in all components, fault tolerance as an option
    • Broad data protection suite
  • High scalability & consistent performance
    • 100+ PB storage
    • 1000+ blade & rack servers
    • Scale-up & scale-out, symmetrically or asymmetrically
    • NVMe end-to-end
  • Rich data services & highest efficiency
    • In-line, all-the-time services
    • Compression, de-dupe, replication, encryption, copy data management & machine learning
  • Cloud operating model & converged management
    • Tight VMware vRealize Integration
    • Consistent tool set across hybrid cloud

And what has happened during this 10+ year period?

Many things. Many milestones. Many systems sold. Many successful customer stories and projects that have led VxBlock to a very effective and consistent $1 billion annual run rate business just four years into its existence. Check the diagram below to reflect on some of the key milestones VxBlock and Dell Technologies CI as a whole have delivered during this decade.

Figure 2. VxBlock 1000 one-decade journey

Today, with more than 4500 systems installed in over 100 countries, VxBlock 1000 keeps leading the way, innovating in the CI arena in four key areas. These make up the second part of this blog post, “What’s New”:

  • Flexible technology choices
  • Converged management and automation
  • Life cycle management
  • Future proof design and support

Flexible technology choices

VxBlock System 1000 gives you a choice of industry-leading technologies to meet the needs of all your different workloads, ranging from mission-critical, general purpose (virtualized or not), Artificial Intelligence/Machine Learning, End User Computing/Virtual Desktops… you name it!

Mix and match powerful Dell EMC storage and data protection options, Cisco UCS blade and rack servers, Cisco LAN and SAN networking, and VMware virtualization and cloud management. For more details on infrastructure see VxBlock 1000 data sheet and specs.

Since VxBlock 1000 is not just a reference architecture or a bill of materials, it eliminates the traditional risks associated with “Do It Yourself” approaches. It’s a fully integrated system that is engineered, manufactured, managed, supported, and sustained as one product, delivering a turnkey experience. Dell Technologies validates interoperability of components and provides a predictable system maintenance process that improves availability and productivity.

Converged management and automation

VxBlock 1000 leverages its deep VMware integration to simplify automation of everything from daily infrastructure provisioning tasks to delivery of IaaS and SaaS. At the foundation is VxBlock Central software that provides a single unified interface and access point for converged infrastructure operations.

Figure 3. VxBlock Central, a single pane of glass for management, automation, and LCM

VxBlock Central software dramatically simplifies daily administration by providing enhanced system-level awareness, automation and analytics, including launch points to:

  • VMware vRealize Orchestrator (vRO) with workflows for automating daily operational tasks
  • vRealize Operations (vROps) for deep VxBlock analytics and simplified capacity management

As customers place workloads on top of VxBlock 1000, VxBlock Central helps to provide and maintain these services by managing the infrastructure underneath. See here for more great info about VxBlock Central Workflow Automation and the 40+ workflows available in the Workflow Automation Library.

Life cycle management

Dell EMC CloudIQ for VxBlock features next-generation lifecycle management (LCM) that enables IT teams to more flexibly plan ahead and control converged hardware lifecycle, further reducing risk with proactive SaaS-based insights. You gain granular control over hardware inventory, milestones, support interoperability, and upgrade scenarios.

Future proof design and support

VxBlock 1000 is built with a perpetual design, meaning it will ensure that your system stays ready to support the introduction of next-generation technologies within any of the fundamental domains of the system, whether storage, compute, or network. You can address increased performance and scalability requirements while maximizing the return on your system investment.

Dell Technologies delivers fully integrated 24/7 support with a single call. There’s never any finger-pointing between vendors. You can always rely on our fully cross-trained team for a fast resolution to any problem. Our portfolio of services (including deployment services, migration services, and residency services) accelerates deployment and integration into your IT environment. It also minimizes downtime by ensuring your software and hardware remain up to date throughout the product lifecycle.

Wrapping up

One decade ago, Dell Technologies defined the foundations for CI and created a platform that has evolved to what today is VxBlock 1000. This system (compute, network, storage and management layer) is created (engineered and manufactured), maintained (single management and support), and sustained (in ongoing certified code upgrades) by Dell EMC during its entire journey. Customers simply take the keys of the car and drive.

Ignacio Borrero - LinkedIn, Twitter: @virtualpeli


Azure APEX Cloud Platforms ACP ACP for Azure Monitoring

Monitoring the Dell APEX Cloud Platform for Microsoft Azure with Azure Insights

Ignacio Borrero

Wed, 06 Mar 2024 15:29:18 -0000


Some background

In September 2023, we officially released Dell APEX Cloud Platform for Microsoft Azure, the first offer in the market for Premier Solutions for Microsoft Azure Stack HCI.

Collaboratively built with Microsoft, this new platform extends and optimizes Azure Hybrid Cloud to on-premises, delivering three fundamental benefits:

  • Simplifying deployment and operations
  • Accelerating application modernization
  • Optimizing workload placement

This figure shows the Dell APEX Cloud Platform for Microsoft Azure architecture, with MC nodes for the infrastructure layer, Microsoft SDDC layer next to it, and Azure Portal on top with our integrations for Windows Admin Center and Azure Arc.

Figure 1. Dell APEX Cloud Platform for Microsoft Azure Architecture

The innovation at Dell Technologies never stops. We are constantly developing and improving our products, and we have just launched our first update to the platform. Briefly, this release introduces and enhances:

  • New Features:
    • Azure Stack HCI, version 23H2
    • Single-node expansion to a 2-node cluster
    • Convert 2-node switchless storage network cluster to 2-node switched storage
  • ACP foundation software improvements:
    • Day 0/1 Operations (Automated cluster deployment up to 16 nodes, JSON file upload, Day 1 RESTful APIs)
    • Day 2 Operations (GUI for drive add/replace, GUI for node repair/replace)
  • Integrations:
    • Serviceability (Add ESE/RSC logs to support log bundle collection)
    • Azure Update Manager
    • Event Monitoring for Dell APEX Cloud Platform for Microsoft Azure

Check out those updates in greater detail in these blogs: What's New with the Dell APEX Cloud Platform for Microsoft Azure March 2024 Release and Dell Technologies First to Deliver Azure Stack HCI 23H2.

In this blog, we want to put the spotlight on one particularly significant, useful, and easy to consume new capability – Event Monitoring for Dell APEX Cloud Platform for Microsoft Azure.

Event Monitoring for Dell APEX Cloud Platform for Microsoft Azure

Dell APEX Cloud Platform for Microsoft Azure seamlessly integrates with Microsoft’s Azure Portal, providing the ability to monitor events generated on both Dell APEX Cloud Platform for Microsoft Azure hardware and the Cloud Platform Manager VM.

This new Insights for Azure Stack HCI monitor feature allows our customers to directly visualize in Azure Portal informational event data generated by the multicloud (MC) node hardware and the Cloud Platform Manager VM using an Insights integrated workbook.

With this workbook, we are empowering users to effectively manage and optimize their clusters and, in turn, receive the benefit of accelerated issue detection and time to resolution. I know, we’re excited too.

Enabling Event Monitoring for Dell APEX Cloud Platform for Microsoft Azure: Is it difficult?

Not really. Simply follow these steps:

  • First, ensure you meet these mandatory prerequisites in your cluster:
    1. Azure Stack HCI, version 23H2 (registered and connected to Azure)
    2. Arc-enabled (Azure Monitor extension installed)
    3. Insights enabled
  • Once you have completed the prerequisites, navigate to the Insights page of your cluster in Azure portal and:
    1. Select the Event Monitoring for Dell APEX Cloud Platform for Microsoft Azure workbook
    2. Click Enable selected
    3. Click Enable to enable the workbook

Figure 2. Enabling Event Monitoring in Azure portal
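Before enabling the workbook, the prerequisites listed above can be checked as a simple pre-flight list. The following Python sketch is only an illustration: ClusterState and its fields are hypothetical names (not an Azure SDK type), and treating releases later than 23H2 as acceptable is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Hypothetical snapshot of a cluster; not an Azure SDK type."""
    os_version: str              # release label, for example "23H2"
    registered: bool             # registered and connected to Azure
    arc_monitor_extension: bool  # Arc-enabled, Azure Monitor extension installed
    insights_enabled: bool


def version_key(version: str) -> tuple[int, int]:
    """Turn a release label such as '23H2' into a sortable (year, half) pair."""
    return int(version[:2]), int(version[3:])


def missing_prerequisites(cluster: ClusterState) -> list[str]:
    """Return the prerequisites that are not yet met (empty list = ready)."""
    missing = []
    # Assumption: 23H2 or any later release satisfies the first prerequisite.
    if version_key(cluster.os_version) < version_key("23H2") or not cluster.registered:
        missing.append("Azure Stack HCI, version 23H2 (registered and connected to Azure)")
    if not cluster.arc_monitor_extension:
        missing.append("Arc-enabled (Azure Monitor extension installed)")
    if not cluster.insights_enabled:
        missing.append("Insights enabled")
    return missing


# Made-up example: a 22H2 cluster without Insights fails two checks.
for item in missing_prerequisites(ClusterState("22H2", True, True, False)):
    print("Missing:", item)
```

Returning a list of unmet items, rather than a single true/false value, makes it clear what to fix before opening the Insights page.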

Boom. Done. That was easy, and now the workbook is enabled…what is next?

What does Event Monitoring for Dell APEX Cloud Platform for Microsoft Azure look like?

Once the page refreshes, you’ll be taken to the first of the two tabs of the workbook – the Overview tab – which provides a brief description of what this workbook is and the information it can provide to its users.

This figure shows an overview of the workbook.

Figure 3. Event Monitoring for Dell APEX Cloud Platform Overview tab

The second tab in the workbook – the Health tab – presents a summary of the alerts or events that have occurred on the cluster, broken down into Warning, Critical, and Informational alerts.

The Health tab also provides a Nodes table with a high-level overview of each node for the selected time range, including which cluster it belongs to, the node name, health status, node state, uptime, and domain.

This figure shows the Health tab, with an overview of the events on top, followed by two tables - one for nodes details and another for all the alerts details.

Figure 4. Event Monitoring for APEX Cloud Platform for Microsoft Azure Health tab

A second table – the Alerts table – shows each alert in greater detail, including its corresponding node, component and subcomponent, severity level, event code, product service tag number, reported time, a short description, and even a knowledgebase article for issue diagnosis and troubleshooting guidance.

Note that you can leverage the Search bar to filter the information based on a given search term and the Time Range drop-down menu to show the events that occurred on all the MC nodes for the cluster within a specific time range.
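To make the search and time-range behavior concrete, here is a small Python sketch of the same idea applied to exported alert rows. It is not how the workbook is implemented. The Alert fields mirror some of the Alerts table columns, and the sample nodes, event codes, and descriptions are made up.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    """Hypothetical alert row; fields mirror some Alerts table columns."""
    node: str
    component: str
    severity: str        # "Critical", "Warning", or "Informational"
    event_code: str
    reported: datetime
    description: str


def filter_alerts(alerts, term="", since=None):
    """Keep alerts reported at or after `since` whose text contains `term`."""
    term = term.lower()
    selected = []
    for alert in alerts:
        if since is not None and alert.reported < since:
            continue
        haystack = " ".join(
            [alert.node, alert.component, alert.severity,
             alert.event_code, alert.description]
        ).lower()
        if term in haystack:
            selected.append(alert)
    return selected


def summarize(alerts):
    """Count alerts per severity, similar to the summary on the Health tab."""
    return Counter(alert.severity for alert in alerts)


# Made-up sample data for illustration only.
now = datetime(2024, 3, 6, 12, 0)
alerts = [
    Alert("node-1", "Storage", "Critical", "EVT-001", now - timedelta(hours=2), "Drive fault"),
    Alert("node-2", "Power", "Warning", "EVT-002", now - timedelta(days=3), "PSU redundancy lost"),
    Alert("node-1", "Network", "Informational", "EVT-003", now - timedelta(minutes=30), "Link up"),
]
recent = filter_alerts(alerts, since=now - timedelta(hours=24))
print(summarize(recent))
```

Applying the time range first and the search term second keeps the two filters independent, which matches how the Time Range menu and the Search bar can be combined.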

Conclusion

Our workbook, Event Monitoring for Dell APEX Cloud Platform for Microsoft Azure, makes real the ability to monitor events generated on both Dell APEX Cloud Platform for Microsoft Azure hardware and the Cloud Platform Manager within the Azure Portal.

This powerful integration provides a great deal of value, significantly reducing the issue detection time and time to resolution.

Thanks for reading, and stay tuned for more updates in Info Hub!

Resources

We have tons of great content to help you deep-dive into Dell APEX Cloud Platform for Microsoft Azure powered by Dell APEX Cloud Platform Foundation Software:

And as always, please reach out to your Dell Technologies account team if you would like to have more in-depth discussions about the Dell APEX Cloud Platforms family. If you don’t currently have a Dell Technologies contact, we’re here to help on our corporate website.

 

Author: Ignacio Borrero, Senior Principal Engineer, Technical Marketing Dell CI & HCI

@virtualpeli


Appendix

Concept: Dell APEX Cloud Platform for Microsoft Azure hardware
Definition: A turnkey on-premises infrastructure platform, collaboratively engineered between Dell Technologies and Microsoft to optimize Azure hybrid cloud operations. Based on multicloud (MC) nodes as the cluster(s) foundation.

Concept: Cloud Platform Manager VM
Definition: Each cluster runs the Dell APEX Cloud Platform Foundation Software in a Cloud Platform Manager VM. This software is responsible for communicating with the underlying infrastructure and integrating automation workflows into Microsoft Windows Admin Center.

Concept: Azure Workbook
Definition: A flexible canvas for data analysis and the creation of rich visual reports within the Azure portal.

 



security Azure APEX Cloud Platforms ACP

End to End Secured and Shielded Dell APEX Cloud Platform for Microsoft Azure

Ignacio Borrero

Thu, 15 Feb 2024 12:34:55 -0000


On September 26, 2023, we introduced to the market the new Dell APEX Cloud Platform for Microsoft Azure. It is the first offer for Premier Solutions for Microsoft Azure Stack HCI, a new category in the Azure Stack HCI catalog reserved for key partners with the greatest levels of engagement with Microsoft and deepest integrations into familiar Microsoft management tools.

Dell APEX Cloud Platform for Microsoft Azure is a fully integrated infrastructure platform designed to optimize Microsoft Azure hybrid cloud deployments by streamlining operations and accelerating time-to-value across on-premises, edge, and Azure cloud deployments. It greatly simplifies initial deployments and ongoing operations across the complete technology stack.


Security built in at every phase in the lifecycle

Figure 1. APEX Cloud Platform for Microsoft Azure security: The platform

Security for Dell APEX Cloud Platform for Microsoft Azure is not an afterthought, but rather an integral part of the overall platform design process that leverages our Cyber Resilient Architecture and inherits Dell’s hardened server and software design to protect, detect, and recover from cyberattacks.

Full stack lifecycle management is key to maintaining a strong security posture throughout the life of your APEX Cloud Platforms, continuously and consistently applying Dell and Microsoft updates without risks to the platform and running workloads.

Dell APEX Cloud Platform for Microsoft Azure also leverages intrinsic infrastructure security management through Dell Infrastructure Lock and Secured-core server functionalities.

  • Infrastructure Lock protects against unintentional or malicious changes to critical configuration settings in the BIOS or iDRAC. It also prevents any updates to BIOS, iDRAC, firmware, or drivers while enabled.
  • Secured-core functionality helps proactively defend against and disrupt many of the paths attackers might use to exploit a system by establishing a hardware root-of-trust, protecting firmware, and introducing virtualization-based security.

You can learn more about these platform features in this video.

 

Azure Stack HCI, Microsoft Defender for Cloud, and Azure Policy security features

Dell APEX Cloud Platform for Microsoft Azure takes full advantage of the security features that come with Azure Stack HCI:

  • Encryption and data protection
    • Data-at-rest encryption enabled with BitLocker by default
    • Self-Encrypting Drives (SED) require authentication independent of the OS
  • Always-on antivirus protection
    • Microsoft Defender Antivirus enabled by default on cluster nodes for real-time detection
    • Automatic definition updates
  • Recommended security baseline
    • Over 200 security settings enabled out-of-the-box
    • Disables legacy protocols and ciphers
    • Closely meets CIS benchmark and DISA STIG requirements
  • Reduced attack surface
    • Windows Defender Application Control (WDAC) enabled by default
    • WDAC enforces an explicit list of applications and code allowed to run

Figure 2. APEX Cloud Platform for Microsoft Azure: Microsoft integration

Microsoft Defender for Cloud and Azure Policy assess, secure, and defend Dell APEX Cloud Platform for Microsoft Azure at-scale:

  • Continuously assess -- understand your current security posture, identify and track vulnerabilities.
  • Secure -- harden connected resources and services by following customized and prioritized recommendations.
  • Defend -- detect and resolve threats to those resources and services. With prioritized security alerts, focus on what matters most and present to the right audience.

With this approach, the entire platform stack is covered – Azure Stack HCI, VMs, AKS hybrid workload cluster, and virtualized and cloud-native applications.

You can learn more about these platform features in this video.


Security Configuration Guide

If you want to go deeper and learn about all the elements that come into play to secure and shield the platform end to end, you can read our Dell APEX Cloud Platform for Microsoft Azure Security Configuration Guide, where we provide the configuration details for:

  • Product and subsystem security: authentication, authorization, network security
  • Cryptography: cryptographic modules
  • Certificate management
  • Event monitoring, auditing, and logging
  • Integrity: security updates


Conclusion

Dell APEX Cloud Platform for Microsoft Azure enhances Azure operations for edge and on-premises deployments by providing consistent management with centralized Azure tools while mitigating security and compliance risks with an intrinsic approach to security that extends Azure governance across all deployment environments.

Thanks for reading and… stay tuned for more updates in Info Hub!

 

Author: Ignacio Borrero, Senior Principal Engineer, HCI and Multicloud Technical Marketing

@virtualpeli



Azure Stack HCI Azure Stack Hub Windows Server HCI AS HCI AS Hub WS HCI

2023 Updates for Azure Stack HCI and Hub (Part I)

Ignacio Borrero

Wed, 30 Aug 2023 22:05:17 -0000


The first half of 2023 has been quite prolific for the Dell Azure Stack HCI ecosystem, bringing many important incremental updates to the platform. This article summarizes the most relevant changes in the program.

Azure Stack HCI

Dell Integrated System for Microsoft Azure Stack HCI delivers a fully productized, validated, and supported hyperconverged infrastructure solution that enables organizations to modernize their infrastructure for improved application uptime and performance, simplified management and operations, and lower total cost of ownership. The solution integrates the software-defined compute, storage, and networking features of Microsoft Azure Stack HCI with AX nodes from Dell to offer the high-performance, scalable, and secure foundation needed for a software-defined infrastructure. 

With Azure Arc, we can now unlock new hybrid scenarios for customers by extending Azure services and management to our HCI infrastructure. This allows customers to build, operate, and manage all their resources for traditional, cloud-native, and distributed edge applications in a consistent way across the entire IT estate.

What’s new with Azure Stack HCI?

A lot. There have been so many updates on the Azure Stack HCI front that it is difficult to detail all of them in a single blog. So, let’s focus on the most important ones.

Azure Stack HCI software and hardware updates

From a software and hardware perspective, the biggest change during the first half of 2023 was the introduction of Azure Stack HCI, version 22H2 (factory install and field support). The most important features in this release are Network ATC, GPU partitioning (GPU-P), and security improvements.

  • Network ATC simplifies the virtual network configuration by leveraging intent-based network deployment and incorporating Microsoft best practices by default. It provides the following advantages over manual deployment:
    • Reduces network configuration deployment time, complexity, and incorrect input errors
    • Uses the latest Microsoft validated network configuration best practices
    • Ensures configuration consistency across all nodes in the cluster
    • Eliminates configuration drift, with periodic consistency checks every 15 minutes
  • GPU-P allows sharing a physical GPU device among several VMs. By leveraging single root I/O virtualization (SR-IOV), GPU-P provides VMs with a dedicated and isolated fractional part of the physical GPU. The obvious advantage of GPU-P is that it enables enterprise-wide utilization of highly valuable and limited GPU resources.
  • Azure Stack HCI OS 22H2 security has been improved with more than 200 security settings enabled by default within the OS (“Secured-by-default”), enabling customers to closely meet Center for Internet Security (CIS) benchmark and Defense Information Systems Agency (DISA) Security Technical Implementation Guide (STIG) requirements for the OS. These changes also improve the security posture by disabling legacy protocols and ciphers.
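The Network ATC idea described above (declare the intended network configuration once, then periodically check every node against it and correct any drift) can be illustrated with a few lines of Python. This is a conceptual sketch only, not how Network ATC is implemented. The setting names and values are hypothetical.

```python
def find_drift(intent: dict, node_configs: dict) -> dict:
    """Return, per node, the settings whose actual value differs from the intent."""
    drift = {}
    for node, actual in node_configs.items():
        diffs = {
            key: {"expected": expected, "actual": actual.get(key)}
            for key, expected in intent.items()
            if actual.get(key) != expected
        }
        if diffs:
            drift[node] = diffs
    return drift


def remediate(intent: dict, node_configs: dict) -> None:
    """Overwrite drifted settings so every node matches the intent again."""
    for node in find_drift(intent, node_configs):
        node_configs[node].update(intent)


# Hypothetical intent and node state; node-2 has drifted on one setting.
intent = {"mtu": 9014, "rdma": "RoCEv2", "storage_vlan": 711}
nodes = {
    "node-1": dict(intent),
    "node-2": {**intent, "mtu": 1500},
}

print(find_drift(intent, nodes))
remediate(intent, nodes)
print(find_drift(intent, nodes))
```

In a real cluster this comparison would run on a schedule (the text above mentions consistency checks every 15 minutes). The sketch runs it once to keep the example short.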

From a hardware perspective, these are the most relevant additions to the AX node family:

  • More NIC options:
    • Mellanox ConnectX-6 25 GbE
    • Intel E810 100 GbE; also adds RoCEv2 support (now iWARP or RoCE)
  • More GPU options for GPU-P and Discrete Device Assignment (DDA):
    • GPU-P validation for: NVIDIA A40, A16, A2
    • DDA options: NVIDIA A30, T4

To better understand GPU-P and DDA, check this blog.

Azure Stack HCI End of Life (EOL) for several components

As the platforms mature, it is inevitable that some of the aging components are discontinued or replaced with newer versions. The most important changes on this front have been:

  • EOL for AX-640/AX-740xd nodes: The Azure Stack HCI 14G servers (AX-640 and AX-740xd) reached their EOL on March 31, 2023, and are therefore no longer available for quoting or new orders. These servers will be supported for up to seven years, until their End of Service Life (EOSL) date. Azure Stack HCI 15G AX-650/AX-750/AX-6515/AX-7525 platforms will continue to be offered to address customer demands.
  • EOL for Windows Server 2019: While Windows Server 2019 will be reaching end of sales/distribution from Dell on June 30, 2023, the product will continue to be in the Microsoft Mainstream Support life cycle until January 9, 2024. That means that our customers will be eligible for security and quality updates from Microsoft free of charge until that date. After January 9, 2024, Windows Server 2019 will enter a 5-year Extended Support life cycle that will provide our customers with security updates only. Any quality and product fixes will be available from Microsoft for a fee. It is highly recommended that customers migrate their current Windows Server 2019 workloads to Windows Server 2022 to maintain up-to-date support.
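The Windows Server 2019 support phases described above are simple date arithmetic. The short Python sketch below derives the end of Extended Support by adding five years to the mainstream end date, which is an assumption based on the "5-year" wording. The function and phase names are illustrative.

```python
from datetime import date

# Mainstream Support end date for Windows Server 2019, as stated in the text.
MAINSTREAM_END = date(2024, 1, 9)
# Assumption: Extended Support ends exactly five years after mainstream ends.
EXTENDED_END = MAINSTREAM_END.replace(year=MAINSTREAM_END.year + 5)


def support_phase(on: date) -> str:
    """Return the support phase that applies on the given day."""
    if on <= MAINSTREAM_END:
        return "mainstream"    # security and quality updates, free of charge
    if on <= EXTENDED_END:
        return "extended"      # security updates only
    return "unsupported"


print(EXTENDED_END.isoformat(), support_phase(date(2024, 4, 24)))
```

A check like this can help flag workloads that should be scheduled for migration to Windows Server 2022 before a phase change.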

Finally, we have introduced a set of short and easily digestible training videos (seven minutes each, on average) to learn everything you need to know about Azure Stack HCI, from the AX platform and Microsoft’s Azure Stack HCI operating system, to the management tools and deploy/support services.

Conclusion

It’s certainly a challenge to synthesize the last six months of incredible innovation into a brief article, but we have highlighted the most important updates: focus on learning all the new Azure Stack HCI 22H2 features and updates, keep current with the hardware updates, and… stay tuned for important announcements by the last quarter of the year: big things are coming in Part 2!!!

Thank you for reading.

Author: Ignacio Borrero, Senior Principal Engineer, Technical Marketing

@virtualpeli




Azure Stack HCI Microsoft Azure Stack HCI training

Dell and Azure Stack HCI Made Easy: the Video Series

Ignacio Borrero

Tue, 25 Apr 2023 17:05:23 -0000


It is incredible how time flies: it still feels like yesterday when, on December 10, 2020, Microsoft initially released Azure Stack HCI.

Today, Azure Stack HCI is a huge success and, in combination with Azure Arc, the foundation for any real Microsoft hybrid strategy.

But believe it or not, 850+ days later, Azure Stack HCI is still a big unknown for part of the Microsoft community. In our daily customer engagements, we keep on observing that there are knowledge gaps around the Azure Stack HCI program itself and the partner ecosystem that surrounds it.

In these circumstances, we have decided to take action and create a very short and easy-to-follow video series explaining everything you need to know about Azure Stack HCI from a technical perspective.

What you will find

This initial video training library consists of five videos, each averaging seven minutes in length. Here’s a summary of what you will discover in each of the videos:

Video: What Is Inside Azure Stack HCI

Learn the basics and fundamental components of Azure Stack HCI and get to know the Dell Integrated System for Microsoft Azure Stack HCI platform.

Video: AX nodes

Meet the AX node platform and take the Dell Integrated System for Microsoft Azure Stack HCI route to deliver consistent Azure Stack HCI deployments.

Video: Topology and networking

Explore topology and network deployment options for Dell Integrated System for Microsoft Azure Stack HCI. Make good Azure Stack HCI environments even better with the Dell PowerSwitch family.


Video: Local management

Learn about Azure Stack HCI local management with Windows Admin Center and OpenManage. This is the perfect combination for quick and easy controlled local deployments…and a solid foundation for true hybrid management.

Video: Services and support

Learn about end-to-end deployment and support for the Dell Integrated System for Microsoft Azure Stack HCI platform with ProDeploy and ProSupport services.

Will there be more?

Absolutely. 

We are already working on the next series, where we’ll be covering other important topics that are beyond the scope of this initial launch (such as best practices and stretched clusters).

Conclusion

There is no doubt that Azure Stack HCI is a very hot topic. In fact, it is the key foundational element that enables a true Microsoft hybrid strategy by delivering on-premises infrastructure fully integrated with Azure. This video series explains the different elements that make this possible.

All videos in the series are important, none should be skipped… but if there is one not to be missed, please, go for Dell Azure Stack HCI: Local Management. This topic is actually the hook for the next release (Hint -> Hybrid management is the next big thing!). 

Thanks for reading and… stay tuned for additional videos on the Info Hub!

Author: Ignacio Borrero, Senior Principal Engineer, Technical Marketing Dell CI & HCI

@virtualpeli



AI Azure Stack HCI machine learning GPU

GPU Acceleration for Dell Azure Stack HCI: Consistent and Performant AI/ML Workloads

Ignacio Borrero

Wed, 01 Feb 2023 15:50:35 -0000


The end of 2022 brought us excellent news: Dell Integrated System for Azure Stack HCI introduced full support for GPU factory install.

As a reminder, Dell Integrated System for Microsoft Azure Stack HCI is a fully integrated HCI system for hybrid cloud environments that delivers a modern, cloud-like operational experience on-premises. It is intelligently and deliberately configured with a wide range of hardware and software component options (AX nodes) to meet the requirements of nearly any use case, from the smallest remote or branch office to the most demanding business workloads.

With the introduction of GPU-capable AX nodes, now we can also support more complex and demanding AI/ML workloads.

New GPU hardware options

Not all AX nodes support GPUs. As you can see in the table below, AX-750, AX-650, and AX-7525 nodes running AS HCI 21H2 or later are the only AX node platforms to support GPU adapters.

Table 1: Intelligently designed AX node portfolio

Note: AX-640, AX-740xd, and AX-6515 platforms do not support GPUs.

The next obvious question is what GPU type and number of adapters are supported by each platform.

We have selected the following two NVIDIA adapters to start with:

  • NVIDIA Ampere A2, PCIe, 60W, 16GB GDDR6, Passive, Single Wide
  • NVIDIA Ampere A30, PCIe, 165W, 24GB HBM2, Passive, Double Wide

The following table details how many GPU adapter cards of each type are allowed in each AX node:

Table 2: AX node support for GPU adapter cards


                        AX-750     AX-650     AX-7525
NVIDIA A2               Up to 2    Up to 2    Up to 3
NVIDIA A30              Up to 2    --         Up to 3
Maximum GPU number      2          2          3
(must be same model)
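The limits in Table 2 can be encoded as a small lookup to sanity-check a planned node configuration. The Python sketch below is an illustration, not a Dell tool. It reads the "--" entry for the A30 in the AX-650 as "not supported", which is an interpretation of the table, and the function name is hypothetical.

```python
# Maximum adapters per node, transcribed from Table 2 (None = not supported).
MAX_GPUS = {
    "AX-750": {"A2": 2, "A30": 2},
    "AX-650": {"A2": 2, "A30": None},  # "--" in the table, read as unsupported
    "AX-7525": {"A2": 3, "A30": 3},
}


def validate_gpu_config(node_model: str, gpus: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the layout fits Table 2."""
    if node_model not in MAX_GPUS:
        return [f"{node_model} does not support GPUs"]
    problems = []
    if len(set(gpus)) > 1:
        problems.append("all GPUs in a node must be the same model")
    for model in sorted(set(gpus)):
        limit = MAX_GPUS[node_model].get(model)
        if limit is None:
            problems.append(f"{model} is not supported in {node_model}")
        elif gpus.count(model) > limit:
            problems.append(f"{node_model} allows up to {limit} x {model}")
    return problems


print(validate_gpu_config("AX-750", ["A2", "A2"]))
print(validate_gpu_config("AX-650", ["A30"]))
```

Node models that are absent from the lookup, such as the AX-640, AX-740xd, and AX-6515, are reported as not supporting GPUs, in line with the note under Table 1.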

Use cases

The NVIDIA A2 is the entry-level option for any server to get basic AI capabilities. It delivers versatile inferencing acceleration for deep learning, graphics, and video processing in a low-profile, low-consumption PCIe Gen 4 card.

The A2 is the perfect candidate for data center workloads with light AI demands. It especially shines in edge environments thanks to its excellent balance of form factor, performance, and power consumption, which results in lower costs.

The NVIDIA A30 is a more powerful mainstream option for the data center, typically covering scenarios that require more demanding accelerated AI performance and a broad variety of workloads:

  • AI inference at scale
  • Deep learning training
  • High-performance computing (HPC) applications
  • High-performance data analytics

Options for GPU virtualization

There are two GPU virtualization technologies in Azure Stack HCI: Discrete Device Assignment (also known as GPU pass-through) and GPU partitioning.

Discrete Device Assignment (DDA)

DDA support for Dell Integrated System for Azure Stack HCI was introduced with Azure Stack HCI OS 21H2. When leveraging DDA, GPUs are basically dedicated (no sharing), and DDA passes an entire PCIe device into a VM to provide high-performance access to the device while being able to utilize the device native drivers. The following figure shows how DDA directly reassigns the whole GPU from the host to the VM:

Figure 1: Discrete Device Assignment in action

To learn more about how to use and configure GPUs with clustered VMs with Azure Stack HCI OS 21H2, you can check Microsoft Learn and the Dell Info Hub.

GPU partitioning (GPU-P)

GPU partitioning allows you to share a physical GPU device among several VMs. By leveraging single root I/O virtualization (SR-IOV), GPU-P provides VMs with a dedicated and isolated fractional part of the physical GPU. The following figure explains this more visually:

Figure 2: GPU partitioning virtualizing 2 physical GPUs into 4 virtual vGPUs

The obvious advantage of GPU-P is that it enables enterprise-wide utilization of highly valuable and limited GPU resources.

Note these important considerations for using GPU-P:

  • Azure Stack HCI OS 22H2 or later is required.
  • Host and guest VM drivers for GPU are needed (requires a separate license from NVIDIA).
  • Not all GPUs support GPU-P; currently Dell only supports A2 (A16 coming soon).
  • We strongly recommend using Windows Admin Center for GPU-P to avoid mistakes.
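
The prerequisites for DDA and GPU-P described above can be summarized in a few lines of Python. This is only a sketch under stated assumptions: the function names are hypothetical, the supported-model set reflects only what this post lists (A2 for GPU-P), and driver licensing is not modeled.

```python
import re


def parse_release(release: str) -> tuple[int, int]:
    """Turn a release label such as '22H2' into a sortable (year, half) tuple."""
    match = re.fullmatch(r"(\d{2})H([12])", release)
    if match is None:
        raise ValueError(f"unrecognized release label: {release!r}")
    return int(match.group(1)), int(match.group(2))


# GPU models this post lists as supported for GPU-P at the time of writing.
GPU_P_MODELS = {"A2"}


def virtualization_options(os_release: str, gpu_model: str) -> list[str]:
    """List the GPU virtualization options available for an OS release and GPU."""
    release = parse_release(os_release)
    options = []
    if release >= parse_release("21H2"):
        options.append("DDA")  # whole-GPU pass-through, introduced with 21H2
    if release >= parse_release("22H2") and gpu_model in GPU_P_MODELS:
        options.append("GPU-P")  # partitioning requires 22H2 or later
    return options
```

With these rules, an A2 on 22H2 can use either DDA or GPU-P, whereas an A30 on the same release is limited to DDA.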

 You’re probably wondering about Azure Virtual Desktop on Azure Stack HCI (still in preview) and GPU-P. We have a Dell Validated Design today and will be refreshing it to include GPU-P during this calendar year.  

To learn more about how to use and configure GPU-P with clustered VMs with Azure Stack HCI OS 22H2, you can check Microsoft Learn and the Dell Info Hub (Dell documentation coming soon).

Timeline

As of today, Dell Integrated System for Microsoft Azure Stack HCI only provides support for Azure Stack HCI OS 21H2 and DDA.

Full support for Azure Stack HCI OS 22H2 and GPU-P is around the corner, by the end of the first quarter, 2023.

Conclusion

The wait is finally over: our Azure Stack HCI environments can now deliver the GPU power that highly demanding AI/ML workloads require.

Today, DDA provides fully dedicated GPU pass-through utilization, whereas with GPU-P we will very soon have the choice of providing a more granular GPU consumption model.

Thanks for reading, and stay tuned for the ever-expanding list of validated GPUs that will unlock and enhance even more use cases and workloads!

 

Author: Ignacio Borrero, Senior Principal Engineer, Technical Marketing Dell CI & HCI

@virtualpeli


Azure Stack HCI Microsoft hybrid cloud Windows Admin Center systems management

Dell Hybrid Management: Azure Policies for HCI Compliance and Remediation

Ignacio Borrero

Mon, 30 May 2022 17:05:47 -0000


Companies that take an “Azure hybrid first” strategy are making a wise and future-proof decision by consolidating the advantages of both worlds—public and private—into a single entity.

Sounds like the perfect plan, but a key consideration for these environments to work together seamlessly is true hybrid configuration consistency.

A major challenge in the past was having the same level of configuration rules concurrently in Azure and on-premises. This required different tools and a lot of costly manual interventions (subject to human error) that resulted, usually, in potential risks caused by configuration drift. 

But those days are over.

We are happy to introduce Dell HCI Configuration Profile (HCP) Policies for Azure, a revolutionary and crucial differentiator for Azure hybrid configuration compliance.

 

Figure 1: Dell Hybrid Management with Windows Admin Center (local) and Azure/Azure Arc (public)

So, what is it? How does it work? What value does it provide?

Dell HCP Policies for Azure is our latest development for Dell OpenManage Integration with Windows Admin Center (OMIMSWAC). With it, we can now integrate Dell HCP policy definitions into Azure Policy. Dell HCP is the specification that captures the best practices and recommended configurations for Azure Stack HCI and Windows-based HCI solutions from Dell to achieve better resiliency and performance with Dell HCI solutions.

The HCP Policies feature functions at the cluster level and is supported for clusters that are running Azure Stack HCI OS (21H2) and pre-enabled for Windows Server 2022 clusters.

IT admins can manage Azure Stack HCI environments through two different approaches:

  • At-scale through the Azure portal using the Azure Arc portfolio of technologies
  • Locally on-premises using Windows Admin Center

 

Figure 2: Dell HCP Policies for Azure - onboarding Dell HCI Configuration Profile

By using a single Dell HCP policy definition, both options provide a seamless and consistent management experience.

Running Check Compliance automatically compares the recommended rules packaged together in the Dell HCP policy definitions with the settings on the running integrated system. These rules include configurations that address the hardware, cluster symmetry, cluster operations, and security.
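
Conceptually, a compliance check of this kind is a comparison between recommended and actual settings. The following Python sketch illustrates the idea only; it is not how OMIMSWAC or Azure Policy is implemented, and the function name, category labels, and setting names in the usage example are hypothetical.

```python
def check_compliance(
    recommended: dict[str, dict[str, str]],
    actual: dict[str, dict[str, str]],
) -> dict[str, list[str]]:
    """Compare recommended settings with actual ones, grouped by category.

    Returns a mapping of category -> list of drift descriptions.
    A category with an empty list is compliant.
    """
    report = {}
    for category, rules in recommended.items():
        current = actual.get(category, {})
        drift = []
        for setting, expected in rules.items():
            found = current.get(setting)
            if found != expected:
                drift.append(f"{setting}: expected {expected!r}, found {found!r}")
        report[category] = drift
    return report


# Hypothetical rule and setting names, for illustration only.
recommended = {
    "hardware": {"system_profile": "performance"},
    "security": {"secure_boot": "enabled"},
}
actual = {
    "hardware": {"system_profile": "performance"},
    "security": {"secure_boot": "disabled"},
}
report = check_compliance(recommended, actual)
```

Here `report["hardware"]` is empty (compliant), while `report["security"]` contains one drift entry for the hypothetical `secure_boot` setting.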


Figure 3: Dell HCP Policies for Azure - HCP policy compliance

Dell HCP Policy Summary provides the compliance status of four policy categories:

  • Dell Infrastructure Lock Policy - Indicates enhanced security compliance to protect against unintentional changes to infrastructure
  • Dell Hardware Configuration Policy - Indicates compliance with Dell recommended BIOS, iDRAC, firmware, and driver settings that improve cluster resiliency and performance
  • Dell Hardware Symmetry Policy - Indicates compliance with integrated-system validated components on the support matrix and best practices recommended by Dell and Microsoft
  • Dell OS Configuration Policy - Indicates compliance with Dell recommended operating system and cluster configurations

Figure 4: Dell HCP Policies for Azure - HCP Policy Summary

 

To re-align non-compliant policies with the best practices validated by Dell Engineering, our Dell HCP policy remediation integration with WAC (unique at the moment) helps to fix any compliance errors. Simply click “Fix Compliance.”

Figure 5: Dell HCP Policies for Azure - HCP policy remediation

Some fixes may require manual intervention; others can be corrected in a fully automated manner using the Cluster-Aware Updating framework.

Conclusion

The “Azure hybrid first” strategy is real today. Dell HCP Policies for Azure provides a single policy definition with Dell HCI Configuration Profile and a consistent hybrid management experience, whether you use Dell OMIMSWAC for local management or the Azure portal for management at scale.

With Dell HCP Policies for Azure, policy compliance and remediation are fully covered for Azure and Azure Stack HCI hybrid environments.

You can see Dell HCP Policies for Azure in action at the interactive Dell Demo Center.

Thanks for reading!


Author: Ignacio Borrero, Dell Senior Principal Engineer CI & HCI, Technical Marketing

Twitter: @virtualpeli

 


Azure Stack HCI Microsoft security

Azure Stack HCI automated and consistent protection through Secured-core and Infrastructure lock

Ignacio Borrero

Mon, 21 Feb 2022 17:45:58 -0000


Global damages related to cybercrime were predicted to reach USD 6 trillion in 2021! This staggering number highlights the very real security threat faced not only by big companies, but also by small and medium businesses across all industries.

Cyber attacks are becoming more sophisticated every day and the attack surface is constantly increasing, now even including the firmware and BIOS on servers.

 

Figure 1: Cybercrime figures for 2021 

However, this isn’t all bad news, as there are now two new technologies (and some secret sauce) that we can leverage to proactively defend against unauthorized access and attacks to our Azure Stack HCI environments, namely: 

  1. Secured-core Server
  2. Infrastructure lock

Let’s briefly discuss each of them.

Secured-core is a set of Microsoft security features that leverage the latest security advances in Intel and AMD hardware. It is based on the following three pillars:

  • Hardware root-of-trust: requires TPM 2.0 v3 and verifies that firmware is validly signed at boot time to prevent tampering attacks
  • Firmware protection: uses Dynamic Root of Trust for Measurement (DRTM) technology to isolate the firmware and limit the impact of vulnerabilities
  • Virtualization-based security (VBS): in conjunction with hypervisor-protected code integrity (HVCI), VBS provides granular isolation of privileged parts of the OS (like the kernel) to prevent attacks and exfiltration of data

Infrastructure lock provides robust protection against unauthorized access to resources and data by preventing unintended changes to both hardware configuration and firmware updates.

When the infrastructure is locked, any attempt to change the system configuration is blocked and an error message is displayed. 

Now that we understand what these technologies provide, one might have a few more questions, such as:

  • How do I install these technologies?
  • Is it easy to deploy and configure?
  • Does it require a lot of manual (and perhaps error-prone) human interaction?

In short, deploying these technologies is not an easy task unless you have the right set of tools in place. 

This is when you’ll need the “secret sauce”: the Dell OpenManage Integration with Microsoft Windows Admin Center (OMIMSWAC) on top of our certified Dell Cyber-resilient Architecture, as illustrated in the following figure:

Figure 2: OMIMSWAC and Dell Cyber-resilient Architecture with AX Nodes 

As a quick reminder, Windows Admin Center (WAC) is Microsoft’s single pane of glass for all Windows management related tasks. 

Dell OMIMSWAC extensions make WAC even better by providing additional controls and management possibilities for certain features, such as Secured-core and Infrastructure lock. 

Dell Cyber Resilient Architecture 2.0 safeguards customers’ data and intellectual property with a robust, layered approach.

Since a picture is worth a thousand words, the next section will show you what WAC extensions look like and how easy and intuitive they are to play with. 

Dell OMIMSWAC Secured-core

The following figure shows our Secured-core snap-in integration inside the WAC security blade and workflow.

Figure 3: OMIMSWAC Secured-core view

 The OS Security Configuration Status and the BIOS Security Configuration Status are displayed. The BIOS Security Configuration Status is where we can set the Secured-core required BIOS settings for the entire cluster.

OS Secured-core settings are visible but cannot be altered using OMIMSWAC (you would directly use WAC for it). You can also view and manage BIOS settings for each node individually.

Figure 4: OMIMSWAC Secured-core, node view

Prior to enabling Secured-core, the cluster nodes must be updated to Azure Stack HCI, version 21H2 (or newer). For AMD Servers, the DRTM boot driver (part of the AMD Chipset driver package) must be installed. 

Dell OMIMSWAC Infrastructure lock

The following figure illustrates the Infrastructure lock snap-in integration inside the WAC security blade and workflow. Here we can enable or disable Infrastructure lock to prevent unintended changes to both hardware configuration and firmware updates.

Figure 5: OMIMSWAC Infrastructure lock

Enabling Infrastructure lock also blocks the server or cluster firmware update process using OpenManage Integration extension tool. This means a compliance report will be generated if you are running a Cluster Aware Update (CAU) operation with Infrastructure lock enabled, which will block the cluster updates. If this occurs, you will have the option to temporarily disable Infrastructure lock and have it automatically re-enabled when the CAU is complete. 
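
The "temporarily disable, then automatically re-enable" behavior follows a familiar pattern that can be sketched with a Python context manager. The `Cluster` class below is a hypothetical stand-in used purely for illustration; it is not an OMIMSWAC or Windows Admin Center API.

```python
from contextlib import contextmanager


class Cluster:
    """Hypothetical stand-in for a managed cluster; not an OMIMSWAC API."""

    def __init__(self, locked: bool = True):
        self.infrastructure_lock = locked
        self.log = []

    def apply_updates(self):
        # Updates are blocked while the lock is enabled.
        if self.infrastructure_lock:
            raise PermissionError("Infrastructure lock is enabled; updates are blocked")
        self.log.append("updates applied")


@contextmanager
def lock_temporarily_disabled(cluster: Cluster):
    """Disable the lock for the duration of the block, then restore it."""
    was_locked = cluster.infrastructure_lock
    cluster.infrastructure_lock = False
    try:
        yield cluster
    finally:
        # Restore the previous state even if the update fails midway.
        cluster.infrastructure_lock = was_locked


cluster = Cluster(locked=True)
with lock_temporarily_disabled(cluster):
    cluster.apply_updates()
```

The `finally` clause is the important design choice here: the lock returns to its previous state whether the update succeeds or raises an error.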

Conclusion

Dell understands the importance of the new security features introduced by Microsoft and has developed a programmatic approach, through OMIMSWAC and Dell’s Cyber-resilient Architecture, to consistently deliver and control these new features in each node and cluster. These features allow customers to always be secure and compliant on Azure Stack HCI environments.

Stay tuned for more updates (soon) on the compliance front. Thank you for reading this far!

Author Information

 Ignacio Borrero, Senior Principal Engineer, Technical Marketing

Twitter: @virtualpeli

References

2020 Verizon Data Breach Investigations Report

2019 Accenture Cost of Cybercrime Study

Global Ransomware Damage Costs Predicted To Reach $20 Billion (USD) By 2021 

Cybercrime To Cost The World $10.5 Trillion Annually By 2025 

The global cost of cybercrime per minute to reach $11.4 million by 2021


Azure Stack HCI Microsoft life cycle management disaster recovery stretch clustering

Azure Stack HCI Stretch Clustering: because automatic disaster recovery matters

Ignacio Borrero

Wed, 22 Sep 2021 18:17:41 -0000


If history has taught us anything, it’s that disasters are always around the corner and tend to appear in any shape or form when they’re least expected.

To overcome these circumstances, we need tools and technologies that can guarantee a return to normal operations in a secure, automatic, and timely manner.

Traditional disaster recovery (DR) processes are often complex and require a significant infrastructure investment. They are also labor intensive and prone to human error.

Since December 2020, the situation has changed. Thanks to the new release of Microsoft Azure Stack HCI, version 20H2, we can leverage the new Azure Stack HCI stretched cluster feature on Dell EMC Integrated System for Microsoft Azure Stack HCI (Azure Stack HCI).

The integrated system is based on our flexible AX nodes family as the foundation, and combines Dell Technologies full stack life cycle management with the Microsoft Azure Stack HCI operating system.

It is important to note that this technology is only available for the integrated system offering under the certified Azure Stack HCI catalog.

Azure Stack HCI stretch clustering provides an easy and automatic solution (no human interaction if desired) that assures transparent failovers of disaster-impacted production workloads to a safe secondary site.

It can also be leveraged to perform planned operations (such as entire site migration, or disaster avoidance) that, until now, required labor intensive and error prone human effort for execution.

Stretch clustering is one type of Storage Replica configuration. It allows customers to split a single cluster between two locations—rooms, buildings, cities, or regions. It provides synchronous or asynchronous replication of Storage Spaces Direct volumes to provide automatic VM failover if a site disaster occurs.

There are two different topologies:

  • Active-Passive: All the applications and workloads run on the primary (preferred) site while the infrastructure at the secondary site remains idle until a failover occurs.
  • Active-Active: There are active applications in both sites at any given time and replication occurs bidirectionally from either site. This setup tends to be a more efficient use of an organization’s investment in infrastructure because resources in both sites are being used.

Azure Stack HCI stretch clustering topologies: Active-Passive and Active-Active

 To be truly cost-effective, the best data protection strategies incorporate a combination of different technologies (deduplicated backup, archive, data replication, business continuity, and workload mobility) to deliver the right level of data protection for each business application.

The following diagram highlights the fact that only a small subset of the data holds the most valuable information. This is the sweet spot for stretch clustering.


For a real-life experience, our Dell Technologies experts put Azure Stack HCI stretched clustering to the test in the following lab setup: 

Test lab cluster network topology

 Note these key considerations regarding the lab network architecture:

  • The Storage Replica, management, and VM networks in each site were unique Layer 3 subnets. In Active Directory, we configured two sites—Bangalore (Site 1) and Chennai (Site 2)—based on these IP subnets so that the correct sites appeared in Failover Cluster Manager on configuration of the stretched cluster. No additional manual configuration of the cluster fault domains was required.
  • Average latency between the two sites was less than 5 milliseconds, which is required for synchronous replication.
  • Cluster nodes could reach a file share witness within the 200-millisecond maximum roundtrip latency requirement.
  • The subnets in both sites could reach Active Directory, DNS, and DHCP servers.
  • Software-defined networking (SDN) on a multisite cluster is not currently supported and was not used for this testing.
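
The two latency figures in the considerations above lend themselves to a quick pre-flight check. The following Python sketch assumes you have already collected latency samples by some other means; the function name and the way samples are gathered are assumptions, and only the two thresholds come from the lab notes.

```python
SYNC_REPLICATION_MAX_MS = 5.0     # average inter-site latency must stay below this
WITNESS_MAX_ROUNDTRIP_MS = 200.0  # maximum round trip to the file share witness


def check_stretch_prerequisites(
    intersite_samples_ms: list[float], witness_roundtrip_ms: float
) -> list[str]:
    """Return a list of latency issues; an empty list means both checks pass."""
    if not intersite_samples_ms:
        raise ValueError("at least one inter-site latency sample is required")
    issues = []
    average = sum(intersite_samples_ms) / len(intersite_samples_ms)
    if average >= SYNC_REPLICATION_MAX_MS:
        issues.append(
            f"average inter-site latency {average:.1f} ms is not below "
            f"{SYNC_REPLICATION_MAX_MS:.0f} ms (synchronous replication)"
        )
    if witness_roundtrip_ms > WITNESS_MAX_ROUNDTRIP_MS:
        issues.append(
            f"witness round trip {witness_roundtrip_ms:.0f} ms exceeds "
            f"{WITNESS_MAX_ROUNDTRIP_MS:.0f} ms"
        )
    return issues
```

For instance, samples of 2, 3, and 4 ms with a 120 ms witness round trip pass both checks, while an 8 ms average would flag synchronous replication.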

For all the details, see this white paper: Adding Flexibility to DR Plans with Stretch Clustering for Azure Stack HCI.

In this blog though, I only want to focus on summarizing the results we obtained in our labs for the following four scenarios:

  • Scenario 1: Unplanned node failure
  • Scenario 2: Unplanned site failure
  • Scenario 3: Planned failover
  • Scenario 4: Life cycle management

Scenario 1: Unplanned node failure

  • Simulated failure or maintenance event: Node 1 in Site 1 power-down
  • Stretched cluster expected response: Impacted VMs should failover to another local node
  • Stretched cluster actual response: In around 5 minutes, all 10 VMs in Node 1 Site 1 fully restarted in Node 2 Site 1. This is expected behavior since Site 1 has been configured as preferred site; otherwise, the active volume could have been moved to Site 2, and the VMs would have been restarted on a cluster node in Site 2.

Scenario 2: Outage in Site 1

  • Simulated failure or maintenance event: Simultaneous power-down of Nodes 1 and 2 in site 1
  • Stretched cluster expected response: Impacted VMs should failover to nodes on the secondary site
  • Stretched cluster actual response: In 25 minutes, all VMs were restarted, and the included web application was fully responsive. The volumes owned by the nodes in Site 2 remained online throughout this failure scenario. The replica volumes remained offline until Site 1 was restored to full health. Once Site 1 was back online, synchronous replication began again from the source volumes in Site 2 to their destination replica partners in Site 1.

Scenario 3: Planned failover

  • Simulated failure or maintenance event: Switch Direction operation on a volume from Windows Admin Center
  • Stretched cluster expected response: Selected VMs and workloads should transparently move to secondary site
  • Stretched cluster actual response: Within 0 to 3 mins, the application hosted by the affected VMs was reachable without service interruption (time depends on whether IP reassignment is required). First, the owner node for the volumes changed to Node 2 in Site 2, and owner node for the replica volumes changed to Node 2 in Site 1. No service interruption. At this time, the test VM was running in Site 1, but its virtual disk that resided on the volume was running in Site 2. Performance problems can result because I/O is traversing the replication links across sites. After approximately 10 minutes, a Live Migration of the test VM would occur automatically (if not manually initiated earlier) so that the VM would be on the same node as its virtual disk.

Scenario 4: Lifecycle management

  • Simulated failure or maintenance event: Update all nodes in the cluster by using Single-click Full Stack Cluster Aware Updating (CAU) in Windows Admin Center
  • Stretched cluster expected response: Stretched cluster and CAU should work seamlessly together to provide full stack cluster update without service interruption and local only workload mobility for the Live Migrated VMs
  • Stretched cluster actual response: The total process of applying the operating system and firmware updates to the stretched cluster took approximately 3 hours, and the process had no application impact. Each node was drained, and its VMs were live migrated to the other node in the same site. The intersite links between Site 1 and Site 2 were never used during update operations. In addition, the process required only a single reboot per node. This behavior was consistent throughout the update of all the nodes in the stretched cluster.

 To sum up, Azure Stack HCI Stretch Clustering has been shown to work as expected under difficult circumstances. It can easily be leveraged to cover a wide range of data protection scenarios, such as:

  • restoring your organization's IT within minutes after an unplanned event
  • transparently moving running workloads between sites to avoid incoming disasters or other planned operations
  • automatically failing over VMs and workloads of individual failed nodes

This technology can make the difference in whether a business recovers automatically after disaster strikes, making it a game changer in the automatic disaster recovery landscape.

Thank you for your time reading this blog and don’t forget to check out the full white paper!!!

 


Azure Stack HCI HCI Windows Server Microsoft Azure

Technology leap ahead: 15G Intel based Dell EMC Integrated System for Microsoft Azure Stack HCI

Ignacio Borrero

Wed, 22 Sep 2021 18:15:33 -0000


We are happy to announce the latest members of the family for our Microsoft HCI Solutions from Dell Technologies: the new AX-650 and AX-750 nodes.

If you are already familiar with our existing integrated system offering, you can directly jump to the next section. For those new to the party, keep on reading! 

 

Figure 1: Dell EMC Integrated System for Microsoft Azure Stack HCI portfolio: New AX-650 and AX-750 nodes

As with all other nodes supported by Dell EMC Integrated System for Microsoft Azure Stack HCI, the AX-650 and AX-750 nodes have been intelligently and deliberately configured with a wide range of component options to meet the requirements of nearly any use case – from the smallest remote or branch office to the most demanding database workloads.

The chassis, drive, processor, DIMM module, network adapter, and their associated BIOS, firmware, and driver versions have been carefully selected and tested by the Dell Technologies engineering team to optimize the performance and resiliency of Azure Stack HCI. Our engineering has also validated networking topologies using PowerSwitch network switches.

Arguably the most compelling aspect of our integrated system is our life cycle management capability. The Integrated Deploy and Update snap-in works with the Microsoft cluster creation extension to deliver Dell EMC HCI Configuration Profile. This Configuration Profile ensures a consistent, automated initial cluster creation experience on Day 1. The one-click full stack life cycle management snap-in for the Microsoft Cluster-Aware Updating extension allows administrators to apply updates. This seamlessly orchestrates OS, BIOS, firmware, and driver updates through a common Windows Admin Center workflow.

On top of it, Dell Technologies makes support services simple, flexible, and worry free – from installation and configuration to comprehensive, single source support. Certified deployment engineers ensure accuracy and speed, reduce risk and downtime, and free IT staff to work on those higher value priorities. Our one-stop cluster level support covers the hardware, operating system, hypervisor, and Storage Spaces Direct software, whether you purchased your license from Dell EMC or from Microsoft.

Now that we are on the same page with our integrated system…

What’s new with AX-650 and AX-750? Why are they important for our customers?

AX-650 and AX-750 are based on Intel Xeon Scalable 3rd generation Ice Lake processors that introduce big benefits in three main areas:

  • Hardware improvements
  • New features
  • Management enhancements

Hardware improvements

Customers always demand the highest levels of performance available, and our new 15G platforms, through Intel Ice Lake and its latest 10nm technology, deliver huge performance gains (compared to the previous generation) for: 

  • Processing: up to a 40 percent CPU performance increase, a 15 percent per core performance boost, and 42 percent more cores
  • Memory: 33 percent more memory channels, a 20 percent frequency boost, and a 2.66x increase in memory capacity
  • PCIe Gen4 IO acceleration: double the throughput of PCIe Gen3, 33 percent more lanes, more direct-attached Gen4 NVMe drives, and support for the latest Gen4 accelerators and GPUs

These impressive figures are a big step forward from a hardware boost perspective, but there are even more important things going on than just brute power and performance.

Our new 15G platforms lay the technology foundation for the latest features that are coming (really) soon with the new version of Microsoft Azure Stack HCI.

New features

Windows Server 2022 and Azure Stack HCI, version 21H2 will bring in (when they are made available) the following two key features:

  • Secured-core Server
  • GPU support

The fundamental idea of Secured-core Server is to stay ahead of attackers and protect our customers’ infrastructure and data across the hardware, BIOS, firmware, boot, drivers, and operating system. This idea is based on three pillars:

  • Simplified security: easy integration and consumption through Windows Admin Center
  • Advanced protection: leveraging hardware root-of-trust, firmware protection, and virtualization-based security (VBS)
  • Preventative defense: proactively block the paths attackers use to exploit a system

For more details about Secured-core Server, click here.

Figure 2: Secured-core Server with Windows Admin Center integration

AX-650, AX-750, and AX-7525 are the first AX nodes to introduce GPU readiness for single-width and double-width GPUs.

With the September 21, 2021 launch, all configurations planned to support GPUs are already GPU-ready when ordered with the appropriate components (such as GPU risers, power supplies, fans, and heatsinks).

This approach permits GPUs to be added later (once properly validated and certified) as an After Point of Sale (APOS) option.

The first GPU that will be made available with AX nodes (AX-650, AX-750, and AX-7525) is the NVIDIA T4 card.

To prepare for this GPU, customers should opt for the single-width capable PCI riser.

The following table shows the maximum number of adapters per platform taking into account the GPU form factor:

 

|          | AX-750       | AX-750     | AX-650       | AX-650     | AX-7525      | AX-7525     |
|          | Single width | Dual width | Single width | Dual width | Single width | Dual width  |
| All SSD  | Up to 3 [1]  | Up to 2    | Up to 2 [2]  | N/A        |              |             |
| All NVMe | Up to 3 [1]  | Up to 2    | Up to 2 [2]  | N/A        | Up to 3 [3]  | Up to 3 [3] |
| NVMe+SSD |              |            |              |            | Up to 4      | Up to 3     |

[1] Max of 3 factory installed with Mellanox NIC adapters. Exploring options for up to 4 SW GPUs
[2] Depending on the number of RDMA NICs
[3] Only with the x16 NVMe chassis. x24 NVMe chassis does not support any GPUs

Note that no GPUs are available at the September 21, 2021 launch. GPUs will not be validated and factory installable until early 2022.

Management enhancements

Dell EMC OpenManage Integration with Microsoft Windows Admin Center (OMIMSWAC) extension was launched in 2019.

It has included hardware and firmware inventory, real time health monitoring, iDRAC integrated management, troubleshooting tools, and seamless updates of BIOS, firmware, and drivers.

In the 2.0 release in February 2020, we also added single-click full stack life cycle management with Cluster-Aware Updating for the Intel-based Azure Stack HCI platforms. This allowed us to orchestrate OS, BIOS, firmware, and driver updates through a single Admin Center workflow, requiring only a single reboot per node in the cluster and resulting in no interruption to the services running in the VMs. 

With the Azure Stack HCI June 2021 release, the OpenManage Integration extension added support for the AX-7525 and AX-6515 AMD based platforms.

Now, with the September 21, 2021 launch, OMIMSWAC 2.1 features a great update for AX nodes, including these important extensions:

  • Integrated Deploy & Update
  • CPU Core Management
  • Cluster Expansion 

Integrated Deploy & Update deploys Azure Stack HCI with Dell EMC HCI Configuration Profile for optimal cluster performance. Our integration also adds the ability to apply hardware solution updates like BIOS, firmware, and drivers at the same time as operating system updates as part of cluster creation with a single reboot.

With CPU Core Management, customers can dynamically adjust the CPU core count BIOS settings without leaving the OpenManage Integration extension in Windows Admin Center, helping to maintain the right balance between cost and performance.

Cluster Expansion helps to prepare new cluster nodes before adding them to the cluster, to significantly simplify the cluster expansion process, reduce human error, and save time.

Figure 3: CPU Core Management and Cluster Expansion samples

In conclusion, the AX-650 and AX-750 nodes establish the most performant and easy to operate foundation for Azure Stack HCI today, along with all the new features and goodness that Microsoft is preparing. Stay tuned for more news and updates on this front!

Author Information

Ignacio Borrero, @virtualpeli


Microsoft IoT Azure Stack Hub Azure

The challenging Edge: Dell Technologies to the rescue with Azure Stack Hub – Tactical

Ignacio Borrero

Mon, 26 Jul 2021 12:46:04 -0000


Some context for a turbulent environment

On August 31, 2017, Microsoft launched Azure Stack Hub and enabled a true hybrid cloud operating model to extend Azure services on-premises. An awesome and long-awaited milestone at that time!

Implementing Azure Stack Hub in our customers’ datacenters under normal circumstances is a pretty straightforward process today if you choose our Dell EMC Integrated System for Microsoft Azure Stack Hub.

But there are certain cases where delivering Azure Stack Hub may be complex (or even impossible), especially in scenarios such as:

  • Edge scenarios: semi-permanent or permanent sites where there is no planned decommissioning, that can include:
      • IoT Applications: Device provisioning, tracking and management applications
      • Efficient field operations: Disaster Relief, humanitarian efforts, embassies
      • Smarter management of mobile fleet assets: Utility and maintenance vehicles
  • Tactical scenarios: strategic sites stood up for a specific task, temporary or permanent, that can experience:
      • Limited/restricted connectivity: submarines, aircraft, and planes
      • Harsh Conditions: combat zones, oil rigs, mine shafts

The final outcome in these environments remains the same: provide always-on cloud services everywhere from a minimal set of local resources.

The question is… how do we make this possible?

The answer is: Dell EMC Integrated System for Microsoft Azure Stack Hub – Tactical

Dell Technologies, in partnership with Microsoft and Tracewell Systems, has developed Dell EMC Integrated System for Microsoft Azure Stack Hub – Tactical (aka Azure Stack Hub – Tactical): a unique ruggedized and field-deployable solution for Azure Stack tactical edge environments.

Azure Stack Hub – Tactical extends Azure-based solutions beyond the traditional data center to a wide variety of non-standard environments, providing a local Azure consistent cloud with:

  • Limited or no network connectivity
  • Fully mobile, or high portability (“2-person lift”) requirements
  • Harsh conditions requiring military specifications solutions
  • High security requirements, with optional connectivity to Azure Government and Azure Government Secret

Azure Stack Hub – Tactical is functionally and electrically identical to Azure Stack Hub All-Flash to ensure interoperability. It includes custom engineered modifications to make the whole solution fit into just three ruggedized cases that are only 23.80 inches wide, 41.54 inches high, and 25.63 inches deep.

The smallest Azure Stack Hub – Tactical configuration comprises one management case plus two compute cases, each of them containing:

  • Management case:
    • 1 x T-R640 HLH management server (2U)
    • 1 x N3248TE-ON management switch (1U)
    • 2 x S5248F-ON Top-of-Rack switches (1U each)
    • Total weight: 146 lbs.

  • Compute case:
    • 2 x T-R640 servers, based on Dell EMC PowerEdge R640 All-Flash server adapted for tactical use (2U each)
    • Two configuration options for compute servers:
      • Low: 2 x Intel 5118 12-core processors, 384 GB memory, 19.2 TB total raw SSD capacity
      • High: 2 x Intel 6130 16-core processors, 768 GB memory, 38.4 TB total raw SSD capacity
    • Total weight: 116 lbs.

  • Heater option for extended temperature operation support:
    • The Tactical devices are designed to meet the MIL-STD-810G specification
    • Dell Technologies, in collaboration with Tracewell Systems, has designed a fully automated heater which, when fully integrated, can provide supplemental heating to the device when needed.

Compute cases can grow up to 8, for a total of 16 servers (in 4-node increments) -- the scale unit maximum mandated by Microsoft.
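To make the scaling rules concrete, here is a minimal sketch that enumerates the valid configurations and their approximate shipping weight. It assumes that 116 lbs is the weight of one compute case, that a single management case serves every configuration, and that four-node increments translate into adding two compute cases at a time; the function name is purely illustrative.

```python
# Figures taken from the case list above. Treating 116 lbs as the weight of
# one compute case and assuming a single management case are assumptions.
MANAGEMENT_CASE_LBS = 146
COMPUTE_CASE_LBS = 116
SERVERS_PER_COMPUTE_CASE = 2
MIN_COMPUTE_CASES = 2
MAX_COMPUTE_CASES = 8
NODE_INCREMENT = 4  # servers are added four at a time


def valid_configurations():
    """Yield (compute_cases, servers, total_weight_lbs) for each valid size."""
    case_step = NODE_INCREMENT // SERVERS_PER_COMPUTE_CASE
    for cases in range(MIN_COMPUTE_CASES, MAX_COMPUTE_CASES + 1, case_step):
        servers = cases * SERVERS_PER_COMPUTE_CASE
        weight = MANAGEMENT_CASE_LBS + cases * COMPUTE_CASE_LBS
        yield cases, servers, weight


if __name__ == "__main__":
    for cases, servers, weight in valid_configurations():
        print(f"{cases} compute cases: {servers} servers, ~{weight} lbs")
```

Under these assumptions the smallest configuration is three cases with four servers, and the largest is nine cases with sixteen servers.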

You can read the full specifications here.

Azure Stack Hub – Tactical is a turnkey end-to-end engineered solution designed, tested, and sustained through the entire lifespan of all of its hardware and software components.

It includes non-disruptive operations and automated full stack life cycle management for ongoing component maintenance, fully coordinated with Microsoft’s Update process.

Customers also benefit from a simplified one-call support model across all solution components.

Conclusion

Desperate “edge-cuts” must have desperate “tactical-cures”, and that is exactly what Dell EMC Integrated System for Microsoft Azure Stack Hub – Tactical delivers to our customers for edge environments and extreme conditions.

Azure Stack Hub – Tactical resolves the challenges of providing Azure cloud services everywhere by allowing our customers to add/remove deployments with relative ease through an automated, repeatable, and predictable process requiring minimal local IT resources.

Thanks for reading and stay tuned for more blog updates in this space by visiting Info Hub!

 



Dell EMC OpenManage Integration with Microsoft Windows Admin Center v2.0 Technical Walkthrough

Ignacio Borrero

Wed, 16 Jun 2021 13:35:49 -0000


Introduction

Dell EMC Integrated System for Microsoft Azure Stack HCI is a fully integrated HCI system for hybrid cloud environments that delivers a modern, cloud-like operational experience on-premises from a mature market leader.

The integrated system is based on our flexible AX node family as the foundation, and combines Dell Technologies full stack life cycle management with the Microsoft Azure Stack HCI operating system.

This blog focuses on one of the most important and critical parts of Azure Stack HCI: the management layer. Check this blog for additional background.

We will show how at Dell Technologies we make the good - Microsoft Windows Admin Center (WAC) - even better, through our OpenManage Integration with Microsoft Windows Admin Center v2.0 (OMIMSWAC).

The following diagram illustrates a typical Dell Technologies Azure Stack HCI setup:

To learn more about Microsoft HCI Solutions from Dell Technologies and get details on each of the different components, check out this video where our Dell Technologies experts examine the solution thoroughly from the bottom up.

 

Windows Admin Center Extensions from Microsoft

WAC provides the option to leverage easy-to-use workflows to perform many tasks, including automatic deployments (coming soon) and updates.

Dell Technologies has developed specialized snap-ins that integrate OpenManage with WAC to further extend the capabilities of Microsoft’s WAC extensions.

The following table describes the three key elements highlighted in the previous diagram as (1), (2), and (3). We examine each in detail in the next three sections.

| Item | Type | Integrates with | Developed by | Description |
| --- | --- | --- | --- | --- |
| Microsoft Cluster Aware Updating extension (Microsoft Failover Cluster Tool Extension 1.250.0.nupkg release*) | Extension | WAC | Microsoft | WAC workflow to apply cluster aware OS updates |
| Dell EMC Integrated Full Stack Cluster Aware Updating | Integration | Microsoft CAU extension | Dell Technologies | Integration snap-in to main CAU workflow to provide BIOS, firmware and driver updates while performing OS updates |
| OMIMSWAC v2.0 Standalone extension | Extension | WAC | Dell Technologies | OpenManage WAC extension for Infrastructure Life cycle management, plus cluster monitoring, inventory and troubleshooting |
| Cluster Creation extension (Microsoft Cluster Creation Extension 1.529.0.nupkg release*) | Extension | WAC | Microsoft | WAC workflow to create Azure Stack HCI Clusters |
| Integrated Deployment and Update (coming soon) | Integration | Microsoft IDU extension | Dell Technologies | Integration snap-in to main Cluster Creation workflow to provide BIOS, firmware and driver updates during the cluster creation process |

* Min version validated

Windows Admin Center extensions and integrations

You can install the Microsoft Cluster Aware Updating extension within WAC by selecting the “Gear” icon on the top right corner, then under “Gateway”, navigate to “Extensions”. Under “Available extensions”, find the desired extension and select “Install”. For details, see the install guide. Please refer to the extension's product documentation for the latest updates.

 

Microsoft Cluster Aware Updating extension

To get to the Microsoft WAC Azure Stack HCI Cluster Aware Updating extension, log in to WAC and follow these steps:

  1. Click on the cluster you want to connect to. This takes us to the cluster Dashboard.
  2. On the left pane, under “Tools”, select “Update”.
  3. In the “Updates” window, click on “Check for updates”, which will pop up the “Install updates” window.
  4. Here we are presented with a three-step process where we select, in order:
    • Operating system updates
    • Hardware updates
    • Proceed with the installation

It is important to note that you can either run only one operation at a time, skipping the other, or run both in a single process with a single reboot.

You may select, if available, any Operating system update and click “Next: Hardware updates”. 

This takes us to the second step of the sequence - Hardware updates - a key phase for the automated end-to-end cluster aware update process.

This is where the Dell Technologies snap-in integrates with Microsoft’s original workflow, allowing us to seamlessly provide automated BIOS, firmware, and driver updates (and OS updates if also selected) to all the nodes in the cluster with a single reboot. Let’s look at this process in detail in the next section. 

 

Dell EMC Integrated Full Stack Cluster Aware Updating

Once you click “Next: Hardware updates” in Microsoft’s original Azure Stack HCI Cluster Aware Updating workflow, you are taken to the Dell EMC Cluster Aware Updating integration.

If the integration is not installed, there is an option to install it from inside the workflow. 

Click “Get updates”.

Our snap-in for Cluster Aware Updating (CAU) takes us through the following sequence of five steps.

1. Prerequisites (screenshot above)

A validation process occurs, checking that all AX nodes are:

  • Supported in the HCL
  • Same model
  • Compliant with the OpenManage Premium License for MSFT HCI Solutions (included in the AX node base solution)
  • Compatible with cluster creation

Click “Next: Update source”.
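As a rough illustration of what this kind of validation involves, here is a minimal sketch of a prerequisite check. The `Node` class, the `SUPPORTED_MODELS` set standing in for the HCL, and the function name are all hypothetical; this is not how the snap-in is implemented, and the cluster creation compatibility check is left out.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    model: str
    licensed: bool  # holds the OpenManage Premium License


# Hypothetical stand-in for the hardware compatibility list (HCL).
SUPPORTED_MODELS = {"AX-740xd", "AX-7525"}


def check_prerequisites(nodes):
    """Return a list of human-readable problems; empty means all checks pass."""
    problems = []
    for node in nodes:
        if node.model not in SUPPORTED_MODELS:
            problems.append(f"{node.name}: model {node.model} not in HCL")
        if not node.licensed:
            problems.append(f"{node.name}: missing premium license")
    # All nodes in the cluster are expected to be the same model.
    if len({node.model for node in nodes}) > 1:
        problems.append("cluster mixes different node models")
    return problems
```

An empty result means the cluster would pass these three checks; any entry in the list names the node and the reason it failed.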

2. Update source

Here we can select the source for our BIOS, firmware, and driver repository, whether online [Update Catalog for Microsoft HCI Solutions] or offline (edge or disconnected) [Dell EMC Repository Manager Catalog]. Dell Technologies has created and keeps these solution catalogs updated.

Click “Next: Compliance report”.

3. Compliance report

Now we can check how compliant our nodes are and select which BIOS, firmware, and/or driver components to remediate. All the recommended components are selected by default.

The compliance operation runs in parallel for all nodes, and the report is shown consolidated across nodes.
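The idea of checking every node in parallel and then merging the findings can be sketched as follows. The catalog, the inventory, and both function names are made up for illustration, and versions are compared by simple string inequality; the real extension works against the Dell Technologies solution catalogs described above.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical catalog: component -> recommended version.
CATALOG = {"BIOS": "2.1.0", "NIC firmware": "20.5", "Storage driver": "7.3"}

# Hypothetical inventory: node -> component -> installed version.
INVENTORY = {
    "node1": {"BIOS": "2.1.0", "NIC firmware": "20.0", "Storage driver": "7.3"},
    "node2": {"BIOS": "2.0.4", "NIC firmware": "20.5", "Storage driver": "7.3"},
}


def node_compliance(node):
    """List the components on one node that differ from the catalog."""
    installed = INVENTORY[node]
    return [
        (node, component, installed.get(component), wanted)
        for component, wanted in CATALOG.items()
        if installed.get(component) != wanted
    ]


def compliance_report(nodes):
    """Check all nodes concurrently and merge the findings into one report."""
    with ThreadPoolExecutor() as pool:
        per_node = pool.map(node_compliance, nodes)
        return [finding for findings in per_node for finding in findings]
```

Calling `compliance_report(["node1", "node2"])` returns one consolidated list of (node, component, installed, recommended) entries, which is the shape of information an operator reviews before choosing what to remediate.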

Click “Next: Summary”.

4. Summary

All selections from all nodes are shown in Summary for review before we click “Next: Download updates”.

5. Download updates

This window provides the statistics regarding the download process (start time, download status).

When all downloads are completed, we can click “Next: Install”, which takes us back again to Step 3 of the main workflow (“Install”), to begin the installation process of OS and hardware updates (if both were selected) on the target nodes.

If any of the updates requires a restart, servers are rebooted one at a time, with cluster roles such as VMs moved between servers to prevent downtime and guarantee business continuity.
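The one-node-at-a-time pattern can be illustrated with a toy simulation. This is only a sketch of the general rolling-update idea, not the actual Cluster Aware Updating logic: the cluster is modeled as a plain dictionary, VMs are placed on the least-loaded remaining node, and nothing is moved back afterwards.

```python
def rolling_update(cluster):
    """Update nodes one at a time, keeping every VM hosted somewhere.

    `cluster` maps node name -> list of VM names and is modified in place.
    Returns a log of the steps taken.
    """
    log = []
    for node in list(cluster):
        others = [name for name in cluster if name != node]
        # Drain: move each VM to the currently least-loaded other node.
        drained = cluster[node]
        cluster[node] = []
        for vm in drained:
            target = min(others, key=lambda name: len(cluster[name]))
            cluster[target].append(vm)
        log.append(f"drained {node} ({len(drained)} VMs moved), rebooting")
        log.append(f"{node} back online")
    return log
```

At every step the node being rebooted holds no VMs, so the workloads keep running on the rest of the cluster.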

Once the process is finished for all the nodes, we can go back to “Updates” to check for the latest update status and/or Update history for previous updates. 

It is important to note that the Cluster Aware Updating extension is supported only for Dell EMC Integrated System for Microsoft Azure Stack HCI.

 

OMIMSWAC v2.0 Standalone extension

The standalone extension applies to Windows Server HCI and Azure Stack HCI, and continues to provide monitoring, inventory, troubleshooting, and hardware updates with CAU.

New to OMIMSWAC 2.0 is the option to schedule updates during a programmed maintenance window for greater flexibility and control during the update process.

It is important to note that OMIMSWAC Standalone version provides the Cluster Aware Updating feature for the hardware (BIOS, firmware, drivers) in a single reboot, although this process is not integrated with operating system updates. It provides full lifecycle management just for the hardware, not the OS layer.

Another key takeaway is that OMIMSWAC Standalone version fully supports Dell EMC HCI Solutions for Microsoft Windows Server and even certain qualified previous solutions (Dell EMC Storage Spaces Direct Ready Nodes).

 

Conclusion

Dell Technologies has developed OMIMSWAC to make integrated systems’ lifecycle management a seamless and easy process. It can fully guarantee controlled end-to-end cluster hardware and software update processes during the lifespan of the service.

The Dell EMC OMIMSWAC automated and programmatic approach provides obvious benefits, like mitigating risk caused by human intervention, significantly fewer steps to update clusters, and significantly less focused attention time for IT administrators. In small 4-node cluster deployments, this can mean up to 80% fewer steps and up to 90% less focused attention from an IT operator.

Full details on the benefits of performing these operations automatically through OMIMSWAC versus doing it manually are explained in this white paper.

Thank you for reading this far and stay tuned for more blog updates in this space!



Microsoft HCI Solutions from Dell Technologies: Designed for extreme resilient performance

Ignacio Borrero

Wed, 16 Jun 2021 13:35:49 -0000


Dell EMC Integrated System for Microsoft Azure Stack HCI (Azure Stack HCI) is a fully productized HCI solution based on our flexible AX node family as the foundation.

Before I get into some exciting performance test results, let me set the stage. Azure Stack HCI combines the software-defined compute, storage, and networking features of Microsoft Azure Stack HCI OS, with AX nodes from Dell Technologies to deliver the perfect balance for performant, resilient, and cost-effective software-defined infrastructure.

Figure 1 illustrates our broad portfolio of AX node configurations with a wide range of component options to meet the requirements of nearly any use case – from the smallest remote or branch office to the most demanding database workloads. 

 Figure 1: current platforms supporting our Microsoft HCI Solutions from Dell Technologies

Each chassis, drive, processor, DIMM module, network adapter and their associated BIOS, firmware, and driver versions have been carefully selected and tested by the Dell Technologies Engineering team to optimize the performance and resiliency of Microsoft HCI Solutions from Dell Technologies. Our Integrated Systems are designed for 99.9999% hardware availability*.

* Based on Bellcore component reliability modeling for AX-740xd nodes and S5248S-ON switches a) in 2- to 4-node clusters configured with N + 1 redundancy, and b) in 4- to 16-node clusters configured with N + 2 redundancy, March 2021.
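To put an availability figure like this in perspective, it helps to convert it into expected downtime. The short sketch below does only that arithmetic, assuming a 365-day year; it says nothing about how the Bellcore model arrives at the figure.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def downtime_seconds_per_year(availability):
    """Expected unavailable time per year for a given availability fraction."""
    return (1.0 - availability) * SECONDS_PER_YEAR


if __name__ == "__main__":
    levels = [("99.9%", 0.999), ("99.999%", 0.99999), ("99.9999%", 0.999999)]
    for label, value in levels:
        print(f"{label}: {downtime_seconds_per_year(value):.1f} s/year")
```

Six nines works out to roughly half a minute of hardware unavailability per year.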

Comprehensive management with Dell EMC OpenManage Integration with Windows Admin Center, rapid time to value with Dell EMC ProDeploy options, and solution-level Dell EMC ProSupport complete this modern portfolio.

You'll notice in that table that we have a new addition -- the AX-7525: a dual-socket, AMD-based platform designed for extreme performance and high scalability.

The AX-7525 features direct-attach NVMe drives with no PCIe switch, which provides full Gen4 PCIe potential to each storage device, resulting in massive IOPS and throughput at minimal latency.

To get an idea of how performant and resilient this platform is, our Dell Technologies experts put a 4-node AX-7525 cluster to the test. Each node had the following configuration:

  • 24 NVMe drives (PCIe Gen 4)
  • Dual-socket AMD EPYC 7742 64-Core Processor (128 cores)
  • 1 TB RAM
  • 1 Mellanox CX6 100 gigabit Ethernet RDMA NIC

The easy headline would be that this setup consistently delivered nearly 6M IOPS at sub-1 ms latency. One could think that we doctored these performance tests to achieve these impressive figures with just a 4-node cluster!

The reality is that we sought to establish the ‘hero numbers’ as a baseline – ensuring that our cluster was configured optimally. However, we didn’t stop there. We wanted to find out how this configuration would perform with real-world IO patterns. This blog won’t get into the fine-grained details of the white paper, but we’ll review the test methodology for those different scenarios and explain the performance results.

Figure 2 shows the 4-node cluster and fully converged network topology that we built for the lab:

 Figure 2: Lab setup

We performed two differentiated sets of tests in this environment:

  • Tests with IO profiles aimed at identifying the maximum IOPS and throughput thresholds of the cluster
    • Test 1: Using a healthy 4-node cluster
  • Tests with IO profiles that are more representative of real-life workloads (online transaction processing (OLTP), online analytical processing (OLAP), and mixed workload types)
    • Test 2: Using a healthy 4-node cluster
    • Test 3: Using a degraded 4-node cluster, with a single node failure
    • Test 4: Using a degraded 4-node cluster, with a two-node failure

To generate real-life workloads, we used VMFleet, which leverages PowerShell scripts to create Hyper-V virtual machines executing DISKSPD to produce the desired IO profiles.

We chose the three-way mirror resiliency type for the volumes we created with VMFleet because of its superior performance versus erasure coding options in Storage Spaces Direct.

Now that we have a clearer idea of the lab setup and the testing methodology, let’s move on to the results for the four tests.

Test 1: IO profile to push the limits on a healthy 4-node cluster with 64 VMs per node

Here are the details of the workload profile and the performance we obtained:

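| IO profile | Block size | Thread count | Outstanding IO | Write % | IO pattern | Total IOs | Latency |
| --- | --- | --- | --- | --- | --- | --- | --- |
| B4-T2-O32-W0-PR | 4k | 2 | 32 | 0% | 100% random read | 5,727,985 | 1.3 ms (read) |
| B4-T2-O16-W100-PR | 4k | 2 | 16 | 100% | 100% random write | 700,256 | 9 ms* (write) |

| IO profile | Block size | Thread count | Outstanding IO | Write % | IO pattern | Throughput |
| --- | --- | --- | --- | --- | --- | --- |
| B512-T1-O8-W0-PSI | 512k | 1 | 8 | 0% | 100% sequential read | 105 GB/s |
| B512-T1-O1-W100-PSI | 512k | 1 | 1 | 100% | 100% sequential write | 8 GB/s |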

* This slightly higher latency occurs because we are pushing too many outstanding IOs after performance has already plateaued. We noticed that even with 32 VMs we hit the same IOs; from that point on, the additional load a) does not drive any additional IOs and b) only adds latency.
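The IO profile names used above encode the test parameters: block size in KiB (B), thread count (T), outstanding IO (O), write percentage (W), and access pattern (P, where R is random and SI is sequential). Here is a minimal sketch of a parser for those names. The function names are ours, and the mapping onto DISKSPD's `-b`, `-t`, `-o`, `-w`, `-r`, and `-si` options is an assumption about how such a profile would be run, not something taken from the white paper.

```python
import re

PROFILE_RE = re.compile(r"^B(\d+)-T(\d+)-O(\d+)-W(\d+)-P(R|SI)$")


def parse_profile(name):
    """Split a profile name such as 'B4-T2-O32-W0-PR' into its fields."""
    match = PROFILE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized profile name: {name}")
    block_k, threads, outstanding, write_pct, pattern = match.groups()
    return {
        "block_size_k": int(block_k),
        "threads": int(threads),
        "outstanding_io": int(outstanding),
        "write_pct": int(write_pct),
        "pattern": "random" if pattern == "R" else "sequential",
    }


def diskspd_flags(name):
    """Build the DISKSPD options this profile plausibly corresponds to."""
    p = parse_profile(name)
    access = "-r" if p["pattern"] == "random" else "-si"
    return (f"-b{p['block_size_k']}K -t{p['threads']} "
            f"-o{p['outstanding_io']} -w{p['write_pct']} {access}")
```

For example, `parse_profile("B4-T2-O32-W0-PR")` yields a 4k block size, 2 threads, 32 outstanding IOs, 0% writes, and a random pattern, matching the first row of the results.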

This test sets the bar for the limits and maximum performance we can obtain from this 4-node cluster: almost 6 million read IOs, 700k write IOs, and a bandwidth of 105 GB/s for reads, and 8 GB/s for writes. 

Test 2: real-life workload IO profile on a healthy 4-node cluster with 32 VMs per node

The IO profiles for this test encompass a broad range of real-life scenarios:

  • OLTP oriented: we tested for a wide spectrum of block sizes, ranging from 4k to 32k, and write IO ratios, varying from 20% to 50%.
  • OLAP oriented: the most common OLAP IO profile is large block size and sequential access. Other workloads that follow a similar pattern are file backups and video streaming. We tested 64k to 512k block sizes and 20% to 50% write IO ratios.

The following figure shows the details and results we obtained for all the different tested IO patterns:

    Figure 3: Test 2 results 

These are impressive results. Note in particular (on the left) the 1.6 million IOPS at 1.2 millisecond average latency for the typical OLTP IO profile of 8 KB block size and 30% random write. Even at 32k block size and 50% write IO ratio, we measured 400,000 IOs at under 7 milliseconds latency.

Also remarkable is the throughput we witnessed during all the tests, in particular the 29.65 GB/s with an IO profile of 512k block size and 20% write ratio.

Tests 3 and 4: push the limits and real-life workload IO profiles on a degraded 4-node cluster

To simulate a one-node failure (Test 3), we shut down node 4, which caused node 2 to take additional ownership of the 32 restarted VMs from node 4, for a total of 64 VMs on node 2.

Similarly, to simulate a two-node failure (Test 4), we shut down nodes 3 and 4, leading to a VM reallocation process from node 3 to node 1, and from node 4 to node 2. Nodes 1 and 2 ended up with 64 VMs each.

The cluster environment continued to produce impressive results even in this degraded state. The table below compares the testing scenarios that used IO profiles aimed at identifying the maximum thresholds.

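| IO profile | Healthy cluster: Total IOs | Healthy cluster: Latency | One node failure: Total IOs | One node failure: Latency | Two node failure: Total IOs | Two node failure: Latency |
| --- | --- | --- | --- | --- | --- | --- |
| B4-T2-O32-W0-PR | 4,856,796 | 0.38 ms (read) | 4,390,717 | 0.38 ms (read) | 3,842,997 | 0.26 ms (read) |
| B4-T2-O16-W100-PR | 753,886 | 3.2 ms (write) | 482,715 | 5.7 ms (write) | 330,176 | 11.4 ms (write) |

| IO profile | Healthy cluster: Throughput | One node failure: Throughput | Two node failure: Throughput |
| --- | --- | --- | --- |
| B512-T1-O8-W0-PSI | 91 GB/s | 113 GB/s | 77 GB/s |
| B512-T1-O1-W100-PSI | 8 GB/s | 6 GB/s | 10 GB/s |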
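One way to read the comparison above is to express each degraded result as a percentage of the healthy-cluster figure. The sketch below does that for the two 4k random profiles, using the Total IOs values exactly as reported; the dictionary layout and function name are ours.

```python
# Total IOs from the comparison table above (4k random profiles).
RESULTS = {
    "B4-T2-O32-W0-PR": {
        "healthy": 4_856_796, "one_down": 4_390_717, "two_down": 3_842_997,
    },
    "B4-T2-O16-W100-PR": {
        "healthy": 753_886, "one_down": 482_715, "two_down": 330_176,
    },
}


def retained_pct(profile, state):
    """Percentage of healthy-cluster IOs still delivered in a degraded state."""
    row = RESULTS[profile]
    return 100.0 * row[state] / row["healthy"]


if __name__ == "__main__":
    for profile in RESULTS:
        for state in ("one_down", "two_down"):
            print(f"{profile} {state}: {retained_pct(profile, state):.1f}%")
```

With these inputs, random reads retain roughly 90% of the healthy figure after one node failure and close to 80% after two, while random writes drop more steeply.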

Figure 4 illustrates the test results for real-life workload scenarios for the healthy cluster and for the one-node and two-node degraded states.

  Figure 4: Test 3 and 4 results

Once more, we continued to see outstanding performance results from an IO, latency, and throughput perspective, even with one or two nodes failing.

One important consideration we observed is that for the 4k and 8k block sizes, IOs decrease and latency increases as one would expect, whereas for the 32k and higher block sizes we realized that:

  • Latency was less variable across the failure scenarios because write IOs did not need to be committed across as many nodes in the cluster.
  • After the two-node failure, there was actually an increase of IOs (20-30%) and throughput (52% average)!

There are two reasons for this:

  1. The 3-way mirrored volumes became 2-way mirrored volumes on the two surviving nodes. This effect led to 33% fewer backend drive write IOs. The overall drive write latency decreased, driving higher read and write IOs. This only applied when CPU was not the bottleneck.
  2. Each of the remaining nodes doubled the number of running VMs (from 32 to 64), which directly translated into greater potential for more IOs.
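The 33% figure in the first reason follows directly from the number of mirror copies. As a minimal sketch, assuming each front-end write produces exactly one backend drive write per copy (and ignoring caching or metadata overhead):

```python
def backend_writes(frontend_writes, mirror_copies):
    """Each write to a mirrored volume lands on one drive per copy."""
    return frontend_writes * mirror_copies


def reduction_pct(copies_before, copies_after):
    """Relative drop in backend writes when the copy count shrinks."""
    return 100.0 * (copies_before - copies_after) / copies_before


if __name__ == "__main__":
    print(backend_writes(1000, 3), "backend writes with a 3-way mirror")
    print(backend_writes(1000, 2), "backend writes with a 2-way mirror")
    print(f"{reduction_pct(3, 2):.1f}% fewer backend writes")
```

Going from three copies to two removes one of every three backend writes, which is where the one-third reduction comes from.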

Conclusion

We are happy to share with you these figures about the extreme-resilient performance our integrated systems deliver, during normal operations or in the event of failures.

Dell EMC Integrated System for Microsoft Azure Stack HCI, especially with the AX-7525 platform, is an outstanding solution for customers struggling to support their organization’s increasingly heavy demand for resource intensive workloads and to maintain or improve their corresponding service level agreements (SLAs).