Preview of Intelligent Automation in Dell APEX Cloud Platform for Microsoft Azure
Wed, 24 Apr 2024 15:35:21 -0000
UPDATE 11/7/2023: This blog and the embedded YouTube videos were published after Dell APEX Cloud Platform for Microsoft Azure was first announced at Dell Technologies World 2023, and they contain early preview content. For the most up-to-date collateral and YouTube demo videos, created after the platform became generally available in September 2023, see the following link: https://www.youtube.com/playlist?list=PL2nlzNk2-VMEkNM7E8m0ia_lLHWlOuT5h
It was another exhilarating Dell Technologies World (DTW) back in May. It’s always fun connecting with colleagues, customers, and partners in Las Vegas. As always, Vegas managed to surprise me with something I’d never seen before. I finally witnessed the incredible iLuminate team up close and personal at the APEX After Dark party. I tried to describe the phenomenon to a friend who hasn’t experienced one of their performances, but words cannot adequately convey this mesmerizing spectacle of sight and sound! In the end, only one of my photos from the event and a link to one of their recorded shows could make it real for them.
Similarly, words alone can’t do justice to the game-changing potential of the new APEX Cloud Platform announced at DTW. That’s why I created a demo video giving customers an early preview[1] of the new management and orchestration capabilities coming to our APEX Cloud Platform Foundation Software. This software integrates intelligent automation into the familiar management tools of each supported cloud ecosystem – Microsoft Azure, Red Hat OpenShift, and VMware vSphere.
In this blog, I want to showcase APEX Cloud Platform for Microsoft Azure and the features and functionality we integrate into Microsoft Windows Admin Center. My colleague and friend, Kenny Lowe, wrote a brilliant analysis of our new solution in his recent blog post, Delving Into the APEX Cloud Platform for Microsoft Azure. He included some screenshots from my demo video, which hasn’t been shared publicly until now. I highly recommend reading his enlightening article, which provides invaluable context before viewing the demos.
Please be aware that the clips below are sections of a lengthier video that shares the story of a fictional retail company named WhyGoBuy. They used APEX Cloud Platform Foundation Software to accelerate their time to value and improve operational efficiency. Because this video was over 15 minutes long, I divided it into bite-sized chunks and included a brief introduction to each administrative task. You can view the full video HERE.
Seeing is believing
Without further ado, let’s dive into the technology!
At initial release of APEX Cloud Platform for Microsoft Azure, Dell Technologies is offering a white-glove deployment experience through Dell ProDeploy Services. Our expert technicians will walk you through your first deployments to help you get comfortable with the process. Soon after announcing general availability, we will empower you to install the platform yourself using the APEX Cloud Platform Foundation Software deployment automation. In this first video, our administrators at WhyGoBuy followed the step-by-step user input configuration method and provided the settings in each step of the deployment wizard.
The next video presents a common Day 2 operations scenario. Some of WhyGoBuy’s Storage Spaces Direct volumes were approaching maximum capacity, and one volume required immediate attention. Luckily, APEX Cloud Platform for Microsoft Azure offered a consistent hybrid management experience. Administrators were promptly made aware of the issue through Azure Monitor, which provided observability for their entire fleet of platforms across data center and edge locations. Then, they navigated to the Windows Admin Center extension for further investigation and remediation of the issue.
Lifecycle management is critical to ensuring the optimal security, performance, and reliability of any infrastructure. With APEX Cloud Platform Foundation Software, Dell helps our customers remain in a continuously validated state – updating the platform from one known good state to the next, inclusive of hardware, operating system, and systems management software. A few months had passed since WhyGoBuy deployed their first platform, and the time came to apply a quarterly baseline bundle using the Windows Admin Center extension. The following video captures their experience.
WhyGoBuy was committed to maintaining a robust security posture. They used the intrinsic infrastructure security management features of APEX Cloud Platform Foundation Software to help them accomplish this. The next video showcases two of these features:
- Infrastructure Lock – Protects against unauthorized or malicious changes to configuration settings by enabling the System Lockdown feature in Dell iDRAC. This also prevents updates to BIOS, firmware, and drivers to guard against cybersecurity attacks.
- Secured-core server – Proactively defends against many of the paths attackers might use to exploit a system by establishing a hardware root-of-trust, protecting firmware, and introducing virtualization-based security.
In this final video, WhyGoBuy set up connectivity to Dell ProSupport to benefit from log collection, phone home, automated case creation, and remote support. They also wanted to send telemetry data to Dell CloudIQ cloud-based software for multi-cluster monitoring. CloudIQ provided proactive monitoring, machine learning, and predictive analytics so they could take quick action and simplify operations of all their on-premises APEX Cloud Platforms.
The future’s so bright
We are excited to bring Dell APEX Cloud Platform for Microsoft Azure to market later this year. I’ve compiled the following list of available resources for further learning.
- Dell APEX Cloud Platform for Microsoft Azure Playlist
- Solution Brief – Deploy mission-critical database and virtual desktop workloads in a Microsoft hybrid cloud environment
- Thomas Maurer Speaks with Kenny Lowe on APEX Cloud Platform for Microsoft Azure
- Building the Future of Azure Stack HCI
- Delving Into the APEX Cloud Platform for Microsoft Azure
After we launch this solution, you’ll be able to find white papers, videos, blogs, and more at the APEX tile at our Info Hub site.
And as always, please reach out to your Dell account team if you would like to have more in-depth discussions about the APEX portfolio. If you don’t currently have a Dell contact, we’re here to help on our corporate website.
Author: Michael Lamia, Engineering Technologist at Dell Technologies
Follow me on Twitter: @Evolving_Techie
LinkedIn: https://www.linkedin.com/in/michaellamia/
Email: michael.lamia@dell.com
[1] Dell APEX Cloud Platform for Microsoft Azure will be generally available later in 2023. Some of the features and functionality depicted in these videos may behave differently at initial release or may not be available until later releases. Dell makes no representation and undertakes no obligations with regard to product planning information, anticipated product characteristics, performance specifications, or anticipated release dates (collectively, “Roadmap Information”). Roadmap Information is provided by Dell as an accommodation to the recipient solely for the purposes of discussion and without intending to be bound thereby.
Related Blog Posts
Dell Technologies First to Deliver Azure Stack HCI 23H2
Wed, 24 Apr 2024 15:47:16 -0000
There is nothing quite like being first – first to watch the newly released docuseries on your favorite streaming platform, first to try the highly anticipated new restaurant, first to see the popular band that’s in town, and so on. These types of events tend to get everybody snapping selfies and posting memes on their social media accounts. As a bona fide nerd, I get that same feeling of exhilaration when cool new tech hits the market – especially when it’s from the Dell Technologies and Microsoft team. I love getting the word out about groundbreaking features that produce meaningful business outcomes for our customers.
In September 2023, we officially released our Dell APEX Cloud Platform for Microsoft Azure, the first offer in the market for Premier Solutions for Microsoft Azure Stack HCI. As the first partner to qualify a solution for this elite category, Dell Technologies is ready for greenfield deployments with Azure Stack HCI version 23H2 staged on our factory-delivered MC nodes beginning today. Dell Services is here to provide you with a white glove initial implementation experience.
In this blog, I want to share my enthusiasm about this 23H2 release and help the community understand why it’s such a big deal.
What’s all the fuss about 23H2?
Microsoft announced the general availability of Azure Stack HCI version 23H2 last month. Pundits agree that this may be their most ambitious Azure Stack HCI release effort to date. They have dramatically simplified fleet management at scale for infrastructure distributed across edge locations using Azure Resource Manager (ARM) and key Azure management services. On-premises resources like virtualized desktops, server VMs, and Azure Kubernetes Service (AKS) workload clusters are automatically Azure Arc-enabled. This means that these resources can benefit from Azure’s advanced configuration, monitoring, and security services immediately after the deployment of 23H2.
Topping the list of new features is cloud-based deployment. You can use the Azure portal to deploy Azure Stack HCI from the cloud, including cluster, storage, and networking configuration. You can also leverage ARM templates with custom parameter values for each unique cluster to drive reuse and repeatability. Dell Technologies plans on going beyond the cluster creation aspects of the deployment as we integrate with this new capability in our next release of the APEX Cloud Platform for Azure.
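To make the idea of reusing one ARM template with per-cluster parameter values more concrete, here is a minimal Python sketch that generates one parameter file per cluster. The `$schema`, `contentVersion`, and `parameters` wrapper follow the standard ARM parameter-file layout. The cluster names, the parameter names (`clusterName`, `nodeCount`, `managementSubnet`, `location`), and their values are assumptions made up for illustration, since a real Azure Stack HCI template defines its own parameters.

```python
import json

# Schema URL used by ARM deployment parameter files.
PARAMS_SCHEMA = (
    "https://schema.management.azure.com/schemas/2019-04-01/"
    "deploymentParameters.json#"
)

# Hypothetical values shared by every cluster in the fleet.
SHARED = {"location": "eastus", "nodeCount": 2}

# Hypothetical per-cluster overrides; a real template defines its own parameters.
CLUSTERS = {
    "store-001": {"clusterName": "store-001", "managementSubnet": "10.1.0.0/24"},
    "store-002": {
        "clusterName": "store-002",
        "managementSubnet": "10.2.0.0/24",
        "nodeCount": 4,
    },
}


def build_parameter_file(values):
    """Wrap plain values in the ARM parameter-file structure."""
    return {
        "$schema": PARAMS_SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": {name: {"value": value} for name, value in values.items()},
    }


def render_all(shared, clusters):
    """Return {cluster: parameter file}; cluster values override shared ones."""
    return {
        name: build_parameter_file({**shared, **overrides})
        for name, overrides in clusters.items()
    }


if __name__ == "__main__":
    for name, doc in render_all(SHARED, CLUSTERS).items():
        print(f"--- {name}.parameters.json")
        print(json.dumps(doc, indent=2))
```

Each generated file could then be supplied alongside the same template for its cluster, which is what drives the reuse and repeatability described above.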
As depicted in the following early preview screenshots, we will continue to use our APEX Cloud Platform Foundation Software to provide a fully automated, end-to-end Day 1 deployment and cluster creation experience. This includes bare-metal OS provisioning and onboarding to Azure Arc prior to cluster creation in the Azure portal. We will also be able to seamlessly re-deploy existing clusters using our automation workflow if the need arises.
Figure 1. Early preview of Day 1 deployment and cluster creation workflow
Figure 2. Azure Stack HCI registration step in the early preview
The outcome is the same whether you leverage Dell Technologies’ existing deployment experience or wait for our new cloud-based experience coming in the next release. Both accelerate your time to value – taking you from factory-delivered MC nodes to fully deployed Azure hybrid cloud – using powerful API-driven software capabilities. Dell ProDeploy Services offers a white glove deployment experience that uses our APEX Cloud Platform Foundation Software Day 1 API to rapidly bring up any number of clusters in a predictable and repeatable manner.
During initial deployment of 23H2, Azure Arc Resource Bridge and AKS enabled by Azure Arc are automatically installed on your Azure Stack HCI cluster. This is an especially compelling enhancement, as installing Arc Resource Bridge and AKS on previous Azure Stack HCI versions has been notoriously challenging. Immediately after initial deployment, you can provision Arc-enabled VMs and Arc-enabled Kubernetes workload clusters across any number of on-premises Azure Stack HCI clusters centrally from ARM. You can use a guided, wizard-driven workflow in the Azure portal or ARM templates for Infrastructure as Code (IaC) automation.
Figure 3. Arc Resource Bridge running on three Azure Stack HCI clusters
Figure 4. Azure Arc VM provisioning
Azure Stack HCI version 23H2 also provides management of updates across all your Azure Stack HCI clusters using Azure Update Manager, as shown in the following figure. These updates are applied with the cluster-aware updating feature to prevent any disruption to running workloads. In the context of APEX Cloud Platform for Azure, you will be able to apply monthly quality and security updates using Azure Update Manager. However, baseline updates that include Dell’s BIOS, firmware, and driver packages will still require the full stack lifecycle management automation workflow in the APEX Cloud Platform extension in Windows Admin Center.
Figure 5. APEX Cloud Platform in Azure Update Manager
Azure Virtual Desktop (AVD) may be the most eagerly anticipated Azure service to come to on-premises Azure Stack HCI clusters to date. AVD is now generally available on 23H2 and offers host pool provisioning directly from the Azure portal. After a 23H2 deployment, you can begin creating Windows 10 and Windows 11 single- and multi-session host VMs across all your Azure Stack HCI clusters. These client VMs can also leverage updated Azure Marketplace images with Microsoft 365 applications preinstalled and GPU acceleration for your most demanding client applications.
There is also a bevy of new capabilities and improvements that address the core stack – hypervisor, storage, and VMs:
- ReFS deduplication and compression is designed for active workloads like AVD running on Azure Stack HCI and can result in significant storage capacity savings.
- Trusted launch comes to Azure Arc-enabled VMs to help prevent firmware and boot loader attacks.
- Significant investments have been made to improve the Azure Stack HCI security posture in 23H2. This new version has a tailored security baseline with over 300 security settings configured that remain compliant using a drift control mechanism. Check out the newly published Azure Stack HCI Security Book, which provides a complete readout of all the robust security features that come out-of-the-box with 23H2.
- Microsoft Defender for Cloud for Azure Stack HCI (preview) provides coverage for Azure Stack HCI infrastructure as part of the Cloud Security Posture Management plan.
- Azure Migrate to Azure Stack HCI (preview) lets you move VMs from an existing Hyper-V environment to Azure Stack HCI version 23H2. This feature uses Azure Migrate as the control plane, but the data transfer stays entirely on-premises. Support for VMware vCenter source environments is coming soon.
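The drift control idea mentioned in the security baseline bullet above can be illustrated with a small, generic Python sketch: compare the observed settings on a node against a desired baseline, report deviations, and restore them. This is only a conceptual illustration, not Microsoft's implementation. The setting names and values are hypothetical stand-ins for the 300+ settings in the real baseline.

```python
# Hypothetical baseline; the real 23H2 baseline contains 300+ settings
# that are not listed in this article.
BASELINE = {
    "smb_signing_required": True,
    "credential_guard_enabled": True,
    "tls_min_version": "1.2",
}


def find_drift(baseline, actual):
    """Return {setting: (expected, found)} for every setting that deviates."""
    return {
        key: (expected, actual.get(key))
        for key, expected in baseline.items()
        if actual.get(key) != expected
    }


def remediate(baseline, actual):
    """Return a copy of `actual` with drifted baseline settings restored."""
    fixed = dict(actual)
    fixed.update({key: baseline[key] for key in find_drift(baseline, actual)})
    return fixed


if __name__ == "__main__":
    # Hypothetical observed state: one setting changed, one missing.
    node_state = {"smb_signing_required": False, "credential_guard_enabled": True}
    print("Drift:", find_drift(BASELINE, node_state))
    print("After remediation:", remediate(BASELINE, node_state))
```

Settings outside the baseline are left untouched, so only the values the baseline governs are pulled back into compliance.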
Dell Technologies is first out of the gate
Premier Solutions for Microsoft Azure Stack HCI is reserved for top partners with the deepest levels of integration and engineering collaboration with Microsoft. Dell Technologies completed our testing and validation of 23H2 ahead of general availability on the Dell APEX Cloud Platform for Microsoft Azure. We are pleased to get the powerful capabilities of 23H2 into your hands immediately, so you can spend less time on operations and more time on the innovation that helps your organization secure a competitive advantage in the market. Dell ProDeploy Services is ready to provide a white glove 23H2 deployment experience on all new MC nodes delivered from the factory.
Figure 6. 23H2 release timeline
With all the new cloud-based capabilities Microsoft has introduced for Day 1 – N operations with 23H2, we want to be clear about how IT administrators will perform various tasks specifically with Dell APEX Cloud Platform for Microsoft Azure. Some administrative scenarios can be accomplished at scale with Azure Resource Manager, and others will require the granular, cluster-by-cluster level access provided by the Dell APEX Cloud Platform Foundation Software. This software integrates automation workflows into Windows Admin Center via a Dell extension.
Figure 7. APEX Cloud Platform’s consistent management experience
The following table provides a detailed comparison of management capabilities.
Table 1. Comparison of fleet and cluster-level management capabilities
Task | Fleet Management with ARM | Cluster-Level Management with APEX Cloud Platform Foundation Software |
Day 1 deployment | Cloud-based deployment from Azure will be integrated with our solution later in 2024. | Day 1 deployment and cluster creation automation currently performed by Dell ProDeploy Services. |
Monitoring | Event Monitoring for Dell APEX Cloud Platform for Microsoft Azure feature in Azure Monitor Insights for Azure Stack HCI. This includes a Dell workbook for visualizing real-time hardware and software alerts. | Physical View of platform component inventory, monitoring, and alerting on a per cluster basis. |
Lifecycle management | Azure Update Manager available for Azure Stack HCI monthly quality and security updates on APEX Cloud Platform for Azure. Baseline updates, including hardware, require APEX Cloud Platform Foundation Software. | Full stack lifecycle management keeps an individual cluster in a continuously validated state, progressing from one known good state to the next inclusive of OS, hardware, and systems management software. |
Security management | Fully integrated with Microsoft Defender for Cloud (preview). | Toggle intrinsic infrastructure security management features, including Infrastructure Lock and Secured-core server. |
Scale out and scale up | Not currently in scope. | Add Node and Add Disk features fully automate node and cluster expansion. |
Node management | Not currently in scope. | Workflow available to repair and replace cluster nodes. |
Serviceability and support | Not currently in scope. | Enables phone home, auto case creation, and remote connectivity to create a consolidated management, operations, and support experience. |
Full stack lifecycle management
In the future, you will be able to leverage our full stack lifecycle management (LCM) experience in the Dell APEX Cloud Platform Foundation Software for in-place OS upgrades. Our software periodically queries Dell Technologies and Microsoft update sites, checking for new bundles. You never have to leave the Updates tab of the APEX Cloud Platform extension in Windows Admin Center, as shown in the following figure, to review or apply updates. The software identifies any patch dependencies that may exist before revealing these bundles in the Updates tab. Guardrails are established to ensure you apply all updates in the proper order. Dell Technologies and Microsoft collaboratively test and validate this process for every release using APEX Cloud Platform hardware in our respective engineering labs.
Figure 8. Update bundles in the APEX Cloud Platform extension in Windows Admin Center
The following table summarizes the contents of each update bundle type.
Table 2. Contents of update bundles
Update Bundle | Contents |
Azure Stack HCI Solution (baseline) | Azure Arc infrastructure, Lifecycle Manager, and Operating System |
APEX Cloud Platform Foundation Software | Cloud Platform Manager VM and all microservices-based systems management automation and orchestration software |
APEX Cloud Platform Hardware | BIOS, iDRAC, firmware and driver updates |
For a more in-depth discussion about this full stack lifecycle management feature, please review this recent blog post, The Evolution of Azure Stack HCI Lifecycle Management.
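The guardrail behavior described in this section, where bundles are only offered in an order that respects their dependencies, can be sketched as a topological sort. The bundle names below come from Table 2, but the dependency edges are assumptions invented purely for illustration and do not represent Dell's actual prerequisites. The sketch uses `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Bundle names are taken from Table 2. The dependency edges are hypothetical:
# each key maps to the set of bundles assumed to be required before it.
DEPENDS_ON = {
    "APEX Cloud Platform Foundation Software": set(),
    "Azure Stack HCI Solution (baseline)": {
        "APEX Cloud Platform Foundation Software",
    },
    "APEX Cloud Platform Hardware": {
        "Azure Stack HCI Solution (baseline)",
    },
}


def apply_order(depends_on):
    """Return bundles in an order where prerequisites always come first."""
    return list(TopologicalSorter(depends_on).static_order())


def can_apply(bundle, applied, depends_on):
    """Guardrail: allow a bundle only once all its prerequisites are applied."""
    return depends_on[bundle] <= set(applied)


if __name__ == "__main__":
    applied = []
    for bundle in apply_order(DEPENDS_ON):
        assert can_apply(bundle, applied, DEPENDS_ON)
        applied.append(bundle)
        print("Applied:", bundle)
```

A cyclic dependency would raise `graphlib.CycleError`, which is a useful property for a pre-check that must refuse to proceed when the bundle metadata is inconsistent.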
23H2 is only the beginning
Support of Azure Stack HCI version 23H2 is only one of the many enhancements we’ve introduced in our latest release of Dell APEX Cloud Platform for Microsoft Azure. We’ve also added new automation workflows to our extension in Windows Admin Center, which include many pre-checks and validations to ensure consistently successful operations with no disruption to running workloads:
- Add and Replace Disks feature: We’ve provided an automated workflow to increase Storage Spaces Direct capacity and performance or restore capacity and performance to a desired state.
- Node Repair and Replace feature: We’ve also made it easy to restore a cluster to full health after a node has experienced a failure that requires the server to be reimaged.
Dell Technologies is also developing integrations into Azure management and governance services. This latest platform release introduces the first of these integrations. You can now visualize fault and informational event data generated by the MC node hardware and the Cloud Platform Manager VM using an Azure Monitor Insights for Azure Stack HCI workbook. Simply enable the Event Monitoring for Dell APEX Cloud Platform for Microsoft Azure feature for Insights to get started.
Resources
We have tons of great content to help you deep-dive into Dell APEX Cloud Platform for Microsoft Azure powered by Dell APEX Cloud Platform Foundation Software:
- What's New with the Dell APEX Cloud Platform for Microsoft Azure March 2024 Release
- Monitoring the Dell APEX Cloud Platform for Microsoft Azure with Azure Insights
- YouTube playlist with educational and demo videos
- NEW YouTube playlist for March 2024 release
- Info Hub white papers, videos, and interactive demos
- APEX Cloud Platform for Azure main product page
- Microsoft’s official announcement of 23H2 general availability
- General availability of Azure Virtual Desktop
- Azure Stack HCI Security Book
- Check out What’s new for Azure edge infrastructure in 2023 for an eye-opening case study of a fictional grocery store chain that uses Microsoft Azure to deploy and manage infrastructure at the edge using Azure Arc, Azure Kubernetes Service, and Azure Stack HCI. This is a highly enlightening, end-to-end view of how all the technologies within the Azure hybrid cloud ecosystem can harmoniously work together to solve a real-world business problem in the retail vertical.
And as always, please reach out to your Dell Technologies account team if you would like to have more in-depth discussions about the Dell APEX Cloud Platforms family. If you don’t currently have a Dell Technologies contact, we’re here to help on our corporate website.
Author: Michael Lamia, Engineering Technologist at Dell Technologies
Follow me on X: @Evolving_Techie
LinkedIn: https://www.linkedin.com/in/michaellamia/
Accelerating and Optimizing AI Operations with Infrastructure as Code
Fri, 03 May 2024 12:00:00 -0000
Achieving maturity in a DevOps organization requires overcoming various barriers and following specific steps. The level of maturity attained depends on the short-term and long-term goals set for the infrastructure. In the short term, IT teams must focus on upskilling their resources and integrating tools for containerization and automation throughout the operating lifecycle, from Day 0 to Day 2. Any progress made in scaling up containerized environments and automating processes significantly enhances the long-term economic viability and sustainability of the company. In the long term, maturity involves deploying these solutions across multicloud, multisite landscapes and effectively balancing workloads.
The optimization of your AI applications, and by extension, other high-value workloads, hinges upon the velocity, scalability, and efficacy of your infrastructure, as well as the maturity of your DevOps processes. Before the explosion of AI, recent survey results indicated that the state of automation for infrastructure operations workflows was less than 50% overall; pair that with a twofold increase in application counts, and organizations may struggle against the waves of change[1].
From compute capabilities to storage density and speed, spanning unstructured, block, and file formats, there exist fundamental elements of automation ripe for swift integration to establish a robust foundation. By seamlessly layering pre-built integration tools and a complementary portfolio of products at each stage, the journey towards ramping up AI can be eased.
There are important considerations regarding the various hardware infrastructure components for a generative AI system, including high-performance computing, high-speed networking, and scalable, high-capacity, low-latency storage, to name a few. The infrastructure requirements for AI/ML workloads are dynamic and dependent on several factors, including the nature of the task, the size of the dataset, the complexity of the model, and the desired performance levels. There is no one-size-fits-all solution when it comes to Gen AI infrastructure, as different tasks and projects may demand unique configurations. Central to the success of generative AI initiatives is the adoption of Infrastructure-as-Code (IaC) principles, which facilitate the automation and orchestration of underlying infrastructure components. By leveraging IaC tools like Red Hat Ansible and HashiCorp Terraform, organizations can streamline the deployment and management of hardware resources, ensuring seamless integration with Gen AI workloads.
At the base of this foundation are the Red Hat Ansible modules for Dell, which speed up the provisioning of servers and storage for quick AI application workload mobility.
Creating playbooks with Ansible to automate server configurations, provisioning, deployments, and updates is seamless, even while data is being collected. Due to the declarative and mutable nature of Ansible, the playbooks can be changed in real time without interruption to processes or end users.
Compute
On the compute front, a lot goes into configuring servers for the different AI and ML operations:
- GPU Drivers and CUDA Toolkit Installation: Install appropriate GPU drivers for the server's GPU hardware. For example, install the CUDA Toolkit and drivers to enable GPU acceleration for deep learning frameworks such as TensorFlow and PyTorch.
- Deep Learning Framework Installation: Install popular deep learning frameworks such as TensorFlow or PyTorch, along with their associated dependencies.
- Containerization: Consider using containerization technologies such as Docker or Kubernetes to encapsulate AI workloads and their dependencies into portable and isolated containers. Containerization facilitates reproducibility, scalability, and resource isolation, making it easier to deploy and manage GenAI workloads across different environments.
- Performance Optimization: Optimize server configurations, kernel parameters, and system settings to maximize performance and resource utilization for GenAI workloads. Tune CPU and GPU settings, memory allocation, disk I/O, and network configurations based on workload characteristics and hardware capabilities.
- Monitoring and Management: Implement monitoring and management tools to track server performance metrics, resource utilization, and workload behavior in real time.
- Security Hardening: Ensure server security by applying security best practices, installing security patches and updates, configuring firewalls, and implementing access controls. Protect sensitive data and AI models from unauthorized access, tampering, or exploitation by following security guidelines and compliance standards.
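As a rough illustration of how the first two steps above could be captured as a playbook, the following Python sketch builds a playbook structure and prints it as JSON, which is also valid YAML. It uses only generic `ansible.builtin` modules (`package`, `pip`, `command`) rather than the Dell collections. The inventory group name and the driver and toolkit package names are placeholders, because real package names depend on the operating system and GPU vendor repository.

```python
import json


def build_playbook(hosts, framework="torch"):
    """Build a one-play playbook as plain Python data.

    `hosts` is an inventory group name chosen by the caller (hypothetical here).
    `framework` is the pip package to install, for example "torch" for PyTorch.
    """
    return [
        {
            "name": "Prepare GPU servers for AI workloads (illustrative)",
            "hosts": hosts,
            "become": True,
            "tasks": [
                {
                    "name": "Install GPU driver and CUDA toolkit packages",
                    "ansible.builtin.package": {
                        # Placeholder package names, not real repository names.
                        "name": ["example-gpu-driver", "example-cuda-toolkit"],
                        "state": "present",
                    },
                },
                {
                    "name": "Install the deep learning framework",
                    "ansible.builtin.pip": {"name": framework, "state": "present"},
                },
                {
                    "name": "Verify that the GPUs are visible",
                    "ansible.builtin.command": "nvidia-smi",
                    # A read-only check should never be reported as a change.
                    "changed_when": False,
                },
            ],
        }
    ]


if __name__ == "__main__":
    # JSON is a subset of YAML, so this output can be saved as a playbook file.
    print(json.dumps(build_playbook("gpu_servers"), indent=2))
```

Keeping the playbook as data makes it straightforward to generate variants per site or per server model, although most teams would simply maintain the equivalent YAML by hand.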
The Dell OpenManage Ansible collection offers modules and roles at both the iDRAC/Redfish interface level and the OpenManage Enterprise (OME) level for configuring servers such as the PowerEdge XE9680, which is designed to collect, develop, train, and deploy large language models (LLMs).
The following is a summary of the OME and iDRAC modules and roles as part of the openmanage collection:
Storage
When it comes to AI and storage, customers rely on simple, scalable access to the file systems that hold the growing volumes of data used during data processing and training. With AI, unstructured data storage is necessary for the bounty of rich context and nuance that will be accessed during the building phase. User access requirements also vary widely, and Ansible automation playbooks can help teams change and adapt quickly.
Dell PowerScale is the world’s leading scale-out NAS platform, and it recently became the first Ethernet storage certified on NVIDIA SuperPOD. When it comes to Ansible automation, PowerScale comes with an extensive set of modules that covers a wide range of platform operations:
Software defined storage
Hyperconverged platforms like PowerFlex offer highly scalable and configurable compute and storage clusters. In addition to common Day 2 tasks like storage provisioning, data protection, and user management, the Ansible collection for PowerFlex can be used for cluster deployment and expansion. Here is a summary of what the Ansible collection for PowerFlex offers:
Conclusion
The one thing agreed upon is that Generative AI tools need scale, repeatability, and reliability beyond anything that software and the data center have delivered so far. This is precisely what building infrastructure-as-code practices into a multisite operation is designed to do. From PowerEdge to PowerScale, the level of capacity and performance is unmatched. This allows AI operations and Generative AI to absorb, grow, and provide the intelligence that organizations need to be competitive and innovative.
[1] Infrastructure-as-code and DevOps Automation: The Keys to Unlocking Innovation and Resilience, September 2023
Other resources:
GenAI Acceleration Depends on Infrastructure as Code
Authors: Jennifer Aspesi, Parasar Kodati