![Banner image](https://cdn-prod.scdn6.secure.raxcdn.com/static/media/f8e5f7a9-f8d8-4d50-835c-c3930f532de3.jpeg?_cb=1708963727.806692)
Securing the Digital Frontier: Inside Dell and AMD’s Zero Trust Approach
Fri, 23 Feb 2024 17:01:00 -0000
Introduction
The devices that connect us make incredible things possible, but these connections also provide additional points of vulnerability that malicious actors can exploit. In fact, some estimates predict that cyber attacks will cost organizations as much as $10.5 trillion USD by 2025.1 And according to one estimate, recovering from the damage of a cyber attack takes about 277 days.2
While newer tech such as artificial intelligence (AI) offers improvements in productivity and business operations for many organizations, it also leaves data vulnerable to more sophisticated cyber attacks. With each advancement in technology, tech industry leaders must shift strategies to effectively counter cyber criminals as they find new ways to obtain and exploit data. To thwart these threats and keep data safe, every data center component (servers, storage, networks, software, and firmware) needs built-in protection. Protection starts with supply chain tampering mitigations on the manufacturing floor and continues through the transportation process and customer use. And attacks no longer stop at the data center walls. Organizations with a presence in the cloud face additional challenges in keeping data secure.
Together, Dell and AMD provide a purpose-built cyber resilient architecture that helps organizations adopt a Zero Trust strategy, embracing the idea that system components are vulnerable at each link in the chain and offering protection at every point. A Zero Trust strategy uses strong, identity-based policies for every IT asset along with “least privilege” principles for access. Dell Cyber Resilient Architecture includes in-depth features centered around boot integrity and data protection as well as security features in Integrated Dell Remote Access Controller (iDRAC). Dell PowerEdge servers are anchored with a silicon-based Root of Trust (RoT) that establishes a chain of trust for cryptographic verification of hardware and software components on the server. AMD Infinity Guard provides an additional layer of security that decreases the potential of attack during software boot and execution. AMD Infinity Guard encompasses several additional security features, including Platform Secure Boot and Platform Secure Processor, that ensure PowerEdge servers are protected at each stage in their lifecycle.
Dell Cyber Resilient Architecture
Dell Cyber Resilient Architecture utilizes PowerEdge security features that work in concert to both provide resiliency and enable a Zero Trust strategy. Security features must protect against potential threats, detect suspicious activity, and recover quickly in case of a breach. At the same time, they must also maintain a “verify before trust” posture for a Zero Trust approach of least privilege, where users and devices are given access only to what they need to perform their tasks. By working together, these PowerEdge security controls provide a comprehensive security solution that ensures resiliency while enforcing a Zero Trust posture. For more details on the full Dell Cyber Resilient Architecture features and services, see the whitepaper.
Ensuring Boot Integrity
The pre-boot environment is often overlooked and, if safeguards aren’t in place, could be open to attack. If a malicious actor compromises the BIOS, firmware, or a driver during boot, they could potentially obtain access to the entire system. Without the proper controls in place, they could successfully infiltrate the system at any point and reach their desired target: your data.
To mitigate vulnerabilities, the server vendor must protect the BIOS, but also verify and validate specific server components and firmware such as memory and processors. Server hardware manufacturers must ensure that their components integrate fully with the server architecture for security and validation checks to work seamlessly. Every Dell PowerEdge server offers multiple layers of security to protect the boot cycle: a silicon-based RoT, UEFI Secure Boot, and iDRAC security features, including Firmware Rollback and Rapid Operating System Recovery.
On top of these Dell PowerEdge server-based security layers, AMD processors feature Platform Secure Boot (PSB) and Platform Security Processor (PSP) to protect data in use. Combined, Dell and AMD cover every aspect of the boot cycle to ensure a secure foundation for your data and workloads.
Dell iDRAC and Root of Trust
The RoT concept assumes that if a system verifies a foundation or baseline as safe, then all subsequent validations and security checks are anchored in a continuous chain of trust. Imagine a house: If the foundation is unstable and beginning to crumble, the integrity of the wall supports matters little. Similarly, if your server’s BIOS is compromised, protecting the server OS may be in vain.
The PowerEdge server chain of trust provides a seamless cryptographic verification across all server components from foundation to data. This ensures that the components of the system software stack (hypervisor, OS, applications) are aware that they can trust the underlying server when the server is operational. This layer establishes the foundation of a chain of trust within a server and creates a trusted and secure server platform. Dell servers use a unique silicon-based RoT burned into each server for cryptographic verification that ensures secure booting on every cold boot or A/C cycle. Starting with version 4.10.10.10, iDRAC provides an RoT mechanism to verify the BIOS image at boot and will not allow the server to boot until it verifies the BIOS image. For PowerEdge servers with AMD processors, integrated Dell Remote Access Controller (iDRAC) leverages AMD PSB technology to verify the BIOS code before the OS loads. AMD PSB scrutinizes the BIOS integrity, interfacing with the primary BIOS ROM and AMD fusion controller hub (FCH) for thorough RoT processing. This meticulous validation extends up to the OS bootloader, guaranteeing a continuous chain of trust.
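The chain-of-trust idea described above can be sketched in a few lines of Python. This is a simplified illustration, not Dell's or AMD's implementation: it compares SHA-256 digests instead of verifying cryptographic signatures, and the stage names, image contents, and the `verify_chain` helper are all hypothetical.

```python
import hashlib


def digest(image: bytes) -> str:
    """Return the SHA-256 digest of a firmware or software image."""
    return hashlib.sha256(image).hexdigest()


def verify_chain(stages, root_digest):
    """Walk the boot stages in order and stop at the first mismatch.

    stages: list of (name, image_bytes, expected_digest_of_next_stage).
    root_digest: trusted digest of the first stage, standing in for the
    immutable value anchored in silicon.
    Returns (ok, name_of_failed_stage_or_None).
    """
    expected = root_digest
    for name, image, next_expected in stages:
        if digest(image) != expected:
            return False, name  # halt: this stage is not what was trusted
        expected = next_expected  # trust extends one link further
    return True, None


# Hypothetical images; real firmware would be signed binaries.
bios = b"bios-image-v1"
bootloader = b"bootloader-v1"
kernel = b"os-kernel-v1"

stages = [
    ("BIOS", bios, digest(bootloader)),
    ("bootloader", bootloader, digest(kernel)),
    ("kernel", kernel, None),
]

# A tampered BIOS breaks the chain at the very first link.
tampered = [("BIOS", b"bios-image-evil", digest(bootloader))] + stages[1:]

print(verify_chain(stages, digest(bios)))
print(verify_chain(tampered, digest(bios)))
```

Each stage is checked against a value vouched for by the stage before it, so a compromise anywhere in the chain stops the boot at that link rather than being discovered later.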
Figure 1: Silicon-based Root of Trust Domains in PowerEdge Servers with iDRAC9.
Should BIOS validation fail, iDRAC immediately shuts down the server and notifies the user, preventing the booting of unauthorized firmware. iDRAC also includes a backup and recovery system for BIOS and iDRAC firmware, which reinforces server reliability and safeguards server operations against potential firmware corruption. To provide additional protection, iDRAC also offers a live BIOS scan that users can run on demand or schedule to run regularly. This scan requires the iDRAC Datacenter license and allows users to catch potential issues before they reboot, allowing proactive mitigation.3
UEFI Secure Boot
Dell PowerEdge servers use the industry-standard UEFI Secure Boot to validate operating system-specific bootloaders, ensuring the integrity of the OS kernel and other critical components. UEFI Secure Boot acts as a shield from malware and ransomware in pre-boot environments. For interoperability, server and component manufacturers need to collaborate so that the UEFI-enabled BIOS recognizes driver and firmware signatures for components. By validating cryptographic signatures of UEFI drivers and other pre-OS code, UEFI Secure Boot helps ensure that only trusted code loads during boot.
To heighten security customization, administrators can configure custom OS bootloader signing certificates for UEFI Secure Boot. (To learn more about UEFI Secure Boot Customization options, visit https://infohub.delltechnologies.com/section-assets/direct-from-development-uefi-boot-enhanced-security-to-combat-threats.) This restricts execution to trusted secure OS bootloaders, which uphold the secure boot chain by authenticating the OS kernel and filesystem. This feature provides additional flexibility, particularly for Linux administrators who prefer to sign their own OS bootloaders rather than depend on third-party default UEFI Certificate Authorities. Administrators can upload custom certificates through the iDRAC API, enhancing authentication of their specific OS bootloaders. Uniquely, Dell PowerEdge servers support complete customization of Secure Boot, including the option to remove all standard certificates from Microsoft, VMware, or UEFI CA.4
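To illustrate the effect of customizing the Secure Boot trust lists, here is a minimal Python sketch. It assumes a simplified model in which an allowed list and a forbidden list hold image hashes; real UEFI Secure Boot mainly validates signatures against certificates, and the loader names and the `may_execute` helper are hypothetical.

```python
import hashlib


def image_hash(image: bytes) -> str:
    """Return the SHA-256 digest used to identify an image."""
    return hashlib.sha256(image).hexdigest()


def may_execute(image: bytes, allowed: set, forbidden: set) -> bool:
    """Allow an image only if it is explicitly trusted and not revoked."""
    h = image_hash(image)
    if h in forbidden:  # revocation wins over everything else
        return False
    return h in allowed


vendor_loader = b"third-party-signed-bootloader"
custom_loader = b"admin-signed-bootloader"

# Default configuration: both loaders are trusted.
default_allowed = {image_hash(vendor_loader), image_hash(custom_loader)}
# Customized configuration: only the administrator's loader remains.
custom_allowed = {image_hash(custom_loader)}
revoked = set()

print(may_execute(vendor_loader, default_allowed, revoked))
print(may_execute(vendor_loader, custom_allowed, revoked))
print(may_execute(custom_loader, custom_allowed, revoked))
```

Removing the default entries narrows what the platform will run to exactly what the administrator has approved, which is the trade-off the customization option offers.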
CPLD Validation
Every Dell PowerEdge server validates the Complex Programmable Logic Device (CPLD) on every A/C boot. CPLD, a versatile programmable logic device,5 comprises multiple simple PLDs connected by a programmable switching matrix. Its firmware, typically stored in EEPROM, flash memory, or SRAM, enables modifications to system board functions beyond BIOS capabilities, including the implementation of specific logic for system board device interactions. CPLD Validation ensures that system board modifications will not harm your servers or your data.
iDRAC Hardware Security
Extending the chain of trust to additional hardware components, iDRAC uses the Security Protocol and Data Model (SPDM), which standardizes how servers gather information about their components. Each component’s identity, firmware, and configuration information is encrypted. iDRAC hardware security uses authenticated key exchanges to secure lines of communication between components and iDRAC. With SPDM, iDRAC can authenticate the validity of components such as PowerEdge RAID Controllers (PERC)12 and network interface cards (NICs), which not only enhances server security by authenticating components’ Device Identity certificates but also alerts users to any authentication failures.
AMD Platform Secure Boot
AMD processors feature AMD Platform Secure Boot (PSB) to help counteract another growing concern in today’s digital landscape: firmware-level threats. PSB leverages the AMD silicon RoT and verifies the boot process from the BIOS code to the OS bootloader through UEFI Secure Boot.6 Dell uses AMD PSB-enabled motherboards to permit only Dell’s cryptographically signed BIOS code to run. Additionally, Dell binds each AMD processor to a specific motherboard with one-time-programmable fuses that tie the processor to the Dell firmware code signing keys.7 To protect against attacks aimed at embedding malware into firmware, PSB allows the system to boot only firmware authenticated by the AMD Secure Processor.8
By cryptographically verifying the software stack, AMD Platform Secure Boot adds a substantial layer of defense against unauthorized intrusion across various platforms, particularly in virtualized environments or in the cloud.
AMD Platform Secure Processor
Together with PSB, AMD Platform Secure Processor (PSP) fortifies the Dell PowerEdge server boot process. When a CPU first powers on in the Dell factory, the AMD Platform Secure Processor embeds a unique Dell ID permanently into the CPU. This ID effectively ties the CPU to the PowerEdge server, creating a secure bond.9
This integration means that PSP will prevent a PowerEdge server from booting if it detects a CPU from a different server. However, CPU portability is still possible in the event of a hardware failure. The AMD processor is locked to the vendor’s signing key rather than the motherboard, offering a balance between security and component mobility.10
Figure 2: States of Data
Protecting Your Data
Regardless of how attackers gain access to systems, the end goal is always the same: to find your data and steal, manipulate, sell, or destroy it. The physical server isn’t the only point of vulnerability. Bad actors can attack networking, IT policies can contain errors, end-users can have weak passwords, and IT teams can set access permissions too broadly. Attackers may target users with phishing emails to distribute malware.
Dell enables customers to employ a Zero Trust approach that relies on multiple security layers to help protect against all these types of vulnerabilities. To guard against theft or compromise, you must secure data at rest, data in process, and data in flight, through to data decommissioning.11 With features such as at-rest encryption, robust encryption key management, and automated certificate renewal, Dell PowerEdge servers work to block, deter, and mitigate malicious attacks after first boot. Dell PowerEdge servers with AMD processors offer additional features to bolster security, including AMD Secure Memory Encryption (SME) and Secure Encrypted Virtualization (SEV).
Data at Rest
To protect data at rest, Dell provides three main security features: software-based encryption, enterprise key management, and hardware drive encryption. With drives that support Instant Secure Erase (ISE), Dell customers can cryptographically erase any data on self-encrypted drives (SEDs), ISE drives, and NVM devices such as NVDIMMs. SEDs protect data from attacks in instances where a disgruntled employee or other malicious actor physically removes drives from a server. Because the encrypted drive’s locking key password ties it to the specific server and RAID controller it came from, another server cannot access the data. For further protection, iDRAC can use Dell OpenManage Secure Enterprise Key Manager with local key management (iLKM, LKM), which works in conjunction with an external third-party key manager to lock and unlock the storage controller at boot. If someone boots the server away from the key manager, iDRAC keeps the storage controller locked so that the data stored on the device remains encrypted. To find out more about other encryption key options, visit https://infohub.delltechnologies.com/section-assets/openmanage-sekm-storage-performance-infographic.
Data in Flight
With data in flight, vulnerabilities in the network and data access control could allow attackers to intercept or modify data traveling over the network. The iDRAC web connection is one possible point of vulnerability—so Dell provides several options to secure the connection with a TLS/SSL certificate, thus mitigating the chance of attack. While this certificate is self-signed by default, admins can create a custom certificate, or one signed by a trusted Certificate Authority (CA). This certificate enables encrypted, secure connections for web browsers and tools such as command-line utilities to safely interact with the server via the iDRAC connection.
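For readers scripting against a management interface over HTTPS, the following Python sketch shows one way a client could insist on certificate verification. It uses only the standard `ssl` module; the `build_tls_context` helper and the optional CA bundle path are hypothetical and not part of any Dell tooling.

```python
import ssl


def build_tls_context(ca_file=None):
    """Create a client-side TLS context that verifies the server certificate.

    ca_file: optional path to a PEM bundle containing the CA that signed the
    management controller's certificate (hypothetical path supplied by the
    caller). When omitted, the system trust store is used.
    """
    context = ssl.create_default_context(cafile=ca_file)
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy protocols
    return context


context = build_tls_context()
# Certificate and hostname checks are on by default in this context.
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

A context like this can be passed to `urllib.request.urlopen(url, context=context)`. With a default self-signed certificate, verification would fail unless that certificate is supplied as the trusted bundle, which is one practical reason to install a CA-signed certificate.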
iDRAC also provides several controls that let users define strict, narrow rules for SSH access to the server. For iDRAC users with a Datacenter level license, iDRAC offers Simple Certificate Enrollment Protocol (SCEP), which maintains web server certificates with automatic renewal to avoid accidental lapses in coverage. A 2020 study from third party Principled Technologies found that this automatic renewal feature keeps servers more secure while saving IT staff valuable time, especially when it comes to maintaining a fleet of server certificates.12
Data in Use
To protect data in use, Dell PowerEdge servers enable AMD confidential compute features, including AMD Secure Memory Encryption (SME) and Secure Encrypted Virtualization (SEV) to protect data as it flows through memory and processing components.
Secure Memory Encryption
AMD SME encrypts all data as it enters the memory, further securing your data. Without memory encryption, data is vulnerable to malicious software and other intrusions, especially with newer memory technology such as NVDIMMs that do not lose data when powered down. This protection extends to memory via high-performance encryption engines integrated into the memory channels, ensuring both security and speed. Because it’s completely transparent to the host OS and application layers, AMD SME accomplishes this without necessitating any changes to application software, providing a user-friendly approach to enhanced memory security.13
AMD SME in-memory data encryption marks a departure from older memory encryption methods that were tailored to specific use cases. A key advantage of AMD SME is its flexibility, allowing software to utilize it in different ways: either by encrypting all of DRAM for comprehensive protection or selectively encrypting specific regions, such as those used by guest virtual machines (VMs).14
AMD Secure Encrypted Virtualization
AMD SEV enhances encryption for memory and virtual machines by implementing a virtual machine-based Trusted Execution Environment (TEE). Integrating with the AMD-V architecture, AMD SEV encrypts each VM’s memory separately, protecting the VMs from each other and from the hypervisor. This approach employs cryptography to safeguard code within a VM from potentially vulnerable higher-privileged code, such as the hypervisor. The added layer of security is especially crucial in cloud environments. The encryption happens right at the memory controller, which encrypts and decrypts data without slowing processing speed, while the AMD Secure Processor handles all the encryption details invisibly.15
There are still some situations when a VM’s data needs to communicate with other VMs or with the hypervisor. In these instances, AMD SEV allows the VM to choose which encryption key to apply to specific memory pages: a guest key that keeps the page private to the VM or a hypervisor key that allows the hypervisor and other VMs to decrypt the page. This flexibility allows for security and communication based on the needs of each VM.16
AMD SEV offers additional features that expand the cryptographic isolation of VMs: SEV-Encrypted State (SEV-ES) and SEV-Secure Nested Paging (SEV-SNP). SEV-ES further isolates VMs from each other and the hypervisor by encrypting CPU register content when a VM stops running, protecting it from unauthorized access via a neighbor VM or the hypervisor. SEV-SNP builds on SEV and SEV-ES, adding memory integrity protection and optional additional security features for VMs. The memory integrity enhancement guarantees that when a VM reads its private memory, it sees the last value it wrote. If another entity has modified the data in memory, the VM cannot access the data. This protects the VM from running compromised data or code.
Encryption and Encryption Keys
AMD SEV uses a unique encryption key for each VM, cryptographically isolating VMs and the hypervisor. The encryption engine encrypts data on write and decrypts it on read. Each VM, upon creation, receives a unique key, ensuring that any unauthorized attempt to access its memory results in incomprehensible data. Every 4th Generation AMD EPYC™ processor offers up to one thousand encryption keys. This architecture doesn’t alter applications within the VM; instead, it operates at the operating system level and elevates data security. The encryption hardware built into the memory controller manages DRAM traffic encryption and decryption, strengthening protection of data in use, including memory contents.17
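The per-VM key model can be pictured with a toy Python simulation. This is only a sketch under stated assumptions: the XOR keystream below is a stand-in for the hardware encryption engine and is not secure, and the `ToyMemoryController` class and VM names are hypothetical.

```python
import hashlib
import secrets


def keystream(key: bytes, length: int) -> bytes:
    """Derive a toy keystream from a key with SHA-256 in counter mode."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """Encrypt or decrypt a memory page (the XOR operation is symmetric)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


class ToyMemoryController:
    """Holds one key per VM; keys never leave this object."""

    def __init__(self):
        self._keys = {}
        self.dram = {}  # address -> ciphertext, as an observer of DRAM sees it

    def create_vm(self, vm_id):
        self._keys[vm_id] = secrets.token_bytes(32)  # unique key per VM

    def write(self, vm_id, address, plaintext):
        self.dram[address] = xor_with_key(plaintext, self._keys[vm_id])

    def read(self, vm_id, address):
        return xor_with_key(self.dram[address], self._keys[vm_id])


mc = ToyMemoryController()
mc.create_vm("vm-a")
mc.create_vm("vm-b")
mc.write("vm-a", 0x1000, b"customer record 42")

print(mc.read("vm-a", 0x1000))  # the owning VM recovers its data
print(mc.read("vm-b", 0x1000))  # another VM's key yields unreadable bytes
```

The point of the model is that isolation does not depend on access checks alone: even if a neighbor VM or the hypervisor reads the same physical address, it lacks the key needed to make sense of the contents.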
Resources
1- Chuck Brooks, “Cybersecurity Trends & Statistics For 2023; What You Need to Know,” accessed December 4, 2023, https://www.forbes.com/sites/chuckbrooks/2023/03/05/cybersecurity-trends--statistics-for-2023-more-treachery-and-risk-ahead-as-attack-surface-and-hacker-capabilities-grow/?sh=9885ce219dba.
2- Ken Kizzee, “Cyber Attack Statistics to Know,” accessed December 19, 2023, https://parachute.cloud/cyber-attack-statistics-data-and-trends/.
3- Dell, “Improved security with iDRAC9 using Root of Trust and BIOS Live Scanning,” accessed December 19, 2023, https://dl.dell.com/manuals/common/dell-emc-idrac9-security-root-of-trust-bios-live-scanning.pdf.
4- Dell, “Cyber Resilient Security in Dell PowerEdge Servers,” accessed December 4, 2023, https://www.delltechnologies.com/asset/en-us/products/servers/industry-market/cyber-resilient-security-with-poweredge-servers.pdf.
5- Techopedia, “Complex Programmable Logic Device,” accessed December 4, 2023, https://www.techopedia.com/definition/6655/complex-programmable-logic-device-cpld.
6- AMD, “AMD Pro Security,” accessed December 4, 2023, https://www.amd.com/en/technologies/pro-security.
7- AMD, “AMD Infinity Guard,” accessed December 4, 2023, https://www.amd.com/en/technologies/infinity-guard.
8- AMD, “4 Ways AMD Infinity Guard Helps Protect Your Data,” accessed December 4, 2023, https://www.amd.com/system/files/documents/content-security-infographic.pdf.
9- AMD, “AMD Infinity Guard.”
10- Dell, “Defense in-depth: Comprehensive Security on PowerEdge AMD EPYC Generation 2 (Rome) Servers,” accessed December 4, 2023, https://infohub.delltechnologies.com/p/defense-in-depth-comprehensive-security-on-poweredge-amd-epyc-generation-2-rome-servers/.
11- Dell, “Reduce Your Risk of Unauthorized Server Data Access,” accessed January 24, 2024, https://infohub.delltechnologies.com/section-assets/data-protection-infographic.
12- Principled Technologies, “Reduce hands-on deployment times to near zero with iDRAC9 automation,” accessed December 4, 2023, https://www.principledtechnologies.com/Dell/iDRAC9-v6.10-provisioning-infographic-0323.pdf.
13- AMD, “AMD Infinity Guard.”
14- AMD, “AMD Memory Encryption,” accessed December 4, 2023, https://www.amd.com/content/dam/amd/en/documents/epyc-business-docs/white-papers/memory-encryption-white-paper.pdf.
15- AMD, “AMD Secure Encrypted Virtualization,” accessed December 4, 2023, https://www.amd.com/en/developer/sev.html.
16- AMD, “AMD Memory Encryption.”
17- AMD, “AMD Secure Encrypted Virtualization.”
Related Documents
![Post thumbnail](https://cdn-prod.scdn6.secure.raxcdn.com/static/media/1eb1cd47-760a-4616-a63b-852c32a5f9f9.jpeg?_cb=1718221344.8070743)
Industry’s First Multimodal RAG on Dell PowerEdge XE9680 Server with AMD Instinct MI300X Accelerators
Wed, 12 Jun 2024 19:49:13 -0000
In this blog, Scalers AI presents a multimodal RAG solution enabled by compute & memory capability of the AMD Instinct MI300X accelerators on Dell PowerEdge XE9680 servers.
With the release of the AMD Instinct MI300X accelerator, we are now entering an era of choice for leading AI accelerators that power today’s retrieval-based generative AI solutions. Dell has paired the accelerators with its flagship PowerEdge XE9680 server for high performance AI applications. Leveraging this powerful combination, Scalers AI is excited to showcase a multimodal RAG (retrieval augmented generation) solution.
This offering can analyze audio, video, and text content, which is critical for enterprises because they operate on multimodal inputs, whether the use case involves customer support calls, product quality images, employee training videos, or more. In this blog, Scalers AI walks through an earnings webcast scenario, provides insights into the solution architecture and user interface, and uncovers the following critical value drivers:
- Built multimodal RAG on AMD Instinct MI300X accelerator on Dell PowerEdge XE9680 server.
- Deployed four different models (language, image embeddings, text embeddings and voice) on a single Dell PowerEdge XE9680 server with AMD Instinct MI300X accelerators.
- Showcased live at Dell Tech World ’24 in an Earnings Webcast application demonstrating the use of language, voice, and vision for rapid analyst insights.
Solution Architecture
This solution leverages a Dell PowerEdge XE9680 server equipped with eight AMD Instinct MI300X accelerators, along with AMD ROCm™, which supports an array of optimizations for generative AI workloads.
The software stack includes the following key components:
- MilvusDB, an open-source vector database
- OpenAI Whisper, an automatic speech recognition system (ASR)
- CLIP, an image embedding model
- bge-large-en-v1.5, a text embedding model, which captures syntactic and semantic information from text data and encapsulates the information in numerical vectors
- Llama 3 with vLLM, with LlamaIndex as a RAG Framework
- FastAPI, to handle user interactions including video file uploads, model selection, and the conversation console
Additional context on the software components is available in the GitHub repository.
Step-by-Step Demo Walkthrough
Earnings webcasts are a critical tool for publicly traded companies to communicate with their investors, analysts, and the broader market, and typically involve large amounts of data collected from the company’s knowledge base. RAG solutions enhance the accuracy, efficiency, and effectiveness of the earnings webcast process by extracting insights from company proprietary data, providing significant benefits to enterprises in terms of communication, decision-making, and stakeholder engagement. Through the earnings webcast RAG solution user interface illustrated below, users can upload relevant video files and query the application, which then compiles relevant information and provides references to the uploaded files from which the answers were extracted.
The steps below detail the flow of the solution; each step is marked in the interface illustration above:
- User uploads input video files.
- Solution creates image and text embeddings from input video frames.
- User converses with the application through the conversation console about information presented in the input video.
- Solution generates text responses in the conversation console.
- Solution retrieves a relevant video clip from the list of input video files. This provides the user with reference content from which the text response was generated.
The image above illustrates each segment of the workflow, and details how the embeddings model, vector databases, and generative AI models interact with the data and user queries within the application.
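The retrieve-then-respond flow above can be sketched with a toy example in Python. This is not the Scalers AI implementation: the bag-of-words "embedding" stands in for a real embedding model, the in-memory list stands in for a vector database, and the transcript segments, file name, and helper functions are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical transcript segments, as a speech recognizer might emit them.
segments = [
    {"video": "q1_webcast.mp4", "start_s": 0,
     "text": "Welcome to the first quarter earnings call."},
    {"video": "q1_webcast.mp4", "start_s": 95,
     "text": "Revenue grew on strong server demand."},
    {"video": "q1_webcast.mp4", "start_s": 210,
     "text": "Operating expenses were flat year over year."},
]
index = [(embed(s["text"]), s) for s in segments]  # stand-in vector store


def retrieve(question: str) -> dict:
    """Return the segment whose embedding is closest to the question."""
    q = embed(question)
    return max(index, key=lambda item: cosine(q, item[0]))[1]


def build_prompt(question: str) -> str:
    """Combine the retrieved context with the question for a language model."""
    hit = retrieve(question)
    return (f"Context ({hit['video']} @ {hit['start_s']}s): {hit['text']}\n"
            f"Question: {question}")


print(build_prompt("How did revenue and server demand change?"))
```

Because each segment keeps its source file and timestamp, the same lookup that grounds the text answer also identifies the video clip to show as a reference.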
As you can see from this demo, showcased live at Dell Tech World ’24, enterprises can now take advantage of their various data types, whether they involve voice, video, images, or text. This enables them to scale and enhance multiple use cases such as employee onboarding, customer support assistants, and critical document generation, all while keeping their proprietary data and workflows private. Dell’s flagship PowerEdge XE9680 server with eight AMD Instinct MI300X accelerators supports the memory footprint needed for these rich multimodal data and model intensive use cases.
In this blog, we showcased how enterprises deploying applied AI can use their own proprietary data to take advantage of multimodal RAG capabilities in the context of an Earnings Webcast insights application, and explored the capabilities of the Dell PowerEdge XE9680 server equipped with AMD Instinct MI300X accelerators with the following milestones:
- Built multimodal RAG on AMD Instinct MI300X accelerator on Dell PowerEdge XE9680 server.
- Deployed four different models (language, image embeddings, text embeddings and voice) on a single Dell PowerEdge XE9680 server with AMD Instinct MI300X accelerators.
- Showcased live at Dell Tech World ’24 in an Earnings Webcast application demonstrating the use of language, voice, and vision for rapid analyst insights.
The reference code along with more information can be found here.
Additional Criteria for IT Decision Makers
What is RAG, and Why is it Critical for Enterprises?
RAG, or Retrieval-Augmented Generation, is a method in natural language processing (NLP) that enhances the generation of responses or information by incorporating external knowledge retrieved from a large corpus or database. This approach combines the strengths of retrieval-based models and generative models to provide more accurate, informative, and contextually relevant outputs.
The key advantage of RAG is that it leverages a large amount of external knowledge dynamically, enabling the model to generate responses that are not just based on its training data but also on up-to-date and detailed information from the retrieval phase. This makes RAG particularly useful in applications where factual accuracy and details are crucial, such as in customer support, academic research, and other domains requiring precise information. Ultimately, RAG provides enterprises with a powerful tool for improving the accuracy, relevance, and efficiency of their information systems, leading to better customer service, cost savings, and competitive advantages.
Why is the Dell PowerEdge XE9680 Server with AMD Instinct MI300X Accelerators Well-suited for RAG Solutions?
Designed especially for AI tasks, the Dell PowerEdge XE9680 server is a powerful data-processing server equipped with eight AMD Instinct MI300X accelerators, making it well-suited for AI workloads, especially those involving training, fine-tuning, and conducting inference with Large Language Models (LLMs). The AMD Instinct MI300X accelerator is a high-performance AI accelerator intended to operate in groups of eight within AMD’s generative AI platform.
Implementing Retrieval-Augmented Generation (RAG) solutions effectively requires a robust hardware infrastructure that can handle both the retrieval and generation components efficiently. Critical hardware features for RAG solutions include high performance accelerator units and large RAM and storage capacity. With 192 GB of GPU memory, a single AMD Instinct MI300X accelerator can host an entire Llama 3 70B parameter model for inference. The accelerator is optimized for generative AI; a group of eight accelerators can deliver up to 10.4 petaflops of performance (BF16/FP16) and provides 1.5 TB of total HBM3 memory.
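The memory figures above can be checked with some quick arithmetic. The Python sketch below assumes 2 bytes per parameter (BF16/FP16 weights) and counts weights only; runtime overhead such as the KV cache and activations is not included, so real headroom is smaller.

```python
GB = 10**9  # decimal gigabyte

params = 70 * 10**9        # Llama 3 70B parameter count
bytes_per_param = 2        # assumption: BF16/FP16 weights
weights_gb = params * bytes_per_param / GB

per_accelerator_gb = 192   # from the text
accelerators = 8           # from the text
total_gb = per_accelerator_gb * accelerators

print(f"Model weights: {weights_gb:.0f} GB")
print(f"Fits on one accelerator: {weights_gb < per_accelerator_gb}")
print(f"Total HBM3 across {accelerators} accelerators: {total_gb} GB")
```

Under these assumptions the weights occupy about 140 GB of the 192 GB on a single accelerator, and eight accelerators together provide 1,536 GB, which is the roughly 1.5 TB quoted above.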
Copyright © 2024 Scalers AI, Inc. All Rights Reserved. This project was commissioned by Dell Technologies. Dell and other trademarks are trademarks of Dell Inc. or its subsidiaries. AMD, AMD Instinct™, AMD ROCm™, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other product names are the trademarks of their respective owners.
***DISCLAIMER - Performance varies by hardware and software configurations, including testing conditions, system settings, application complexity, the quantity of data, batch sizes, software versions, libraries used, and other factors. The results of performance testing provided are intended for informational purposes only and should not be considered as a guarantee of actual performance.
![Post thumbnail](https://cdn-prod.scdn6.secure.raxcdn.com/static/media/f7db393f-d7f6-4fdf-8da3-e196be6e9e29.png?_cb=1715717214.302938)
Improve performance by easily migrating to a modern OpenShift environment on PowerEdge R7615 servers
Tue, 14 May 2024 20:15:19 -0000
Improve performance and gain room to grow by easily migrating to a modern OpenShift environment on Dell PowerEdge R7615 servers with 4th Generation AMD EPYC processors and high-speed 100GbE Broadcom NICs
We deployed this modern environment, then migrated database VMs from legacy servers and saw performance improvements that support consolidation.
Transactional databases are the backbone of many business operations, powering ecommerce and order fulfillment, human resources and payroll, and a host of other activities. If your company is running these kinds of workloads on server infrastructure that is several years old, you might believe that performance is adequate and that you have little reason to consider upgrading to new servers with modern processors, networking, and a Red Hat® OpenShift® container-based environment. In fact, by continuing to use this older gear, you could be incurring higher than necessary operating expenditures by maintaining and powering more servers than you need to perform a given volume of work. You could also be risking downtime with aging hardware that is likelier to break down. By upgrading to a modern environment, you could mitigate these issues and future-proof your infrastructure. A 2019 Forrester Consulting report recommended that organizations refresh their servers at least every three years to maximize agility and productivity.[1] The report states not only that modern servers allow organizations to adopt more emerging technologies at a faster rate, but also “modern hardware has a profound impact on business benefits such as better customer experience, employee productivity, and innovation.”[2]
We explored the process of migrating VMs from a legacy environment and conducted testing to quantify the resulting improvements in network and database performance. We started with a legacy environment consisting of MySQL™ virtual machines (VMs) running on a cluster of three Dell™ PowerEdge™ R7515 servers with 3rd Generation AMD EPYC™ processors and 25Gb Broadcom® NICs. We then deployed a modern OpenShift container-based environment comprising three Dell PowerEdge R7615 servers with 4th Generation AMD EPYC processors and high-speed 100Gb Broadcom NICs. While the primary application of OpenShift is typically for containerized workloads, we used OpenShift Virtualization, which presents a familiar VM layer to administrators while utilizing the containerized technology on the underlying layer. Both environments used a Dell PowerStore 1200T for external storage that the servers accessed using iSCSI. We measured database performance using the HammerDB TPROC-C benchmark.
We found that the modern cluster environment of Dell PowerEdge R7615 servers with 4th Generation AMD EPYC processors and high-speed 100Gb Broadcom NICs outperformed the legacy cluster environment, delivering 44 percent greater database performance. These improvements mean that companies that upgrade can enjoy savings by meeting their workload requirements with fewer servers to license, maintain, power, and cool. Selecting 100Gb Broadcom NICs also positions companies well to take advantage of increasingly popular network-intensive technologies such as artificial intelligence (AI).
The benefits of containerization and Red Hat OpenShift Virtualization
Many organizations choose containers for DevOps due to their easy scalability and portability. Because a container encapsulates an application as well as everything necessary to run that application, it’s simple to move the container from development to test and production environments, adding instances of the application by replicating the container. Containers can also be useful for microservices, data streaming, and other use cases.[3]
Containers aren’t necessarily ideal for every use case, however, and for some infrastructures, IT teams may wish to incorporate both containers and VMs. Red Hat OpenShift Virtualization, which we used in our testing, enables organizations to run both VMs and containers on the same platform by bringing VMs into containers.[4] This lets IT reap the benefits of both containers and VMs with the efficiency benefit of relying on one management tool, rather than having to maintain two distinct infrastructures.
About our testing
We explored the process of deploying a modern data center environment and migrating VMs to it from a legacy environment. We also measured the database performance the VMs achieved in both environments:
Legacy environment
- Three Dell PowerEdge R7515 servers with 3rd Generation AMD EPYC 7663 56-core processors and Broadcom Advanced Dual 25Gb Ethernet NICs
- External storage using Dell PowerStore 1200T over iSCSI
- VMware® vSphere® 8
Modern environment
- Three Dell PowerEdge R7615 servers with 4th Generation AMD EPYC 9554 64-core processors and Broadcom NetXtreme-E BCM57508 100Gb NICs
- External storage using Dell PowerStore 1200T over iSCSI
- Red Hat OpenShift 4.14
Figure 1 presents a diagram of our test configuration. In addition to our test server clusters, we needed three servers to host infrastructure VMs, workload client VMs, and the OpenShift control node VMs. We configured a Dell PowerEdge R7525 to serve as the host for our infrastructure VMs for services such as AD, DHCP, and DNS, as well as HammerDB client VMs. We also configured a Dell PowerEdge R7625 to host additional HammerDB client VMs. For the OpenShift environment, we deployed a Dell PowerEdge R540 to host the OCP control nodes. We virtualized the control nodes to reduce the number of servers needed for the test bed.
Figure 1: Our test configuration. Source: Principled Technologies.
To test the MySQL database performance of each environment, we used the TPROC-C workload from the HammerDB benchmark. HammerDB developers derived their OLTP workload from the TPC-C benchmark specifications; however, as this is not a full implementation of the official TPC-C standards, the results in this paper are not directly comparable to published TPC-C results. For more information, please visit https://www.hammerdb.com/docs/ch03s01.html.
Each VM had a single MySQL instance with a TPROC-C database. We targeted the maximum transactions per minute (TPM) each environment could achieve by increasing the user count until performance degraded.
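The ramp-up approach described above can be sketched as a simple search loop. This is only an illustration, not the test harness used for the report: `run_benchmark` is a hypothetical callable that runs one TPROC-C pass at a given virtual user count and returns the measured TPM, and the starting count, step size, and upper limit are assumed values.

```python
def find_peak_tpm(run_benchmark, start_users=8, step=8, max_users=512):
    """Raise the user count until throughput stops improving.

    run_benchmark is a hypothetical callable: users -> measured TPM.
    Returns (user_count, tpm) for the best run observed.
    """
    best_users, best_tpm = 0, 0.0
    users = start_users
    while users <= max_users:
        tpm = run_benchmark(users)
        if tpm <= best_tpm:
            # Throughput no longer improves: treat this as degradation
            # and keep the previous run as the peak.
            break
        best_users, best_tpm = users, tpm
        users += step
    return best_users, best_tpm


def synthetic_tpm(users):
    """Made-up throughput curve for demonstration only (not measured data)."""
    return 1000 * users - 10 * users * users
```

Calling `find_peak_tpm(synthetic_tpm)` walks up the made-up curve and stops at the first user count where throughput drops. In a real run, each call to `run_benchmark` would be a full benchmark pass, so a coarse step keeps total test time manageable.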
What we learned
Finding 1: Deploying OpenShift in the modern environment was easy
For our environment, installing an OpenShift Installer-Provisioned Cluster with the Red Hat Assisted Installer was straightforward. We started by setting up the prerequisites for the environment, which included a VM for Active Directory, DNS, and DHCP. We created a domain for our private network and added the API and ingress routes as DNS A records. Next, we set up a VM as a router so that our OpenShift environment could access the internet from our private network. Finally, we created three blank VMs to serve as our OpenShift controller nodes. Once we had met the prerequisites, we logged into the Red Hat Hybrid Cloud Console and navigated to the Assisted Installer to create the cluster.
The Assisted Installer streamlined the process by walking us through configuration menus for storage, network, and access to the cluster. We started the cluster creation by assigning it a name, providing the domain, and selecting an OpenShift version. From there, the installer guided us through generating an installer image using the SSH public key of the server running the installer. After downloading the ISO, we booted each of the controller and worker nodes into the image, and the Assisted Installer discovered each node. The installer then walked us through the rest of the configuration process and began the installation. The process required advancing through only six configuration tabs, and the installation itself took around three hours once configuration was complete. When the installation finished, each node rebooted into the OpenShift OS, and the Assisted Installer provided a fully qualified domain name (FQDN) for the cluster console, which we used to connect to and manage the cluster. For detailed steps on the OpenShift deployment process, see the science behind the report.
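As a rough illustration of the DNS prerequisite, the sketch below builds the two names an OpenShift cluster conventionally expects (`api.<cluster>.<domain>` and the `*.apps.<cluster>.<domain>` ingress wildcard) and checks whether they resolve. The cluster name and base domain are hypothetical placeholders, not the values from this test bed, and the resolution check assumes it runs on a machine that uses the same DNS server as the cluster.

```python
import socket


def required_dns_names(cluster_name, base_domain):
    """Return the API and ingress record names for a cluster (names are inputs, not real values)."""
    suffix = f"{cluster_name}.{base_domain}"
    return {
        "api": f"api.{suffix}",
        "ingress_wildcard": f"*.apps.{suffix}",
    }


def resolves(hostname):
    """Return True if the local resolver can resolve hostname to an address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:
        return False


def check_prerequisites(cluster_name, base_domain):
    """Check the API record and one sample host under the ingress wildcard."""
    names = required_dns_names(cluster_name, base_domain)
    # A wildcard record cannot be queried literally, so probe a sample label under it.
    sample_app = names["ingress_wildcard"].replace("*", "sample", 1)
    return {
        names["api"]: resolves(names["api"]),
        sample_app: resolves(sample_app),
    }
```

For example, `check_prerequisites("ocp", "example.test")` would report whether `api.ocp.example.test` and `sample.apps.ocp.example.test` resolve from the machine running the check.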
Finding 2: Migrating VMs from the legacy VMware environment to the modern OpenShift environment was easy
Migrating a VM from the VMware environment to OpenShift was also a straightforward process and quick to set up. While the actual migration time will vary depending on VM size and hardware speed, the setup consists of only a few steps and took us less than 10 minutes. We first installed the Migration Toolkit for Virtualization from the OpenShift OperatorHub. We then entered the IP address and credentials for the vCenter as a new provider. Next, we created a NetworkMap and a StorageMap to connect the respective resources between the environments. We then created a new migration plan to map the VMs to a namespace in OCP. We ran the migration plan on a single VM, and confirmed that we were able to enter the VM console once the migration was complete. For detailed steps on the process of migrating VMs from the legacy environment to the modern environment, see the science behind the report.
About 4th Gen AMD EPYC 9554 processors
According to AMD, EPYC 9554 processors deliver fast performance “for cloud, enterprise, and HPC workloads—helping accelerate your business.”[5] EPYC processors include AMD Infinity Guard, which per AMD is “a set of layered, cutting-edge security features that help you protect sensitive data and avoid the costly downtime caused by security breaches.”[6]
In addition to performance and security features, AMD claims their processors are energy-efficient, which can reduce energy costs and “minimize environmental impacts from data center operations while advancing your company’s sustainability objectives.”[7]
Comparing the SPEC CPU 2017 Floating Point peak rates and default thermal design power (TDP) of the AMD EPYC 9554 and the AMD EPYC 7663 shows that the 9554 delivers 54 percent better performance per watt, which demonstrates the improved power efficiency of the new 4th Gen AMD EPYC processors.[8],[9]
For more information about 4th Gen AMD EPYC processors visit: https://www.amd.com/en/processors/epyc-server-cpu-family.
Finding 3: Database performance improved by 44 percent in the new environment
Figure 2 shows the results of our database performance testing using the TPROC-C workload from the HammerDB benchmark suite. The modern OpenShift cluster of Dell PowerEdge R7615 servers outperformed the legacy cluster by 44 percent. This extra capability could benefit companies upgrading to the new environment in several ways. The company could provide a better user experience, perform more work—or support more users—with a given number of servers, or reduce the number of servers necessary to execute a given workload.
Figure 2: Performance in transactions per minute using the TPROC-C workload of the HammerDB benchmark suite. Higher is better. Source: Principled Technologies.
Finding 4: Performance improved in the modern cluster, supporting consolidation, which leads to savings
Based on the results of our performance tests (see Figure 3), a company could consolidate the database workloads of a four-node Dell PowerEdge R7515 cluster, with some additional headroom, into three modern Dell PowerEdge R7615 servers with 4th Generation AMD EPYC processors and high-speed 100Gb Broadcom NICs.
The cluster of three modern servers delivered a total of 9,674,180 transactions per minute (TPM), or 3,224,726 TPM per server. The cluster of three legacy servers delivered a total of 6,714,712 TPM (2,238,237 TPM per server). Based on these results, four legacy servers would achieve a total of 8,952,948 TPM, which would leave 721,231 TPM of room for growth on the modern three-node cluster.
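The consolidation arithmetic above can be reproduced with a few lines of Python. This sketch uses only the cluster totals reported in this section and assumes that legacy throughput scales linearly with server count, which is the same assumption the projection makes. The function and variable names are illustrative. Because the per-server figures in the text are shown as whole numbers, results computed from them can differ from these by one or two TPM.

```python
def consolidation_headroom(modern_total, legacy_total, nodes_tested, legacy_nodes_to_replace):
    """Project legacy throughput to a larger cluster and compare it with the modern total.

    Assumes legacy throughput scales linearly with the number of servers.
    """
    legacy_per_server = legacy_total / nodes_tested
    modern_per_server = modern_total / nodes_tested
    projected_legacy = legacy_per_server * legacy_nodes_to_replace
    return {
        "modern_per_server": modern_per_server,
        "legacy_per_server": legacy_per_server,
        "improvement_pct": (modern_total / legacy_total - 1) * 100,
        "projected_legacy_total": projected_legacy,
        "headroom": modern_total - projected_legacy,
    }


# Cluster totals reported in this section (three servers per cluster).
result = consolidation_headroom(
    modern_total=9_674_180,
    legacy_total=6_714_712,
    nodes_tested=3,
    legacy_nodes_to_replace=4,
)
print(f"Improvement: {result['improvement_pct']:.0f} percent")
print(f"Headroom after absorbing four legacy servers: {result['headroom']:,.0f} TPM")
```

A positive headroom value means the three modern servers could absorb the projected four-server legacy workload and still have capacity left over.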
Reducing the number of servers you need means that operational expenditures such as data center power and cooling and administrator time for maintenance also decrease, leading to ongoing savings.
Figure 3: Performance in transactions per minute that three modern servers and four legacy servers could achieve, based on our hands-on testing. Higher is better. Source: Principled Technologies.
About Dell PowerEdge R7615 servers
The Dell PowerEdge R7615 is a 2U, single-socket rack server. Dell states that it has designed this server to provide “performance and flexible, low-latency storage options in an air or Direct Liquid Cooling (DLC) configuration.”[10]
According to Dell, this server uses the AMD EPYC 4th generation processor to deliver up to 50 percent higher core count per single-socket platform in an innovative air-cooled chassis.[11] It also supports DDR5 memory at 4800 MT/s and PCIe® Gen5, which doubles the speed of the previous Gen4 for faster access and transport of data, optimizing application output.[12] It supports up to six single-wide full-length GPUs or three double-wide full-length GPUs to improve responsiveness or reduce app load time for power users, plus lower-latency, high-performance NVMe SSDs to help maximize compute performance.[13]
Learn more at https://www.delltechnologies.com/asset/en-us/products/servers/technical-support/poweredge-r7615-spec-sheet.pdf.
How high-speed 100Gb Broadcom NICs can help your organization
Even if a 25Gb NIC is sufficient to meet a company’s current networking needs, opting to equip new servers with the high-speed 100Gb Broadcom NIC can be a smart move. Future-proofing your network can allow you to meet the increasing demands of emerging technologies.
Advanced technologies such as artificial intelligence and machine learning, which can require the processing and transmission of large amounts of data, are becoming increasingly prevalent across businesses of all sizes. In a June 2023 survey of small business decision-makers, 74 percent were interested in using AI or automation in their business and 55 percent said their interest in these technologies had grown in the first half of 2023.[14] Upgrading to a modern environment with a high-speed 100Gb Broadcom NIC positions companies to take advantage of AI applications for social media, content creation, marketing, customer support, and many other use cases.
Another way that investing in the high-speed 100Gb Broadcom NIC can help your company is through improved efficiency. You might be tempted to go with a 25Gb NIC, thinking that as your networking needs increase, you can simply add more NICs of this size. However, consider a 2023 Principled Technologies study that compared the performance of a server solution with a 100Gb Broadcom 57508 NIC and a solution with four 25Gb NICs.[15] Testing revealed that the 100Gb NIC solution achieved up to 2.3 times the throughput of the solution with 25Gb NICs. It also delivered greater bandwidth consistency, which can translate to providing a better user experience; the report states that applications using the 25Gb NICs network configuration “would experience significant variation in available bandwidth, potentially causing jittery or interrupted service to multiple streams.”[16]
About the Broadcom BCM57508-P2100G Dual-Port 100GbE PCIe 4.0 Ethernet controller
A higher-performing NIC can reduce latency, increase throughput, and allow the server to transmit and receive a greater volume of data. The Dell PowerEdge R7615 we tested features the Broadcom BCM57508-P2100G Dual-Port 100GbE PCIe 4.0 Ethernet controller, which supports speeds of up to 200 gigabits per second. Broadcom designed the BCM57508-P2100G “to build highly-scalable, feature-rich networking solutions in servers for enterprise and cloud-scale networking and storage applications, including high-performance computing, telco, machine learning, storage disaggregation, and data analytics.”[17]
The BCM57508-P2100G features BroadSAFE® technology, “to provide unparalleled platform security” and a “unique set of highly-optimized hardware acceleration engines to enhance network performance and improve server efficiency.”[18]
BCM57508-P2100G Dual-Port 100GbE PCIe 4.0 Ethernet controller. Image provided by Dell.
Conclusion
If your organization’s transactional databases are running on gear that is several years old, you have much to gain by upgrading to modern servers with new processors and networking components and an OpenShift environment. In our testing, a modern OpenShift environment with a cluster of three Dell PowerEdge R7615 servers with 4th Generation AMD EPYC processors and high-speed 100Gb Broadcom NICs outperformed a legacy environment with MySQL VMs running on a cluster of three Dell PowerEdge R7515 servers with 3rd Generation AMD EPYC processors and 25Gb Broadcom NICs. We also easily migrated a VM from the legacy environment to the modern environment, with only a few steps required to set up and less than ten minutes of hands-on time. The performance advantage of the modern servers would allow a company to reduce the number of servers necessary to perform a given amount of database work, thus lowering operational expenditures such as power and cooling and IT staff time for maintenance. The high-speed 100Gb Broadcom NICs in this solution also give companies better network performance and networking capacity to grow as they embrace emerging technologies such as AI that put great demands on networks.
This project was commissioned by Dell Technologies.
May 2024
Principled Technologies is a registered trademark of Principled Technologies, Inc.
All other product names are the trademarks of their respective owners.
Read the report on the PT site at https://facts.pt/2V6p3FG and see the science at https://facts.pt/Dj53ZJb.
Author: Principled Technologies
[1] Forrester, “Why Faster Refresh Cycles and Modern Infrastructure Management are Critical to Business Success,” accessed May 1, 2024, www.techrepublic.com/resource-library/casestudies/forrester-why-faster-refresh-cycles-and-modern-infrastructure-management-are-critical-to-business-success/.
[2] Forrester, “Why Faster Refresh Cycles and Modern Infrastructure Management are Critical to Business Success.”
[3] Red Hat, “Understanding containers,” accessed April 12, 2024, https://www.redhat.com/en/topics/containers.
[4] Red Hat, “Red Hat OpenShift Virtualization,” accessed April 12, 2024, https://www.redhat.com/en/technologies/cloud-computing/openshift/virtualization.
[5] AMD, “AMD EPYC Processors,” accessed April 12, 2024, https://www.amd.com/en/processors/epyc-server-cpu-family.
[6] AMD, “AMD EPYC Processors.”
[7] AMD, “AMD EPYC Processors.”
[8] SPEC, “SPEC CPU®2017 Floating Point Rate Result for Dell PowerEdge R6615 (AMD EPYC 9554 64-Core Processor),” accessed May 2, 2024, https://www.spec.org/cpu2017/results/res2024q1/cpu2017-20240212-41481.html.
[9] SPEC, “SPEC CPU®2017 Floating Point Rate Result for Dell PowerEdge R6515 (AMD EPYC 7663 56-Core Processor),” accessed May 2, 2024, https://www.spec.org/cpu2017/results/res2021q3/cpu2017-20210913-29288.html.
[10] Dell, “PowerEdge R7615 Specification Sheet,” accessed April 12, 2024, https://www.delltechnologies.com/asset/en-us/products/servers/technical-support/poweredge-r7615-spec-sheet.pdf.
[11] Dell, “PowerEdge R7615 Specification Sheet.”
[12] Dell, “PowerEdge R7615 Specification Sheet.”
[13] Dell, “PowerEdge R7615 Specification Sheet.”
[14] Constant Contact, “AI Stats and Trends Small Businesses Need to Know Now,” accessed April 12, 2024, https://news.constantcontact.com/small-business-now-ai-2023.
[15] Principled Technologies, “Opt for modern 100Gb Broadcom 57508 NICs in your Dell PowerEdge R750 servers for improved networking performance,” accessed April 12, 2024, https://www.principledtechnologies.com/Dell/PowerEdge-R750-networking-iPerf-1023.pdf.
[16] Principled Technologies, “Opt for modern 100Gb Broadcom 57508 NICs in your Dell PowerEdge R750 servers for improved networking performance.”
[17] Broadcom, “BCM57508 – 200GbE,” accessed April 12, 2024, https://www.broadcom.com/products/ethernet-connectivity/network-adapters/bcm57508-200g-ic.
[18] Broadcom, “BCM57508 – 200GbE.”