Mark Maclean

Mark works as part of the PowerEdge technical marketing engineering team, focused on OpenManage server management automation and orchestration. He has over 40 years of experience in IT, more than 30 of them at Dell, mostly in the x86 server space.


LinkedIn: uk.linkedin.com/in/markmacleandell


14 reasons to fall in love with Dell OpenManage Enterprise 4.0 this Valentine’s Day

Mark Maclean, Stephen Daborn

Wed, 24 Apr 2024 15:44:39 -0000


It’s Valentine’s Day, and love is in the air. Dell OpenManage Enterprise is ready to sweep you off your feet with its 14 swoon-worthy features. Imagine a romantic dinner, but instead of music, there are servers humming in perfect harmony, and instead of roses, there is a management tool that makes your heart flutter. OpenManage Enterprise is like the perfect date: attentive, reliable, and always there to make sure the bond between administrators and PowerEdge servers is as smooth as silk. So, grab a box of chocolates, cuddle up with your server rack, and let's dive into the 14 features of Dell OpenManage Enterprise (also known as OME) that will make you believe in love at first sight! 

❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤ 

  1. OME offers lifecycle management for Dell PowerEdge servers

This includes orchestrated discovery of servers, health monitoring, firmware updating, warranty status information, and alert response automation such as SNMP trap forwarding and forwarding to syslog servers. More than 30 reports come as standard, with the ability to create custom reports, and everything is delivered in a virtual appliance.

  2. Apply VMware cluster-aware firmware updates

This feature is offered by OMEVV, the OME plugin for VMware. It leverages VMware’s vCenter, DRS, and maintenance mode to sequentially update each member server in a cluster, with zero downtime for your virtual machines during firmware updates.

  3. Chart and analyze telemetry information from multiple servers

Visualise multivariate metrics on one graph in OME Power Manager for review. This data includes key performance, power, thermal, and I/O data. Develop and review a baseline of server performance over time to spot trends and problems before they become an issue.

  4. Automatically create Service Requests with Dell Technical Support

When a hardware failure is detected, the OME Service plugin creates a case with Dell, reducing the time to fix. Note: a Dell ProSupport contract is required.

  5. Enable OME iDRAC credential management (password rotation)

Keep OME iDRAC usage compliant with your organisation’s password rotation policy.

  6. Monitor servers’ power consumption, respond to thermal events, report carbon emissions, and cap the power if required

With OME Power Manager managing server power, customers can report energy usage and build a strategy to lower energy bills. Even selected HPE and Lenovo servers are supported.

  7. Streamline installation of new servers

Remove manual steps by systematically deploying server configuration profile templates and operating systems, reducing time to production for newly delivered servers.

  8. Build a single view of your entire Dell infrastructure including server, storage, networking, and data protection

Plug OME data into Dell’s cloud-based proactive monitoring and predictive analytics tool, CloudIQ, to better collaborate and simplify operations.

  9. Put server management into your pocket

Extend OME to your mobile device with OpenManage Mobile and get secure control wherever you are.

  10. Make drift detection proactive

Drift management of firmware and settings gives visibility of issues while simultaneously reducing the time and effort needed to resolve them.

Drift management improves operational efficiency and enhances server security posture.

  11. Integrate ServiceNow with OME

Deliver both population of the ServiceNow CMDB with OME data and automatic incident creation for critical events, combining data to enhance service delivery.

  12. Bring Dell server monitoring, deployment, and configuration management into MECM and SCVMM

Customers can leverage their existing skills and investment in Microsoft System Center.

  13. Create an infrastructure as code environment

Use Dell’s packs for Terraform from HashiCorp or Red Hat’s Ansible support. Watch this video of OME template deployment via Ansible to see just how simple it is.

  14. Build custom automation and bespoke integrations

The OME RESTful API gives DevOps teams deep, software-defined control of their infrastructure.
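As a rough illustration of the last item, the sketch below builds (but does not send) two requests against the OME REST API: one to open a session and one to list managed devices. The endpoint paths `/api/SessionService/Sessions` and `/api/DeviceService/Devices` and the `X-Auth-Token` header reflect Dell's published OME API guide as we understand it, so verify them against the guide for your OME version. The host name, credentials, and helper function names are hypothetical.

```python
import json
import urllib.request

# Hypothetical appliance address; replace with your own OME instance.
OME_HOST = "ome.example.com"


def build_session_request(host: str, username: str, password: str) -> urllib.request.Request:
    """Build the POST that asks OME for an API session token (assumed endpoint)."""
    payload = {"UserName": username, "Password": password, "SessionType": "API"}
    return urllib.request.Request(
        url=f"https://{host}/api/SessionService/Sessions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_device_list_request(host: str, auth_token: str) -> urllib.request.Request:
    """Build the GET that lists managed devices, authenticated with a session token."""
    return urllib.request.Request(
        url=f"https://{host}/api/DeviceService/Devices",
        headers={"X-Auth-Token": auth_token, "Accept": "application/json"},
        method="GET",
    )


if __name__ == "__main__":
    # Only prints what would be sent; no network traffic is generated.
    request = build_session_request(OME_HOST, "admin", "example-password")
    print(request.get_method(), request.full_url)
```

In a real script each request would be passed to `urllib.request.urlopen` (with appropriate TLS verification), and the token returned by the session call would be fed into the device request.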

Show your PowerEdge servers some love and deploy OpenManage Enterprise 4.0 today.  

 


 

 

Authors: Mark Maclean, OpenManage Technical Marketing Engineering

Steve Daborn, Senior Global Product Marketing Manager 


OpenManage Enterprise - Customer Success Stories

Mark Maclean, Stephen Daborn

Wed, 24 Apr 2024 15:43:00 -0000


We are often asked what the best tool is for managing Dell PowerEdge servers. In this blog, discover how both our in-house Dell IT team and Cambridge University, a long-term customer, use our server management solutions to manage thousands of PowerEdge servers, ultimately avoiding outages, boosting overall server productivity, reducing maintenance windows, and delivering increased operational efficiency.


How Dell IT excels in server management using Dell OpenManage  

Dell’s in-house IT team manages over 18,000 PowerEdge servers. The fleet ranges from brand new to five years old, resulting in a mix of server models and generations. These servers are located across eight major data centers globally. Workloads include Dell.com and back-office systems such as Dell’s order management system. In fact, Dell runs over 600 business applications. Many of these are mission critical, and an outage can have a major impact on customers, sales, and support, even stopping the production line.

Server hardware management is done via OpenManage Enterprise (OME), encompassing alerting, monitoring, firmware updating, and configuration deployment and management, as well as power consumption monitoring. Each data center has a dedicated OpenManage Enterprise instance responsible for approximately 2,500 servers.

Monitoring of server health events is covered by OME and its integration with ServiceNow, which automatically creates trouble tickets and routes them to the appropriate team for remediation. Power usage data is collected and monitored, then used to optimize power load per rack cabinet and to flag underutilized servers showing lower than expected power draw.
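The idea of flagging servers by lower than expected power draw can be sketched as follows. This is a minimal illustration, not Dell IT's actual method: the `PowerSample` structure, the per-server expected wattage, and the 50% threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PowerSample:
    server: str
    average_watts: float   # measured average draw over the review period
    expected_watts: float  # hypothetical baseline for this server's workload


def flag_underutilized(samples: list[PowerSample], threshold: float = 0.5) -> list[str]:
    """Return the servers drawing less than `threshold` of their expected power."""
    return [s.server for s in samples if s.average_watts < threshold * s.expected_watts]


if __name__ == "__main__":
    fleet = [
        PowerSample("web-01", average_watts=410.0, expected_watts=450.0),
        PowerSample("web-02", average_watts=130.0, expected_watts=450.0),
    ]
    print(flag_underutilized(fleet))
```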

To aid automation and rapid distribution, firmware updates are collected, tested, and released via a customized catalogue. These custom catalogues are assembled and tested by the Dell IT server management team and are consumed by OME to orchestrate server updates. Urgent updates to resolve security CVEs can be pushed out at will by OME following a change management approval. The largest patch job completed by the team so far was an iDRAC firmware update task for 14,500 servers in one change request, demonstrating how scalable OME automation is.

Security is built into Dell’s processes and tools. Microsoft Active Directory integration enables the OME audit log to record who did what and when, recording the AD user account name. The team also uses OME configuration drift detection reporting, which audits a server’s current configuration against the desired state, highlighting non-conforming servers that OME can then resolve by re-applying a server template.
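Conceptually, drift detection is a comparison between a desired-state template and each server's current settings. The sketch below shows that comparison on plain dictionaries. It is an illustration of the idea only, not how OME implements it, and the attribute names and values are used purely as example data.

```python
from typing import Optional


def find_drift(
    desired: dict[str, str], current: dict[str, str]
) -> dict[str, tuple[str, Optional[str]]]:
    """Return {setting: (desired_value, current_value)} for every mismatch.

    A setting missing from `current` is reported with a current value of None.
    """
    return {
        name: (want, current.get(name))
        for name, want in desired.items()
        if current.get(name) != want
    }


if __name__ == "__main__":
    # Example data only; real templates hold many more attributes.
    template = {"SysProfile": "PerfPerWattOptimizedDapc", "BootMode": "Uefi"}
    server = {"SysProfile": "PerfOptimized", "BootMode": "Uefi"}
    for setting, (want, have) in find_drift(template, server).items():
        print(f"{setting}: expected {want}, found {have}")
```

Re-applying the template to a non-conforming server then amounts to writing the desired value back for each reported setting.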

With Dell IT using OME at major scale in their complex production environment, any customer can be confident OME will perform at scale. As Dell IT says, “If you have Dell PowerEdge servers, you really need to be running OpenManage Enterprise.”  


University of Cambridge server management at scale

Dawn, the UK’s fastest artificial intelligence (AI) supercomputer. Copyright Joe Bishop.

With an estate of 3,500 Dell servers plus other devices in one data center, the team at Cambridge University needs efficient and scalable server management. The HPCC server group uses the integrated Dell Remote Access Controller (iDRAC) embedded in every server, together with OME, to maximize the day-to-day efficiency of admin tasks such as health monitoring, firmware updates, and configuration.

Config management and drift detection are achieved via OME’s configuration compliance features. Each cluster has a collection of firmware configuration settings held as a template. These templates are set and monitored centrally via OME, with alerting set for non-compliant hosts. Firmware updates are also streamlined using OME and customized in-house firmware repositories built with OME Update Manager. Updates are scheduled and then left to run automatically against multiple servers, freeing administrators to focus on more novel tasks. Finally, server health monitoring is real-time: any alerts are sent from iDRAC to OME, which logs them and notifies the team of the status. Using the Dell TechDirect service portal, the team is able to log fault calls and request any required parts from Dell.

Operational highlights include:

  • Reduction in time to resolution of faults
  • Quicker and easier implementation of firmware updates
  • BIOS settings configured across an entire cluster in one easy automated job

Beyond the Dell OpenManage tools, Cambridge uses the iDRAC server telemetry feature to stream power and thermal data to Graphite and Grafana. These Dell metrics, along with values from other data center infrastructure, are aggregated and visualized for analysis of trends, ensuring the clusters are powered and cooled effectively.
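For readers unfamiliar with Graphite, its plaintext protocol accepts one metric per line in the form `path value timestamp`, conventionally on TCP port 2003. The sketch below formats power and thermal readings that way. The metric paths and the Graphite host are hypothetical, and the source does not describe how Cambridge's pipeline actually moves iDRAC telemetry into Graphite.

```python
import socket
import time


def graphite_line(path: str, value: float, timestamp: int) -> str:
    """Format one metric in Graphite's plaintext protocol."""
    return f"{path} {value} {timestamp}\n"


def send_to_graphite(lines: list[str], host: str, port: int = 2003) -> None:
    """Send pre-formatted metric lines to a Graphite (carbon) listener over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))


if __name__ == "__main__":
    now = int(time.time())
    # Hypothetical metric names for one node; nothing is sent in this demo.
    batch = [
        graphite_line("dc1.rack12.node01.power_watts", 360, now),
        graphite_line("dc1.rack12.node01.inlet_temp_c", 23.5, now),
    ]
    print("".join(batch), end="")
```

Grafana can then chart these series alongside metrics from other data center infrastructure.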

Join the ranks of satisfied customers who have optimized their server management operations and enjoy the peace of mind brought about by Dell OpenManage.

 


 

Authors:

Mark Maclean, PowerEdge & OpenManage Technical Marketing Engineering 

Steve Daborn, Senior Global Product Marketing Manager

Linkedin : uk.linkedin.com/in/markmacleandell   |   linkedin.com/in/stephendaborn



Empowering Server Power Efficiency Profiles: Unleashing Power Savings in Bills & Usage

Mark Maclean, Donald Russell, Kevin Locklear

Tue, 16 Apr 2024 15:48:44 -0000


Introduction

Over the last few years, the cost of power has continued to increase alongside the amount of power used in most data centers. Given these trends, customers are searching for strategies to reduce both the economic and environmental footprint of powering their server estates.

Simple strategies include virtualization and consolidation to reduce the number of physical servers, identifying zombie servers to be retired, and replacing older, less efficient servers with newer servers offering improved performance per watt.

BIOS System Profile Settings

Beyond the aforementioned strategies, Dell PowerEdge server customers can increase their power savings by selecting CPU power management and energy efficient policy settings in the system BIOS. These settings configure a collection of rules that govern server chipset behavior, including CPU C-states and CPU turbo mode, to increase power usage efficiency.

Selecting the most relevant setting can reduce CPU power demands while continuing to meet performance requirements, producing significant long-term cost savings. For example, in Intel®-based PowerEdge servers, customers can enable Dell Active Power Controller (DAPC), which allows the BIOS to manage processor power states in order to maximize performance per watt at all utilization levels. The full details of BIOS System Profile Settings can be found in the white paper, Set-up BIOS on the 16th Generation of PowerEdge Servers.

Testing and results

To demonstrate the effectiveness of the various profiles on power efficiency and server performance settings, SPEC Power® 2008 version 1.11.0 benchmarking was run for each setting. The SPEC Power® benchmark exercises the server at ten workload levels and combines power and performance into a single metric that measures power efficiency in operations per watt.

Table 1. SPEC Power® benchmark results

| Measurement | Max Perf Performance | DAPC Performance | DAPC Balanced Perf | DAPC Balanced Energy | DAPC Energy Efficient |
|---|---|---|---|---|---|
| SPEC Power® Score | 8621 | 10311 | 10378 | 11105 | 11564 |
| SPEC Power® 100% OP/s | 8,383,505 | 8,380,816 | 8,399,796 | 8,402,421 | 8,451,740 |
| SPEC Power® 100% Watts | 602 | 602 | 602 | 602 | 602 |
| SPEC Power® 100% Score PPR | 13924 | 13921 | 13943 | 13956 | 14036 |
| SPEC Power® 60% OP/s | 5,052,076 | 5,047,622 | 5,068,899 | 5,051,143 | 5,066,320 |
| SPEC Power® 60% Watts | 549 | 488 | 477 | 392 | 360 |
| SPEC Power® 60% Score PPR | 9198 | 10343 | 10624 | 12890 | 14084 |
| SPEC Power® Idle Watts | 269 | 125 | 125 | 121 | 122 |

We selected a Dell PowerEdge server with dual 32-core Intel® 6448Y 2.1 GHz processors and 256 GB of RAM for the test. The SPEC Power® benchmark was run by the Dell Technologies Server Performance Analysis (SPA) team in the Dell Technologies Austin Server Performance lab. The summary of the results in Table 1 shows that the DAPC/Energy Efficient policy delivered the best overall SPEC Power® score with comparable performance. Looking at the individual results more closely, a server at 100% utilization has the same power usage irrespective of the BIOS profile. However, given that most customers are not running their servers at 100%, the 60% results have been highlighted, demonstrating the power savings available for a representative customer.
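The performance-to-power ratio (PPR) rows in Table 1 are operations per second divided by average watts at that load level. The sketch below recomputes the 60% figures from the table's rounded values, so the results land close to, but not exactly on, the published PPR numbers. The function and variable names are illustrative only.

```python
# (operations per second, average watts) at the 60% load level, from Table 1.
RESULTS_60 = {
    "Max Perf": (5_052_076, 549),
    "DAPC Performance": (5_047_622, 488),
    "DAPC Balanced Perf": (5_068_899, 477),
    "DAPC Balanced Energy": (5_051_143, 392),
    "DAPC Energy Efficient": (5_066_320, 360),
}


def performance_to_power_ratio(ops_per_second: float, watts: float) -> float:
    """Operations per watt at a single load level."""
    return ops_per_second / watts


def power_saving_percent(baseline_watts: float, watts: float) -> float:
    """Percentage reduction in power draw relative to a baseline profile."""
    return 100.0 * (baseline_watts - watts) / baseline_watts


if __name__ == "__main__":
    baseline_watts = RESULTS_60["Max Perf"][1]
    for profile, (ops, watts) in RESULTS_60.items():
        ppr = performance_to_power_ratio(ops, watts)
        saving = power_saving_percent(baseline_watts, watts)
        print(f"{profile:22s} PPR {ppr:8.0f}  power saving vs Max Perf {saving:5.1f}%")
```

With these rounded inputs, the Energy Efficient policy draws roughly 34% less power than Max Perf at 60% load while delivering almost the same operations per second.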

Substantial energy efficiency delivered

This graph shows the SPEC Power data at 60% workload using different BIOS settings. The DAPC/Energy Efficient profile is significantly higher than Max Perf, DAPC/Performance, and DAPC/Balanced Energy, although slightly lower than DAPC/Balanced Perf.

Figure 1. SPEC Power® results at 60%

At 60% load, the DAPC/Energy Efficient policy delivered a 35% saving in power usage compared to the Max Performance profile.

Considering the average EU energy cost of $0.21 per kWh[1], for an estate of 100 servers running at 60% load there is a potential saving of $380,797 in energy costs over four years when comparing the Max Performance profile to the Energy Efficient policy. For a 1000-server estate, these potential savings increase to $1,523,188, all while maintaining server performance.
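The arithmetic behind an estimate of this kind can be sketched as below. The assumptions are ours: servers run 24 hours a day, 365 days a year, at a constant 60% load, using the rounded 60% wattages from Table 1 (549 W for Max Performance, 360 W for Energy Efficient) and 0.21 per kWh. Figures quoted elsewhere may rest on unrounded measurements or on inputs not shown here, so this sketch will not necessarily reproduce them.

```python
HOURS_PER_YEAR = 24 * 365  # assumes continuous operation, ignoring leap days


def energy_cost_savings(
    baseline_watts: float,
    efficient_watts: float,
    servers: int,
    years: float,
    price_per_kwh: float,
) -> float:
    """Estimated energy cost saved by moving a fleet to the more efficient profile."""
    saved_kw_per_server = (baseline_watts - efficient_watts) / 1000
    saved_kwh = saved_kw_per_server * HOURS_PER_YEAR * years * servers
    return saved_kwh * price_per_kwh


if __name__ == "__main__":
    for fleet_size in (100, 1000):
        saving = energy_cost_savings(549, 360, fleet_size, years=4, price_per_kwh=0.21)
        print(f"{fleet_size} servers over 4 years: about {saving:,.0f}")
```

Under these assumptions the result for 1,000 servers is about 1.39 million over four years, in line with the 1000-server figure quoted in the Conclusion.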

Those who have purchased an electric car in the last few years know that the range advertised by the manufacturer can differ from the mileage delivered in the real world. Treat these Dell Technologies results as guidance. It is recommended that customers run their own testing using their own workloads.

These results are from Dell Technologies in-house testing as of January 2024. The cost of power was sourced from Consumer Energy Prices in Europe (qery.no). The full SPEC Power® 2008 results are posted on spec.org.

Changing BIOS profiles

BIOS profiles can be set in several ways, the simplest being from the server BIOS setup, accessed at boot using the <F2> key. That said, when faced with more than a few servers, this method becomes very time-consuming. There are a number of methods to automate this process, including running a script at the iDRAC API level or using a server configuration profile. A server configuration profile (SCP) is sometimes referred to as a template and can be used to bundle the system profile setting into the server firmware configuration. Using a tool such as OpenManage Enterprise (OME), a server template can then be deployed to each server’s iDRAC (integrated Dell Remote Access Controller) to streamline and automate the application of these BIOS settings.
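A script at the iDRAC API level might look like the sketch below, which builds (but does not send) a Redfish PATCH that stages a new system profile. The URI, the `SysProfile` attribute, and the `PerfPerWattOptimizedDapc` value reflect our understanding of the iDRAC Redfish interface and should be checked against the Redfish API guide for your iDRAC firmware. The host name and function name are hypothetical, and authentication is omitted.

```python
import json
import urllib.request

# Assumed iDRAC Redfish location for pending BIOS settings.
BIOS_SETTINGS_PATH = "/redfish/v1/Systems/System.Embedded.1/Bios/Settings"


def build_sys_profile_request(idrac_host: str, profile: str) -> urllib.request.Request:
    """Build a PATCH that stages a new BIOS system profile on one iDRAC."""
    body = {"Attributes": {"SysProfile": profile}}
    return urllib.request.Request(
        url=f"https://{idrac_host}{BIOS_SETTINGS_PATH}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


if __name__ == "__main__":
    # Hypothetical host; prints the request instead of sending it.
    request = build_sys_profile_request("idrac-node01.example.com", "PerfPerWattOptimizedDapc")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Staged BIOS changes generally take effect only after a configuration job runs and the server reboots, which is one reason a template pushed from OME is more convenient at scale.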

This is a screenshot of the system profile in the BIOS setup, showing all of the different system profile settings, including Performance Per Watt (DAPC).

Figure 2. System profile in BIOS setup

For customers who want to track and report these settings on Dell PowerEdge servers, the Dell OME Power Manager plugin for OpenManage Enterprise enables the automatic grouping of servers by profile, displaying this information on the GUI as shown in Figure 3. The Power Manager plugin also offers a ready-to-run report template that breaks down the entire server estate, grouped by server profile. This report can be scheduled or run ad hoc.   

This screenshot shows the Dell OME page displaying the BIOS profiles

Figure 3. OpenManage Enterprise displaying BIOS profiles

System profiles and BIOS settings in detail

The following tables provide detailed background information about each system profile and the BIOS settings they alter for Intel®- and AMD-based PowerEdge servers.

Table 2. Intel® Platform System Profile

| System Profile Settings | Performance Per Watt Optimized (DAPC) | Performance Per Watt Optimized (OS) | Performance | Workstation Performance |
|---|---|---|---|---|
| CPU Power Management | System DBPM (DAPC) | OS DBPM | Maximum Performance | Maximum Performance |
| Memory Frequency | Maximum Performance | Maximum Performance | Maximum Performance | Maximum Performance |
| Turbo Boost | Enabled | Enabled | Enabled | Enabled |
| Energy Efficient Turbo | Enabled | Enabled | Disabled | Disabled |
| C1E | Enabled | Enabled | Disabled | Disabled |
| C-States | Enabled | Enabled | Disabled | Enabled |
| Memory Patrol Scrub | Standard | Standard | Standard | Standard |
| Memory Refresh Rate | 1x | 1x | 1x | 1x |
| Uncore Frequency | Dynamic | Dynamic | Maximum | Maximum |
| Energy Efficient Policy | Balanced Performance | Balanced Performance | Performance | Performance |
| Monitor/Mwait | Enabled | Enabled | Enabled | Enabled |
| CPU Interconnect Bus Link Power Management | Enabled | Enabled | Disabled | Disabled |
| PCI ASPM L1 Link Power Management | Enabled | Enabled | Disabled | Disabled |
| Workload Configuration | Balance | Balance | Balance | Balance |

Table 3. AMD Platform System Profile

| System Profile Settings | Performance Per Watt Optimized (OS) | Performance |
|---|---|---|
| CPU Power Management | OS DBPM | Maximum Performance |
| Memory Frequency | Maximum Performance | Maximum Performance |
| Turbo Boost | Enabled | Enabled |
| C-States | Enabled | Disabled |
| Memory Patrol Scrub | Standard | Standard |
| Memory Refresh Rate | 1x | 1x |
| PCI ASPM L1 Link Power Management | Enabled | Disabled |
| Determinism Slider | Power Determinism | Power Determinism |
| Power Profile Select | High Performance Mode | High Performance Mode |
| PCIE Speed PMM Control | Auto | Auto |
| EQ Bypass To Highest Rate | Disabled | Disabled |
| DF PState Frequency Optimizer | Enabled | Enabled |
| DF PState Latency Optimizer | Enabled | Enabled |
| Host System Management Port (HSMP) Support | Enabled | Enabled |
| Boost FMax | 0 - Auto | 0 - Auto |
| Algorithm Performance Boost Disable (ApbDis) | Disabled | Disabled |
| Dynamic Link Width Management (DLWM) | Unforced | Unforced |

Conclusion

When implementing strategies for increasing server energy efficiency, selecting a BIOS system profile can result in significant power savings with minimal or no server performance degradation. The power cost savings for a 1000-server estate could potentially be $1,390,737 over four years. Additionally, as a result of lower processor power consumption, the load on the cooling system in the data center is reduced, further increasing savings on energy costs and power. Customers running an estate of Dell PowerEdge servers should review their use of these BIOS settings for their server workloads to better understand how these profiles can help to reduce power usage and lower energy bills.

References

 

[1] For non-household consumers such as industrial, commercial, and other users not included in the households sector, average electricity prices in the EU stood at €0.21 per kWh (excluding VAT and other recoverable taxes and levies) for the first half of 2023 according to the latest Eurostat data, Consumer Energy Prices in Europe (qery.no)

Authors: Mark Maclean, PowerEdge Technical Marketing Engineering; Kevin Locklear, ISG Sustainability; Donald Russell, Senior Performance Engineer, Solution Performance Analysis


AI-related Performance Testing of PowerEdge MX760c vs. PowerEdge MX750c

Mark Maclean

Thu, 14 Mar 2024 16:56:05 -0000


Trust the numbers: the Dell PowerEdge MX760c, using 4th generation Intel® Xeon® Scalable processors, performs better in machine learning.

Grid Dynamics set out to review whether taking a CPU-only (non-GPU) approach is effective in training and inference tasks for small and medium-sized machine learning models (up to ten million parameters), and whether it is a good solution for running larger models (with hundreds of millions of parameters) effectively in inference mode. They also wanted to review improvement in server performance, generation to generation.

To understand how the MX760c model series handles significant artificial intelligence/machine learning workloads as compared to the MX750c model series, Grid Dynamics developed four use cases that simulate the computational needs of retail, industrial, and IT infrastructure:

1. A recommendation system for analyzing user preferences and creating personalized recommendations.

2. A sales forecasting and inventory decision support tool for store managers for keeping the inventory optimized against actual and forecasted demand, and for planning for stock replenishment.

3. Anomaly detection for industrial timeseries for analyzing anomalies in telemetry data and detecting failure probability in industrial hardware.

4. Popular machine learning benchmarks for evaluating server performance through a series of standardized tests.

(2) Based on testing performed by Grid Dynamics, February 2023.

The test team concluded that the Dell PowerEdge MX760c, using Intel® Xeon® Gold 6430 (2.10 GHz / 32 core) CPUs, performs better in machine learning than the MX750c using Intel® Xeon® Platinum 8368 (2.40 GHz / 38 core) CPUs. The Intel® Xeon® Gold 6430 in the MX760c supports the Advanced Matrix Extensions (AMX) accelerator, which improves the performance of deep-learning training and inference. The Gold 6430 also supports faster DDR5 memory.

Figure 1. Inference speed (neural ML models): inferences per second (higher is better)

Advanced Matrix Extensions (AMX) technology, implemented in the “Sapphire Rapids” 4th generation Intel® Xeon® Scalable processors, allows these servers to perform matrix operations for quantized (INT8 and BF16) ML models much faster.

(8) Based on testing performed by Grid Dynamics, February 2023.

To summarize, Dell PowerEdge MX760c has proven itself to be a faster and a more capable solution than the previous generation MX750c server for the studied use cases. For more details, see the benchmark and testing results in these documents:


 

Executive Summary - Performance Comparison of  Dell PowerEdge MX760c and MX750c Server Models

Technical Paper - Performance Comparison of  Dell PowerEdge MX760c and MX750c Server Models

 

 



Upgrading To OpenManage Enterprise 4.0

Mark Maclean

Thu, 07 Dec 2023 17:39:47 -0000


Authors: Mark Maclean, PowerEdge Technical Marketing Engineering / Manoj Malhotra, Product Manager, OME

Summary

Dell OpenManage Enterprise is an infrastructure management console for Dell PowerEdge servers, offering a full lifecycle management solution plus many other features. Since its initial release, OpenManage Enterprise (often abbreviated to OME) has continued to develop, adding new features with every release. Customers on OME 3.x can migrate to OME 4.0 to take advantage of new features such as iDRAC credential rotation and multi-factor authentication with RSA SecurID.

Migration

Overview

Unlike earlier versions, OME 4.0 does not offer an in-place upgrade. Instead, existing data is transferred to a new instance of the appliance.

The upgrade is achieved in three steps:

  1. Deploy a new instance of the OME 4.0 virtual appliance
  2. Migrate data from OME 3.10.x to OME 4.0
  3. Decommission the old OME 3.10.x virtual appliance

The migration is only required when upgrading from OME 3.10.x (CentOS-based) to OME 4.0 (SLES-based). Future upgrades (for example, from OME 4.0 to OME 4.1) will support in-place upgrade.

 

This transfer of existing OME data, such as discovered servers, deployment templates, policies, logs, and credentials, is achieved via the migration feature built into OME. The migration wizard walks step by step through exporting data from the OME 3.10.x appliance and importing it into a fresh OME 4.0 appliance. In order to migrate, customers must have OME 4.0 installed and configured with a new IP address and administrator account. Also, the existing OME 3.10.x and new OME 4.0 instances must be able to communicate with each other over the network.

 

Figure 1 Possible upgrade paths to OME 4.0

This migration feature is only supported when going from OME 3.10.x to OME 4.0. Customers on earlier versions must apply in-place upgrades to reach OME 3.10.x before migrating to OME 4.0 (see Figure 1).
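The upgrade rules described above can be summarized as a small decision function. This is a sketch for illustration only: the function name and the wording of the returned steps are ours, and the version string is assumed to be in `major.minor[.patch]` form.

```python
MIGRATION_STEPS = [
    "Deploy a new OME 4.0 virtual appliance",
    "Migrate data from OME 3.10.x to OME 4.0",
    "Decommission the old OME 3.10.x virtual appliance",
]


def upgrade_steps(current_version: str) -> list[str]:
    """Return the ordered steps needed to reach OME 4.x from `current_version`.

    Raises ValueError if the version does not contain at least major.minor.
    """
    major, minor = (int(part) for part in current_version.split(".")[:2])
    if major >= 4:
        # Already on the 4.x line: later releases are expected to upgrade in place.
        return ["In-place upgrade to the later 4.x release"]
    if (major, minor) == (3, 10):
        return list(MIGRATION_STEPS)
    # Anything older must first be brought up to 3.10.x.
    return ["In-place upgrade to OME 3.10.x"] + MIGRATION_STEPS


if __name__ == "__main__":
    for version in ("3.6", "3.10.2", "4.0"):
        print(version, "->", upgrade_steps(version))
```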

Enablement

As with previous versions, OME 4.0 is delivered as a virtual appliance. The virtual appliance is offered in three formats, for deployment on VMware, Microsoft Hyper-V, or KVM. Once commissioned, the OME appliance will manage any Dell PowerEdge host regardless of operating system. All three versions of the appliance can be downloaded from the Dell support site, and detailed installation instructions for the virtual appliances are included in chapter 2 of the OME user guide. See the link to the OME support page at the bottom of this document. Migration should be run in a maintenance window or a quiet period to lower the risk of critical alerts being missed.

 

Once a new OpenManage Enterprise 4.0 virtual appliance has been installed and the basic configuration has been applied, migration can begin. The logical steps are shown in Figure 2. Migration starts on the existing source appliance, which must be running OME version 3.10.x. The person overseeing the migration needs local administrator or backup administrator rights to access the backup/restore menu, and the migration wizard is started from that drop-down menu. The wizard's steps include checking that the SSL certificates match (using either the default Dell certificate or a customer-supplied certificate for secure access), checking network access to the new OME 4.0 virtual appliance, supplying a passphrase to secure the backed-up data, and checking that non-migration tasks have completed, or stopping them. The backup encryption passphrase must be at least 8 characters long; certain characters, such as commas and full stops/periods along with several others, are not supported as special characters. At the end of the process, the 3.10.x appliance automatically transitions into “maintenance mode pending” status.
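A pre-check of the passphrase before starting the wizard could look like the sketch below. It encodes only what the text states: a minimum of 8 characters, with commas and full stops unsupported. The full list of unsupported characters is not given here, so the set in the code is knowingly incomplete and should be extended from the OME user guide. The function name is hypothetical.

```python
MIN_LENGTH = 8

# Only the comma and full stop are named in the text above; other unsupported
# characters exist but are not listed there, so this set is incomplete.
KNOWN_UNSUPPORTED = {",", "."}


def passphrase_problems(passphrase: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means no known issue."""
    problems = []
    if len(passphrase) < MIN_LENGTH:
        problems.append(f"must be at least {MIN_LENGTH} characters long")
    unsupported = sorted(set(passphrase) & KNOWN_UNSUPPORTED)
    if unsupported:
        problems.append("contains unsupported characters: " + " ".join(unsupported))
    return problems


if __name__ == "__main__":
    for candidate in ("Str0ng!Pass", "a,b"):
        print(repr(candidate), passphrase_problems(candidate) or "no known issues")
```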
 

Note: For a customer-supplied certificate, client and server authentication are required from the issuing CA.

 

 

Figure 2 OME high-level migration steps

 

Then on the new OME 4.0 appliance, the first time an administrator logs into OME, an initial onboarding wizard starts automatically. There is no need to install any plugins, because the automation built into the migration tool handles this task. As part of this onboarding wizard, the migration feature can be selected to be run. 

 

Note: If required, this migration feature can also be run from the drop-down backup/restore menu after the initial wizard has completed (see Figure 3).

 

Figure 3 Initial OME onboarding wizard – Migration step 

The “migrate-in” steps to import data are as follows. Once communications have been established via the supplied IP address and credentials, the migration engine automatically checks the plugin status and appliance status. If all is ready, the backup passphrase used during “migrate-out” is re-entered and the “migrate-in” task is started via the import button (see Figure 4).

 

Figure 4 Migrate in wizard showing import steps

The wizard displays migration status and the various steps as they run and complete (see Figure 5). These steps are also recorded in the migration log and can be viewed post-migration. Because the IP address of the OME 3.10 appliance is not migrated across, after a successful migration the OME 4.0 appliance runs a task to configure all known iDRACs that have SNMP enabled with the IP details of the new management console as a trap destination.

Figure 5 Log displaying successful migration 

If necessary, the administrator can cancel the migration at the source using the Cancel migration hyperlink in the wizard. This will take the source appliance out of maintenance mode and back into working mode.

At the end of a successful migration, the source migrate-out appliance automatically enters the Decommission Ready status. The login GUI color changes to burgundy and text is modified to warn that the appliance is decommissioned. 

NOTE: Only an admin can log in to the console.

Instead of the dashboard, a message is displayed declaring that the appliance is ready to be decommissioned. At this point, the recommended action is for the administrator to power down and archive the virtual appliance. The admin can bring the appliance back to the running state; however, this is highly discouraged, see figure 6. Finally, it is recommended to take a backup of the newly commissioned OME 4.0 appliance after migration and before any further operations.

Figure 6 Example of a decommissioned OME 3.10.x login screen

Migration moves data such as application settings, device inventory, and plugin data; see table 1 for more details. For example, the Site ID details used by the OME CloudIQ plugin are migrated to ensure that server management traffic continues to flow, and the historic power data held by Power Manager is also transferred. Only one backup, restore, or migrate process is supported at a time. Running more than one at a time can lead to unexpected system behavior.

 

Table 1 Data Considered During Migrate Jobs

Item | Description

Database

  • Discovered devices and all template, profile, and firmware configuration compliance information related to each device.
  • Configuration information (template, profile, firmware configuration compliance, etc.).
  • Job history and audit logs
  • Application settings

Configuration files

  • Certificate store
  • Samba share files
  • Multi-factor authentication files
  • Appliance keystore used for encryption
  • Webserver configuration files
  • Source appliance information such as RAM, CPU, storage and device count

Auto install plugins

  • Details of the plugins installed on the source host are captured so that they can be installed automatically on the target appliance during the restore operation.

Plugin data restore

  • Plugin-related configuration files and data are restored on the target appliance.

 

Conclusion

Using the built-in migration feature, customers can upgrade to OME 4.0 quickly and easily. The step-based wizard, with integrated pre-transfer checks and automated data streaming, makes migration simple and hassle-free. For more details, see chapter 18 of the OpenManage Enterprise 4.0 User's Guide.


iDRAC OME OpenManage Enterprise password rotation credentials CyberArk

Announcing iDRAC Credential Management in OpenManage Enterprise 4.0

Mark Maclean, Manoj Malhotra

Wed, 01 Nov 2023 15:25:10 -0000


Summary

Dell OpenManage Enterprise is an infrastructure management console that offers a full lifecycle management solution for Dell PowerEdge Servers and provides many other features. Since its initial release, OpenManage Enterprise (or OME for short) has continued to add new features with every release. Among the list of new features, OME release 4.0 now supports optional iDRAC credential management. iDRAC credentials are required by OME for server management tasks. This new feature offers customers support for either internal OME iDRAC password rotation or iDRAC credential retrieval from CyberArk Central Credential Provider, an external third-party credential provider solution.

iDRAC password rotation

Overview

Many customers have a password rotation policy for iDRACs. OME 4.0 can now support this requirement by removing the need for administration accounts with static credentials on managed iDRACs. This feature is supported on iDRAC 7, 8, and 9. The internal password rotation feature in OME 4.0 can create and then update credentials on a scheduled basis for the managed iDRACs. The frequency of rotation can be set in the OME password management section and can range from daily to annual, as shown in the following figure.

Figure 1.  OME iDRAC Password Management with Internal rotation selected

Enablement

After the OpenManage Enterprise version 4.0 virtual appliance has been installed, and the basic configuration has been applied, the first time an administrator logs into OME, an initial onboarding wizard executes. As part of this wizard, the iDRAC password rotation feature is enabled by default. Note: This rotation feature can only be disabled/enabled during this initial onboarding.

After the feature is enabled, the process to implement a rotation policy starts with the standard OME device discovery job, using an existing administrator-level iDRAC account such as root / calvin. To enable support for password rotation, an OME Advanced or OME Advanced+ license must be present on each iDRAC. During the server onboarding task, as OME discovers the new servers, it automatically creates a unique OME service account on each iDRAC, with an OME-specific user account ID and a strong password.

Figure 2.  Initial OME onboarding wizard - One-time credential management enablement

After one or more servers are onboarded and the OME service accounts have been automatically created on each iDRAC, the credential type used for each server is displayed in OME on the All Devices page. Any server where password rotation is enabled is reported as credential type “Internal”. Servers for which rotation is not supported, for example where there is no OME Advanced license, are reported as “Discovery” (which means that OME will continue to use the credentials set at discovery). See Figure 3.

Figure 3.  Credential type reporting 

Using CyberArk for iDRAC credential retrieval

Overview

CyberArk is a third-party Identity and Access Management (IAM) security tool that offers comprehensive solutions to store and manage passwords across organizations. OME can be configured to interface with the CyberArk Central Credential Provider for managing iDRAC credentials.

Enabling CyberArk

To enable CyberArk, you must configure support details about the CyberArk vault on the iDRAC Password Management page in OME (Figure 4). An OME Advanced+ license is required to be present on each iDRAC.

Figure 4.  CyberArk enablement

Servers with iDRAC CyberArk support enabled are reported as credential type “CyberArk” (Figure 5).

Figure 5.  Credential type CyberArk reporting with drop down filter by type

Conclusion

With the new credential features now available in OpenManage Enterprise release 4.0, Dell has added security features to OME that can support customers’ password rotation policies.


iDRAC OME bare metal OpenManage Enterprise auto deploy server deployment

Good, Better, Best Automation of Bare Metal Server Deployment using OpenManage Enterprise

Mark Maclean, Manoj Malhotra

Wed, 01 Nov 2023 15:01:08 -0000


Introduction

Customers looking for a simple method to automate Dell PowerEdge server deployment at scale need to review the use of Dell OpenManage Enterprise (OME). During a typical server deployment, customers need to configure firmware settings such as boot order, RAID storage configuration details, iDRAC settings, and security standards, in addition to loading a server operating system. All these manual tasks can be repetitive and time-consuming.

Customers can save a substantial amount of administration time by leveraging automated deployment mechanisms. Dell offers many deployment solutions, and the right choice depends on customer requirements and on elements such as the network environment and the server operating system. OME offers its own solution and can also integrate into many popular third-party tools such as Ansible, Terraform, Microsoft System Center, or VMware vCenter.

This Direct from Development (DfD) tech note describes the capabilities and results that customers can expect when using OME to deploy bare metal servers. This document covers the deployment features and how to streamline server deployment when using OpenManage Enterprise orchestration controlling the iDRAC that is built into each Dell PowerEdge server.

OpenManage Enterprise – bare metal deployment

OpenManage Enterprise (OME) is Dell's on-premises server lifecycle management console. Its capabilities include discovery, monitoring, updating firmware, reporting, and of course configuration/deployment. During deployment, OME can discover a bare metal server and install both a firmware configuration setting and an operating system.

There are two typical approaches:

  • The first: A previously discovered server gets a configuration template manually pushed from OME.
  • The second is more automated: OME is configured with a list of tag numbers of arriving servers. OME then regularly examines an IP address range. When OME identifies a new server by its unique service tag, OME pushes the template to the new server's iDRAC for deployment. The customer can either obtain a list of service tag numbers associated with an order from Dell by email at the time of shipping, or collect the service tag numbers from external labels on the packaging or from the actual servers as they are being physically installed.
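The matching logic behind the second approach can be pictured with a short sketch. This is an illustration only, not OME's implementation: the function name, the data shapes, and the sample service tags and IP addresses are all hypothetical.

```python
def select_for_deployment(expected_tags, discovered, already_deployed):
    """Return {service_tag: idrac_ip} for newly seen servers that are on the
    expected list and have not yet received the template.

    expected_tags    -- iterable of service tags supplied by the administrator
    discovered       -- dict mapping service tag -> iDRAC IP found in the scan
    already_deployed -- set of service tags that have already been handled
    """
    wanted = {tag.strip().upper() for tag in expected_tags}
    return {
        tag: ip
        for tag, ip in discovered.items()
        if tag.upper() in wanted and tag.upper() not in already_deployed
    }


# Hypothetical sample data for one scan of the IP address range.
expected = ["ABC1234", "def5678", "GHI9012"]
scan = {"ABC1234": "192.0.2.10", "XYZ0000": "192.0.2.11", "DEF5678": "192.0.2.12"}
done = {"ABC1234"}

targets = select_for_deployment(expected, scan, done)
print(targets)
```

Servers that are discovered but not on the list are ignored, and listed servers that have not arrived yet simply stay pending until a later scan finds them.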

Each method supports optional delivery of a bootable ISO file. This is an industry-standard image file that contains all the files and configuration information required to install an operating system. To automate the OS install, the operating system ISO is configured for an unattended install. None of these features requires PXE boot support or additional DNS/DHCP customization.

Server template

Let’s look at configuration settings first. This is based on iDRAC’s “server configuration profile” concept. A template encapsulates the server’s BIOS, iDRAC, and components’ firmware configuration settings as a machine-readable file. A template can consist of hundreds of firmware configuration values including iDRAC, BIOS, PERC RAID, NICs, and FC HBA settings. OME can create a template by obtaining these settings from a reference server. A customer can also clone and edit a template for simple updates, or OME can import a template exported from another OME instance.

Testing and results

To understand the profound impact of the automation of this process, we have tested it against a manual process for 1, 10*, and 100* servers[1]. Based on the testing of the OME auto deploy approach for a customer with 100* servers, we found significant differences between automation and the manual process. The following graph illustrates the considerable time savings when using automation.

In internal testing at the Dell TME server lab, we found that manually importing the server configuration profile (SCP or deployment template), and then starting the unattended OS install ISO using virtual media in the iDRAC GUI, took 9 minutes 31 seconds. However, creating an auto deployment and importing a list of target server(s) took only 13 steps in 2 minutes 11 seconds. In addition, whether creating an auto deployment job for 1, 10, or 100 servers, this task took the same amount of time. However, when using the manual process, each additional server added a further 9 minutes 31 seconds.

Testing overview

To demonstrate both the ease of use and the impact of automation, we tested two different approaches: manual versus automated. Both methods used a template approach to configure firmware settings using previously collected data. The testing was conducted using a PowerEdge R540 server with an iDRAC 9 as the target server and OME 3.10 as the deployment solution. Testing results do not include any pre-work such as exporting the server configuration profile (SCP) from the iDRAC, creating file shares, collecting Dell Service Tag information, setting the initial IP address on the iDRAC, or installing OME.

Steps for a manual approach to server deployment using SCP and ISO

Included are all installation steps until the server is booting from the OS ISO that contains the OS unattended installation information.

Starting from the iDRAC home page after signing in:

  1. Select configuration from the main tabs
  2. Select server configuration profile sub-tab
  3. Select import
  4. Select network share
  5. Enter XML SCP file name 
  6. Enter IP address of file share
  7. Enter share name of file share
  8. Enter user account / password 
  9. Select All for Import Components
  10. Select Off for Power state after import 
  11. Click Import 
  12. Click Job to watch configuration task running 
  13. Wait for status to be completed (100%) 
  14. Select Virtual Media sub tab
  15. Scroll down the page to remote file share
  16. Enter Image File Path for the file share for the ISO file
  17. Enter user account / password
  18. Click Connect 
  19. Once connected click OK 
  20. Select Dashboard from the main tabs
  21. Select Start the Virtual Console
  22. Click boot
  23. From the boot controls menu click Virtual CD/DVD/ISO 
  24. Click Yes to confirm boot action
  25. Click Power
  26. Click Power on System 
  27. Confirm Power action 

Steps for an automated approach to server deployment using OME

Starting from the OME home page after signing in:

  1. From Configuration drop down menu select Auto Deploy
  2. Click Create 
  3. In the auto deploy template wizard select the required server template
  4. Select Import CSV 
  5. Click Import CSV
  6. Select the required CSV file containing the list of new server tag numbers
  7. Select Target Group Information
  8. Select Boot to Network ISO 
  9. Enter ISO path and file name 
  10. Enter IP address of file share
  11. Enter user account / password 
  12. For target IP setting leave as Don’t change IP settings
  13. For Target attributes leave unchanged 

Test results data

Table 1.  Results of testing

Number of servers | OpenManage Enterprise auto deploy | Manual configuration using iDRAC
1 | 2 mins 11 secs | 9 mins 31 secs
10 | 2 mins 11 secs | 1 hour 35 mins 10 secs*
100 | 2 mins 11 secs | 15 hours 51 mins 40 secs*

*Projected outcomes based on the measured result for 1 server. Customer results may vary.
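The projected figures in Table 1 follow from simple arithmetic on the single-server measurements. The sketch below assumes, as the projection does, that manual effort grows linearly with the number of servers while the auto deploy setup time stays constant. The function names are illustrative.

```python
from datetime import timedelta

MANUAL_PER_SERVER = timedelta(minutes=9, seconds=31)  # measured for 1 server
AUTO_DEPLOY_SETUP = timedelta(minutes=2, seconds=11)  # measured, independent of count


def manual_time(servers: int) -> timedelta:
    """Projected administrator time when every server is configured by hand."""
    return MANUAL_PER_SERVER * servers


def auto_deploy_time(servers: int) -> timedelta:
    """Administrator time to create one auto deploy job covering all servers."""
    return AUTO_DEPLOY_SETUP


for count in (1, 10, 100):
    print(count, auto_deploy_time(count), manual_time(count))
```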

Advanced features

In addition to the template and ISO deployment, OME offers many advanced features, such as server-initiated discovery, in which new servers are automatically registered with OME through a DNS entry. This removes the need for OME to run a discovery job to search for new bare metal servers. OME also offers support for stateless servers through the concept of a pool of MAC and WWN addresses that can be allocated and moved as required. This means that zoning and any storage LUN allocation based on MAC addresses and other address-based rules becomes mobile between physical servers.

To support the demand for further automation and integration, OpenManage Enterprise provides a RESTful API.

This fully documented API supports all features found on the GUI. Dell also maintains a collection of example PowerShell and Python scripts in the Dell repository on GitHub.
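As a rough illustration of what driving OME through the API can look like, the sketch below builds (but does not send) two requests using only the Python standard library. The endpoint paths and the `X-Auth-Token` header follow the pattern used in Dell's published example scripts, but they should be verified against the API guide for the OME version in use. The host name, user name, and password are placeholders.

```python
import json
import urllib.request

OME_HOST = "ome.example.com"  # placeholder appliance address (assumption)


def build_session_request(host: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the POST that opens an API session."""
    body = json.dumps(
        {"UserName": username, "Password": password, "SessionType": "API"}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{host}/api/SessionService/Sessions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_device_list_request(host: str, auth_token: str) -> urllib.request.Request:
    """Build (but do not send) the GET that lists managed devices."""
    return urllib.request.Request(
        url=f"https://{host}/api/DeviceService/Devices",
        headers={"X-Auth-Token": auth_token},
        method="GET",
    )


session_request = build_session_request(OME_HOST, "admin", "example-password")
print(session_request.get_method(), session_request.full_url)
```

In this pattern, sending the first request with `urllib.request.urlopen` is expected to return a session token in the `X-Auth-Token` response header, which the second request then presents. Certificate verification and error handling are left out of the sketch.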

One size does not fit all

Given Dell Technologies’ open approach to servers and the large number of PowerEdge customers, Dell has developed other methods to streamline server configuration, such as:

  • Deeper VMware deployment customization available from the OME plugin OpenManage integration with VMware vCenter (OMEVV)
  • OME plugin for Microsoft System Center and Config Manager
  • Zero touch provisioning built into iDRAC that uses DHCP provisioning options 43 and 60. This method uses an iDRAC SCP xml file that can include OS unattended installation information.
  • Integration for ServiceNow, Terraform, and Ansible
  • PXE support
  • A Dell embedded lifecycle management GUI is included with iDRAC for 1-to-1 deployments

A word about unattended OS installs

Using OME to install an OS on the target server(s) requires a level of OS installation automation. This is commonly referred to as an unattended OS installation. For example, automating a Windows Server installation requires a bootable ISO image that carries the unattended installation information in an autounattend.xml file. Microsoft’s Windows System Image Manager (WSIM), part of the Windows Assessment and Deployment Kit (ADK), can be used to create this answer file. A fresh bootable ISO is then created with the answer file in the root and the OS install files copied from a standard Microsoft ISO image. You can use the OSCDIMG command line utility, which ships as part of the ADK, to create the new customized bootable Windows unattended installation ISO. OME controls and automates the mounting and booting of this ISO on the target servers’ iDRACs during the deployment task.

Summary

Customers can realize the benefits of the deployment automation built into OpenManage Enterprise with ease. These benefits multiply as the number of servers you are deploying increases. Taking the 100-server example, it takes over 15 hours of administrator time to complete the task manually, but only 2 minutes 11 seconds of administrator time to perform the deployment using OME. Our testing showed that using automation brought major benefits, not only in administration time saved but also in accuracy, repeatability, predictability, and of course, efficiency.

References

[1] Based on internal testing at the Dell TME server lab, October 2023.


openmanage enterprise idrac ome 4.0

Evolution of Intelligence System Management Dell OME 4.0 Release

Mark Maclean

Tue, 31 Oct 2023 15:30:28 -0000


Why were the films Star Wars episodes 4, 5, and 6 released before episodes 1, 2, and 3? In charge of the schedule, Yoda was!

At Dell, we like to keep things simple, so we released version 4.0 of OpenManage Enterprise following on from release 3.10. For customers new to OpenManage Enterprise, often referred to as OME, it is Dell’s management console solution, offering comprehensive lifecycle management for Dell PowerEdge servers and so much more.

 

 

Designed to simplify, OpenManage Enterprise offers you the ability to orchestrate and automate repetitive server management tasks at scale. Customers have a choice of:

  • Using the OME GUI
  • Integrating the data from OME into VMware vCenter or Microsoft System Center via plugins
  • Leveraging the rich RESTful API, which offers 100% of the features of the GUI and enables infrastructure as code for DevOps teams

Capabilities can be extended with additional plugins, ranging from automated Dell support case creation via the Services plugin to detailed and granular firmware updating via the Update Manager plugin. With Ansible modules, Terraform support, and ServiceNow integration, OME supports multiple ways to extend orchestration.

Combining OpenManage Enterprise with Dell CloudIQ utilizes the power of AIOps to detect server metrics anomalies. Trends can be analyzed and forecasted for capacity, power, and processing. Server security can be proactively improved using intuitive security policies and automated relevant CVE alerting. Adding server telemetry to data from other Dell infrastructure collected by CloudIQ gives you end to end management with predictive analytics to reduce risk by addressing problems before they have an impact.

Server power usage can be optimized to support sustainability by using the Power Manager plugin to maximize visibility and management of servers’ power consumption. You can review consumption and set power strategies using telemetry visualization and reports on power utilization, power costs, CO2 emissions, thermal data, GPU data, and performance metrics. Power Manager also supports analysis of power data by virtual machines, components, individual servers, racks, or even entire data centers. The plugin can automatically respond to power and thermal events to limit issues. With built-in idle server detection and reporting, hunting down underutilized zombie servers becomes easy.

OpenManage Enterprise strengthens servers’ cybersecurity defenses, covering elements from highlighting and deploying the latest firmware updates to configuration drift detection. With support for LDAP, Microsoft AD, iDRAC password rotation, Multi-Factor Authentication (MFA) with RSA SecurID, and CyberArk credential provider, management is a breeze.

OpenManage Mobile, a mobile app that provides monitoring capabilities from Android or Apple phones, extends OpenManage Enterprise into your pocket. The app enables you to receive push alerts from OpenManage Enterprise, view server configuration details, and even view and control server consoles on mobile devices. Say hello to managing your servers from almost anywhere!

Following are the highlights of new features for OME 4.0 and associated plugins updates. For more details, visit the OME and plugin support pages in the Resources section.

OME 4.0

  • OME can now access iDRAC passwords via CyberArk vault
  • OME can now create encrypted credentials to access iDRAC and rotate them automatically   
  • Local Multi-Factor Authentication (MFA) with RSA SecurID
  • Support new server platforms (see OME Support Matrix for details)
    • New generation PowerEdge servers
    • New XR/XE/XC/VxRail platforms
  • TLS 1.3 support
  • Secure Boot (appliance allows only OS distribution provided signed binaries to run at startup)

Power Manager plugin 3.2 (for OME 4.0)

  • GPU power and thermal data
  • Visualization of multivariate data to observe patterns and trends in server telemetry data
  • Amp readings from Grid A and B (when more than one power supply is deployed)
  • BIOS system profile and workload profile reporting

OpenManage Enterprise Integration for VMware vCenter or OMEVV (for OME 4.0)

  • Support for vSphere 8.0 U2
  • Support for non-clustered hosts baseline profiles, firmware compliance, and application of firmware updates
  • Apply System Profile to bare metal hosts (System Profile = Deployment Template of OME)
  • vLCM support for standalone hosts
  • OMMP 3.0, a new management pack for VMware Aria Operations, for the OMEVV plugin

Update Manager plugin 1.5 (for OME 4.0)

  • Set the Baseline to any Repository Version
  • Automate Repository Refresh on server or component update, such as add/remove
  • Compare and report different repositories

OME Integration for Microsoft System Center Plugin (OMEMSSC) 1.2

  • SCOM Web Console Support 
  • SCOM Alert Auto Resolution 
  • SCOM Resource Pool Management 

CloudIQ Plugin (CIQP) 2.0 

OME Services Plugin (OME-S) 4.0 

Are you getting the maximum benefits from Dell OpenManage? As I always say, where there are servers, there's a server management need! 

 

Resources


Author: Mark Maclean, OpenManage Technical Marketing Engineering

Linkedin : uk.linkedin.com/in/markmacleandell

 


GPU MX7000 Liqid composable infrastructure CDI

Reference Architecture: GPU Acceleration for Dell PowerEdge MX7000

Mark Maclean, George Wagner (Liqid)

Tue, 26 Sep 2023 16:34:19 -0000


Summary

Many of today’s most demanding applications can make use of GPU acceleration. Liqid partnered with Dell Technologies to enable the rapid and dynamic provisioning of PCIe GPUs, as well as FPGA and NVMe devices, to Dell PowerEdge MX7000 compute sleds. The goal is to ensure that workload performance needs are met for the most accelerator-hungry applications.

Background

The Dell PowerEdge MX7000 Modular Chassis simplifies the deployment and management of today’s most challenging workloads by allowing IT administrators to dynamically assign, move, and scale shared pools of compute, storage, and networking resources. It lets IT administrators deliver fast results without the overhead of manually managing and reconfiguring infrastructure, so they can meet the ever-changing needs of their end users. For compute-intensive, AI-driven environments and high-value applications, Liqid Matrix software makes it possible to add physical GPUs on demand to the PowerEdge MX7000.

GPU acceleration for PowerEdge MX7000

The following figure shows the essential MX7000 GPU expansion components:

Figure 1.  Deploying GPU into a PowerEdge MX7000

Liqid SmartStack Composable Systems for PowerEdge MX7000

Liqid SmartStacks are fully validated Liqid composable solutions designed to meet your most challenging GPU requirements. Available in four sizes, with a maximum capacity of 30 GPUs and 16 servers per system, each SmartStack includes everything you need to deploy GPUs to MX7000 systems.

Liqid SmartStack 4410 Series Technical Specifications

Table 1.  Liqid SmartStack Solutions

   

Specification | SmartStack 10 | SmartStack 20 | SmartStack 30 | SmartStack 30+
Description | 10 GPU / 4 Host Capacity | 20 GPU / 8 Host Capacity | 30 GPU / 6 Host Capacity | 30 GPU / 16 Host Capacity
Supported Device Types | GPU, NVMe, FPGA, DPU | GPU, NVMe, FPGA, DPU | GPU, NVMe, FPGA, DPU | GPU, NVMe, FPGA, DPU
Max Devices | 10x Full-height, full-length (FHFL) 10.5”, dual-slot | 20x Full-height, full-length (FHFL) 10.5”, dual-slot | 30x Full-height, full-length (FHFL) 10.5”, dual-slot | 30x Full-height, full-length (FHFL) 10.5”, dual-slot
Max Hosts Supported | 4x Host Servers | 8x Host Servers | 6x Host Servers | 16x Host Servers
Max Composed Devices Per Host | 4x Devices | 4x Devices | 4x Devices | 4x Devices
PCIe Expansion Chassis | 1x Liqid EX-4410 PCIe Gen4 | 2x Liqid EX-4410 PCIe Gen4 | 3x Liqid EX-4410 PCIe Gen4 | 3x Liqid EX-4410 PCIe Gen4
PCIe Fabric Switch | None | 1x 48 Port | 1x 48 Port | 2x 48 Port
PCIe Host Bus Adapter | PCIe Gen3 x4 Per Compute Sled (1 or more) | PCIe Gen3 x4 Per Compute Sled (1 or more) | PCIe Gen3 x4 Per Compute Sled (1 or more) | PCIe Gen3 x4 Per Compute Sled (1 or more)
Rack Units | 5U | 10U | 14U | 15U
Composable Devices | Go to liqid.com/resources/library for a current hardware compatibility list of composable PCIe devices (applies to all four SmartStack sizes)

Implementing GPU expansion for MX

GPUs are installed into the PCIe expansion chassis. Next, U.2 to PCIe Gen3 adapters are added to each compute sled that requires GPU acceleration. They are then connected to the expansion chassis (Figure 1). Liqid Command Center software enables discovery of all GPUs, making them ready to be added to the server over native PCIe.

FPGA and NVMe storage can also be added to compute nodes in tandem. This PCIe expansion chassis and software are available from Dell.

Software-defined GPU deployment

Liqid Matrix software enables the dynamic allocation of GPUs to MX compute sleds at the bare metal level (GPU hot plug supported) via software composability. Up to 4 GPUs can be composed to a single compute sled, using the Liqid UI or RESTful API, to meet end user workload requirements. To the operating system, the GPUs are presented as local resources directly connected to the MX compute sled over PCIe (Figure 2). All operating systems are supported including Linux, Microsoft Windows, and VMware ESXi. As workload needs change, the management software can be used to add or remove resources such as GPUs, NVMe SSDs, and FPGAs on the fly.

Enabling GPU Peer-2-Peer capability

A fundamental capability of this solution is RDMA Peer-2-Peer communication between GPU devices. Direct RDMA transfers have a massive impact on both throughput and latency for the highest performing GPU-centric applications. Up to 10x improvement in performance has been achieved with RDMA Peer-2-Peer enabled. The following figure provides an overview of how PCIe Peer-2-Peer works (Figure 3).

Figure 3.  Peer-2-Peer performance

Bypassing the x86 processor and enabling direct RDMA communication between GPUs unlocks a dramatic improvement in bandwidth and a reduction in latency. The following table outlines the performance expected for GPUs that are composed to a single node with GPU RDMA Peer-2-Peer enabled (Table 2).

Table 2.  Peer-2-Peer Performance Comparison

      

Metric | Peer-to-Peer Disabled | Peer-to-Peer Enabled | Improvement
Bandwidth | 8.6 GB/s | 25.0 GB/s | 3X More Bandwidth
Latency | 33.7 µs | 3.1 µs | 11X Lower Latency
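The rounded improvement factors in Table 2 can be reproduced directly from the measured values. A short sketch, with variable names chosen for illustration:

```python
# Values copied from Table 2.
bandwidth_gb_per_s = {"disabled": 8.6, "enabled": 25.0}
latency_us = {"disabled": 33.7, "enabled": 3.1}

# Higher bandwidth is better, so divide enabled by disabled.
bandwidth_gain = bandwidth_gb_per_s["enabled"] / bandwidth_gb_per_s["disabled"]
# Lower latency is better, so divide disabled by enabled.
latency_reduction = latency_us["disabled"] / latency_us["enabled"]

print(f"Bandwidth: {bandwidth_gain:.1f}x higher")
print(f"Latency:   {latency_reduction:.1f}x lower")
```

Rounded to whole numbers, the two ratios give the 3X and 11X figures quoted in the table.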

Application Performance

Scalable GPU performance is critical for successful outcomes. Tables 3 and 4 present a performance comparison of the Dell MX705c Compute Sled configured with varying numbers of NVIDIA A100 GPUs (1x, 2x, 3x, and 4x) in two different precisions: FP16 and FP32. These results indicate near-linear scaling.

Table 3.  FP16 GPU performance – MX7000 with NVIDIA A100 GPUs, P2P enabled

FP16 | BERT-Base | BERT-Large | GNMT | NCF | ResNet-50 | Tacotron 2 | Transformer-XL Base | Transformer-XL Large | WaveGlow
1x A100 | 374 | 119 | 187,689 | 37,422,425 | 1,424 | 37,047 | 37,044 | 16,407 | 198,005
2x A100 | 638 | 157 | 240,368 | 68,023,242 | 2,627 | 72,631 | 73,661 | 32,694 | 284,709
3x A100 | 879 | 208 | 313,561 | 85,030,276 | 3,742 | 87,409 | 102,121 | 45,220 | 376,094
4x A100 | 1,088 | 256 | 379,515 | 98,740,107 | 4,657 | 112,282 | 129,336 | 58,503 | 460,793

Table 4.  FP32 GPU performance – MX7000 with NVIDIA A100 GPUs, P2P enabled

FP32 | BERT-Base | BERT-Large | GNMT | NCF | ResNet-50 | Tacotron 2 | Transformer-XL Base | Transformer-XL Large | WaveGlow
1x A100 | 184 | 55 | 100,612 | 24,117,691 | 891 | 36,953 | 24,394 | 10,520 | 198,237
2x A100 | 283 | 66 | 115,903 | 38,107,456 | 1,610 | 72,218 | 50,108 | 20,941 | 284,047
3x A100 | 380 | 88 | 149,359 | 47,133,830 | 2,257 | 84,735 | 66,869 | 28,748 | 370,425
4x A100 | 464 | 108 | 180,022 | 57,539,993 | 2,840 | 104,398 | 93,394 | 35,927 | 460,492
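To judge how closely a workload scales with GPU count, the values in Tables 3 and 4 can be turned into speedup and scaling efficiency relative to the single-GPU result. The sketch below does this for two of the nine workloads. The tables do not state units, so it assumes that higher values are better. The function and variable names are illustrative.

```python
# Values copied from Tables 3 and 4 for two of the nine workloads,
# listed in order for 1, 2, 3, and 4 GPUs.
RESULTS = {
    ("FP16", "ResNet-50"): [1424, 2627, 3742, 4657],
    ("FP16", "BERT-Base"): [374, 638, 879, 1088],
    ("FP32", "ResNet-50"): [891, 1610, 2257, 2840],
    ("FP32", "BERT-Base"): [184, 283, 380, 464],
}


def scaling(values):
    """Return (speedup, efficiency) pairs relative to the 1-GPU result.

    Efficiency is speedup divided by GPU count; 1.0 means perfectly linear.
    """
    base = values[0]
    return [(v / base, v / base / gpus) for gpus, v in enumerate(values, start=1)]


for (precision, model), values in RESULTS.items():
    speedup, efficiency = scaling(values)[-1]
    print(f"{precision} {model}: 4 GPUs give {speedup:.2f}x ({efficiency:.0%} efficiency)")
```

The same calculation can be repeated for the remaining workloads by adding their rows to `RESULTS`.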

Conclusion

Liqid composable GPUs for the Dell PowerEdge MX7000 and other PowerEdge rack mount servers unlock the ability to manage the most demanding workloads in which accelerators are required, for both new and existing deployments. Liqid collaborated with Dell Technologies Design Solutions to accelerate applications through the addition of GPUs to the Dell MX compute sleds over PCIe.

Learn more | See a demo | Get a quote

This reference architecture is available as part of the Dell Technologies Design Solutions. To learn more, contact a Design Expert today https://www.delltechnologies.com/en-us/oem/index2.htm#open-contact-form.




NVMe SSD SAS SATA local storage

Choosing the Most Appropriate Server SSD Interfaces: E3.S, NVMe, SAS, or SATA

Bill Poch, Mark Maclean

Sun, 10 Sep 2023 15:32:11 -0000


Summary

This document is a straightforward guide to help PowerEdge customers choose the most appropriate SSD type, based on their business needs and goals. 

As new generations of CPUs and servers are released, they frequently bring new technologies such as increased PCIe bus speeds and new storage formats, such as the EDSFF E3.S form factor for NVMe PCIe 5 Solid State Drives (SSDs), released in early 2023. PowerEdge customers can optimize their local storage configurations based on their applications and business needs. Multiple factors must be taken into consideration to make an informed decision, such as workload demands, budget, scale, and even roadmap. Even when all of these factors are understood, it can be difficult to determine the best choice of SSD interface among NVMe, SAS, Value SAS, and SATA.

This DfD (Direct from Development) tech note is provided to simplify and guide customers in their choice of SSD. We hope customers will find it to be a valuable resource when it becomes unclear which storage medium is the optimal choice. First, let’s summarize the history and architecture of the NVMe, SAS, Value SAS, and SATA SSD interfaces:

NVMe (Non-Volatile Memory Express)

Since it came to market in 2011, the NVMe interface has remained the class of flash storage with the highest performance. The driving architectural differentiator of NVMe is that it uses the PCIe interface bus to connect directly to the CPU and streamline the data travel path. This design contrasts with SAS and SATA, which require data to first traverse an HBA disk controller before reaching the CPU. By removing a layer from the stack, the travel path is optimized, which reduces latency and improves performance. Scalability is also significantly improved, because NVMe drives can go beyond the traditional four lanes by using lanes from the same “pool” of lanes connected to the CPU. EDSFF form factors, including E3.S, are the next generation of NVMe SSD packaging and enable higher server storage density. Furthermore, NVMe performance continually improves as each new generation of the PCIe standard becomes available.

Figure 1.  Latest Dell PowerEdge R7625 with 32 x E3.S drives

SAS (Serial Attached SCSI)

The SAS interface was released a few years after SATA and introduced new features that are beneficial for modern workloads. Instead of building upon the ATA (Advanced Technology Attachment) standard used in SATA, SAS serialized the existing parallel SCSI (Small Computer System Interface) standard. SAS cable architecture has four wires within two cables, creating more channels available for moving data and more connectors available for use by other devices. Furthermore, the channels are full duplex, allowing reads and writes to traverse concurrently. Improved reliability, error reporting, and longer cable lengths were also introduced with SAS. Value SAS is often offered alongside SAS. It uses the same interface with lower-performance devices, giving customers the technical benefits of SAS at a lower price point. SAS improvements are made to this day, with SAS4 (24G) now available in certain supported PERC 12 (PowerEdge RAID Controller) configurations. For this reason, SAS remains valuable and relevant within the market.

SATA (Serial Advanced Technology Attachment)

The SATA interface was released in 2000 and is still commonly adopted within modern servers because it is the most affordable of the SSD interface options. It replaced parallel ATA with serial ATA, which resolved various performance and physical limitations at that time. The SATA cable architecture has four wires within one cable—two for sending data and two for receiving data. These four channels are half-duplex, so data can only move in one direction at a time. At 6Gb/s, SATA write speeds are sufficient for storing information, but its read speeds are slow compared to more modern interfaces, which limits its application use for modern workloads. The last major SATA revision was in 2008, and SATA will not see further advancement in the future.

Figure 2.  Random 4KiB 70% read / 30% write IOPS variances for each storage interface

Table 1 lists key metrics for five storage-drive types most commonly attached to PowerEdge servers: Enterprise NVMe, Data Center (DC) NVMe, Enterprise SAS, Value SAS, and SATA. This comparison helps clarify which storage interface type is most applicable to specific business needs and goals.

Table 1.  Ranking performance metrics of Enterprise NVMe, DC NVMe, Enterprise SAS, Value SAS, and SATA drives

Performance: Performance can be measured in various ways. For this example, Random 4 KiB 70/30 (70% reads, 30% writes) data was compared and published here by Dell, with higher IOPS being better. Figure 2 illustrates the following IOPS performance variances:

  • E3.S NVMe Enterprise class drives produce 1.48x more IOPS than Enterprise NVMe SSDs. 
  • Enterprise NVMe SSDs produce 1.13x more IOPS than DC NVMe SSDs. 
  • DC NVMe SSDs produce 1.99x more IOPS than Enterprise SAS SSDs. 
  • Enterprise SAS SSDs produce 1.42x more IOPS than Value SAS SSDs. 
  • Value SAS SSDs produce 2.39x more IOPS than SATA SSDs. 
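The pairwise ratios above can be chained to express each drive class as a multiple of SATA. The following is a minimal sketch, assuming that "Nx more IOPS" is meant as a plain multiplicative factor of N. The dictionary and function names are illustrative and do not come from the published data.

```python
# Pairwise IOPS ratios quoted above: (faster class, slower class, ratio).
# Assumption: "Nx more IOPS" is read as a plain multiplicative factor of N.
PAIRWISE_RATIOS = [
    ("Value SAS", "SATA", 2.39),
    ("Enterprise SAS", "Value SAS", 1.42),
    ("DC NVMe", "Enterprise SAS", 1.99),
    ("Enterprise NVMe", "DC NVMe", 1.13),
    ("E3.S Enterprise NVMe", "Enterprise NVMe", 1.48),
]


def relative_iops(ratios, baseline="SATA"):
    """Chain the pairwise ratios into IOPS multiples of the baseline class."""
    relative = {baseline: 1.0}
    for faster, slower, ratio in ratios:
        # Each entry builds on a class that has already been placed.
        relative[faster] = relative[slower] * ratio
    return relative


if __name__ == "__main__":
    for drive, multiple in relative_iops(PAIRWISE_RATIOS).items():
        print(f"{drive:<22} {multiple:5.2f}x SATA")
```

Chained this way, the quoted ratios put E3.S Enterprise NVMe at roughly 11 times the SATA figure, although the result is only as reliable as the underlying measurements.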

Latency: The NVMe protocol reduces the number of touchpoints that data must travel (bypassing the HBA) to reach the CPU. It also has less overhead, giving it significantly lower latency than SAS and SATA. The SAS protocol is full-duplex (as opposed to half-duplex) and offers two channels (as opposed to one) for data to use, giving it over 50% lower latency than SATA.

Price: According to Dell pricing in Q1 2022, SATA SSDs are the least expensive storage interface, at ~0.9x the price of Value SAS SSDs. Value SAS SSDs are ~0.85x the price of DC NVMe SSDs. DC NVMe SSDs are ~0.85x the price of Enterprise SAS SSDs. Enterprise SAS SSDs are ~0.97x the price of Enterprise NVMe SSDs. Pricing is volatile and these number variances are subject to change at any time.

Performance per price: PowerEdge customers who have not identified which metric is most important for their business goals should strongly consider performance (IOPS) per price (dollar) to be at the top of the list. Because NVMe has such a significant performance lead over SAS and SATA, it is easily the gold standard for performance per price. DC NVMe SSDs have the best performance per price, followed closely by Enterprise NVMe SSDs, followed by Value SAS SSDs, followed closely by SAS SSDs, followed by SATA SSDs. This tech note gives more performance/price detail.
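To make the performance-per-price idea concrete, the sketch below divides relative IOPS by relative price for the five drive classes. It is a rough illustration only: it assumes the IOPS and price ratios quoted in this tech note can be multiplied together as plain factors, even though the pricing dates from Q1 2022 and is described as volatile. All names in the code are hypothetical.

```python
# Relative IOPS as multiples of SATA, chained from the ratios quoted in this note.
RELATIVE_IOPS = {
    "SATA": 1.0,
    "Value SAS": 2.39,
    "Enterprise SAS": 2.39 * 1.42,
    "DC NVMe": 2.39 * 1.42 * 1.99,
    "Enterprise NVMe": 2.39 * 1.42 * 1.99 * 1.13,
}

# Relative price as multiples of Enterprise NVMe, chained from the Q1 2022 ratios.
RELATIVE_PRICE = {
    "Enterprise NVMe": 1.0,
    "Enterprise SAS": 0.97,
    "DC NVMe": 0.85 * 0.97,
    "Value SAS": 0.85 * 0.85 * 0.97,
    "SATA": 0.9 * 0.85 * 0.85 * 0.97,
}


def rank_by_performance_per_price(iops, price):
    """Return (drive, score) pairs sorted from best to worst IOPS per price unit."""
    scores = {drive: iops[drive] / price[drive] for drive in iops}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for drive, score in rank_by_performance_per_price(RELATIVE_IOPS, RELATIVE_PRICE):
        print(f"{drive:<16} {score:5.2f}")
```

With these inputs DC NVMe ranks first and SATA last. The two SAS classes land within a few percent of each other, so their relative order is sensitive to small pricing changes.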

Scalability: Currently, NVMe shows the greatest promise for wider-scale implementation due to the abundance of lanes that can be available with low overhead. However, it can be a costly investment if existing data center infrastructures must be upgraded to support the NVMe I/O protocol. SAS is more flexible, because SAS expanders are cost-effective, and most data center infrastructures already have the required hardware to support it. However, SAS does not have the potential to scale out as aggressively as NVMe. SATA does not scale well with SSDs.

Ongoing development: The NVMe interface has consistent and substantial advancements year-over-year, including updates such as NVMe 2.0b (released in Oct. 2022) and PCIe 5.0 (released on Intel CPUs in Jan. 2023). The SAS interface also has regularly cadenced updates, but the impact is mostly marginal, except for the recent SAS4 (24G) update. There are no plans to extend the capabilities of the SATA interface beyond the current limitations.

Assigning these ranks for each storage interface and metric, and explaining why the rank was given, will make it easier to understand which drive type will be the most valuable in relation to business needs and goals.

Guidance in accordance with business goals

Each business is unique and will have different requirements for their storage drives. Factors such as intended workload, business size, plan to scale, budget, and so on, should be considered to make a confident investment decision. Although this decision is ultimately up to each business, we provide the following guidelines to help businesses that are still undecided to make an educated choice:

 

Enterprise NVMe SSD: Businesses that desire maximum performance and have a flexible budget should consider purchasing Enterprise NVMe SSDs. Storage I/O heavy workloads such as HPC or AI will immediately benefit from the additional cache gained from the non-volatile nature of this storage interface. The fast-paced performance growth seen in Enterprise NVMe SSDs will also allow smaller workloads like databases or collaboration to easily keep up with the ever-increasing size of data. Ultimately, because Enterprise NVMe undergoes consistent valuable changes every year, such as performance increases and cost reduction/optimization, we recommend futureproofing your data center with it.

DC NVMe SSD: Businesses that desire a budget-conscious NVMe solution, in addition to the greatest value, should consider purchasing DC NVMe SSDs. These drives have the same value proposition as Enterprise NVMe SSDs, but with a sizeable price reduction (0.83x) and performance hit (0.86x). Businesses that want to get the best value will be pleased to know that DC NVMe drives have the best performance-per-price.

Enterprise SAS: Businesses that desire to continue using their existing SCSI-based data center environment and have maximum SAS performance should consider purchasing Enterprise SAS SSDs. Although the Enterprise SAS interface does not currently have any ranking leadership for performance or pricing, it is established in the industry as highly reliable, cost-effective to scale, and shows promise for the future, with 24G available. Enterprise SAS SSDs will adequately handle medium-duty workloads, such as databases or virtualization, but will operate best when mixed with NVMe SSDs if any heavy-duty workloads are included.

Value SAS: Businesses that desire a budget-conscious SAS solution should consider purchasing Value SAS SSDs. These drives have the same value proposition as Enterprise SAS SSDs, but with both a sizeable price reduction (0.73x) and performance hit (0.71x). For this reason, they have a slightly lower performance-per-price than Enterprise SAS, and are therefore more of a “value” play when compared to SATA. Value SAS still has a clear role, though, because small-to-medium businesses with a smaller budget can leverage this lower-cost solution while still receiving the many benefits of the SAS interface.

SATA: Businesses that desire the lowest price storage interface should consider purchasing SATA SSDs. However, caution should be applied with this statement, because there is currently no other value proposition for SATA SSDs, and the price gap for these flash storage interfaces has been shrinking over time, which may eventually remove any valid reason for the existence of SATA. With that said, SATA is currently still a solid choice for light workloads that are not read-heavy.

Figure 3.  Latest Dell PowerEdge MX760c with 8 x E3.S drives per sled

Conclusion

The story of competing NVMe, SAS, and SATA storage interfaces is still being written. Five or more years ago, analysts made the argument that although NVMe has superior performance, its high cost warranted SAS the title of ‘best value for years to come’. What we see today is a rapidly shrinking price gap for all of these interfaces. We observe that SATA performance has fallen far behind SAS, and very far behind NVMe, with no plan to improve its current state. We also see NVMe optimizing its performance and price-point to yield more market share every year. Most importantly, we expect rapid growth in the industry adoption of heavier workloads and ever-increasing data requirements. Both storage drive and industry trends lead us to believe that the best option for any business desiring to build a future-proofed data center would be to begin making the investment in NVMe storage. However, the remaining types of storage still hold value for varying use cases. It is the customer’s choice about which storage type is best for their business goals. We hope this guide has helped to clarify the available options.


Migrating OMIVV to OMEVV Made Simple

Mark Maclean

Tue, 01 Aug 2023 14:05:13 -0000


Why did the virtual machine go on a diet? Because it had too many bytes and needed to lose some weight. Recently the Dell OpenManage portfolio also went on a slight diet, consolidating OpenManage Integration for VMware vCenter (OMIVV) into a new plug-in for OpenManage Enterprise. The new solution, OpenManage Enterprise Integration for VMware vCenter (OMEVV), offers additional features such as support for 16G servers, compatibility with vCenter 8 and vSphere ESXi 8, and integration into the wider OpenManage Enterprise ecosystem. 

To streamline customer migrations from OMIVV to OMEVV, the latest OMIVV release, version 5.4.1, includes a migration tool. Dell Technologies has published a white paper detailing the migration steps: Migrating from OMIVV to OMEVV. The white paper discusses both the migration tool and also relevant OMEVV REST APIs for future automation.

The OMIVV to OMEVV Migration Tool supports:

  •  VMware ESXi hosts that are inventoried and managed in OMIVV
  •  Updates to event and alarm settings
  •  Changes to severity of Dell health update notifications for VMware Proactive High Availability (PHA) event rules

Just ensure that the Dell servers to be migrated are compliant with the compatibility matrix. For example, only PowerEdge 13th Generation servers or higher are supported. Also, an OpenManage Enterprise Advanced+ license is required on each of the servers that will be migrated to OMEVV.  
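The two prerequisites named above lend themselves to a simple pre-migration check. The sketch below is illustrative only: the `Host` record and its fields are hypothetical, it is not part of the migration tool, and the full compatibility matrix contains more conditions than the two modeled here.

```python
from dataclasses import dataclass

# "Only PowerEdge 13th Generation servers or higher are supported."
MIN_SUPPORTED_GENERATION = 13


@dataclass(frozen=True)
class Host:
    """Hypothetical inventory record for a server managed by OMIVV."""
    name: str
    generation: int           # PowerEdge generation, for example 14 for a 14G server
    has_advanced_plus: bool   # OpenManage Enterprise Advanced+ license present


def migration_blockers(host):
    """Return the reasons a host fails the two checks named in this post."""
    reasons = []
    if host.generation < MIN_SUPPORTED_GENERATION:
        reasons.append(
            f"generation {host.generation}G is below {MIN_SUPPORTED_GENERATION}G"
        )
    if not host.has_advanced_plus:
        reasons.append("missing OpenManage Enterprise Advanced+ license")
    return reasons


if __name__ == "__main__":
    inventory = [
        Host("esx-01", 15, True),
        Host("esx-02", 12, True),
        Host("esx-03", 14, False),
    ]
    for host in inventory:
        blockers = migration_blockers(host)
        print(host.name, "ready" if not blockers else "; ".join(blockers))
```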

The migration tool is launched through https://<OMIVV-instance-IP>/MigrationTool/login. Once logged in, administrators are reminded of the migration prerequisites, such as OpenManage Enterprise must be deployed and the OMEVV plug-in must be accessible from OMIVV through the network. Once the connection from OMIVV is authenticated, single or multiple vCenter instances can be selected for migration.

Details of the migration status can be displayed as the task runs, and, once complete, a summary of the migration tasks is displayed. The selected vCenter instances are automatically unregistered from OMIVV and registered in OMEVV alongside all the hosts transferring to the OpenManage Enterprise plug-in. Details of the migration jobs are also recorded in the OpenManage Enterprise event log.

The transition from an OMIVV stand-alone appliance to the OMEVV plug-in enables customers to reduce the complexity of data center management by streamlining the tools associated with managing and monitoring Dell servers in the vSphere environment. At the same time, the OMEVV plug-in gives customers access to the wider OpenManage Enterprise ecosystem. This includes Power Manager, phone support through the Dell services plug-in, and integration with CloudIQ, Dell's cloud-based AIOps monitoring and management solution for Dell's data center infrastructure portfolio. 

The inclusion of the migration tool in OMIVV 5.4.1 helps customers of all sizes migrate to the newer OMEVV server management architecture with the latest features and benefits of automation, security, and efficiency. 

Resources

Author: Mark Maclean, PowerEdge Technical Marketing Engineering

LinkedIn: uk.linkedin.com/in/markmacleandell


PowerEdge MX and NVMe/TCP Storage

Mark Maclean, Claire O'Keeffe

Fri, 28 Jul 2023 17:46:12 -0000


Introduction

Dell PowerEdge MX was introduced in 2018, and since then Dell Technologies has continued to add new features and functionality to the platform. One such area is the support of NVMe over TCP (NVMe/TCP). As new applications such as Artificial Intelligence and Machine Learning (AI/ML) and the continuing consolidation of virtual workloads demand greater storage performance, NVMe/TCP brings performance improvements over protocols such as iSCSI at a lower price point than compatible Fibre Channel (FC) infrastructure (see Transport Performance Comparison). Incorporating this protocol into storage solution architecture brings new opportunities for higher performance using Ethernet and retiring FC infrastructure. 

This tech note describes the architecture required to build PowerEdge MX solutions that use NVMe/TCP, simplifying connectivity to external storage arrays by reducing the physical network and streamlining protocols. It describes the value proposition and technology building blocks and provides high-level configuration examples using VMware.

Technology architecture  

The four components of a Dell NVMe/TCP solution are a compute layer with the appropriate host network interface enabled for NVMe/TCP, a high-performance 25 GbE or 100 GbE switching network, a storage array supporting NVMe/TCP, and, finally, a management application to configure and control access. Dell offers several end-to-end PowerEdge MX-based storage solutions that support NVMe/TCP on either 25 GbE or 100 GbE networking. The solutions include PowerEdge servers, PowerSwitch networking, and several Dell storage array products with Dell SmartFabric Storage Software for zoning management.

Figure 1.  Example of NVMe/TCP SAN and LAN architecture

Dell continues to validate and expand the matrix of supported hardware and software. The document, NVMe/TCP Host/Storage Interoperability Simple Support Matrix, is available on E-Lab Navigator and updated on a regular basis. It includes details about tested configurations and supported storage arrays, such as PowerStore and PowerMax.

Table 1.  Example of supported configurations extracted from NVMe/TCP Host/Storage Interoperability Simple Support Matrix

Server | NIC | MX Firmware/Driver Baseline | Storage Array | Boot From SAN | OS
MX750c, MX760c | Broadcom 57508 dual 100 GbE Mezz card | MX baseline 2.10.00 | PowerMax 2500/8500, OS 10.0.0 / 10.0.1 | No | VMware ESXi 8.0
MX760c | Broadcom 57504 dual 25 GbE Mezz card | MX baseline 2.00.00 | PowerMax 2500/8500, OS 10.0.0 / 10.0.1 | No | VMware ESXi 8.0
MX750c, MX760c | Broadcom 57508 dual 100 GbE Mezz card | MX baseline 2.10.00 | PowerStore 500T/1000T/3000T/7000T/9000T | No | VMware ESXi 8.0
MX760c | Broadcom 57504 dual 25 GbE Mezz card | MX baseline 2.00.00 | PowerStore 500T/1000T/3000T/7000T/9000T | No | VMware ESXi 8.0

These are the minimum supported versions. See the Dell support site for the latest approved version.

PowerEdge MX

The 100 GbE mezzanine card was added to the PowerEdge MX compute sled connectivity portfolio in April 2023. The PowerEdge MX offers a choice of both 25 GbE and 100 GbE at the compute sled, with a selection of various networking I/O modules. 

Figure 2.  MX chassis 100 GbE architecture

IP switch fabric  

NVMe/TCP traffic uses traditional TCP/IP protocols, meaning the network design can be quite flexible. Often, existing networks can be used. The best-practice topology dedicates switches and device ports for storage area network (SAN) traffic only. In Figure 1, local area network (LAN) traffic connects to a pair of switches northbound from Fabric A in the MX chassis. Fabric B connects to dedicated, air-gapped switches to reach the storage array. 

For more details about NVMe/TCP networking, see the SmartFabric Storage Software Deployment Guide.

For 25 GbE connectivity, there are a number of options, starting with dual- or quad-port mezzanine cards, with a selection of pass-through or fabric expansion modules or full switches integrated into the PowerEdge MX chassis. For scalability, a pair of external top-of-rack (ToR) switches are implemented for interfacing with the storage array.

For 100 GbE end-to-end connectivity, the MX8116n Fabric Expander Module is a required chassis component for the PowerEdge MX platform. A Z9432F-ON ToR switch is then required for MX8116n connectivity. The Z9432F supports 32 ports x 400 GbE (or 64 ports x 200 GbE using breakouts or 128 ports x multiple interface speeds from 10 GbE to 400 GbE ports using breakouts). So how does the Z9432F-ON work in the MX 100 GbE solution? The 400 GbE ports on the MX8116n connect to ports on the PowerSwitch. The solution scales the network fabric to 14 chassis with 112 PowerEdge MX compute sleds. Each MX7000 chassis uses only 4 x 400 GbE cables, dramatically reducing and simplifying cabling (see Figure 2). 
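The scaling figures in this paragraph can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted above. The port-count estimate is a naive lower bound that assumes every 400 GbE chassis cable lands on its own switch port and ignores uplinks, storage ports, and redundancy design, so it is not sizing guidance.

```python
import math

CHASSIS_PER_FABRIC = 14    # "scales the network fabric to 14 chassis"
SLEDS_PER_FABRIC = 112     # "with 112 PowerEdge MX compute sleds"
CABLES_PER_CHASSIS = 4     # "only 4 x 400 GbE cables" per MX7000
SWITCH_400G_PORTS = 32     # Z9432F-ON: 32 ports x 400 GbE


def fabric_summary(chassis=CHASSIS_PER_FABRIC):
    """Derive sled and cable counts for a given number of chassis."""
    sleds_per_chassis = SLEDS_PER_FABRIC // CHASSIS_PER_FABRIC
    cables = chassis * CABLES_PER_CHASSIS
    return {
        "sleds_per_chassis": sleds_per_chassis,
        "sleds": chassis * sleds_per_chassis,
        "cables_400g": cables,
        # Naive lower bound: chassis-facing ports only, no uplinks or spares.
        "min_switches_by_port_count": math.ceil(cables / SWITCH_400G_PORTS),
    }


if __name__ == "__main__":
    for key, value in fabric_summary().items():
        print(f"{key}: {value}")
```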

Storage

Taking Dell PowerFlex as an example, NVMe/TCP is supported in the following manner: PowerFlex storage nodes are joined in storage pools. Typically, similar disk types are used within a pool (for example, a pool of NVMe drives or a pool of SAS drives). Volumes are then carved out from that pool, meaning the blocks/chunks/pages of that volume are distributed across every disk in the pool. Regardless of the underlying technology, these volumes can be assigned an NVMe/TCP storage protocol interface ready to be accessed across the network from the hosts accordingly.
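The idea that a volume's chunks are spread across every disk in a pool can be pictured with a toy placement function. This is a conceptual sketch only: round-robin placement is a stand-in chosen for simplicity, the disk names are made up, and the actual PowerFlex distribution logic is not described in this tech note.

```python
from collections import Counter


def place_chunks(volume_chunks, disks):
    """Assign chunk indexes to disks round-robin so every disk holds a share."""
    if not disks:
        raise ValueError("a storage pool needs at least one disk")
    return {chunk: disks[chunk % len(disks)] for chunk in range(volume_chunks)}


if __name__ == "__main__":
    # Hypothetical pool: three storage nodes with two NVMe drives each.
    pool = [f"node{node}-nvme{drive}" for node in range(3) for drive in range(2)]
    layout = place_chunks(12, pool)
    for disk, count in sorted(Counter(layout.values()).items()):
        print(f"{disk}: {count} chunks")
```

Because every disk holds part of the volume, reads and writes to that volume can draw on all the devices in the pool rather than a single drive.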

Let’s look at another example—this one for Dell PowerStore, which is an all-NVMe flash storage array. A volume can be created and then presented using NVMe/TCP across the network. This allows the performance of the NVMe devices to be shared across the network, offering a truly end-to-end NVMe experience. 

NVMe/TCP zoning

An advantage and challenge of Ethernet-based NVMe/TCP is that it scales out from tens to hundreds to thousands of fabric endpoints. Configuring each endpoint manually at that scale quickly becomes arduous, error-prone, and costly. FC excels at automatic endpoint discovery and registration. For NVMe/TCP to be a viable alternative to FC in the data center, it must provide users with FC-like endpoint discovery and registration, and FC-like zoning capabilities. Dell SmartFabric Storage Software (SFSS) is designed to help automate the discovery and registration of hosts and storage arrays using NVMe/TCP. 

 

Figure 3.  Dell SmartFabric Storage Software (SFSS)

Dell SFSS is a centralized discovery controller (CDC). It discovers, registers, and zones the devices on the NVMe/TCP IP SAN. Customers can control connectivity from a single, centralized location instead of having to configure each host and storage array manually. 
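The register, zone, and discover roles of a centralized discovery controller can be modeled in a few lines. The class below is a toy model to illustrate the concept. It is not the SFSS API, and the class name, method names, and example NQN strings are all hypothetical.

```python
class DiscoveryController:
    """Toy model of a centralized discovery controller (not the SFSS API)."""

    def __init__(self):
        self.hosts = set()
        self.subsystems = set()
        self.zones = {}  # zone name -> set of member endpoint names

    def register_host(self, nqn):
        self.hosts.add(nqn)

    def register_subsystem(self, nqn):
        self.subsystems.add(nqn)

    def create_zone(self, name, members):
        """Create a zone; every member must already be registered."""
        unknown = set(members) - (self.hosts | self.subsystems)
        if unknown:
            raise ValueError(f"unregistered endpoints: {sorted(unknown)}")
        self.zones[name] = set(members)

    def discover(self, host_nqn):
        """Return the subsystems that share at least one zone with the host."""
        visible = set()
        for members in self.zones.values():
            if host_nqn in members:
                visible |= members & self.subsystems
        return sorted(visible)


if __name__ == "__main__":
    cdc = DiscoveryController()
    cdc.register_host("nqn.example:esx-01")
    cdc.register_host("nqn.example:esx-02")
    cdc.register_subsystem("nqn.example:array-a")
    cdc.create_zone("prod", ["nqn.example:esx-01", "nqn.example:array-a"])
    print(cdc.discover("nqn.example:esx-01"))  # zoned with array-a
    print(cdc.discover("nqn.example:esx-02"))  # in no zone, sees nothing
```

The point of the model is that access decisions live in one place: adding a host to a zone changes what it can discover without touching the host or the array.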

VMware support

In October 2021, VMware announced support of the NVMe/TCP storage protocol with the release of VMware vSphere 7 Update 3. VMware has since included support in vSphere 8. It is a simple task to configure an ESXi host for NVMe/TCP. Just select the adapter from the standard list of storage adapters for each required host. Once the adapter is selected in vSphere, the new volume appears automatically as a namespace, assuming access has been granted through SFSS. Any storage volume accessed through NVMe/TCP can be used to create a standard VMFS datastore.   

Figure 4.  Adding NVMe/TCP adapter in vSphere

Conclusion

NVMe/TCP is now a practical alternative to iSCSI and a replacement for older FC infrastructure. With NVMe/TCP's ability to provide higher IOPS at a lower latency while consuming less CPU than iSCSI, and offering similar performance to FC, NVMe/TCP can provide an immediate benefit. In addition, for customers who have cost constraints or skill shortages, moving from FC to NVMe/TCP is a viable choice. Dell SmartFabric Storage Software is the key component that makes scale-out NVMe/TCP infrastructures manageable. SFSS enables an FC-like user experience for NVMe/TCP. Hosts and storage subsystems can automatically discover and register with SFSS so that a user can create zones and zone groups in a familiar FC-like manner. Using Dell PowerEdge MX as the server compute element dramatically simplifies physical networking so customers can more quickly realize NVMe/TCP storage benefits.


Unveiling the Power of OpenManage Enterprise Backup and Restore

Mark Maclean

Fri, 16 Jun 2023 15:18:13 -0000


What does Roger Federer call his backup racket? The Federer Reserve, but as all server administrators know, having a backup is no joke!

Given that OpenManage Enterprise delivers key deployment, monitoring, updating, and reporting functions, ensuring the availability of this management solution is a key requirement. Earlier this year Dell Technologies released a backup and restore feature for OpenManage Enterprise and plugins. This feature is a more convenient way of backing up OME, because it enables an administrator to do so without the need for hypervisor snapshots. One can now back up the entire appliance configuration and data, including managed device information, custom groups, jobs such as discovery tasks, alert policies, installed plugin data, and logs.

 

A backup task can be scheduled to run daily, weekly, or immediately. Backup administrator rights are required to execute these tasks. When backing up, administrators are required to provide a security passphrase. This is used as a security measure because during a restore, administrators are challenged for the passphrase, and data is restored only when there is a match.

The backup task supports HTTPS, CIFS, or NFS network shares as a target destination and the backup is encrypted to ensure the security of the appliance configuration data. (Note that the appliance is in a maintenance state during the backup, all new task scheduling is suspended, and no operations can be performed on the console during this time.)

In the unfortunate event of a deletion, corruption, or system failure, restoring OpenManage Enterprise is straightforward. If required, data can be restored to the existing instance or to a new instance of OpenManage Enterprise running the same version, using the same or larger sized virtual appliance.
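The restore conditions described in this post (a matching passphrase, the same appliance version, and a same-or-larger appliance) can be sketched as a pre-restore check. This is a conceptual illustration, not how OpenManage Enterprise implements it: the manifest structure, the use of PBKDF2, and the single-number appliance size are all assumptions made for the example.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupManifest:
    """Hypothetical metadata stored alongside a backup."""
    appliance_version: str
    appliance_size: int        # assumed to be one comparable number
    salt: bytes
    passphrase_digest: bytes


def _digest(passphrase, salt):
    # Derive a digest so the passphrase itself is never stored.
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, 100_000)


def create_manifest(version, size, passphrase):
    salt = os.urandom(16)
    return BackupManifest(version, size, salt, _digest(passphrase, salt))


def restore_blockers(manifest, target_version, target_size, passphrase):
    """Return the reasons a restore should be refused (empty list means proceed)."""
    reasons = []
    candidate = _digest(passphrase, manifest.salt)
    if not hmac.compare_digest(manifest.passphrase_digest, candidate):
        reasons.append("passphrase does not match")
    if target_version != manifest.appliance_version:
        reasons.append("target appliance runs a different version")
    if target_size < manifest.appliance_size:
        reasons.append("target appliance is smaller than the source")
    return reasons


if __name__ == "__main__":
    manifest = create_manifest("4.0", 2, "correct horse")
    print(restore_blockers(manifest, "4.0", 2, "correct horse"))
    print(restore_blockers(manifest, "3.10", 1, "wrong"))
```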

This backup & restore capability for appliance data is a major feature to enhance the resilience of the OpenManage Enterprise management solution.

Resources

Learn more at: Support for Dell OpenManage Enterprise 

Author: Mark Maclean, PowerEdge Technical Marketing Engineering

Linkedin

Contributors: Manoj Malhotra, Product Manager; Pushkala Iyer, Product Planner


OpenManage Enterprise Integration for VMware Virtual Center Overview

Mark Maclean, Manoj Malhotra

Thu, 27 Apr 2023 19:52:10 -0000


Summary

OpenManage Enterprise Integration for VMware vCenter (OMEVV) offers extensive functionality to manage Dell PowerEdge server hardware and firmware from within VMware vCenter. Delivered as a simple virtual appliance, OpenManage Enterprise, with its integration for VMware vCenter plugin architecture, has no dependence on local software agent installations on the managed hosts. This tech note highlights the key features of the plugin which provides deep level details for inventory, monitoring, firmware updating, and deployment of Dell servers, all from within the vCenter console GUI.

IT administrators face many challenges managing physical servers in VMware environments. This process can be complex and time-consuming. VMware vCenter provides a scalable platform that forms the foundation for VMware software management of these environments. The addition of OpenManage Enterprise Integration for VMware vCenter allows IT administrators to manage both their virtual and physical infrastructure from within vCenter, thus dramatically simplifying overall management. Additional PowerEdge menu options are added in vCenter, alongside Dell server data, to monitor and manage physical servers. These options also include semi-automated updates of server firmware and bare-metal deployment of ESXi hypervisor on Dell PowerEdge servers, including modular systems. 

OpenManage integration architecture

OpenManage Enterprise Integration for VMware vCenter is a plugin to the OpenManage Enterprise virtual appliance for server management. The OpenManage Enterprise virtual appliance is a virtual machine image that can be deployed easily containing Dell’s server management software. It can be installed on any ESXi, Microsoft Hyper-V, or Red Hat Linux KVM host.

Figure 1.  High level architecture (vRealize Aria, previously known as vROps or vRealize Operations, integration is expected to be released 2nd half 2023)

The OpenManage integration provides native integration into the vCenter Server console interface. It helps make the vCenter console the single pane of glass to manage both the virtual and physical environments. The integration goes beyond a simple “link and launch” to existing Dell system management tools. Instead, it brings server management tasks and server data natively into the vCenter console. An API interface is also supported for customers who want to automate or integrate with additional tools. VMware administrators do not need to learn to use additional tools for many of the PowerEdge management tasks because these are integrated into the menus that they are already familiar with within vCenter.

Managing Dell hosts

OpenManage Integration provides deep level details for inventory, monitoring, and alerting of Dell hosts (that is, physical servers) within vCenter and recommends or performs vCenter actions based on Dell hardware events. From the OpenManage Enterprise Plugin, administrators can view details of managed servers.

The dashboard view provides the health status of the monitored clusters and physical servers alongside host information, including warranty status. It also provides appliance information, such as the number of vCenters monitored, baseline compliance status, and OMEVV job status.

Figure 2.  OMEVV Dashboard

At the Hosts & Chassis level, the view provides the health status of the physical server. It also displays server details including power status, iDRAC IP, model name, service tag, asset tag, warranty data, last inventory scan, ESXi Hypervisor version, and core firmware versions.

Figure 3.  OMEVV list of managed hosts

The vSphere inventory view provides additional details. At the host level, the OMEVV host information view provides deeper server and component details, along with data about local storage. It also includes server information, such as comprehensive firmware version reporting, power usage data, iDRAC IP address, Service Console IP, warranty type with expiration information, and recent system event log entries. The System Event Log (SEL) provides details such as iDRAC login events, firmware update jobs, and server reboots. Host subsystem health is displayed in the host summary area; detailed component health is available in OpenManage Enterprise.

Figure 4.  OMEVV server and component health 

There are a few prerequisites to meet for a Dell server to be managed by OMEVV, such as licensing requirements and minimum firmware versions. The OMEVV management compliance wizard ensures that the hosts have met these requirements. After it is discovered and selected as a managed host, a server will appear in the OpenManage Enterprise plugin group for OMEVV and in the list of managed hosts in the OMEVV plugin (see Figure 5).

For detailed steps about how to use the configuration wizard, see the OpenManage Integration User Guide. Although VxRail monitoring is supported by the core OME console, and the Power Manager plugin will manage VxRail power and thermal data, OMEVV does not support VxRail because VxRail has its own life cycle management solution. For more information about supported server models and iDRAC versions, see the OMEVV support matrix and the OpenManage Enterprise support matrix.

Figure 5.  OMEVV managed server group in OME

Proactive automated actions to hardware alerts

The OpenManage Integration contains a predefined list of Dell hardware events, each with a recommended action within vCenter. Critical hardware alarms, such as loss of redundant power, can be enabled to put the affected host into VMware maintenance mode. If VMware DRS is configured, the VMs are evacuated by vMotion to another VMware host in the cluster. This capability is called VMware Proactive High Availability (PHA) and is a vCenter feature that works with OMEVV. (Note: By default, all Dell alarms are disabled.) Customers can override the default severity assigned by Dell for these events to tailor the response to their environment.
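The interplay of disabled-by-default alarms, default severities, and customer overrides can be sketched as a small lookup. The event names, severities, and action labels below are hypothetical placeholders chosen for illustration. They are not the actual OMEVV event catalog or vCenter API values.

```python
# Hypothetical event names and default severities, for illustration only.
DEFAULT_SEVERITY = {
    "redundant_power_lost": "critical",
    "fan_failure": "warning",
}

# Hypothetical mapping from severity to the recommended vCenter action.
ACTION_BY_SEVERITY = {
    "critical": "enter_maintenance_mode",
    "warning": "notify_only",
}


def planned_action(event, enabled_alarms, overrides=None):
    """Return the action for an event, honoring enablement and overrides."""
    if event not in enabled_alarms:
        return "none"  # all alarms are disabled until an administrator enables them
    severity = (overrides or {}).get(event, DEFAULT_SEVERITY[event])
    return ACTION_BY_SEVERITY.get(severity, "notify_only")


if __name__ == "__main__":
    enabled = {"redundant_power_lost", "fan_failure"}
    print(planned_action("redundant_power_lost", set()))
    print(planned_action("redundant_power_lost", enabled))
    print(planned_action("fan_failure", enabled, {"fan_failure": "critical"}))
```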

Figure 6.  Example server event alarms severity

Updating Dell server BIOS and firmware

Within the vCenter console, users can view BIOS / firmware versions, compare them to desired versions, and perform updates at the host or cluster level. This feature supports Dell 13G, 14G, 15G, 16G, and future generation servers with either iDRAC Express or iDRAC Enterprise. OMEVV offers cluster-aware firmware updates, in which updates run sequentially, one host at a time, across the entire cluster. Each target host is put into maintenance mode, and DRS live-migrates its virtual machines so that workloads are kept running. This firmware update feature can run on up to 15 VMware clusters in parallel. This functionality is also supported by registering OMEVV as a Hardware Support Manager (HSM) for VMware vSphere Lifecycle Manager (vLCM). vLCM is a VMware-supplied tool that coordinates the OMEVV firmware updates in conjunction with ESXi software updates, including drivers and hypervisor patches, offering administrators an easier way to update the entire cluster.
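The scheduling pattern described above (hosts updated one at a time within a cluster, with up to 15 clusters in flight at once) can be sketched with a thread pool. This is a simulation of the ordering only: the step strings are placeholders, no vCenter or OMEVV calls are made, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL_CLUSTERS = 15  # limit quoted in this tech note


def update_cluster(hosts, log):
    """Update the hosts of one cluster strictly one at a time."""
    for host in hosts:
        log.append((host, "enter maintenance mode, DRS migrates VMs"))
        log.append((host, "apply firmware"))
        log.append((host, "exit maintenance mode"))


def update_clusters(clusters):
    """Run cluster updates in parallel, capped at MAX_PARALLEL_CLUSTERS."""
    logs = {name: [] for name in clusters}  # one private log per cluster
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL_CLUSTERS) as pool:
        futures = [
            pool.submit(update_cluster, hosts, logs[name])
            for name, hosts in clusters.items()
        ]
        for future in futures:
            future.result()  # re-raise any failure from a worker
    return logs


if __name__ == "__main__":
    result = update_clusters({"cluster-a": ["esx-01", "esx-02"], "cluster-b": ["esx-11"]})
    for cluster, steps in result.items():
        for host, step in steps:
            print(cluster, host, step)
```

Keeping the per-cluster loop sequential is what preserves cluster capacity: only one host per cluster is ever out of service at a time.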

The integrated firmware update process is wizard-based, allowing the selection of the new firmware level(s), targeting all or selected component(s), and scheduling the update. A baseline profile contains the location of the catalog/repository detailing required firmware versions and the target host(s) to be associated with the profile. If the host does not have internet access to the Dell support site, you can use Dell Repository Manager to create a local repository for use with OMEVV within the firewall or in air gapped environments. 

Figure 7.  Firmware compliance / available upgrades

Dell publishes:

  • Default firmware catalogs containing the latest released firmware. When using this, customers should check compatibility with the installed version of ESXi.
  • Firmware catalogs for the Dell customized ESXi image non-vSAN (ISO file) to streamline deployments. 
  • Firmware catalogs specific for vSAN that support the VMware compatibility matrix. The vSAN firmware catalog has the specific firmware versions for supported vSAN components, such as HBAs when used with the corresponding Dell customized ESXi image. When OMEVV discovers a host running vSAN, OMEVV prevents the use of the default Dell firmware catalog for updates.

Together these three elements provide an easy path to the desired cluster state.

Figure 8.  vLCM using OMEVV integration to patch Dell firmware as part of a VMware host update

Deploying the ESXi Hypervisor on new bare metal servers

Another key feature of the OpenManage Integration is the deployment of ESXi on Dell servers without using PXE. It includes the initial discovery, the optional deployment of the ESXi hypervisor with an optional vSphere Host Profile, and registration of the host with a selected vCenter. It leverages the iDRAC9 Enterprise hardware supported by 14G, 15G, and 16G Dell servers. 

The deployment feature separates the deployment preparation steps from the actual hypervisor deployment. After a bare metal server has been discovered and appears in the list as compliant, it is ready for the hypervisor deployment. The deployment wizard collects details of the target servers, the ISO OS image file, the vCenter destination container, and the optional VMware ESXi host profile, which encapsulates a deeper configuration template for the ESXi install. The wizard also collects deployment information such as the vCenter instance settings, host name, host IP address, new password, and the NIC for management tasks, with common data applied across all target hosts. A deployment job can be run immediately or scheduled. 

Figure 9.  Bare metal server deployment wizard

Dell chassis discovery and monitoring

OMEVV allows administrators to discover and monitor chassis details including hyperlinks to OME-M, related hosts, inventory, firmware, and warranty.

Figure 10.  MX chassis management information

Conclusion

The integration of OpenManage Enterprise with VMware vCenter provides a comprehensive, highly automated, end-to-end combined physical and virtual system management platform. OMEVV replaces the legacy standalone OMIVV, with only the new OMEVV supporting vSphere 8 and the latest server hardware. It enables host health monitoring, firmware updates, and bare metal deployment from within vCenter. It removes the complexities associated with manual processes and helps to avoid shuffling between multiple tools. This integration helps customers reduce cost through a centralized, scalable, and customizable approach that is designed to significantly simplify the management of Dell PowerEdge servers and modular chassis in a VMware environment.


PowerEdge MX7000 data center cooling

Dell PowerEdge MX7000 and MX760c Liquid Cooling for Maximum Efficiency

Mark Maclean, David Hardy

Tue, 18 Apr 2023 15:21:16 -0000


Introduction

The market trend for high-performance servers to support the most demanding workloads has resulted in newer components, especially CPUs, putting more thermal demands on server design than ever before. Dell’s product engineers have brought new thermal innovation and added the choice of direct liquid cooling (DLC) to the PowerEdge MX7000 modular solution.

To maximize performance and cooling efficiency, customers now have the choice of liquid cooling or air cooling to support low-level to mid-level thermal design power (TDP) CPUs when selecting the MX760c with the latest 4th Gen Intel® Xeon® Scalable processors. Implementing direct liquid cooling (DLC) brings numerous benefits, including dramatically reducing the demand for cold air, which saves chilling costs and reduces the power used to distribute cold air in the data center. 

Improved efficiency

Thermal conductivity is the ability to move heat, and air’s thermal conductivity is much lower than that of liquid. (The thermal conductivity of air is 0.031; for water, it is a much higher 0.66. These are average values measured in SI units of watts per meter-kelvin [W·m−1·K−1].) This means that DLC-cooled servers can run top-bin, high-TDP CPUs that could not operate without throttling with air cooling alone. Also, it takes much less energy to pump liquid coolant through a DLC cold-plate loop than to move a high volume of air that might be cooled through a mechanical chiller. That provides overall energy savings at the rack and data center level that translate to lower operating costs.
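To put the two conductivity figures side by side, the short calculation below takes the values quoted in the text and computes their ratio. The variable names are illustrative, and the ratio compares the material property only; it is not a prediction of cooling performance for any particular server.

```python
# Average thermal conductivity values quoted in the text, in W/(m*K).
AIR_W_PER_M_K = 0.031
WATER_W_PER_M_K = 0.66

# How many times more readily water conducts heat than air.
ratio = WATER_W_PER_M_K / AIR_W_PER_M_K
print(f"Water conducts heat roughly {ratio:.0f} times better than air")
```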

While Dell has offered DLC-cooled servers in previous generations, the MX DLC solution is completely new. It uses the latest cold-plate loop design with Leak Sense, a proprietary method of detecting and reporting any coolant leaks in the server node through an iDRAC alert.

Figure 1.  Liquid-cooled PowerEdge MX760c with DLC heat sinks and pipework

The first liquid-cooled Dell server was completed more than ten years ago for a large-scale web company running a dense computer farm. Since then, we have made DLC available on a broad range of PowerEdge platforms, available globally. DLC solutions consist of the server, rack, and rack manifolds to direct coolant to each of the units in a rack, and a Coolant Distribution Unit (CDU). The DLC CDU is connected to the data center water loop and exchanges heat from the rack to the facility water supply. With customers demanding higher levels of performance while also aiming to reduce carbon emissions and energy costs, liquid cooling adoption continues to accelerate. Liquid cooling’s lower energy usage with lower OPEX cost decreases TCO and could produce an ROI within 12 to 24 months depending on the environment. 

Table 1.  Sample configurations highlighting low fan requirement and power saved by DLC configurations

| Cooling method | CPU SKU | Rear fan power, idle CPU load | Rear fan power, 50% CPU load | Rear fan power, 100% load (CPU/MEM/drive) |
|---|---|---|---|---|
| Air cooling | 205 W | 82 W (33% duty) | 185.7 W (50% duty) | 1076.8 W (100% duty) |
| Air cooling | 225 W | 82 W (33% duty) | 185.7 W (50% duty) | 1076.8 W (100% duty) |
| Air cooling | 270 W | 82 W (33% duty) | 485.3 W (50% duty) | 1076.8 W (100% duty) |
| Liquid cooling with DLC module | 270 W | 82 W (33% duty) | 82 W (50% duty) | 111.7 W (39% duty) |
| Liquid cooling with DLC module | 300 W | 82 W (33% duty) | 82 W (33% duty) | 111.7 W (39% duty) |
| Liquid cooling with DLC module | 350 W | 82 W (33% duty) | 82 W (33% duty) | 111.7 W (39% duty) |

Results are based on a four-drive backplane configuration: 4 x 1.92 TB NVMe drives + 24 x 64 GB DDR5 + 2 x 25 Gb mezzanine cards. 
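As a worked example of reading Table 1, the sketch below compares rear fan power at 100% load for the 270 W CPU SKU, which appears in both the air-cooled and DLC columns. The function name is hypothetical, and the numbers are taken directly from the table.

```python
# Rear fan power at 100% load for the 270 W CPU SKU, from Table 1.
AIR_FULL_LOAD_W = 1076.8
DLC_FULL_LOAD_W = 111.7


def fan_power_saving(air_w, dlc_w):
    """Return (watts saved, percentage saved) when moving from air to DLC."""
    saved_w = air_w - dlc_w
    return saved_w, saved_w / air_w * 100


saved_w, saved_pct = fan_power_saving(AIR_FULL_LOAD_W, DLC_FULL_LOAD_W)
print(f"Fan power saved at full load: {saved_w:.1f} W ({saved_pct:.1f}%)")
```

This covers rear fan power only. It does not account for the power drawn by the CDU or other parts of the liquid cooling infrastructure.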

Table 2.  PowerEdge MX CPU details (offered liquid cooled only)

| CPU | TDP | Base frequency | Max Turbo | Cache | Cores |
|---|---|---|---|---|---|
| 6458Q | 350 W | 3.10 GHz | 4.00 GHz | 60 MB | 32 |
| 8458P | 350 W | 2.70 GHz | 3.80 GHz | 82.5 MB | 44 |
| 8468 | 350 W | 2.10 GHz | 3.80 GHz | 105 MB | 48 |
| 8468V | 330 W | 2.40 GHz | 3.80 GHz | 97.5 MB | 48 |
| 8470 | 350 W | 2.00 GHz | 3.80 GHz | 105 MB | 52 |
| 8470Q | 350 W | 2.10 GHz | 3.80 GHz | 105 MB | 52 |
| 8480+ | 350 W | 2.00 GHz | 3.80 GHz | 105 MB | 56 |

A liquid cooling solution is limited to a four-drive backplane, E3.S backplane, or diskless configuration. A liquid cooling solution can be provided for all CPU SKUs to support various performance requirements. 

Customers can monitor and manage server and chassis power plus thermal data. This information, supplied by the MX chassis and iDRACs, is collected by OpenManage Power Manager and can be reported per individual server, rack, row, and data center. This data can be used to review server power efficiency and locate thermal anomalies such as hotspots. Power Manager also offers additional features, including power capping, carbon emission calculation, and leak detection alert with action automation.

Total solution with direct liquid cooling

The MX760c uses a passive cold-plate loop with supporting liquid cooling infrastructure to capture and remove the heat from the two CPUs. The following image highlights the elements in a complete DLC solution. While customers must provide a facility water connection, a service partner or infrastructure specialist typically provides the remaining solution pieces.

Figure 2.  DLC solution elements

Dell customers can now benefit from a pre-integrated DLC rack solution for MX that eliminates the complexity and risk associated with correctly selecting and installing these pieces. The DLC3000 rack solution for MX includes a rack, a custom MX rack manifold, an in-rack CDU, and each MX chassis and DLC-enabled compute node pre-installed and tested. The rack solution is then delivered to the customer’s data center floor, where the Dell services team connects the rack to facility water and ensures full operation. Finally, Dell ProSupport maintenance and warranty coverage backs everything in the rack to make the whole experience as simple as possible.

   

Figure 3.  DLC3000 MX rack solution (front and rear views)

Moreover, with the DLC solution, the pre-integrated rack can support up to four MX chassis and 32 compute sleds. With top-bin 350 W 4th Gen Xeon CPUs, that translates to over 22 kW of CPU power captured by the DLC cooling solution. It is a major leap in capability and performance, now available for Dell customers.
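The 22 kW figure follows from the rack numbers given in this note: four chassis of eight sleds each, two CPUs per MX760c sled, and 350 W per CPU. A minimal check, with illustrative constant names:

```python
CHASSIS_PER_RACK = 4
SLEDS_PER_CHASSIS = 8  # 32 sleds across four chassis
CPUS_PER_SLED = 2      # the MX760c cold-plate loop covers two CPUs
CPU_TDP_W = 350        # top-bin CPU TDP

total_cpu_w = CHASSIS_PER_RACK * SLEDS_PER_CHASSIS * CPUS_PER_SLED * CPU_TDP_W
print(f"{total_cpu_w / 1000:.1f} kW of CPU power per rack")
```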

Conclusion

As Dell offers the 4th generation Intel CPU in air-cooled and liquid-cooled configurations for use with the PowerEdge MX, customers need to review the choice between traditional air cooling and DLC, and understand the benefits of both to make an informed decision. Organizations need to consider server workload demands, capital expenditures (CAPEX) plus operating expense (OPEX), cost of power, and cost of cooling to understand the full life-cycle costs and determine whether air cooling or DLC provides a better TCO.


PowerEdge OpenManage Power Manager

Non-Dell Server Support in OpenManage Enterprise Power Manager

Mark Maclean

Wed, 12 Apr 2023 14:19:03 -0000


Summary

The monitoring and management of power consumed by servers has become a priority for many organizations, whether due to cost of energy, carbon emission reduction commitments, or facility limitations. In January 2023, Dell released OpenManage Enterprise Power Manager version 3.1. One major feature of this release was the addition of support for a limited number of non-Dell servers. This Direct from Development tech note describes the new capabilities that customers can access to support HPE iLO5 and Lenovo XCC enabled servers. 

Market positioning

OpenManage Enterprise is Dell’s server lifecycle management console, with the ability to discover, deploy, monitor, update, manage, and report. Power Manager is a plug-in that adds additional power and thermal capabilities to the core management console. The Dell Product Group recognizes that not all customers have a 100 percent PowerEdge fleet and so need to monitor more than just Dell servers.

Discovery and reporting   

Non-Dell servers are discovered through their baseboard management controllers—iLO5 or XCC. An IP address and login credentials are all that is required. All non-Dell servers discovered in OpenManage Enterprise are automatically listed under the Non-Dell Servers group, as shown here: 

 

 Figure 1.    Example of the Non-Dell Servers group showing HPE ProLiant DL and Lenovo ThinkSystem servers 

Once the servers are discovered, Power Manager can monitor the power and thermal telemetry. This data is processed and then displayed through “applets,” as shown in Figure 2 and Figure 3. A RESTful API enables customers to build additional automation and report tools if required. Sample code is posted by Dell on GitHub.  
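As a rough idea of what automation against the RESTful API might look like, the sketch below only builds the HTTP requests for opening a session and listing devices; it does not send them. The endpoint paths (`/api/SessionService/Sessions`, `/api/DeviceService/Devices`), the `X-Auth-Token` header, and the host name `ome.example.com` are assumptions that should be checked against the OpenManage Enterprise API guide for your version. Power Manager metric endpoints are deliberately not shown.

```python
import json
import urllib.request


def build_session_request(ome_host, username, password):
    """Build (but do not send) a request that opens an API session."""
    url = f"https://{ome_host}/api/SessionService/Sessions"  # assumed path
    body = json.dumps(
        {"UserName": username, "Password": password, "SessionType": "API"}
    ).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def build_devices_request(ome_host, auth_token):
    """Build (but do not send) a request that lists discovered devices."""
    url = f"https://{ome_host}/api/DeviceService/Devices"  # assumed path
    return urllib.request.Request(
        url, headers={"X-Auth-Token": auth_token}, method="GET"
    )


session_req = build_session_request("ome.example.com", "admin", "secret")
devices_req = build_devices_request("ome.example.com", "token123")
print(session_req.get_method(), session_req.full_url)
print(devices_req.get_method(), devices_req.full_url)
```

Each request could then be passed to `urllib.request.urlopen` with an SSL context appropriate for the appliance certificate.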


Figure 2.    Example of power and thermal metrics for a non-Dell server

There are numerous prebuilt reports that now include this non-Dell server data, such as maximum power (watts). This data is also available in the custom report builder. These reports can be run ad hoc or scheduled to be emailed on a regular basis. These reports support export in HTML, CSV, PDF, and XLS formats.


Figure 3.    Example of alert thresholds for non-Dell servers


PowerEdge VMware Ansible OpenManage System Center Operations Manager CloudIQ Power Manager cybersecurity ServiceNow iDRAC

Sweet 16 ways OpenManage helps customers to maximize their investment in PowerEdge

Mark Maclean

Wed, 12 Apr 2023 01:27:49 -0000


As we at Dell announce details of the new wave of PowerEdge servers, we want to highlight 16 examples of how the OpenManage portfolio of systems management software enhances our server range. Like I always say, where there are servers, there are server management requirements.

The OpenManage portfolio exists to save customers of any size time and money, eliminating the necessity of high-touch, manual steps to deliver efficiency. Designed to scale, with integrated security, Dell’s OpenManage strategy is to give customers a choice by using orchestration, automation, and integration, leveraging APIs with open standards.

#1 – Server health monitoring—This is server management 101. However, given the fact that PowerEdge servers are the foundation of the modern data center, this basic element is critical to application and services uptime. OpenManage solutions have many ways to get this information from the agent-free iDRAC directly (GUI/SNMP/SMTP/syslog/API and more) or through the Dell OpenManage Enterprise console, OpenManage mobile, Dell CloudIQ, VMware vCenter integration, Microsoft System Center, and leading third-party management software such as Nagios.

#2 – Remote access to servers—If deep one-to-one control for troubleshooting, deployment, configuration, console access, and so on is needed, then iDRAC is the answer. Dell's unique iDRAC9 offers out-of-band remote server connection, including firmware configuration, full server console remote control through the eHTML5 (sometimes called vKVM) GUI, virtual media, and server telemetry. The iDRAC agentless architecture offers server monitoring and control from anywhere without the need to install any software. There are many additional features, from basic power on/off control offered through the GUI, CLI, or API to advanced server profile configuration to ensure that servers have the correct firmware configuration settings. 

#3 – Server deployment—The time between when a server is racked and powered until it is live (time to value) can be greatly reduced by leveraging the automation integrated into OpenManage. Starting with streamlining one-to-one deployments, the iDRAC features a lifecycle controller that rapidly configures elements such as RAID storage configurations and populates deployments with up-to-date operating system drivers. In addition, iDRAC features zero-touch deployment to automatically download a server configuration profile (SCP) and even complete an unattended operating system installation the first time the server powers up on a customer’s network. Beyond one-to-one solutions, OpenManage offers a broad number of deployment solutions, including: OpenManage Enterprise, offering firmware setting configuration and supporting agnostic operating system installation through ISO images; Microsoft System Center integration; and deeper customizable VMware installations through OpenManage Enterprise for VMware vCenter. Finally, for customers using tools such as Ansible, Terraform, or Prometheus, OpenManage supplies integration packs and sample code leveraging Dell's APIs.

#4 – Manage and update firmware—There are multiple methods to update PowerEdge server firmware, depending on needs. Methods range from one-to-one, using iDRAC/Lifecycle Controller, to console-based methods for updating multiple servers. Leveraging large-scale automation, these tools can audit existing servers, compare online catalogs, then download and apply the correct updates quickly and consistently with massive time savings compared to manual methods. One example is the integration into VMware using OpenManage Enterprise for VMware vCenter, which offers cluster-aware updates, updating one cluster node at a time using DRS to keep workloads up and running. Dell supplies Repository Manager to build custom firmware catalogs like the packaged interpretable ISOs that are used by other Dell updating tools where servers are isolated or air gapped. And, of course, Dell supplies an Ansible module offering firmware updates to the DevOps user base.

#5 – Configuration drift detection—OpenManage Enterprise provides compliance features that detect, highlight, and remediate configuration drift issues, with simple processes for both firmware versions and firmware configuration settings.

#6 – Secure supply chain assurance—Using Dell’s Secure Component Verification (SCV) allows organizations to ensure that their new servers are delivered with the same components installed at Dell Technologies’ manufacturing facility, using a digital, cryptographically secured signed inventory certificate.

#7 – Power usage reporting (and carbon emissions calculations)—There are multiple ways to view server power consumption data, depending on needs and preferences. One way is to open the iDRAC web GUI, while another way is to use scripts, either RACADM or Redfish, to retrieve the data. iDRAC can also send data to the OpenManage Enterprise Power Manager plug-in, where power data, including carbon emissions, is processed and grouped, and can be displayed, reported, and actioned. OpenManage Enterprise can also forward this information to CloudIQ for PowerEdge for additional analysis and visualization. For those customers looking for maximum data, iDRAC9 can stream these power statistics as telemetry data to analytics solutions such as Splunk or ELK Stack for real-time in-depth analysis.

#8 – Power usage control—Power consumption capping ability is integrated into iDRAC. OpenManage Enterprise Power Manager adds the capability to apply power caps to individual servers or groups of servers. This power capping can be permanent, scheduled at particular times for specific weekends, or ad hoc in response to an incident when reduction in power consumption is required, such as when running on UPS or on-premises generators.

#9 – Thermal event management—While thermal monitoring alerting and even shutdown is integrated into PowerEdge servers through the iDRAC, OpenManage Enterprise Power Manager augments this through powerful Emergency Power Reduction (EPR) policies. This feature reduces the power consumption of servers through a power cap policy to throttle a group of servers. EPR policies can be used as a permanent or scheduled method to limit server power consumption or as an immediate temporary measure during a thermal emergency, for example, CRAC unit failure. 

#10 – Performance monitoring—From the iDRAC GUI, CLI, and API, server performance telemetry data can be obtained. OpenManage Enterprise Power Manager can consume and report this data, automatically highlighting idle servers. Telemetry information can be passed to third-party solutions such as Splunk. Finally, CloudIQ can analyze information and present the information in a dashboard format with graphical visualization, and, for key metrics, highlight anomalies based on historic seasonality data.

#11 – Enterprise secure key management—iDRAC provides a standards-based Key Management Interoperability Protocol (KMIP) to encrypt data at rest on self-encrypting SSDs or self-encrypting hard drives and pass the key to a key management system. Solutions such as Thales CipherTrust Manager offer centralized key management for multiple PowerEdge servers and many other products.

#12 – Detailed server telemetry—iDRAC9 provides more than 180 data metrics that can integrate advanced server hardware operation telemetry. Many of these can be reported and visualized in CloudIQ or streamed to analytics solutions such as Splunk. This server telemetry data allows customers to access detailed information to avoid failure events, optimize server operation, and enhance cyber resiliency.

#13 – Automatic call and ticket creation—This ranges from the Dell services plug-in for OpenManage Enterprise, which creates a support case directly with Dell without any human intervention, to integration with ServiceNow through Dell’s integration pack. Alternatively, OpenManage Enterprise offers a flexible set of actions based on the monitoring of SNMP events, including running scripts, forwarding SNMP traps, forwarding events to syslog servers, and sending email. This automation can be used to pass information to a third-party solution for incident management.

#14 – Capacity planning—The iDRAC provides a large amount of performance statistics. This data can be collected and analyzed by the Dell CloudIQ AIOps solution to produce a forward-looking capacity analysis on items such as CPU usage based on real historical data values for a given server and workload.

#15 – Cloud-based infrastructure management—Dell's AIOps solution, CloudIQ, can not only consolidate multiple instances of OpenManage Enterprise, but it can also integrate Dell storage, server, data protection, networking, HCI, and CI products. Hosted in Dell’s secure data center, CloudIQ combines proactive monitoring, machine learning, and predictive analytics to reduce risk, plan ahead, and improve productivity from core to edge.

#16 – Cybersecurity from concept to retirement—Dell Cyber Resilient Architecture 2.0 includes features such as iDRAC silicon-based root of trust, dynamic USB port management, UEFI Secure Boot, and signed firmware updates. All these features are controlled by OpenManage tools that let customers protect, detect, and recover in response to security threats.
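Item #7 mentions retrieving power data with Redfish scripts. The sketch below shows one way such a script might summarize a Redfish Power resource once it has been fetched. The resource path in the comment is an assumption about a typical iDRAC layout, the sample payload values are made up for illustration, and `summarize_power` is a hypothetical helper; the `PowerControl`, `PowerConsumedWatts`, and `PowerMetrics` property names come from the DMTF Redfish Power schema.

```python
import json

# Assumed resource path on an iDRAC (verify for your system):
#   /redfish/v1/Chassis/System.Embedded.1/Power
# Illustrative payload only; the numbers are not real measurements.
SAMPLE_PAYLOAD = """
{
  "PowerControl": [
    {
      "PowerConsumedWatts": 212,
      "PowerMetrics": {"AverageConsumedWatts": 198, "MaxConsumedWatts": 305}
    }
  ]
}
"""


def summarize_power(payload_text):
    """Extract current, average, and peak power from a Redfish Power payload."""
    control = json.loads(payload_text)["PowerControl"][0]
    metrics = control.get("PowerMetrics", {})
    return {
        "current_w": control.get("PowerConsumedWatts"),
        "average_w": metrics.get("AverageConsumedWatts"),
        "peak_w": metrics.get("MaxConsumedWatts"),
    }


summary = summarize_power(SAMPLE_PAYLOAD)
print(summary)
```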


We hope that this list has given you a few suggestions on how the OpenManage portfolio can help your organization. Servers are a vital element of organizations’ infrastructure and the foundation of modern business, and it’s critical to manage and monitor them to deliver visibility, productivity, and control. Server management tools not only make tasks easier, faster, and more consistent but also decrease failures and increase efficiency. Remember, don't just manage, automate.

Is your organization using all the features that Dell OpenManage offers and getting the maximum benefits from investing in PowerEdge servers? Ask your account manager for more details.

References

#1 Monitoring Dell Integrated System for Microsoft Azure Stack Hub with System Center Operations Manager

#2 Support for Integrated Dell Remote Access Controller 9 (iDRAC9)

#3 How to create and deploy a Server Template in OpenManage Enterprise (video)

#4 Updating Firmware and Drivers on Dell PowerEdge Servers

#5 Improve Operational Efficiency Through OME Server Drift Management

#6 Dell Technologies Secured Component Verification for PowerEdge

#7, #8, #9 Server Power Consumption Reporting and Management

#10 CloudIQ Provides Data Driven Server Management Decisions

#11 OpenManage Secure Enterprise Key Manager Solutions Brief

#12 Transform Datacenter Analytics with iDRAC9 Telemetry Streaming

#13 Support for OpenManage Integration with ServiceNow

#14 Talking CloudIQ: Capacity Monitoring and Planning

#15 CloudIQ: AIOps for Intelligent IT Infrastructure Insights

#16 Cyber Resilient Security in Dell PowerEdge Servers


Intel PowerEdge MX Intel 4th Gen Xeon

Unlock New MX CPU and Storage Configurations with a Thermally Optimized Air-Cooled Chassis

Doug Messick, Mark Maclean

Fri, 03 Mar 2023 20:08:02 -0000


As the server industry trend of increasing CPU power goes on, Dell Technologies continues to offer customers feature-rich air-cooled configurations. Dell Engineering has applied thermal innovation and machine learning to the Dell PowerEdge MX chassis to support the MX760c server sled with a broad range of 4th Gen Intel® Xeon® Scalable processors and local storage configurations.

This Direct from Development tech note describes the new capabilities using air cooling that Dell has added to the PowerEdge MX configurations. 

Introduction

The PowerEdge MX7000 is a modular chassis that allows customers to build a set of compute, storage, networking, and management to meet their specific workload needs. Industry trends of new technologies, including CPUs increasing power per server sled, continually push the capability to air-cool feature-rich configurations. Dell Engineering used machine learning combined with next-generation fans to offer high-performance 4th Gen Intel® Xeon® Scalable processors in an air-cooled chassis with more local storage configurations than previously available.

Dell Engineering expertise 

There are 8! = 40,320 modular sled permutations in the 8-slot MX chassis. Dell Engineering conducted a Design of Experiments (DOE) to train a machine learning model that dynamically calculates the airflow cooling capacity for each of the eight slots. This technology enables Dell to maximize the shared cooling infrastructure of the MX7000, unlocking configurations that were previously not possible, and provide clear guidance to customers about how to thermally optimize their chassis. When a chassis configuration is optimized for cooling, the fans run more efficiently at lower speeds across the server workload, which lowers fan power, reduces cooling costs, and decreases acoustics of the chassis.
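The permutation count quoted above can be confirmed directly. The snippet computes 8! and cross-checks it by enumerating every ordering of eight distinct sleds across eight slots; treating all eight sleds as distinct is an assumption made to match the 8! figure.

```python
import math
from itertools import permutations

SLOTS = 8

# Closed-form count of ways to arrange eight distinct sleds in eight slots.
orderings = math.factorial(SLOTS)

# Cross-check by brute-force enumeration of every arrangement.
enumerated = sum(1 for _ in permutations(range(SLOTS)))

print(orderings, enumerated)
```

A search space of this size helps explain why a trained model, rather than exhaustive thermal testing of every arrangement, is used to estimate per-slot cooling capacity.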

Thermally optimized chassis

The ability of the MX7000 chassis to air-cool the eight slots is directly affected by the storage configuration of each sled as well as the placement of sleds in the chassis. For example, pulling air through a sled that has six hard drives is harder than pulling it through a sled that has four hard drives. Machine learning is built into the sled and chassis firmware to dynamically analyze the ability of the chassis to deliver air cooling to each sled.


 
A consistent storage configuration maximizes cooling across all sleds and enables the MX760c to support up to six 2.5-in. storage devices with the latest 4th Gen Intel® Xeon® CPUs.




 


A varied storage configuration with MX760c sleds enables support for up to four 2.5-in. storage devices to maximize cooling through each sled.


 

 

 

MX7000 air-cooling enhancements

The MX7000 chassis and MX sleds introduced the capability to dynamically calculate the cooling based on the chassis configuration. This capability enabled Dell to offer a thermally optimized chassis with a consistent storage configuration that increases cooling for sleds by 20 percent. Dell used this additional cooling capability to offer high-power CPUs with storage configurations that were not supported by previous generations.


The industry trend of increasing power per node every generation has significantly challenged the ability to deliver air-cooled solutions. The MX7000 chassis introduced the next-generation Gold Grade chassis fans with the MX760c sleds to provide an air-cooled solution with the latest high-powered CPUs. Gold Grade fans deliver 25 percent more cooling per sled than the previous-generation Silver Grade fans.

 

Enterprise Infrastructure Planning Tool

The Dell Enterprise Infrastructure Planning Tool (EIPT) helps IT professionals plan and tune their systems and infrastructure for maximum efficiency. Customers can model their customized MX7000 chassis and sled configurations in EIPT. The trained machine learning model enables the tool to identify the maximum data center ambient temperature supported by the sleds. It also identifies the most thermally optimized configuration when sleds have a varied storage configuration. This means that new and existing customers can identify the most efficient sled-to-slot configuration to optimize their chassis for maximum cooling capability while lowering power, costs, and fan noise.

Conclusion

Dell continues to deliver innovative solutions that expand the air-cooled feature-rich configuration choices for the PowerEdge MX7000 chassis and server sleds. Dell Engineering combined machine learning technology with next-generation fans to provide customers the latest high-performance CPUs with more local storage configurations than previous generations in an air-cooled chassis. In addition to the expanded air-cooling configurations, Dell also offers Direct Liquid Cooling (DLC) for the PowerEdge MX7000 chassis and server sleds. The features and potential benefits of DLC are discussed in a separate Direct from Development tech note.


PowerEdge OpenManage

PowerEdge MX7000 and OpenManage Enterprise–Modular Edition Advanced License

Mark Maclean, Roger Foreman

Wed, 22 Feb 2023 18:09:49 -0000


Summary

Dell OpenManage Enterprise–Modular runs on the Dell PowerEdge MX7000 management module. OpenManage Enterprise–Modular facilitates configuration and management of PowerEdge MX chassis using a single web-based GUI or CLI and API. Script examples and code in Python or PowerShell using RACADM CLI and REST API are at Dell Technologies GitHub. OpenManage Enterprise–Modular is used to monitor and manage the chassis and chassis components such as compute sleds, network devices, I/O modules, and local storage devices. 

The following additional OpenManage Enterprise–Modular features are available when the Advanced license is installed:

  • OpenID Connect
  • Telemetry 
  • RSA multifactor authentication
  • Automatic certificate renewal

OpenID Connect

OpenID Connect (OIDC) is a solution that supports single sign-on (SSO). The following table lists the predefined roles that must be configured in the OIDC provider for OIDC users to log in to OpenManage Enterprise–Modular:

Table 1. OIDC predefined roles

| Role in OpenManage Enterprise–Modular | Role in OIDC provider | Description |
|---|---|---|
| CHASSIS_ADMINISTRATOR | CA | Can perform all tasks on the chassis |
| COMPUTE_MANAGER | CM | Can deploy services from a template for compute sleds and perform tasks on the service |
| STORAGE_MANAGER | SM | Can perform tasks on storage sleds in the chassis |
| FABRIC_MANAGER | FM | Can perform tasks that are related to fabrics |
| VIEWER | VE | Has read-only access |
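The role mapping in Table 1 can be expressed as a simple lookup, which may be handy when validating an OIDC provider configuration in a script. This is a sketch only: the dictionary and the `resolve_role` helper are hypothetical, and how the provider actually delivers the role value to OpenManage Enterprise–Modular is not covered here.

```python
# Role codes configured in the OIDC provider, mapped to the
# OpenManage Enterprise-Modular roles listed in Table 1.
OIDC_TO_OME_M_ROLE = {
    "CA": "CHASSIS_ADMINISTRATOR",
    "CM": "COMPUTE_MANAGER",
    "SM": "STORAGE_MANAGER",
    "FM": "FABRIC_MANAGER",
    "VE": "VIEWER",
}


def resolve_role(oidc_role):
    """Return the OME-Modular role for an OIDC role code, or raise ValueError."""
    key = oidc_role.strip().upper()
    if key not in OIDC_TO_OME_M_ROLE:
        raise ValueError(f"unknown OIDC role code: {oidc_role!r}")
    return OIDC_TO_OME_M_ROLE[key]


print(resolve_role("FM"))
```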

Telemetry

Chassis telemetry, including power consumption, thermal data, and fan speeds, can be passed to third-party analytics solutions such as Splunk. This telemetry data of more than 30 metrics is provided as granular, time-series data that can be streamed rather than polled. 

RSA multifactor authentication

RSA SecurID can be used as another means of authenticating a user on a system. The OpenManage Enterprise–Modular Edition with the Advanced license supports RSA SecurID as a two-factor authentication (2FA) method.

 

Figure 1. OpenManage Enterprise–Modular RSA configuration

Figure 2. OpenManage Enterprise–Modular login using RSA two-factor authentication

Automatic certificate renewal

The automatic certificate enrollment feature of OpenManage Enterprise–Modular enables fully automated installation and renewal of certificates used by the web server. When this feature is enabled, the existing web server certificate is replaced by a new certificate through a client for Simple Certificate Enrollment Protocol (SCEP) support. SCEP is a protocol standard used for managing certificates for large numbers of network devices using an automatic enrollment process. OpenManage Enterprise–Modular interacts with SCEP-compatible servers such as the Microsoft Network Device Enrollment Service (NDES) to automatically maintain SSL and TLS certificates. 

Figure 3. Automatic certificate configuration 

Reviewing entitlement

The license is stored on the management controller and can be imported or viewed as shown in the following figure. This license can be factory-installed or supplied by Dell's digital locker offering. 

Figure 4.  Advanced license installed


PowerEdge MX modular servers

PowerEdge MX Validate Baseline to Improve Operational Efficiency

Seamus Jones, Mark Maclean, Terri Brewer

Mon, 16 Jan 2023 21:29:16 -0000


Summary

Modern compute platforms consist of many components requiring multiple firmware elements. This can lead to complexity and risk when updating these components. To eliminate this problem for MX customers, Dell produces a biennial firmware baseline and validates the complete end-to-end stack with testing built on real customer use cases. Dell OpenManage system management orchestration then offers a simple route to update, at scale, live environments to this desired state.

This Direct from Development (DfD) tech note describes at a high level the Dell methodology for applying updates with no disruption in service. This approach lowers risk, streamlines the update process, and saves time for organizations.

Market positioning

The PowerEdge MX is a scalable modular platform comprising compute, networking, and storage elements, designed for data center consolidation with easy deployment and rich integrated management. PowerEdge MX features an industry-leading no-midplane design and a scalable network fabric, within a chassis architecture designed to support today’s emerging processor technologies, new storage technologies, and new connectivity innovations well into the future.

PowerEdge MX firmware baseline

Reduce complexity and simplify operations by leveraging Dell’s MX validated solution infrastructure firmware baseline. This is a set of system and component firmware for the MX platform that is rigorously tested as “one release” in a number of configurations, using the most popular operating system environments based on real-world customer use cases. When the updates have passed this testing as a group, a validated solution stack firmware catalog that details the release versions is published. Several solutions in the OpenManage portfolio can then consume the catalog as an update blueprint.

 Figure 1. MX Baseline Components

Dell MX firmware baselines offer customers an elegant and automated method for platform wide updates. Advantages for customers include:

  • Aggregates multiple releases into one consolidated update
  • Dell end-to-end validation helps eliminate the risk of element incompatibility
  • Reduces the number of maintenance windows and the amount of downtime required for updating

Anatomy of the PowerEdge MX baseline

The PowerEdge MX validated solution baseline consists of many elements, including system BIOS, iDRAC, NICs, CNAs, Fibre Channel adapters, HBAs, and other critical updates. In addition, the stack extends into the chassis to include network switch code and the management controller software, OME-M. The MX platform baseline testing includes the capabilities of chassis I/O modules such as MX9116n, MX7116n, MX5108, and MXG610 in all forms with scaled VLANs. It also includes testing with different configurations, protocols, and workloads. For Fibre Channel and FCoE, baseline testing also includes scenarios in NPIV Proxy Gateway, FIP Snooping Bridge, and Direct Attached modes. An example end-to-end stack test is VMware ESXi running on the compute sleds connected to a PowerStore storage array over FCoE, with an update from an old baseline to the new baseline. When the Dell updates pass evaluation, a validated solution stack firmware catalog file containing details of the tested versions is published online, ready to be consumed by Dell update mechanisms such as the update manager integrated into OME. Think of the validated baseline as a recipe for success.

When it comes to applying updates, Dell’s OpenManage system management automation provides a timesaving centralized process with intelligent safeguards to eliminate downtime. The benefits of using OME-M to perform updates using the catalog include automatically identifying components that require updates, downloading the updates from the Dell support site, creating and scheduling update jobs, correctly ordering tasks, and reporting. The following example shows a sample catalog, highlighting the non-compliant elements. An administrator needs only to click “Make Compliant” to start the task that updates multiple elements in the MX environment.

Figure 2. Detailed view of a firmware update

VMware enhancement

For customers running their VMware environment on the PowerEdge MX platform, this firmware update process can be enhanced using OMEVV (OpenManage Enterprise plugin for VMware vCenter) to be VMware cluster aware, in order to safeguard services from outages. Cluster-aware updates apply intelligent rules that allow patching only one member of a VMware cluster at a time. Before a physical host is patched, virtual machines are systematically migrated “hot” to other ESXi hosts by leveraging ESXi maintenance mode, DRS, and vMotion, ensuring that workloads and services running on the cluster are kept online at all times. After the updates are applied, the host restarts and rejoins the cluster. DRS can then live migrate virtual machines back to the newly updated host. This sequence is repeated for each host in the cluster, offering a controlled rolling upgrade for the entire cluster.
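
The rolling sequence described above can be illustrated with a short Python simulation. This is a simplified sketch, not OMEVV code: the `Host` class, the `rolling_update` function, and the placement rule (move each virtual machine to the least-loaded peer) are hypothetical stand-ins for what maintenance mode, DRS, and vMotion do in a real cluster, and the sketch does not model DRS rebalancing afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical model of one ESXi host in a cluster."""
    name: str
    firmware: str
    vms: list = field(default_factory=list)


def rolling_update(cluster, target_firmware):
    """Update one host at a time so the cluster's VMs always stay online."""
    updated = []
    for host in cluster:
        if host.firmware == target_firmware:
            continue  # already compliant, nothing to do
        peers = [h for h in cluster if h is not host]
        if not peers:
            raise RuntimeError("a rolling update needs at least two hosts")
        # Enter maintenance mode: evacuate every VM to the least-loaded peer.
        for vm in list(host.vms):
            destination = min(peers, key=lambda h: len(h.vms))
            host.vms.remove(vm)
            destination.vms.append(vm)
        # The host is now empty, so it can be patched and restarted safely.
        host.firmware = target_firmware
        updated.append(host.name)
    return updated


cluster = [
    Host("esxi-01", "1.0", ["vm-a", "vm-b"]),
    Host("esxi-02", "1.0", ["vm-c"]),
    Host("esxi-03", "2.0", []),
]
updated = rolling_update(cluster, "2.0")
print(updated)
```

Because only one host is emptied at any moment, the total number of running virtual machines never changes during the run, which is the property the cluster-aware process is designed to preserve.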

Figure 3. OMEVV/VMware host rolling updates

 

OMEVV also includes a scheduling engine to manage timed updates during quiet periods or to set maintenance windows. Larger customers can run parallel updates on up to 15 clusters simultaneously from a single console.

 

IOMs

If a customer is using an MX environment with MX9116n/MX7116n network switches in SmartFabric mode, they simply select “Make Compliant” from the OME-M GUI. There is no need to search for the correct switch code or to upload code to the switch manually; it is all taken care of as part of the catalog. OME-M interfaces with the switches to upload the new code. If the switches are configured as a pair, the update runs automatically on one switch at a time to ensure problem-free connectivity during the updates.

 

RESTful API

The OpenManage Enterprise APIs enable customers to integrate with other management products such as Ansible playbooks, or to build tools based on common programming and scripting languages, including Python and PowerShell. These APIs are fully documented. Dell posts many examples in its GitHub code repository for administrators and developers to download and use for free.
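
As a rough illustration of what a Python integration might look like, the sketch below builds (but does not send) two HTTPS requests using only the standard library. The endpoint paths, the `SessionType` field, and the `X-Auth-Token` header reflect the session-based pattern used in Dell's published examples as best recalled here; treat them as assumptions and confirm them in the API guide. The host name, credentials, and helper function names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint paths; confirm them in the OpenManage Enterprise API guide.
SESSION_PATH = "/api/SessionService/Sessions"
DEVICES_PATH = "/api/DeviceService/Devices"


def build_login_request(host, username, password):
    """Build (but do not send) the request that opens an API session."""
    body = json.dumps({
        "UserName": username,
        "Password": password,
        "SessionType": "API",
    }).encode("utf-8")
    return urllib.request.Request(
        f"https://{host}{SESSION_PATH}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_devices_request(host, token):
    """Build the request that lists managed devices using a session token."""
    return urllib.request.Request(
        f"https://{host}{DEVICES_PATH}",
        headers={"X-Auth-Token": token},
        method="GET",
    )


# Hypothetical appliance address and credentials, for illustration only.
login = build_login_request("ome.example.com", "admin", "secret")
devices = build_devices_request("ome.example.com", "token-from-login-response")
print(login.get_method(), login.full_url)
print(devices.get_method(), devices.full_url)
```

In a real script, each request would be passed to `urllib.request.urlopen` with an appropriate TLS context, and the token for the second call would be read from the login response rather than hard-coded.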


In Conclusion

Customers who rely on Dell PowerEdge MX for their compute needs can streamline the update process, saving time and ensuring firmware compliance, by leveraging MX validated solution stack firmware baselines. In addition, for VMware environments, intelligent rolling firmware updates for hosts offer updating with zero service outages, and no end user downtime.


PowerEdge systems management Power Manager Servers

Reduce Server Power Usage and Save Money with Power Manager

Lori Matthews, Mark Maclean

Mon, 16 Jan 2023 18:41:07 -0000


Summary 

Between the substantial rise in energy costs and organizations’ sustainability initiatives to reduce global warming, lowering data center power usage is a key strategy for many IT teams. This Direct from Development Tech Note describes the capabilities of Dell OpenManage Enterprise Power Manager version 3.0, which is a fully integrated extension to Dell OpenManage Enterprise. Power Manager provides increased visibility of server power data, including consumption, anomalies, and utilization. Customers can use this tool to discover and then proactively manage server power consumption and server thermals while also assessing their carbon footprint.

Introduction

The phrase “you can’t manage what you can’t measure” is often attributed to W. Edwards Deming, the statistician. In terms of server power usage, this adage means that organizations need data plus tools to manage and lower server power usage, resulting in a reduced carbon footprint. With Dell OpenManage Enterprise Power Manager, PowerEdge customers can both monitor and actively manage server power usage. In addition to reporting power and thermal data, Power Manager can also cap server power consumption and manage thermal events. Version 3.0 also introduces a new carbon usage calculation feature for customers who want to understand their server estate emissions.

Figure 1.  Server power usage data and threshold

Power reduction strategy

OpenManage Enterprise Power Manager supports creating a power reduction strategy easily and efficiently through several key elements. 

Current usage

Discovering the current usage across an entire server estate is simple. Each managed server’s iDRAC gathers various metrics, such as power consumption, thermal utilization, and server utilization. OpenManage Enterprise collects and displays the data in dashlet graphs (mini dashboards), such as Power History (Watt) (shown in Figure 2). Within the tool, administrators can place servers into racks, aisles, and then data center collections to reflect the real-world environment to assist with reporting and actions. These dashlets offer powerful visualization of the data, from one server to an entire server fleet, for the last few hours or up to an entire year. If required, customers can add power values for unmonitored devices for a more complete view of data center power usage. An OpenManage Enterprise Advanced or Advanced+ license is required on each server to enable Power Manager.

Figure 2. Power history for one group of servers

Review and analyze

Through its dashlets, Power Manager accelerates customers’ understanding by providing relevant data that highlights servers that should be reviewed. These include top energy consumers (kWh), as shown in Figure 3.


Figure 3. Top energy-consuming servers (kWh)

This data is also consolidated into reports and is available in the custom report builder. The prebuilt library contains numerous useful reports, including Power Manager: Server Utilization Report and Power Manager: Power and Thermal Report (shown in Figure 4). These reports highlight underutilized and idle servers that could be candidates for consolidation or decommissioning.

Figure 4. Power and thermal report

Administrators can assess power draw by virtual machines (VMware ESXi and Microsoft Hyper-V hosts) as well as power draw by key components such as CPU, RAM, server fans, and local storage.

Customers who want carbon footprint data can use the integrated greenhouse gas emissions reports that detail energy consumed (kWh) and greenhouse gas emissions per server and per group. All report data can be exported as HTML, PDF, CSV, or XLS, and any report can be run ad hoc or automatically delivered by email on a regular basis through the OpenManage Enterprise report schedule.
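
The relationship between energy consumed and emissions can be shown with a small worked example. This is a generic sketch, not Power Manager's internal calculation: it assumes emissions are estimated as energy (kWh) multiplied by a grid emission factor (kg CO2e per kWh). The server names, energy values, and the factor of 0.4 are placeholders.

```python
def emissions_kg(energy_kwh, factor_kg_per_kwh):
    """Convert energy consumed (kWh) into estimated emissions (kg CO2e)."""
    if energy_kwh < 0 or factor_kg_per_kwh < 0:
        raise ValueError("energy and emission factor must be non-negative")
    return energy_kwh * factor_kg_per_kwh


# Illustrative numbers only: monthly energy per server and an assumed grid factor.
monthly_kwh = {"server-01": 250.0, "server-02": 310.0, "server-03": 180.0}
GRID_FACTOR = 0.4  # kg CO2e per kWh, placeholder value

# Per-server figures, then the total for the whole group.
per_server = {
    name: emissions_kg(kwh, GRID_FACTOR) for name, kwh in monthly_kwh.items()
}
group_total = sum(per_server.values())

for name, kg in per_server.items():
    print(f"{name}: {kg:.1f} kg CO2e")
print(f"group total: {group_total:.1f} kg CO2e")
```

The emission factor depends on the local energy mix, so the same kWh figure can translate into very different emissions from one site to another.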

Figure 5. List of power-related reports

Take action

Administrators can consider using power capping during hours that are outside of normal operations or in test and development environments. Modern servers are relatively efficient when idling; however, the introduction of power capping can guarantee low power usage. Administrators can use Power Manager’s static policies to set a power budget for a device, a group, or even the entire server estate, as shown in Figure 6. Power caps can be set in watts or as a percentage.

Figure 6.      Creating a power-capping policy for multiple servers

For example, an administrator might have no power capping policies during the day when full server performance is required and configure a lower power cap for evenings and weekends when server workload is less.
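
The day-versus-evening example can be expressed as a small scheduling rule. The sketch below is illustrative only and does not use any Power Manager API: the function names, the 08:00 to 18:00 business window, and the 60 percent off-hours cap are all assumptions chosen for the example.

```python
from datetime import datetime


def cap_watts(rated_watts, percent):
    """Translate a percentage cap into watts for a given rated power."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be greater than 0 and at most 100")
    return rated_watts * percent / 100


def active_cap(when, rated_watts, off_hours_percent=60,
               business_start=8, business_end=18):
    """Return the cap in watts that applies at `when`, or None for no cap."""
    is_weekend = when.weekday() >= 5  # Saturday is 5, Sunday is 6
    in_business_hours = business_start <= when.hour < business_end
    if not is_weekend and in_business_hours:
        return None  # full performance during the working day
    return cap_watts(rated_watts, off_hours_percent)


# A Wednesday morning, a Wednesday evening, and a Saturday morning.
for moment in (datetime(2024, 1, 10, 10), datetime(2024, 1, 10, 22),
               datetime(2024, 1, 13, 10)):
    print(moment, active_cap(moment, rated_watts=800))
```

Expressing the cap as a percentage keeps one policy usable across servers with different power ratings, whereas a cap in watts gives a fixed, predictable budget for rack-level planning.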

Additional suggestions to decrease power consumption and carbon footprint include:

    • Review and change the server BIOS system profile. For example, change Maximum Performance to Performance Per Watt. Expect Power Manager to manage this profile setting in future releases.
    • Replace or consolidate older servers that use outdated CPU technology. Those older servers are not as power-efficient as the latest generation of PowerEdge. Dell Live Optics, which lets you review current server operating system performance data such as RAM capacity and storage performance, and the Dell Enterprise Infrastructure Planning Tool (EIPT) can help with further investigation and “what-if” migration modeling.
    • Improve the overall efficiency of data center cooling, thereby improving power usage effectiveness (PUE). For example, review air flow for more effective cooling, resolve data center hot spots and cold spots, or implement highly efficient liquid-cooled Dell servers.
    • Move to renewable energy sources/suppliers to aid in decreasing carbon emissions.
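
For readers unfamiliar with PUE, mentioned in the list above, it is the ratio of total facility energy to the energy used by IT equipment, so a value closer to 1.0 means less overhead for cooling and power distribution. The sketch below uses made-up figures to show how a cooling improvement changes the ratio.

```python
def pue(total_facility_kwh, it_equipment_kwh):
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    if total_facility_kwh < it_equipment_kwh:
        raise ValueError("total facility energy cannot be less than IT energy")
    return total_facility_kwh / it_equipment_kwh


# Illustrative figures: the IT load stays the same while cooling overhead drops.
before = pue(total_facility_kwh=1500, it_equipment_kwh=1000)
after = pue(total_facility_kwh=1200, it_equipment_kwh=1000)
print(f"PUE before: {before:.2f}, after: {after:.2f}")
```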

 


PowerEdge systems management Power Manager iDRAC

Server Power Consumption Reporting and Management

Kim Kinahan, Mark Maclean, Delmar Hernandez, Jeremy Johnson, Lori Matthews, Kyle Shannon, Doug Iler

Mon, 16 Jan 2023 18:31:46 -0000


Summary

Between customers’ sustainability initiatives to reduce carbon emissions and demands to control energy consumption and costs, the ability to report, analyze, and act on server power usage data has become a key initiative. This DfD tech note explores the rich server power usage data available from Dell PowerEdge servers and the various methods to collect, report, analyze, and act upon it.

What is server power consumption?

A wide variety of server power information is offered by the iDRAC. The amount and frequency of information varies by iDRAC version and licensed features and the choice of optional tools and consoles.

One-to-one and one-to-many

There are multiple ways to view power consumption data from the iDRAC, depending on needs and preferences. One way is to open the web interface GUI. Another way is to use scripts, through either RACADM or Redfish, to retrieve the data. iDRAC can also send data to the OpenManage Enterprise Power Manager Plugin. OpenManage Enterprise can also forward this information to CloudIQ for PowerEdge. For those customers looking for the ultimate solution, iDRAC9 can stream these power statistics as telemetry data to analytics solutions such as Splunk or ELK Stack for real-time in-depth analysis.

Figure 1. PowerEdge management stack, with power management and data reporting highlighted

PowerEdge server power data

Embedded with every Dell PowerEdge server, the integrated Dell Remote Access Controller (iDRAC) enables secure and remote server access for out-of-band and agent-free server management tasks. Features include BIOS configuration, OS deployment, firmware updates, health monitoring, and maintenance. One key set of data that iDRAC provides is power usage. IT admins have used iDRAC data to view and react to power issues for over 10 years. The iDRAC engineering teams have continued to expand the capabilities within the iDRAC UI as well as the information available to “one to many” consoles such as OpenManage Enterprise. iDRAC9 with Datacenter feature set enabled extends the solution even further with telemetry streaming.

iDRAC

iDRAC continuously monitors, processes, and reports power consumption at the individual server level. The browser user interface displays the following power values:

  • Power consumption warning and critical thresholds
  • Cumulative power, peak power, and peak amperage values
  • Power consumption over the last hour, last day, or last week
  • Average, minimum, and maximum power consumption with historical peak values and peak timestamps
  • Peak headroom and instantaneous headroom values (for rack and tower servers)

iDRAC9 provides a graphical view of these power metrics such as the power consumption example shown here.

Figure 2. iDRAC9 GUI power consumption data

iDRAC9 connects to all critical server components and, in conjunction with the Datacenter license, can collect over 180 server metrics in near-real-time. These metrics include granular, time-stamped data for critical functions such as processor and memory utilization, network card, power, thermal, and more. iDRAC9 can stream this telemetry data in real time.

Figure 3.  iDRAC power telemetry data collected by Splunk 

Get Server Power – RACADM CLI Examples

The RACADM command line provides a basic scriptable interface that enables you to retrieve server power data either locally or remotely. In addition to the CLI, iDRAC also supports the Redfish RESTful API. Example PowerShell and Python scripts that can be used to collect power data can be downloaded from the Dell area on github.com. The RACADM CLI can be accessed from the following interfaces:

  • Local - Supports running RACADM commands from the managed server's operating system (Linux/Windows). To run local RACADM commands, install the OpenManage DRAC Tools software on the managed server.
  • SSH or Telnet (also known as Firmware RACADM) - Firmware RACADM is accessible by logging into iDRAC using SSH or Telnet.
  • Remote - Supports running RACADM commands from a remote management station such as a laptop or desktop running Windows or Linux. To run remote RACADM commands, install the OpenManage DRAC Tools software on the management station.

Here are some examples using the remote iDRAC9 SSH CLI method, post authentication.

  • Instantaneous server power usage:
  • Server power stats:
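
As a complement to RACADM, the Redfish API mentioned earlier returns power data as JSON that is straightforward to parse. The sketch below is a minimal, offline illustration: the resource path is an assumption to verify against the iDRAC Redfish documentation, the payload is a hand-written sample with made-up values shaped like a standard Redfish `Power` resource, and `summarize_power` is a hypothetical helper.

```python
import json

# Assumed resource path on iDRAC; verify it against the iDRAC Redfish API guide.
POWER_PATH = "/redfish/v1/Chassis/System.Embedded.1/Power"

# Illustrative payload with made-up values, shaped like a Redfish Power resource.
SAMPLE_RESPONSE = """
{
  "PowerControl": [
    {
      "PowerConsumedWatts": 212,
      "PowerMetrics": {
        "MinConsumedWatts": 180,
        "MaxConsumedWatts": 305,
        "AverageConsumedWatts": 224,
        "IntervalInMin": 60
      }
    }
  ]
}
"""


def summarize_power(payload):
    """Extract instantaneous and interval power figures from a Power resource."""
    control = payload["PowerControl"][0]
    metrics = control.get("PowerMetrics", {})
    return {
        "now_watts": control.get("PowerConsumedWatts"),
        "min_watts": metrics.get("MinConsumedWatts"),
        "max_watts": metrics.get("MaxConsumedWatts"),
        "avg_watts": metrics.get("AverageConsumedWatts"),
    }


summary = summarize_power(json.loads(SAMPLE_RESPONSE))
print(summary)
```

In practice the JSON would come from an authenticated HTTPS GET against the iDRAC rather than from a string constant; the parsing step stays the same.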


OpenManage Enterprise Power Manager

The Power Manager Plugin for OpenManage Enterprise uses the power data securely collected from iDRACs to observe, alert, report, and, if required, place power caps on servers. For ease of management, servers can be logically grouped together, such as in a rack, a row, or a custom grouping such as a workload. Using this data, customers can drive data center efficiency in several ways, such as by easily identifying idle servers for repurposing or retirement. Using built-in reports or creating a custom report, customers can identify server racks that are not using their full available power capacity, so that new hardware can be deployed without additional power. Customers can mitigate risk by detecting when groups of servers are nearing their power capacity during specific timeframes. Using automated policies, customers can maximize the power available to business-critical applications by reducing noncritical consumption through scheduled or permanent power capping.

Given today’s climate concerns, reports on carbon emissions based on server usage are increasingly important. Power Manager provides reports on the carbon emissions for individual servers as well as racks and custom groups of servers. This information can be used to identify areas of concern and to show progress in carbon emission reductions based on power policies, removal of idle servers, and other initiatives such as consolidation and refresh.

The power data is displayed by applets integrated into OpenManage Enterprise. (See examples in the following figure.) There are also several predefined reports built into the report library designed around power usage. Power Manager automates actions driven by specific power or thermal events, including running scripts, applying power caps, and forwarding alerts. Power Manager collects this power data and stores it for up to 365 days.

Figure 4. View of a rack group alert threshold graphic for power and thermal

Figure 5. Rack view showing max/min/avg power for the last six hours

CloudIQ for PowerEdge – Reporting Server Power

Another way to visualize and report the power data is with CloudIQ. Utilizing the OpenManage Enterprise CloudIQ Plugin, customers can connect their PowerEdge servers to the Dell-hosted CloudIQ secure portal. This is a cloud-based software-as-a-service portal, hosted in the Dell data centers, that provides powerful analytics, health, and performance monitoring for servers. CloudIQ can consolidate multiple OpenManage Enterprise instances, providing a truly global view of an organization’s server estate. Within CloudIQ, power data can be graphed and reported on over time. These graphs can easily be exported or emailed as PDFs, and the raw data can be exported as CSV for further review. In addition to collecting power metrics, CloudIQ can track and collect over 50 server metrics for users to review. CloudIQ also interfaces with other elements of Dell’s infrastructure, including storage and networking, giving customers the ability to correlate data, events, and trends across multiple technologies. CloudIQ is offered at no additional cost for all PowerEdge servers with ProSupport or higher contracts.

When power data is collected in CloudIQ, advanced AI algorithms process this data and automatically flag whether the server power usage behavior is outside normal parameters, based on historic data from that particular server.

Figure 6. Individual server power data with historical seasonality – no anomaly

Multiple servers can be put onto the same graph, making it easy to identify any rogue behavior by individual servers.

Figure 7. Multi server power usage report

The visualization of this data can be displayed from just hours to a whole year, with the ability to zoom in on a particular time.

 

Conclusion

Dell PowerEdge servers offer an extensive amount of data about power consumption through the advanced capabilities of the iDRAC. This power information is available in the iDRAC UI, as is telemetry information ready to be consumed by analytics solutions such as Splunk. This information is also accessible from the RACADM CLI and RESTful API. Dell Technologies’ own one-to-many management solutions can also collect, collate, and report this information. Dell lets server admins select from a wide variety of tools and methodologies to meet their data center server power management requirements.

References

 iDRAC

OpenManage Enterprise Power Manager

CloudIQ for PowerEdge

GitHub for Dell Technologies, including iDRAC and OME/ Power Manager examples Dell Technologies · GitHub

API guide and landing page for developers including iDRAC & OME/ Power Manager https://developer.dell.com/


OpenManage systems management OME

Improve Operational Efficiency Through OME Server Drift Management

Manoj Malhotra, Mark Maclean

Mon, 16 Jan 2023 16:23:05 -0000


Summary

As they say, “drift happens.” Ideally, firmware versions and configuration settings, such as those for iDRAC and system BIOS, should remain consistent across a server environment. Configuration drift refers to the phenomenon where server configurations “drift” toward an inconsistent state. This Direct from Development (DfD) tech note describes how capabilities in Dell’s OpenManage Enterprise server management appliance simplify drift management, give visibility of problems, and reduce the time and effort needed to resolve them.

Introduction

Failing to ensure consistent server firmware versions and configuration settings, or failing to detect unauthorized changes, increases the risk of operational problems, security breaches, and even server outages. Why does this happen? The situation can have many causes, including poor processes, routine hardware upgrades and replacements, or even attacks from external threats. What is the scope of the impact? Any number of firmware versions or configuration settings can be affected. For example, in a secure environment, elements such as iDRAC user accounts, USB ports, and server boot order may be areas of key interest. Dell’s OpenManage Enterprise management console (“OME” for short) provides compliance features that detect, highlight, and remediate issues, with simple processes for both firmware versions and configuration settings. OME also provides easy-to-create baseline configurations, using intuitive server configuration templates and firmware catalogs, to streamline the capture or creation of required values, analyze multiple servers, and then apply the desired state. To perform any tasks in OME, you must have the correct role-based user privileges and scope-based operational access to the devices.

Managing configuration settings

Let’s look at configuration settings first. This is based on the iDRAC’s “server configuration profile” concept. A compliance template captures the server BIOS, iDRAC, and components’ configuration settings. A template can consist of hundreds of firmware settings, including iDRAC, BIOS, PERC RAID, NICs, and FC HBA configurations.

Figure 1: Configuration compliance status of server against configuration baseline
 
The OpenManage Enterprise Advanced license must be enabled on each server’s iDRAC to use this configuration compliance solution.

There are four basic steps to ensure configuration compliance:

  1. Create a compliance template to capture all required server configuration settings.
  2. Associate the compliance template to one or more servers to create a baseline group.
  3. Compare the template with the actual settings for each server and report.
  4. Remediate non-compliant servers with a single click.

Customers can create a compliance template from an existing deployment template, either by using OME to extract it from a “reference” server or by importing an existing template from a file. Each server associated with the baseline has its own itemized compliance status.
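
The comparison step in this workflow can be pictured as a dictionary diff. The sketch below is a conceptual illustration rather than OME's implementation: the function names are hypothetical, and the setting names and values are invented placeholders, not real iDRAC or BIOS attribute names.

```python
def find_drift(template, actual):
    """Return {setting: (expected, found)} for every setting that differs."""
    drift = {}
    for setting, expected in template.items():
        found = actual.get(setting)  # None means the setting was not reported
        if found != expected:
            drift[setting] = (expected, found)
    return drift


def compliance_report(template, servers):
    """Map each server name to its drift; an empty dict means compliant."""
    return {
        name: find_drift(template, settings)
        for name, settings in servers.items()
    }


# Placeholder settings standing in for a compliance template.
template = {"boot_mode": "uefi", "usb_ports": "disabled", "local_accounts": 2}
servers = {
    "server-01": {"boot_mode": "uefi", "usb_ports": "disabled", "local_accounts": 2},
    "server-02": {"boot_mode": "uefi", "usb_ports": "enabled", "local_accounts": 3},
}

report = compliance_report(template, servers)
for name, drift in report.items():
    print(name, "compliant" if not drift else f"non-compliant: {drift}")
```

Only settings named in the template are checked, which mirrors the idea that a baseline captures the values an organization cares about rather than every attribute a server exposes.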

Figure 2.     Drill down view of “Compliance Report” screen that shows a compliance failure

When servers appear on the non-compliant list, remediation is simple to accomplish. A “one-click” compliance using the “Make compliant” button can be started immediately or scheduled. Note: a server reboot may be required to make the selected devices compliant.

Figure 3.     One-click “Make Compliant” button

After this baseline is created, more servers can be added to the baseline at any time, and the corresponding server template can be amended, cloned, or exported to another instance of OME. Finally, in “Reports” there is a pre-defined “Devices Per Configuration Baseline” report, which details the servers associated with each configuration baseline and each device’s compliance status. Using the reporting mechanism, the report can be downloaded or emailed. (In an upcoming release, OME will automate the process of report scheduling and emailing.)

Managing firmware versions

In the modern server there are many components that have firmware, such as system BIOS, iDRAC, NICs, PERC, and hard drives. OME can inventory, report, and update firmware versions. If managing firmware versions is required to deliver consistency across a fleet of servers, this can be achieved by using the “Firmware and Driver Compliance” element of OME.

 Managing firmware version compliance, including firmware updating, does not require an OpenManage Enterprise Advanced license.

There are four steps to perform this compliance:

    1. Build a list of firmware versions against which the servers will be checked. This required server firmware “build” can be created from a default catalog of firmware versions (use OME to download the latest one from Dell Support). You can also build a custom catalog with Repository Manager or with the Update Manager plugin for OME, which is available with OME 3.5 or higher.
    2. Select the servers to be compared for compliance to create a baseline group.
    3. OME compares the catalog against the installed firmware then reports the overall and itemized compliance status of each server in the baseline.
    4. Remediate non-compliant servers with a single click.
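
Step 3 amounts to comparing each installed version with the version listed in the catalog. The sketch below shows one way to do that comparison. It is not OME's algorithm: the function names and status labels are hypothetical, the component versions are invented, and it assumes purely numeric dotted versions, which real firmware version strings do not always follow.

```python
def parse_version(text):
    """Turn a dotted version such as '2.10.3' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def firmware_compliance(catalog, installed):
    """Classify each catalog component for one server."""
    status = {}
    for component, wanted in catalog.items():
        have = installed.get(component)
        if have is None:
            status[component] = "not installed"
        elif parse_version(have) < parse_version(wanted):
            status[component] = "upgrade"
        elif parse_version(have) > parse_version(wanted):
            status[component] = "downgrade"
        else:
            status[component] = "compliant"
    return status


# Invented versions for illustration.
catalog = {"BIOS": "2.10.3", "iDRAC": "6.10.30", "NIC": "22.0.5"}
installed = {"BIOS": "2.9.4", "iDRAC": "6.10.30"}

result = firmware_compliance(catalog, installed)
print(result)
```

Comparing tuples of integers rather than raw strings matters here: as plain text, "2.9.4" sorts after "2.10.3", which would wrongly report the older BIOS as newer.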

Figure 4.     View of firmware versions created in “custom” catalog by Update Manager plugin

Figure 5.     Drill down view of Compliance Report in case of firmware compliance failures

 

When servers appear as non-compliant, remediation is simple to accomplish. A “one-click” compliance task can be started immediately or scheduled using the “Make compliant” button. Note: a server reboot may be required to make the selected devices compliant. Again, in “Reports” there are pre-defined reports named “Firmware Compliance per Device Report” and “Firmware Compliance Per Component Report”. These reports detail each server’s firmware versions and status. Using the reporting mechanism, these can be downloaded or emailed. As mentioned earlier, firmware version compliance, including firmware updating, does not require an OpenManage Enterprise Advanced license. In addition, driver compliance and updates are available for servers running Microsoft Windows 2016, 2019, or 2022.

 

Conclusion

Configuration and firmware compliance increases control while decreasing drift-related issues and risk. Dell OpenManage Enterprise not only brings advanced, feature-rich server management to PowerEdge customers; it also brings the power of automation to reduce effort, decrease time to resolution, and reduce management costs.

 


CloudIQ systems management

CloudIQ Provides Data Driven Server Management Decisions

Mark Maclean, Kyle Shannon

Mon, 16 Jan 2023 16:04:16 -0000


Summary

CloudIQ for PowerEdge provides a single easy-to-use portal to view the health and information of Dell servers. CloudIQ’s powerful reporting backend enables customers to visualize and analyze server performance data. Key hardware metrics are collected, regardless of the operating system and applications installed. Beyond reporting current server performance data, CloudIQ historical seasonality and anomaly detection accelerate issue detection and resolution for customers. This Direct from Development (DfD) tech note describes both the existing server metrics reporting capabilities and the new historical seasonality with anomaly detection feature in CloudIQ for PowerEdge.

Introduction

CloudIQ is a cloud based proactive application that delivers insights and recommendations that give customers a consolidated view of PowerEdge servers and other Dell data center infrastructure, including storage, networking, and data protection systems. It can also consolidate multiple OpenManage Enterprise instances into a single portal.

Server Metrics 

iDRAC

The advanced agent-free architecture in iDRAC (Integrated Dell Remote Access Controller) incorporated in each PowerEdge server provides data about CPU performance, thermals, and power consumption. To collect these server metrics, each iDRAC needs at least an Enterprise or OpenManage Enterprise Advanced license installed. If Data Center licenses are installed on the iDRACs, additional metrics for NIC traffic, Fibre Channel traffic, and SSD/NVMe device data are also available. Server metrics are compiled on individual iDRACs and then collected by OME. OME then consolidates and securely uploads this data to CloudIQ every 15 minutes.

CloudIQ

Within CloudIQ, the performance page displays a summary per server in a dashboard view (Fig. 1). Clicking into a single server, customers can view several ready to use server performance visualizations for significant measurements, such as CPU usage, system thermals, and power consumption. This includes the new ability to track and display historical seasonality data and anomaly detection (Fig. 2). The customer can also create custom graphs in the “report browser” feature (Fig. 3).

Figure 1 : Server Performance – Summary Dashboard

Anomaly Detection

The new anomaly detection capability, based on historical seasonality data, lets CloudIQ highlight irregular server behavior. Customers can now view a range of statistically normal behavior for each server’s performance metrics on the performance details page. This range is calculated per metric from each specific server’s own data, using a rolling three-week analysis. The metric charts now highlight an anomaly whenever the metric has breached the normal range within the last 24 hours. Anomaly detection is supported for all metrics that are displayed on the system performance page.
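As a rough illustration of the idea, the sketch below derives a "normal" band for one metric from a rolling three-week window and flags samples from the last 24 hours that fall outside it. CloudIQ's actual statistical model is not described in this tech note, so the band of mean plus or minus three standard deviations, the grouping by hour of day to capture seasonality, the exclusion of the last 24 hours from the baseline, and all names used here are assumptions chosen purely for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev

WINDOW = timedelta(weeks=3)   # rolling history used to learn "normal" behavior
RECENT = timedelta(hours=24)  # period in which breaches are highlighted
BAND_WIDTH = 3.0              # assumption: band is mean +/- 3 standard deviations


def normal_ranges(samples, now):
    """Return {hour_of_day: (low, high)} learned from the history window.

    samples is a list of (timestamp, value) pairs for one metric on one server.
    The last 24 hours are left out so a spike cannot widen its own baseline.
    """
    by_hour = {}
    for ts, value in samples:
        if now - WINDOW <= ts < now - RECENT:
            by_hour.setdefault(ts.hour, []).append(value)
    ranges = {}
    for hour, values in by_hour.items():
        centre, spread = mean(values), pstdev(values)
        ranges[hour] = (centre - BAND_WIDTH * spread, centre + BAND_WIDTH * spread)
    return ranges


def find_anomalies(samples, now):
    """Return the (timestamp, value) pairs from the last 24 hours outside the band."""
    ranges = normal_ranges(samples, now)
    anomalies = []
    for ts, value in samples:
        if now - RECENT <= ts <= now and ts.hour in ranges:
            low, high = ranges[ts.hour]
            if value < low or value > high:
                anomalies.append((ts, value))
    return anomalies
```

Grouping by hour of day is one simple way to respect daily seasonality: a server that is always busy at 02:00 during backups is compared only with its own 02:00 history.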


Figure 2: Server Performance – Example “Power Consumption” highlighting anomaly detection

Custom Reports

CloudIQ can create custom reports for up to 55 different server metrics. Customers can schedule reports to be run and emailed on a daily, weekly, or monthly basis. The data can also be exported as a CSV or PDF file.

Figure 3 : Server Performance - From Metric Browser – Example custom graph showing NIC data

Example Server Metrics

The following table shows a selection of some server metrics available. For a complete list, see Appendix A2 in the white paper PowerEdge Metrics in CloudIQ using OpenManage Enterprise (OME): An Overview.

Metrics | Sample Timing | License Required

System Performance
CPU Usage % [1] | Avg of 5 minute sample | OME-Advanced or Data Center
IO Usage (PCI Express traffic) % [1] | Avg of 5 minute sample | OME-Advanced or Data Center
Memory Usage (channels bandwidth) % [1] | Avg of 5 minute sample | OME-Advanced or Data Center
System Usage % (amalgamation of CPU / IO and memory usage) [2] | Avg of 5 minute sample | OME-Advanced or Data Center

System Power
System Power Consumption kWh | Avg, Min and Max of 15 minute sample | Enterprise

System Thermal
Temperature Reading Degrees C [2] | Avg of 5 minute sample | Enterprise
Sys Net Airflow CFM [2] | Avg of 15 minute sample | OME-Advanced or Data Center

NICs
TxBytes [2] | Total in 5 minute samples | Data Center
RxBytes [2] | Total in 5 minute samples | Data Center

FC HBAs
FCRxKBCount [2] | Total in 5 minute samples | Data Center
FCTxKBCount [2] | Total in 5 minute samples | Data Center

[1] System performance data on 12th and 13th generation servers requires only an iDRAC Enterprise license.
[2] iDRAC9 only.

Basic Metrics include Power, Thermal, and CPU. YX5X servers have different Basic Metrics, based on whether they are AMD or Intel:

  • Intel model Basic Metrics include Power, Thermal, CPU, IO, and Memory utilization.
  • AMD model Basic Metrics include Power, Thermal, and CPU

Conclusion

Some customers say, “slow is the new down”! With in-depth visibility of key performance metrics for servers, storage, and networking infrastructure, CloudIQ allows customers to stay on top of all their Dell data center resources, giving them the ability to manage, analyze, and plan proactively.

References

For more details about the available PowerEdge Metrics in CloudIQ, see the full table in Appendix A2 of the white paper PowerEdge Metrics in CloudIQ using OpenManage Enterprise - An Overview.




PowerEdge CloudIQ cybersecurity

Dell CloudIQ Cybersecurity For PowerEdge: The Benefits Of Automation

Mark Maclean, Kyle Shannon

Mon, 16 Jan 2023 15:08:26 -0000


Summary

There are many server settings that customer infrastructure teams can select to harden servers against growing cyber threats. But how can they find and use Dell’s best practices for security configuration settings? And how can they efficiently and continuously check whether the settings are incorrectly configured or have changed? The answer is the cybersecurity feature in the CloudIQ for PowerEdge AIOps solution. It compares the configuration of deployed PowerEdge servers to a security-related configuration policy. When CloudIQ identifies a deviation between the actual settings and the recommended configuration settings, it notifies the administrator and recommends remediation steps to correct the issue(s). This Direct from Development (DfD) tech note details the time savings that customers can achieve by using the CloudIQ automated cybersecurity policy engine versus manually examining compliance.

Introduction

In today’s always-on, always-connected environment, all organizations constantly need to enhance their cybersecurity strategy to mitigate the increasing threat of attack. Using the built-in cybersecurity feature of Dell CloudIQ, customers can easily build security policies to protect PowerEdge servers. A policy consists of ready-to-use tests that customers can enable simply by selecting a checkbox. The tests consist of infrastructure security settings that are based on Dell best practices and the American NIST (National Institute of Standards and Technology) cybersecurity framework. Dell CloudIQ Cybersecurity for PowerEdge both enables the easy creation of a policy and automates the policy policing, making it simple, efficient, and predictable.


Figure 1. CloudIQ Cybersecurity Dashboard

CloudIQ is the AIOps proactive monitoring and analytics application that delivers system health insights and recommendations for Dell infrastructure solutions, including storage, data protection, networking, and of course, PowerEdge servers.

The cybersecurity policy engine built into CloudIQ has over 30 security configuration rules for PowerEdge that can be implemented simply. Because CloudIQ is cloud based, it can integrate with any number of OpenManage Enterprise (OME) instances across multiple data centers, through the OME CloudIQ Plugin. This means that CloudIQ can apply the same policy to multiple OME managed servers, regardless of their location. This feature is delivered by CloudIQ without any additional configuration at the iDRAC or OME level.

When a policy is established, CloudIQ continuously checks the desired state of PowerEdge security configuration settings against the current “as is” configuration. If a server is found to be out of policy compliance, it is highlighted. The results are scored by CloudIQ, with the most vulnerable servers being given a “high” risk level. Individual problems can be viewed with recommended remediation. These recommended security configuration corrections can then be executed one-to-one per server using the iDRAC GUI. If multiple hosts are found to be non-compliant, then OME can be used to deliver a configuration update template file or execute a RACADM script to correct the security configurations for multiple servers.
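The comparison that CloudIQ automates can be pictured as a desired-state check. The following minimal sketch compares a policy of desired settings against each server's current "as is" settings and lists the deviations. The setting names, server names, and function name are illustrative assumptions, not CloudIQ's data model or API.

```python
def find_deviations(policy, as_is):
    """Return {setting: (desired, actual)} for every setting that is out of policy."""
    deviations = {}
    for setting, desired in policy.items():
        # Assumption: a setting that cannot be read is treated as a deviation.
        actual = as_is.get(setting, "unknown")
        if actual != desired:
            deviations[setting] = (desired, actual)
    return deviations


# Hypothetical policy and inventory, for illustration only.
policy = {"SSH": "Disabled", "IPMI over LAN": "Disabled", "Secure boot": "Enabled"}
fleet = {
    "server-01": {"SSH": "Disabled", "IPMI over LAN": "Disabled", "Secure boot": "Enabled"},
    "server-02": {"SSH": "Enabled", "IPMI over LAN": "Disabled"},
}

for name, settings in fleet.items():
    deviations = find_deviations(policy, settings)
    status = "compliant" if not deviations else f"out of policy: {deviations}"
    print(f"{name}: {status}")
```

Because the policy is a single object applied to every server, adding more servers changes only the size of the inventory, not the effort needed to define the checks.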

The Benefits Of Automation

To understand the profound impact of the automation of this process, we have tested it against a manual process for 1, 10, 100*, and 1,000* servers. Based on the testing of the CloudIQ Cybersecurity approach for a customer with 1000* servers, we found the following:

  • The CloudIQ task completed 99% quicker than a manual review.*
  • CloudIQ reduced the time by 98 hours to complete the task once.*
  • Using CloudIQ Cybersecurity automation saves over a week of effort immediately versus manual.*
  • Once enabled, CloudIQ monitors all of these key security configuration settings continuously.

*Projected outcomes based on analysis of results; results may vary.

In the lab testing, we found that manually checking 15 settings on the iDRAC GUI took 5 minutes 56 seconds. By contrast, creating a CloudIQ cybersecurity policy consisting of 15 active test items and selecting target server(s) only took 2 minutes 58 seconds. In addition, whether creating the policy for 1, 10, 100, or 1000 servers, this task took the same amount of time. However, using the manual process, each additional server added an additional 5 minutes 56 seconds to complete the checks. Also, after the policy is set, CloudIQ continues to check the servers’ as-is settings for compliance. 
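The projection behind these figures is simple arithmetic: the manual time grows linearly with the number of servers, while the policy creation time stays constant. The short sketch below recomputes the projected times from the two measured values; the function and constant names are illustrative.

```python
MANUAL_SECONDS_PER_SERVER = 5 * 60 + 56  # 5 min 56 s measured for one server
POLICY_SECONDS = 2 * 60 + 58             # 2 min 58 s, independent of server count


def manual_seconds(servers):
    """Projected manual checking time: one full pass of the checks per server."""
    return servers * MANUAL_SECONDS_PER_SERVER


def as_hms(seconds):
    """Format a number of seconds as hours, minutes, and seconds."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


for servers in (1, 10, 100, 500, 1000):
    manual = manual_seconds(servers)
    saved = manual - POLICY_SECONDS
    print(f"{servers:>5} servers: manual {as_hms(manual)}, "
          f"policy {as_hms(POLICY_SECONDS)}, "
          f"saved {as_hms(saved)} ({saved / manual:.2%})")
```

For 1,000 servers this gives a projected manual time of 98h 53m 20s, which is where the saving of roughly 98 hours and the "99% quicker" figure come from.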

Results Summary

The following graph highlights the differences between automation and the manual process, showing the time saving advantages.

See Table 1 near the end of this document for full results.

Testing Overview

To demonstrate the ease of use and the impact of automation, we tested two different approaches: manual versus automation. To use this Cybersecurity feature of CloudIQ:

  • OpenManage Enterprise (OME) 3.9 or higher must be installed, with the CloudIQ Plugin 1.1 or higher enabled
  • The PowerEdge server(s) must be covered by Dell ProSupport
  • The target servers for the policy must already be discovered by OME

To build the policy, the user must have the Cybersecurity Admin role assigned in CloudIQ. Some of the configuration rules used in the test security policy are the iDRAC default values. However, any of these values can be changed on an individual iDRAC by administrators with the correct rights, which could open a security weakness.

Figure 2. Configuration Data Flow

Testing Procedure

To ensure an accurate comparison of the two approaches, we followed a strict test procedure and documented each step. We selected 15 common settings, a mixture of BIOS and iDRAC configuration values, and enabled the 15 corresponding tests in the trial policy.

Tests were conducted in-house on July 6, 2022, at Dell Technologies in Austin TX, in the technical marketing lab facility and online using Dell’s CloudIQ offering.

I. USB ports: Disabled

II. iDRAC active NIC: Dedicated

III. System lock down: Enabled

IV. iDRAC config from host: Disabled

V. IPMI over LAN: Disabled

VI. Secure boot: Enabled

VII. Password policy: Strong

VIII. VNC: Disabled

IX. SNMP version 3: Enabled

X. SSH: Disabled

XI. Syslog: Enabled

XII. Active directory authentication: Enabled

XIII. IP blocking: Enabled

XIV. Virtual media encrypted: Enabled

XV. NTP time synchronization: Enabled

Steps for an automated approach: using CloudIQ PowerEdge Cybersecurity policy

Starting from the CloudIQ “sign in page” https://cloudiq.emc.com: 

  1. Sign in to CloudIQ.
  2. From the menu on the left-hand side of the screen, select Cybersecurity.
  3. Select Policy.
  4. Select the templates tab.
  5. Select add template.
  6. Name the template.
  7. Select PowerEdge from the product drop-down menu, then click next.
  8. In the template evaluation plan, select the following tests:
     • Access Control: IP blocking is enabled / SSH is disabled / SNMP is configured for v3 / Active Directory authentication is enabled / VNC is disabled
     • Audit and Accountability: NTP time synchronization is enabled / Remote Syslog is enabled
     • Configuration Management: configure iDRAC from POST / System lockdown is enabled / USB ports are disabled
     • Identification and Authentication: Password has a minimum strength score of strong
     • System and Communications Protection: IPMI over LAN is disabled / virtual media encryption is enabled / dedicated NIC
     • System and Information Integrity: secure boot is enabled
  9. Select finish.
  10. Select the systems tab.
  11. Select the required hosts from the list of hosts (in our test we selected a list of 1, 10, 100, or 1000).
  12. Click assign.
  13. Select the required template from the drop-down template list menu.
  14. From the menu on the left-hand side of the screen, select system risk to view results.



Figure 3. Select rules to build a policy

Steps for the manual approach: checking configuration values in iDRAC GUI

From a browser displaying the iDRAC login screen:

  1. Log in.
  2. USB – Configuration/BIOS settings/integrated devices/user accessible USB ports: all ports off
  3. Secure boot – Configuration/BIOS settings/TPM advanced/secure boot: enabled
  4. VNC – Configuration/Virtual console/VNC server/Enable VNC server: Disabled
  5. SNMPv3 – Configuration/System setting/Alert config/SNMP trap/SNMP setting/SNMP Trap format: SNMP v3
  6. Syslog – Configuration/System settings/Alert configuration/Remote syslog settings/Remote syslog: Enabled
  7. Virtual Media encryption – Configuration/Virtual media/Attached media/Virtual Media encryption: Enabled
  8. Dedicated port – iDRAC settings: Active NIC interface: dedicated
  9. Local iDRAC Config – iDRAC settings/services/local config/disable iDRAC local configuration: enabled
  10. IPMI – iDRAC settings/connectivity/network/IPMI settings/Enable IPMI over LAN: disabled
  11. Password Policy – iDRAC settings/users/global users settings/Password setting/Policy/Score: Strong [1]
  12. AD authentication – iDRAC settings/Users/Directory services/Microsoft AD: Enabled
  13. SSH – iDRAC settings/services/SSH/Enabled: Disabled
  14. IP blocking – iDRAC settings/Connectivity/Network/Advanced networking setting/IP blocking/Blocking: Enabled
  15. NTP time synchronization – iDRAC settings/settings/Time zone/NTP server/Enable NTP: Enabled
  16. Lockdown – check that the padlock icon at the top right of the screen is displaying locked mode

These steps were tested using a Dell PowerEdge R540 with BIOS 2.12.2 and iDRAC9 firmware 5.10.00.

[1] Enforcing the strong password policy manually ensures that new passwords comply with the policy. However, pre-existing accounts could still have weak passwords, whereas CloudIQ flags any iDRAC with a weak password.

Results

Number of servers | CloudIQ Cybersecurity Policy | Manual Checking
1 | 2 min 58 sec | 5 min 56 sec
10 | 2 min 58 sec | 59 min
100 | 2 min 58 sec | 9 hours 53 min
500 | 2 min 58 sec | 49 hours 26 min
1000 | 2 min 58 sec | 98 hours 53 min

Table 1. Results of Testing

 

Summary

Our testing showed that automation using the Dell CloudIQ for PowerEdge cybersecurity policy engine brought major benefits in time efficiency, repeatability, predictability, and of course, peace of mind. The benefits also dramatically increased as we extrapolated the number of servers in the testing data.

 

References

CloudIQ on Dell.com - for data sheets and demo videos

Take Control of Server Cybersecurity with Intelligent Cloud-Based Monitoring Blog

Building and Tracking Dell CloudIQ Cyber Security Policies for PowerEdge Servers Video

Technical Knowledge Page For OpenManage Enterprise CloudIQ Plugin

Additional Cybersecurity Related Solutions from Dell

 

 


PowerEdge CloudIQ cybersecurity

Harden Your Server Cybersecurity With Dell CloudIQ

Mark Maclean, Kyle Shannon

Mon, 16 Jan 2023 15:08:21 -0000


Summary

It can take years for an organization to build a good reputation with its customers, and only a few minutes of a cybersecurity-related incident to ruin it. Cybersecurity teams and server administrators must use every tool in their armory to harden infrastructure. Here is a feature of Dell CloudIQ that every Dell PowerEdge customer should know about. This Direct from Development (DfD) tech note describes the cybersecurity capabilities for PowerEdge servers that are built into CloudIQ. CloudIQ is a cloud-based AI/ML monitoring and predictive analytics application for the Dell infrastructure product portfolio. Hosted in the secure Dell IT Cloud, CloudIQ collects and analyzes health, performance, and telemetry data to pinpoint risks and to recommend actions for fast problem resolution.

Introduction

Dell CloudIQ offers a cybersecurity feature that now includes Dell PowerEdge servers. The cybersecurity feature built into CloudIQ lets customer server teams build a policy called an evaluation plan. This evaluation plan is built from a number of ready to use “click to pick” configuration criteria tests. This list of configuration settings and values is based on Dell Technologies best practices and the American NIST (National Institute of Standards and Technology) cybersecurity framework.

 An approach for rapid results

A specialist with the right skills, who understands the exact security configuration settings and their correct values, could build a server configuration profile (SCP) and use it directly with the iDRAC or the OME configuration template feature to set server configurations. However, CloudIQ offers a much quicker and more prescriptive method to implement a cybersecurity assessment policy that is built on Dell’s recommended settings and values. To further streamline the cybersecurity process, CloudIQ can aggregate multiple OME instances, offering one consolidated view of servers across many locations. Some organizations may choose to use both OME and CloudIQ to demonstrate the separation of configuration compliance and security management.


Figure 1. Cybersecurity status summary from the CloudIQ Overview page

This cybersecurity tile on the CloudIQ overview page provides an aggregated risk level status view, breaking down the number of systems in each risk category and the total number of detected issues. The risk is determined by the severity and the number of issues per server.

For example, a server with one or more high risk problems is categorized as high risk. Another server with more than five non-high risks, of which one is a medium issue, would also be categorized as high risk. 
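The two examples above can be written as a small rule check. This sketch encodes only those two stated examples; CloudIQ's complete scoring rules, including how the lower risk levels are assigned, are not given in this note, so the function name and the severity labels used here are assumptions.

```python
from collections import Counter


def is_high_risk(issue_severities):
    """Apply the two example rules to a list of severities such as ["medium", "low"].

    Rule 1: one or more high-risk issues makes the server high risk.
    Rule 2: more than five non-high issues, at least one of them medium,
            also makes the server high risk.
    """
    counts = Counter(issue_severities)
    if counts["high"] >= 1:
        return True
    non_high = len(issue_severities) - counts["high"]
    return non_high > 5 and counts["medium"] >= 1
```

For instance, `is_high_risk(["medium"] + ["low"] * 5)` is true under the second rule, while six low-severity issues on their own are not enough to satisfy either rule.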

Identify risks fast

The system risk dashboard classifies each server with a policy applied, displaying each server in its own card with the cybersecurity risk level status. This helps customers quickly prioritize actions and speed time to resolution.


Figure 2. Cybersecurity System Risk all systems dashboard 

Beyond the dashboard, the security assessment status displays the details for each server, with recommended actions to return any deviated security configuration to the preferred state. The donut chart displays how many rules have been selected, as a percentage of the total tests in the risk evaluation plan that is assigned to the particular server.

Figure 3. Cybersecurity Risk details and recommendations

On the system detail page, under the cybersecurity tab, are details about the evaluation plan and its status. The bottom of the page has two tabs: Cybersecurity Issues, detailing each non-compliant element with its corrective action, and Evaluation Plan, displaying the entire plan and the selection status of each test.

Figure 4. Test selection

CloudIQ users can also select to receive a Daily Digest email, including a Cybersecurity status summary.

Figure 5.         CloudIQ Daily Digest email

 

Enablement and security

As you would expect, many security access controls are built into CloudIQ around administrator and user accounts. There are two Cybersecurity roles built into CloudIQ: Cybersecurity Admin and Cybersecurity Viewer. These roles can be assigned by accounts that have CloudIQ administrator rights.

Figure 6. RBAC setup

 To support cybersecurity for PowerEdge within CloudIQ, customers must be running OpenManage Enterprise 3.9 or higher, with the CloudIQ plugin 1.1 or higher enabled. All servers require Dell ProSupport coverage and must already be discovered by OME.

PowerEdge cybersecurity evaluation plan test elements

The following table lists each test criterion and the test plan family to which it belongs.

Family | Title
System & Communications | IPMI over LAN interface is disabled
System & Communications | IPMI Serial over LAN is disabled
System & Communications | Virtual Console encryption is enabled
System & Communications | Virtual Media encryption is enabled
System & Communications | Auto-Discovery is disabled
System & Communications | VLAN capabilities of the iDRAC are enabled
System & Communications | iDRAC Web Server has TLS 1.2 or TLS 1.3 enabled
System & Communications | iDRAC Web Server HTTP requests are redirected to HTTPS requests
System & Communications | Virtual Console Plug-in type is enabled
System & Communications | iDRAC is using the dedicated NIC
System & Communications | iDRAC Web Server has TLS 1.2 or TLS 1.3 enabled
Access Control | IP Blocking is enabled
Access Control | VNC server is disabled
Access Control | The SNMP agent is configured for SNMPv3
Access Control | Quick Sync Read Authentication to the server is enabled
Access Control | SSH is disabled
Access Control | User Generic LDAP authentication on iDRAC is enabled
Access Control | User Active Directory authentication on iDRAC is enabled
Configuration Management | USB Ports are disabled
Configuration Management | Telnet protocol is disabled [1]
Configuration Management | System Lockdown is enabled
Configuration Management | Configure iDRAC from the BIOS POST is disabled
Audit & Accountability | NTP time synchronization is enabled
Audit & Accountability | NTP is secured
Audit & Accountability | Remote Syslog is enabled
System & information integrity | Local Config Enabled iDRAC configuration on Host system is disabled
System & information integrity | Secure Boot is enabled
Identification & Authentication | Password has a minimum score of Strong Protection
Identification & Authentication | LDAP Certificate validation is enabled
Identification & Authentication | Active Directory Certificate validation is enabled
Identification & Authentication | iDRAC Webserver SSL Encryption using 256 bit or higher
Identification & Authentication | iDRAC Web Server - SCEP is enabled

[1] Starting with iDRAC firmware release version 4.40.00.00, the telnet feature is removed from iDRAC.

Summary

Unlike the typical IT team member, CloudIQ doesn’t need to eat, sleep, or go on holiday, so organizations can rely on CloudIQ cybersecurity policies to continuously monitor for non-compliant servers. Cybersecurity built into CloudIQ lets customers speed up the delivery of server security through automation of pre-defined tests and status visualization. This provides high levels of flexibility for server administrators, all while maintaining the governance and control that cybersecurity teams need to enforce. CloudIQ further reduces risk and improves IT productivity by displaying cybersecurity, plus the system health status of servers, and the wider Dell infrastructure portfolio—all together in the same convenient, cloud-based portal.

 

References

CloudIQ on Dell.com - for product information, demo videos and more

Take Control of Server Cybersecurity with Intelligent Cloud-Based Monitoring Blog

Building and Tracking Dell CloudIQ Cybersecurity Policies for PowerEdge Servers Video

Technical Knowledge Page For OpenManage Enterprise CloudIQ Plugin

Additional Cybersecurity Related Solutions from Dell


OpenManage modular servers systems management rack servers tower servers troubleshooting

OpenManage Enterprise Adds the Troubleshoot Option

Mark Maclean

Fri, 02 Dec 2022 16:26:32 -0000


What happens when you can’t get the restaurant staff’s attention? You have an error in connecting to the servers! That can also happen when discovering new servers in OpenManage Enterprise.

Fortunately, OpenManage Enterprise 3.9 has added a built-in troubleshooting toolkit to help diagnose problems during device discovery. This feature tests connectivity from the management appliance across the network to the managed devices.

When onboarding a fleet of devices, such as a group of iDRACs, into OpenManage Enterprise, how can administrators identify the problem if any devices are missing when a discovery job completes? For example, how can they quickly check whether SNMP is enabled and configured correctly on a remote iDRAC? How can they make sure they're using the right credentials for an iDRAC, or how do they know if OpenManage Enterprise can communicate with the target devices on the network? These difficulties and many other errors can now be resolved using the OpenManage Enterprise embedded troubleshooting option.

When you log into the OpenManage Enterprise console, you can find the embedded troubleshooting commands in the monitor menu. To run a test, select “troubleshooting” then enter the problem device IP address or hostname. Troubleshooting can start with a simple ping, then select the required protocol for deeper investigation, as shown here.

 

Beyond the basic ping that ensures that the target host can be “seen” on the network, the integrated troubleshooting also supports various protocols. Examples of these include WSMAN, REDFISH, and SNMP for testing the remote SNMP service and the community name on iDRACs. Operating system connectivity tests are also included, such as SSH for Windows and Linux. 

If a device cannot be reached, this might be caused by a firewall. To ensure that the correct ports are open, see the section Supported protocols and ports in OpenManage Enterprise in the OpenManage Enterprise 3.9 User's Guide which lists the required TCP/IP ports, with details about traffic direction and usage. 
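When a firewall is suspected, a quick independent check from another machine on the management network can confirm whether a TCP port is reachable at all. The sketch below is a generic Python example, not part of OpenManage Enterprise. The port numbers are commonly used defaults and are assumptions here, so confirm them against the user's guide for your environment. SNMP normally uses UDP and cannot be verified with a TCP connection test like this one.

```python
import socket

# Assumed default TCP ports; verify against the user's guide for your environment.
DEFAULT_PORTS = {"HTTPS (Redfish, WSMAN)": 443, "SSH": 22}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_device(host, ports=DEFAULT_PORTS):
    """Return {service_name: reachable} for each port in the mapping."""
    return {name: tcp_port_open(host, port) for name, port in ports.items()}
```

Calling `check_device()` with the address of a problem device returns a dictionary of service names and reachability flags. A successful connection shows only that the port is open, not that the credentials or the service configuration are correct.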

The goal is to have any server’s managed state be “Managed with Alerts”, as shown here. This state means that the discovery has completed successfully: the device is recognized as a server, and the SNMP service on the iDRAC has been configured correctly to send event traps to the OpenManage Enterprise management appliance.

In my role as a technical marketing engineer, I now regularly guide customers to this embedded troubleshooting tool to diagnose problems. This has often been a major step in understanding configuration issues and resolving any device discovery difficulties rapidly.

Resources

Learn more at: Support for Dell OpenManage Enterprise  

Author: Mark Maclean, OpenManage Technical Marketing Engineering





PowerEdge OpenManage systems management cybersecurity

Strengthen the Security Posture of your PowerEdge Servers

Kim Kinahan, Mark Maclean

Tue, 25 Oct 2022 19:27:27 -0000


We've heard it said “Give a hacker a 0-day vulnerability, and they will have access for a day; teach a hacker to phish, and they will have access for life.” That made us smile. However, at Dell Technologies we take security very seriously with the mindset that security should be built in, not an add on. In our roles at Dell, we focus on the server management portfolio and we have created a number of tools to help organizations strengthen the security posture of PowerEdge servers.

 

Starting with CloudIQ, our cloud-based AIOps infrastructure analytics offering, we incorporate a cybersecurity engine that includes a selection of click-to-enable security policies for PowerEdge servers, based on Dell best practices. We recently published two DfD (Direct from Development) papers:

 

*Projected outcomes based on Dell internal analysis of results of one and ten servers, customer results may vary.

Turning to on-premises management, OpenManage Enterprise (OME), Dell’s server management solution, scales up to 8000 nodes. OME provides rich server configuration drift detection and remediation through management of the server configuration profiles accessed from each individual server’s iDRAC. For an overview of that feature, and details about firmware versions and the firmware configuration process, see Improve Operational Efficiency Through OME Server Drift Management.

References

Authors: Kim Kinahan and Mark Maclean, PowerEdge Technical Marketing Engineering




PowerEdge VMware vCenter OpenManage

New OpenManage Enterprise Advanced+, Ready to Bring New Customer Benefits

Mark Maclean

Wed, 10 Aug 2022 19:10:25 -0000


Recently I heard a joke: How many developers does it take to change a light bulb? None, it's a hardware problem. Historically, Dell has been perceived as a hardware vendor. This means that some customers still have not realized the many features and benefits that our Dell-developed OpenManage server management software portfolio delivers.

OpenManage Enterprise (OME), Dell's on-premises server lifecycle management console, is core to Dell's server management solutions. Since its release in September 2018, OME has continued to grow in functionality, reducing the number of separate standalone tools and consoles required to manage the lifecycle of Dell PowerEdge servers.

This management solution supports Dell's strategy of unifying, streamlining and delivering automation. OpenManage Enterprise manages approximately 50% of all Dell servers currently deployed, highlighting how valuable and useful customers find this solution*. The standard version of OME is free, with advanced features such as server deployment and power management requiring additional licenses.

 

Starting June 20th, 2022, Dell has added a new Advanced+ license that enables even more OME functionality. It includes the license for the new OME plugin for VMware integration, OpenManage Enterprise Integration for VMware vCenter (OMEVV), and bundles in the ServiceNow integration license. It will also include the OpenManage Enterprise Microsoft System Center Integration plugin (OMEMSSC), which will be available early in 2023.

Here’s a summary of OME features and licensing:

OME Free

  • Network discovery
  • Hardware & firmware inventory
  • Health monitoring
  • Alerts and actions
  • Firmware updates
  • Dell driver update (Windows)
  • 3rd party MIB import support
  • Warranty status
  • Built in and custom reporting
  • OpenManage Mobile support
  • OME Services plugin
  • OME CloudIQ plugin
  • OME Update Manager plugin

OME Advanced (includes all OME Free features)

  • Power manager plugin
  • Server template deployment
  • Configuration compliance
  • Auto deployment
  • MX template deployment
  • IOA provisioning including VLAN

OME Advanced+ (includes all OME Advanced features)

  • ServiceNow integration
  • VMware OMEVV plugin
  • OMEMSSC plugin (planned in 2023)

Note: These licenses are tied to an individual server and hosted on the iDRAC but are different from iDRAC licensing (such as iDRAC Datacenter) because they do not enable iDRAC features. Instead, they enable external software and features such as OME, as shown in the table above.   
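Because each license tier includes everything in the tier below it, the feature set for a server can be thought of as cumulative. The sketch below models that inheritance with a small, illustrative subset of features taken from the descriptions in this blog; the data structure and function names are assumptions and do not reflect how OME itself evaluates licenses.

```python
TIER_ORDER = ["Free", "Advanced", "Advanced+"]

# Features newly unlocked at each tier (illustrative subset only).
NEW_FEATURES = {
    "Free": {"Health monitoring", "Firmware updates"},
    "Advanced": {"Server template deployment", "Power Manager plugin"},
    "Advanced+": {"ServiceNow integration", "VMware OMEVV plugin"},
}


def features_for(tier):
    """Return every feature available at a tier, including lower tiers."""
    features = set()
    for name in TIER_ORDER[: TIER_ORDER.index(tier) + 1]:
        features |= NEW_FEATURES[name]
    return features


def is_enabled(tier, feature):
    """Return True if the feature is available to a server licensed at this tier."""
    return feature in features_for(tier)
```

With this model, a server licensed for Advanced+ has the Free and Advanced features as well, while a server on the free tier does not have the OMEVV plugin.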

Conclusion

As server administration teams are asked to manage more infrastructure, in less time than ever before, it is crucial for these teams to leverage any new solutions to drive efficiency.

To enable the new features and benefits of the Advanced+ license, including OMEVV, ask your Dell sales team about OMEVV and for a quote for the OME Advanced+ license.  

* The 50% claim is based on attached license sales and OME downloads.

Author: Mark Maclean, PowerEdge Technical Marketing Engineering


PowerEdge security OpenManage

OpenManage Enterprise: Security Built In

Mark Maclean

Sun, 10 Jul 2022 15:50:17 -0000


I've heard it said that the two biggest cybersecurity fears that customer security teams have are: everyone who works at the company and everyone who doesn't. 

With that in mind, this blog describes the most common security features designed into OpenManage Enterprise, Dell's on-premises server lifecycle management solution.

So let’s review the security built into Dell OpenManage Enterprise (or “OME” for short). OME has many security features to protect data held within the appliance and to guard against unauthorized access and use. The Dell server management team aims to provide best-in-class, on-premises, one-to-many PowerEdge server management capabilities with OME, and ensures that these can be used while meeting customers’ security requirements.

In December 2021, Dell Technologies released the OpenManage Enterprise 3.8.4 update with a mitigation for the Apache Log4j Java vulnerability. This vulnerability was a catalyst for many customers to conduct a broader security review of many commonly used IT tools and solutions.

Since then, Dell has released OME 3.9, which includes an updated plugin for Dell CloudIQ with the new PowerEdge Cybersecurity feature (see the video Building & Tracking Dell CloudIQ Cyber Security Policies for PowerEdge Servers).

Secure foundations

OpenManage Enterprise (OME) is a systems management appliance delivered in a virtual machine format, ready to be deployed. This virtual appliance is based on hardened Security-Enhanced Linux (SELinux) with an internal firewall configured. Policies ensure data protection and managed access to the OME workflows. OME stores all sensitive data encrypted with an OME-generated encryption key. All user credentials are stored with a one-way hash and cannot be decrypted. In addition to local user authentication, OME offers authentication by means of AD/LDAP or OpenID Connect. OME accepts user connections only over a TLS v1.2 channel and redirects all HTTP requests to HTTPS to ensure that communications follow a secure channel.
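To illustrate what "stored with a one-way hash" means in practice, here is a minimal Python sketch. It is not OME's implementation: the blog does not say which algorithm OME uses, so PBKDF2-HMAC-SHA256, the iteration count, and the function names `hash_password` and `verify_password` are all assumptions chosen for illustration.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumed work factor, not an OME setting


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). The digest cannot be reversed to the password."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per stored credential
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the digest from the supplied password and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, expected)
```

Verification works by hashing the login attempt with the stored salt and comparing digests, so the system never needs to store or recover the original password.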

Role and scope access control

OME has Role-Based Access Control (RBAC) that clearly defines the user privileges for the three built-in roles: Administrator, Device Manager, and Viewer. Scope-Based Access Control (SBAC) is an extension of the RBAC feature that allows an administrator to restrict a Device Manager role to a subset of device groups, called a “scope”. For more information about RBAC and SBAC, see Role and scope-based access control in OpenManage Enterprise on the Dell Support site.
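The way a role check and a scope check combine can be sketched in a few lines of Python. This is a simplified model, not OME's actual privilege system: the three role names come from the text above, while the privilege names, the `User` class, and the `is_allowed` function are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical privilege names; the real OME privilege model is richer.
ROLE_PRIVILEGES = {
    "Administrator": frozenset({"view", "manage_devices", "manage_users"}),
    "Device Manager": frozenset({"view", "manage_devices"}),
    "Viewer": frozenset({"view"}),
}


@dataclass(frozen=True)
class User:
    name: str
    role: str
    # Device groups a Device Manager is restricted to; empty means unrestricted.
    scope: FrozenSet[str] = frozenset()


def is_allowed(user: User, privilege: str, device_group: str) -> bool:
    # RBAC: the role must grant the privilege at all.
    if privilege not in ROLE_PRIVILEGES.get(user.role, frozenset()):
        return False
    # SBAC: a scoped Device Manager may act only on groups within the scope.
    if user.role == "Device Manager" and user.scope:
        return device_group in user.scope
    return True
```

For example, a Device Manager created with `scope=frozenset({"rack-a"})` passes the check for `"rack-a"` but is refused for any other group, even though the role itself grants `manage_devices`.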

Login policies 

OME security configuration settings allow customers to restrict incoming connections to the appliance. One option is to define an “allowed” IP range, so that only addresses within that range can access the appliance. A “lockout” policy can also be created, keyed on either a username or an IP address, to block repeated unauthorized access attempts.
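The two login policies can be modeled with a short Python sketch. It is illustrative only and does not reflect how OME implements them: the network range, the failure threshold, and all function names are assumptions.

```python
import ipaddress
from collections import defaultdict

ALLOWED_NETWORK = ipaddress.ip_network("10.20.0.0/24")  # example range, an assumption
MAX_FAILURES = 3  # example lockout threshold, an assumption

# Failed login counter, keyed by username or by source IP address.
failed_attempts = defaultdict(int)


def ip_is_allowed(source_ip: str) -> bool:
    """Allow-list check: accept only addresses inside the configured range."""
    return ipaddress.ip_address(source_ip) in ALLOWED_NETWORK


def record_failure(key: str) -> None:
    """Count one failed login for a username or source IP."""
    failed_attempts[key] += 1


def is_locked_out(key: str) -> bool:
    """Lockout check: block once the failure count reaches the threshold."""
    return failed_attempts.get(key, 0) >= MAX_FAILURES
```

A real lockout policy would normally also expire the counter after a time window; that is left out here to keep the sketch short.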

Network interfaces  

OpenManage Enterprise enables customers to add multiple network interfaces that allow for the configuration of a more secure management network. For example, applying different firewall rules to the interface can provide a greater level of security for the external-facing network interface.

In addition, OME supports customizing the TCP ports used for core HTTPS console access and for the NFS share. IPv6, including communications to and from iDRACs, is also supported as an option.

Auditing and logging

Auditing provides a historical view of the users and activity on the system. For example, an audit log entry is recorded when a group is assigned, access permissions change, or a user role is modified.

These events are written to the OME audit log files and can be exported in CSV format. In addition, if an administrator enables forwarding to a syslog system and configures an appropriate event rule, OME can forward event messages to the syslog server.
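As a rough picture of what a CSV export of audit entries looks like, the following Python sketch serializes a list of entries with the standard `csv` module. The column names and the `audit_to_csv` function are hypothetical and do not describe OME's actual export layout.

```python
import csv
import io

# Hypothetical column names; OME's real export columns may differ.
FIELDS = ["timestamp", "user", "category", "message"]


def audit_to_csv(entries):
    """Serialize audit entries (a list of dicts keyed by FIELDS) to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buf.getvalue()
```

Each entry becomes one row under a header line, which makes the export easy to load into a spreadsheet or a log analysis tool.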

To wrap up

This blog has highlighted some of the key methods and features Dell uses to keep OpenManage Enterprise secure, so that customers can use it with confidence.

Resources

To learn more about OME and related topics, see:

Author: Mark Maclean, PowerEdge Technical Marketing Engineering