SolutionPack for iDRAC PowerEdge
Wed, 01 Mar 2023 17:16:08 -0000
Summary
Dell Storage Resource Manager (SRM) provides comprehensive monitoring, reporting, and analysis for heterogeneous block, file, object, and virtualized storage environments. It enables you to visualize application-to-storage dependencies and to monitor and analyze configurations and capacity growth. Its visibility into the environment’s physical and virtual relationships helps ensure consistent service levels.
To enable storage administrators to monitor their physical and virtual compute environments, Dell provides SRM SolutionPacks. These include the SolutionPacks for Physical Hosts, Microsoft Hyper-V, IBM LPAR, Brocade FC Switch and Cisco MDS/Nexus with passive host discovery options, VMware vSphere & vSAN, and Dell VxRail.
With the new SolutionPack for iDRAC PowerEdge, we can monitor the status of server hardware components such as power supplies, temperature probes, cooling fans, and batteries. We can also gather historical information about electrical energy usage and other key performance indicators that show whether a server is functioning properly.
SRM cross-domain functionality
To illustrate SRM’s cross-domain functionality, we examine the most common use case, where Dell PowerEdge physical servers are deployed as part of VMware hypervisor clusters.
SolutionPack for VMware vSphere & vSAN provides capacity, performance, and relationship data for all discovered VMware components, such as VMs, hypervisors, clusters, and datastores, as well as their relationships with the fabric and back-end storage arrays. Here is one example of the end-to-end topology of a virtualized environment:
Figure 1. Example of end-to-end topology of a virtualized environment
To access data from the PowerEdge servers’ physical hardware components, we rely on the integrated Dell Remote Access Controller (iDRAC), a baseboard management controller embedded in PowerEdge servers.
iDRAC exposes hardware component data through several APIs, one of which is SNMP. The SRM SNMP collector, which is part of the SolutionPack for iDRAC PowerEdge, discovers iDRACs and pulls PowerEdge server data from them. This data includes electrical energy usage (Wh), probe temperature (°C), power supply output (W), and cooling device speed (RPM). It also includes the status of power supplies, batteries, cooling devices, and temperature probes, as well as server availability. SRM provides historical reports for all of these metrics, with data retention of up to 7 years for weekly aggregates.
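SRM's internal rollup logic is not described here. The idea behind weekly aggregates with a long retention window can still be sketched in Python. The sketch makes these assumptions:

- Raw samples arrive as (timestamp, value) pairs.
- Samples are grouped by ISO week.
- Weeks older than roughly 7 years are dropped.

The function name, the avg/min/max choice, and the sample values are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

# Assumption: "7 years" is approximated as 7 * 365 days.
RETENTION = timedelta(days=7 * 365)


def weekly_aggregates(samples, now):
    """Roll (timestamp, value) samples into per-ISO-week aggregates.

    Weeks that start before now - RETENTION are discarded, imitating a
    retention policy for weekly rollups.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        iso = ts.isocalendar()
        buckets[(iso[0], iso[1])].append(value)  # key: (ISO year, ISO week)

    cutoff = now - RETENTION
    rollups = {}
    for (year, week), values in sorted(buckets.items()):
        week_start = datetime.fromisocalendar(year, week, 1)  # Monday of that ISO week
        if week_start < cutoff:
            continue  # older than the retention window
        rollups[(year, week)] = {
            "avg": mean(values),
            "min": min(values),
            "max": max(values),
            "count": len(values),
        }
    return rollups


# Hypothetical probe temperature samples in °C.
samples = [
    (datetime(2010, 1, 4, 8, 0), 19.0),  # far outside the retention window
    (datetime(2023, 2, 27, 8, 0), 20.0),
    (datetime(2023, 2, 28, 8, 0), 24.0),
    (datetime(2023, 3, 1, 8, 0), 25.0),
]
for key, stats in weekly_aggregates(samples, now=datetime(2023, 3, 1, 12, 0)).items():
    print(key, stats)
```

Keeping only coarse weekly rollups for the long tail is a common way to bound storage while preserving multi-year trends. A real collector would typically also keep finer-grained data for shorter periods.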
With data available from the iDRAC PowerEdge, VMware vSphere & vSAN, and relevant fabric and storage array SolutionPacks, users can navigate seamlessly from reports on physical server hardware components to reports that show the physical server in the context of the broader SAN environment.
SolutionPack for iDRAC PowerEdge data collection and alerts
Let’s examine the component status data, performance data, and alerts provided by the SolutionPack for iDRAC PowerEdge.
Status and performance data
Initial Card View and Table View
The Summary page Card View and Table View for PowerEdge servers show the status of hardware components (temperature probes, cooling devices, battery, power supply), server availability, daily electrical energy usage (kWh), energy cost ($), and daily carbon emission (kgCO2e). Energy cost and carbon footprint metrics are calculated based on server location. In the following example, daily carbon emission differs significantly between Poland and Germany, even though the difference in daily energy usage is small. The same applies to energy cost.
Figure 2. Card view of hardware component status
Figure 3. Table view of hardware component status (first 10 columns)
Figure 4. Table view of hardware component status (final columns—continuation of preceding figure)
Energy cost and carbon emissions per country are calculated dynamically based on data enrichment enabled on SRM collectors. Metrics collected from each iDRAC are automatically tagged with location, carbon intensity, and energy cost properties. Here is an example of data enrichment configuration from the SRM admin UI:
Figure 5. SRM admin UI showing data enrichment configuration
CSV files that contain values for energy cost and carbon intensity per country are available publicly and can be transferred automatically through FTP to SRM collectors as part of the data enrichment process. Here is a CSV file excerpt that contains kWh cost ($) per country:
Figure 6. Excerpt of kwh-cost-per-country CSV file
And here is a CSV file excerpt that contains carbon intensity per kWh per country:
Figure 7. Excerpt of carbon-intensity-by-country CSV file
The data enrichment CSV file that maps each device to its location is specific to every customer.
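The enrichment described above amounts to a lookup join. Each device maps to a location, and each location maps to an energy cost and a carbon intensity. The following Python sketch is a minimal illustration under stated assumptions, not SRM's implementation:

- The CSV column names (`device`, `location`, `country`, `cost_per_kwh`, `g_co2e_per_kwh`) are made up.
- The device names and all numeric values are made up.
- Carbon intensity is assumed to be in grams of CO2e per kWh.

```python
import csv
import io

# Hypothetical CSV contents; real files, column names, and values will differ.
DEVICE_CSV = "device,location\nidrac-pl-01,Poland\nidrac-de-01,Germany\n"
COST_CSV = "country,cost_per_kwh\nPoland,0.20\nGermany,0.40\n"
INTENSITY_CSV = "country,g_co2e_per_kwh\nPoland,700\nGermany,350\n"


def load_table(text, key):
    """Parse CSV text into {value of the key column: row dict}."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def enrich(daily_kwh_by_device, devices, costs, intensities):
    """Tag each device's daily energy usage with location, cost, and emissions."""
    enriched = {}
    for device, kwh in daily_kwh_by_device.items():
        location = devices[device]["location"]
        price = float(costs[location]["cost_per_kwh"])              # $ per kWh
        intensity = float(intensities[location]["g_co2e_per_kwh"])  # gCO2e per kWh
        enriched[device] = {
            "location": location,
            "daily_kwh": kwh,
            "energy_cost": kwh * price,
            "daily_kgco2e": kwh * intensity / 1000,  # grams -> kilograms
        }
    return enriched


result = enrich(
    {"idrac-pl-01": 10.0, "idrac-de-01": 11.0},
    load_table(DEVICE_CSV, "device"),
    load_table(COST_CSV, "country"),
    load_table(INTENSITY_CSV, "country"),
)
for device, tags in result.items():
    print(device, tags)
```

With these made-up inputs, the two devices use similar amounts of energy. The device tagged with the higher carbon intensity ends up with nearly twice the daily emissions.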
End-to-end topology map
From the initial Card View or Table View, you can drill down to the PowerEdge server end-to-end topology map. This is a host-based landing page where you can see the server’s relationship with the rest of the SAN components, as well as server attributes, performance, capacity, alerts, and inventory data. This is an example of SRM’s powerful capability to aggregate and visualize data from multiple contexts within the same reporting UI.
Figure 8. End-to-end topology map
iDRAC PowerEdge Inventory report
The iDRAC PowerEdge Inventory report shows servers’ hardware component names, quantities, server hostname, serial number, operating system version, model, and IP address:
Figure 9. Inventory report (first six columns)
Figure 10. Inventory report (final columns—continuation of preceding figure)
Drilling down from the preceding table leads to the daily status dashboard of a selected server’s hardware components. Here are a few examples:
Figure 11. Status of cooling devices
Figure 12. Power supply output watts
Figure 13. Energy usage (Wh)
iDRAC PowerEdge Performance report
The iDRAC PowerEdge Performance report shows key metric values for servers’ hardware components. These include probe temperature (°C) with its lower and upper thresholds, and cooling device speed (RPM) with its critical and non-critical thresholds. Selecting a row interactively plots its historical performance data in the charts below the table, including server electrical energy usage (Wh), probe temperature (°C), and cooling device speed (RPM).
Figure 14. Trend chart—Electrical energy usage (Wh)
Figure 15. Trend chart—Probe temperature (°C) values plotted alongside threshold values
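The following Python sketch makes the relationship between readings and thresholds concrete. It classifies a probe reading against optional lower and upper critical and non-critical thresholds. It is an illustration only. Two things in it are assumptions rather than documented iDRAC behavior: whether a reading exactly at a threshold counts as a breach, and the threshold values themselves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thresholds:
    """Probe thresholds; any of them may be absent (None)."""
    lower_critical: Optional[float] = None
    lower_non_critical: Optional[float] = None
    upper_non_critical: Optional[float] = None
    upper_critical: Optional[float] = None


def _outside(reading: float, low: Optional[float], high: Optional[float]) -> bool:
    """True if the reading is at or beyond either bound (inclusive, an assumption)."""
    return (low is not None and reading <= low) or (high is not None and reading >= high)


def classify(reading: float, t: Thresholds) -> str:
    """Classify a probe reading as 'critical', 'non-critical', or 'ok'."""
    if _outside(reading, t.lower_critical, t.upper_critical):
        return "critical"
    if _outside(reading, t.lower_non_critical, t.upper_non_critical):
        return "non-critical"
    return "ok"


# Hypothetical threshold values, not taken from any specific server.
inlet = Thresholds(lower_critical=-7, lower_non_critical=3, upper_non_critical=42, upper_critical=47)
fan = Thresholds(lower_critical=480, lower_non_critical=840)  # RPM; fans usually have lower bounds only
print(classify(25.0, inlet), classify(44.0, inlet), classify(600.0, fan))
```
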
The following trend chart shows cooling device (RPM) values plotted on the same chart with the active alerts relevant to the cooling device. The alert is displayed as a black dot with pop-up details of the issue that caused the alert. This feature greatly improves troubleshooting and is another example of SRM’s powerful capability to aggregate and visualize data from multiple contexts within the same reporting UI.
Figure 16. Trend chart—Cooling device (RPM) values plotted on the same chart with the active alerts relevant to the cooling device
The following bar charts show Carbon Emission, Energy Cost ($), Cooling (RPM), Energy Usage (kWh), and Temperature (°C) per location during the last month. You can drill down from each bar chart into per-location reports that show the top 10 contributing items per device type (hypervisor, host) and per server.
Figure 17. Carbon Emission and Energy Cost bar charts
Figure 18. Energy Usage and Temperature bar charts
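The top-10 drill-down is essentially a group-by followed by a partial sort. Here is a minimal Python sketch. It assumes each row is a dict with hypothetical `location`, `device_type`, `device`, and metric keys, and it is not how SRM computes its reports internally.

```python
import heapq
from collections import defaultdict


def top_contributors(rows, metric, n=10):
    """Return the n largest items for each (location, device_type) group."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["location"], row["device_type"])].append((row["device"], row[metric]))
    # heapq.nlargest returns the items already sorted from largest to smallest.
    return {key: heapq.nlargest(n, items, key=lambda item: item[1]) for key, items in groups.items()}


# Hypothetical monthly energy usage (kWh) for 12 hosts in one location.
rows = [
    {"location": "Poland", "device_type": "host", "device": f"host-{i:02d}", "energy_kwh": 100.0 + i}
    for i in range(12)
]
top = top_contributors(rows, "energy_kwh")
print(top[("Poland", "host")][:3])
```

`heapq.nlargest` avoids fully sorting each group, which matters only when groups are large. For a few hundred devices a plain `sorted(...)[:n]` works just as well.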
Alerts
The iDRAC PowerEdge Operations report shows currently active alerts received from iDRAC as SNMP traps. The SolutionPack contains 80 certified alert definitions that cover iDRAC System Health and Storage category alerts, including AmperageProbe, Battery, Cable, CMC, Fan, FC, LinkStatus, MemoryDevice, Network, OS, PhysicalDisk, PowerSupply, PowerUsage, TemperatureProbe, TemperatureStatistics, VoltageProbe, LiquidCoolingLeak, and others.
You can enable any or all alerts on each iDRAC under Configuration > System Settings > Alert Configuration > Alerts. You can configure SNMP trap receivers under Configuration > System Settings > Alert Configuration > SNMP Traps. In this case, the SNMP trap receiver is the SRM collector server.
Figure 19. Active alerts on iDRAC PowerEdge Operations report
By right-clicking an alert row, you can acknowledge, assign, close, take ownership of, or assign a ticket ID to the alert.
Figure 20. Acting on an alert
Clicking an alert row opens a detailed report about the alert. The SRM alerting module can also forward selected alerts to external applications, such as ServiceNow ITSM through a Webhook API, or to fault management applications through an SNMP trap or email.
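SRM's webhook forwarding is configured in its alerting module, and its exact payload is not covered here. Purely to illustrate what an ITSM integration involves, the following sketch turns an alert into a ServiceNow incident and posts it to the ServiceNow Table API (`/api/now/table/incident`). The alert field names, the severity-to-urgency mapping, and the device name are hypothetical. Only `build_incident` runs in the example, so no network access is needed.

```python
import base64
import json
import urllib.request

# Hypothetical mapping from alert severity to ServiceNow urgency ("1" = high).
URGENCY = {"critical": "1", "major": "2", "minor": "3", "warning": "3"}


def build_incident(alert):
    """Translate an alert dict (hypothetical field names) into an incident body."""
    severity = alert["severity"].lower()
    return {
        "short_description": f"[{severity.upper()}] {alert['device']}: {alert['message']}",
        "description": json.dumps(alert, sort_keys=True),  # keep the full alert for context
        "urgency": URGENCY.get(severity, "3"),
    }


def forward_to_servicenow(alert, instance_url, user, password):
    """POST the incident to the ServiceNow Table API and return the parsed response."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"{instance_url}/api/now/table/incident",
        data=json.dumps(build_incident(alert)).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


alert = {"device": "pe-r750-01", "severity": "Critical", "message": "Fan 3 RPM below critical threshold"}
print(build_incident(alert)["short_description"])
```
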
You can navigate directly from the alerts report to the affected server’s landing page by clicking the device name link in the Device column of the All Alerts report. SRM relates alert-specific data to the time-series data originating from the same device, so you can navigate seamlessly through the corresponding reports. The following figure shows an affected server’s summary report, with the topology and, below it, an Operations section showing the server’s active alerts.
Figure 21. Server summary report with topology and active alerts
Conclusion
SRM’s powerful framework allows storage administrators to easily integrate environmental data for PowerEdge physical servers into the existing end-to-end SAN inventory, performance, capacity, and alert reports. SRM reduces the time that is required to identify the cause of issues occurring in the data center.
With the new SolutionPack for iDRAC PowerEdge, administrators can monitor PowerEdge hardware components and obtain historical information about energy usage and other key performance indicators.
Supported platforms
The SolutionPack for iDRAC PowerEdge supports:
- Dell iDRAC MIB v4.3
- Dell PowerEdge models listed at https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=96cdj#SupportedOs
Author: Dejan Stojanovic
Related Blog Posts
Sweet 16 ways OpenManage helps customers to maximize their investment in PowerEdge
Wed, 12 Apr 2023 01:27:49 -0000
As we at Dell announce details of the new wave of PowerEdge servers (details here), we want to highlight 16 examples of how the OpenManage portfolio of systems management software enhances our server range. Like I always say, where there are servers, there are server management requirements.
The OpenManage portfolio exists to save customers of any size time and money by eliminating high-touch, manual steps and delivering efficiency. The portfolio is designed to scale, with integrated security, and Dell’s OpenManage strategy is to give customers choice through orchestration, automation, and integration, leveraging APIs based on open standards.
#1 – Server health monitoring—This is server management 101. However, because PowerEdge servers are the foundation of the modern data center, this basic element is critical to application and service uptime. OpenManage solutions offer many ways to get this information. You can get it directly from the agent-free iDRAC (GUI, SNMP, SMTP, syslog, API, and more). You can also get it through the Dell OpenManage Enterprise console, OpenManage Mobile, Dell CloudIQ, VMware vCenter integration, Microsoft System Center, and leading third-party management software such as Nagios.
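As an example of the API route, iDRAC9 exposes system health through the DMTF Redfish REST API. The following Python sketch reads the standard `Status.Health` and `Status.HealthRollup` properties of the `/redfish/v1/Systems/System.Embedded.1` resource, using only the standard library. Note the following:

- The host name and credentials are placeholders.
- Disabling TLS verification is shown only for lab setups with self-signed certificates.
- The sample payload is hand-written and abbreviated.

```python
import base64
import json
import ssl
import urllib.request

SYSTEM_PATH = "/redfish/v1/Systems/System.Embedded.1"  # iDRAC9 system resource


def redfish_get(host, path, user, password, verify_tls=True):
    """GET a Redfish resource from an iDRAC using HTTP basic authentication."""
    context = ssl.create_default_context()
    if not verify_tls:
        # Lab use only: many iDRACs still present self-signed certificates.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"https://{host}{path}",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, context=context, timeout=30) as response:
        return json.load(response)


def summarize_health(system):
    """Pick the health-related fields out of a Redfish ComputerSystem resource."""
    status = system.get("Status", {})
    return {
        "model": system.get("Model"),
        "power_state": system.get("PowerState"),
        "health": status.get("Health"),
        "health_rollup": status.get("HealthRollup"),
    }


# Offline example with an abbreviated, hand-written payload.
sample = {"Model": "PowerEdge R750", "PowerState": "On",
          "Status": {"Health": "OK", "HealthRollup": "Warning", "State": "Enabled"}}
print(summarize_health(sample))
# Live use (hypothetical host and credentials):
# summarize_health(redfish_get("idrac.example.com", SYSTEM_PATH, "root", "password"))
```

`HealthRollup` reflects the worst state among the system's subordinate resources. Checking it catches component-level problems that `Health` alone may not show.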
#2 – Remote access to servers—If deep one-to-one control is needed for troubleshooting, deployment, configuration, console access, and so on, iDRAC is the answer. Dell's unique iDRAC9 offers out-of-band remote server connection. This includes firmware configuration, full remote control of the server console through the eHTML5 (sometimes called vKVM) GUI, virtual media, and server telemetry. iDRAC's agentless architecture offers server monitoring and control from anywhere without the need to install any software. There are many additional features, from basic power on/off control through the GUI, CLI, or API to advanced server profile configuration that ensures servers have the correct firmware configuration settings.
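Basic power control through the API can be sketched with the standard Redfish `ComputerSystem.Reset` action. The following Python sketch only builds the POST request, and the host and credentials are placeholders. Check the `ResetType@Redfish.AllowableValues` list on your iDRAC before relying on any particular reset type.

```python
import base64
import json
import urllib.request

RESET_PATH = "/redfish/v1/Systems/System.Embedded.1/Actions/ComputerSystem.Reset"
# Common Redfish ResetType values; the iDRAC advertises the exact list in
# "ResetType@Redfish.AllowableValues" on the system resource.
RESET_TYPES = {"On", "ForceOff", "GracefulShutdown", "GracefulRestart", "ForceRestart"}


def build_reset_request(host, user, password, reset_type):
    """Build (but do not send) a Redfish ComputerSystem.Reset POST request."""
    if reset_type not in RESET_TYPES:
        raise ValueError(f"unsupported ResetType: {reset_type}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"https://{host}{RESET_PATH}",
        data=json.dumps({"ResetType": reset_type}).encode("utf-8"),
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
        method="POST",
    )


request = build_reset_request("idrac.example.com", "root", "password", "GracefulShutdown")
print(request.get_method(), request.full_url, request.data)
# To send it: urllib.request.urlopen(request, context=<ssl context>, timeout=30)
```
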
#3 – Server deployment—Leveraging the automation integrated into OpenManage can greatly reduce the time between when a server is racked and powered on and when it goes live (time to value). For streamlined one-to-one deployments, the iDRAC features a Lifecycle Controller. The Lifecycle Controller rapidly configures elements such as RAID storage and populates deployments with up-to-date operating system drivers. iDRAC also features zero-touch deployment, which automatically downloads a server configuration profile (SCP). It can even complete an unattended operating system installation the first time the server powers up on a customer’s network. Beyond one-to-one solutions, OpenManage offers a broad range of deployment solutions:

- OpenManage Enterprise, which offers firmware setting configuration and supports OS-agnostic installation through ISO images
- Microsoft System Center integration
- Deeper, customizable VMware installations through OpenManage Enterprise for VMware vCenter

Finally, for customers using tools such as Ansible, Terraform, or Prometheus, OpenManage supplies integration packs and sample code that leverage Dell's APIs.
#4 – Manage and update firmware—There are multiple methods to update PowerEdge server firmware, depending on needs. Methods range from one-to-one updates using iDRAC/Lifecycle Controller to console-based methods for updating multiple servers. Leveraging large-scale automation, these tools can audit existing servers and compare them against online catalogs. They can then download and apply the correct updates quickly and consistently, with massive time savings compared to manual methods. One example is the integration into VMware using OpenManage Enterprise for VMware vCenter, which offers cluster-aware updates. It updates one cluster node at a time, using DRS to keep workloads up and running. Where servers are isolated or air-gapped, Dell supplies Repository Manager to build custom firmware catalogs and packaged ISOs for use by other Dell update tools. And, of course, Dell supplies an Ansible module that offers firmware updates to the DevOps user base.
#5 – Configuration drift detection—OpenManage Enterprise provides compliance features that detect, highlight, and remediate configuration drift issues, with simple processes for both firmware versions and firmware configuration settings.
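Conceptually, drift detection compares a baseline against what is actually configured. OpenManage Enterprise's compliance engine is far richer than this, but the core comparison can be sketched in Python as a dictionary diff. The setting names and values in the sketch are hypothetical.

```python
def detect_drift(baseline, current):
    """Return {setting: (expected, actual)} for every baseline setting that differs.

    Settings absent from the current inventory are reported with actual=None.
    """
    return {
        name: (expected, current.get(name))
        for name, expected in baseline.items()
        if current.get(name) != expected
    }


# Hypothetical setting names and values, for illustration only.
baseline = {"BIOS.Version": "1.9.2", "BIOS.BootMode": "Uefi", "iDRAC.NTP.Enable": "Enabled"}
current = {"BIOS.Version": "1.8.2", "BIOS.BootMode": "Uefi"}
for name, (expected, actual) in detect_drift(baseline, current).items():
    print(f"{name}: expected {expected!r}, found {actual!r}")
```

The same comparison covers both firmware versions and configuration settings, because each is just a named value checked against the baseline.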
#6 – Secure supply chain assurance—Dell’s Secured Component Verification (SCV) allows organizations to verify that their new servers are delivered with the same components that were installed at Dell Technologies’ manufacturing facility. The verification uses a cryptographically signed digital inventory certificate.
#7 – Power usage reporting (and carbon emissions calculations)—There are multiple ways to view server power consumption data, depending on needs and preferences. One way is to open the iDRAC web GUI; another is to retrieve the data with RACADM or Redfish scripts. iDRAC can also send data to the OpenManage Enterprise Power Manager plug-in. There, power data, including carbon emissions, is processed and grouped, and can be displayed, reported, and acted on. OpenManage Enterprise can also forward this information to CloudIQ for PowerEdge for additional analysis and visualization. For customers looking for maximum data, iDRAC9 can stream these power statistics as telemetry data to analytics solutions such as Splunk or the ELK Stack for real-time, in-depth analysis.
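For the Redfish route, power readings on iDRAC9 are available from the standard Redfish Power resource (`/redfish/v1/Chassis/System.Embedded.1/Power`), fetched with an authenticated HTTPS GET. The following Python sketch works on a hand-written, abbreviated payload with made-up values:

- It parses the standard `PowerControl` properties from the payload.
- It extrapolates the average draw to daily energy.

Newer Redfish schemas move this data to `PowerSubsystem` and `EnvironmentMetrics`, so which fields are available depends on the iDRAC firmware.

```python
POWER_PATH = "/redfish/v1/Chassis/System.Embedded.1/Power"  # resource to GET on iDRAC9


def power_summary(power):
    """Extract consumption figures from a Redfish Power resource payload."""
    control = power["PowerControl"][0]
    metrics = control.get("PowerMetrics", {})
    return {
        "consumed_w": control.get("PowerConsumedWatts"),
        "average_w": metrics.get("AverageConsumedWatts"),
        "min_w": metrics.get("MinConsumedWatts"),
        "max_w": metrics.get("MaxConsumedWatts"),
        "interval_min": metrics.get("IntervalInMin"),
    }


def estimated_daily_kwh(average_watts):
    """Extrapolate an average draw in watts to energy per day in kWh."""
    return average_watts * 24 / 1000


# Abbreviated, hand-written payload; a real response carries many more fields.
sample = {"PowerControl": [{"PowerConsumedWatts": 262,
                            "PowerMetrics": {"AverageConsumedWatts": 250, "MinConsumedWatts": 231,
                                             "MaxConsumedWatts": 305, "IntervalInMin": 60}}]}
summary = power_summary(sample)
print(summary, estimated_daily_kwh(summary["average_w"]))
```
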
#8 – Power usage control—Power consumption capping is integrated into iDRAC. OpenManage Enterprise Power Manager adds the ability to apply power caps to individual servers or groups of servers. Power caps can be permanent or scheduled for particular times or specific weekends. They can also be applied ad hoc in response to an incident that requires reduced power consumption, such as when running on UPS or on-premises generators.
#9 – Thermal event management—Thermal monitoring, alerting, and even shutdown are integrated into PowerEdge servers through the iDRAC. OpenManage Enterprise Power Manager augments these capabilities with powerful Emergency Power Reduction (EPR) policies, which apply a power cap policy that throttles a group of servers. EPR policies can be used as a permanent or scheduled method to limit server power consumption. They can also serve as an immediate, temporary measure during a thermal emergency, for example, a CRAC unit failure.
#10 – Performance monitoring—Server performance telemetry data can be obtained from the iDRAC GUI, CLI, and API. OpenManage Enterprise Power Manager can consume and report this data, automatically highlighting idle servers. Telemetry information can also be passed to third-party solutions such as Splunk. Finally, CloudIQ can analyze this information and present it in a dashboard with graphical visualizations. For key metrics, it can also highlight anomalies based on historical seasonality data.
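Power Manager's idle-server criteria are not spelled out here. As a hedged illustration of the idea, the following Python sketch flags servers whose average CPU utilization stays under a threshold. The 5% threshold, the 24-sample minimum, and the server names are assumptions.

```python
from statistics import mean


def find_idle_servers(cpu_utilization, threshold_pct=5.0, min_samples=24):
    """Return servers whose mean CPU utilization stays below threshold_pct.

    cpu_utilization maps server name -> list of utilization samples (percent).
    Servers with fewer than min_samples samples are skipped as inconclusive.
    """
    return sorted(
        server
        for server, samples in cpu_utilization.items()
        if len(samples) >= min_samples and mean(samples) < threshold_pct
    )


# Hypothetical hourly samples for one day.
usage = {
    "pe-r650-01": [2.0] * 24,   # essentially idle
    "pe-r650-02": [35.0] * 24,  # busy
    "pe-r650-03": [1.0] * 6,    # too little data to judge
}
print(find_idle_servers(usage))
```

A mean alone can hide short bursts of activity. A production rule would likely also look at peaks or at other signals such as network traffic.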
#11 – Enterprise secure key management—iDRAC supports the standards-based Key Management Interoperability Protocol (KMIP) to secure data at rest on self-encrypting SSDs or hard drives, with the encryption keys held by an external key management system. Solutions such as Thales CipherTrust Manager offer centralized key management for multiple PowerEdge servers and many other products.
#12 – Detailed server telemetry—iDRAC9 provides more than 180 metrics of advanced server hardware telemetry. Many of these can be reported and visualized in CloudIQ or streamed to analytics solutions such as Splunk. This telemetry gives customers detailed information to help avoid failure events, optimize server operation, and enhance cyber resiliency.
#13 – Automatic call and ticket creation—Options range from the Dell services plug-in for OpenManage Enterprise to ServiceNow integration through Dell’s integration pack. The plug-in creates a support case directly with Dell without any human intervention. Alternatively, OpenManage Enterprise offers a flexible set of actions based on monitored SNMP events, including running scripts, forwarding events through SNMP or syslog, and sending email. This automation can be used to pass information to a third-party incident management solution.
#14 – Capacity planning—The iDRAC provides a large number of performance statistics. The Dell CloudIQ AIOps solution can collect and analyze this data to produce a forward-looking capacity analysis of items such as CPU usage, based on real historical values for a given server and workload.
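CloudIQ's forecasting models are not described in this post. The following Python sketch uses a deliberately simpler stand-in: an ordinary least-squares linear trend, which is not CloudIQ's method. It estimates how many days remain before a server's CPU utilization trend reaches a threshold. The sample data and the 80% threshold are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_threshold(days, usage_pct, threshold_pct):
    """Days after the last sample until the linear trend reaches threshold_pct.

    Returns None for a flat or declining trend, and 0.0 if the trend is
    already at or above the threshold.
    """
    slope, intercept = linear_fit(days, usage_pct)
    if slope <= 0:
        return None
    crossing_day = (threshold_pct - intercept) / slope
    return max(0.0, crossing_day - days[-1])


# Hypothetical daily average CPU utilization (%) for one server.
days = [0, 1, 2, 3, 4]
cpu = [40.0, 42.0, 44.0, 46.0, 48.0]
print(days_until_threshold(days, cpu, 80.0))
```

A straight line ignores seasonality and workload changes, which is exactly what more sophisticated forecasting tries to capture. Treat the sketch only as a statement of the basic question being answered.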
#15 – Cloud-based infrastructure management—Dell's AIOps solution, CloudIQ, can consolidate multiple instances of OpenManage Enterprise. It can also integrate Dell storage, server, data protection, networking, HCI, and CI products. Hosted in Dell’s secure data center, CloudIQ combines proactive monitoring, machine learning, and predictive analytics to reduce risk, plan ahead, and improve productivity from core to edge.
#16 – Cybersecurity from concept to retirement—Dell Cyber Resilient Architecture 2.0 includes features such as iDRAC silicon-based root of trust, dynamic USB port management, UEFI Secure Boot, and signed firmware updates. All these features are controlled by OpenManage tools that let customers protect, detect, and recover in response to security threats.
We hope that this list has given you a few suggestions on how the OpenManage portfolio can help your organization. Servers are a vital element of organizations’ infrastructure and the foundation of modern business, and it’s critical to manage and monitor them to deliver visibility, productivity, and control. Server management tools not only make tasks easier, faster, and more consistent but also reduce failures and increase efficiency. Remember, don't just manage, automate.
Is your organization using all the features that Dell OpenManage offers and getting the maximum benefits from investing in PowerEdge servers? Ask your account manager for more details.
References
#2 Support for Integrated Dell Remote Access Controller 9 (iDRAC9)
#3 How to create and deploy a Server Template in OpenManage Enterprise (video)
#4 Updating Firmware and Drivers on Dell PowerEdge Servers
#5 Improve Operational Efficiency Through OME Server Drift Management
#6 Dell Technologies Secured Component Verification for PowerEdge
#7, #8, #9 Server Power Consumption Reporting and Management
#10 CloudIQ Provides Data Driven Server Management Decisions
#11 OpenManage Secure Enterprise Key Manager Solutions Brief
#12 Transform Datacenter Analytics with iDRAC9 Telemetry Streaming
#13 Support for OpenManage Integration with ServiceNow
#14 Talking CloudIQ: Capacity Monitoring and Planning
#15 CloudIQ: AIOps for Intelligent IT Infrastructure Insights
#16 Cyber Resilient Security in Dell PowerEdge Servers
Additional resources
- Dell server management portfolio: OpenManage microsite
- API catalog (interactive support resource): Dell Technologies Developer
- Ansible Python PowerShell module library and code examples: Dell Technologies GitHub
- Dell systems management offerings: Dell Systems Management Overview Guide
Explore Real-World Cases with the Dell SRM Interactive Demo
Thu, 17 Nov 2022 15:04:10 -0000
Summary
At Dell Technologies, we are proud to announce a new interactive demo for Storage Resource Manager (SRM), located here:
This interactive demo is based on SRM release 4.7.0.0, which introduces several new features, enhancements, and newly supported platforms.
Interactive Demo Info
The landing page of the interactive demo provides a summary of the use cases and features covered. This demo has the same look and feel as the actual HTML-based SRM user interface, where you can scroll up and down the page and click on each page object.
Dell SRM provides insight into data center operations from application to storage. Through automated discovery and reporting, Dell SRM breaks down the silos. Its use-case-driven user interface simplifies tasks such as:
- Capacity Planning
- Performance Analysis
- Configuration Compliance
- Chargeback
- Workload Analysis
There are eight independent interactive demo modules available, each of which covers a main SRM use case or feature:
- Enterprise Capacity Dashboard
- Capacity Planning What-If Scenario
- Performance Analysis - Host to LUN Troubleshooting
- Topology and End-To-End Relationships
- Chargeback Report by VirtualMachine
- Configuration Compliance Policies
- Configuration Compliance What-if Analysis
- Custom Report Wizard
Sample Screens from Interactive Demos
Here is a peek inside each of the eight demo modules:
1. Enterprise Capacity Dashboard
2. Capacity Planning What-If Scenario
3. Performance Analysis - Host to LUN Troubleshooting
4. Topology and End-To-End Relationships
5. Chargeback Report by VirtualMachine
6. Configuration Compliance Policies
7. Configuration Compliance What-if Analysis
8. Custom Report Wizard
Supported Platforms
The data available in this comprehensive eight-module demo comes from the following supported vendors and technologies:
Enjoy this demo and let us know how you like it!
Resources
Author: Dejan Stojanovic