OpenManage Enterprise - Customer Success Stories
Wed, 24 Apr 2024 15:43:00 -0000
We are often asked what the best tool is for managing Dell PowerEdge servers. In this blog, discover how both our in-house Dell IT team and Cambridge University, a long-term customer, use our server management solutions to manage thousands of PowerEdge servers, ultimately avoiding outages, boosting overall server productivity, reducing maintenance windows, and delivering increased operational efficiency.
How Dell IT excels in server management using Dell OpenManage
Dell’s in-house IT team manages over 18,000 PowerEdge servers. The fleet ranges from brand-new servers to five-year-old ones, resulting in a mix of server models and generations. These servers are located across eight major data centers globally. Workloads include Dell.com and back-office systems such as Dell’s order management system. In fact, Dell runs over 600 business applications. Many of these are mission critical, and an outage can have a major impact on customers, sales, and support, even stopping the production line.
Server hardware management is done via OpenManage Enterprise (OME), encompassing alerting, monitoring, firmware updating, and configuration deployment and management, as well as power consumption monitoring. Each data center has a dedicated OpenManage Enterprise instance responsible for approximately 2,500 servers.
Monitoring of server health events is covered by OME and its integration with ServiceNow, which automatically creates trouble tickets and routes them to the appropriate team for remediation. Power usage data is collected and monitored, then used to optimize the power load per rack cabinet and to flag underutilized servers showing lower-than-expected power draw.
To aid automation and the rapid distribution of firmware, updates are collected, tested, and released via a customized catalogue. These custom catalogues are assembled and tested by the Dell IT server management team and are consumed by OME to orchestrate server updates. Urgent updates that resolve security CVEs can be pushed out at will by OME following change management approval. The largest patch job the team has completed so far was an iDRAC firmware update for 14,500 servers under a single change request, demonstrating how well OME automation scales.
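The core decision behind a custom catalogue, namely which servers are behind the approved baseline, can be illustrated with a small sketch. Here the catalogue is a hypothetical, simplified dict mapping component name to approved version (not Dell's actual catalogue format), the inventory is likewise hypothetical, and versions are assumed to be purely numeric dotted strings. In practice OME performs this comparison itself when it checks devices against a catalogue baseline.

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Turn a dotted version such as '7.00.00.171' into a comparable tuple.

    Assumes every part is numeric (hypothetical simplification).
    """
    return tuple(int(part) for part in v.split("."))


def plan_updates(
    inventory: dict[str, dict[str, str]], catalogue: dict[str, str]
) -> dict[str, list[tuple[str, str, str]]]:
    """Return {server: [(component, installed, target), ...]} for every
    component whose installed version is older than the catalogue version."""
    plan: dict[str, list[tuple[str, str, str]]] = {}
    for server, components in inventory.items():
        pending = [
            (name, installed, catalogue[name])
            for name, installed in sorted(components.items())
            if name in catalogue
            and parse_version(installed) < parse_version(catalogue[name])
        ]
        if pending:  # only servers that actually need work appear in the plan
            plan[server] = pending
    return plan
```

Comparing versions as integer tuples rather than strings avoids ordering mistakes such as "10" sorting before "9".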
Security is built into Dell’s processes and tools. Microsoft Active Directory integration enables the OME audit log to record who did what and when, including the AD user account name. The team also uses OME configuration drift detection reporting, which audits a server’s current configuration against the desired state and highlights non-conforming servers; OME can then resolve the drift by re-applying a server template.
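Conceptually, drift detection compares each server's current settings with the template's desired state. The sketch below shows that comparison under simple assumptions: attributes are flat key/value pairs, and the attribute names and values used in the example (`SysProfile`, `LogicalProc`) are illustrative rather than an exact OME template export.

```python
def detect_drift(desired: dict[str, str], current: dict[str, str]) -> dict:
    """Compare one server's attributes with the template's desired state.

    Returns {attribute: (expected, actual)} for every mismatch; an attribute
    missing from the server is reported with actual=None.
    """
    return {
        attr: (want, current.get(attr))
        for attr, want in desired.items()
        if current.get(attr) != want
    }


def non_conforming(template: dict[str, str], fleet: dict[str, dict[str, str]]) -> dict:
    """Map each drifting server to its mismatches, i.e. the servers that
    would need the template re-applied."""
    report = {}
    for server, attrs in fleet.items():
        drift = detect_drift(template, attrs)
        if drift:
            report[server] = drift
    return report
```

Reporting the expected and actual value side by side makes the report useful both for auditing and for deciding whether re-applying the template is the right fix.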
With Dell IT using OME at major scale in their complex production environment, any customer can be confident OME will perform at scale. As Dell IT says, “If you have Dell PowerEdge servers, you really need to be running OpenManage Enterprise.”
University of Cambridge server management at scale
With an estate of 3,500 Dell servers plus other devices in one data center, the team at the University of Cambridge needs efficient, scalable server management. The HPCC server group uses the integrated Dell Remote Access Controller (iDRAC) embedded in every server, together with OME, to maximize the efficiency of day-to-day admin tasks such as health monitoring, firmware updates, and configuration.
Configuration management and drift detection are achieved via OME’s configuration compliance features. Each cluster has its own collection of firmware configuration settings, captured as templates that are set and monitored centrally via OME, with alerting for non-compliant hosts. Firmware updates are also streamlined using OME and customized in-house firmware repositories built with OME Update Manager. Updates are scheduled and then left to run automatically against multiple servers, freeing administrators to focus on more novel tasks.
Finally, server health is monitored in real time. iDRAC sends any alerts to OME, which notifies the team of the status and logs it. Using the Dell TechDirect service portal, the team is able to log fault calls and request any required parts from Dell.
Operational highlights include:
- Reduction in time to resolution of faults
- Quicker and easier implementation of firmware updates
- BIOS settings configured across an entire cluster in one easy, automated job
Beyond the Dell OpenManage tools, Cambridge uses the iDRAC server telemetry feature to stream power and thermal data to Graphite and Grafana. These Dell metrics, along with values from other data center infrastructure, are aggregated and visualized for analysis of trends, ensuring the clusters are powered and cooled effectively.
Join the ranks of satisfied customers who have optimized their server management operations and enjoy the peace of mind brought about by Dell OpenManage.
Resources
- Podcast: How Would You Go About Orchestrating a Fleet of More Than 18,000 Servers?
- Dell System Management Info Hub
- OpenManage Enterprise Support
Authors:
Mark Maclean, PowerEdge & OpenManage Technical Marketing Engineering
Steve Daborn, Senior Global Product Marketing Manager
LinkedIn: uk.linkedin.com/in/markmacleandell | linkedin.com/in/stephendaborn