Reliability in Dell Technologies PowerEdge Servers
Thu, 25 Apr 2024 18:31:15 -0000
Introduction
Reliability is defined as the characteristic of a product or system that ensures it performs its intended function over time, without failure, in a defined operating environment. Reliability is designed into PowerEdge servers, and it is continually evaluated and improved throughout the product lifecycle. Full in-house test and analysis capabilities allow Dell Technologies to develop and implement robust product qualification and release procedures.
Dell Technologies Design Guidelines
Dell Technologies server design criteria include:
- Servers that operate continuously at 40°C and 85% relative humidity, and that tolerate short-term excursions to 45°C and 90% relative humidity
note: 40°C/85% RH capability is configuration-specific, but the vast majority of PowerEdge server configurations support these conditions
- Additional design-life margin, and accommodation for a potential lifetime limited warranty
- Potential deployment in uncontrolled environments – locations with polluted air and dust
- Customer special requests – for example, higher shock and vibration tolerance
Dell Technologies Design for Reliability Process
The Dell Technologies Reliability Engineering team is part of the Server Product Development team and has developed a full suite of design for reliability (DfR) procedures, many of them based on industry standards. These procedures cover Subsystem Qualification, Ongoing Reliability Testing, Validation, Shock and Vibration, and the associated Failure Analysis requirements, and every product must satisfy them before it is released.
Dell Technologies also uses internally developed web-based DfR tools for systems development. Beyond using these tools internally, we require our supply base to use them in their own product development processes so that our suppliers also design in reliability.
Design for Reliability Starts at the Component Level
Dell Technologies reliability begins with choosing and approving component suppliers. Dell Technologies specifies JEDEC-qualified components from all suppliers (JEDEC is a global industry group that creates standards for a broad range of technologies). To ensure enterprise-class reliability, Dell Technologies may require qualification testing beyond the standard JEDEC suite when a component is new, unique, different, or difficult (NUDD), and it has specific qualification requirements for NUDD components.
Subsystem Level Comes Next
Dell Technologies defines qualification protocols for all subsystems (HDD, SSD, PSU, fans, memory, PCIe cards, PERC, and daughter cards) and ensures that the supply base executes to Dell Technologies requirements. Dell Technologies does this by:
- Defining test requirements, sample sizes, ramp rates, durations, and accept/reject criteria
- Working closely with suppliers during their product development process
- Reviewing and approving results, and addressing qualification failures, if any
- Auditing products by conducting our own in-house testing, as appropriate
- Auditing supplier quality and assembly/test processes
- Requiring ongoing reliability testing (ORT) on all subsystems throughout their shipping life
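The list above mentions sample sizes and accept/reject criteria without giving a method. As a generic illustration only, the sketch below applies the textbook zero-failure "success-run" relation, which links the number of units tested to the reliability and confidence a passing test demonstrates. This is an assumption chosen for clarity, not a documented Dell Technologies procedure, and the function names and target values are hypothetical.

```python
import math


def success_run_sample_size(reliability: float, confidence: float) -> int:
    """Smallest zero-failure sample size n such that reliability**n <= 1 - confidence.

    Illustrative textbook success-run relation (an assumption here),
    not a documented Dell Technologies procedure.
    """
    if not (0.0 < reliability < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("reliability and confidence must both be in (0, 1)")
    n = 0
    # Increase n until the probability that n units would all pass, if the true
    # reliability were exactly `reliability`, is no more than 1 - confidence.
    while reliability ** n > 1.0 - confidence:
        n += 1
    return n


def demonstrated_reliability(samples: int, confidence: float) -> float:
    """Lower-bound reliability shown by `samples` units passing with zero failures."""
    if samples <= 0:
        raise ValueError("samples must be positive")
    return (1.0 - confidence) ** (1.0 / samples)


def qualification_passes(samples: int, failures: int,
                         reliability: float, confidence: float) -> bool:
    """Hypothetical accept/reject rule: enough units tested and zero failures observed."""
    required = success_run_sample_size(reliability, confidence)
    return samples >= required and failures == 0


if __name__ == "__main__":
    for r, c in [(0.90, 0.90), (0.95, 0.90)]:
        n = success_run_sample_size(r, c)
        print(f"R={r:.2f}, C={c:.2f}: test {n} units with zero failures")
```

Under this model, demonstrating 90% reliability at 90% confidence takes 22 units with zero failures, and raising the target to 95% reliability takes 45 units. Real qualification plans typically also account for test duration, acceleration factors, and allowed failure counts, which this sketch deliberately leaves out.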
The System is the Third Level of Reliability
Dell Technologies does extensive testing and analysis of all systems during development and prior to release:
- Dell Technologies has developed and refined a suite of multi-environment overstress validation tests that it executes on every system during development and prior to release
- Dell Technologies has a separate suite of shock and vibration tests, many of them based on industry standards, that we execute on every system prior to release
- Dell Technologies has full internal capability to analyze test failures in our own in-house Failure Analysis Labs
At Dell Technologies, reliability is designed in and closes the loop from the component level to the subsystem level to the system level. Our product qualification and release systems ensure that design criteria, including deployment life, additional deployment-life margin, and accommodation for a potential lifetime limited warranty, are met before a product is launched. This qualification and release system is based on industry standards and on our own rigorous methods, which have been developed and refined over multiple generations of PowerEdge products. It includes Ongoing Reliability Testing (ORT) on components and subsystems, which must be carried out throughout the shipping life of PowerEdge servers.
Dell Technologies’ focus is on Design for Reliability: using a full suite of internally developed web-based tools, HW Validation Tests, and Shock and Vibration tests. Full in-house capabilities allow Dell Technologies to conduct every phase of product qualification and release internally, including multi-environment overstress tests, shock and vibration tests, and failure analysis.
Dell Technologies also conducts research on the long-term reliability of our products in expanded operating environments. This research, together with multimillion-dollar investments in applied research facilities, allows Dell Technologies to continue improving the reliability of PowerEdge products.