Dynamic Resiliency Engine (DRE)
Enterprise-class storage systems require high levels of reliability and protection from data loss and latent drive failures. Traditional data protection schemes are based on RAID groups with a fixed layout that protect a volume's data. In this traditional design, bandwidth and rebuild speed are limited by the number of drives participating in the group. Rebuild speed is further limited by the number of drive failures the RAID level tolerates, because rebuilt data is written only to the replacement drives. For example, a 6+2 RAID group can read at the speed of six drives, but after a dual-drive failure it can rebuild at only the speed of two drives.
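The 6+2 example can be expressed as a small calculation. This is a deliberately simplified sketch, not a PowerStore model: it assumes rebuild bandwidth equals the smaller of the number of drives being read and the number being written, and it ignores host I/O, controller limits, and per-drive speed differences. The function names are hypothetical. For contrast, it also applies the same rule to a layout where data and spare space are spread over every drive.

```python
def rebuild_parallelism(readers: int, writers: int) -> int:
    """Drives' worth of bandwidth a rebuild can use: the slower of the read and write sides."""
    return min(readers, writers)


def fixed_raid_group(data: int, parity: int, failed: int) -> int:
    # Surviving members are read, but rebuilt data goes only to one
    # replacement spare per failed drive.
    return rebuild_parallelism(readers=data + parity - failed, writers=failed)


def spread_over_appliance(total_drives: int, failed: int) -> int:
    # If data and spare space are spread over every drive, each survivor can
    # both supply and receive rebuilt data.
    survivors = total_drives - failed
    return rebuild_parallelism(readers=survivors, writers=survivors)


print("6+2 group, 2 failed:", fixed_raid_group(data=6, parity=2, failed=2))
print("25 drives, 2 failed:", spread_over_appliance(total_drives=25, failed=2))
```

Under this model the fixed group rebuilds at two drives' worth of bandwidth no matter how fast it can read, while the spread-out layout's rebuild bandwidth grows with the drive count.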
The reliability of the data being protected depends heavily on the bit error rate (BER) of the drive, the amount of data that must be rebuilt, and the number of drives. As capacity and drive counts increase, it becomes more difficult to maintain reliability when traditional RAID protection schemes are used.
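A common simplified way to quantify the BER effect is to treat every bit read during a rebuild as failing independently with probability equal to the BER. The sketch below, assuming that independence model, computes the chance of at least one unrecoverable read error while reading a given amount of data. The BER of 1e-15 and the data sizes are hypothetical example values, not PowerStore specifications.

```python
import math


def prob_read_error(bytes_read: float, bit_error_rate: float) -> float:
    """Probability of at least one unrecoverable bit error while reading `bytes_read` bytes.

    Simplified model (an assumption): every bit fails independently with
    probability `bit_error_rate`, so P = 1 - (1 - BER) ** bits.
    """
    bits = bytes_read * 8
    # log1p/expm1 keep precision when the BER is tiny (e.g. 1e-15).
    return -math.expm1(bits * math.log1p(-bit_error_rate))


# Hypothetical example: a rebuild that must read 10 TB vs. 20 TB at a BER of 1e-15.
for tb in (10, 20):
    print(f"{tb} TB read -> probability {prob_read_error(tb * 1e12, 1e-15):.3f}")
```

When the probability is small, doubling the amount of data that must be read roughly doubles the risk, which is why growing drive capacities and drive counts strain traditional RAID.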
Economics and reliability are also key factors in buying decisions. The cost of a storage solution is driven by the cost of its drives; ultimately, what matters is price/performance and price per effective capacity ($/IOPS and $/effective capacity). As storage and drive capacities grow, the following occurs:
The PowerStore Dynamic Resiliency Engine (DRE) is a 100% software-based approach to redundancy that is more distributed, automated, and efficient than traditional RAID. It meets RAID 5 or RAID 6 parity requirements with superior resiliency and at a lower cost. PowerStore uses proprietary algorithms that partition every drive into multiple virtual chunks and build redundancy extents from chunks on several different drives.
DRE automatically consumes all the drives within an appliance and creates redundancy across them. This approach improves overall performance and lets performance scale as more drives are added to the appliance. Data written to a volume can be spread across any number of drives within the appliance, and when new drives are added, the data is automatically rebalanced.
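The chunk-and-extent idea can be illustrated with a toy layout. This is a minimal sketch, not PowerStore's proprietary algorithm: it assumes equal-size drives, a hypothetical number of chunks per drive, and a hypothetical 4+2 extent width, and it places each extent on distinct drives by rotating the starting drive.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical parameters for illustration; PowerStore's actual chunk size
# and extent geometry are proprietary and not described in this document.
CHUNKS_PER_DRIVE = 8
EXTENT_WIDTH = 6  # e.g. 4 data + 2 parity chunks, a RAID 6-like extent


@dataclass(frozen=True)
class Chunk:
    drive: int
    index: int


def build_extents(num_drives: int, chunks_per_drive: int, width: int) -> list[list[Chunk]]:
    """Group drive chunks into extents whose members all sit on distinct drives."""
    if width > num_drives:
        raise ValueError("extent width cannot exceed the number of drives")
    next_free = [0] * num_drives  # next unused chunk index on each drive
    extents: list[list[Chunk]] = []
    start = 0
    while True:
        # Consecutive drives (mod num_drives) are distinct because width <= num_drives.
        drives = [(start + i) % num_drives for i in range(width)]
        if any(next_free[d] >= chunks_per_drive for d in drives):
            break  # a chosen drive is full; this simple placement stops here
        extents.append([Chunk(d, next_free[d]) for d in drives])
        for d in drives:
            next_free[d] += 1
        # Rotate the starting drive so every drive receives extents.
        start = (start + width) % num_drives
    return extents


extents = build_extents(num_drives=10, chunks_per_drive=CHUNKS_PER_DRIVE, width=EXTENT_WIDTH)
print(len(extents), "extents; first:", [(c.drive, c.index) for c in extents[0]])
```

Because every extent touches several drives and the extents rotate across all of them, reads, writes, and rebuilds draw on many drives at once rather than on one fixed group.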
In PowerStoreOS 4.0, a change to DRE allows the system to free previously reserved space, which increases the usable capacity of the system. After an appliance is upgraded from an earlier release to PowerStoreOS 4.0 or later, the freed space automatically becomes available as usable capacity. The amount of space freed depends on the appliance model and drive configuration.
Unlike traditional RAID protection strategies, PowerStore does not require dedicated spare drives. Spare space is distributed across the entire appliance: a small amount of space is reserved on each drive and used for sparing if a drive fails. One drive's worth of spare space is reserved for each resiliency set in an appliance. Resiliency sets are explained later in this section.
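The distributed spare reservation is simple arithmetic. The sketch below assumes equal-size drives and hypothetical 8 TB capacities; it only restates the rule above (one drive's worth of spare per resiliency set, spread evenly over that set's drives) and is not how PowerStore computes its reservation internally.

```python
from __future__ import annotations


def spare_per_drive(drive_capacity_tb: float, drives_in_set: int) -> float:
    """Spare space each drive reserves so that the set holds one drive's worth in total."""
    return drive_capacity_tb / drives_in_set


def appliance_spare(drive_capacity_tb: float, set_sizes: list[int]) -> float:
    """Total spare space: one drive's worth per resiliency set, summed over all sets."""
    return sum(spare_per_drive(drive_capacity_tb, n) * n for n in set_sizes)


# Hypothetical 8 TB drives.
print(spare_per_drive(8.0, 25))        # each drive in a 25-drive set
print(appliance_spare(8.0, [25]))      # one resiliency set
print(appliance_spare(8.0, [13, 13]))  # two resiliency sets
```

The trade-off behind resiliency sets shows up here: each additional set adds another drive's worth of spare space in exchange for another independent failure domain.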
When a drive fails, only the portions of the drive that contain written data are rebuilt. As a result, spare capacity is managed efficiently because only the required space is consumed. This also shortens rebuild time, because only data that has been written to the drive must be rebuilt.
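The "rebuild only written data" rule can be sketched with per-chunk bookkeeping. The `Drive` class and its `written` set are hypothetical stand-ins for whatever allocation metadata PowerStore keeps; the point is only that empty chunks are skipped, so both rebuild work and spare consumption scale with written data rather than raw capacity.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Drive:
    drive_id: int
    chunk_count: int
    # Indices of chunks that hold written data (hypothetical bookkeeping).
    written: set[int] = field(default_factory=set)


def rebuild_plan(failed: Drive) -> list[int]:
    """Only chunks that hold written data need reconstruction; empty chunks are skipped."""
    return sorted(failed.written)


def rebuild_fraction(failed: Drive) -> float:
    """Share of the failed drive's chunks that must be rebuilt (and of spare space consumed)."""
    return len(failed.written) / failed.chunk_count


# Hypothetical drive with 100 chunks, 30 of which hold data.
failed = Drive(drive_id=7, chunk_count=100, written=set(range(0, 60, 2)))
print(len(rebuild_plan(failed)), "chunks to rebuild; fraction:", rebuild_fraction(failed))
```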
PowerStore implements resiliency sets to improve reliability while minimizing spare overhead. Each resiliency set is a separate failure domain, so having multiple sets increases the reliability of the system: the appliance can tolerate a drive failure in each set, even when those failures occur simultaneously.
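The failure-domain rule for simultaneous failures can be written as a per-set count. This is a minimal sketch assuming a hypothetical 50-drive appliance split into two 25-drive sets with single-drive failure tolerance; it covers only failures that happen at the same time, before any rebuild completes.

```python
from __future__ import annotations

from collections import Counter


def survives(failed_drives: set[int], set_of_drive: dict[int, int], tolerance: int) -> bool:
    """True if no resiliency set loses more drives at once than its failure tolerance."""
    failures_per_set = Counter(set_of_drive[d] for d in failed_drives)
    return all(count <= tolerance for count in failures_per_set.values())


# Hypothetical layout: 50 drives in two sets of 25, single-drive failure tolerance.
layout = {d: (0 if d < 25 else 1) for d in range(50)}
print(survives({3, 30}, layout, tolerance=1))  # one failure in each set
print(survives({3, 7}, layout, tolerance=1))   # two failures in the same set
```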
The appliance can also tolerate multiple drive failures within the same resiliency set if the failures occur at different times, that is, if the second drive fails after the rebuild for the first failed drive is complete. Starting with PowerStoreOS 2.0, during the initial configuration of an appliance you can select either single-drive failure tolerance, which uses resiliency sets of up to 25 drives, or double-drive failure tolerance, which uses resiliency sets of up to 50 drives. In a multi-appliance cluster, appliances can use different drive failure tolerance settings. As shown in the following figure, Appliance A could be set to single-drive failure tolerance while Appliance B is set to double-drive failure tolerance.
Note: The PowerStore 3200Q model requires double-drive failure tolerance.
As drives and capacity are added to the system over time, resiliency sets adjust dynamically. For example, an appliance configured for single-drive failure tolerance uses a resiliency set of up to 25 drives. If you add a 26th drive, the resiliency set dynamically splits into two sets. Resiliency sets can also span physical enclosures, depending on the number of drives in the appliance, and can contain drives of mixed sizes.
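The split behavior can be approximated by computing the fewest sets that respect the size limit for each tolerance (25 drives for single, 50 for double). This is a minimal sketch: the even division of drives between sets is an assumption for illustration, since the text only states that a 26th drive turns one 25-drive set into two sets. `MAX_SET_SIZE` and `resiliency_set_sizes` are hypothetical names.

```python
import math

# Maximum drives per resiliency set for each drive failure tolerance.
MAX_SET_SIZE = {1: 25, 2: 50}


def resiliency_set_sizes(num_drives: int, tolerance: int) -> list:
    """Split an appliance's drives into the fewest sets that respect the size limit.

    The even split is an assumption, not documented PowerStore behavior.
    """
    if num_drives < 1:
        raise ValueError("need at least one drive")
    count = math.ceil(num_drives / MAX_SET_SIZE[tolerance])
    base, extra = divmod(num_drives, count)
    return [base + 1 if i < extra else base for i in range(count)]


for drives, tol in [(25, 1), (26, 1), (96, 2)]:
    sizes = resiliency_set_sizes(drives, tol)
    # One drive's worth of spare space is reserved per set.
    print(drives, "drives, tolerance", tol, "->", sizes, "| spare drives' worth:", len(sizes))
```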
Resiliency sets offer the following key benefits:
If sufficient unused capacity is available on the appliance, DRE dynamically transfers unused user capacity to replenish spare capacity.
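Spare replenishment can be sketched as moving free user capacity into the spare reservation. The function below is hypothetical, and its all-or-nothing behavior is an assumption: the text says only that the transfer happens when sufficient unused capacity exists, not whether a partial transfer occurs otherwise. Capacities are hypothetical values in TB.

```python
from __future__ import annotations


def replenish_spare(spare_needed: float, spare_available: float, unused_user: float) -> tuple[float, float]:
    """Move unused user capacity into spare space if enough is free to cover the shortfall.

    Returns (new_spare_available, new_unused_user). All-or-nothing is an
    assumption; partial replenishment is not described in the source.
    """
    shortfall = max(0.0, spare_needed - spare_available)
    if shortfall == 0.0 or unused_user < shortfall:
        return spare_available, unused_user
    return spare_available + shortfall, unused_user - shortfall


# Hypothetical: a rebuild consumed 6 TB of an 8 TB spare reservation.
print(replenish_spare(8.0, 2.0, 20.0))  # enough unused user capacity
print(replenish_spare(8.0, 2.0, 3.0))   # not enough; spare stays as is
```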
PowerStore uses a machine-learning algorithm that automatically adjusts the rebuild rate after a drive failure, prioritizing host I/O to optimize performance while maintaining reliability.
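The trade-off being tuned can be shown with a much simpler rule. This sketch is not machine learning and not PowerStore's algorithm: it swaps in a hypothetical linear throttle that lowers rebuild bandwidth as host load rises, with a floor so the rebuild always makes progress. The rates and MB/s units are hypothetical.

```python
def rebuild_rate(host_utilization: float, max_rate: float, min_rate: float) -> float:
    """Scale rebuild bandwidth down as host I/O load rises, never below a floor.

    A hypothetical linear rule standing in for PowerStore's undocumented
    machine-learning algorithm; the floor keeps the rebuild moving so the
    system is not left exposed to a further failure indefinitely.
    """
    utilization = min(max(host_utilization, 0.0), 1.0)
    return max(min_rate, max_rate * (1.0 - utilization))


for load in (0.0, 0.5, 0.9):
    print(f"host load {load:.0%}: rebuild {rebuild_rate(load, max_rate=1000, min_rate=200):.0f} MB/s")
```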
An appliance requires a minimum of six drives and can scale up to 96 drives. The PowerStore 3200Q (PowerStoreOS 4.0 and later) is an exception: it requires a minimum of 11 QLC drives. Capacity can be added non-disruptively, giving customers the flexibility to expand storage by adding one or more drives as needed.
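The drive-count and tolerance rules stated in this section can be collected into a small validation check. This is a hypothetical helper, not a PowerStore API; it assumes the 96-drive maximum applies to every model, including the 3200Q, which the text does not state explicitly.

```python
MAX_DRIVES = 96  # stated in the text; assumed here to apply to every model


def validate_appliance(model: str, drives: int, tolerance: int) -> list:
    """Return configuration problems for an appliance (an empty list means valid)."""
    problems = []
    minimum = 11 if model == "3200Q" else 6
    if drives < minimum:
        problems.append(f"{model} needs at least {minimum} drives, has {drives}")
    if drives > MAX_DRIVES:
        problems.append(f"at most {MAX_DRIVES} drives per appliance, has {drives}")
    if model == "3200Q" and tolerance != 2:
        problems.append("3200Q requires double-drive failure tolerance")
    return problems


print(validate_appliance("3200T", drives=6, tolerance=1))
print(validate_appliance("3200Q", drives=10, tolerance=1))
```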
PowerStore implements proprietary algorithms to manage drives with different sizes by optimizing the distribution of redundancy extents across multiple drives.
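One well-known heuristic for placing extents on drives of different sizes is to take each extent's chunks from the drives with the most free chunks. The sketch below uses that greedy heuristic as a stand-in; PowerStore's actual algorithm is proprietary, and the drive sizes (in chunks) and extent width are hypothetical.

```python
from __future__ import annotations


def place_extents(free_chunks: list[int], width: int) -> list[tuple[int, ...]]:
    """Greedy placement: each extent takes one chunk from the `width` drives with the most free chunks."""
    free = list(free_chunks)
    extents = []
    while True:
        # Drives that still have space, largest free count first (ties broken by drive id).
        candidates = sorted((d for d, n in enumerate(free) if n > 0), key=lambda d: (-free[d], d))
        if len(candidates) < width:
            break  # not enough distinct drives left for another extent
        chosen = tuple(sorted(candidates[:width]))
        for d in chosen:
            free[d] -= 1
        extents.append(chosen)
    return extents


# Hypothetical mix: three drives with 10 free chunks and one larger drive with 20.
capacity = [10, 10, 10, 20]
extents = place_extents(capacity, width=3)
used = [sum(d in e for e in extents) for d in range(len(capacity))]
print(len(extents), "extents; chunks used per drive:", used)
```

In this example five chunks on the larger drive stay unused, because every extent needs chunks on three different drives and the smaller drives run out first; favoring the drives with the most free space keeps the smaller drives from being stranded earlier.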