The PowerMax storage system provides data reduction without compromising performance. PowerMax data reduction is a combination of the following components:
- Hardware acceleration—Each PowerMax node is configured with a hardware compression module that handles data compression and decompression. These hardware modules are also capable of generating Hash IDs that enable deduplication.
- Activity-Based Reduction (ABR)—The most recent data is typically the most active, which creates an access skew. ABR relies on this skew to avoid repeatedly compressing and decompressing data extents that are frequently accessed. The ABR function uses machine learning (ML) algorithms to mark the busiest 20 percent of all allocated data extents in the system so that they skip the data reduction workflow. Highly active data extents remain uncompressed, regardless of whether their storage group has data reduction enabled. As data extents become less active, they are automatically compressed. Newly active extents join the busiest 20 percent and remain uncompressed, provided that enough free storage capacity is available.
- Deduplication—Deduplication is a capacity-savings method that identifies identical copies of data and stores only a single instance of that data. During data reduction and inline compression, the hardware acceleration modules generate Hash IDs by applying the SHA-256 algorithm to the 32 KB data blocks created by fine-grain data packing. The deduplication process then uses these Hash IDs to help identify identical data. ABR affects the deduplication workflow in much the same way that it affects compression.
- Fine-grain data packing—When PowerMax compresses data, it splits each 128 KB track into four 32 KB buffers. The buffers are compressed in parallel, and the sum of the four compressed buffers is the final compressed size of the track. This process includes a zero reclaim function that prevents the allocation of buffers that contain only zeros. For small write or read operations, only the buffers that the I/O needs participate.
- Compaction—Compaction is the data placement process: it places reduced or unreduced data on disk in the best available location. Data is stored on disk by using write objects. Each write object is 6 MB of contiguous back-end data device capacity across the drives configured in the system. Write objects are aligned on 1 KB boundaries and are consumed sequentially, in a single use. To optimize writes, write objects are spread across full stripes for all supported RAID types. Each write object can hold reduced or unreduced data.
- Extended Data Compression (EDC)—If compressed data in a storage group that has compression enabled is not accessed for more than 30 days, PowerMax automatically compresses it again. This additional compression further improves storage efficiency.
In addition:
- Compression is enabled or disabled at the storage group (SG) level, which simplifies management. Most databases can benefit from storage compression. Customers might choose not to enable compression if the database is fully encrypted, or if an SG contains data that is continuously overwritten, such as database transaction logs.
- When compression is enabled, all new writes benefit from inline compression based on ML algorithms. If the SG already contains data when compression is enabled, that existing data is compressed in the background at a low priority relative to application I/O.
- When new SGs are created in Unisphere, the data reduction option is enabled by default for these SGs and can be easily disabled.
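The ABR selection described in the list above can be sketched in Python. This is a minimal sketch under stated assumptions. A plain ranking by a per-extent access count stands in for PowerMax's ML algorithms, and the free-capacity condition is not modeled. The extent IDs, `select_hot_extents`, and `plan_reduction` are hypothetical names, not a PowerMax API.

```python
from __future__ import annotations

HOT_PERCENT = 20  # busiest 20 percent of allocated extents skip data reduction


def select_hot_extents(access_counts: dict[int, int], hot_percent: int = HOT_PERCENT) -> set[int]:
    """Return the IDs of the busiest extents, which skip the data reduction workflow.

    `access_counts` maps a hypothetical extent ID to its recent access count.
    A simple ranking stands in for PowerMax's ML-based activity prediction.
    """
    n_hot = -(-len(access_counts) * hot_percent // 100)  # integer ceiling of 20 percent
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    return set(ranked[:n_hot])


def plan_reduction(access_counts: dict[int, int]) -> dict[int, str]:
    """Hot extents stay uncompressed; every other extent is compressed."""
    hot = select_hot_extents(access_counts)
    return {ext: "uncompressed" if ext in hot else "compress" for ext in access_counts}


counts = {1: 500, 2: 3, 3: 40, 4: 1, 5: 7, 6: 90, 7: 2, 8: 15, 9: 60, 10: 5}
print(sorted(select_hot_extents(counts)))  # [1, 6]
counts[1] = 0     # extent 1 cools down...
counts[4] = 1000  # ...and extent 4 becomes busy
plan = plan_reduction(counts)
print(plan[1], plan[4])  # compress uncompressed
```

Because the hot set is recomputed from current activity, an extent that cools down drops out of it and becomes eligible for compression, which mirrors the ABR behavior described above.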
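The deduplication step can be illustrated with a toy single-instance store that keys 32 KB blocks by their SHA-256 Hash IDs. This sketch makes several assumptions. Python's `hashlib.sha256` stands in for the hardware acceleration modules. `DedupStore` and its methods are hypothetical. On a hash match, the sketch also compares the blocks byte for byte. That check is a common safeguard assumed here, not a documented PowerMax behavior.

```python
from __future__ import annotations

import hashlib

BLOCK_SIZE = 32 * 1024  # 32 KB blocks produced by fine-grain data packing


class DedupStore:
    """Toy single-instance store keyed by SHA-256 hash IDs (hypothetical class)."""

    def __init__(self) -> None:
        self.blocks: dict[bytes, bytes] = {}   # hash ID -> the one stored copy
        self.refcounts: dict[bytes, int] = {}  # hash ID -> number of logical references

    def write_block(self, data: bytes) -> bytes:
        if len(data) != BLOCK_SIZE:
            raise ValueError("expected a 32 KB block")
        hash_id = hashlib.sha256(data).digest()  # 32-byte Hash ID
        stored = self.blocks.get(hash_id)
        if stored is None:
            self.blocks[hash_id] = data  # first copy: store a single instance
        elif stored != data:
            # The hash only helps identify duplicates; confirm before sharing.
            raise RuntimeError("SHA-256 collision")
        self.refcounts[hash_id] = self.refcounts.get(hash_id, 0) + 1
        return hash_id

    def read_block(self, hash_id: bytes) -> bytes:
        return self.blocks[hash_id]


store = DedupStore()
a, b = b"A" * BLOCK_SIZE, b"B" * BLOCK_SIZE
ids = [store.write_block(blk) for blk in (a, b, a, a)]
print(len(store.blocks), store.refcounts[ids[0]])  # 2 3
```

Four logical writes consume the capacity of only two blocks, because the three copies of `a` share one stored instance.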
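The following sketch walks through fine-grain data packing under two assumptions. `zlib` is a software stand-in for the hardware compression module, and a sequential loop replaces parallel compression. The function names are hypothetical. The sketch shows the split into four 32 KB buffers, the summed compressed size, zero reclaim, and which buffers a small I/O touches.

```python
from __future__ import annotations

import zlib
from typing import Optional

TRACK_SIZE = 128 * 1024
BUFFER_SIZE = 32 * 1024
BUFFERS_PER_TRACK = TRACK_SIZE // BUFFER_SIZE  # 4


def compress_track(track: bytes) -> tuple[list[Optional[bytes]], int]:
    """Split a 128 KB track into four 32 KB buffers and compress each one.

    All-zero buffers are not allocated (zero reclaim) and are recorded as None.
    Returns the per-buffer results and the total compressed size of the track.
    """
    if len(track) != TRACK_SIZE:
        raise ValueError("expected a 128 KB track")
    buffers = [track[i * BUFFER_SIZE:(i + 1) * BUFFER_SIZE] for i in range(BUFFERS_PER_TRACK)]
    packed = [zlib.compress(buf) if any(buf) else None for buf in buffers]
    total = sum(len(p) for p in packed if p is not None)  # final compressed size
    return packed, total


def buffers_for_io(offset: int, length: int) -> list[int]:
    """Indexes of the 32 KB buffers touched by a small I/O within one track."""
    if length <= 0:
        return []
    first = offset // BUFFER_SIZE
    last = (offset + length - 1) // BUFFER_SIZE
    return list(range(first, last + 1))


text = (b"PowerMax " * BUFFER_SIZE)[:BUFFER_SIZE]
track = text + bytes(BUFFER_SIZE) + text + bytes(BUFFER_SIZE)
packed, total = compress_track(track)
print([p is None for p in packed])          # [False, True, False, True]
print(total < TRACK_SIZE)                   # True
print(buffers_for_io(40 * 1024, 8 * 1024))  # [1]
```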
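Compaction's use of write objects can be approximated by a sequential allocator. This sketch rests on a few assumptions. Each placed chunk, reduced or unreduced, is rounded up so that the next chunk starts on a 1 KB boundary. A chunk that does not fit in the current object opens a new 6 MB write object. Striping across drives and RAID layout are not modeled. `CompactionAllocator` and `WriteObject` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass

WRITE_OBJECT_SIZE = 6 * 1024 * 1024  # 6 MB of contiguous back-end capacity
ALIGNMENT = 1024                     # chunks start on 1 KB boundaries


@dataclass
class WriteObject:
    object_id: int
    next_offset: int = 0  # sequential fill pointer; space behind it is never reused


class CompactionAllocator:
    """Toy allocator that packs variable-size chunks into write objects in one pass."""

    def __init__(self) -> None:
        self.objects = [WriteObject(0)]

    def place(self, size: int) -> tuple[int, int]:
        """Return (write object ID, byte offset) for a chunk of `size` bytes."""
        if not 0 < size <= WRITE_OBJECT_SIZE:
            raise ValueError("size must be between 1 byte and 6 MB")
        aligned = -(-size // ALIGNMENT) * ALIGNMENT  # round up to a multiple of 1 KB
        current = self.objects[-1]
        if current.next_offset + aligned > WRITE_OBJECT_SIZE:
            current = WriteObject(len(self.objects))  # single use: open a fresh object
            self.objects.append(current)
        offset = current.next_offset
        current.next_offset += aligned
        return current.object_id, offset


alloc = CompactionAllocator()
print(alloc.place(9000))                    # (0, 0)
print(alloc.place(128 * 1024))              # (0, 9216)
print(alloc.place(6 * 1024 * 1024 - 1000))  # (1, 0)
```

The third chunk does not fit in the remaining space of object 0, so it starts a new write object instead of reusing any earlier space.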
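The EDC rule from the list above reduces to a simple eligibility check. The `Extent` fields and `edc_candidates` are hypothetical. That includes the `edc_applied` flag, which is assumed here to keep the same extent from being recompressed repeatedly. The sketch only selects candidates and does not model how PowerMax compresses them again.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

EDC_IDLE_THRESHOLD = timedelta(days=30)  # not accessed for more than 30 days


@dataclass
class Extent:
    extent_id: int       # hypothetical bookkeeping fields
    compressed: bool
    edc_applied: bool
    last_access: datetime


def edc_candidates(extents: list[Extent], sg_compression_enabled: bool, now: datetime) -> list[int]:
    """Return IDs of compressed extents that have been idle long enough to compress again."""
    if not sg_compression_enabled:
        return []
    return [
        e.extent_id
        for e in extents
        if e.compressed and not e.edc_applied and now - e.last_access > EDC_IDLE_THRESHOLD
    ]


now = datetime(2024, 6, 1)
extents = [
    Extent(1, True, False, now - timedelta(days=45)),   # compressed and idle: candidate
    Extent(2, True, False, now - timedelta(days=10)),   # accessed recently
    Extent(3, False, False, now - timedelta(days=60)),  # not compressed yet
    Extent(4, True, True, now - timedelta(days=90)),    # already compressed again
]
print(edc_candidates(extents, True, now))   # [1]
print(edc_candidates(extents, False, now))  # []
```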
For more information about PowerMax data reduction, see the Dell PowerMax: Data Reduction White Paper.