The PowerMax system further improves data reduction by introducing deduplication (dedupe). Dedupe improves storage utilization without compromising I/O performance. It works by generating unique hash IDs at the time of data ingress and comparing those IDs with existing hash IDs before storing the data on the disk. Dedupe is accomplished through a series of features including:
Host writes for any compression-enabled SG go to persistent cache and are acknowledged immediately. Before the data is destaged to the physical media, the compression module generates the hash ID. This hash ID is checked against the existing entries in the hash table. If a match is found, only the pointers are updated to reference the existing data. If no matching hash ID is found, the data is written and the thin device’s pointers are updated.
The following figure shows the dedupe workflow for all writes for a compression-enabled SG:
Figure 5. PowerMax dedupe workflow
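The destage decision described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PowerMax code: the class `DedupeStore`, its method names, the use of SHA-256, and the in-memory dictionaries are all hypothetical, since the source does not specify the hash algorithm or the internal data structures. Overwrite handling and reference counting are left out for brevity.

```python
import hashlib


class DedupeStore:
    """Toy model of hash-based dedupe; all names here are hypothetical."""

    def __init__(self):
        self.hash_table = {}  # hash ID -> location of already-stored data
        self.physical = []    # stand-in for the physical media
        self.pointers = {}    # (device, track) -> location on the media

    def destage(self, device, track, data):
        """Destage one cached write; return True if it was deduplicated."""
        # Generate the hash ID before anything is written to the media.
        # SHA-256 is an assumption; the source does not name the algorithm.
        hash_id = hashlib.sha256(data).hexdigest()

        location = self.hash_table.get(hash_id)
        deduped = location is not None
        if not deduped:
            # No matching hash ID: write the data and record its hash ID.
            self.physical.append(data)
            location = len(self.physical) - 1
            self.hash_table[hash_id] = location

        # In both cases the thin device's pointer ends up referencing the data.
        self.pointers[(device, track)] = location
        return deduped

    def read(self, device, track):
        """Follow the thin device's pointer to the stored data."""
        return self.physical[self.pointers[(device, track)]]


if __name__ == "__main__":
    store = DedupeStore()
    print(store.destage("dev1", 0, b"A" * 128))  # first copy is written
    print(store.destage("dev2", 5, b"A" * 128))  # identical data: pointer update only
    print(len(store.physical))                   # number of unique copies stored
```

In this sketch the second write of identical data adds nothing to `physical`; only the pointer for `("dev2", 5)` is updated, which mirrors the "match found" branch of the workflow.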