Deduplication algorithm (Dell PowerMax 2500 and 8500: Data Reduction)
PowerMax systems use the SHA-256 hashing algorithm, implemented in the data reduction hardware, to find duplicate data. The duplicate data is then stored as a single instance that multiple sources share. This process improves storage efficiency while maintaining data integrity.
The SHA-256 algorithm generates a 32-byte (256-bit) hash for each 32 KB block of data. Consider a system with 1 PB of written data, 5 percent of which is updated each day. Over one million years of operation, there is a 20 percent likelihood of a hash collision at the 32 KB block level. However, because each 128 KB track is handled as four 32 KB blocks, an actual collision requires hash collisions on all four blocks of the same 128 KB track. The probability of all four blocks colliding is so low that this scenario is only theoretical (less than a 1 percent chance in a trillion years of operation).
In addition, when a hash match is found during the compare phase of deduplication, the system performs a byte-for-byte comparison of the data. This comparison confirms that the data truly matches before the tables are updated and the pointers are set to allow access to the shared data.
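The flow in this section can be modeled with a minimal Python sketch. It rests on several assumptions. SHA-256 is computed in software with `hashlib` rather than in the data reduction hardware. The hypothetical `DedupStore` class keys its index on the four 32 KB block digests of a 128 KB track, so a candidate match requires matching hashes on all four blocks. A candidate is then confirmed byte for byte before the pointer is set. The class, its method names, and its in-memory tables are illustrative only and do not describe PowerMax internals.

```python
import hashlib

# Sizes from the text: a 128 KB track is handled as four 32 KB blocks.
BLOCK_SIZE = 32 * 1024
TRACK_SIZE = 4 * BLOCK_SIZE


def block_digests(track: bytes) -> tuple:
    """Return the 32-byte SHA-256 digest of each 32 KB block in a track."""
    if len(track) != TRACK_SIZE:
        raise ValueError("track must be exactly 128 KB")
    return tuple(
        hashlib.sha256(track[i:i + BLOCK_SIZE]).digest()
        for i in range(0, TRACK_SIZE, BLOCK_SIZE)
    )


class DedupStore:
    """Hypothetical in-memory model of track-level deduplication (assumption)."""

    def __init__(self) -> None:
        self.instances = []  # single stored copy of each unique track
        self.index = {}      # tuple of four block digests -> instance id
        self.pointers = {}   # source track address -> instance id

    def write(self, address: str, track: bytes) -> bool:
        """Store a track; return True if it was deduplicated against an existing copy."""
        key = block_digests(track)
        candidate = self.index.get(key)
        # A candidate exists only if all four block digests match. It is then
        # verified byte for byte before the pointer table is updated.
        if candidate is not None and self.instances[candidate] == track:
            self.pointers[address] = candidate
            return True
        # New data, or a genuine collision (digests match, bytes differ):
        # keep the track as a separate instance.
        self.instances.append(track)
        instance_id = len(self.instances) - 1
        if candidate is None:
            self.index[key] = instance_id
        self.pointers[address] = instance_id
        return False

    def read(self, address: str) -> bytes:
        """Follow the pointer for an address to the shared stored instance."""
        return self.instances[self.pointers[address]]
```

With this sketch, writing the same 128 KB track to two addresses returns `False` for the first write and `True` for the second, leaving one stored instance that both addresses point to. Keeping the byte-for-byte check after the digest lookup means that even a hash collision on all four blocks cannot cause two different tracks to share one instance.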