Deduplication efficiency
Deduplication can significantly increase the storage efficiency of data. However, the actual space savings will vary depending on the specific attributes of the data itself. As noted previously, the deduplication assessment job can be run to help predict the likely space savings that deduplication would provide on a given dataset.
Virtual machine files often contain duplicate data, much of which is rarely modified. Deduplicating virtual machine images that run similar operating systems (for example, block-aligned VMware VMDK files) can significantly decrease the amount of storage space consumed. However, as noted previously, the potential for performance degradation caused by block sharing and fragmentation should be carefully considered first.
SmartDedupe does not deduplicate across files that have different protection settings. For example, if two files share blocks, but file1 is parity-protected at +2:1 and file2 has its protection set at +3, SmartDedupe will not attempt to deduplicate them. This ensures that all files and their constituent blocks are protected as configured. Additionally, SmartDedupe does not deduplicate files that are stored on different SmartPools storage tiers or node pools. For example, if file1 and file2 are stored on tier1 and tier2 respectively, OneFS will not deduplicate them, even if tier1 and tier2 are both protected at +2:1. This helps guard against performance asynchronicity, where some of a file’s blocks could live on a different tier, or class of storage, from the others.
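The two eligibility rules above can be expressed as a simple predicate. The following Python sketch is illustrative only: the `FileInfo` class, its fields, and the `dedupe_eligible` function are hypothetical names chosen for this example, not part of OneFS or any Dell API, and the sketch ignores every other factor that SmartDedupe considers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical stand-in for the two file attributes discussed above."""
    name: str
    protection: str  # protection setting, for example "+2:1" or "+3"
    pool: str        # SmartPools tier or node pool holding the file


def dedupe_eligible(a: FileInfo, b: FileInfo) -> bool:
    """Return True only if both files share protection settings and pool.

    This models just the two rules described in the text; it is an
    assumption-laden sketch, not the actual SmartDedupe decision logic.
    """
    return a.protection == b.protection and a.pool == b.pool


file1 = FileInfo("file1", "+2:1", "tier1")
file2 = FileInfo("file2", "+3", "tier1")    # different protection setting
file3 = FileInfo("file3", "+2:1", "tier2")  # same protection, different tier
file4 = FileInfo("file4", "+2:1", "tier1")  # matches file1 on both attributes

for other in (file2, file3, file4):
    print(f"file1 vs {other.name}: {dedupe_eligible(file1, other)}")
```

Only the last pair passes both checks; the first fails on protection and the second on storage tier, matching the two examples in the text.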
The following table shows some examples of typical space reclamation levels that have been achieved with SmartDedupe.
Note: These deduplication space savings values are provided solely as rough guidance. Because no two datasets are alike (unless they are replicated), actual results can vary considerably from these examples.
| Workflow/data type | Typical space savings |
| --- | --- |
| Virtual Machine Data | 35% |
| Home Directories / File Shares | 25% |
| Email Archive | 20% |
| Engineering Source Code | 15% |
| Media Files | 10% |
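As a rough illustration of how the table values might be applied, the sketch below multiplies the capacity of each data type by its typical savings percentage. The function name, the dictionary, and the example capacities are hypothetical and chosen only for this example; the percentages come from the table above and carry the same caveat that actual results can vary considerably. The deduplication assessment job remains the appropriate way to predict savings for a real dataset.

```python
from typing import Mapping

# Typical space savings (percent) taken from the table above.
TYPICAL_SAVINGS_PCT = {
    "Virtual Machine Data": 35,
    "Home Directories / File Shares": 25,
    "Email Archive": 20,
    "Engineering Source Code": 15,
    "Media Files": 10,
}


def estimate_savings(capacity_by_type: Mapping[str, float]) -> float:
    """Estimate reclaimed capacity, in the same unit as the input sizes.

    Back-of-the-envelope arithmetic only: each size is multiplied by the
    typical savings percentage for its data type and the results are summed.
    """
    return sum(
        size * TYPICAL_SAVINGS_PCT[data_type] / 100
        for data_type, size in capacity_by_type.items()
    )


# Hypothetical cluster contents, in TiB.
example = {"Virtual Machine Data": 100.0, "Media Files": 200.0}
saved = estimate_savings(example)
total = sum(example.values())
print(f"Estimated savings: {saved:.1f} TiB of {total:.1f} TiB "
      f"({100 * saved / total:.1f}%)")
```

In this hypothetical mix, the large share of media files pulls the blended savings rate well below the 35% figure for virtual machine data, which shows why the composition of a dataset matters as much as its size.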