Deduplication efficiency


Deduplication can significantly increase the storage efficiency of data. However, the actual space savings will vary depending on the specific attributes of the data itself. As noted previously, the deduplication assessment job can be run to help predict the likely space savings that deduplication would provide on a given dataset.
Virtual machine files often contain duplicate data, much of which is rarely modified. Deduplicating virtual machine images that run similar operating systems (for example, block-aligned VMware VMDK files) can significantly decrease the amount of storage space consumed. However, as noted previously, the potential for performance degradation as a result of block sharing and fragmentation should be carefully considered first.
SmartDedupe does not deduplicate across files that have different protection settings. For example, if two files share blocks, but file1 is parity-protected at +2:1 and file2 has its protection set at +3, SmartDedupe will not attempt to deduplicate them. This ensures that all files and their constituent blocks are protected as configured. Additionally, SmartDedupe does not deduplicate files that are stored on different SmartPools storage tiers or node pools. For example, if file1 and file2 are stored on tier 1 and tier 2 respectively, OneFS will not deduplicate them, even if tier 1 and tier 2 are both protected at +2:1. This helps guard against performance asynchronicity, where some of a file’s blocks could reside on a different tier, or class of storage, than the others.
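The two eligibility rules above can be expressed as a simple predicate. The following Python sketch is illustrative only: `FileInfo`, its fields, and `can_share_blocks` are hypothetical names invented for this example and are not part of OneFS or any SmartDedupe API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical record for illustration; not a OneFS data structure."""
    path: str
    protection: str  # protection setting, for example "+2:1" or "+3"
    pool: str        # SmartPools tier or node pool holding the file


def can_share_blocks(a: FileInfo, b: FileInfo) -> bool:
    """Return True only if the two files pass both rules described above:
    identical protection settings and the same tier or node pool."""
    return a.protection == b.protection and a.pool == b.pool


file1 = FileInfo("/ifs/data/file1", protection="+2:1", pool="tier1")
file2 = FileInfo("/ifs/data/file2", protection="+3", pool="tier1")
file3 = FileInfo("/ifs/data/file3", protection="+2:1", pool="tier2")
file4 = FileInfo("/ifs/data/file4", protection="+2:1", pool="tier1")

print(can_share_blocks(file1, file2))  # different protection settings
print(can_share_blocks(file1, file3))  # same protection, different tier
print(can_share_blocks(file1, file4))  # same protection, same tier
```

Only the last pair passes both checks. This models the stated rules alone; it makes no claim about how SmartDedupe evaluates them internally.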
The following table shows some examples of typical space reclamation levels that have been achieved with SmartDedupe.
Note: These deduplication space savings values are provided solely as rough guidance. Because no two datasets are alike (unless they are replicated), actual results can vary considerably from these examples.
Table 2. Typical workload space savings with SmartDedupe

Workflow/data type	Typical space savings
Virtual Machine Data	35%
Home Directories / File Shares	25%
Email Archive	20%
Engineering Source Code	15%
Media Files	10%
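As a rough illustration of how these percentages might be applied, the Python sketch below multiplies a dataset size by the typical savings figure from the table. The function name `estimate_reclaimed_tb` and the 100 TB dataset size are assumptions made for this example. The result is only a ballpark figure; the deduplication assessment job remains the way to predict savings for a real dataset.

```python
# Typical space savings taken from Table 2, expressed as fractions.
TYPICAL_SAVINGS = {
    "Virtual Machine Data": 0.35,
    "Home Directories / File Shares": 0.25,
    "Email Archive": 0.20,
    "Engineering Source Code": 0.15,
    "Media Files": 0.10,
}


def estimate_reclaimed_tb(workload: str, dataset_tb: float) -> float:
    """Rough estimate of reclaimed capacity (TB) for a workload type.

    Hypothetical helper for illustration; actual results can vary
    considerably from the table values.
    """
    if dataset_tb < 0:
        raise ValueError("dataset_tb must be non-negative")
    return dataset_tb * TYPICAL_SAVINGS[workload]


# Example: a hypothetical 100 TB dataset of home directories.
reclaimed = estimate_reclaimed_tb("Home Directories / File Shares", 100.0)
print(f"Estimated space reclaimed: {reclaimed:.1f} TB")  # 25.0 TB
```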
