As discussed earlier, deduplication is not free. There is always a trade-off between cluster resource consumption (CPU, memory, and disk), the potential for data fragmentation, and the benefit of increased space efficiency.
- Because deduplication trades cluster performance for storage capacity savings, SmartDedupe is not ideally suited for heavily trafficked data or high-performance workloads.
- Depending on an application’s I/O profile and the effect of deduplication on the data layout, read and write performance and overall space savings can vary considerably.
- SmartDedupe does not permit block sharing across different hardware types or node pools to reduce the risk of performance asymmetry.
- SmartDedupe does not share blocks across files with different protection policies applied.
- OneFS metadata, including the deduplication index, is not deduplicated.
- Deduplication is a long-running process that involves multiple job phases that are run iteratively.
- SmartDedupe does not attempt to deduplicate files smaller than 32 KB.
- Dedupe job performance typically improves significantly on the second and subsequent job runs, once the initial index and the bulk of the shadow stores have already been created.
- SmartDedupe does not deduplicate the data stored in a snapshot. However, snapshots of deduplicated data can be created.
- If deduplication is enabled on a cluster that already stores a significant amount of data in snapshots, it will take time for deduplication to affect that snapshot data. Newly created snapshots will contain deduplicated data, but older snapshots will not.
- SmartDedupe deduplicates common blocks within the same file, resulting in even better data efficiency.
- In general, additional capacity savings may not warrant the overhead of running SmartDedupe on node pools with inline deduplication enabled.
- Deduplication of data contained within a writable snapshot is not supported in OneFS 9.3 or later.
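Several of the points above (the 32 KB minimum file size, the persistent index that speeds up second and subsequent runs, and block sharing both across and within files) can be illustrated with a toy sketch. This is purely illustrative and is not the OneFS implementation: the block size, hash choice, and data structures here are assumptions, and real SmartDedupe works with shadow stores and sampled candidate detection rather than a full in-memory hash table.

```python
import hashlib

BLOCK_SIZE = 8 * 1024      # assumed block granularity for this sketch
MIN_FILE_SIZE = 32 * 1024  # SmartDedupe skips files smaller than 32 KB

def dedupe(files, index=None):
    """Toy single-pass deduplicator. `files` maps filename -> bytes.
    Returns (index, shared), where `index` maps a block hash to the
    first (file, offset) holding that block, and `shared` counts the
    duplicate blocks that could be replaced by shared references."""
    # Reusing a previously built index models why later job runs are
    # faster: most blocks are already indexed.
    index = {} if index is None else index
    shared = 0
    for name, data in files.items():
        if len(data) < MIN_FILE_SIZE:
            continue  # below the 32 KB threshold: never considered
        for off in range(0, len(data), BLOCK_SIZE):
            digest = hashlib.sha1(data[off:off + BLOCK_SIZE]).hexdigest()
            if digest in index:
                shared += 1  # duplicate block, within or across files
            else:
                index[digest] = (name, off)
    return index, shared

# Two 64 KB files with identical content, plus one file under 32 KB.
files = {"a": b"x" * 65536, "b": b"x" * 65536, "tiny": b"x" * 1024}
index, shared = dedupe(files)
# "a" contributes 7 in-file duplicates, "b" contributes 8 more;
# "tiny" is skipped entirely, so shared == 15 and the index holds
# a single unique block.
```

Note how the same index deduplicates blocks inside one file as well as across files, mirroring the in-file sharing behavior described above.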