SmartDedupe use cases
As previously noted, an enterprise’s data typically contains substantial quantities of redundant information. Home directories, file shares, and data archives are examples of workloads that consistently yield solid deduplication results. Each time multiple employees save a spreadsheet, document, or email attachment, the same file is stored in full multiple times, taking up valuable disk capacity. SmartDedupe is typically used in the following ways:
Example A: File shares and home directory deduplication
By architecting home directory and file share repositories under unifying top-level directories (for example, /ifs/home and /ifs/data, respectively), an organization can easily and efficiently configure and run deduplication against these datasets.
Performance-wise, home directories and file shares are typically mid-tier workloads, usually involving concurrent access with a reasonable balance of read and write operations across both data and metadata. As such, they make great candidates for SmartDedupe.
SmartDedupe should ideally be run during periods of low cluster load and client activity (nights and weekends, for example). Once the initial job has completed, the deduplication job can be scheduled to run every two weeks or so, depending on the data’s rate of change.
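The workflow above can be sketched with the OneFS CLI. This is an illustrative outline only: the paths are the examples from the text, and the exact option names and schedule syntax should be verified against your OneFS release (for example, with `isi dedupe settings modify --help`).

```shell
# Point SmartDedupe at the consolidated home directory and file share trees
# (paths are the /ifs/home and /ifs/data examples from the text).
isi dedupe settings modify --paths /ifs/home,/ifs/data

# Kick off the initial deduplication job manually during a quiet period,
# such as a weekend night.
isi job jobs start Dedupe

# Thereafter, schedule the Dedupe job to recur roughly every two weeks
# (schedule string is illustrative; confirm the syntax on your cluster).
isi job types modify Dedupe --schedule "every 2 weeks on Saturday at 22:00"
```

Because the Job Engine runs Dedupe at low impact by default, the recurring schedule mainly determines how quickly newly written duplicates are reclaimed.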
Example B: Storage-efficient archiving
SmartDedupe is an ideal solution for large, infrequently accessed content repositories. Examples of these include digital asset management workloads, seismic data archives for energy exploration, document management repositories for legal discovery, compliance archives for financial or medical records, and so on.
These are all excellent use cases for deduplication because the performance requirements are typically low and biased towards metadata operations, and the data typically contains numerous duplicates. As such, trading system resources for data efficiency produces significant, tangible benefits to the bottom line. SmartDedupe is also ideal for SmartLock-protected immutable archives and other WORM datasets, typically delivering attractive levels of storage efficiency.
For optimal results, where possible, ensure that archive data is configured with the same protection level, since SmartDedupe can only share blocks between files with identical protection settings. For data archives that are frequently scanned or indexed, metadata read acceleration is the recommended metadata SSD strategy.
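Before committing to deduplication on a large archive, the potential savings and protection settings can be checked first. A minimal sketch, assuming an archive tree at the hypothetical path /ifs/data/archive (verify command options against your OneFS release):

```shell
# Dry run: the DedupeAssessment job estimates potential space savings on the
# configured paths without modifying any data.
isi job jobs start DedupeAssessment

# Spot-check that files in the archive tree carry the same protection
# setting; blocks are only shared between files with identical protection.
isi get /ifs/data/archive
```

Running the assessment job first makes the resource-for-efficiency trade-off concrete before the full Dedupe job is scheduled.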
Example C: Disaster recovery target cluster deduplication
For performance-oriented environments that would prefer not to run deduplication against their primary dataset, the typical approach is to deduplicate the read-only data replica on their target, or disaster recovery (DR), cluster.
Once the initial deduplication job has successfully completed, subsequent incremental deduplication jobs can be scheduled to run soon after completion of each SyncIQ replication job, or as best fits the rate of data change and frequency of cluster replication.
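On the DR cluster, the simplest way to approximate "soon after each replication" is to schedule the Dedupe job just after the nightly SyncIQ window closes. The commands and times below are illustrative, not prescriptive:

```shell
# If SyncIQ replication into this target cluster typically completes by
# ~02:00, schedule the incremental Dedupe job to follow it.
isi job types modify Dedupe --schedule "every day at 03:00"

# Alternatively, start an incremental run on demand once a replication
# job is confirmed complete.
isi job jobs start Dedupe
```

Because incremental Dedupe jobs only process changed data, running them after each replication cycle keeps the read-only replica efficient without burdening the primary cluster.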