Deduplication (dedupe) is a capacity-savings method that identifies identical copies of data and stores a single instance. There are a few facets of deduplication that are needed for it to provide efficient capacity savings.
- Hash ID: The Hash ID is a unique identifier for incoming data that is used to determine if a dedupe relationship is needed. The system uses a SHA-256 algorithm to generate the Hash ID.
- Hash ID Table: Hash Tables are an allocation of system memory distributed between the system directors. These tables are a catalog of the Hash IDs used by the dedupe process to determine if a dedupe relationship is needed or if the data can be stored on disk.
- Dedupe Management Object (DMO): The DMO manages the pointers for deduped data between front-end devices and the data stored on disk. This also manages what Hash Table the Hash IDs are stored in when dedupe relationships exist.