Unique identification
SISL includes a series of techniques that run inline in RAM, before data is written to disk, to quickly filter both new segments and duplicate segments: the summary vector and segment localities.

The summary vector is an in-memory data structure that DD OS uses to quickly identify new segments. Identifying a new segment early saves the system from looking it up in the on-disk index only to find that it is not there. Based on a Bloom filter, the summary vector is a bit array in RAM that is initially set to all zeros. When a new segment is stored, a few bit locations in the array are set to 1. The locations are chosen based on the fingerprint of the segment. When a subsequent segment arrives, the locations chosen for it are checked. If any of those locations is 0, the system knows conclusively that the segment has not previously been stored, and it can stop looking.
The summary vector is not, by itself, sufficient for declaring a segment redundant. A small fraction of the time, typically less than 1 percent, all the chosen locations have already been set to 1 by other segments even though the new segment is unique. When this happens, the system must rely on other mechanisms to complete the identification.
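As a rough guide to where a figure of this size comes from, the standard Bloom filter approximation can be applied. It assumes k independent, uniformly distributed hash functions, an array of m bits, and n stored segments. These symbols and the numbers below are illustrative assumptions, not published DD OS parameters.

$$
P_{\text{false positive}} \approx \left(1 - e^{-kn/m}\right)^{k}
$$

For example, with k = 3 hashes and m/n = 16 bits per stored segment:

$$
\left(1 - e^{-3/16}\right)^{3} \approx (0.171)^{3} \approx 0.005
$$

That is about 0.5 percent, which is consistent with the "less than 1 percent" order of magnitude described above.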
As shown in Figure 3, the summary vector can identify most new segments without looking up the segment in the on-disk fingerprint index. Initially, all bits in the array are 0. On insertion, shown in (a), the bits specified by several hashes of the segment's fingerprint (h1, h2, and h3) are set to 1. On lookup, shown in (b), the bits specified by the same hashes are checked. If any are 0, as in this case, the segment cannot be in the system.
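The insertion and lookup steps described above can be sketched as a toy Bloom filter. This is a minimal illustration, not the DD OS implementation: the class name `SummaryVector`, the array size, the use of SHA-256 as a stand-in fingerprint, and the way h1, h2, and h3 are derived from the fingerprint are all assumptions made for the example.

```python
import hashlib


def fingerprint(segment: bytes) -> bytes:
    # Stand-in fingerprint (assumption): SHA-256 of the segment contents.
    return hashlib.sha256(segment).digest()


class SummaryVector:
    """Toy Bloom filter; sizes and hash derivation are illustrative only."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit position for readability; all positions start at 0.
        self.bits = bytearray(num_bits)

    def _positions(self, fp: bytes):
        # Derive h1..hk by hashing the fingerprint with a distinct one-byte prefix.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + fp).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def insert(self, fp: bytes) -> None:
        # (a) Insertion: set every chosen position to 1.
        for pos in self._positions(fp):
            self.bits[pos] = 1

    def might_contain(self, fp: bytes) -> bool:
        # (b) Lookup: any 0 means the segment was definitely never stored.
        return all(self.bits[pos] for pos in self._positions(fp))


vector = SummaryVector()
stored = fingerprint(b"segment already written")
incoming = fingerprint(b"segment not seen before")

seen_before_insert = vector.might_contain(incoming)  # empty vector: always False
vector.insert(stored)
seen_after_insert = vector.might_contain(stored)     # inserted: always True
```

A `False` result is conclusive, so the on-disk index lookup can be skipped. A `True` result only means the segment might be present, which is why a second mechanism is still needed to confirm a duplicate.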