The Collect job is responsible for locating unused inodes and data blocks across the file system. Collect runs by default after a cluster group change, in conjunction with AutoBalance, as part of the MultiScan job.
In its first phase, Collect performs a marking pass: it scans every inode (LIN) and identifies the blocks that each one references. Every block that is currently allocated and in use is marked, and any block left unmarked becomes a candidate to be freed, so that the disk space it occupies can be reclaimed and reallocated. All metadata must be read and marked completely in this phase. Otherwise, the second phase could free (sweep) blocks that are still in use and introduce allocation corruption.
Collect’s second phase scans all of the cluster’s drives and sweeps any unmarked blocks, freeing them so that they can be reused.
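The two phases follow the general mark-and-sweep pattern, which can be illustrated with a minimal Python sketch. This is not the actual Collect implementation: the function names `mark` and `sweep`, the dictionary-based inode and drive tables, and the use of plain integers as cluster-wide block addresses are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set


def mark(inodes: Dict[int, Iterable[int]]) -> Set[int]:
    """Phase 1 (hypothetical): visit every inode (LIN) and mark its blocks.

    `inodes` maps a LIN to the block addresses it references. Every inode
    must be visited; a skipped inode would leave in-use blocks unmarked.
    """
    marked: Set[int] = set()
    for blocks in inodes.values():
        marked.update(blocks)
    return marked


def sweep(drives: Dict[str, Set[int]], marked: Set[int]) -> Dict[str, List[int]]:
    """Phase 2 (hypothetical): free allocated blocks that were never marked.

    `drives` maps a drive name to its set of allocated block addresses.
    The sets are updated in place; the return value lists what was freed.
    """
    freed: Dict[str, List[int]] = {}
    for drive, allocated in drives.items():
        unmarked = sorted(allocated - marked)
        allocated.difference_update(unmarked)  # release the blocks for reuse
        freed[drive] = unmarked
    return freed


if __name__ == "__main__":
    # Illustrative data only: blocks 99 and 42 are allocated but unreferenced.
    inodes = {1: [10, 11], 2: [12]}
    drives = {"drive0": {10, 11, 99}, "drive1": {12, 42}}
    print(sweep(drives, mark(inodes)))
```

The sketch keeps marking and sweeping as separate passes to mirror the ordering constraint described above: sweeping is only safe once the marked set is complete.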