OneFS inline data reduction combines real-time compression and deduplication. Compression uses a lossless algorithm to reduce the physical size of data as it is written to disk, and decompresses the data when it is read back. More specifically, lossless compression reduces the number of bits in each file by identifying and reducing or eliminating statistical redundancy. Because no information is lost, a file can always be decompressed to its exact original form.
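The lossless round trip described above can be illustrated with a minimal sketch. It uses Python's standard `zlib` module purely as a stand-in; this section does not state which algorithm OneFS uses, so the choice of `zlib` and the sample data are assumptions made for illustration only.

```python
import zlib

# Sample data with obvious statistical redundancy (illustrative only).
original = b"OneFS inline data reduction " * 1000

# "Write" path: compress before storing.
compressed = zlib.compress(original)

# "Read" path: decompress to recover the data.
restored = zlib.decompress(compressed)

# Lossless means the restored bytes match the original exactly.
assert restored == original

print(f"original:   {len(original)} bytes")
print(f"compressed: {len(compressed)} bytes")
```

The more repetitive the input, the smaller the compressed result; data that is already compressed or encrypted typically shrinks very little.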
Deduplication differs from data compression in that it eliminates duplicate copies of repeating data. While compression algorithms identify redundant data inside individual files and encode it more efficiently, deduplication inspects data and identifies sections, or even entire files, that are identical. It then replaces the duplicates with a single shared copy.
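As a rough sketch of the idea, the following Python example splits data into fixed-size blocks, stores each distinct block once, and keeps an ordered list of references from which the original data can be rebuilt. The block size, the use of SHA-256 digests as block identifiers, and the names `deduplicate` and `rebuild` are all hypothetical choices for illustration; they are not a description of how OneFS implements deduplication.

```python
import hashlib

BLOCK_SIZE = 8192  # hypothetical block size, chosen for illustration


def deduplicate(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into blocks and keep one shared copy of each distinct block."""
    store = {}   # digest -> block contents (the single shared copy)
    recipe = []  # ordered digests needed to rebuild the original data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).digest()
        store.setdefault(digest, block)  # only the first copy is stored
        recipe.append(digest)
    return store, recipe


def rebuild(store, recipe) -> bytes:
    """Reassemble the original data by following the block references."""
    return b"".join(store[digest] for digest in recipe)


# Four logical blocks, only two of which are distinct.
block_a = b"A" * BLOCK_SIZE
block_b = b"B" * BLOCK_SIZE
data = block_a + block_b + block_a + block_a

store, recipe = deduplicate(data)
assert rebuild(store, recipe) == data

print(f"logical blocks: {len(recipe)}, stored blocks: {len(store)}")
```

This sketch trusts the digest alone to decide that two blocks are identical. A production system would normally also compare candidate blocks byte for byte before sharing them, so that a hash collision cannot corrupt data.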
Both compression and deduplication are transparent to all applications that sit on top of the file system, including protocol-based services such as NFS, SMB, HDFS, and S3. The primary purpose of OneFS inline data reduction is to reduce the storage requirements for data. This results in a smaller storage footprint, reduced power and cooling requirements, and a lower overall per-TB storage cost. Inline data reduction also shrinks the total amount of physical data written to storage devices. This is particularly beneficial for solid-state drives (SSDs) and other media with finite overwrite limits, because writing less data significantly reduces flash-drive wear rates.