Introduction

Thank you for your feedback!

Data reduction comes in various forms and includes techniques such as compression and deduplication. These techniques can be applied to data at various points in their storage life cycle, such as in real time as files are written from a client to storage, or after data has already been stored on disk.
Compression typically uses a lossless algorithm both to reduce the physical size of data when it is written to disk and to decompress the data when it is read back, while retaining full fidelity. More specifically, lossless compression reduces the number of bits in each file by identifying and reducing or eliminating statistical redundancy. No information is lost in lossless compression, and a file can easily be decompressed to its original form.
Deduplication differs from data compression in that it eliminates duplicate copies of repeating data. Whereas compression algorithms identify redundant data inside individual files and encode the redundant data more efficiently, deduplication inspects data and identifies sections, or even entire files that are identical and replaces them with a shared copy.
Both compression and deduplication are transparent to all applications that sit on top of the file system, including protocol-based services such as NFS, SMB, HDFS, and S3. The primary purpose of OneFS inline data reduction is to reduce the storage requirements for data, resulting in a smaller storage footprint, reduced power and cooling requirements, and a reduction in the overall per-TB storage cost. Real time, inline data reduction also helps to shrink the total amount of physical data written to storage devices. This can be beneficial for solid state drives (SSDs) and other media with finite overwrite limits, by reducing flash drive wear rates.
There are three primary measures of storage capacity that are relevant here:
Figure 1. The primary measures of storage capacity
Note: Savings due to data reduction are highly dependent on the data itself and can vary considerably. This means that accurate rates of savings are not able to be predicted without comprehensive analysis of the actual data set. Any estimates provided in this document are for broad guidance only.

Your Browser is Out of Date

Introduction

Introduction