Performance with inline data reduction
As with most things in life, data efficiency is a compromise: to gain increased storage efficiency, additional cluster resources (CPU, memory, and disk I/O) are consumed to compress, deduplicate, and re-inflate files. As such, the following factors can affect both the performance of inline data reduction and the I/O performance of compressed and deduplicated pools:
Note: Hardware offload compression clearly performs better, in both speed and efficiency, than the software fallback option. This applies both to F810 nodes where the hardware compression engine has been disabled and to all other node types, where software data reduction is the only available option.
Another important performance impact consideration with inline data efficiency is the potential for data fragmentation. After compression or deduplication, files that previously enjoyed contiguous on-disk layout will often have chunks spread across less optimal file system regions. This can lead to slightly increased latencies when accessing these files directly from disk, rather than from cache.
Because inline data reduction is a data efficiency feature rather than a performance-enhancing tool, in most cases the chief consideration is managing its impact on the cluster. This applies both to client data access performance and to the execution of data reduction itself, since additional cluster resources are consumed when shrinking and inflating files.
With inline data reduction enabled, highly incompressible data sets may experience a small performance penalty, because compression adds latency to the write path. Although writes initially land in memory (the journal), performing compression is not free, and that added latency can affect performance. Workloads performing small, random operations will likely see a small performance degradation. Conversely, highly compressible and duplicate data may see a performance boost.
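The compressibility-dependent cost described above can be illustrated with a minimal userspace sketch using Python's standard `zlib` library. This is not the OneFS compression engine; the chunk size and compression level here are illustrative assumptions, chosen only to show that latency is paid on every write while the achieved ratio depends entirely on the data.

```python
# Illustrative userspace sketch (NOT the OneFS engine): show how data
# compressibility affects per-chunk compression latency and achieved ratio.
import os
import time
import zlib

def compress_stats(data: bytes, level: int = 6) -> tuple[float, float]:
    """Return (elapsed seconds, compression ratio) for one zlib pass."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return elapsed, len(data) / len(compressed)

block = 128 * 1024                  # sample chunk size (assumption)
compressible = b"A" * block         # trivially compressible payload
incompressible = os.urandom(block)  # random bytes: effectively incompressible

for name, payload in [("compressible", compressible),
                      ("incompressible", incompressible)]:
    elapsed, ratio = compress_stats(payload)
    print(f"{name}: {elapsed * 1e3:.2f} ms, ratio {ratio:.2f}:1")
```

Running this shows the essential trade-off: the incompressible chunk pays the CPU time without any space saving (its ratio stays at roughly 1:1), which is why incompressible workloads see a penalty while compressible ones can come out ahead.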
Because the compression FPGA engine resides on the same card as the node's backend Ethernet interfaces, the two share PCIe bandwidth. In general, there is plenty of bandwidth available. However, a best practice is to run incompressible, performance-sensitive streaming workflows on F810 nodes with inline data reduction disabled, to avoid any potential bandwidth limits.
In general, rehydration (that is, un-deduplication) requires considerably less overhead than compression.
Note: When considering effective usable space on a cluster with inline data reduction enabled, bear in mind that every capacity saving from file compression and deduplication also serves to reduce the per-TB compute ratio (CPU, memory, and so on). For performance workloads, the recommendation is to size for performance (IOPS, throughput, and so on) rather than effective capacity.
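The per-TB compute dilution noted above is easy to see with a short worked calculation. All of the figures below (pool capacity, core count, reduction ratio) are hypothetical; the point is only that the same physical compute now backs a larger logical data set.

```python
# Hypothetical sizing example: capacity savings from data reduction also
# shrink the compute available per effective TB, which is why performance
# workloads should be sized on IOPS/throughput, not effective capacity.
raw_tb = 100.0         # assumed usable capacity of the node pool, in TB
cpu_cores = 200        # assumed total CPU cores in the pool
reduction_ratio = 2.5  # assumed data reduction ratio (2.5:1)

effective_tb = raw_tb * reduction_ratio
cores_per_usable_tb = cpu_cores / raw_tb
cores_per_effective_tb = cpu_cores / effective_tb

print(f"effective capacity: {effective_tb:.0f} TB")
print(f"cores per usable TB: {cores_per_usable_tb:.2f}")
print(f"cores per effective TB: {cores_per_effective_tb:.2f}")
# The same cores now serve 2.5x the logical data, so the per-TB compute
# ratio drops by the same 2.5x factor.
```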
Similarly, it is challenging to broadly characterize the inline dedupe performance overhead with any accuracy, because it depends on various factors, including the level of duplication in the data set and whether matches are found against other LINs or SINs. Workloads with a large amount of deduplicatable data might see an impact of 5-10%, but they enjoy an attractive efficiency ratio in return. In contrast, certain other workloads may see a slight performance gain from inline dedupe. If blocks are scanned but no deduplication is performed, the overhead is typically in the 1-2% range.
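The overhead ranges above can be turned into a back-of-the-envelope throughput estimate. The baseline figure and scenario labels below are hypothetical; only the 5-10% and 1-2% overhead ranges come from the text.

```python
# Back-of-the-envelope sketch of the trade-off described above: heavy
# deduplication pays up to ~10% throughput overhead in exchange for its
# efficiency ratio, while scan-only workloads (no matches) pay ~1-2%.
def effective_throughput(baseline_mbps: float, overhead_pct: float) -> float:
    """Estimated throughput after inline dedupe overhead (illustrative)."""
    return baseline_mbps * (1.0 - overhead_pct / 100.0)

baseline = 1000.0  # hypothetical baseline write throughput, MB/s
for scenario, overhead in [("heavy dedupe (worst case)", 10.0),
                           ("scan only, no matches (worst case)", 2.0)]:
    print(f"{scenario}: {effective_throughput(baseline, overhead):.0f} MB/s")
```

A workload owner can plug in a measured baseline and decide whether the resulting throughput floor is acceptable for the efficiency ratio the data set delivers.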
Typically, the additional space savings that SmartDedupe provides on top of inline deduplication fail a cost-benefit analysis against the performance trade-offs. A minimally intrusive cost-benefit analysis can be performed in the following manner: