Performance with inline data reduction
As with most things in life, data efficiency is a compromise. Gaining increased storage efficiency consumes additional cluster resources (CPU, memory, and disk I/O) to compress, deduplicate, and re-inflate files. As such, the following factors can affect the performance of inline data reduction and the I/O performance of compressed and deduplicated pools:
Clearly, hardware offload compression performs better, in both speed and efficiency, than the software fallback option. This difference is evident both on F810 nodes where the hardware compression engine has been disabled and on all other node types, where software data reduction is the only available option.
Another important performance consideration with inline data efficiency is the potential for data fragmentation. After compression or deduplication, files that previously had a contiguous on-disk layout often have chunks spread across less optimal file system regions. This can slightly increase latency when these files are read directly from disk rather than from cache.
Because inline data reduction is a data efficiency feature rather than a performance-enhancing tool, the main consideration is usually managing its impact on the cluster. This applies both to client data access performance and to data reduction execution itself, since additional cluster resources are consumed when shrinking and inflating files.
With inline data reduction enabled, highly incompressible datasets may experience a small performance penalty. Conversely, for highly compressible and duplicate data, there may be a performance boost. Workloads performing small, random operations will likely see a small performance degradation.
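Whether a dataset falls into the compressible or incompressible camp can be gauged ahead of time. The sketch below is not part of OneFS; it uses Python's standard `zlib` as a rough stand-in for the cluster's compression engine to estimate how well a data sample would compress before enabling inline data reduction:

```python
import os
import zlib

def estimate_compressibility(sample: bytes, level: int = 6) -> float:
    """Estimated compression ratio (logical size / compressed size) for a sample."""
    return len(sample) / len(zlib.compress(sample, level))

# Highly compressible data (repetitive text) vs. incompressible data (random bytes).
text_like = b"log entry: status=OK latency=3ms\n" * 1024
random_like = os.urandom(32 * 1024)

print(f"text-like ratio:   {estimate_compressibility(text_like):.1f}:1")
print(f"random-like ratio: {estimate_compressibility(random_like):.2f}:1")
```

A ratio near (or below) 1:1, as with the random sample, suggests the workload would pay the small penalty without gaining capacity back; a high ratio suggests the workload is a good candidate.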
Because it resides on the same card, the compression FPGA engine shares PCIe bandwidth with the node's backend Ethernet interfaces. In general, there is plenty of bandwidth available. However, a best practice is to run incompressible, performance-sensitive streaming workflows on F810 nodes with inline data reduction disabled to avoid any potential bandwidth limits.
In general, rehydration requires considerably less overhead than compression.
When considering effective usable space on a cluster with inline data reduction enabled, understand that every TB saved by file compression and deduplication also reduces the compute (CPU, memory) available per effective TB, since the same node resources now serve more logical data. For performance workloads, the recommendation is to size for performance (IOPS, throughput) rather than for effective capacity.
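The arithmetic behind this sizing recommendation can be sketched as follows. The node figures below (60 TB raw, 32 cores) and the 2:1 reduction ratio are hypothetical examples, not specifications:

```python
def effective_capacity(raw_tb: float, reduction_ratio: float) -> float:
    """Effective usable capacity for a given data reduction ratio."""
    return raw_tb * reduction_ratio

# Hypothetical node: 60 TB raw, 32 cores, workload reducing at 2:1.
raw_tb, node_cores = 60.0, 32
eff_tb = effective_capacity(raw_tb, 2.0)

print(f"effective capacity: {eff_tb:.0f} TB")
print(f"cores per raw TB:       {node_cores / raw_tb:.2f}")
print(f"cores per effective TB: {node_cores / eff_tb:.2f}")
```

Sizing the cluster purely on the 120 TB effective figure would halve the compute available per TB of data served, which is why performance workloads should be sized on IOPS and throughput instead.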
Similarly, it is challenging to broadly characterize the inline deduplication performance overhead with any accuracy, since it depends on various factors, including the degree of duplication within the dataset and whether matches are found against other LINs or SINs. Workloads requiring a large amount of deduplication might see an impact of 5-10%, although they also achieve an attractive efficiency ratio. In contrast, certain other workloads may see a slight performance gain because of inline deduplication. If there is block scanning but no deduplication to perform, the overhead is typically in the 1-2% range.
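To put those percentage ranges in concrete terms, the sketch below applies them to a hypothetical 1,000 MB/s baseline (an illustrative figure, not a measured one):

```python
def net_throughput(baseline_mb_s: float, overhead_pct: float) -> float:
    """Throughput remaining after a given deduplication overhead percentage."""
    return baseline_mb_s * (1 - overhead_pct / 100)

baseline = 1000.0  # MB/s -- hypothetical baseline, not a measured figure
# Heavy-deduplication workload (5-10% impact) vs. scan-only (1-2%).
print(f"heavy dedupe: {net_throughput(baseline, 10):.0f}-{net_throughput(baseline, 5):.0f} MB/s")
print(f"scan only:    {net_throughput(baseline, 2):.0f}-{net_throughput(baseline, 1):.0f} MB/s")
```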
Typically, the additional space savings that SmartDedupe delivers on top of inline deduplication do not justify its performance trade-offs. A minimally intrusive cost-benefit analysis can be performed in the following manner:
Note: This job puts less load on the cluster than the regular SmartDedupe job.
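One way to frame the resulting decision numerically is sketched below. This is a hypothetical model, not a product feature: the function name, inputs, and the 1 TB threshold are illustrative assumptions, with the "assessed" figure standing in for the savings predicted by the low-impact assessment job:

```python
def smartdedupe_worthwhile(inline_savings_tb: float,
                           assessed_total_savings_tb: float,
                           min_additional_tb: float = 1.0) -> bool:
    """Return True if post-process SmartDedupe is predicted to save enough
    beyond inline deduplication to justify its job overhead.  The 1 TB
    threshold is a hypothetical policy value, not a product default."""
    return (assessed_total_savings_tb - inline_savings_tb) >= min_additional_tb

# Assessment predicts 12 TB total savings; inline dedupe already yields 11.5 TB.
print(smartdedupe_worthwhile(11.5, 12.0))  # only 0.5 TB extra -> False
```

If the assessed savings barely exceed what inline deduplication already delivers, running the full SmartDedupe job is unlikely to pass the cost-benefit test.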