Storage capacity requirements continue to grow exponentially, and IT organizations are looking for ways to increase storage efficiency in order to meet their growing capacity requirements at the lowest cost. One way to do this is to use data deduplication and compression, which can result in more capacity at a lower cost. Many environments can achieve an effective capacity that is twice the raw capacity. These data reduction capabilities can be utilized on all-flash VxRail clusters.
Compression and deduplication techniques have been available for a number of years but have not been widely adopted because of the overhead and system resources they require. Today, VxRail all-flash models with many cores and large amounts of memory per processor provide ample processing power. Combined with the architectural efficiencies of vSAN, the space savings more than offset the slight overhead. A VxRail all-flash configuration often provides more effective capacity at a lower cost than a hybrid HDD solution.
With vSAN, deduplication and compression occur inline when data is de-staged from the cache to the capacity drives. First, data is deduplicated by removing redundant copies of blocks that contain exactly the same data. This is done at the 4K block level.
The figure below shows a typical virtual machine environment.
Figure 56. Inline data deduplication
While all VMs are unique, they share some amount of common data. Rather than saving multiple copies of the same data, identical blocks are saved once on SSD, and references to the unique blocks are tracked using metadata that is maintained in the capacity tier.
The deduplication algorithm is applied at the disk-group level and results in only a single copy of each unique 4K block per disk group. While duplicated data blocks may exist across multiple disk groups, by limiting the deduplication domain to a disk group, a global lookup table is not required, which minimizes network overhead and CPU utilization.
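The effect of limiting the deduplication domain to a disk group can be illustrated with a minimal Python sketch. This is a simplified model, not vSAN's implementation: the `DiskGroup` class, the in-memory dictionaries, and the use of SHA-1 as the fingerprint are assumptions made for illustration only.

```python
import hashlib

BLOCK_SIZE = 4096  # 4K deduplication granularity, as described in the text


class DiskGroup:
    """Hypothetical model of one disk group acting as its own dedup domain."""

    def __init__(self):
        self.blocks = {}    # fingerprint -> the single stored copy of a block
        self.refcount = {}  # fingerprint -> number of logical references

    def write(self, data: bytes) -> list:
        """Store data in 4K blocks and return the fingerprints that reference it."""
        refs = []
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            fp = hashlib.sha1(block).hexdigest()  # hash choice is an assumption
            if fp not in self.blocks:
                self.blocks[fp] = block  # first copy in this disk group: store it
            self.refcount[fp] = self.refcount.get(fp, 0) + 1
            refs.append(fp)
        return refs

    def stored_bytes(self) -> int:
        """Physical bytes held by this disk group after deduplication."""
        return sum(len(b) for b in self.blocks.values())


def demo():
    """Write three VMs that share common data across two disk groups."""
    os_image = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE  # data common to all VMs
    dg1, dg2 = DiskGroup(), DiskGroup()
    dg1.write(os_image + b"C" * BLOCK_SIZE)  # VM 1
    dg1.write(os_image + b"D" * BLOCK_SIZE)  # VM 2, same disk group as VM 1
    dg2.write(os_image)                      # VM 3, different disk group
    return dg1.stored_bytes(), dg2.stored_bytes()
```

In this sketch, the two VMs on the first disk group write six logical blocks but only four unique blocks are stored. The second disk group stores its own copy of the shared blocks, because each disk group keeps a separate lookup table and no cross-group lookup takes place.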
LZ4 compression is applied after the blocks are deduplicated and before they are written to SSD. If compression reduces a 4K block to 2KB or less, the compressed version of the block is persistently saved on SSD. Otherwise, the full 4K block is written to SSD.
Almost all workloads benefit somewhat from deduplication. However, typical virtual server workloads with highly redundant data, such as full-clone virtual desktops or homogeneous server operating systems, benefit most. Compression provides further data reduction. Text, bitmap, and program files are very compressible, and 2:1 is often possible. Other data types that are already compressed, such as certain graphics formats and video files, or encrypted files, may yield little or no reduction.
Deduplication and compression are disabled by default and are enabled together at the cluster level. (See the figure below.) While they can be enabled at any time, enabling them when the system is initially set up is recommended. This avoids the overhead and potential performance impact of deduplicating and compressing existing data through post-processing rather than inline.
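The store-compressed-or-store-full decision can be sketched as follows. LZ4 is not part of the Python standard library, so this example substitutes `zlib` purely as a stand-in compressor; the function names and the 2KB-or-less threshold handling are illustrative assumptions, not vSAN code.

```python
import os
import zlib

BLOCK_SIZE = 4096  # 4K block, as described in the text
THRESHOLD = 2048   # a compressed block is kept only if it is 2KB or less


def choose_stored_form(block: bytes):
    """Return (is_compressed, payload) for one 4K block."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("expected a full 4K block")
    compressed = zlib.compress(block)  # stand-in for LZ4 (assumption)
    if len(compressed) <= THRESHOLD:
        return True, compressed  # store the compressed version
    return False, block          # otherwise store the full 4K block


def sample_blocks():
    """Build one highly compressible block and one incompressible block."""
    text_like = b"virtual machine " * 256  # 16 bytes x 256 = 4096 bytes
    random_like = os.urandom(BLOCK_SIZE)   # behaves like encrypted data
    return text_like, random_like
```

Passing the two sample blocks through `choose_stored_form` shows the behavior described above: the repetitive text block is stored compressed, while the random block, which resembles already-compressed or encrypted data, is stored as a full 4K block.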
Figure 57. Deduplication and compression enabled
Deduplication algorithms break data files into contiguous segments and compute a fingerprint for each segment. The fingerprints are used to identify duplicate segments and reduce the data footprint. This is a basic deduplication concept. The specific approach varies among system vendors, but any deduplication method consumes CPU to compute the segment fingerprints or hash keys, and it executes I/O operations when performing lookups on the segment index tables.
vSAN computes the fingerprints and looks for duplicated segments only when the data is being de-staged from the cache to the capacity tier. This means that under normal operations, VM writes to the write buffer in the cache SSD should not incur any latency impact.
The cost of deduplication is incurred when data is de-staged from the cache to the capacity tier. It consumes a portion of the CPU capacity reserved for vSAN, and the disk operations generated by the index lookups consume a portion of the backend I/O capacity.
Because resource consumption varies according to I/O patterns, data types, and so on, consult a Dell EMC or VMware specialist before deciding whether deduplication is recommended for your application.
More information can be found in the Technical Whitepaper VMware VSAN 6.2 Space Efficiency Technologies at http://www.vmware.com/files/pdf/products/vsan/vmware-vsan-62-space-efficiency-technologies.pdf.
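The following sketch illustrates where the deduplication cost falls in time. It is a hypothetical model only: the `CacheTierModel` class, its counters, and the SHA-1 fingerprint are assumptions introduced for illustration, and the counters are rough proxies for CPU and backend I/O rather than measurements.

```python
import hashlib
from collections import deque


class CacheTierModel:
    """Hypothetical model: writes land in a buffer; dedup work runs on de-stage."""

    def __init__(self):
        self.write_buffer = deque()  # blocks accepted but not yet de-staged
        self.index = set()           # fingerprints already in the capacity tier
        self.hashes_computed = 0     # rough proxy for CPU cost
        self.index_lookups = 0       # rough proxy for backend I/O cost

    def vm_write(self, block: bytes) -> None:
        # Write path: buffer the block only. No fingerprint is computed here,
        # so the VM write incurs no deduplication work in this model.
        self.write_buffer.append(block)

    def destage(self) -> int:
        """Move buffered blocks to the capacity tier; return blocks newly stored."""
        stored = 0
        while self.write_buffer:
            block = self.write_buffer.popleft()
            fp = hashlib.sha1(block).digest()  # hash choice is an assumption
            self.hashes_computed += 1
            self.index_lookups += 1
            if fp not in self.index:
                self.index.add(fp)
                stored += 1
        return stored
```

In this model, any number of `vm_write` calls leaves both counters at zero. The fingerprint computations and index lookups accumulate only when `destage` runs, which mirrors the behavior described above.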