Inline data reduction
Data reduction comes in various forms and includes techniques such as compression and deduplication. These techniques can be applied at different points in the data's storage life cycle: in real time (inline) as files are written from a client to storage, or after the data has already been stored on disk.
In EDA environments, each engineer working on a project typically has a separate copy of the project workspace, so the same files are duplicated many times across the project directories. Enabling inline data reduction on the project directories is highly recommended to save storage space.
Note: Scratch directories are an exception because of the way scratch space is used. Scratch directories usually have per-engineer or per-project quotas, so engineers frequently delete files after simulation jobs finish to save space and free up quota.
Compression typically uses a lossless algorithm both to reduce the physical size of data when it is written to disk and to decompress the data when it is read back, while retaining full fidelity. More specifically, lossless compression reduces the number of bits in each file by identifying and reducing or eliminating statistical redundancy. No information is lost in lossless compression, and a file can easily be decompressed to its original form.
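To make the lossless property concrete, here is a minimal Python sketch that compresses some repetitive, text-like data with the standard-library `zlib` module (DEFLATE) and checks that decompression restores every byte. `zlib` is only a convenient stand-in for a lossless algorithm; this is not a claim about the compression algorithm OneFS uses, and the sample data is made up.

```python
import zlib

# Hypothetical sample data: repetitive, text-like content similar in spirit
# to EDA log files. Real savings depend entirely on the data.
original = b"".join(
    b"INFO: placed cell U%05d in row %d\n" % (i, i % 64) for i in range(2000)
)

# Compress with DEFLATE via zlib (level 6 is zlib's default trade-off).
compressed = zlib.compress(original, 6)

# Decompress and confirm the round trip is exact, i.e. lossless.
restored = zlib.decompress(compressed)

print(f"original:   {len(original)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {len(original) / len(compressed):.2f} : 1")
print(f"lossless:   {restored == original}")
```

Repetitive content such as this compresses very well; random or already-compressed data would show a ratio close to 1 : 1.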
Deduplication differs from data compression in that it eliminates duplicate copies of repeating data. Whereas compression algorithms identify redundant data inside individual files and encode the redundant data more efficiently, deduplication inspects data, identifies sections, or even entire files, that are identical, and replaces them with a shared copy.
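The difference from compression can be sketched as a toy block store in Python: each fixed-size block is fingerprinted with SHA-256, and identical blocks across files are stored once and shared. The scenario mirrors the duplicated per-engineer workspaces described earlier. This is a conceptual sketch under stated assumptions, not how OneFS implements deduplication; the 8 KiB block size, the `DedupeStore` class, and the `/ifs/proj/...` paths are hypothetical.

```python
import hashlib
import os
from typing import Dict, List

BLOCK_SIZE = 8 * 1024  # assumed fixed block size for this sketch


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Split data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class DedupeStore:
    """Toy block store that keeps one shared copy of each distinct block."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}     # SHA-256 digest -> shared block
        self.files: Dict[str, List[str]] = {}  # path -> ordered block digests
        self.logical_bytes = 0                 # total bytes written by clients

    def write(self, path: str, data: bytes) -> None:
        digests = []
        for block in split_blocks(data):
            digest = hashlib.sha256(block).hexdigest()
            # A production system would also byte-compare blocks whose
            # digests match before sharing them; this sketch trusts the hash.
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.files[path] = digests
        self.logical_bytes += len(data)

    def read(self, path: str) -> bytes:
        """Reassemble a file from its shared blocks."""
        return b"".join(self.blocks[d] for d in self.files[path])

    @property
    def physical_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())


# Hypothetical project: four engineers hold identical copies of a 1 MiB file,
# and a fifth file is a copy whose first block was edited.
design = os.urandom(1024 * 1024)
store = DedupeStore()
for user in ("alice", "bob", "carol", "dave"):
    store.write(f"/ifs/proj/{user}/top.v", design)
store.write("/ifs/proj/dave/top_edit.v", b"X" * BLOCK_SIZE + design[BLOCK_SIZE:])

print(f"logical:  {store.logical_bytes} bytes")
print(f"physical: {store.physical_bytes} bytes in {len(store.blocks)} blocks")
print(f"ratio:    {store.logical_bytes / store.physical_bytes:.2f} : 1")
```

Five logical copies of the file reduce to 129 distinct blocks (128 shared blocks plus the one edited block), a ratio of roughly 4.96 : 1. Compression, by contrast, works within the data itself and could be applied to the remaining unique blocks on top of this.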
Both compression and deduplication are transparent to all applications that sit on top of the file system, including protocol-based services such as NFS, SMB, HDFS, and S3. The primary purpose of OneFS inline data reduction is to reduce the storage requirements for data, resulting in a smaller storage footprint, reduced power and cooling requirements, and a lower overall per-TB storage cost. Real-time inline data reduction also shrinks the total amount of physical data written to storage devices. This benefits solid-state drives (SSDs) and other media with finite overwrite limits by reducing flash drive wear.
Inline compression is enabled by default on new F810 clusters running OneFS 8.2.1 and later, new H5600 clusters running OneFS 8.2.2 and later, new F600 and F200 clusters running OneFS 9.0, F900 clusters running OneFS 9.2, H700/H7000 and A300/A3000 clusters running OneFS 9.2.1 and later, and F710 and F210 clusters running OneFS 9.7 and later.
Inline deduplication is enabled by default on new clusters running OneFS 9.4 or later. On earlier OneFS releases, inline deduplication is disabled by default.
It is highly recommended to enable both inline compression and inline deduplication.
To enable inline compression, use the following CLI commands:
# isi compression settings view
Enabled: No
# isi compression settings modify --enabled=True
# isi compression settings view
Enabled: Yes
To enable inline deduplication, use the following CLI commands:
# isi dedupe inline settings view
Mode: disabled
# isi dedupe inline settings modify --mode enabled
# isi dedupe inline settings view
Mode: enabled
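The check-then-enable steps above can be scripted. The following Python sketch runs the same `view` and `modify` commands shown above and only enables a feature when the view output reports it as off. It assumes it runs somewhere the `isi` CLI is on the `PATH` (for example, on a cluster node) and that the output uses the `Enabled:` and `Mode:` fields shown above; the helper names are hypothetical, and it defaults to a dry run that only prints the command it would run.

```python
import shlex
import shutil
import subprocess
from typing import Dict

# View and enable commands as shown in the CLI examples above.
VIEW_CMD = {
    "compression": "isi compression settings view",
    "dedupe": "isi dedupe inline settings view",
}
ENABLE_CMD = {
    "compression": "isi compression settings modify --enabled=True",
    "dedupe": "isi dedupe inline settings modify --mode enabled",
}


def parse_view(output: str) -> Dict[str, str]:
    """Parse 'Key: Value' lines from an isi '... settings view' command."""
    settings = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def is_enabled(feature: str, view_output: str) -> bool:
    """Interpret the view output using the field values shown above."""
    settings = parse_view(view_output)
    if feature == "compression":
        return settings.get("Enabled") == "Yes"
    return settings.get("Mode") == "enabled"


def ensure_enabled(feature: str, dry_run: bool = True) -> None:
    """Check a feature and enable it (or only print the command in dry-run mode)."""
    view = subprocess.run(
        shlex.split(VIEW_CMD[feature]), capture_output=True, text=True, check=True
    )
    if is_enabled(feature, view.stdout):
        print(f"{feature}: already enabled")
    elif dry_run:
        print(f"{feature}: would run '{ENABLE_CMD[feature]}'")
    else:
        subprocess.run(shlex.split(ENABLE_CMD[feature]), check=True)
        print(f"{feature}: enabled")


if __name__ == "__main__":
    if shutil.which("isi") is None:
        print("isi CLI not found; run this where the OneFS CLI is available")
    else:
        for feature in ("compression", "dedupe"):
            ensure_enabled(feature, dry_run=True)
```

Keeping the dry run as the default means the script can be reviewed against a live cluster before it changes any setting.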
Assess mode.
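In assess mode, inline deduplication analyzes incoming writes and reports the potential deduplication savings without actually deduplicating any data. To run OneFS inline deduplication in assess mode from the CLI, use the following syntax: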
# isi dedupe inline settings modify --mode assess
The most comprehensive of the data reduction reporting CLI utilities is the ‘isi statistics data-reduction’ command. For example:
# isi statistics data-reduction
                      Recent Writes Cluster Data Reduction
                      (5 mins)
--------------------- ------------- ----------------------
Logical data          76.64M        99.71T
Zero-removal saved    0             -
Deduplication saved   0             76.44T
Compression saved     62.75M        1.81T
Inlined data          18.20M        4.85T
Preprotected physical 13.89M        16.61T
Protection overhead   27.78M        8.46T
Protected physical    41.67M        47.38T
Zero removal ratio    1.00 : 1      -
Deduplication ratio   1.00 : 1      4.28 : 1
Compression ratio     5.52 : 1      1.08 : 1
Data reduction ratio  5.52 : 1      4.65 : 1
Inlined data ratio    2.31 : 1      1.29 : 1
Efficiency ratio      1.84 : 1      2.10 : 1
--------------------- ------------- ----------------------
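The ratio rows in this output appear to follow from the size rows. The Python sketch below recomputes them from the sample figures above. The formulas are inferred from this sample (they reproduce the reported Deduplication, Compression, Data reduction, and Efficiency ratios in both columns) and are not taken from Dell documentation, so treat them as an assumption; the Inlined data ratio is not modeled, and the `ReductionColumn` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReductionColumn:
    """One column of 'isi statistics data-reduction' output, in a single unit."""
    logical: float             # Logical data
    zero_removal_saved: float  # Zero-removal saved ("-" treated as 0)
    dedupe_saved: float        # Deduplication saved
    compression_saved: float   # Compression saved
    protected_physical: float  # Protected physical

    def _after_zero(self) -> float:
        return self.logical - self.zero_removal_saved

    def _after_dedupe(self) -> float:
        return self._after_zero() - self.dedupe_saved

    def _after_compression(self) -> float:
        return self._after_dedupe() - self.compression_saved

    def dedupe_ratio(self) -> float:
        return self._after_zero() / self._after_dedupe()

    def compression_ratio(self) -> float:
        return self._after_dedupe() / self._after_compression()

    def data_reduction_ratio(self) -> float:
        # Equals zero-removal x dedupe x compression ratios (the terms telescope).
        return self.logical / self._after_compression()

    def efficiency_ratio(self) -> float:
        # Divides by protected physical size, so protection overhead is included.
        return self.logical / self.protected_physical


# Figures copied from the sample output above (M and T units as displayed).
recent = ReductionColumn(76.64, 0.0, 0.0, 62.75, 41.67)
cluster = ReductionColumn(99.71, 0.0, 76.44, 1.81, 47.38)

for name, col in (("Recent writes", recent), ("Cluster", cluster)):
    print(
        f"{name}: dedupe {col.dedupe_ratio():.2f} : 1, "
        f"compression {col.compression_ratio():.2f} : 1, "
        f"data reduction {col.data_reduction_ratio():.2f} : 1, "
        f"efficiency {col.efficiency_ratio():.2f} : 1"
    )
```

Because the intermediate sizes telescope, the data reduction ratio is the product of the zero-removal, deduplication, and compression ratios. Multiplying the displayed cluster values gives 4.28 × 1.08 ≈ 4.62 rather than 4.65 only because the displayed ratios are rounded.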
From the OneFS CLI, the ‘isi compression stats’ command provides the option to either view or list compression statistics. When run in ‘view’ mode, the command returns the compression ratio for both compressed writes and all writes, plus the percentage of incompressible writes, for the preceding five-minute (300-second) interval. For example:
# isi compression stats view
stats for 300 seconds at: 2024-03-21 04:48:55 (1711010935)
compression ratio for compressed writes: 5.21 : 1
compression ratio for all writes: 5.21 : 1
incompressible data percent: 0.16%
total logical blocks: 8275
total physical blocks: 1554
writes for which compression was not attempted: 0.00%
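For monitoring, the `view` output can be turned into numbers. The following Python sketch parses the sample output above into a dictionary of floats. It assumes the `key: value` layout shown in this sample, which may differ between OneFS releases; the function names are hypothetical.

```python
from typing import Dict, Optional

SAMPLE = """\
stats for 300 seconds at: 2024-03-21 04:48:55 (1711010935)
compression ratio for compressed writes: 5.21 : 1
compression ratio for all writes: 5.21 : 1
incompressible data percent: 0.16%
total logical blocks: 8275
total physical blocks: 1554
writes for which compression was not attempted: 0.00%
"""


def parse_value(text: str) -> Optional[float]:
    """Convert '5.21 : 1', '0.16%', or '8275' to a float; None otherwise."""
    text = text.strip()
    if text.endswith(": 1"):
        return float(text.split(":")[0])
    if text.endswith("%"):
        return float(text[:-1])
    try:
        return float(text)
    except ValueError:
        return None


def parse_compression_stats(output: str) -> Dict[str, float]:
    """Map each numeric field of 'isi compression stats view' to its value."""
    stats: Dict[str, float] = {}
    for line in output.splitlines():
        # Split on the first ': ' so ratios such as '5.21 : 1' stay intact.
        key, sep, value = line.partition(": ")
        number = parse_value(value) if sep else None
        if number is not None:
            stats[key.strip()] = number
    return stats


stats = parse_compression_stats(SAMPLE)
for key, number in stats.items():
    print(f"{key} = {number}")
```

The timestamp header line is skipped because its value is not numeric.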
From the OneFS CLI, the ‘isi dedupe stats’ command provides cluster deduplication data usage and savings statistics, in both logical and physical terms. For example:
# isi dedupe stats
Cluster Physical Size: 248.86T
Cluster Used Size: 48.39T
Logical Size Deduplicated: 79.73T
Logical Saving: 76.44T
Estimated Size Deduplicated: 103.39T
Estimated Physical Saving: 99.13T
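Similarly, the `isi dedupe stats` figures can be parsed and compared. The Python sketch below assumes each line has the `Key: <number><unit>` form shown above and deliberately avoids converting between units, because the sample does not state whether `T` means TB or TiB; it only divides sizes that share a suffix. The helper names are hypothetical.

```python
from typing import Dict, Tuple

SAMPLE = """\
Cluster Physical Size: 248.86T
Cluster Used Size: 48.39T
Logical Size Deduplicated: 79.73T
Logical Saving: 76.44T
Estimated Size Deduplicated: 103.39T
Estimated Physical Saving: 99.13T
"""


def parse_sizes(output: str) -> Dict[str, Tuple[float, str]]:
    """Map each field to (number, unit suffix), e.g. '48.39T' -> (48.39, 'T')."""
    sizes: Dict[str, Tuple[float, str]] = {}
    for line in output.splitlines():
        key, sep, value = line.partition(": ")
        value = value.strip()
        if not sep or not value:
            continue
        unit = value[-1] if value[-1].isalpha() else ""
        number = value[:-1] if unit else value
        sizes[key.strip()] = (float(number), unit)
    return sizes


def percent(part: Tuple[float, str], whole: Tuple[float, str]) -> float:
    """part / whole as a percentage; both must use the same unit suffix."""
    if part[1] != whole[1]:
        raise ValueError(f"unit mismatch: {part[1]!r} vs {whole[1]!r}")
    return 100.0 * part[0] / whole[0]


sizes = parse_sizes(SAMPLE)
used = percent(sizes["Cluster Used Size"], sizes["Cluster Physical Size"])
saved = percent(sizes["Logical Saving"], sizes["Logical Size Deduplicated"])
print(f"cluster capacity used: {used:.1f}%")
print(f"logical saving within deduplicated data: {saved:.1f}%")
```

For this sample, about 19.4% of the cluster's physical capacity is used, and the logical saving is about 95.9% of the logical data that was deduplicated.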