For data integrity, ECS uses checksums. Checksums are created during write operations and stored with the data. On reads, checksums are recalculated and compared with the stored values. A background scanning task also proactively verifies checksums.
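The write-then-verify flow can be sketched as follows. This is a minimal illustration, not ECS internals: CRC32 stands in for whatever checksum algorithm ECS actually uses, and the `write_with_checksum`/`read_verified` names are hypothetical.

```python
import zlib

def write_with_checksum(data: bytes) -> dict:
    # On write, compute a checksum and store it alongside the data.
    return {"data": data, "checksum": zlib.crc32(data)}

def read_verified(stored: dict) -> bytes:
    # On read, recompute the checksum and compare with the stored value.
    if zlib.crc32(stored["data"]) != stored["checksum"]:
        raise IOError("checksum mismatch: data is corrupt")
    return stored["data"]

record = write_with_checksum(b"object payload")
assert read_verified(record) == b"object payload"

# Simulated corruption is detected on the read path.
record["data"] = b"object paylodd"
try:
    read_verified(record)
except IOError as e:
    print(e)  # checksum mismatch: data is corrupt
```

A background scanner would simply iterate over stored records and call the same verification routine without waiting for a client read.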
For data protection, ECS uses triple mirroring for journal chunks and separate EC schemes for repo (user repository data) and btree (B+ tree) chunks.
Erasure coding provides enhanced data protection against disk, node, and rack failures in a storage-efficient fashion compared with conventional protection schemes. The ECS storage engine implements Reed-Solomon error correction using two schemes:
ECS requires a minimum of five nodes when using the default 12+4 scheme, in which the resulting 16 segments are dispersed across nodes at the local site. The data and coding segments of each chunk are distributed equally across the nodes in the cluster. For example, with eight nodes, each node holds two segments (out of 16 total). The storage engine can reconstruct a chunk from any 12 of the 16 segments.
ECS requires a minimum of six nodes for the cold archive option, in which a 10+2 scheme is used instead of 12+4. Erasure coding stops when the number of nodes falls below the minimum required for the scheme.
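The arithmetic behind the two schemes can be checked directly. This sketch (function names are illustrative, not ECS APIs) computes the raw-capacity overhead of each scheme and the worst-case number of segments any one node holds:

```python
def ec_overhead(data_segments: int, coding_segments: int) -> float:
    # Raw capacity consumed per byte of user data under an EC scheme.
    return (data_segments + coding_segments) / data_segments

def segments_per_node(total_segments: int, nodes: int) -> int:
    # Most segments any single node holds when segments are spread
    # as evenly as possible across nodes (ceiling division).
    return -(-total_segments // nodes)

# Default 12+4 scheme: ~1.33x overhead, versus 3x for triple mirroring.
print(round(ec_overhead(12, 4), 2))  # 1.33
# Cold archive 10+2 scheme: 1.2x overhead.
print(ec_overhead(10, 2))            # 1.2
# With eight nodes, each node holds at most 2 of the 16 segments, so a
# single node failure loses at most 2 segments -- well within the
# 4-segment loss tolerance of 12+4.
print(segments_per_node(16, 8))      # 2
```

The same reasoning shows why minimum node counts matter: with fewer than five nodes under 12+4, a single node would hold more than four segments, and one node failure could make the chunk unrecoverable.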
When a chunk is full, or after a set period, it is sealed, parity is calculated, and the coding segments are written to disks across the fault domains. Chunk data remains as a single copy consisting of 16 segments (12 data, 4 coding) dispersed throughout the cluster. ECS uses the coding segments for chunk reconstruction only when a failure occurs.
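The reconstruction principle can be shown with the simplest possible erasure code: a single XOR parity segment. This is a toy, not the Reed-Solomon code ECS uses — Reed-Solomon generalizes the same idea so that any 4 of the 16 segments can be lost rather than just 1 — but the recover-from-survivors mechanic is the same:

```python
def xor_parity(segments: list[bytes]) -> bytes:
    # One coding segment: the byte-wise XOR of all data segments.
    parity = bytearray(len(segments[0]))
    for seg in segments:
        for i, b in enumerate(seg):
            parity[i] ^= b
    return bytes(parity)

def reconstruct(surviving: list[bytes], parity: bytes) -> bytes:
    # A single lost segment equals the parity XORed with all survivors.
    missing = bytearray(parity)
    for seg in surviving:
        for i, b in enumerate(seg):
            missing[i] ^= b
    return bytes(missing)

data = [b"seg0", b"seg1", b"seg2"]
parity = xor_parity(data)
# Lose segment 1; rebuild it from the parity and the other segments.
rebuilt = reconstruct([data[0], data[2]], parity)
assert rebuilt == b"seg1"
```

Because reconstruction only reads the surviving segments, the sealed chunk never needs a second full copy on disk; the coding segments sit idle until a failure occurs.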
When the underlying infrastructure of a VDC changes at the node or rack level, the Fabric layer detects the change and triggers a rebalance scanner as a background task. The scanner calculates the best layout for EC segments across fault domains for each chunk using the new topology. If the new layout provides better protection than the existing layout, ECS redistributes EC segments in a background task. This task has minimal impact on system performance; however, inter-node traffic increases during rebalancing. Balancing of the logical table partitions onto the new nodes also occurs, and newly created journal and B+ tree chunks are allocated evenly across old and new nodes going forward. Redistribution enhances local protection by leveraging all of the resources within the infrastructure.
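The scanner's decision can be sketched as comparing a simple risk metric between the old and new topologies. This is a conceptual illustration under assumed names (`layout_risk`, `best_layout` are not ECS functions): the "risk" of a layout is the worst-case number of segments lost if one fault domain fails, and redistribution is triggered only when the new topology lowers it:

```python
from collections import Counter

def layout_risk(placement: list[str]) -> int:
    # Worst-case segment loss from a single fault-domain failure:
    # the maximum number of segments co-located in one domain.
    return max(Counter(placement).values())

def best_layout(num_segments: int, domains: list[str]) -> list[str]:
    # Spread segments round-robin across all available fault domains.
    return [domains[i % len(domains)] for i in range(num_segments)]

# A chunk's 16 segments on a four-node cluster: 4 segments per node.
old = best_layout(16, ["n1", "n2", "n3", "n4"])
# Two nodes are added; the scanner evaluates the new topology.
new = best_layout(16, ["n1", "n2", "n3", "n4", "n5", "n6"])
if layout_risk(new) < layout_risk(old):
    print("redistribute:", layout_risk(old), "->", layout_risk(new))
```

Here the worst-case loss per node failure drops from 4 segments to 3, so moving segments onto the new nodes is worth the temporary inter-node traffic.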
Note: Do not wait until the storage platform is completely full before adding drives or nodes. A reasonable storage-utilization threshold for expansion is 70%, taking into consideration the daily ingest rate and the expected order, delivery, and integration time for added drives or nodes.
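The capacity-planning arithmetic behind that guidance is straightforward. The following sketch (hypothetical helper, illustrative numbers) computes the remaining runway before the 70% threshold is crossed and compares it against procurement lead time:

```python
def days_until_threshold(capacity_tb: float, used_tb: float,
                         daily_ingest_tb: float,
                         threshold: float = 0.70) -> float:
    # Days of runway before utilization crosses the expansion threshold.
    headroom_tb = capacity_tb * threshold - used_tb
    return headroom_tb / daily_ingest_tb

# Example: a 1000 TB cluster at 600 TB used, ingesting 5 TB/day.
runway = days_until_threshold(1000, 600, 5)
print(runway)  # 20.0

# Order new capacity while runway still exceeds the expected order,
# delivery, and integration lead time.
lead_time_days = 30
if runway < lead_time_days:
    print("order capacity now")
```

In this example the cluster hits 70% in 20 days but new nodes take 30 days to arrive and integrate, so the order should already be in flight.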