Dell ECS represents a powerful leap forward in enterprise data storage solutions. As businesses increasingly generate vast amounts of unstructured data, ranging from multimedia files to IoT data streams, the need for scalable, secure, and cost-effective storage has never been greater. Dell ECS is designed to meet this demand by offering a highly flexible and efficient object storage platform that can handle the diverse and dynamic needs of modern enterprises.

This blog will introduce how ECS tolerates the failure situations. It is recommended to read the Dell ECS Data protection and data path overview part II first to get some background knowledge on ECS data protection. You can also refer to ECS architecture and overview paper if you want to know more about ECS.

ECS is designed to tolerate a range of equipment failure situations. The range of failure conditions spans a varying scope including:

Single hard drive failure in a single node
Multiple hard drive failure in a single node
Multiple nodes with single hard drive failure
Multiple nodes with multiple hard drive failures
Single node failure
Multiple node failure
Loss of communication to one replicated VDC
Loss of one entire replicated VDC

In either a single site, dual-site, or geo-replicated configuration, the impact of the failure depends on the quantity and type of components affected. However, at each level, ECS provides mechanisms to defend against the impact of component failures. Some of these mechanisms have already been discussed in Dell ECS Data protection and data path overview part II blog. They will be reviewed here again and in the following figure to show how they are applied to the solution. These include:

Disk failure
1. EC segments or replica copies from the same chunk are not stored on the same disk
2. Checksum calculation on write and read operations
3. Background consistency checker re-verifying checksums
Node failure
1. Distribute segments or replica copies of a chunk equally across nodes in a VDC
2. ECS Fabric keeps services running and manages resources such as disks and network.
3. Partition records and tables protected by partition ownership failover from node to node.
Rack failure within VDC
1. Distribute segments of replica copies of a chunk equally across racks in a VDC.[WD1] [ZJ2]

Figure 1: Protection mechanisms at the disk, node, and rack levels

Note: Rack awareness may not provide protection for 12+4 EC on less than 4 racks & 10 + 2 EC on less than 6 racks.

Note: For the rack aware, when adding new rack to the exist cluster, some of the data will be moved to the new rack to balance the data across all the racks equally. However, the process could take a long period of time to avoid having a performance impact on the system. If the customer keeps writing aggressively and fills the first rack, then all the new writes will happen only on the new rack.

Resources

The following Dell Technologies documentation provides additional information related to this blog. Access to these documents depends on an individual’s login credentials. For access to a document, contact a Dell Technologies representative.

Your Browser is Out of Date

ECS Data Protection and Data Path Overview Part III

Resources