ECS Data Protection and Data Path Overview Part III
Mon, 09 Sep 2024 21:56:45 -0000
|Read Time: 0 minutes
Dell ECS represents a powerful leap forward in enterprise data storage solutions. As businesses increasingly generate vast amounts of unstructured data, ranging from multimedia files to IoT data streams, the need for scalable, secure, and cost-effective storage has never been greater. Dell ECS is designed to meet this demand by offering a highly flexible and efficient object storage platform that can handle the diverse and dynamic needs of modern enterprises.
This blog will introduce how ECS tolerates the failure situations. It is recommended to read the Dell ECS Data protection and data path overview part II first to get some background knowledge on ECS data protection. You can also refer to ECS architecture and overview paper if you want to know more about ECS.
ECS is designed to tolerate a range of equipment failure situations. The range of failure conditions spans a varying scope including:
- Single hard drive failure in a single node
- Multiple hard drive failure in a single node
- Multiple nodes with single hard drive failure
- Multiple nodes with multiple hard drive failures
- Single node failure
- Multiple node failure
- Loss of communication to one replicated VDC
- Loss of one entire replicated VDC
In either a single site, dual-site, or geo-replicated configuration, the impact of the failure depends on the quantity and type of components affected. However, at each level, ECS provides mechanisms to defend against the impact of component failures. Some of these mechanisms have already been discussed in Dell ECS Data protection and data path overview part II blog. They will be reviewed here again and in the following figure to show how they are applied to the solution. These include:
- Disk failure
- EC segments or replica copies from the same chunk are not stored on the same disk
- Checksum calculation on write and read operations
- Background consistency checker re-verifying checksums
- Node failure
- Distribute segments or replica copies of a chunk equally across nodes in a VDC
- ECS Fabric keeps services running and manages resources such as disks and network.
- Partition records and tables protected by partition ownership failover from node to node.
- Rack failure within VDC
Figure 1: Protection mechanisms at the disk, node, and rack levels
Note: Rack awareness may not provide protection for 12+4 EC on less than 4 racks & 10 + 2 EC on less than 6 racks.
Note: For the rack aware, when adding new rack to the exist cluster, some of the data will be moved to the new rack to balance the data across all the racks equally. However, the process could take a long period of time to avoid having a performance impact on the system. If the customer keeps writing aggressively and fills the first rack, then all the new writes will happen only on the new rack.
Resources
The following Dell Technologies documentation provides additional information related to this blog. Access to these documents depends on an individual’s login credentials. For access to a document, contact a Dell Technologies representative.