Replication groups combine storage pools from geographically separate VDCs so that data can be replicated across sites. Replicating data across sites has the following advantages:
Create as few replication groups as possible, because each storage pool and replication group pair adds indexing overhead. Do not create multiple replication groups that serve the same function. For example, two replication groups containing the same set of VDC storage pools add unnecessary overhead.
In general, create one replication group for local (nonreplicated) data and one for replicated data that spans all VDCs. Organizations with more than two sites might consider additional replication groups for cases where data should be replicated to only a subset of sites, although one replication group that spans all sites is generally sufficient. Compliance requirements might also call for additional replication groups, for example, where data privacy or sovereignty laws prohibit sharing data across specific borders.
When a large volume of data has both short-term and long-term retention periods, as is common in backup and archive use cases, create separate replication groups for the short-term and the long-term data. If both kinds of data are mixed in the same chunks within one replication group, garbage collection (space reclamation) in ECS becomes less efficient, because a chunk that still holds long-term data cannot be fully reclaimed when the short-term data in it expires. Separate replication groups avoid this problem because garbage collection operates on each replication group separately.
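The effect of mixing retention periods can be illustrated with a toy model. This is a minimal sketch, not ECS's actual garbage collector: it assumes objects are appended to fixed-size chunks in write order and that a chunk is reclaimable only when every object in it has expired. The chunk capacity, object names, and retention days are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

# Toy model (assumption): objects fill fixed-size chunks in write order, and a
# chunk can be fully reclaimed only after every object in it has expired.
CHUNK_CAPACITY = 4  # objects per chunk; hypothetical, not an ECS value


@dataclass(frozen=True)
class Obj:
    name: str
    expires_day: int


def pack_into_chunks(objects, capacity=CHUNK_CAPACITY):
    """Fill chunks sequentially, as an append-only store would."""
    return [objects[i:i + capacity] for i in range(0, len(objects), capacity)]


def reclaimable_chunks(chunks, today):
    """Count chunks whose objects have all expired by `today`."""
    return sum(all(o.expires_day <= today for o in chunk) for chunk in chunks)


short = [Obj(f"backup-{i}", expires_day=30) for i in range(8)]
long_ = [Obj(f"archive-{i}", expires_day=3650) for i in range(8)]

# One replication group: short- and long-term writes interleave in the same chunks.
mixed = [o for pair in zip(short, long_) for o in pair]
mixed_chunks = pack_into_chunks(mixed)

# Two replication groups: each retention class fills its own chunks.
separate_chunks = pack_into_chunks(short) + pack_into_chunks(long_)

day = 31
print("mixed:", reclaimable_chunks(mixed_chunks, day), "of", len(mixed_chunks))
print("separate:", reclaimable_chunks(separate_chunks, day), "of", len(separate_chunks))
```

After the short-term data expires, none of the four mixed chunks can be reclaimed, while half of the separated chunks can. Real ECS reclamation is more involved, but the model shows why keeping retention classes apart helps.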
When a replication group contains three or more sites, ECS can reduce storage overhead: instead of keeping full replica copies of chunks written at two different sites, a third site can store a single chunk computed as the XOR of the two. To gain this efficiency, new writes must occur at two or more sites, and to balance it across all sites in the replication group, all sites must have relatively similar write workloads.
Spreading data across sites has tradeoffs, however, and this approach might not suit every workload, especially where WAN latency creates unacceptable bottlenecks. For example, reading an object that is not local to the VDC requires a WAN lookup, which adds latency. Geo-caching alleviates some of this, but applications can still experience delays when the data is not in the cache.
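The XOR storage saving for a three-site replication group can be sketched in a few lines. This is a simplified illustration under stated assumptions: chunk contents are random bytes, the 16-byte chunk size is hypothetical (real chunks are far larger), and local erasure-coding overhead within each site is ignored. It shows that site 3 can hold one XOR chunk in place of two full replicas and still rebuild either original chunk if one site is lost.

```python
import os

CHUNK_SIZE = 16  # bytes; hypothetical, real ECS chunks are much larger


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length chunks."""
    if len(a) != len(b):
        raise ValueError("chunks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Chunk A was written at site 1 and chunk B at site 2.
chunk_a = os.urandom(CHUNK_SIZE)
chunk_b = os.urandom(CHUNK_SIZE)

# Instead of full copies of A and B, site 3 keeps only A XOR B.
xor_chunk = xor_bytes(chunk_a, chunk_b)

# If site 1 is lost, A can be rebuilt from site 2's B and site 3's XOR chunk.
recovered_a = xor_bytes(xor_chunk, chunk_b)
assert recovered_a == chunk_a

# Site 3 stores one chunk instead of two.
full_copies = len(chunk_a) + len(chunk_b)
xor_only = len(xor_chunk)
print(f"site 3 stores {xor_only} bytes instead of {full_copies}")

# Across all three sites: A + B + XOR = 3 chunks for 2 chunks of user data.
data = len(chunk_a) + len(chunk_b)
overhead = (data + xor_only) / data
print(f"geo overhead: {overhead}x (vs 2.0x with full copies)")
```

This also suggests why similar write workloads matter: each XOR chunk needs a chunk from each of two different sites, so writes concentrated at one site leave fewer chunks to pair.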