Chunks contain 128 MB of data consisting of one or more objects from a bucket or buckets that share the same replication group settings. Replication is performed asynchronously and is initiated at the chunk level by the chunk partition owner. If the chunk is configured with geo-replication, data is added to a replication queue as it is written to the primary site’s chunk; replication does not wait for the chunk to be sealed. Worker I/O threads continuously process the queue.
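The queueing behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not the product's implementation: the `Chunk` class, the `replication_worker` function, and the tuple layout of queue entries are hypothetical names chosen for this example, and a single worker thread is used so that the processing order is deterministic.

```python
import queue
import threading

CHUNK_SIZE_MB = 128  # chunk capacity stated in the text


class Chunk:
    """Hypothetical in-memory stand-in for a chunk at the primary site."""

    def __init__(self, chunk_id, geo_replicated):
        self.chunk_id = chunk_id
        self.geo_replicated = geo_replicated
        self.used_mb = 0
        self.sealed = False

    def write(self, size_mb, replication_queue):
        if self.sealed or self.used_mb + size_mb > CHUNK_SIZE_MB:
            raise ValueError("chunk is sealed or lacks space")
        offset = self.used_mb
        self.used_mb += size_mb
        # The chunk is sealed only once it is full.
        self.sealed = self.used_mb == CHUNK_SIZE_MB
        # Data is queued as soon as it is written locally,
        # whether or not the chunk has been sealed.
        if self.geo_replicated:
            replication_queue.put((self.chunk_id, offset, size_mb))


def replication_worker(replication_queue, replicated):
    """Drain the queue until a None sentinel arrives (assumed shutdown signal)."""
    while True:
        item = replication_queue.get()
        if item is None:
            break
        replicated.append(item)  # stands in for sending data to the remote site


def run_demo():
    q = queue.Queue()
    replicated = []
    worker = threading.Thread(target=replication_worker, args=(q, replicated))
    worker.start()

    chunk = Chunk("chunk-1", geo_replicated=True)
    chunk.write(40, q)
    chunk.write(30, q)
    sealed_when_queued = chunk.sealed  # 70 MB of 128 MB used, so still unsealed

    q.put(None)  # tell the worker to stop
    worker.join()
    return replicated, sealed_when_queued
```

Calling `run_demo()` queues and processes both writes while the chunk is still unsealed, which is the point the paragraph makes about not waiting for the seal.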
The write operation first occurs locally, including adding data protection, and then it is replicated and protected at the remote site. The following figure and steps provide an example of the write process for a 128 MB object to a geo-replicated bucket.
Figure 8. Write data workflow for a 128 MB object to a geo-replicated bucket
- Write request for Object A is sent to a node—in this example, Site 1 Node 1. Site 1 becomes the Object A owner.
- Data is inline erasure coded and written to a chunk in Site 1.
- Table partition owners—in this example, Node 4—update the appropriate tables (for example, chunk, object, and bucket listing tables) and write the transactions into the table journal logs. This metadata is written to a metadata chunk that is triple mirrored in Site 1.
- Acknowledgment of the successful write is sent to the client.
- For each chunk, the chunk partition table owner (in this example, Node 4) performs these actions:
  - Adds the data inside the chunk to the replication queue after it is written locally; it does not wait for the chunk to be sealed
  - Reads the data fragments of the chunk (parity fragments are read only if required to re-create a missing data fragment)
  - Encrypts the data and replicates it to Site 2 over HTTP
- Table partition owners for the replicated chunks (in this example, Site 2 Node 4) update the appropriate tables and write the transactions into the table journal logs, which are triple mirrored.
- Each replicated chunk is initially written at Site 2 using triple mirroring.
- Acknowledgment is sent back to the primary site’s chunk partition table owner.
Note: Data written to the replicated site is erasure coded after a delay that allows time for other processes, such as XOR operations, to be completed first.
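As a rough illustration of the ordering in the steps above, the following Python sketch lists the events for a single write. The function name `write_object` and the event strings are hypothetical and exist only for this example; the sketch models sequence, not timing or actual I/O. The key property it shows is that the client is acknowledged before any geo-replication work begins.

```python
def write_object(name, size_mb, geo_replicated=True):
    """Return the ordered events for one write, following the steps above."""
    events = [
        f"site1: receive write request for {name} ({size_mb} MB)",
        "site1: inline erasure code data and write to chunk",
        "site1: update tables and journal to triple-mirrored metadata chunk",
        "site1: acknowledge client",
    ]
    if geo_replicated:
        # Everything below happens asynchronously, after the client ack.
        events += [
            "site1: add chunk data to replication queue (chunk may be unsealed)",
            "site1: read data fragments",
            "site1: encrypt and replicate to site2 over HTTP",
            "site2: update tables and journal (triple mirrored)",
            "site2: write replicated chunk with triple mirroring",
            "site2: acknowledge site1 chunk partition table owner",
            "site2: erasure code replicated chunk after a delay",
        ]
    return events


def client_acked_before_replication(events):
    """Check that the client ack precedes the first replication event."""
    ack = events.index("site1: acknowledge client")
    first_repl = next(i for i, e in enumerate(events) if "replication queue" in e)
    return ack < first_repl
```

For example, `write_object("Object A", 128)` returns eleven events, and `client_acked_before_replication` confirms that the acknowledgment to the client comes before the data is queued for Site 2.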