Temporary site outage (TSO)
Temporary site outages occur when a site is temporarily inaccessible to other sites in a replication group. ECS provides administrators with two configuration options that affect how objects can be accessed during a temporary site outage.
The default is for Access During Outage to be disabled.
The Access During Outage option can be set at the bucket level, so you can enable this option for some buckets and not for others. This bucket option can be changed at any time as long as all sites are online; it cannot be changed during a site failure.
During a temporary site outage, some access to buckets and objects might be disrupted. Because ECS provides strong consistency, I/O requests must check with the owner before responding, so if a site is inaccessible to the other sites within a replication group, operations that depend on that site fail.
The following table shows which access is required for an operation to succeed.
| Operation | Requirements for success |
| --- | --- |
| Create object | The bucket owner must be accessible. |
| List objects | The bucket owner and the owners of all objects in the bucket must be accessible by the requesting node. |
| Read object | The requestor must either be the object owner, or be able to access both the bucket owner (to determine the object owner) and the object owner. |
The following figure shows an example of the bucket and object layout in a three-site configuration.
The following table lists which operations will succeed or fail in the three-site configuration example in the preceding figure if Site 1 is inaccessible to the other sites in the replication group. To simplify interpretation of the table, the inaccessible site is listed as “failed” and the other two sites are listed as “online.”
| Operation | Bucket/object | Request sent to Site 1 (failed) | Request sent to Site 2 (online) | Request sent to Site 3 (online) |
| --- | --- | --- | --- | --- |
| Create objects in | Bucket A | Success: bucket owned locally | Fail: cannot access bucket owner | Fail: cannot access bucket owner |
| | Bucket B | Fail: cannot access bucket owner | Success: bucket owned locally | Success: bucket owned by online site |
| | Bucket C | Fail: cannot access bucket owner | Success: bucket owned by online site | Success: bucket owned locally |
| List objects in | Bucket A | Fail: although the bucket is owned locally, it contains an object owned by a site it cannot access | Fail: cannot access bucket owner | Fail: cannot access bucket owner |
| | Bucket B | Fail: cannot access bucket owner | Fail: although the bucket is owned locally, it contains an object owned by the failed site | Fail: although the bucket owner is online, the bucket contains an object owned by the failed site |
| | Bucket C | Fail: cannot access bucket owner | Success: bucket owner is an online site and all objects are from online sites | Success: bucket owned locally and all objects are from online sites |
| Read or update object | Object 1 | Success: object owned locally | Fail: cannot access object owner | Fail: cannot access object owner |
| | Object 2 | Success: object owned locally | Fail: cannot access object owner | Fail: cannot access object owner |
| | Object 3 | Fail: cannot access object owner | Success: object owned locally | Success: object is not locally owned, so the site gets the object owner from the bucket owner, which is online |
| | Object 4 | Fail: cannot access object owner | Success: object owned locally | Fail: object is not locally owned, so the request requires access to the bucket owner, which is the failed site |
| | Object 5 | Fail: cannot access object owner | Success: object is not locally owned, so the site gets the object owner from the bucket owner | Success: object owned locally |
| | Object 6 | Fail: cannot access object owner | Success: object is not locally owned, so the site gets the object owner from the bucket owner, which is online | Success: object owned locally |
When a site is first inaccessible to other sites within a replication group, the behavior is detailed in the default TSO behavior section. After the heartbeat is lost between sites for a sustained period of time (the default is 15 minutes), ECS marks a site as failed.
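The timeout rule can be sketched as a simple state check. This is a toy model: the 15-minute default and the online/failed states come from the text, everything else is illustrative:

```python
# Sketch (not ECS source) of the sustained-heartbeat-loss rule: a site is
# marked as failed only after no heartbeat has arrived for the full
# threshold (15 minutes by default). Shorter gaps keep the default TSO
# behavior described above.

TSO_THRESHOLD_SECONDS = 15 * 60  # default threshold

def site_state(last_heartbeat_age_seconds):
    """State an observing site would assign to a peer."""
    if last_heartbeat_age_seconds < TSO_THRESHOLD_SECONDS:
        return "online"   # brief blip: default TSO behavior applies
    return "failed"       # sustained loss: mark the site as failed
```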
Enabling Access During Outage (ADO) on a bucket changes the TSO behavior after a site is marked as failed, allowing objects in that bucket to use eventual consistency. Thus, after a site is marked as temporarily failed, any buckets that have the option Access During Outage enabled will support reads and, optionally, writes from a non-owner site. ECS accomplishes this result by allowing usage of the replicated metadata when the authoritative copy on the owner site is unavailable. You can change the Access During Outage bucket option at any time except during a site failure.
The benefit of enabling Access During Outage is that it allows access to data after a site is marked as failed. The disadvantage is that the data returned might be outdated.
ECS 3.1 and later versions have an additional bucket option for Read-Only Access During Outage. This option ensures that object ownership never changes and removes the chance of conflicts otherwise caused by object updates on both the failed and online sites during a TSO.
The Read-Only Access During Outage option is available only during bucket creation; it cannot be modified afterward.
The disadvantage of Read-Only Access During Outage is that after a site is marked as failed, no new objects can be created. Also, no existing objects in the bucket can be updated until after all sites are back online.
As previously mentioned, a site is marked as failed when the heartbeat is lost between sites for a sustained period of time (15 minutes by default).
A failed site might still be accessible to clients and applications, such as when a company's internal network loses connectivity to a single site but extranet networking remains operational. For example, in a five-site configuration, if Sites 2 through 5 lose network connectivity to Site 1 for a sustained period of time, ECS marks Site 1 as temporarily failed. If Site 1 is still accessible to clients and applications, it can service requests for locally owned buckets and objects because lookups to other sites are not required. However, requests to Site 1 for non-owned buckets and objects will fail. The following table shows the access that is required for an operation to succeed after a site is marked as failed, if Access During Outage is enabled.
| Operation | Request sent to the failed site (in a federation that contains three or more sites) | Request sent to an online site |
| --- | --- | --- |
| Create object | Success for locally owned buckets, unless Read-Only Access During Outage is enabled on the bucket. Fail for remotely owned buckets. | Success, unless Read-Only Access During Outage is enabled on the bucket. |
| List objects | Success only for locally owned buckets in which all objects are also locally owned. | Success. Will not include objects owned by the failed site that have not finished being replicated. |
| Read object | Success for locally owned objects in locally owned buckets (might not be the most recent version). Fail for remotely owned objects. | Success. If the object is owned by the failed site, the original object must have finished replication before the failure occurred. |
| Update object | Success for locally owned objects in locally owned buckets, unless Read-Only Access During Outage is enabled on the bucket. Fail for remotely owned objects. | Success, unless Read-Only Access During Outage is enabled on the bucket; the online site acquires ownership of the object. |
After a site is marked as failed, create object requests will not succeed if Read-Only Access During Outage is enabled on the bucket. If it is disabled, create requests succeed as described in the preceding table.
Note: Read requests sent to online sites where the bucket owner is the failed site will use the online site’s local bucket listing information and object history to determine the object owner.
An online site can update both objects owned by online sites and failed sites. If an object update request is sent to an online site for an object owned by the site marked as failed, the online site will update the latest version of the object available on a system marked as online.
The site performing the update becomes the new object owner and updates the object history with the new owner information and sequence number. This information will be used in recovery or rejoin operations of the original object owner to update the site’s object history with the new owner.
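The ownership transfer might be recorded along these lines. This is an illustrative sketch; the record fields and function names are assumptions, not ECS internals:

```python
# Sketch of how an online site acquiring ownership during ADO might record
# the change so the rejoining original owner can reconcile its object
# history. Field and function names are hypothetical.

from dataclasses import dataclass, field

@dataclass
class ObjectRecord:
    owner: str
    sequence: int = 0
    history: list = field(default_factory=list)

def ado_update(record, updating_site):
    # The updating online site becomes the new object owner and bumps the
    # sequence number; the history entry is what the original owner uses
    # during recovery or rejoin to learn about the new owner.
    record.sequence += 1
    record.owner = updating_site
    record.history.append((record.sequence, updating_site))
    return record
```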
Note: Update requests sent to online sites where the bucket owner is the failed site will use the online site’s local bucket listing information and object history to determine the object owner.
The following example shows what would happen with the bucket and object layout for namespace 1 in the three-site configuration depicted in the following figure.
The following table shows an example in this three-site configuration when all three of these conditions are met: Site 1 is marked as failed, Access During Outage is enabled on the buckets, and Read-Only Access During Outage is disabled.
| Operation | Bucket/object | Site 1 (marked as failed) | Site 2 or Site 3 (online) |
| --- | --- | --- | --- |
| Create objects in | Bucket A | Success | Success |
| | Bucket B | Fail: the failed site can only create objects in locally owned buckets | Success |
| | Bucket C | Fail: the failed site can only create objects in locally owned buckets | Success |
| List objects in | Bucket A | Fail: although the bucket is owned locally, it contains remotely owned objects | Success: will not include objects owned by the failed site that have not been replicated |
| | Bucket B | Fail: the failed site can only list objects in locally owned buckets | Success |
| | Bucket C | Fail: the failed site can only list objects in locally owned buckets | Success |
| Read or update object | Object 1 | Success: both the object and the bucket are owned locally | Success: requires the object to have completed replication before the TSO; an update acquires object ownership |
| | Object 2 | Fail: the bucket is not owned locally | Success |
| | Objects 3, 4, 5, and 6 | Fail: the failed site can only read and update locally owned objects in locally owned buckets | Success |
Once the heartbeat is reestablished between sites, the system marks the site as online, and access to this data continues as it did before the failure. The rejoin operation updates the rejoining site's object history with any ownership changes that occurred during the outage.
Note: ECS only supports access during the temporary failure of a single site.
In the following two-site example, each site considers itself online and marks the other site as failed when a TSO occurs. All create, list, read, and update operations succeed.
During the TSO, every object is updated at both sites. The following table shows, for each object, which site's version of the data survives (the "winning" site) once the heartbeat is reestablished between sites.
| Object | Bucket name | Bucket owner | Object owner | "Winning" site |
| --- | --- | --- | --- | --- |
| Object 1 | Bucket A | Site 1 | Site 1 | Site 2 |
| Object 2 | Bucket B | Site 2 | Site 1 | Random |
| Object 3 | Bucket A | Site 1 | Site 2 | Random |
| Object 4 | Bucket B | Site 2 | Site 2 | Site 1 |
Note: In this example, starting with ECS version 3.7, for Object 2 and Object 3, the winning site was changed from the site with the latest timestamp to Random.
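One reading of the pattern in the table can be expressed as a small rule. This is an interpretation of the documented outcomes, not ECS source code: when the bucket owner and object owner are the same site, the other site's update wins (it acquired object ownership under ADO); otherwise the winner is chosen randomly (ECS 3.7 and later):

```python
import random

# Hypothetical sketch of the two-site conflict-resolution pattern shown in
# the table above; the rule is inferred from the documented outcomes.

def winning_site(bucket_owner, object_owner, sites=("Site1", "Site2")):
    if bucket_owner == object_owner:
        # Both owned by one site: the non-owner's ADO update prevails,
        # because that site acquired object ownership during the outage.
        return next(s for s in sites if s != object_owner)
    # Bucket owner and object owner differ: the winner is random.
    return random.choice(sites)
```

Applied to the table rows: Object 1 (both owners Site 1) resolves to Site 2, Object 4 (both owners Site 2) resolves to Site 1, and Objects 2 and 3 resolve randomly.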
As described in XOR encoding, ECS maximizes the storage efficiency of data in a replication group containing three or more sites. The data in secondary copies of chunks might be replaced by data in a parity chunk after an XOR operation.
Requests for data in a chunk that has been encoded are serviced by the site containing the primary copy. If that site has failed, the request goes to the site with the secondary copy of the object. Because this copy was encoded, the secondary site must first retrieve the other chunks that were used for encoding from the online primary sites, and then perform an XOR operation to reconstruct the requested object before responding to the request. After the chunks are reconstructed, they are also cached so that the site can respond more quickly to subsequent requests.
The following table shows an example of a portion of a chunk manager table on Site 4 in a four-site configuration.
| Chunk ID | Primary site | Secondary site | Type |
| --- | --- | --- | --- |
| C1 | Site 1 | Site 4 | Encoded |
| C2 | Site 2 | Site 4 | Encoded |
| C3 | Site 3 | Site 4 | Encoded |
| C4 | Site 4 | | Parity (C1, C2, and C3) |
The following figure illustrates the requests involved in re-creating a chunk to service a read request during a TSO.
In this example, Site 4 has already performed XORs on chunks C1, C2, and C3 and has replaced its local copy of the data from these chunks with the parity chunk C4. A read request for an object in chunk C1 while Site 1 is marked as failed therefore initiates the reconstruction process: Site 4 retrieves chunks C2 and C3 from their online primary sites, XORs them with parity chunk C4 to reconstruct C1, and responds to the request.
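The reconstruction arithmetic itself is straightforward: a parity chunk is the byte-wise XOR of its source chunks, so any one source chunk can be rebuilt by XORing the parity chunk with the remaining sources. A minimal sketch with toy 4-byte chunks (illustrative values, not ECS internals):

```python
from functools import reduce

def xor_chunks(*chunks):
    """XOR equal-length chunks byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))

c1, c2, c3 = b"AAAA", b"BBBB", b"CCCC"  # primary copies on Sites 1-3
c4 = xor_chunks(c1, c2, c3)             # parity chunk stored on Site 4

# With Site 1 failed, Site 4 fetches C2 and C3 from their online primary
# sites and rebuilds C1 from the parity chunk:
recovered_c1 = xor_chunks(c4, c2, c3)
assert recovered_c1 == c1
```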
Note: The time for reconstruction operations to complete increases linearly based on the number of sites in a replication group.
Any data in a bucket configured with passive geo-replication has between two and four source sites and one or two dedicated replication targets. Data written to the replication targets might be replaced by data in a parity chunk after an XOR operation. Requests for passively geo-replicated data are serviced by the site containing the primary copy. If this site is inaccessible to the requesting site, the data must be recovered from one of the replication target sites.
With passive geo-replication, the source sites are always the object and bucket owners. If a replication target site is marked as temporarily failed, all I/O operations will continue as usual. The only exception is replication, which will continue to queue until the replication target site rejoins the federation.
If one of the source sites fails, requests to the online source site must recover non-locally-owned data from one of the replication target sites. In the following example, Site 1 and Site 2 are the source sites, and Site 3 is the replication target site. An object’s primary copy exists in chunk C1, which is owned by Site 1, and the chunk has been replicated to the target destination, Site 3.
If Site 1 fails and Site 2 gets a request to read that object, Site 2 will have to get a copy from Site 3. If the copy was encoded, the secondary site must first retrieve the copy of the other chunk that was used for encoding from the online primary site. Then it performs an XOR operation to reconstruct the requested object and respond to the request. After the chunks are reconstructed, they are also cached so that the site can respond more quickly to subsequent requests.
The following table shows an example of a portion of a chunk manager table on the passive geo-replication target.
| Chunk ID | Primary site | Secondary site | Type |
| --- | --- | --- | --- |
| C1 | Site 1 | Site 3 | Encoded |
| C2 | Site 2 | Site 3 | Encoded |
| C3 | Site 3 | | Parity (C1 and C2) |
The following figure illustrates the requests involved in re-creating a chunk to service a read request during a TSO.
In this example, Site 3 has already performed XORs on chunks C1 and C2 and has replaced its local copy of the data from these chunks with the parity chunk C3. If a read request arrives for an object in chunk C1 while Site 1 is marked as temporarily failed, Site 3 retrieves chunk C2 from Site 2, XORs it with parity chunk C3 to reconstruct C1, and responds to the request.
Buckets configured with both Replicate to All Sites and Access During Outage can provide faster read performance, not only while all sites are online but also during a temporary site outage: no XOR decoding operation is required, and there is a greater chance that the data can be read locally.
Data in buckets with Replicate to All Sites enabled is replicated to each site. Create and update operations are handled the same as when Replicate to All Sites is disabled. However, read and list operations are handled slightly differently, because some data might have completed replication to only some sites, not all of them, before the primary site failed.
During a read operation, the node servicing the request first checks the latest version of the metadata from the object owner. If the requesting node is not the object owner, it must retrieve the metadata from the owning site; if that site is marked as failed and Access During Outage is enabled, the node uses the latest replicated copy of the metadata instead.
During a list objects in a bucket operation, the node requires information from the bucket owner and head information for each object in the bucket. If the site that owns the object or the bucket is down and Access During Outage is enabled, the node can still service the request, provided all the remaining sites in the replication group are online. It lists the latest version of the bucket listing that it has, which might be slightly outdated and can vary between sites.
ECS only supports access during a temporary failure of a single site within a replication group; furthermore, only one site can be marked as failed. Thus, if more than one site within a replication group is failed concurrently, some operations will fail. The first site determined to be failed (due to sustained loss of heartbeat) is marked as failed in the UI. Any remaining sites that also have a sustained loss of heartbeat are not marked as failed and so are considered online.
As an example, in a replication group of five sites, if Site 1 is identified as having a sustained loss of heartbeat, it is marked as failed. If Site 2 is also identified as having a sustained loss of heartbeat, it remains listed as online, so operations that require access to Site 2 fail.