If replication is unexpectedly interrupted, both the R1 and R2 devices temporarily hold all I/Os while SRDF/Metro determines which side of the replication is allowed to resume them. Because only the two storage systems are involved in this decision, the arbitration takes only a few seconds.
SRDF/Metro does not require any integration with an underlying host cluster such as a Microsoft failover cluster. Simply by stopping I/Os on one storage system and allowing them to resume on the other, the Windows failover cluster reconfigures on its own: nodes that can perform I/Os to the quorum files remain up and running, and nodes that cannot are removed from the cluster. Application roles, such as SQL Server roles, also start on the surviving nodes, restoring client connectivity for SQL Server.

It is essential that SRDF/Metro resume I/Os faster than the Windows failover cluster disk timeout interval to avoid a race condition: if the Windows failover cluster determines that its quorum files cannot be reached anywhere across the cluster before the storage resumes I/Os, then all the cluster nodes go down. Because SRDF/Metro resumes I/Os in a matter of seconds, this race condition does not occur.
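The race described above can be sketched as a simple timing comparison. This is an illustrative model only; the function name and the timing constants are assumptions for the sketch, not published SRDF/Metro or Windows cluster values.

```python
# Sketch of the race between storage I/O resumption and the cluster disk
# timeout. The specific numbers below are illustrative assumptions.

SRDF_METRO_RESUME_SECONDS = 5.0      # assumed: arbitration completes in seconds
CLUSTER_DISK_TIMEOUT_SECONDS = 60.0  # assumed: cluster quorum-disk timeout


def cluster_outcome(storage_resume_s: float, disk_timeout_s: float) -> str:
    """Return which side of the race wins: storage resuming I/O first,
    or the failover cluster giving up on its quorum files first."""
    if storage_resume_s < disk_timeout_s:
        # Surviving nodes can reach the quorum files in time; the cluster
        # reconfigures and application roles restart on those nodes.
        return "cluster survives on winning side"
    # Quorum files unreachable everywhere before I/O resumes:
    # every node in the cluster goes down.
    return "entire cluster goes down"


print(cluster_outcome(SRDF_METRO_RESUME_SECONDS, CLUSTER_DISK_TIMEOUT_SECONDS))
```

Because the storage-side arbitration finishes in seconds while cluster disk timeouts are far longer, the first branch is the one taken in practice.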
As a safety measure to prevent data corruption, the SRDF/Metro devices on the side that resumes I/Os become (or remain) R1 devices, take an RW state (read/write, making them available for database server read and write I/O operations), and keep the same external SCSI personality they presented during active/active replication. The SRDF/Metro devices on the side that stopped servicing I/Os become (or remain) R2 devices, take a WD state (write disabled), and no longer present the external SCSI personality they had during replication.
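The post-interruption outcomes above can be summarized as a small state table. This is a descriptive model of the behavior in this section, not a Dell API; the class and function names are invented for the sketch.

```python
from dataclasses import dataclass


@dataclass
class MetroDeviceState:
    role: str                 # "R1" or "R2"
    io_state: str             # "RW" (read/write) or "WD" (write disabled)
    keeps_scsi_personality: bool  # still presents the active/active personality?


def after_interruption(resumes_io: bool) -> MetroDeviceState:
    """Device state once SRDF/Metro has chosen which side resumes I/O.

    Sketch of the outcomes described above: the winning side serves I/O as
    R1 with its SCSI personality intact; the losing side is fenced off.
    """
    if resumes_io:
        return MetroDeviceState(role="R1", io_state="RW",
                                keeps_scsi_personality=True)
    return MetroDeviceState(role="R2", io_state="WD",
                            keeps_scsi_personality=False)


print(after_interruption(True))
print(after_interruption(False))
```

The asymmetry is the point of the safety measure: only one side can ever accept writes, so the two copies cannot diverge while replication is down.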