SMB continuous availability and witness
PowerScale OneFS 8.0 introduced support for SMB Continuous Availability (CA), which allows planned and unplanned disruptive events on PowerScale nodes in a cluster to occur without interrupting server applications that store data on these file shares. It improves the resilience of SMB3-capable client connections to SMB shares during events such as PowerScale node reboots. This feature applies to Microsoft Windows 8, Windows 10, Windows Server® 2012 R2, and Windows Server 2016 clients as part of the new features in SMB 3.0.
When the SMB client initially connects to the file share, it determines whether the file share has the continuous availability property set. If the property is set, the file share supports SMB continuous availability. When the SMB client subsequently opens a file on the file share, it requests a persistent file handle. When the PowerScale node receives the request, it returns the file handle along with a unique key (the resume key). The resume key is used to resume the handle state after a planned or unplanned failover.
Figure 16 shows the workflow between SMB clients and PowerScale nodes when a failure occurs. If a planned move or a failure occurs on the PowerScale cluster node to which the SMB client is connected, the SMB client attempts to reconnect to another cluster node. Once it successfully reconnects to another node in the PowerScale cluster, the SMB client starts the resume operation using the resume key. When PowerScale receives the resume key, it recovers the handle to the state it was in before the failure, with end-to-end support. Some operations can be replayed after the handle is resumed; others cannot. From the client's perspective, I/O operations appear to stall for a short time.
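The resume flow described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not how OneFS or the SMB3 protocol implements persistent handles: the `Cluster` and `HandleState` classes, the node names, and the file path are all hypothetical, and the handle state is simply held in a dictionary that every surviving node can read.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class HandleState:
    """State kept for one persistent handle (hypothetical fields)."""
    path: str
    offset: int = 0


@dataclass
class Cluster:
    """Toy cluster: handle state is shared, so any surviving node can resume it."""
    nodes: set
    handles: dict = field(default_factory=dict)  # resume key -> HandleState

    def _require_up(self, node):
        if node not in self.nodes:
            raise ConnectionError(f"{node} is unavailable")

    def open_persistent(self, node, path):
        """Open a file and return the resume key issued with the handle."""
        self._require_up(node)
        resume_key = uuid.uuid4().hex
        self.handles[resume_key] = HandleState(path)
        return resume_key

    def write(self, node, resume_key, nbytes):
        self._require_up(node)
        self.handles[resume_key].offset += nbytes

    def resume(self, node, resume_key):
        """Recover the handle state on another node using the resume key."""
        self._require_up(node)
        return self.handles[resume_key]

    def fail(self, node):
        self.nodes.discard(node)


def failover_demo():
    cluster = Cluster(nodes={"node1", "node2", "node3"})
    key = cluster.open_persistent("node1", "/ifs/data/report.docx")  # hypothetical path
    cluster.write("node1", key, 4096)

    cluster.fail("node1")  # planned move or unplanned failure
    try:
        cluster.write("node1", key, 4096)
    except ConnectionError:
        # Reconnect to a surviving node, resume the handle, then retry the write.
        survivor = sorted(cluster.nodes)[0]
        state = cluster.resume(survivor, key)
        cluster.write(survivor, key, 4096)
    return survivor, state.path, state.offset
```

In this model the retried write succeeds on the surviving node and the offset continues from where the failed node left off, which mirrors the brief stall followed by recovery that the client observes.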
How witness service works with SMB continuous availability
In SMB 3.0, Microsoft introduced a Remote Procedure Call (RPC) based mechanism to inform clients of any state changes in the SMB servers. This mechanism is called the Service Witness Protocol (SWP). When a failure occurs, it ensures that time-critical applications quickly reconnect to a new node in a PowerScale cluster without waiting for Transmission Control Protocol (TCP) timeouts or SMB timeouts. It minimizes outages and is supported by any PowerScale node in the pool.
Figure 17 shows the workflow between SMB clients and PowerScale nodes with the witness service. When the SMB client connects to a file share with CA on a PowerScale cluster, it gets the witness node list from PowerScale. The SMB client picks a different cluster node in the same pool as its witness node and issues a registration request to that node for availability events. The witness service then listens for cluster events related to the PowerScale node that the SMB client is connected to.
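The selection and registration step can be sketched as follows. This is an illustrative model only: `WitnessNode`, `pick_witness`, and `register_for_events` are hypothetical names, and choosing the first alternate node in the list is an assumption made for the example, not a documented client behavior.

```python
class WitnessNode:
    """A cluster node that can act as a witness for other nodes in its pool."""

    def __init__(self, name):
        self.name = name
        self.registrations = {}  # watched node name -> set of client names

    def register(self, client_name, watched_node):
        if watched_node == self.name:
            raise ValueError("a node cannot act as witness for itself")
        self.registrations.setdefault(watched_node, set()).add(client_name)


def pick_witness(witness_list, connected_node):
    """Return the first listed node other than the one serving the SMB session."""
    for node in witness_list:
        if node.name != connected_node:
            return node
    raise LookupError("no alternate node is available to act as witness")


def register_for_events(client_name, connected_node, witness_list):
    """Register the client with a witness that watches the connected node."""
    witness = pick_witness(witness_list, connected_node)
    witness.register(client_name, connected_node)
    return witness


pool = [WitnessNode("node1"), WitnessNode("node2"), WitnessNode("node3")]
witness = register_for_events("client-a", "node1", pool)
```

The point of the model is that the witness is never the node serving the session, so it remains reachable when that node fails.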
When the node becomes inaccessible, the witness service receives a OneFS Group Management Protocol (GMP) event and notifies the client that the node has failed. The primary role of OneFS GMP is to help create and maintain a group of synchronized nodes. On receiving the witness notification, clients immediately fail over and reconnect to a new node, which significantly speeds up recovery from unplanned failures. The reconnection time is reduced from 50-60 seconds (TCP timeouts) to only a few seconds.
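The notification path can be modeled as a simple callback registry: the client reacts to an event pushed by the witness instead of waiting for a timeout to expire. This is a minimal sketch under assumptions; `WitnessService`, `SmbClient`, and `on_group_change` are hypothetical names, and the group change event is simulated by a direct method call rather than by GMP.

```python
class WitnessService:
    """Toy witness: maps each watched node to the callbacks of registered clients."""

    def __init__(self):
        self._watchers = {}  # node name -> list of callbacks

    def register(self, node, callback):
        self._watchers.setdefault(node, []).append(callback)

    def on_group_change(self, down_node):
        """Simulated group event: the given node has left the cluster group."""
        for callback in self._watchers.pop(down_node, []):
            callback(down_node)


class SmbClient:
    def __init__(self, name, connected, pool):
        self.name = name
        self.connected = connected
        self.pool = list(pool)

    def on_node_down(self, node):
        """Fail over immediately when the witness reports the connected node down."""
        if node == self.connected:
            self.connected = next(n for n in self.pool if n != node)


witness = WitnessService()
client = SmbClient("client-a", connected="node1", pool=["node1", "node2", "node3"])
witness.register(client.connected, client.on_node_down)

witness.on_group_change("node1")  # node1 becomes inaccessible
```

After the simulated event, the client is connected to another node in the pool without any timeout having elapsed, which is the behavior the witness notification is meant to provide.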
These are some key considerations that we recommend during the design and implementation phases:
# isi set -c coal_only <directory_name>