Minimum access level for capabilities per drive state
Drive states | Minimum access level to read | Minimum access level to write | Restripe from |
UP | Normal | Normal | No |
UP, Smartfail | Soft-fail | Never | Yes |
DOWN | Never | Never | No |
DOWN, Smartfail | Never | Never | Yes |
DEAD | Never | Never | Yes |
STALLED | Read_Stalled | Modify_Stalled | No |
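The table above can be encoded as a small lookup structure, which is convenient when scripting checks against drive states. The following is a minimal sketch, not OneFS code: the names `DriveAccess`, `DRIVE_ACCESS`, `restripe_sources`, and `is_ever_writable` are hypothetical, and the values are copied directly from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriveAccess:
    read_level: str       # "Minimum access level to read" column
    write_level: str      # "Minimum access level to write" column
    restripe_from: bool   # "Restripe from" column


# Hypothetical lookup table mirroring the rows of the table above.
DRIVE_ACCESS = {
    "UP": DriveAccess("Normal", "Normal", False),
    "UP, Smartfail": DriveAccess("Soft-fail", "Never", True),
    "DOWN": DriveAccess("Never", "Never", False),
    "DOWN, Smartfail": DriveAccess("Never", "Never", True),
    "DEAD": DriveAccess("Never", "Never", True),
    "STALLED": DriveAccess("Read_Stalled", "Modify_Stalled", False),
}


def restripe_sources(states):
    """Return, in input order, the states marked 'Yes' under 'Restripe from'."""
    return [s for s in states if DRIVE_ACCESS[s].restripe_from]


def is_ever_writable(state):
    """True if the state's minimum write access level is anything but 'Never'."""
    return DRIVE_ACCESS[state].write_level != "Never"
```

For example, `restripe_sources(["UP", "DEAD"])` returns only the `DEAD` entry, since `UP` is marked 'No' in the 'Restripe from' column.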
OneFS depends on a consistent view of a cluster’s group state. For example, some decisions, such as choosing lock coordinators, are made assuming all nodes have the same coherent notion of the cluster.
Group changes originate from multiple sources, depending on the particular state. Drive group changes are initiated by the drv module. Service group changes are initiated by processes opening and closing service devices. Each group change creates a group ID, consisting of a node ID and a group serial number. This group ID can be used to quickly determine whether a cluster’s group has changed. It is also invaluable for troubleshooting cluster issues, because it allows the history of group changes to be traced across the nodes’ log files.
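As an illustration of how a group ID supports change detection, the sketch below models it as a (node ID, serial number) pair. This is an assumption-laden model, not the OneFS representation: `GroupID`, `group_changed`, and `distinct_changes` are hypothetical names, and the observations are made-up values, not real log entries.

```python
from typing import NamedTuple


class GroupID(NamedTuple):
    node_id: int   # node ID component of the group ID
    serial: int    # group serial number component


def group_changed(before, after):
    """Two differing group IDs indicate that the group has changed."""
    return before != after


def distinct_changes(observations):
    """Collapse group IDs observed on several nodes into a
    first-seen-ordered list of unique group IDs."""
    seen = []
    for gid in observations:
        if gid not in seen:
            seen.append(gid)
    return seen
```

Comparing two small tuples is cheap, which is why an identifier of this shape lends itself to a quick "has anything changed?" check before doing more expensive work.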
The Group Management Protocol (GMP) provides coherent cluster state transitions using a process similar to two-phase commit, with the up and down states for nodes being directly managed by the GMP. The Remote Block Manager (RBM) provides the communication channel that connects devices in OneFS. When a node mounts /ifs, it initializes the RBM to connect to the other nodes in the cluster, and then uses it to exchange GMP information, negotiate locks, and access data on the other nodes.
Before /ifs is mounted, a ‘cluster’ is just a list of MAC and IP addresses in array.xml, managed by isi_boot_d when nodes join or leave the cluster. When mount_efs is called, it must first determine what it is contributing to the file system, based on the information in drives.xml. After a cluster (re)boot, the first node to mount /ifs is immediately placed into a group on its own, with all other nodes marked down. As the Remote Block Manager (RBM) forms connections, the GMP merges the connected nodes, enlarging the group until the full cluster is represented. A group transaction in which nodes transition to up is called a ‘merge’, whereas one in which a node transitions to down is called a ‘split’. Several file system modules must update internal state to accommodate splits and merges of nodes. Primarily, this is related to synchronizing memory state between nodes.
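The group formation sequence described above can be modeled with a toy class. This is only a sketch of the merge and split bookkeeping, not the GMP itself: the `Group` class and its methods are hypothetical, and incrementing a serial number by one on every change is an assumption made for illustration.

```python
class Group:
    """Toy model of a group: which nodes are up, and a change counter."""

    def __init__(self, all_nodes, first_node):
        self.all_nodes = frozenset(all_nodes)
        if first_node not in self.all_nodes:
            raise ValueError("first_node must be a cluster member")
        # The first node to mount /ifs starts in a group on its own.
        self.up = {first_node}
        self.serial = 1  # assumption: bumped once per group change

    @property
    def down(self):
        """All cluster members not currently in the group."""
        return set(self.all_nodes) - self.up

    def _check(self, nodes):
        unknown = set(nodes) - self.all_nodes
        if unknown:
            raise ValueError(f"unknown nodes: {sorted(unknown)}")

    def merge(self, nodes):
        """Nodes transition to up, enlarging the group."""
        self._check(nodes)
        joining = set(nodes) - self.up
        if joining:
            self.up |= joining
            self.serial += 1

    def split(self, nodes):
        """Nodes transition to down, shrinking the group."""
        self._check(nodes)
        leaving = set(nodes) & self.up
        if leaving:
            self.up -= leaving
            self.serial += 1
```

Starting from `Group({1, 2, 3}, first_node=1)`, nodes 2 and 3 are initially down. A call to `merge([2, 3])` brings the group up to the full cluster, and a later `split([3])` marks node 3 down again.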
The soft-failed, read-only, and dead states are not directly managed by the GMP. These states are persistent and must be written to array.xml accordingly. Soft-failed state changes are often initiated from the user interface, for example using the ‘isi devices’ command.
A GMP group relies on cluster quorum to enforce consistency across node disconnects. Requiring ⌊N/2⌋ + 1 replicas to be available ensures that no updates are lost. Since nodes and drives in OneFS may be readable, but not writable, OneFS has two quorum properties: read quorum and write quorum.
Read quorum is governed by having ⌊N/2⌋ + 1 nodes readable, as indicated by sysctl efs.gmp.has_quorum. Similarly, write quorum requires at least ⌊N/2⌋ + 1 writable nodes, as represented by the sysctl efs.gmp.has_super_block_quorum. A group of nodes with quorum is called the ‘majority’ side, whereas a group without quorum is a ‘minority’. By definition, there can only be one ‘majority’ group, but there may be multiple ‘minority’ groups. A group which has any components in any state other than up is referred to as degraded.
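The quorum arithmetic above can be worked through in a few lines. The sketch below only illustrates the ⌊N/2⌋ + 1 rule and does not reflect how OneFS computes the sysctl values: `quorum_threshold` and `quorum_status` are hypothetical helpers, and the per-node (readable, writable) flags are an assumed input format.

```python
def quorum_threshold(total_nodes):
    """Smallest node count that forms a majority: floor(N/2) + 1."""
    return total_nodes // 2 + 1


def quorum_status(nodes):
    """nodes maps node ID -> (readable, writable) for every node in
    the cluster, including nodes that are currently down."""
    need = quorum_threshold(len(nodes))
    readable = sum(1 for r, _ in nodes.values() if r)
    writable = sum(1 for _, w in nodes.values() if w)
    return {"read": readable >= need, "write": writable >= need}


# Five-node cluster: three nodes readable, only two of them writable.
cluster = {
    1: (True, True),
    2: (True, True),
    3: (True, False),   # readable but not writable
    4: (False, False),
    5: (False, False),
}
status = quorum_status(cluster)
```

With N = 5 the threshold is 3, so this example group has read quorum (three readable nodes) but not write quorum (two writable nodes). Because the threshold is a strict majority of N, two disjoint groups can never both reach it, which is why only one ‘majority’ group can exist.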
File system operations typically query a GMP group several times before completing. A group may change over the course of an operation, but the operation needs a consistent view. This view is provided by the group info, which includes the GMP’s group state plus information about the services provided by nodes in the cluster. The group info allows nodes in the cluster to discover when services go up or down on other nodes and to take the appropriate action when that occurs.
Processes also change the service state in GMP by opening and closing service devices. A particular service transitions from down to up in the GMP group when a process opens the file descriptor for a service-specific device. Closing the service file descriptor triggers a group change that reports the service as down. A process can explicitly close the file descriptor if it chooses, but most often the file descriptor remains open for the lifetime of the process and is closed automatically by the kernel when the process terminates.
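The open and close lifecycle described above can be simulated in user space. This sketch does not touch any real service device. `ServiceGroup` and `ServiceHandle` are hypothetical stand-ins, the service name is made up, and the model assumes one handle per service. The context manager stands in for the kernel closing the descriptor when the process exits.

```python
class ServiceGroup:
    """Toy record of which services are up, plus a log of changes."""

    def __init__(self):
        self.up_services = set()
        self.changes = []  # list of (service, "up" | "down")

    def open_device(self, service):
        """Opening the service device reports the service as up."""
        self.up_services.add(service)
        self.changes.append((service, "up"))
        return ServiceHandle(self, service)


class ServiceHandle:
    """Stand-in for an open file descriptor on a service device."""

    def __init__(self, group, service):
        self.group = group
        self.service = service
        self.closed = False

    def close(self):
        """Closing the handle reports the service as down (once)."""
        if self.closed:
            return
        self.closed = True
        self.group.up_services.discard(self.service)
        self.group.changes.append((self.service, "down"))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Models the automatic close at process termination.
        self.close()
        return False
```

Using the handle in a `with` block shows both transitions: the service is reported up inside the block and down as soon as the block exits, whether it exits normally or through an exception.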