Cluster composition and group state
One of the most significant factors affecting a large cluster’s workflow is the group changes that result from adding, removing, or rebooting a node, or from other hardware failures or transient faults. Understanding a cluster’s group state, and how it changes, is an invaluable tool when administering and managing large clusters: it lets you determine the cluster’s current health and reconstruct its history when troubleshooting issues involving cluster stability or network health.
The primary role of the OneFS Group Management Protocol (GMP) is to help create and maintain a group of synchronized nodes. A group is a set of nodes that share synchronized state, and a cluster may form multiple groups as connection state changes. GMP distributes a variety of state information about nodes and drives, from identifiers to usage statistics. The most fundamental of these is the composition of the cluster, or ‘static aspect’ of the group, which is managed by the isi_boot_d daemon and stored in the array.xml file.
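To illustrate how one cluster can form more than one group when connection state changes, the following Python sketch partitions nodes into candidate groups based on which nodes can reach one another. It is a deliberate simplification and not how GMP actually decides membership: the `candidate_groups` function and its list of links are hypothetical, and treating every set of mutually reachable nodes as one group is an assumption made for illustration.

```python
from collections import defaultdict


def candidate_groups(nodes, links):
    """Partition nodes into candidate groups of mutually reachable nodes.

    Hypothetical simplification: each connected component of the link graph
    is treated as one group. Real GMP membership rules are more involved.
    """
    adjacency = defaultdict(set)
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    groups = []
    for start in sorted(nodes):
        if start in seen:
            continue
        # Iterative depth-first search collects one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        groups.append(sorted(component))
    return groups


# Four-node cluster in which node 4 has lost its links: two groups form.
print(candidate_groups([1, 2, 3, 4], [(1, 2), (2, 3)]))  # [[1, 2, 3], [4]]
```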
Similarly, the state of a node’s drives is stored in the drives.xml file, along with a flag indicating whether each drive is an SSD. Whereas GMP manages node states directly, drive states are managed by the ‘drv’ module and broadcast through GMP. A significant difference between nodes and drives is that the static aspect of nodes is distributed to every node in the array.xml file, whereas drive state is stored only locally on each node.
A group change operation, based on GMP, is a coherent way of changing the cluster-wide shared state. Merge is the group change operation for adding nodes. Merges affect cluster availability because they must pause file system operations for the duration of the operation. Every node needs the array.xml information in order to define the cluster and allow nodes to form connections. In contrast, drives.xml is stored only locally, so if a node goes down, the other nodes have no way to obtain that node’s drive configuration. GMP may cache drive information, but that information is unavailable once the cache is cleared.
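The difference between the replicated static aspect and locally stored drive state can be sketched as a small in-memory model. Everything here is hypothetical: the `Node` and `Cluster` classes, the `merge` and `drive_config` methods, and the bay names merely stand in for array.xml, drives.xml, and the GMP cache. The model does not read either file or reflect their real formats.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    up: bool = True
    # Stands in for drives.xml: drive state kept only on this node.
    local_drives: dict = field(default_factory=dict)


@dataclass
class Cluster:
    # Stands in for array.xml: in OneFS every node holds a copy of the
    # cluster composition; one shared list is used here for brevity.
    composition: list = field(default_factory=list)
    nodes: dict = field(default_factory=dict)
    # Stands in for drive information cached by GMP; it can be cleared.
    gmp_drive_cache: dict = field(default_factory=dict)

    def merge(self, node: Node) -> None:
        # Merge adds a node to the cluster-wide composition. (In OneFS a merge
        # also pauses file system operations; that is not modeled here.)
        self.nodes[node.node_id] = node
        self.composition.append(node.node_id)
        self.gmp_drive_cache[node.node_id] = dict(node.local_drives)

    def drive_config(self, node_id: int):
        """Drive configuration of `node_id` as seen by its peers, or None."""
        node = self.nodes[node_id]
        if node.up:
            return node.local_drives
        # The node is down: only a surviving GMP cache entry can answer.
        return self.gmp_drive_cache.get(node_id)


cluster = Cluster()
cluster.merge(Node(1, local_drives={"bay1": "UP"}))
cluster.merge(Node(2, local_drives={"bay1": "UP", "bay2": "UP"}))
cluster.nodes[2].up = False
print(cluster.composition)      # [1, 2]
print(cluster.drive_config(2))  # {'bay1': 'UP', 'bay2': 'UP'} (from the cache)
cluster.gmp_drive_cache.clear()
print(cluster.drive_config(2))  # None
```

In this model the cluster composition survives node 2 going down, while its drive configuration is available only as long as the cache entry exists.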
Conversely, the ‘dynamic aspect’ refers to node and drive states that can change. These states tell the various file system modules how healthy the nodes and their drives are, and whether a component can be used for a particular operation. For example, a soft-failed node or drive should not be used for new allocations. A component can be in one of the following seven states:
| Component state | Description |
|---|---|
| UP | Component is responding. |
| DOWN | Component is not responding. |
| DEAD | Component is not allowed to come back to the UP state and should be removed. |
| STALLED | Drive is responding slowly. |
| GONE | Component has been removed. |
| Soft-failed | Component is in the process of being removed. |
| Read-only | This state applies only to nodes. |
These states can combine: for example, a node or drive may go from ‘down, soft-failed’ to ‘up, soft-failed’ and back. These flags are stored persistently in the array.xml file for nodes and in the drives.xml file for drives.
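A minimal sketch of this dynamic state model, assuming soft-failed is a flag that combines with a base state, as the ‘down, soft-failed’ example suggests. The `Component` class, its methods, and the drive name are hypothetical. The allocation rule (only UP components that are not soft-failed qualify) is an assumption based on the soft-failed example, and Read-only is omitted for brevity.

```python
from dataclasses import dataclass, replace
from enum import Enum


class State(Enum):
    UP = "up"
    DOWN = "down"
    DEAD = "dead"        # not allowed to return to UP; should be removed
    STALLED = "stalled"  # drives only: responding slowly
    GONE = "gone"        # component has been removed


@dataclass(frozen=True)
class Component:
    name: str
    state: State
    soft_failed: bool = False  # persistent flag that combines with the state

    def label(self) -> str:
        return self.state.value + (", soft-failed" if self.soft_failed else "")

    def transition(self, new_state: State) -> "Component":
        # A DEAD component is not allowed to come back to UP.
        if self.state is State.DEAD and new_state is State.UP:
            raise ValueError(f"{self.name} is DEAD and cannot return to UP")
        # The soft-failed flag is preserved across state changes.
        return replace(self, state=new_state)

    def eligible_for_new_allocations(self) -> bool:
        # Assumption: only UP components that are not soft-failed qualify.
        return self.state is State.UP and not self.soft_failed


drive = Component("drive-3", State.DOWN, soft_failed=True)
print(drive.label())                         # down, soft-failed
drive = drive.transition(State.UP)
print(drive.label())                         # up, soft-failed
print(drive.eligible_for_new_allocations())  # False
```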
Group and drive state information allows the various file system modules to make timely and accurate decisions about how to use nodes and drives. For example, when reading a block, the selected mirror should, if possible, be on a node and drive where the read can succeed. File system modules use GMP to test node and drive capabilities, which include the following:
| Capability | Description |
|---|---|
| Readable | Drives on this node may be read. |
| Writable | Drives on this node may be written to. |
| Restripe From | Blocks may be moved away from the node. |
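The capability test used when choosing a mirror for a read can be sketched with Python's `enum.Flag`. The capability values passed in are hypothetical inputs rather than data obtained from OneFS, and `pick_mirror` and the mirror locations are illustrative, not a GMP interface.

```python
from enum import Flag, auto


class Capability(Flag):
    READABLE = auto()       # drives on this node may be read
    WRITABLE = auto()       # drives on this node may be written to
    RESTRIPE_FROM = auto()  # blocks may be moved away from the node


def pick_mirror(mirrors):
    """Return the first mirror whose node and drive both allow reads.

    `mirrors` is a list of (location, node_caps, drive_caps) tuples; the
    capability values are hypothetical inputs, not read from OneFS.
    """
    for location, node_caps, drive_caps in mirrors:
        if Capability.READABLE in node_caps and Capability.READABLE in drive_caps:
            return location
    return None  # no mirror can currently be read


mirrors = [
    ("node2/drive5", Capability.RESTRIPE_FROM, Capability.READABLE),
    ("node4/drive1", Capability.READABLE | Capability.WRITABLE,
     Capability.READABLE | Capability.WRITABLE),
]
print(pick_mirror(mirrors))  # node4/drive1
```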
Access levels define ‘last resort’ access: they identify states that should be accessed only when no better option exists. The access levels, in order of increasing access, are as follows:
| Access level | Description |
|---|---|
| Normal | The default access level. |
| Read Stalled | Allows reading from stalled drives. |
| Modify Stalled | Allows writing to stalled drives. |
| Read Soft-fail | Allows reading from soft-failed nodes and drives. |
| Never | Indicates that a group state never supports the capability. |
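Because access levels are ordered, a reader can escalate one level at a time until some copy of a block becomes readable. This sketch assumes a hypothetical `REQUIRED` table whose entries follow the access-level descriptions: stalled reads need Read Stalled, stalled writes need Modify Stalled, and soft-failed reads need Read Soft-fail. Treating writes to soft-failed components, and any unlisted state such as down, as Never is an assumption. The function names and mirror locations are illustrative.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    # Ordered by increasing access.
    NORMAL = 0
    READ_STALLED = 1
    MODIFY_STALLED = 2
    READ_SOFT_FAIL = 3
    NEVER = 4  # the capability is never supported


# Hypothetical minimum access level per (state, operation).
REQUIRED = {
    ("up", "read"): AccessLevel.NORMAL,
    ("up", "write"): AccessLevel.NORMAL,
    ("stalled", "read"): AccessLevel.READ_STALLED,
    ("stalled", "write"): AccessLevel.MODIFY_STALLED,
    ("soft-failed", "read"): AccessLevel.READ_SOFT_FAIL,
    ("soft-failed", "write"): AccessLevel.NEVER,  # assumption
}


def allowed(state, op, granted):
    # Unlisted (state, op) pairs, such as reads from a down node, are NEVER.
    required = REQUIRED.get((state, op), AccessLevel.NEVER)
    return required is not AccessLevel.NEVER and granted >= required


def choose_read_source(mirrors):
    """Escalate the access level one step at a time until a mirror is readable."""
    for level in AccessLevel:
        if level is AccessLevel.NEVER:
            break
        for location, state in mirrors:
            if allowed(state, "read", level):
                return location, level
    return None


mirrors = [("node2/drive5", "soft-failed"), ("node4/drive1", "stalled")]
location, level = choose_read_source(mirrors)
print(location, level.name)  # node4/drive1 READ_STALLED
```

Escalating one level at a time means a stalled mirror is preferred over a soft-failed one, which matches the ‘last resort’ intent of the higher access levels.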
Drive state and node state capabilities are shown in the following tables. As shown, the only group states affected by increasing access levels are soft-failed and stalled.