OneFS implements several types of internal locking mechanisms, along with the means to resolve the locks. These mechanisms are used across end-user protocols, system jobs, and file system domains. InsightIQ tracks four types of lock events that are important for performance monitoring: locked, blocked, contended, and deadlocked.
The Locked File System Events Rate measures the number of locks the system holds at any given time. Every file access requires a lock, so a higher number means that the cluster is accessing more files and performing more processing.
The blocked and contended numbers are two aspects of the same phenomenon. When a thread requests access to a file that another thread has already locked, the requesting thread blocks, which increases the blocked count. At the same time, the thread that holds the lock is notified that another thread is requesting access to the file, which increases the contended count. A high blocked or contended number that is sustained for a long time can indicate that the workload is suboptimal. Slightly shifting the workload or the access pattern may improve performance. Often, this contention is caused by internal cluster operations, such as taking snapshots or running SyncIQ simultaneously on the same dataset.
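The relationship between the blocked and contended counts can be illustrated with a toy model. The following Python sketch is not the OneFS lock manager; the class name ToyLockTable, its methods, and the example file path are all hypothetical, and it models only exclusive locks. It shows why a single conflicting request raises both counters at once.

```python
from collections import Counter


class ToyLockTable:
    """Toy model of exclusive file locks (illustrative only, not OneFS code)."""

    def __init__(self):
        self.holders = {}        # file path -> id of the thread holding the lock
        self.waiters = {}        # file path -> ids of threads waiting, in order
        self.events = Counter()  # tallies of 'locked', 'blocked', 'contended'

    def request(self, thread_id, path):
        """Try to lock a file. Return True if granted, False if the caller blocks."""
        if path not in self.holders:
            self.holders[path] = thread_id
            self.events["locked"] += 1
            return True
        # The file is already locked: the requesting thread blocks ...
        self.waiters.setdefault(path, []).append(thread_id)
        self.events["blocked"] += 1
        # ... and the holder is notified that someone else wants the file.
        self.events["contended"] += 1
        return False

    def release(self, thread_id, path):
        """Release a lock and hand it to the oldest waiter, if there is one."""
        if self.holders.get(path) != thread_id:
            raise ValueError(f"{thread_id} does not hold the lock on {path}")
        queue = self.waiters.get(path, [])
        if queue:
            next_holder = queue.pop(0)
            self.holders[path] = next_holder
            self.events["locked"] += 1
            return next_holder
        del self.holders[path]
        return None


table = ToyLockTable()
table.request("t1", "/ifs/data/report.csv")  # granted
table.request("t2", "/ifs/data/report.csv")  # blocks behind t1
print(dict(table.events))  # {'locked': 1, 'blocked': 1, 'contended': 1}
```

In this model the two counters always move together, which matches the description of them as two views of one event: one from the waiting thread and one from the thread that holds the lock.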
Providing a reasonable value for this metric is difficult because the workload influences how often a blocked event occurs. The number scales with the number of nodes and the total workload on the system.
The most serious issues occur with deadlock events. OneFS tries to avoid deadlocks, although by design they are expected in certain situations in order to improve performance. Breaking a deadlock is expensive because the deadlocked threads must unwind their work and try again. The Deadlocked File System Events Rate counts the number of occurrences where the system had to forcibly break a lock. Generally, you should see zero occurrences. A very small number spread over a long time period is not a cause for concern. A high rate that occurs frequently signifies a potential issue.
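The guidance above can be expressed as a simple triage rule over a series of deadlock rate readings. The sketch below is an illustration only: the function name and the two thresholds (high_rate and frequent_fraction) are hypothetical placeholders, not values published for InsightIQ or OneFS, and they would need to be tuned to the cluster and workload.

```python
def assess_deadlock_samples(samples, high_rate=1.0, frequent_fraction=0.1):
    """Classify a window of deadlock rate readings.

    samples           -- deadlock rate readings collected over the window
    high_rate         -- hypothetical threshold for a "high" reading
    frequent_fraction -- hypothetical share of high readings that counts
                         as "frequent"
    """
    if not samples:
        raise ValueError("at least one sample is required")

    # Zero occurrences is the expected, healthy state.
    if all(s == 0 for s in samples):
        return "normal"

    # A high rate that occurs frequently signals a potential issue.
    high_readings = sum(1 for s in samples if s >= high_rate)
    if high_readings / len(samples) >= frequent_fraction:
        return "investigate"

    # Otherwise the deadlocks are small or rare across the window.
    return "occasional"
```

For example, a window of 100 readings with a single small nonzero value is classified as "occasional", while a window in which half of the readings exceed high_rate is classified as "investigate".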