Chapters
Executive summary
Cluster definitions
Considerations for planning and designing a large cluster
Workload analysis
Large cluster health check
Cluster administration and management considerations
OneFS software considerations
Layout, protection, and failure domains
Data layout and tiering recommendations
Multi-tenant recommendations
Job engine recommendations
Data availability, protection, and disaster recovery considerations
Best practices checklist
Summary
Workload analysis
  1. Workload analysis
  2. How does the application work?
  3. How do users interact with the application?
  4. What does the network topology look like?
  5. What is the application’s performance profile?
  6. What are the application’s I/O requirements?
  7. What is the disk latency?
  8. How much CPU utilization?
  9. When to analyze workloads?
OneFS software considerations
  1. Cluster composition and group state
  2. Minimum access level for capabilities per node state
  3. Minimum access level for capabilities per drive state
  4. Understanding and analyzing group membership
  5. Identifying accelerator nodes
  6. SmartFailed and down nodes
  7. SmartFailed and down drives
  8. Read-only nodes
  9. Dead nodes
  10. Dead drives
  11. SmartFailed and stalled drives
  12. Reading group sequence numbers
  13. Group changes
  14. Interpreting group changes
  15. Constructing an event timeline
  16. Extra-large cluster example