HDFS is a distributed file system that has proven to be a highly reliable, high-throughput storage option for big data workloads that are primarily write-once, read-many. The cluster's name node tracks and stores the metadata that records where each block resides across the distributed worker nodes.
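As an illustration, this block-location metadata can be queried from any client through the standard Hadoop FileSystem Java API. The sketch below is minimal and assumes the cluster's client configuration (core-site.xml and hdfs-site.xml) is on the classpath; the file path is purely illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlockLocations {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS and related settings from the client config on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Illustrative path; substitute any file already stored in HDFS.
        Path file = new Path("/user/cloudera/sample.dat");
        FileStatus status = fs.getFileStatus(file);

        // The name node answers this call from its block-location metadata.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    block.getOffset(), block.getLength(),
                    String.join(",", block.getHosts()));
        }
        fs.close();
    }
}
```

On a two-worker cluster such as the one described here, each block would typically report DN1, DN2, or both as hosts, depending on the configured replication factor.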
The worker nodes read and write data in blocks of 128 MB by default, and these blocks are distributed across all the worker nodes in a Hadoop cluster. The following table shows the storage configuration for the name node and the two worker nodes (DN1 and DN2); both worker nodes use the same layout of five 600 GB VMFS volumes for HDFS data:
Table 10. Cloudera Hadoop on PowerFlex
| Volume name | VM disk configuration | Description | Size (GB) | Number of volumes | Total size (GB) |
|---|---|---|---|---|---|
| Cloudera Hadoop: Name node | | | | | |
| TM_VM_MS_OS | VMFS | Oracle Linux operating system | 800 | 1 | 800 |
| TM_VM_MS_DATA | VMFS | HDFS storage | 500 | 1 | 500 |
| Cloudera Hadoop: Worker nodes | | | | | |
| TM_VM_DN<1/2>_OS | VMFS | Oracle Linux operating system | 800 | 1 | 800 |
| TM_VM_DN<1/2>_DATA | VMFS | HDFS storage | 600 | 5 | 3,000 |
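To relate the configuration above back to the HDFS defaults discussed earlier, a similarly minimal sketch can confirm the effective block size (134217728 bytes, or 128 MB, unless dfs.blocksize is overridden) and the raw capacity that the name node reports across the data nodes; the figures printed depend on the actual deployment:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;
import org.apache.hadoop.fs.Path;

public class ClusterSummary {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // dfs.blocksize defaults to 128 MB unless overridden in hdfs-site.xml.
        long blockSize = fs.getDefaultBlockSize(new Path("/"));
        System.out.printf("Default block size: %d MB%n", blockSize / (1024 * 1024));

        // Raw capacity and usage as reported by the name node across all data nodes.
        FsStatus status = fs.getStatus();
        System.out.printf("Capacity: %.1f GB, Used: %.1f GB, Remaining: %.1f GB%n",
                status.getCapacity() / 1e9,
                status.getUsed() / 1e9,
                status.getRemaining() / 1e9);
        fs.close();
    }
}
```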