HDFS is a distributed file system that has proven to be a highly reliable, high-throughput storage option for big data workloads that are primarily write-once, read-many. The cluster's name node tracks and stores the metadata that records where each block resides across the distributed worker nodes.
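As an illustration, this block-location metadata can be queried from any client through the standard Hadoop FileSystem Java API. The sketch below is minimal and assumes the cluster's client configuration (core-site.xml and hdfs-site.xml) is on the classpath; the file path is purely illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlockLocations {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS and related settings from the client config on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Illustrative path; substitute any file already stored in HDFS.
        Path file = new Path("/user/cloudera/sample.dat");
        FileStatus status = fs.getFileStatus(file);

        // The name node answers this call from its block-location metadata.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    block.getOffset(), block.getLength(),
                    String.join(",", block.getHosts()));
        }
        fs.close();
    }
}
```

On a two-worker cluster such as the one described here, each block would typically report DN1, DN2, or both as hosts, depending on the configured replication factor.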
The worker nodes read and write data in blocks of 128 MB by default, and these blocks are distributed across all the worker nodes in a Hadoop cluster. The following table shows the storage configuration for the name node and the two worker nodes (DN1 and DN2); both worker nodes use the same layout of five 600 GB VMFS volumes for HDFS data:
Table 10. Cloudera Hadoop on PowerFlex
| Volume name | VM disk configuration | Description | Size (GB) | Number of volumes | Total size (GB) |
|---|---|---|---|---|---|
| Cloudera Hadoop: Name node | | | | | |
| TM_VM_MS_OS | VMFS | Oracle Linux operating system | 800 | 1 | 800 |
| TM_VM_MS_DATA | VMFS | HDFS storage | 500 | 1 | 500 |
| Cloudera Hadoop: Worker nodes | | | | | |
| TM_VM_DN<1/2>_OS | VMFS | Oracle Linux operating system | 800 | 1 | 800 |
| TM_VM_DN<1/2>_DATA | VMFS | HDFS storage | 600 | 5 | 3,000 |
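To relate the configuration above back to the HDFS defaults discussed earlier, a similarly minimal sketch can confirm the effective block size (134217728 bytes, or 128 MB, unless dfs.blocksize is overridden) and the raw capacity that the name node reports across the data nodes; the figures printed depend on the actual deployment:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;
import org.apache.hadoop.fs.Path;

public class ClusterSummary {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // dfs.blocksize defaults to 128 MB unless overridden in hdfs-site.xml.
        long blockSize = fs.getDefaultBlockSize(new Path("/"));
        System.out.printf("Default block size: %d MB%n", blockSize / (1024 * 1024));

        // Raw capacity and usage as reported by the name node across all data nodes.
        FsStatus status = fs.getStatus();
        System.out.printf("Capacity: %.1f GB, Used: %.1f GB, Remaining: %.1f GB%n",
                status.getCapacity() / 1e9,
                status.getUsed() / 1e9,
                status.getRemaining() / 1e9);
        fs.close();
    }
}
```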