Architecture overview
A Hadoop cluster has the topology shown in the following image, with one or more NameNodes and multiple DataNodes.
The NameNode is the main node and does not store the actual data. It holds the file system metadata, much like a table of contents, and therefore requires less storage and fewer computational resources.
The Hadoop Distributed File System (HDFS) contains n DataNodes (where n can be up to 1000) that manage the storage of data. These DataNodes are the worker nodes: they perform the tasks and serve read and write requests from the file system's clients.
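The split of responsibilities described above can be illustrated with a toy model. This is a minimal sketch, not how HDFS is implemented: the class names, the `put` and `get` helpers, the round-robin placement, and the 4-byte block size are all assumptions made for illustration, and replication is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Worker node: stores the actual block contents."""
    name: str
    blocks: dict = field(default_factory=dict)  # block id -> bytes

    def write(self, block_id, data):
        self.blocks[block_id] = data

    def read(self, block_id):
        return self.blocks[block_id]


@dataclass
class NameNode:
    """Main node: stores only metadata (which blocks form a file, and where they live)."""
    block_map: dict = field(default_factory=dict)  # path -> [(block id, DataNode name)]

    def record(self, path, block_id, node_name):
        self.block_map.setdefault(path, []).append((block_id, node_name))

    def locate(self, path):
        return self.block_map[path]


def put(namenode, datanodes, path, data, block_size=4):
    """Split data into blocks, place them round-robin, and record each location."""
    names = sorted(datanodes)
    for offset in range(0, len(data), block_size):
        index = offset // block_size
        block_id = f"{path}#{index}"
        target = names[index % len(names)]
        datanodes[target].write(block_id, data[offset:offset + block_size])
        namenode.record(path, block_id, target)


def get(namenode, datanodes, path):
    """Ask the NameNode where the blocks are, then read them from the DataNodes."""
    return b"".join(
        datanodes[node].read(block_id) for block_id, node in namenode.locate(path)
    )
```

In this sketch the NameNode never holds file contents, only the block-to-node mapping, which is why it can get by with far less storage than the DataNodes.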
The Hadoop app agent needs to be installed on only one node. The agent scripts, along with DDHCFS (Data Domain Hadoop Compatible File System) and the dependent libraries, are distributed dynamically to the participating DataNodes by the DistCp MapReduce job.
Note: In this example, the Hadoop app agent is installed on the NameNode, but the backup can be started from any node that has the roles needed to run DistCp. (DistCp always runs at the cluster level.)
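For readers unfamiliar with DistCp, the following sketch shows the general shape of a plain `hadoop distcp` invocation, assembled in Python. It is not the agent's own command line: the helper name `build_distcp_command`, the host name, and both paths are hypothetical, and the DDHCFS target URI that the agent uses is not shown because it is not given here. The `-m` option, which caps the number of simultaneous map tasks, is a standard DistCp option.

```python
import shlex


def build_distcp_command(source, target, max_maps=None):
    """Return the argument list for a generic `hadoop distcp` run."""
    cmd = ["hadoop", "distcp"]
    if max_maps is not None:
        # -m limits how many map tasks copy data at the same time.
        cmd += ["-m", str(max_maps)]
    cmd += [source, target]
    return cmd


# Hypothetical source and target, for illustration only.
command = build_distcp_command(
    "hdfs://namenode.example.com:8020/data/source",
    "hdfs://namenode.example.com:8020/data/copy",
    max_maps=4,
)
print(shlex.join(command))
```

Because DistCp is submitted as a MapReduce job, the copy work is carried out by map tasks across the cluster rather than by the node where the command is typed, which is consistent with the note that any suitably configured node can start it.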