The Dell Hadoop Application Agent can be downloaded from Dell.com and is available as a tar file (hadoopappagent_linux_x86_64.tar.gz).
Copy the downloaded tar file to any node in the Hadoop cluster that has the roles required to run DistCp at the cluster level. The agent must be installed by a user with superuser access on HDFS.
Run the following commands to extract the package into any directory on the node:
# su - <username>
# tar zxvf hadoopappagent_linux_x86_64.tar.gz
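The commands above extract the package into the current directory. To place the agent in a specific directory instead, tar's -C option changes into that directory before extracting. A minimal sketch; extract_agent is a hypothetical helper and the default destination ($HOME/hadoopappagent) is only an illustrative choice, not a location required by the agent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the agent tarball into an explicit directory.
# Default destination is illustrative only; pass your own as the 2nd argument.
extract_agent() {
  local tarball="$1" dest="${2:-$HOME/hadoopappagent}"
  if [ ! -f "$tarball" ]; then
    echo "tarball not found: $tarball" >&2
    return 1
  fi
  mkdir -p "$dest" || return 1
  # -C switches to $dest first, so the extracted files land there.
  tar -zxvf "$tarball" -C "$dest"
}
```

Usage: `extract_agent hadoopappagent_linux_x86_64.tar.gz /path/to/agent`, run as the HDFS superuser account selected with `su - <username>`.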
After the agent is installed, we must configure the backup parameters in dlpm-env.cfg, the agent's main configuration file. Multiple copies of this file can be created. Its parameters are shown in the following figure:
HDFS backups can cover a single directory or multiple directories at the same time. Each HDFS source directory (s=) and its PowerProtect DD target directory (t=) are specified in the form:
s=/<HDFS_directory> t=/<DD_target_directory>
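If the source/target pairs are kept in a text file, a quick pre-flight check can catch malformed lines before a backup runs. The sketch below is a hypothetical helper (check_pairs), not part of the agent. It assumes one pair per line in the s=/<source> t=/<target> form and skips blank lines and lines starting with #; that file layout is an assumption, not documented agent behavior.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for "s=/<source> t=/<target>" lines.
# Assumes one pair per line; blank lines and '#' comments are skipped.
# Prints each valid pair as "<source> -> <target>"; returns 1 if any line is bad.
check_pairs() {
  local file="$1" status=0 lineno=0 first second extra
  # "|| [ -n ... ]" also processes a last line that lacks a trailing newline.
  while read -r first second extra || [ -n "$first" ]; do
    lineno=$((lineno + 1))
    case "$first" in
      ''|'#'*) continue ;;  # blank line or comment
    esac
    if [ -z "$second" ] || [ -n "$extra" ]; then
      echo "line $lineno: expected 's=/<source> t=/<target>'" >&2
      status=1; continue
    fi
    case "$first" in
      s=/*) ;;
      *) echo "line $lineno: source must start with s=/" >&2; status=1; continue ;;
    esac
    case "$second" in
      t=/*) ;;
      *) echo "line $lineno: target must start with t=/" >&2; status=1; continue ;;
    esac
    # Strip the s= and t= prefixes and print the mapping.
    echo "${first#s=} -> ${second#t=}"
  done < "$file"
  return "$status"
}
```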
Specifies the maximum number of files in each container.
Specifies the maximum size, in GB, of each container created under the DD target directory.
Values can range from 1 to 14.
Enables retention lock (Governance mode) on the PowerProtect DD.
Specifies whether the staging directory created on the DD during the backup is deleted at the end of the backup.
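The maximum-files and maximum-size container limits described above together set a floor on how many containers a backup needs: each container holds at most the file limit and at most the size limit, so at least ceil(files / max_files) and at least ceil(size / max_size) containers are required. A minimal sketch of that arithmetic, assuming both limits are hard caps and every backed-up file lands in some container. min_containers and the sample values are hypothetical, and the agent's actual container count may be higher depending on how it fills containers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lower bound on containers for one backup.
# Arguments: file_count total_size_gb max_files_per_container max_container_gb
# Pass sizes in whole GB, rounded up; for an integer limit this rounding does
# not change the ceiling below.
min_containers() {
  local files="$1" size_gb="$2" max_files="$3" max_gb="$4" by_files by_size
  # Integer ceiling division: ceil(a / b) = (a + b - 1) / b for a >= 0, b > 0.
  by_files=$(( (files + max_files - 1) / max_files ))
  by_size=$(( (size_gb + max_gb - 1) / max_gb ))
  # Both limits must hold, so the larger requirement wins.
  if [ "$by_files" -gt "$by_size" ]; then
    echo "$by_files"
  else
    echo "$by_size"
  fi
}
```

For example, 250,000 files totalling 37 GB with a 100,000-file limit and a 10 GB size limit needs max(3, 4) = 4 containers at minimum: `min_containers 250000 37 100000 10` prints 4.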
The next section contains the parameters that enable Kerberos for HDFS.
If Kerberos is used, set the variable HDFSKERBEROS to TRUE and provide suitable values for the other parameters in this section. A Kerberos-enabled setup may require additional steps, such as generating a Kerberos keytab file so that Hadoop commands can run in unattended mode.
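Once a keytab exists, the standard MIT Kerberos kinit command can obtain a ticket from it without a password prompt, which is what unattended runs need. A minimal sketch; the keytab path, the principal, and the kerberos_login helper are hypothetical placeholders. How the agent itself uses the Kerberos values in dlpm-env.cfg is covered in the user guide; this only shows plain kinit usage.

```shell
#!/usr/bin/env bash
# Hypothetical placeholders: replace with your cluster's keytab and principal.
KEYTAB="${KEYTAB:-/etc/security/keytabs/hdfs.keytab}"
PRINCIPAL="${PRINCIPAL:-hdfs@EXAMPLE.COM}"

# Obtain a Kerberos ticket non-interactively from a keytab.
kerberos_login() {
  local keytab="$1" principal="$2"
  if [ ! -r "$keytab" ]; then
    echo "keytab not readable: $keytab" >&2
    return 1
  fi
  # -k: use a key from a keytab instead of prompting for a password.
  # -t: path of the keytab file to use with -k.
  kinit -k -t "$keytab" "$principal"
}

# Usage (not run here):
# kerberos_login "$KEYTAB" "$PRINCIPAL" && klist && hdfs dfs -ls /
```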
See the Dell Hadoop Application Agent User Guide for more information about setting up Kerberos.
In the next section, which is mandatory, we configure the PowerProtect DD information.
In the section after that, we set the parameters for DistCp.
For details on configuring DistCp, see the section called “Distcp Parameters” in the Dell Hadoop Application Agent User Guide, or the Hadoop DistCp Guide.
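The agent runs DistCp on your behalf, and the names of its DistCp settings in dlpm-env.cfg are not reproduced here. For orientation only, the underlying hadoop distcp command accepts options such as -m (maximum simultaneous copies), -bandwidth (per-map bandwidth in MB/s) and -update (copy only files that are missing at the target or differ from it). run_distcp is a hypothetical wrapper, and its default values and the paths in the usage line are illustrative, not agent defaults.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around "hadoop distcp" to illustrate common options.
# Arguments: source_uri target_uri [max_maps] [mb_per_sec_per_map]
run_distcp() {
  local src="$1" dst="$2" maps="${3:-20}" mbps="${4:-100}"
  # -m:         maximum number of simultaneous map tasks (copies)
  # -bandwidth: per-map bandwidth limit in MB/s
  # -update:    skip files that already match at the target
  hadoop distcp -m "$maps" -bandwidth "$mbps" -update "$src" "$dst"
}

# Usage (not run here; paths are placeholders):
# run_distcp hdfs:///data/sales "<target URI>" 10 50
```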
To back up HBase tables and namespaces, configure the dlpm-env.cfg file with the following values:
Set this value to TRUE to enable HBase backups.
Specifies the name of the HBase table to back up to PowerProtect DD. Use this parameter when only a single HBase table needs to be backed up.
Each HBase table and its PowerProtect DD target directory are specified in the form:
s=/<HBase_Table> t=/<DD_target_directory>
Set this value to TRUE if the snapshots created as part of the backup must be retained after the backup.
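When snapshots are retained, they can be reviewed from the HBase shell. A minimal sketch that pipes the list_snapshots command into the shell in non-interactive mode (hbase shell -n, available in recent HBase releases). The list_hbase_snapshots helper and its optional regex filter are illustrative; the names of the snapshots the agent creates are not documented here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list HBase snapshots, optionally filtered by a regex.
list_hbase_snapshots() {
  local pattern="${1:-.*}"
  # list_snapshots takes an optional regex; -n runs the shell non-interactively.
  echo "list_snapshots '$pattern'" | hbase shell -n
}

# Usage (not run here): list_hbase_snapshots 'sales.*'
```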
The configuration file also contains parameters to enable Kerberos for HBase backups.
If Kerberos is used, set the variable HBASEKERBEROS to TRUE and provide suitable values for the other parameters in this section. A Kerberos-enabled setup may require additional steps, such as generating a Kerberos keytab file so that Hadoop commands can run in unattended mode.
See the Dell Hadoop Application Agent User Guide for more information about setting up Kerberos.