Hadoop is a clustered, distributed system for processing massive amounts of data. It provides a software framework for the distributed storage and processing of big data, and it supports several distributed processing models.
Data that is stored in Hadoop clusters is often essential to business operations. Hadoop often serves as a repository for data that also resides in existing data warehouses or transactional systems, so that data can be reloaded from its source. However, social media data, machine learning models, logs, third-party feeds, open APIs, IoT data, and data from other sources might not be reloadable, might not be easily available, or might not exist elsewhere in the enterprise at all. This type of critical single-source data must be backed up and stored forever.
One of the challenges with protecting Hadoop datasets is the scale of the problem. Many solutions for protecting Hadoop data rely on suboptimal protocols (such as NFS) and techniques (such as backup through an NFS gateway node) that do not scale.
Dell Technologies has developed a Hadoop File System driver for PowerProtect DD series appliances that allows Hadoop data management functions to transparently use PowerProtect DD to store and retrieve data efficiently. This driver is called DD Hadoop Compatible File System (DDHCFS).
This white paper describes how PowerProtect DD series appliances can be integrated with a script-based Hadoop Application Agent solution to protect Hadoop workloads.