Boomi DCP is compatible with both Hortonworks and Cloudera Hadoop as data sources and data sinks, but only with Cloudera Hadoop for the Boomi Data Prep backend (ETL operations). The Boomi node can be added to the Cloudera cluster at installation time or later. Boomi provides a number of adapters that enable read and write operations. The Hadoop services that are used in this manner are listed in Table 6.
Table 6. Hadoop services
| Hadoop service | Supported versions |
|---|---|
| Cloudera Distribution for Apache Hadoop (CDH) HDFS | 5.3.0, 5.7, 5.12, 6.0 |
| CDH Hive | 5.3.0, 5.7, 5.12, 6.0 |
| Hortonworks Data Platform (HDP) HDFS | 2.4, 2.6 |
| HDP Hive | 2.4, 2.6 |
| Apache HDFS | 2.5.0, 2.6.0, 2.7.1, 2.7.3 |
| Apache Hive | 0.13.1, 1.1.0, 1.2.1 |
| Amazon EMR HDFS | 5.8.0, 5.11 |
| Amazon EMR Hive | 5.8.0, 5.11 |
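The adapter configuration itself is done inside Boomi, but a quick way to confirm that a cluster from the table above is reachable as a data source or sink is to exercise HDFS directly from the Boomi node. The sketch below is only an illustration using the third-party Python `hdfs` (WebHDFS) client; the NameNode host, WebHDFS port, user, and paths are placeholder assumptions, not values from this document.

```python
# Minimal WebHDFS round trip to sanity-check a cluster as a data source/sink.
# Assumes the third-party "hdfs" package (pip install hdfs) and that WebHDFS is
# enabled on the NameNode. Host, port, user, and paths below are placeholders.
from hdfs import InsecureClient

client = InsecureClient("http://namenode.example.com:50070", user="boomi")

# Write a small test file, then read it back and list the directory.
with client.write("/tmp/boomi_check/sample.csv", overwrite=True) as writer:
    writer.write(b"id,name\n1,test\n")

with client.read("/tmp/boomi_check/sample.csv") as reader:
    print(reader.read().decode("utf-8"))

print(client.list("/tmp/boomi_check"))
```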
Required Cloudera Hadoop services on the Boomi DCP node that are used for the Boomi Data Prep backend (ETL operations) include the following (see the verification sketch after this list):
- HDFS Gateway
- Hive Gateway
- Sqoop 1 Client Gateway
- YARN (MR2 included) Gateway
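One informal way to confirm that the client tools deployed by these gateway roles are present on the Boomi DCP node is to call their version commands. This is only a sketch; the exact checks depend on how Cloudera Manager laid out the gateway clients.

```python
# Rough check that the HDFS, Hive, Sqoop, and YARN client commands installed by
# the gateway roles are on the PATH of the Boomi DCP node. Command names are the
# standard CDH client entry points; adjust if the cluster uses custom locations.
import subprocess

CHECKS = {
    "HDFS Gateway": ["hdfs", "version"],
    "Hive Gateway": ["hive", "--version"],
    "Sqoop 1 Client Gateway": ["sqoop", "version"],
    "YARN (MR2) Gateway": ["yarn", "version"],
}

for role, cmd in CHECKS.items():
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
        status = "OK" if result.returncode == 0 else f"exit {result.returncode}"
    except FileNotFoundError:
        status = "command not found"
    print(f"{role}: {' '.join(cmd)} -> {status}")
```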
Required services on the cluster data nodes include the following (see the sketch after this list):
- HDFS DataNode
- Hive Gateway
- YARN (MR2 included) NodeManager
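After the gateways on the Boomi node are working, the cluster-side roles such as the DataNodes and NodeManagers can be confirmed from that same node with standard Hadoop administrative commands, as in this sketch (it assumes the invoking user has permission to run them).

```python
# List live HDFS DataNodes and running YARN NodeManagers from the Boomi node.
# Both commands are standard Hadoop CLIs and require appropriate permissions.
import subprocess

for label, cmd in [
    ("HDFS DataNodes", ["hdfs", "dfsadmin", "-report"]),
    ("YARN NodeManagers", ["yarn", "node", "-list"]),
]:
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(f"=== {label} ===")
    print(result.stdout if result.returncode == 0 else result.stderr)
```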
All of these services can be defined either when the cluster is installed or later. Conceptually, the Boomi node becomes a member of the cluster for access purposes, but it does not act as a Hadoop storage or processing node. Boomi uses the gateway functions to perform the ETL operations.
Boomi creates jobs on the Hadoop cluster to:
- Read the data into HDFS.
- Place it in Hive.
- Perform the requested data preparation (ETL) operations.
- Write the result out to the selected destination.
Data processing does not occur on the Boomi node, but rather on the source, the destination, and the Hadoop cluster.
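Boomi generates and submits these jobs itself; purely to illustrate the flow described above, the sketch below strings together roughly equivalent Sqoop and Hive command-line steps: import from a JDBC source into a Hive table, transform with HiveQL, and export the result. This is not Boomi's implementation, and all connection strings, table names, and directories are placeholders.

```python
# Illustration only: the read -> Hive -> transform -> write flow expressed as
# equivalent Sqoop/Hive CLI steps. Boomi builds its own cluster jobs; this is
# not its implementation. Connection strings, tables, and paths are placeholders.
import subprocess

def run(cmd):
    print(">>>", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Read the data into HDFS and place it in Hive (Sqoop import with --hive-import).
run([
    "sqoop", "import",
    "--connect", "jdbc:mysql://source-db.example.com/sales",
    "--username", "etl", "--password-file", "/user/etl/.pw",
    "--table", "orders",
    "--hive-import", "--hive-table", "staging.orders",
])

# 2. Perform the preparation operations as a Hive query that writes a result table.
run([
    "hive", "-e",
    "CREATE TABLE curated.order_totals AS "
    "SELECT customer_id, SUM(amount) AS total FROM staging.orders GROUP BY customer_id",
])

# 3. Write the result out to the selected destination (Sqoop export from the
#    result table's warehouse directory; Hive's default field delimiter is \001).
run([
    "sqoop", "export",
    "--connect", "jdbc:mysql://target-db.example.com/reporting",
    "--username", "etl", "--password-file", "/user/etl/.pw",
    "--table", "order_totals",
    "--export-dir", "/user/hive/warehouse/curated.db/order_totals",
    "--input-fields-terminated-by", r"\001",
])
```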