CDP Private Cloud Base components

Cloudera Runtime is the core open-source software distribution within CDP that Cloudera maintains, supports, versions, and packages as a single entity. Cloudera Runtime comprises multiple open-source projects: Apache components, connectors and encryption components, and other components from Cloudera. These components constitute the core distribution of data management tools within CDP.

Cloudera Manager is a web application that administrators and others can use to configure, manage, and monitor CDP clusters and Cloudera Runtime services. You can also use the Cloudera Manager API to programmatically perform management tasks.
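As an illustration of programmatic management, the following minimal Python sketch lists the clusters that a Cloudera Manager instance knows about over its REST API. It is not an official client. The host name, the credentials, the API version string (v41), the default port (7180), and the helper function names are all assumptions for illustration, so check the API version and port that your Cloudera Manager release actually exposes before using it.

```python
import base64
import json
import urllib.request


def api_url(host, path, version="v41", port=7180, tls=False):
    """Build a Cloudera Manager REST URL, for example http://cm:7180/api/v41/clusters.

    The version and port defaults are assumptions; adjust them for your deployment.
    """
    scheme = "https" if tls else "http"
    return f"{scheme}://{host}:{port}/api/{version}/{path.lstrip('/')}"


def basic_auth_header(user, password):
    """Return an HTTP Basic authentication header for the given credentials."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def cluster_names(payload):
    """Extract cluster names from a JSON list response shaped like {"items": [...]}."""
    return [item["name"] for item in json.loads(payload).get("items", [])]


def list_clusters(host, user, password, **url_options):
    """Call the clusters endpoint and return the cluster names (requires network access)."""
    request = urllib.request.Request(
        api_url(host, "clusters", **url_options),
        headers=basic_auth_header(user, password),
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return cluster_names(response.read().decode("utf-8"))
```

A call such as list_clusters("cm.example.com", "admin", "admin") would return the cluster names, assuming that host is reachable and the credentials are valid. The URL building and response parsing are kept in separate functions so that they can be checked without a running Cloudera Manager instance.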

Table 2, CDP Private Cloud Base software components, lists the major Apache software components that constitute Cloudera Runtime 7.1.7 SP2 for CDP Private Cloud Base, along with a brief description of each. For more information about all included components, including versions, see Cloudera Runtime Component Versions on the Cloudera documentation website.

The associated Data Management with Cloudera Data Platform Private Cloud Base Design Guides describe where these components are deployed across the various nodes.

Table 2. CDP Private Cloud Base software components
Component	Description
Apache Arrow	Arrow is a cross-language development platform for in-memory data.
Apache Atlas	Atlas provides data governance capabilities for Hadoop. Atlas is also a common metadata store, which is designed to exchange metadata within and outside of the Hadoop stack.
Apache Avro	Avro is a row-oriented remote procedure call and data serialization framework for Apache Hadoop.
Apache Calcite	Calcite is a framework for building databases and data management systems. It includes a SQL parser, an API for building expressions in relational algebra, and a query planning engine.
Apache Hadoop	Apache Hadoop is a framework that enables distributed processing of large datasets across clusters of systems, using simple programming models. Apache Hadoop is designed to scale out from single servers to thousands of servers. Hadoop also includes YARN for resource management and job scheduling and HDFS, the Hadoop Distributed File System.
Apache HBase	HBase provides random, persistent access to data as a natively nonrelational database. HBase is ideal for scenarios that require real-time analysis and tabular data for end-user applications.
Apache Hive	Hive is a data warehouse system for summarizing, querying, and analyzing huge, disparate datasets.
Apache Impala	Impala provides high-performance, low-latency SQL queries on data stored in Apache Hadoop file formats.
Apache Kafka	Kafka is a distributed and highly available event streaming platform. It is used for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
Apache Knox	Knox is an application gateway for interacting securely with the REST APIs and user interfaces of one or more Hadoop clusters.
Apache Kudu	Kudu combines fast inserts and updates, and efficient columnar scans, to enable multiple real-time analytic workloads across a single storage layer. Kudu provides fast analytics on fast data.
Apache Livy	Livy is a service that enables easy interaction with a Spark cluster over a REST interface.
Apache MapReduce	MapReduce is a software framework for writing applications that process vast amounts of data in parallel on large clusters in a reliable, fault-tolerant manner.
Apache Oozie	Oozie is a workflow and coordination service for managing Apache Hadoop jobs.
Apache ORC	Optimized Row Columnar (ORC) is a self-describing, type-aware columnar file format designed for Hadoop workloads.
Apache Ozone	Ozone is a scalable, redundant, and distributed object store optimized for big data workloads.
Apache Parquet	Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the data processing framework, data model, or programming language.
Apache Phoenix	Phoenix is an add-on for Apache HBase that provides a programmatic ANSI SQL interface.
Apache Ranger	Ranger is a CDP security component that enables you to control access to CDP services. Ranger also provides access auditing and reporting.
Apache Solr	Solr provides natural language access to data stored in, or ingested into, Hadoop, HBase, or cloud storage.
Apache Spark	Spark is a distributed, in-memory data processing engine designed for large-scale data processing and analytics.
Apache Sqoop	Sqoop is a CLI-based tool for bulk transfers of data between relational databases and HDFS or cloud object stores.
Apache Tez	Tez is an extensible framework for building high-performance batch and interactive data processing applications, which YARN coordinates in Apache Hadoop.
Apache YARN	YARN is the processing layer for managing distributed applications that run on multiple machines in a network.
Apache Zeppelin	Zeppelin is a multipurpose, web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala, Python, R, and more.
Apache ZooKeeper	ZooKeeper is a centralized service that enables highly reliable, distributed coordination, including maintaining configuration information, naming, and providing distributed synchronization and group services.
