Cloudera Runtime is the core open-source software distribution within CDP. Cloudera maintains, supports, versions, and packages it as a single entity. Cloudera Runtime bundles multiple open-source projects, including Apache components, connectors and encryption components, and other components from Cloudera. Together, these components constitute the core distribution of data management tools within CDP.
Cloudera Manager is a web application that administrators and others can use to configure, manage, and monitor CDP clusters and Cloudera Runtime services. You can also use the Cloudera Manager API to programmatically perform management tasks.
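As a minimal sketch of programmatic access, the following Python example builds the URL of the cluster-list endpoint of the Cloudera Manager REST API and extracts cluster names from a JSON response. The host name, port, API version string (`v41`), and sample payload are placeholders chosen for illustration, not values taken from this guide; check the API version that your Cloudera Manager server supports before adapting it. The helper function names are hypothetical.

```python
import base64
import json
import urllib.request
from typing import List


def clusters_url(base_url: str, api_version: str) -> str:
    """Build the URL of the cluster-list endpoint (version is a placeholder)."""
    return f"{base_url.rstrip('/')}/api/{api_version}/clusters"


def cluster_names(response_body: str) -> List[str]:
    """Extract cluster names from a JSON body shaped like {"items": [...]}."""
    payload = json.loads(response_body)
    return [item["name"] for item in payload.get("items", [])]


def fetch_cluster_names(base_url: str, api_version: str,
                        user: str, password: str) -> List[str]:
    """Call the endpoint with HTTP basic authentication (needs a live server)."""
    request = urllib.request.Request(clusters_url(base_url, api_version))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return cluster_names(response.read().decode("utf-8"))


# Hypothetical sample payload, used so the parsing logic runs without a server.
SAMPLE_RESPONSE = '{"items": [{"name": "cluster1", "fullVersion": "7.1.7"}]}'

if __name__ == "__main__":
    print(clusters_url("https://cm.example.com:7183", "v41"))
    print(cluster_names(SAMPLE_RESPONSE))
```

Keeping URL construction and response parsing separate from the network call makes the logic easy to check offline; only `fetch_cluster_names` requires a reachable Cloudera Manager instance and valid credentials.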
The table CDP Private Cloud Base software components lists the major Apache software components that constitute Cloudera Runtime 7.1.7 SP2 for CDP Private Cloud Base, along with a brief description of each. For more information about all included components, including versions, see Cloudera Runtime Component Versions on the Cloudera documentation website.
The associated Data Management with Cloudera Data Platform Private Cloud Base Design Guides describe where these components are deployed across the various nodes.
Component | Description |
Apache Arrow | Arrow is a cross-language development platform for in-memory data. |
Apache Atlas | Atlas provides data governance capabilities for Hadoop. Atlas is also a common metadata store, which is designed to exchange metadata within and outside of the Hadoop stack. |
Apache Avro | Avro is a row-oriented remote procedure call and data serialization framework for Apache Hadoop. |
Apache Calcite | Calcite is a framework for building databases and data management systems. It includes a SQL parser, an API for building expressions in relational algebra, and a query planning engine. |
Apache Hadoop | Apache Hadoop is a framework that enables distributed processing of large datasets across clusters of systems, using simple programming models. Apache Hadoop is designed to scale out from single servers to thousands of servers. Hadoop also includes YARN for resource management and job scheduling and HDFS, the Hadoop Distributed File System. |
Apache HBase | HBase provides random, persistent access to data as a natively nonrelational database. HBase is ideal for scenarios that require real-time analysis and tabular data for end-user applications. |
Apache Hive | Hive is a data warehouse system for summarizing, querying, and analyzing huge, disparate datasets. |
Apache Impala | Impala provides high-performance, low-latency SQL queries on data stored in Apache Hadoop file formats. |
Apache Kafka | Kafka is a distributed and highly available event streaming platform. It is used for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. |
Apache Knox | Knox is an application gateway for interacting securely with the REST APIs and user interfaces of one or more Hadoop clusters. |
Apache Kudu | Kudu combines fast inserts and updates with efficient columnar scans to enable multiple real-time analytic workloads across a single storage layer. Kudu provides fast analytics on fast data. |
Apache Livy | Livy is a service that enables easy interaction with a Spark cluster over a REST interface. |
Apache MapReduce | MapReduce is a software framework for writing applications that process vast amounts of data in parallel on large clusters in a reliable, fault-tolerant manner. |
Apache Oozie | Oozie is a workflow and coordination service for managing Apache Hadoop jobs. |
Apache ORC | Optimized Row Columnar (ORC) is a self-describing, type-aware columnar file format designed for Hadoop workloads. |
Apache Ozone | Ozone is a scalable, redundant, and distributed object store optimized for big data workloads. |
Apache Parquet | Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the data processing framework, data model, or programming language. |
Apache Phoenix | Phoenix is an add-on for Apache HBase that provides a programmatic ANSI SQL interface. |
Apache Ranger | Ranger is a CDP security component that enables you to control access to CDP services. Ranger also provides access auditing and reporting. |
Apache Solr | Solr is a search platform that provides full-text and natural language access to data stored in, or ingested into, Hadoop, HBase, or cloud storage. |
Apache Spark | Spark is a distributed, in-memory data processing engine designed for large-scale data processing and analytics. |
Apache Sqoop | Sqoop is a CLI-based tool for bulk transfers of data between relational databases and HDFS or cloud object stores. |
Apache Tez | Tez is an extensible framework for building high-performance batch and interactive data processing applications, which YARN coordinates in Apache Hadoop. |
Apache YARN | YARN is the resource management and job scheduling layer of Hadoop. It manages distributed applications that run on multiple machines in a network. |
Apache Zeppelin | Zeppelin is a multipurpose, web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala, Python, R, and more. |
Apache ZooKeeper | ZooKeeper is a centralized service that enables highly reliable, distributed coordination, including maintaining configuration information, naming, and providing distributed synchronization and group services. |
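The table describes Avro as row-oriented and ORC and Parquet as columnar. As a rough illustration of that distinction only, the following Python sketch pivots the same records between a row layout and a column layout. It uses plain lists and dictionaries, not the actual Avro, ORC, or Parquet encodings, and the record fields and function names are hypothetical. It also assumes that every record has the same fields.

```python
# Hypothetical records in a row-oriented layout: one dict per record.
records = [
    {"id": 1, "service": "hive", "latency_ms": 120},
    {"id": 2, "service": "impala", "latency_ms": 35},
    {"id": 3, "service": "hive", "latency_ms": 98},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns


def to_rows(columns):
    """Pivot a dict of equal-length column lists back into row dicts."""
    fields = list(columns)
    return [dict(zip(fields, values)) for values in zip(*columns.values())]


columns = to_columns(records)

# An aggregate over one field touches a single column list in the
# columnar layout, instead of visiting every field of every record.
latencies = columns["latency_ms"]
average_latency = sum(latencies) / len(latencies)

if __name__ == "__main__":
    print(columns)
    print(average_latency)
```

In the column layout, an analytic query that reads one field can skip the others entirely, which is the general reason columnar formats suit scan-heavy workloads, while a row layout keeps each record together for record-at-a-time serialization.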