During Google's early startup years, the founders realized that to revolutionize the efficiency and relevance of web search, they had to develop new computing tools.
Google needed both a new scale-out file system and a new scale-out computing platform to deal with:
The first public descriptions of one method for overcoming those two challenges appeared as white papers in 2003 and 2004. The Yahoo researchers who developed the first versions of the Hadoop Distributed File System (HDFS) and the Hadoop MapReduce computing platform credit those early Google papers with providing the architectural foundations that started the Hadoop open-source initiative.
Cloudera has been delivering enterprise-class data platforms since 2008. The original flagship product was the Cloudera Distribution for Apache Hadoop (CDH). As the scope of the Hadoop ecosystem expanded, the core open-source components of CDH grew to include a long list of projects. The last production release of CDH (6.3.x) included the components that are listed in Table 1, and many more.
In addition to contributing source code and integrating, validating, and supporting these open-source components of CDH, Cloudera has developed many commercial add-on products that address the requirements of a complete data platform. Cloudera Manager, Cloudera Navigator, and Cloudera Data Science Workbench add tools and services that complement, or offer alternatives to, what is available from the open-source community, and many enterprise administrators and developers have adopted them.
Then in 2011, 24 engineers from the original Hadoop team at Yahoo! formed a new data platform company. Hortonworks was founded with the belief that open source, open standards, and open markets are the best approach to innovation and success. Hortonworks distributed only fully open-source Hadoop, without additional proprietary software, in contrast to the similar platform vendors Cloudera and MapR.
The company’s primary software offering was the Hortonworks Data Platform (HDP), built entirely upon Apache Hadoop. The company relied on fee-based training and other support services for revenue. Hortonworks bundled many of the same Hadoop projects in its distribution, but with some differences that are listed in Table . HDP was widely adopted as an enterprise-class Hadoop platform that maintained high standards for security and stability.
Table 2 shows some of the differences in approach between HDP and CDH for some of the key functions. Full details of CDP components, and the differences for users coming from either CDH or HDP, are described in Cloudera Data Platform.
In 2018, Cloudera and Hortonworks announced that they would merge to form a single company. The merger was completed in January 2019. The goal of the combined company is to produce the first enterprise data cloud: a platform that supports hybrid and multicloud deployments and consists of 100% open-source components. Cloudera Data Platform (CDP) Data Center, described in the following chapter, is the first release from the combined company. It integrates the best of Cloudera and Hortonworks technologies into an on-premises offering.