During the startup incubation of Google, the founders realized that to revolutionize the efficiency and relevance of web search, they had to develop new computing tools.
Google needed both a new scale-out file system and a new scale-out computing platform to deal with:
- The number of URLs that existed on the Internet in the early 2000s
- The complexity of analyzing the interpage linking relationships
The first publicly available descriptions of one method for overcoming those two challenges were published as public white papers in 2003 to 2004. The Yahoo researchers who developed the first versions of the Hadoop Distributed File System (HDFS) and the Hadoop MapReduce computing platform credit those early Google papers for the architecture foundations that started the Hadoop open-source initiative.