Yahoo Releases Its Own Hadoop Distribution
Hadoop is a distributed file system and parallel execution environment that enables its users to process massive amounts of data.
In response to frequent requests from the Hadoop community, Yahoo! is opening up its investment in Hadoop quality engineering to benefit the larger ecosystem and to increase the pace of innovation around open and collaborative research and development.
The Yahoo! Distribution of Hadoop has been tested and deployed at Yahoo! on the largest Hadoop clusters in the world.
Hadoop is free Java software framework born out of an open-source implementation of Google’s published computing infrastructure and fostered within the Apache Software Foundation.
Yahoo! has been the primary developer and contributor to Apache’s Hadoop.
In 2006, Hadoop founder Doug Cutting joined Yahoo, which provided a dedicated team and resources, to lead the project of developing the open-source software and turn Hadoop into a system that ran at web scale. Today, Yahoo! is running the largest Hadoop cluster in the world, which includes more than 25,000 servers and provides the framework for many Yahoo properties including Yahoo Search, Yahoo Mail, and several content and ad services.
Yahoo says its opening up the source code to Hadoop to “benefit the larger ecosystem increase the pace of innovation around open and collaborative research and development.”. As Nigel Daley, Quality and Release Engineering Manager at Yahoo! Grid Technologies, summarizes:
Hadoop is helping us solve key science and research problems in hours or days instead of months. It provides us a platform to solve extreme problems requiring massive amounts of data processing. It underpins major revenue-generating systems. Opening our distribution enables a faster pace of innovation for the entire Hadoop ecosystem and broadens the use €” and ultimately the quality €” of this key platform across the industry.