Exclusive: Yahoo launching Hadoop spinoff this week

Yahoo will be spinning off a separate company focused on the development and commercialization of Apache Hadoop, called HortonWorks. The official announcement likely will come tomorrow or Wednesday to coincide with Yahoo's annual Hadoop Summit, but rumors have been circulating for months and I confirmed the news today with a source familiar with the project.

Because Yahoo originated the Hadoop technology, its official entry into this space should play a big role in shaping how the market for Hadoop-based products evolves.

Yahoo's HortonWorks (as in the Dr. Seuss book Horton Hears a Who!, a reference to the elephant logo that Apache Hadoop bears) will be composed of a small team of Yahoo's Hadoop engineers and will focus on developing a production-ready product based on the Apache Hadoop project, the set of open source tools designed for processing huge amounts of unstructured data in parallel. It's a natural step for Yahoo, which uses Hadoop heavily within its own web operations, and which has contributed approximately 70 percent of the code to Apache Hadoop since the project's inception.

By incorporating next-generation features and capabilities, HortonWorks hopes to make Hadoop easier to consume and better suited for running production workloads. Its products, which likely will include higher-level management tools on top of the core MapReduce and file system layers, will be open source, and HortonWorks will try to maintain a close working relationship with Apache. The goal is to make HortonWorks the go-to vendor for a production-ready Hadoop distribution and support, but also to advance Yahoo's oft-repeated mission of making the official Apache Hadoop distribution the place to go for core software. Earlier this year, Yahoo discontinued its own Hadoop distribution, recommitting all that code and all its development efforts to Apache.

The introduction of HortonWorks means that other companies peddling Hadoop-based products can't rest on their laurels. Cloudera, which pioneered commercial Hadoop, and EMC, which just launched its own set of Hadoop tools (a community version based on Facebook's optimized Hadoop code, and an enterprise version leveraging MapR's technology), are now on notice. HortonWorks differs from Cloudera because HortonWorks is more involved in software development, and the spinout's tight alliance with Apache sets it apart from the EMC products. Yet HortonWorks will have to ensure it advances Hadoop development across industry lines, and not just in a manner optimized for Yahoo's web-scale needs, if it wants to gain adoption.

Despite all the talk about Hadoop, evidence suggests a presently paltry revenue base for the software HortonWorks, Cloudera and EMC peddle. Cloudera is leading the charge right now with what I've heard is a few million in annual revenue, but that's hardly enough to sustain the amount of investment in Hadoop. Cloudera alone has raised $36 million, VCs have funded a number of other Hadoop-focused startups, and companies such as EMC and IBM are funding Hadoop strategies from their own coffers. Everyone with a stake in the outcome of Hadoop envisions a billion-dollar opportunity, so seeing how, or if, these companies are able to split the market and share revenue at least three ways makes this a fun race to watch. They also face increased competition from Hadoop alternatives such as LexisNexis spinoff HPCC Systems and Microsoft's forthcoming Dryad tools.

HortonWorks will be a joint venture between Yahoo and an investor, presumably Benchmark Capital. The Wall Street Journal reported in May that Benchmark was in talks with Yahoo about how to handle launching the new company.
