Battle on: MapR, Cloudera pimp their Hadoop products
The fight for Hadoop dominance is officially on. The unveiling of Yahoo's Hadoop spinoff Hortonworks will undoubtedly be the talk of today's Hadoop Summit, but it's not the only game in town. In fact, while Hortonworks is busy answering questions about its product strategy, Cloudera and MapR will demonstrate new versions of their distributions overflowing with bells and whistles.
I wrote yesterday about the importance of new tools designed to improve the Hadoop experience at a level above the distribution layer, but the distribution (the underlying code base that defines Hadoop's core architecture and capabilities) is still king. Apache Hadoop is a set of open source tools designed to enable the storage and processing of large amounts of unstructured data across a cluster of servers. Chief among those tools are Hadoop MapReduce and the Hadoop Distributed File System (HDFS), but there are numerous related ones, including Hive, Pig, HBase and ZooKeeper.
Most vendors try to distinguish their Hadoop distributions with MapReduce and HDFS. Some will try to tweak the core Apache features and architectures, while others will replace one component, generally HDFS, altogether.
EMC and IDC released their Digital Universe study this week, estimating that we'll create 1.8 zettabytes of data this year and that data growth is outpacing Moore's Law. Now that we've realized there's value in all that information, we're anxious to capture, analyze and use it, and that requires more and better big data technology. As this diagram from Karmasphere illustrates, Hadoop is a very large part of the big data stack, which means we're just getting started.
So many distributions, so little time
Cloudera: Cloudera, whose CDH was the first commercial Hadoop distribution, takes the full complement of available open source components and integrates them into an enterprise-grade product. Its value isn't so much in improving Hadoop as it is in making everything from Hadoop MapReduce to its own Sqoop (SQL to Hadoop) tool work well together out of the box.
Cloudera actually released CDH version 3.5 recently, but today it released a bunch of new features for its Cloudera Enterprise product, a suite of management tools designed to make it easier to operate CDH clusters. The coolest has to be something called SCM Express, which makes getting started with Hadoop easier. Cloudera's Charles Zedlewski explained that SCM Express is a free tool that lets users provision and launch up to a 50-node Hadoop cluster in about six clicks.
MapR: However, Cloudera has lots of company, including the brand new MapR. That startup just released its first two products today: a free Hadoop distribution called M3 and a paid distribution called M5. MapR takes the Cloudera approach of integrating the entire spectrum of Hadoop tools into its distribution and including management functionality, but it also has made a number of significant changes to the MapReduce and HDFS components to improve performance.
MapR's Jack Norris says the result is probably the most comprehensive distribution, which performs two to five times faster than the standard Apache Hadoop. A majority of MapR's changes are to the storage layer, which it has reworked to be faster, easier, more reliable and more scalable than HDFS.
You can't talk about MapR without talking about EMC, which announced last month that the Enterprise Edition of its Greenplum HD Hadoop distribution will be powered by MapR. Norris explained to me that the product, available later this year, will utilize MapR's M5 version, which includes advanced storage capabilities around high availability and data protection. However, EMC's line of Greenplum HD distributions, which also includes a free Community Edition, is actually centered around the specialized Hadoop code developed by and running within Facebook.
Of course, Hortonworks isn't to be discounted, nor are DataStax with its Cassandra-based Brisk distribution or IBM, which has been promising its own Big-Blue-style Hadoop distribution for some time. But the most interesting thing about all this Hadoop activity might be the pace of it: as of mid-March, Cloudera stood alone as a commercial Hadoop provider. Now it has four competitors, with more likely to come.
Feature image courtesy of Flickr user Joi.
Related content from GigaOM Pro (subscription req'd):
- Infrastructure Q1: IaaS Comes Down to Earth; Big Data Takes Flight
- Putting Big Data to Work: Opportunities for Enterprises
- Defining Hadoop: the Players, Technologies and Challenges of 2011