Strata + Hadoop World 2015 Big Data Take Aways

By Bill Schmarzo, February 27, 2015

Lots happened at this year’s Strata + Hadoop World conference.  Here are a few of my take-aways. Click here for additional resources.

Hadoop Wars


Heck, the conference had barely started when cannon blasts were being volleyed in the “Who owns the Hadoop market?” debate after the Open Data Platform (ODP) announcements from Pivotal, Hortonworks and others. Mike Olson was quoted as saying “I learned then that code trumps cash.”

I’m not sure I agree with Mike’s comment. In many cases, business models trump code.  Microsoft dominated the PC market over the Apple Macintosh not with better code, but with a better business model (see Figure 1).


Figure 1:  PC versus Mac Market Share

While the Apple Macintosh was a much more elegant machine with better integration between software packages and much stronger integration with the underlying hardware platform, Microsoft scored the killer punch by attacking Apple’s business model.  Microsoft charged PC hardware manufacturers a ridiculously low price for copies of their OS, but only when the manufacturer agreed to bundle the OS with EVERY PC shipment.  And that “bundle the Microsoft OS with every PC shipped” requirement ended up being the key, because what motivation did the PC manufacturer have to pay extra for another OS when they were already being charged for the Microsoft OS on every PC shipped?  Brilliant!!

My friend Amr Awadallah from Cloudera made it sound like it was Open Data Platform (ODP) versus Apache in his Thursday morning keynote, which it clearly is not.  While I am certainly not an expert on either ODP or Apache, they do appear to be addressing different needs and requirements.  Apache is about open source software and ODP is about open business models (see my recent blog “An Executive Mandate: Think Open Business” for more about the power of open business models).

Hadoop-based Tools Are Everywhere

If you really want to know what is happening at these types of shows, walk around the edges of the exhibition hall.  I know that all the excitement, best tchotchkes and glamorous booth dudes / booth babes are located in the middle of the exhibition hall, but there are many interesting new products being exhibited “along the edges.”  Many of these companies don’t yet have the financials to buy that “expensive, middle of the exhibit hall” real estate.  Here are a few of my favorites:

  • H2O – H2O is an open source parallel processing engine for machine learning.  It appears that they are trying to address some of the enterprise scalability issues of R with a more robust, enterprise scale analytics platform. H2O is for data scientists and application developers who need fast, in-memory scalable machine learning for smarter applications. Unlike traditional analytics tools, H2O provides a combination of extraordinary math, a high performance parallel architecture, and unrivaled ease of use.  I’m rooting for these guys because we need more advanced, open analytic options!
  • DataRPM – (Note:  I’m on the DataRPM board so I’m very pro DataRPM).  DataRPM uses machine intelligence to automatically derive meaningful insights from Hadoop. Smart Machine technology automatically analyzes your data; reveals hidden patterns and anomalies; and presents inferences.  Think about it as a jumpstart for your data science team in that DataRPM uses smart machine technology to uncover and quantify correlations buried in the data.
  • (acquired by Cloudera) – analyzes your SQL to give a comprehensive view of your most important KPIs, data models, and data cross sections. You can then use this knowledge to migrate your SQL code to a Hadoop environment (Hive, HBase).  Very smart product in that it converts one of the by-products of BI (the SQL) into an asset that can be mined and more easily converted to a Hadoop platform (with Cloudera’s CDH being the priority now).
  • AtScale (OLAP on Hadoop) – Hadoop is not just for developers and data scientists. The AtScale Analytics Platform delivers dynamic, interactive cubes on top of the Hive-based SQL engines.  AtScale really provides a smart BI layer between Hadoop and your favorite BI tools.  Damn, I’ve got to think that every BI tool vendor would love these guys!!
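
To make the "SQL as an asset that can be mined" idea from the list above a bit more concrete, here is a minimal Python sketch. It is not how any of the products above actually work; it only shows the general notion of treating existing BI queries as data. The query log, the table names, and the `table_usage` helper are all hypothetical, and the regular expression is a naive stand-in for a real SQL parser.

```python
import re
from collections import Counter

# Hypothetical query log; in practice this might come from a BI tool
# or a database audit log.
QUERY_LOG = [
    "SELECT region, SUM(revenue) FROM sales GROUP BY region",
    "SELECT s.region, c.segment FROM sales s JOIN customers c ON s.cust_id = c.id",
    "SELECT COUNT(*) FROM customers WHERE segment = 'SMB'",
]

# Naive assumption: a table name is the identifier right after FROM or JOIN.
# This ignores subqueries, CTEs, and quoted identifiers.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def table_usage(queries):
    """Count how often each table is referenced across the queries."""
    counts = Counter()
    for query in queries:
        counts.update(name.lower() for name in TABLE_REF.findall(query))
    return counts


usage = table_usage(QUERY_LOG)
for table, hits in usage.most_common():
    print(table, hits)
```

Even a crude count like this hints at which tables matter most to the business, which is the kind of knowledge you would want in hand before deciding what to move to Hadoop first.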

Every year, Strata gets better and better – more relevant use cases and better vendors in the exhibit hall.

Also, in case you missed the show, here’s a link to some other observations and relevant content. Can’t wait for next year!!
