The Loch Ness Monster Has Been Discovered, and It Lives in a Data Lake!


I have read more papers, attended more training sessions, and spoken to more people about Data Lakes and analytics than I can count. Is it me, or is it still just a little perplexing? I tend to think of everything in the simplest of terms. Data goes in, data gets blended, and information (we hope) comes out the other side of the equation. I call it the “circle of data life”.

Its formula is:

“DATAGOZIN” + “DATAGOZAROUND” + “DATAGOZOUT” = “DATAMAGICHAPPENS”

No reasonable discussion of a Data Lake concept can begin without discussing what scientists call the two main types of data: structured and unstructured. However, I believe there is a third category; I have dubbed it ‘yet to be defined’.

I see it this way: structured data is known before the DATAGOZIN step, or at least the fields you want to collect are known. Unstructured data is not known until processing occurs in the DATAGOZAROUND step. Both are raw materials that are turned into finished goods in the DATAGOZOUT step. However, the ‘yet to be defined’ data type is just waiting to have tools run against it to turn it into a meaningful finished product. Think of this data as electricity in the 1800s. We were not exactly sure what we would use it for, so clearly it fell into the ‘yet to be defined’ category. I put the Data Lake into this same category.

Its formula is:

“DATAGOZIN” + “DATASTAYZAROUND” + “Yet to be defined” = “DATAMAGICHAPPENS”
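
The distinction drawn above between structured and unstructured data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field list, the function names, and the key=value format of the free text are all hypothetical, not taken from any product mentioned in this article.

```python
import re

# Hypothetical field list: for structured data, the fields are known before DATAGOZIN.
KNOWN_FIELDS = ("customer_id", "item", "price")

def ingest_structured(row):
    """Accept a record only if it carries the fields we already know we want."""
    missing = [f for f in KNOWN_FIELDS if f not in row]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return {f: row[f] for f in KNOWN_FIELDS}

def ingest_raw(blob):
    """Unstructured data goes in as-is; nothing is assumed about its shape."""
    return blob

def discover_fields(blob):
    """Structure is worked out only during processing (DATAGOZAROUND).

    Assumes, for illustration, that useful facts appear as key=value pairs.
    """
    return dict(re.findall(r"(\w+)=(\w+)", blob))
```

The ‘yet to be defined’ category would be whatever sits behind `ingest_raw` with no `discover_fields` written for it yet.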

We tend to discuss the Data Lake concept via the products that act upon data in the various stages. For example, DATAGOZIN consists of products like Pivotal HD, Greenplum, and Hadoop. These technologies store the data and preprocess it to remove the impurities that raw materials sometimes have. The DATAGOZAROUND step brings in processing products such as MapReduce, Hive, and HAWQ. DATAGOZOUT presents some decision points, determined by just how important SPEED is to your arrival at DATAMAGICHAPPENS. Products such as GemFire handle in-memory processing. Several other products, such as Pivotal CF and MongoDB, play a role when some processing delay is tolerable in this step.

Its formula is:

“DATAGOZIN” + “DATAIZACCELERATED” + “DATAGOZOUT” = “DATAMAGICHAPPENS”
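
For readers who think in code, the three steps can be sketched as plain Python functions. This is a toy model, not how any of the products named above work: the function bodies, the `need_speed` flag, and the "in-memory" and "batch" labels are assumptions made purely for illustration.

```python
def datagozin(raw_records):
    """Store step: keep the records, dropping obvious impurities (empty values)."""
    return [r for r in raw_records if r and r.strip()]

def datagozaround(records):
    """Processing step: blend the records, here by counting occurrences."""
    counts = {}
    for r in records:
        key = r.strip().lower()
        counts[key] = counts.get(key, 0) + 1
    return counts

def datagozout(counts, need_speed):
    """Delivery step: choose a serving path based on how much delay is tolerable."""
    path = "in-memory" if need_speed else "batch"
    top = max(counts, key=counts.get) if counts else None
    return {"path": path, "top_item": top}

# DATAGOZIN + DATAGOZAROUND + DATAGOZOUT = DATAMAGICHAPPENS
datamagichappens = datagozout(
    datagozaround(datagozin(["Kayak", " kayak ", "", "tent"])),
    need_speed=True,
)
```

The only decision point in the sketch is the `need_speed` flag, which mirrors the choice between in-memory processing and a path where some delay is tolerable.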

So what does this all mean? We want the DATAMAGICHAPPENS nirvana. Simply put, we are turning raw materials into a finished product with the ability to modify the assembly line of information, in real time, while the consumer is placing items in their shopping cart. Why do we have stores opening before Black Friday, before Cyber Monday, and before “I have spent almost all of my Christmas budget” Tuesday? It’s about agility, speed, being nimble, and making decisions while the data is most valuable. Think of the semi-annual Sears Catalog as platform one, the weekly newsprint ads as platform two, and online sales campaigns as platform three. We are now entering the fourth platform of DURING.

What’s next, you ask? We hone our algorithms of predictive analytics and process flow dynamics. We get smarter, enter the realm of artificial intelligence, AND we build a product or two that use electricity along the way. And you never know – we may just find there’s an undiscovered creature swimming in the depths of that data lake.

About the Author: Chris Gaudlip

As chief technology officer (CTO) for Dell Technologies Managed Services, Chris Gaudlip provides visionary leadership for Dell Technologies Managed Services customers. Chris brings 25 years of experience at Electronic Data Systems (EDS) and Perot Systems to his role at Dell Technologies. His accomplishments include pioneering Dell EMC Proven Certifications, filing multiple pending and approved patents for his innovations, and designing solutions for Fortune 500 customers. He was recognized for his achievements by being selected as a Dell EMC Distinguished Engineer – Lead Technologist in 2011. In his current role, Chris is actively involved in Dell Technologies sales efforts, technical validations, and directing the future endeavors of Managed Services. He is the customer liaison and advisory consultant for the Managed Services offerings. Dell Technologies' customers look to him as a trusted advisor. When not traveling or reading up on the latest technologies, he can be found at his favorite hunting and fishing spots.