Lost in the Lake? 5 Keys to Data Lake Success
I had a cup of coffee with EMC Chief Data Governance Officer Barbara Latulippe recently. We talked about how more and more people tell us they have access to analytical sandboxes attached to a Data Lake but still can’t find the information they need.
Is this a Data Governance problem? A Skills problem? A Technology problem? A Tools problem?
The answer is yes, it’s all of that!
When you build a Data Lake you most likely have structured and unstructured data in it. For this post I’m only going to talk about the structured data because it’s the fastest and easiest to get value from, and a larger audience will benefit.
Biggest Complaint: I can’t find my data!
Reply: “You have everything you need. Why are you complaining?”
So what’s the problem?
Ok, many of us are used to reporting tools fed by nice, clean, flat tables from an EDW/GDW database. Now I have thousands of tables or more, with very little connecting them. I blogged about this problem before, likening it to dumping a bag of Legos on your desk and saying “Here you go”.
Keys to Success
- Light Data Governance
- Data SMEs
- You need the help of your data SMEs to make some order out of this chaos and then document, record, and explain what they did. These data SMEs are the wizards who can make the magic jump out of the lake. Capturing what they do and making it available to the masses is where the value starts piling up.
- Leverage your Reporting Tools for help: see if the reporting tools can show you the SQL, or get IT to help
- When you first start out, many people don’t know what columns to grab or what they are called because they are used to working with reporting tools. Many reporting tools can show you the XML or SQL they generate when you grab the data, which reveals the underlying table and column names.
- Focus on Team Skills
- When we first got the Data Lake we had some skills issues. Most of my team were BI people and needed to skill up on SQL and then Hadoop. Being totally honest, not everyone was able to make that transition, so we targeted new hires who already had those skills.
- It’s important to partner with your IT teams and have regular knowledge sharing events. Both sides can benefit, as you probably have the Data SME knowledge and they have more technical knowledge. The more you collaborate, the better you understand each other’s needs and how to work together more effectively.
- It’s hard work. Wishful thinking and complaining don’t make it better.
- Sorry I had to throw that in : ). Regular meetings with your IT teams on what is and isn’t working are key. These are not complaint sessions bashing IT. We show real use cases that we’re struggling to get going. Early on, the struggle may be access to data, just finding the data, or query restrictions on your roles.
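To make the “just finding the data” step a little more concrete, here is a minimal sketch of searching table metadata for a column name. It is only an illustration, not a description of any particular lake: it uses an in-memory SQLite database as a stand-in, and the table names, column names, and the `find_columns` helper are all hypothetical. On a real Data Lake you would ask the same question of whatever catalog or metastore your platform provides.

```python
import sqlite3


def find_columns(conn, keyword):
    """Return sorted (table, column) pairs whose column name contains keyword.

    Hypothetical helper: SQLite stands in for a real lake catalog here.
    """
    keyword = keyword.lower()
    matches = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        for row in conn.execute(f'PRAGMA table_info("{table}")'):
            column = row[1]
            if keyword in column.lower():
                matches.append((table, column))
    return sorted(matches)


# Made-up tables standing in for the thousands you would find in a lake.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales_orders (order_id INTEGER, customer_id INTEGER, order_total REAL);
    CREATE TABLE crm_accounts (account_id INTEGER, customer_id INTEGER, region TEXT);
    CREATE TABLE web_sessions (session_id INTEGER, visitor_key TEXT);
    """
)

# Which tables carry something that looks like a customer identifier?
for table, column in find_columns(conn, "customer"):
    print(f"{table}.{column}")
```

A search like this only tells you where a name appears. It still takes a data SME to confirm which of the matching tables is the right one to use and how they join, which is why capturing that knowledge matters so much.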
If you are on the journey or just thinking about getting a Data Lake, I hope you found this useful. Please let me know if you found any other lessons that enabled your success leveraging a Data Lake.