Thinking Like a Data Scientist Part I: Understanding Where To Start
One question I frequently get is: “How do I become a data scientist?” Wow, tough question. There are several new books that outline the different skills, capabilities and technologies that a data scientist is going to need to learn and eventually master. I’ve read several of these books and am impressed with the depth of the content.
Unfortunately, these books spend the vast majority of their time teaching data science processes (such as CRISP-DM, the Cross-Industry Standard Process for Data Mining), basic and advanced statistics, and data mining and data visualization techniques and tools.
Yes, these are very important data science skills, but on their own they are not sufficient to make our data science teams effective. The data science teams still need help from the business users, or subject matter experts (SMEs), to understand the decisions the business is trying to make, the hypotheses that they want to test and the predictions that they need to produce in support of those decisions and hypotheses. In essence, to improve the overall effectiveness of our data science teams, we need to teach the business users to think like a data scientist.
So the objective of this blog (which, if successful, will make its way into my Big Data MBA curriculum for the University of San Francisco School of Management fall semester) is to define a process that helps business users “think like a data scientist.”
Thinking Like A Data Scientist Process
The goal of the “thinking like a data scientist” process is to identify, brainstorm and/or uncover new variables that are better predictors of business performance. But “business performance” of what? Our key business initiative, of course.
Step 1: Identify Key Business Initiative. Would you expect anything different from me than starting with what’s important to the business? So, how can you spot a key business initiative?
A key business initiative is characterized as:
- Critical to the immediate-term performance of the organization
- Documented (communicated either internally or publicly)
- Cross-functional (involves more than one business function)
- Owned/championed by a senior business executive
- Tied to a measurable financial goal
- Bound to a well-defined delivery timeframe (9 to 12 months)
- Undertaken to deliver significant, compelling and/or distinguishable financial or competitive advantage
I am a big stickler about targeting business initiatives that are focused on the next 9 to 12 months. Anything longer than 12 months can quickly devolve into a “Battlestar Galactica” or “cure world hunger” project that may have incredible business value, but little chance of success.
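To make the checklist above concrete, here is a minimal Python sketch that screens a candidate initiative against the seven characteristics. Everything in it is an assumption for illustration: the `Initiative` class, its field names, the `failed_criteria` helper and the sample values are hypothetical, and I am reading the timeframe criterion as "no longer than 12 months."

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_DELIVERY_MONTHS = 12  # assumption: treat 12 months as the upper bound


@dataclass
class Initiative:
    """Hypothetical record describing a candidate business initiative."""
    name: str
    critical_near_term: bool
    documented: bool
    cross_functional: bool
    executive_owner: Optional[str]
    has_financial_goal: bool
    delivery_months: int
    delivers_advantage: bool


def failed_criteria(initiative: Initiative) -> List[str]:
    """Return the characteristics that the candidate does not satisfy."""
    checks = {
        "critical to immediate-term performance": initiative.critical_near_term,
        "documented": initiative.documented,
        "cross-functional": initiative.cross_functional,
        "owned by a senior executive": initiative.executive_owner is not None,
        "measurable financial goal": initiative.has_financial_goal,
        "delivery within 12 months": initiative.delivery_months <= MAX_DELIVERY_MONTHS,
        "delivers financial or competitive advantage": initiative.delivers_advantage,
    }
    return [label for label, passed in checks.items() if not passed]


# Illustrative values only; they are not taken from any real annual report.
focused = Initiative("Improve Merchandising Effectiveness", True, True, True,
                     "Chief Merchandising Officer", True, 12, True)
sprawling = Initiative("Cure World Hunger", False, True, True,
                       None, False, 36, True)

for candidate in (focused, sprawling):
    print(candidate.name, "->", failed_criteria(candidate) or "passes all checks")
```

An empty list means the candidate clears every check. The point of returning the failed labels, rather than a single yes/no, is that the gaps themselves are what you would take back to the business stakeholders.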
For a refresher on how to identify an organization’s key business initiatives, read my blog “Big Data MBA: Reading the Annual Report for Big Data Opportunities.” That blog outlines how to leverage publicly available information (e.g., annual reports, analyst calls, executive speeches, company blogs, SeekingAlpha.com) to uncover an organization’s key business initiatives.
For purposes of this exercise, I’m going to pretend that our client is Foot Locker, and that our target business initiative is “Improve Merchandising Effectiveness” as highlighted in their annual report (see Figure 1).
Step 2: Identify Strategic Nouns. Strategic nouns are the key business entities that either impact or are impacted by the organization’s key business initiative. These strategic nouns are critical to our data scientist thinking process because these are the entities for which we want to uncover or gain new, actionable insights, and around which we will ultimately build our analytic profiles. Examples of strategic nouns include customers, patients, students, employees, stores, products, medication, trucks, wind turbines, etc.
For the Foot Locker “Improve Merchandising Effectiveness” business initiative, the strategic nouns upon which we will focus are:
Step 3: Brainstorm Strategic Noun Questions. Probably the hardest part of the “thinking like a data scientist” process is to brainstorm the different questions that you want to ask in support of the targeted business initiative. For this part of the exercise, we want the business users to brainstorm the business questions for each strategic noun from the perspectives of:
- Descriptive Analytics: Understanding what happened
- Predictive Analytics: Predicting what is likely to happen
- Prescriptive Analytics: Recommending what to do next
See Figure 2 for an example of the evolution from Descriptive to Predictive to Prescriptive.
In our Foot Locker “Improve Merchandising Effectiveness” example, we want to brainstorm the “Customer” strategic noun questions as follows:
Descriptive Analytics (Understanding what happened)
- Which customers are most receptive to which types of merchandising campaigns?
- What are the characteristics of customers (e.g., age, gender, customer tenure, life stage, favorite sports) who are most responsive to merchandising offers?
- Are there certain times of year when certain customers are more responsive?
Predictive Analytics (Predicting what is likely to happen)
- Which customers are most likely to respond to a Back to School event?
- Which customers are most likely to respond to a BOGOF (buy one, get one free) offer?
- Which customers are most likely to respond to a 50% off in-store markdown?
Prescriptive Analytics (Recommending what to do next)
- What personalized offers (recommendations) should I deliver to Anne Smith to get her to come into the store?
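To show how the three kinds of questions build on one another, here is a small Python sketch over a made-up campaign history. Treat it as a toy under stated assumptions, not a real model: the `HISTORY` records, the second customer name and the three function names are hypothetical, and the "prediction" is nothing more than a smoothed historical response rate.

```python
from collections import defaultdict

# Hypothetical campaign history: (customer, offer type, responded?)
HISTORY = [
    ("Anne Smith", "BOGOF", True),
    ("Anne Smith", "BOGOF", True),
    ("Anne Smith", "50% off markdown", False),
    ("Anne Smith", "Back to School", True),
    ("Anne Smith", "Back to School", False),
    ("Bob Jones", "BOGOF", False),
    ("Bob Jones", "50% off markdown", True),
    ("Bob Jones", "50% off markdown", True),
]
OFFERS = ["BOGOF", "50% off markdown", "Back to School"]


def response_counts(history):
    """Descriptive: tally [responses, offers sent] per (customer, offer)."""
    counts = defaultdict(lambda: [0, 0])
    for customer, offer, responded in history:
        counts[(customer, offer)][0] += int(responded)
        counts[(customer, offer)][1] += 1
    return counts


def response_probability(counts, customer, offer):
    """Predictive: estimate response likelihood with add-one smoothing,
    so an offer the customer has never seen scores a neutral 0.5."""
    responses, sent = counts.get((customer, offer), (0, 0))
    return (responses + 1) / (sent + 2)


def recommend_offer(counts, customer, offers):
    """Prescriptive: choose the offer with the highest estimated likelihood."""
    return max(offers, key=lambda offer: response_probability(counts, customer, offer))


counts = response_counts(HISTORY)
for customer in ("Anne Smith", "Bob Jones"):
    print(customer, "->", recommend_offer(counts, customer, OFFERS))
```

Each function consumes the output of the one before it, which mirrors the progression in the questions: you cannot recommend an offer for Anne Smith until you have some estimate of how she is likely to respond, and you cannot estimate that until you have described how she responded in the past.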
Part II of the “Thinking Like a Data Scientist” blog series will conclude this “thinking like a data scientist” process and hopefully help us uncover new data sources and metrics that may be better predictors of business performance.
To learn more about EMC’s unique approach to leveraging Big Data to drive business value, please check out EMC’s Big Data Vision Workshop offering.