Data Science is HOT

Let’s start with a couple of screens from the web, and note that every picture in this post will take you to the original article if you click it:

Can PowerPivot Pros Call Themselves Data Scientists?

“Harvard,” “Data,” and “Sexiest” All in One Place!
(The Mayan Prophecy for 2012 Comes True)

Tim O’Reilly Would Never Say That Excel is the New Black.  Or Would He?

And There’s a Huge Shortage of Data Scientists!

This is the part where any career-minded Excel Pro (me! me!) should sit up and take notice:

McKinsey report on data scientist shortage


A Shortage of 1.5 Million Managers and Analysts in the US Alone?

Facebook advertised a data scientist job to me on...  my Facebook page

The original Harvard Business Review article at the top of this post also had this to say later on:


“Indeed, the shortage of data scientists is becoming a serious constraint in some sectors.”

– From the HBR Article

The way I see it, shortage for them could mean opportunity for us.

“OK, I’m Interested.  But What IS a Data Scientist?”

Turns out that the definition of a data scientist is about as precise as the definition of Big Data: there is nothing resembling a consensus, although some common threads run through all the definitions.

Let’s take a look around.  Here’s what IBM says about it:

IBM Says Data Scientists Blend Many Skills:  Math, Stats,
Analytics, CS, Strong Business Acumen, Strong Communication

I really like the next definition, which comes from Hilary Mason and was quoted in the Wall Street Journal online:


“They can take a data set and model it mathematically…  they can actually do [the modeling], which means they have the engineering skills… and finally they are someone who can find insights and tell stories from their data. That means asking the right questions, and that is usually the hardest piece.”

– Hilary Mason, in this WSJ article

The Constellation Research Group breaks the term “Data Scientist” into four types:

Constellation Research:  Four Types of Data Scientist.  Type I Seems Like it’s NOT Us.
Type IV is Definitely Us.  I Think Type III is Often Us.  Type II…  Unclear.

Common Threads

What do these definitions have in common?

1) Strong Business Acumen.  I’ve been saying this for years now and I’m happy to see this new job title “launch” with this requirement built in from the beginning. 

The people analyzing data need to be the same people who understand the business and its requirements.  Analysis is NOT an effective IT function.  Data preparation and staging IS a good IT function, but the key is tight collaboration between the analysts and the “preppers,” rather than the “prep for months THEN start analysis” approach.  See this CIMA article for more detail.

One critical point here is that Excel Pros are, 99% of the time, embedded deeply in the business.  We check this box by default, whereas a PhD graduate in Statistics or Computer Science struggles mightily to adapt to the noisy real world.

Does that sound unfair, me picking on PhDs like that?  Well, I was a Computer Science / Math / Philosophy triple major with ambitions of doing graduate work before I entered the workforce 16 years ago.  It was many years before I stopped trying to reduce the real world around me into neat, logical equations.  Letting the world tell me what it’s trying to tell me, despite my formerly academic biases, is probably the most significant personal and professional development in my life since 1996.

Obviously I still use data, a lot, but the point is that I now let the DATA tell its story, rather than trying to make ME – either one of my pet theories or my favorite arcane graph algorithm – the star.  If you have a hard time understanding what I’m saying here, don’t be bothered by it.  Trust me, NOT understanding the way I used to think is a GOOD thing 🙂

2) Asking the Right Questions.  This is really just an outgrowth of #1.  More precisely, I think #2 is the reason why #1 is a requirement.  Understanding the business and what moves it forward (or holds it back) is a necessary component, but I don’t think that’s enough.

You also have to be CURIOUS.  You have to LIKE DATA.  You have to LIKE SOLVING MYSTERIES from data.

Sound like you?  I think most Excel Pros became Excel Pros primarily BECAUSE we are “those people.”

My neighbor, who is a real scientist, tells me that curiosity about data, asking the right questions of it, tearing it apart… these are the prime qualities he looks for in a good scientist.  Not good technique with test tubes, not even necessarily deep experience in his particular field.  You must be “Data Curious.”

Continued on Thursday

This post is already quite long, but I have set the table for sharing my observations and opinions, which was the original point of tackling this topic.

As a brief preview, I will say that there are definitely Data Scientist jobs for which we are NOT suited, but I think we qualify for a LOT more of them than the authors of all those articles realize…  at least today.

So please come back Thursday.  I have some things to say 🙂