Ok, in the last football post, I had written the following measure:
Which basically means, “sum up the [Yards] column but only those rows with [PlayTypeName] = “RUN-run” “
Now we get to find out if it works 🙂
I need some real-world data that I can validate against. Out on NFL.com I can find historical player stats. Let’s look at one of my favorite players from recent years, Priest Holmes. He amassed a lot of Rushing Yards and should be a good sample. Here are his statistics by Game for 2004:
OK, so let’s make a PivotTable that just shows Priest’s 2004 Rush Yards by Game, and compare that to the Rush Yards column from above:
My Pivot Table From NFL.com
Bingo!
“You know sometimes I even impress myself?”
-Han Solo
An exact match! That might seem a bit bland to you folks out there, but to me, it is VERY exciting. I’ve got 40+ tables, a Plays table that is loaded with crazy complexity, no database training whatsoever, and yet… I now have a measure that agrees, DOWN TO THE YARD, with NFL.com!
Wahoo! In all honesty, when I started this project, I had no idea how far I could go. I intentionally chose something that *might* defeat me.
I’m feeling pretty darn optimistic now, though 🙂
I’m suspicious, however…
I recall that my Plays table contains multiple rows per play, so that, for instance, it can capture data about the *defensive* players involved in a play:
Right there, a single play, three rows. One for the runner himself, one for the player who tackled him, and another for the player who assisted the tackle. That will come in handy when I get started looking at defensive stats.
But for now, I suspect that means defensive players are getting credited with rushing yards, too. When I filter to Antoine Winfield, the Assister from above, my PivotTable confirms my suspicions:
What we are seeing there is the total net yards of running plays in which Antoine Walker was involved in tackling the runner. I don’t want that.
But all I have to do, then, is add another clause to my CALCULATE function.
And now the pivot table shows:
Priest Holmes is unchanged. Antoine Winfield now has no rushing yards. Perfect!
CALCULATE is good. CALCULATE is your friend.
“OK Rob, a whole post just so you can make ONE change??”
Yes. I spent more time than usual on this. Here’s why: Everything above took me less than 10 minutes in real time. It took far longer to capture the screen shots than simply to blaze through it.
But when I worked with a BI consultant, a few years back, the same exact iteration took about a week.
How do we account for the difference? Is it because the consultant produced better results? Was the Rushing Yards measure more accurate or robust than mine? Was it somehow more formalized, more robust? Or was the consultant not very good?
The answers are no, no, no, and an emphatic “no.” (The consultant was fabulous, akin to godlike. There’s that word again – “akin.” Why do I keep saying that?)
The real difference, as I’ve said before, is that with PowerPivot, the “modeler” and the “business user” are the same person. I’m the one writing the expressions, and the one who knows the most about what I want, because I know the “business” (football, in this case) inside and out. Iteration in one person’s head is blazingly fast.
- Finding the comparison stats on NFL.com – Only the business user knows where the best validation data sets are.
- Creating the pivot for comparison purposes – even just choosing my player for comparison reflected business knowledge.
- Realizing that I likely was incorrectly counting defensive players in my measure – because I had access to the source tables and the “business rules” in my head, I spotted this problem before it ever made it into a report.
Seriously, this was like a trip down memory lane, on hyper fast forward. I’d get a cube from the consultant, build some pivots, see that things were not accurately reflecting football rules, point out the problem, wait for the next version, repeat.
Not anymore 🙂
“Are you saying we don’t need BI Pros anymore?”
No, I am NOT saying that. Even in this football example, I am cheating. The original data from STATS arrived as a jumble of text files. The schema was terrible. Many of the required attributes (like the score of the game on a particular play) were completely missing.
There was a TON of work, done by the BI professional, to get from that horrible mess to the 40+ tables that I am working with now in PowerPivot. And there is no way, no way at all, that I could do that myself – not back then and not today.
In terms that an MS BI Pro understands, the Integration Services work remains. In fact, it becomes even more important, since you need to make the resulting schema not just work for people like you, but also for people like me 🙂
But the Analysis Services work – you can start sharing that with the Excel business users.