A case study I just have to share
Over the past year, “Agility” has become quite a theme on this blog. It’s fair to say that I just keep hammering it, over and over. But I recently had my eyes opened as to what “extreme” PowerPivot agility can look like.
Let’s put it in football terms: it’s like I’ve been going around telling everyone how they should be less like the 400+ pound Aaron Gibson and more like the nimble, versatile Marshall Faulk.
And then someone comes along like Barry Sanders: agile in ways that you’d never think to recommend, but once you see it, you recognize it immediately. And then you watch it over and over in awed slow-motion (seriously, click Barry’s name above for a video).
That’s what this is like.
Dynamic Data Warehousing!
You don’t truly know what you need to know, until you are delivered what you THOUGHT you needed to know (I sound like Rumsfeld). That was the theme of this blog post from last year. That blog post concluded with my clients at the time realizing that the data they truly needed… wasn’t even being collected.
I think it’s basic human nature to assume that yesterday’s gaps in your understanding are just that (yesterday’s), and that you won’t suffer from that in the future. Marcellus Wallace recognized this tendency as pride, and rightly took a dim view of its helpfulness. (Video definitely NSFW!)
I’ve sung the praises of those folks (my clients) before, and I’m going to do it again: Pride didn’t get in the way of their thinking on this issue. Instead of telling themselves that they’d be better at anticipating their needs in the future, they decided to bake uncertainty into the CORE of their BI planning.
They built a system that enables the following: If they decide they are missing a set of data points, they can start collecting them, warehousing them, and analyzing them, end to end, in as little as two to four days’ time (that’s my estimate).
Here’s a diagram to illustrate:
More Detail
Here’s some detail that was hard to fit into the diagram. Basically these folks have millions of devices out “in the wild,” and those devices are instrumented to collect data about usage patterns. When I first visited them in the early Fall, those devices were hardwired to collect only fixed data points, and we discovered that they needed to collect new data.
When I had the opportunity to drop back in on them recently, however, they revealed this new system. Now, the only thing hardwired is flexibility. The devices all call home once a day to see if there are new instructions awaiting them: brand new script written by their development team. To make things painless and error-free for that team, they have also built an internal portal where the developer registers the new message type being added to the devices' instrumentation scripts. The portal then takes care of configuring the data warehouse (new tables, retention policies, aggregation rules) as well as configuring the incoming message ports and mapping them into the right import processes. Boom.
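I don't know the internals of their portal, but conceptually the registration step boils down to something like the sketch below. All of the names and fields here are mine, purely to illustrate the idea that a single registration record can drive table creation, retention, aggregation, and port mapping:

```python
# Hypothetical sketch of the "register a new message type" step.
# None of these names come from the actual system; they only illustrate
# how one registration record can drive all the downstream provisioning.
from dataclasses import dataclass


@dataclass
class MessageTypeRegistration:
    """What a developer might submit through the internal portal."""
    message_type: str          # e.g. "standby_wakeups"
    fields: dict               # field name -> type, e.g. {"DeviceID": "bigint"}
    retention_days: int = 365  # how long raw rows are kept
    aggregation: str = "daily" # pre-aggregation granularity
    inbound_port: int = 9000   # which listener the devices report to


def provision(reg: MessageTypeRegistration) -> list[str]:
    """Return the provisioning steps the portal would carry out automatically.

    In a real system these would be DDL statements, retention jobs, and
    import-process mappings; here they are just descriptive strings.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in reg.fields.items())
    return [
        f"CREATE TABLE fact_{reg.message_type} ({cols})",
        f"SET RETENTION ON fact_{reg.message_type} = {reg.retention_days} days",
        f"SCHEDULE {reg.aggregation} AGGREGATION ON fact_{reg.message_type}",
        f"MAP PORT {reg.inbound_port} -> import process for {reg.message_type}",
    ]


if __name__ == "__main__":
    reg = MessageTypeRegistration(
        message_type="standby_wakeups",
        fields={"DeviceID": "bigint", "EventTime": "datetime", "WakeCount": "int"},
    )
    for step in provision(reg):
        print(step)
```

The point is not the particular technology; it's that the developer fills out one form and every downstream moving part gets configured from that single source of truth.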
They are even experimenting with ways to allow automatic generation and/or modification of PowerPivot models based on selections made in the portal.
It’s worth taking a step back and marveling at this. On Monday they can realize they have a blind spot in their radar. On Tuesday and Wednesday they develop and test new instrumentation code. On Thursday they roll it out to the devices. And on Friday, they are literally collecting and analyzing MILLIONS of data points per day that they lacked at the beginning of the week.
It’s not like those new data points are on an island, either. Via DeviceID, they are linked to multiple lookup/dimension tables and therefore can be integrated into the analysis performed on other fact/data point tables as well. They can literally write measures that compare the newly-collected data against data they were already collecting – ratios, deltas, etc. They can put the new metrics side by side with old metrics in a single pivot. And, in theory, they could use the new data points to generate new lookup/dimension tables by tagging devices that exhibit high or low amounts of the newly-instrumented behaviors (although we did not discuss that on site – it just struck me as a possibility).
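For a concrete (and entirely made-up) picture of what that comparison looks like, here is the same idea sketched in pandas. In their actual model these would be DAX measures over tables related through DeviceID, but the shape of the calculation is the same; the table and column names below are invented for illustration:

```python
# Rough illustration of the "new facts aren't on an island" point.
# Made-up tables; the real comparison would be DAX measures in PowerPivot.
import pandas as pd

# Existing fact table (data they were already collecting), keyed by DeviceID.
old_facts = pd.DataFrame({
    "DeviceID": [1, 1, 2, 2],
    "SessionMinutes": [30, 45, 10, 20],
})

# Newly instrumented fact table, collected only since the rollout.
new_facts = pd.DataFrame({
    "DeviceID": [1, 2, 2],
    "StandbyWakeups": [4, 1, 2],
})

# Both roll up to the DeviceID grain, so old and new metrics can sit
# side by side and be compared with ratios, deltas, etc.
summary = (
    old_facts.groupby("DeviceID")["SessionMinutes"].sum().to_frame()
    .join(new_facts.groupby("DeviceID")["StandbyWakeups"].sum())
)
summary["WakeupsPerSessionMinute"] = (
    summary["StandbyWakeups"] / summary["SessionMinutes"]
)
print(summary)
```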
And as of late last year, this organization had made zero investment, ZERO, in business intelligence.
Can this work for everyone?
Of course not. Not everyone has the luxury of reprogramming their production systems at high frequency like this. Not everyone can afford the risk or performance hit of having their production systems writing back directly to their data warehousing systems either – standard practice is to have your warehousing efforts “spying” on your transactional systems and taking occasional snapshots. It’s a pull, not a push, which is why the “E” in “ETL” is Extract.
But this is definitely food for thought, for everyone. “Why not?” is one of my favorite questions, because even if you can’t ultimately do something, examining the “why not” in detail is often very enlightening.
I know one of the standard objections is going to be, essentially, “Data Gone Wild: No discipline. Mile-long lists of tables and fields.” Bah, I say! Good problems to have! Storage is cheap, flying blind is expensive. And when you reach the point of being blinded by too much information, well, that’s an opportunity for a new set of tools and disciplines.
More to come
Last week was my first ever “doubleheader” – two consulting/training clients in a single week. That can be hard on a blogger, heh heh. But I look to be home for the next week or two, so you should expect to see a renewed flow of content here. Got several things rattling around in my head.