A PowerPivot Model with “Single-Step” Relationships is a “Star,” and one with “Daisy-Chained” Relationships is a “Snowflake.” Snowflakes are Often Slow. And Evil. Don’t trust them.
(Images taken from page 194 of the book).
“Hey, Why Did Things Get So Slow?”
Got a great question this week from Dave Boylan:
“I think I read in your book that you should always use dimension (lookup table) data in your slicers and page fields (even if the same data is in your fact/data table) because they behave better and perhaps use less memory. This makes sense as my fact table has 2MM rows and the slicer has three distinct values.”
Yes – if you have a lookup table, you should use fields from that lookup table on your pivot, rather than their equivalent fields from your data table. My recent post on using fields from your date table on the pivot is just one example of this.
Dave continues:
I have three tables: Timesheet, Projects, and Initiatives. They are linked together like this:
Initiatives (600 rows) –> Projects (17k rows) –> Timesheet (2M rows)
(where the arrows point in the direction that filters flow, so Initiatives filters Projects and Projects filters Timesheet).
When I add a slicer of initiative type from the Initiatives table and select a value, my workbook (45MB on disk) went from 300MB to 800MB in memory, in 5 minutes, and then crashed.
So, he’s following my advice and using his Lookup table field on the pivot, but the slicer then takes forever to slice, and then crashes. This is impossible of course – my advice is never flawed! (Joke).
But when he uses =RELATED() to “fetch” the right field from the Inititiaves table down into his Timesheet table, as a calculated column, THAT field works great on his slicer:
So, I brute force “double related” the initiative type directly onto the 2M-row Timesheet table and used that as the field for the slicer. The slicer buttons now recalc the pivot in one second.
Interesting huh? In fairness to me, I covered this problem in the book. In fairness to Dave, it’s near the end of the book, in Chapter 19 – Performance: How to keep things running fast.
Snowflake is the Culprit!
Situations like the one Dave ran into are the reason why I advocate always using Star-shaped models (single-hop relationships only), and never using Snowflake-shaped models (chains of relationships).
PowerPivot does not “like” large intermediate tables in its relationship chains:
Initiatives (600 rows) –> Projects (17k rows) –> Timesheet (2M rows)
That Projects table is large enough (17k rows) to cause PowerPivot trouble if it “gets in the middle” of a slicer (or Row field, etc.) trying to filter the Timesheet table.
Viewed more graphically, this is his original setup:
Dave’s Original Setup: Two Lookup Tables Chained Together (Snowflake)
Gives Him a Large-ish Intermediate Table in the Chain (Projects)
What’s the fix?
Eliminate a step in the chain. Combine the Initiatives table into the Projects table. Do this in the database (if you have one) or use =RELATED() calculated columns.
You end up with something like this:
Convert your Snowflake into a Star by Adding the “Parent” Columns into the “Child” table. Now you can ignore the parent (Initiatives) table and just use the Projects table fields on your pivot.
Is this always necessary?
No, it is not always necessary to “flatten” your snowflakes into stars. “If it ain’t broke, don’t fix it.” And sometimes you have no choice but to use a snowflake anyway.
But if things are running slowly, this is a good fix to try.
(As are the other tips in Chapter 19 on slicers, table shape, imported columns versus calc columns, common “slow” formulas, etc. – don’t skip chapter 19!)