“There’s a Fight Club up in Delaware City.”
“Yeah, I heard.”
“There’s one in Penns Grove too.”
“Bob even found one up in New Castle.”
“Did you start that one?”
“No, I thought you did.”
It Keeps Spreading
About a week ago I was talking to Chris Campbell and some of the other folks at Blue Granite. Chris mentioned that he has been teaching some PowerPivot classes at the Microsoft Technology Centers, sometimes even in my neck of the woods, but I didn’t know until he told me.
Which, of course, instantly reminded me of the scene above in Fight Club. I’m sure everyone else makes the same connection right?
Anyway, Chris asked if I would be interested in him writing a guest post and I said heck yeah! So, without further delay, I give you the p3adaptive.com debut of Chris Campbell.
Recently, a customer sent me a question regarding a DAX problem they were working on. They have a Members table in their model that includes attributes of “Start Date” and “End Date” for each member. The question they needed to answer was “How many active members did we have in [fill in the blank]?” I thought this was a pretty interesting question and it seemed like it ought to be pretty easy to do in DAX.
They were trying to get it to work using relationships between the start and end dates columns on the Members table and their Date table. Their date table also has “Month Start” and “Month End” columns. They were trying to use these columns in conjunction with the RELATED function to determine if the member was active by comparing the member start and end dates to the month start and end dates.
How I Reproduced the Problem
I’m not local to this particular customer and their model is large and I didn’t have easy access to it. It seemed easier to try to replicate their issue with a dataset I know well and I thought it might be useful to include that process here. If you’re not interested in that and just want to get to the solution, you can skip this section. If you’d like to see my thought process, read on.
I emulated the model they built using the ContosoRetailDW dataset that is available from Microsoft here. The Contoso model includes a Promotions table (DimPromotion) that includes a start and end date column for each Promotion. I used that table as the basis for my example. To replicate what the customer had on their table, I added a couple of calculated columns to create the MonthStart and MonthEnd month. Then I related DimPromotion to DimDate on the Promotion StartDate in order to have a similar model to what my customer had.
Here’s what it looked like:
The DimDate table in the Contoso dataset doesn’t have a column with just the month number (e.g. 4 for April) so I created a column for that. Then adding MonthStart and MonthEnd to the DimDate table was simple with a couple of calculated columns. I supposed I could’ve embedded the calculation for MonthNumber in the other two columns but I like to compartmentalize things.
The MonthStart column just uses the DATE function, with the Year
=DATE([CalendarYear], [MonthNumber], 1)
The DimDate table (with other columns hidden) ends up looking like this.
The MonthNumber column makes use of the MONTH function to extract the month number from the Datekey column. The MonthStart column uses the DATE function to take the values from CalendarYear and MontNumber columns plus the value “1” to get the first date of the month. Finally, the MonthEnd column uses the EOMONTH function to get the last date of the month.
The last thing I did was to create a hierarchy on the calendar columns in the DimDate table so I could drill from Year to Quarter to Month.
This gave me a working environment that looked enough like their model that I could replicate their issue.
On to the Real Problem
I converted the DAX from the original “Active Members” measure that I got from my customer to use the DimPromotion table in my new model.
The DimPromotion table looks like this:
In order to answer the “how many promotions were active in [fill in the blank]?” question, the measure uses the relationship between the DimDate table and the DimPromotion table to get the start and end date for the month. It then compares them to the start and end date for the promotion.
Active Promotions Wrong:=CALCULATE(COUNTROWS(DimPromotion),
FILTER(DimPromotion, (DimPromotion[StartDate] <= RELATED(DimDate[MonthEnd])
&& DimPromotion[EndDate] >= RELATED(DimDate[MonthStart]))))
The first thing I had to do was think through the logic. To determine if a row is “active” we have to check to see that the promotion started before the last date of the month and did not end before the first date of the month. That’s what the FILTER function does in the expression above. The RELATED function follows the relationship between the tables and retrieves the MonthEnd and MonthStart values.
Browsing this measure in a pivot table gives you this:
The problem is that the start and end dates are not necessarily (or usually) in the same month that you may be slicing on. However, when you put this measure in a pivot table, a row only gets counted if the promotion start date is in the same month as the related row on the DimDate table. For example, in May 2008, the pivot table only shows 1 promotion active. If you look at the data in the DimPromotion table, you will find that there are actually 2 promotions that were active. The “No Discount” Promotion has a start date of 1/1/2003 and an end date of 12/31/2010 and the “Asian Summer Promotion” has a start date of 5/1/2008 and an end date of 6/30/2008. Both of these should be counted as “active” in May of 2008.
How to fix it? Get rid of the relationship.
I’m a database guy so I like my data to have relationships. So much so that I spent a bunch of time trying to make this work keeping the relationships intact. However, sometimes in PowerPivot, working without a relationship is the appropriate way to handle an issue. Thanks to Josh Fennessy (b | t) for the conversation around this. Both Rob and Kasper have a number of posts that cover this concept too, like here and here.
So finally I thought, “Oh! If I just remove the relationship, it’ll work!” But that wasn’t exactly right because without the relationship, I needed to get the first and last dates of the month some other way. Fortunately, DAX comes to the rescue with a couple of handy functions: FIRSTDATE and LASTDATE. They return the first and last dates in a column of dates constrained by the current context.
The new version of the Active Promotions measure looks like this:
FILTER(DimPromotion, ([StartDate] <= LASTDATE(DimDate[Datekey])
&& [EndDate>= FIRSTDATE(DimDate[Datekey]))))
Using this new measure in the pivot table gave me the result I was looking for:
It also has the added benefit of not being tied to a month. I can slice the pivot table by any level of my calendar hierarchy and get a correct result thanks to the FIRSTDATE and LASTDATE functions.
This measure probably isn’t perfect yet. It assumes that there will always be a start date and end date for every record. In many cases, end dates are left empty for an item that is still active. Changing the expression to include an IF function to check for that should be an easy matter.
One last note on removing relationships. You will have to deal with the nagging “Relationships between tables may be needed” message. In PowerPivot 2010, you can turn this off by disabling automatic relationship detection. In PowerPivot 2013, you have to click the “X” button in the message box.