“…then they start thinking that, ‘where there’s smoke there’s fire’ logic…”
Continuing the Impromptu Series of Simple Real World Examples
The Mr. Excel PowerPivot Forum has inspired me to change gears a little bit and focus more time on simple techniques that don’t break the DAX Spicy Scale while still being very useful, everyday stuff. I sometimes take this stuff for granted and end up looking for topics that are much more “clever” when in reality we all can use a good dose (or two, or ten) of basic useful examples.
Medical Treatments – Are They Effective? Counterproductive?
A user over at the forum named “Mirknin” posted the following question:
I have a relatively simple data set where each row represents a week, each week has a number of data columns. Occassionally, however, we have treatments and I set up Excel with a column where the treatment is recorded as 1 – i.e. a treatment occurred that week, otherwise the cells are left blank.
I AddColumn in PowerPivot to generate a unique identifier (through the usual ampersand formula) to represent whether a treatment occurred in a specific week. Using the Slicer, I can PivotTable to see the data from the row where the treatment took place because all this info is related to that week, but seeing the data for following four weeks would let me see whether the treatment had made an effect.
I guess my query is whether the Slicer can be modified in such a way as to report the row it’s associated with plus subsequent rows…
Data Set: Let’s Use Trees Rather than People
When I was manufacturing sample data for this I got a bit squeamish about it. Not for the obvious reasons though – I’ll explain at the end of the post. But for now, just know that we’re going to work with Trees, like in a Botany lab, and some experimental treatment they are receiving.
We have a calendar of treatments – a list of dates and Yes/No for each:
Calendar of Treatments
And then a table of health readings taken for various trees in the lab on specific dates:
Tree Health Measurements
Normal Slicer Function Across Relationships
Notice that the two tables have the little “relationship” icons on the Date column? These two tables are related on that column. So if I write an “average health” measure and put the Yes/No treatment field on a slicer, I get:
But if I put Date on rows, we’ll see that all I am “getting” in my health scores is the health of the tree on the day the treatment was applied:
But I Want the Health Scores Over the Two Days AFTER Treatment!
I want to track a range of dates after a treatment is applied, so I need to do something different here.
What I need is a new column in the TreatmentCalendar table:
If I use THAT column as a slicer, or even better, put it on rows of the pivot, I get what I want:
Hey, check it out! Trees are doing a little bit better in the two days after they receive treatment! (That’s lucky, since this sample data was created using RANDBETWEEN).
But how did I build that “Recently Treated” Calculated Column?
Did I say up front that this was going to be LOW on the DAX Spicy Scale? Hmm.
First let me repeat a longstanding recommendation: if you are using a database as your data source for PowerPivot, you should seriously consider having this calculated column generated in the database and NOT in PowerPivot. There are multiple benefits of that, as long as it’s an option.
If it’s not an option, you still have two choices. For something relatively quick and dirty, you can do the calc column in Excel and just paste an entirely new TreatmentSchedule table into PowerPivot.
But for larger data sets, or cases where you can’t have that manual intervention step every time you get new data, you’re gonna have to write the calc column in PowerPivot.
And since PowerPivot lacks A1-style reference, and this calc column has to look at rows other than just the current row in order to get its answer, you have to use the dreaded EARLIER function.
Primer: Simple Use of the EARLIER Function
Before we do something advanced with EARLIER, let’s cover the basic usage first:
Say I have the following VERY simple table like this:
And I want to add a third column that is the total for each customer:
How do I do that?
The calc column formula for that third column is this:
=CALCULATE(SUM([Amt]),
FILTER(ALL(‘Table’),
‘Table'[Customer]=EARLIER(‘Table'[Customer])
)
)
Explaining EARLIER() in depth may be its own post. Yeah. This was probably the LAST function I learned to use. It’s an ongoing joke between me and the Italians actually.
Here’s the Formula for the RecentTreatment Yes/No Column
With that background in mind, here is the formula for the Yes/No column. I actually did it in two columns:
That 1/0 column is the tricky one. The Yes/No column is just a “cosmetic” column built using IF.
Here’s the 1/0 formula:
=CALCULATE(COUNTROWS(TreatmentCalendar),
TreatmentCalendar[IsTreatmentDay]=”Yes”,
FILTER(ALL(TreatmentCalendar),
TreatmentCalendar[Date] <
EARLIER(TreatmentCalendar[Date])
&&
TreatmentCalendar[Date] >=
EARLIER(TreatmentCalendar[Date])-2))
Easy right?
I Promise to Come Back and Explain This!
EARLIER isn’t all that bad really and neither is the formula above. But it definitely feels a lot harder than an Excel pro wants it to be. Given the length of this post however I think I’ll do that next time.
Calling Marco and Alberto!
There are a couple of obvious questions here that my simplified example just outright skipped:
- What if I have more than one kind of treatment?
- What if not all “patients” receive the same treatment, and/or on different days?
- Can I use a slicer to control how “wide” the range of dates is rather than hardcoding 2 into my formulas?
The fact that I was dodging those problems in this post is why I was squeamish about using people as the example rather than trees. And all three of those make this problem a lot harder.
And what do we do when we hit a problem that goes beyond the powers of mortal Excel Pros? We throw up the Boot Signal of course:
Get in touch with a P3 team member