This is a quick tip on automated testing with Power Query.
I loved Nar’s post on Automated Testing using DAX. I especially like the rule of always including controls so that business readers can share responsibility for data quality. For my part, I sometimes use hidden pages in Power BI reports to assure myself of data quality. I also set alerts on testing dashboards in the Power BI Service to notify me if something is not right. Sometimes, however, a more proactive approach is needed. So, we’ll be doing automated testing with Power Query.
If the query can’t connect to the data source, it will fail. When this happens, the report in Power BI Service is stale, but accurate. I’m fine with this. It can also happen that the query succeeds but is incomplete. In this case, the result is that the report is wrong. Why does this happen? It can happen because of an overtaxed transactional data source. The ERP or CRM or work order system just can’t deliver the amount of data. Maybe it’s linked SQL tables using ODBC. For whatever reason, the query succeeds, but data is missing. I’m NOT fine with this. The long-term solution is to move to a more reliable data source (data warehouse, anybody?). In the short run, refreshes must be stopped. Stale data is better than bad data.
Now, I’m not the one to drop great movie quotes, but there’s a quote from a novel I like. “My mother refused to let me fail. So I insisted.” (Walker Percy, The Second Coming). We need a way to insist on failure. The first piece of the puzzle was learning about if statements in Power Query. A while back, Avi Singh had a post about if statements in Power Query. The key point is that you can call steps out of order.
Automated Testing with Power Query So Query Fails when Needed
There are three steps to making Power Query fail.
- automated testing
- if statement
- a deliberate error
Automated Testing
At the end of your query, make your test. I used Count Rows (Transform ribbon). I have also used Count Distinct Values (Transform: Statistics). This is in Excel, but it works the same in Power BI.
IF Statement
Once you count rows, all of the columns disappear, and you are left with a number. Nar posted about scalars in DAX, but this is a scalar in Power Query: a single value not in a table or list. Next, click on the add new step button to the left of the formula bar.
After pressing the next step button, a step will appear that references the previous step of the query:
= #”Counted Rows”
= if #”Counted Rows” > 28 then #”Changed Type” else #table()
A Deliberate Error
Let’s unpack the M code above. The first term, #”Counted Rows” is a scalar number, in this example 29. If this step is greater than 28, the query will return #”Changed Type,” which is the table. The last term is #table(). A valid #table() statement contains the headers and rows of a table. Since the statement has no parameters, it returns an error. If the rows are less than 28, the query will fail. The query fails, and the data is stale, but it’s still complete.
When working with an unstable source, transaction tables are the main concern. But don’t forget your lookups. If the data is there but there’s no product lookup to translate it, you still have no data. And yes, find a better data source.
Update 10/27/2018. I decided to bring up some interesting discussion from the comments (I also accidently broke the original post, my bad).
1. KCantor asked about determining the criteria for the test. The simplest shown here uses a minimum based on expected rows. You could also do a distinct count of product SKUs in the transaction table. Or, you could check the transaction table to ensure that all values are in the lookup table. Depending on the data and the types of errors, you may have to get a bit creative.
2. stevej mentioned error with Text.Format():
//raising an actual error is more informative
= if #”Counted Rows” > 28 then #”Changed Type” else error Text.Format(“Expected > 28 rows in table. Actual rows=#{0}”, {#”Counted Rows”})
With my quick and dirty solution for automated testing with Power Query, I didn’t even consider error messages. The format above shows Text.Format using a list of values. #{0} indicates to use the first value from the list. In this case, the list only contains the end result of the #”Counted Rows” step of the query.
There’s another method for Text.Format, which uses names instead of a list index. Text.Format is described in the Power Query documentation.
=if #”Counted Rows” > 28 then #”Changed Type” else error Text.Format(“Expected > 28 rows in table. Actual rows = #[Rows]”, [Rows = #”Counted Rows”])
Did you find this article easier to understand than the average “tech” article?
We like to think that is no accident. We’re different. First of a new breed – the kind who can speak tech, biz, and human all at the same time.
Want this kind of readily-absorbable, human-oriented Power BI instruction for your team? Hire us for a private training at your facility, OR attend one of our public workshops!
Get in touch with a P3 team member