Hello P3 Adaptive World! My name is Krissy Dyess and I have had the most amazing experience working as a full-time consultant at Power Pivot Pro since April 17, 2017!
Today is 1/9/2018 so that would be…
266 calendar days, if you DON’T include today’s date
267 calendar days, if you DO include today’s date
182 calendar days, if you include ONLY business days, excluding Saturdays/Sundays & 6 holidays
… and there are many other FLAVORS of responses that could be listed here as well!
Funny, how a simple intro can branch off into many different paths of interpretations – this is pretty much the world of data and analysis that we live in as data professionals!
For any given question, there can be MANY different ANSWERS, depending on what question you are REALLY trying to answer. Additionally, there can be MANY different SOLUTIONS!
In my 20+ years working with data, it seems like a good deal of my time is spent PREPARING the data. What does this mean? It means applying logic to structure the data in a way that is best for your data model and/or analysis.
Today, I would like to share a common data problem and 2 different SOLUTIONS used to PREPARE a set of data for Power BI.
And like most problems, there is usually more than just one or two ways to create a solution. So, if any of our readers out there would like to share any additional ways to solve this problem, please feel free to leave your ideas in the comments below!
The Problem
Sometimes it’s necessary to create a data set that contains a consecutive list of dates and a value for a data point that will be measured or visualized. In this example, the Source Data shown has 7 records, starting with 07/31/2017 and goes through 08/10/2017. Notice that there is a gap between 08/05/2017 to 08/10/2017.
The end goal is to have consecutive days. Now, the end goal could be consecutive days through the last date available (08/10/2017) or through today’s date or etc. In my example, I will go through today’s date but depending on your problem and your end goal this could be different.
SOLUTION #1 – Using SQL
Why solution in SQL? Well… because until Power Query came around, SQL was my best friend when I needed to do any data manipulation!
For those of you looking to learn SQL, this will be a good example to follow along with. For those of you that don’t care to learn SQL, I’ll also show another solution using Power Query.
I’m using SQL Server Management Studio (SSMS), my local workspace, and then creating a New Query workspace.
In order to create a table and populate that table with example data, I select New Query and use the following scripts:
SQL Step 1
SQL Step 2
The SQL Script is entered in the New Query window, Execute or F5 is used to run the SQL script and the results are displayed in the Results Pane.
Voila! I’ve created a table with example data by running the above code!
And we can see the 4 day gap between 08/05/2017 to 08/10/2017.
The next step would be to have a complete consecutive set of calendar days available in a table. Now, you might have a calendar table that you can already use for this step or you might not! There are many ways to create a calendar table! But for this example, I’m going to show a different technique to get a set of consecutive calendar dates for the time frame I’m interested in building.
SQL Step 3
Voila! I’ve created a consecutive calendar table with example data by running the above code!
NOTE: If you are following along, your dates will be different based on when you run the code – this doesn’t matter for understanding the concept. The purpose is to just get a set of dates, they can be any dates.
SQL Step 4
We can see that we have 161 rows of data starting 07/31/2017 going out until 01/07/2018.
SQL Step 5 – Final Step
Which brings us to the final step, which is to join the result of the above step with the actual store quantity data to get the actual quantity for days where a quantity was available. And to do this, I created the final SQL query:
And now we have the final SQL query & result set –
SOLUTION #2 – Using Power Query
Now, can we get to the same results using Power Query? Yes, we sure can!
Power Query Step 1
Let’s start with the example data. Since I started with SQL, I can just copy and paste my same example data from SQL into Excel. Then in Excel, I can use Insert > Table and to create a table named ‘StoreQuantity’. So far, so good!
Power Query Step 2
Then since I’m using Excel 2016, I can use Data > From Table/Range to load and launch the Power Query Editor window with the example data.
Power Query Step 3
The exact same example source data is now loaded into the Power Query Editor! Voila!
Power Query Step 4
I’ve renamed the source data query to ‘Staging_StoreQuantity’. And I’ve created a “Staging” folder for the data preprocessing steps.
Power Query Step 5
For the next step, I can right click on ‘Staging_StoreQuantity’ and Reference or Duplicate another starting source of data.
Now, there are performance reasons that I might choose Reference over Duplicate.
If I were to choose Duplicate, Power Query will locate and retrieve the data again. Depending on the source of the data and the size of the data, this could take some time. However, if I choose Reference, then I’m using the data that is already loaded in the Power Query Editor. So, in this case, I’m choosing the Reference option!
Power Query Step 6
I’m going to rename the new query to ‘Staging_StoreQuantityAllDays’. And use the Home > Group By button to create a unique list by [store_id] of the first [create_date] where we have a quantity.
Power Query Step 7
Next, I can add a custom column expression using Add Column > Custom Column and the Power Query List.Dates function to create a list of consecutive dates for each [store_id]. I will use the [min_create_date] as the starting date in the expression and go out 365 days.
List.Dates(#date(Date.Year([min_create_date]), Date.Month([min_create_date]), Date.Day([min_create_date])), 365, #duration(1, 0, 0, 0))
After adding the [dt] custom column as a List, you can use the Expand to New Rows…
… to get one row for each item in the List. Boom!
Just a few more steps to change the [dt] format to a date, filter steps to keep only the records through the current date … and Boom!
We now have a table with consecutive dates for the [store_id] starting with 07/31/2017 through to today!
Power Query Step 8
Finally, I can use the Home > Merge Queries as New to create the final query.
Staging_StoreQuantity is the resulting column, we use the Expand double arrows to select ONLY the [qty] column from our original data source. I deselect the use original column as prefix so that only the [qty] name is returned for the column versus Staging_StoreQuantity.qty.
I renamed the final query to ‘StoreQuantity’ and I’ve created and named a new folder called “Data Model”, to store the final query that will be loaded to the data model and that’s it!
You can see that we can use either SQL or Power Query to take source data and manipulate the data into the SAME desired results!
Which method is BETTER? Well, that depends. If you don’t have SQL or prefer not to use SQL, then Power Query is a GREAT option!
As for me, I have to refer to the song… “Make New Friends (Power Query) But Keep The Old (SQL)… One Is Silver And The Other Gold!”
Happy New Year Everyone
Forget bending spoons with your mind – there’s no money in it.
It takes a special kind of mindset to “bend” data (and software!) to the human will. As this article demonstrates, we at P3 Adaptive can twist Power BI into a pretzel if that’s what an organization needs. (A robust, trustworthy, industrial-strength pretzel of course).
The data-oriented challenges https://foofacing your business require BOTH a nimble toolset like Power BI AND a nimble mindset to go with it. And as Val Kilmer / Doc Holladay once said, we’re your huckleberry.
Get in touch with a P3 team member