04.23.24

Power BI Models are the Center of the Fabric Universe – But What Has ACTUALLY Changed?

Dive into this episode where we’re unraveling the threads of data analytics like never before. This week, we’re zeroing in on the seismic shifts in technology that are reshaping our understanding and use of data. Imagine stepping into a world where data isn’t just numbers, but the lifeblood that pushes businesses forward.

We’re going to take a deep dive into semantic models. Once confined to back-end operations and report generation, these models are stepping into the limelight, becoming powerhouse tools for strategic decision-making. We’re not just talking about the mechanics—we’re exploring how these developments touch everything in the business world and open up a world of possibilities you might never have considered.

Plus, here’s your chance to steer the ship: join our LinkedIn steering committee, where you can propose topics, suggest guests, and help us chart the course toward the most pressing issues and innovative ideas in data analytics. Your voice can help direct our journey, bringing fresh perspectives to the table.

And, as always, be sure to leave us a review on your favorite podcast platform to help new listeners find us!

P3 NCAA March Madness

Rob Collie (00:00): Hello, friends. After we recorded last week's episode about the Fabric Conference, as I was thinking back about it, I came to this realization: when talking about Fabric, and specifically about this concept that the semantic model, the Power BI model, becomes super central and super available in the Fabric world, available to other things in a way that it wasn't before, I've been just a little bit lazy, just a little bit vague. We've never actually directly answered the question: what has actually changed, on the ground, in reality? What is possible today in the Fabric world, and what is going to be possible tomorrow in the Fabric world?

(00:50): As they improve and add additional capabilities, what is the actual difference, the contrast between that and the world that we've already gotten to know, the world where you're publishing these data models as Power BI models in the non-Fabric world? So, today, Justin and I dove into that level of detail. We make it a lot more tangible, a lot more specific than we have in previous episodes, and that of course means that we got a little bit into the weeds, not too deep into the weeds, just deep enough.

(01:23): Getting down to the actual mechanical moving parts and what's different before and after Fabric turns out to be super valuable even in terms of understanding this stuff at a high level, understanding it in terms of its business impact. So, don't worry, we didn't get down into techno mumbo jumbo. That would be very off-brand for us anyway. But I do think that these specifics are going to help you much more solidly anchor the way you understand and plan for this new thing, a thing that Microsoft almost has to be deliberately vague about in the way they describe it.

(02:00): In fact, at the end of this episode, we explicitly talk about precisely that: why is it in Microsoft's interests to talk about this publicly in a way that does create confusion? There actually is a sensible answer to it. And to put the finest of bows on the topic, we talk explicitly about how all of the specific things we cover throughout the episode translate into actual business value.

(02:26): As we wrapped up the recording of this one, I was left with the distinct impression that this one will go down as one of the most valuable episodes we've actually recorded. I hope you agree. So, let's get into it.

Announcer (02:37): Ladies and gentlemen, may I have your attention, please.

Announcer (02:42): This is the Raw Data by P3 Adaptive Podcast with your host, Rob Collie, and your co-host, Justin Mannhardt. Find out what the experts at P3 Adaptive can do for your business. Just go to p3adaptive.com. Raw Data by P3 Adaptive is data with the human element.

Rob Collie (03:09): Justin, I thought we'd start someplace a little bit different today. Some of my favorite podcasts, almost explicitly as part of their format, start off with some fun banter at the beginning, and I've got an example, something we can have fun bantering about before we dive into the actual topic. Lately I have discovered a new use for Power BI, which is to tell numerical jokes.

Justin Mannhardt (03:33): Give me an example.

Rob Collie (03:34): We did a couple episodes about the hockey dashboards that I built. I built hockey dashboards for my league, and the real value that these dashboards bring to the league is just that they enhance enjoyment. And once I really truly internalized that, I came back and recorded another episode saying hey, look how I thought about this differently and the different dashboards that I built once I realized what I was really doing. So, telling jokes in numerical format is really like the pinnacle of that form. We have this dashboard called Stars of the Week. It's a regular dashboard, some bar charts that show the people who scored the most points, scored the most goals, whatever, just in this most recent week.

(04:15): That came out of the realization that people like seeing things that change all the time. If you happen to be on the Stars of the Week for the first time ever, that's a great thing that happens to you. It was a really good dashboard to add, but then I went one step further and created a completely separate fact table in the model that's just pulled straight from a hand-edited Excel file. And so, it's this bar chart that we can put whatever we want into.

(04:39): One example: one time, one of our star players was whining during a game about a penalty that he thought should have been called. We all knew he was faking it. It was actually my wife Jocelyn who tripped him, and he fell.

Justin Mannhardt (04:57): Is this a big human?

Rob Collie (04:58): He's big, but the point is he's so ridiculously skilled that if the game plan were to intentionally go trip him, you'd never be able to pull it off, because he would always stay on his feet.

Justin Mannhardt (05:09): I love this.

Rob Collie (05:09): Then in this one case where he thought he could get a penalty, all of a sudden, he's just flat on his face.

Justin Mannhardt (05:15): Yeah. Go, Jocelyn.

Rob Collie (05:16): He's whining about it, and then the ref skates up to him and says, "I am never going to call that." And so, we put this in the Stars of the Week dashboard in what we now call the AI section, the AI superlatives, where AI stands for approximately intelligent. And so, that week we had "dumps Shane in the corner: 1" for Jocelyn, "whines about it: 1" for Shane, and "tells him he is never going to call that: 1" for Drew.

Justin Mannhardt (05:55): Numerical jokes.

Rob Collie (05:57): And there's a sort order column, because you've got to tell the narrative, you've got to deliver that joke in order. So, there's a sort order column in the spreadsheet. Another one: I have been a second-rate Rob the whole time I've been in this league. I'm the only Rob in the league, but there's this other Rob that used to play.

Justin Mannhardt (06:13): Other Robs.

Rob Collie (06:14): No, no, and he is called Cool Rob. They refer to him as Cool Rob.

Justin Mannhardt (06:19): Why?

Rob Collie (06:19): So, by default, I'm just Rob. And this guy, I mean, he is legitimately cool. The stories about him are epic. He is a really cool Rob for sure. Okay, 100%. As far as I'm concerned, he's Snuffleupagus. He doesn't exist. I've never met him. He never shows up at the bar, but last night, he finally did. I finally got to meet him and tell him what hell my life has been because he exists. And the AI stats were "number of Cool Robs in attendance at the bar last night: 1" and "other Robs: 1."

Justin Mannhardt (06:53): I love it. That's a great use case for Power BI.

Rob Collie (06:56): Just a completely hand editable chart. You're not actually doing anything truly numerical. You're not doing any analytics. You're just putting some jokes and I know that you can't do this in 99.99999% of business scenarios. You can't do anything similar to this.

Justin Mannhardt (07:12): Yeah, don't get fired.

Rob Collie (07:14): Don't do that. Anyway, onto the actual topic of the day. In last week's episode, we recorded with Mark, Will, and Chris their reactions to and experiences with the Fabric Conference.

Justin Mannhardt (07:26): Fab Con.

Rob Collie (07:28): Fab Con that I keep calling Fabricon.

Justin Mannhardt (07:30): I think Fabricon, you need that extra syllable.

Rob Collie (07:34): You know what it is. Actually, it's Fabrikam. Fabrikam was one of the Microsoft fake companies.

Justin Mannhardt (07:39): That's right.

Rob Collie (07:41): AdventureWorks.

Justin Mannhardt (07:41): With a K.

Rob Collie (07:42): Yeah, and if you pronounce the A as an ah, it's Fabrikam.

Justin Mannhardt (07:46): We've got the Contoso demo, we get the AdventureWorks demo. Somebody get on the Fabricon demo.

Rob Collie (07:52): And we had a bit of a side discussion, breakout during that recording that we realized really deserved its own episode, and so we left this out of the Fabcon episode so that we could slow cook it.

Justin Mannhardt (08:04): I'm nervous.

Rob Collie (08:06): Enter the crock-pot of data. Here's the gist of it. I have been saying for a long time on this show and elsewhere that one of the core principles, if not the core principle, driving Fabric, the most important thing about Fabric, is this: before Fabric, you would build these semantic models, these Power BI models, that would become by far the most intelligent source of information in your entire company about the data sources involved. And then, all of that intelligence was essentially trapped, almost held hostage, and only exposed to Power BI reports. Which is still amazing. This is a good problem to have.

Justin Mannhardt (08:50): If that's all you've got and you're getting value, you're just fine.

Rob Collie (08:54): All you are is ridiculously capable of answering every important question or noticing every important trend. That's a wonderful, wonderful place to be. And the fact is that before Power BI, these semantic models didn't exist. You didn't have anything resembling this source of intelligence, this level of intelligence. Many, many BI platforms had tried and failed to deliver on that promise, and Power BI was really the first one to ever actually do it. The fact that it was then locked up by and hogged by Power BI reports is an afterthought in a way. But now these semantic models are, well, not everywhere, but they're permeating.

(09:34): If you zoom back from that, you realize what a tragic loss it is that so many other things in your organization that aren't Power BI reports aren't benefiting from that same intelligence.

Justin Mannhardt (09:47): I want to emphasize this point, because I think it's easy for people that haven't gotten into the deep end of the pool on Power BI yet to not understand what we mean when we say this is the most intelligent thing about your business. I was on a call with a customer yesterday and we were doing a demo of Power BI and what it could do. And so, we flipped over and we were looking at the model view. And I just said, "This is why this is important." And I said, "We got a transactions table, we got our budget, we got our forecast." These things are all at different levels of granularity, and just their eyeballs lit up. They're like, "Oh, you mean if I want to do actuals versus forecasts, I don't have to aggregate my actuals to the same granularity as my forecast?"

(10:31): I think if you're sitting there listening to this going, I don't really understand, there's a depth of possibility of bringing in all the data in your organization regardless of where it comes from, regardless of what granularity it's at, building rich metrics that go across those systems, combining them together. It's not like just a basic sales report. That's not what we're talking about. We're talking about another level of maturity and insight and possibility for any team or company.

Rob Collie (11:02): No, I love that. I'm glad you brought that up. This is something that bears repeating forever. Dear listener, if this is ringing bells in your head, we have an episode entitled something along the lines of Are you getting the most out of Power BI? Please go look that up. We want to talk about something else today in depth, but this is a great thing to remind people of. If you're thinking of Power BI as just a reporting tool, as just a visualization tool, as just a replacement for something like Tableau, this magic that we're talking about, the semantic model brings you is something that is so subtle. It's a nuanced concept, but its impact on your business is anything but subtle. It is not nuanced in its impact. It is nothing short of amazing like lightning in a bottle.

(11:48): Okay. In Fabric land, arguably, the single most important thing to know about Fabric is that what they're doing is they're saying, hey, that semantic model, that richness, that smartest thing you've got about your business should be open to many, many, many more things than just Power BI reports. And so, this whole deal where OneLake, their core data lake storage platform has taken on everything that's good about the Power BI storage format means that now your Power BI models, your semantic models, those models are now open to things like custom applications.

(12:28): They're open to things like machine learning and AI models, and the entire ecosystem now has access to that same intelligence. Now, I have been repeating that over and over and over again, but we've never stopped and drilled down and said, "Okay, what has really changed?" This is one of those things where I benefit from having a lot of history with the people involved, like a mirroring company. I can almost read between the lines and go, "Oh my God, I see what they're up to and it's amazing." Now that's great for me. In terms of communicating out to the rest of the world, I don't think I've done anything resembling a good enough job.

(13:07): In fact, I don't even know the answers. I don't need to. I'm so confident in what they're up to. And just to really drive home what I'm talking about here, if you wanted to write a custom application from scratch in, I don't know, C Sharp, pick a language, Python, and you wanted access to the goodness that was in a Power BI model, well, technically I've been making it sound like that wasn't possible before, but it actually was. There were APIs.

(13:34): I could call them, and I could query up the richness that is the semantic model and get all the information that I need. So it was not impossible before, but I've been talking like it's suddenly super possible now with Fabric. So, maybe we can get down to the details of that.

Justin Mannhardt (13:50): I'd love to.

Rob Collie (13:51): Getting down to the details, I will know some of the details maybe for the first time. I have a couple of theories that I've developed over the weekend. I'll test them on you.

Justin Mannhardt (13:58): Okay, good.

Rob Collie (13:59): That level of detail I think is really important to ground people.

Justin Mannhardt (14:04): Let me take a high-level pass at the chapters in this book, to use that analogy. It felt important to me coming into this conversation to bold that concept: the semantic model is the most intelligent asset you have with respect to your data and analytics and your business information. My interpretation of what Microsoft is saying is yes, the semantic model is the apex of maturity within an analytics system, so we're going to organize around that. If it's the apex, that means it's the top of the mountain, and if you want to get to the top of the mountain, you've got to go up the mountain.

(14:43): And then, when you're on top, you can see everywhere you want to go, and now you can go different places with it. One of the things you've mentioned in our previous episodes and just our one-on-one conversations is how powerful it is that Microsoft decided to standardize the storage layer of OneLake to be essentially the same storage layer that's in the model. That's one thing I'd like to unpack here for a second, going up the mountain part.

(15:10): We want to get to the apex. And the birth of VertiPaq and the tabular engine, columnar data storage and query retrieval, was massive.

Rob Collie (15:21): I'm tingling now thinking back to the privilege, in hindsight, of having had a ringside seat while that was being hatched, while that was being born. I got to be there for that. I didn't contribute to it, but I was there. Oh, it's so cool.

Justin Mannhardt (15:37): Really momentous. As technology evolved after that point, lakehouses became a thing and Hadoop became a thing, all these things. And eventually, the world started gravitating around the storage format called Parquet. Parquet is a columnar storage format. The point of this is that, collectively, we as an industry realized this is a really good way to store data for analytical processing workloads.
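
To make "columnar" concrete for readers, here's a minimal Python sketch, not from the episode, of writing a small table to Parquet and reading back only a single column. It assumes pandas and pyarrow are installed, and the table and column names are made up.

```python
# A minimal sketch of why columnar formats like Parquet suit analytics:
# an engine can read just the columns a query needs instead of whole rows.
import pandas as pd

sales = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer": ["Contoso", "Fabrikam", "AdventureWorks"],
    "amount": [120.0, 89.5, 240.0],
})

# Data is laid out column by column on disk, which compresses well.
sales.to_parquet("sales.parquet", engine="pyarrow")

# Read back only the one column an aggregation needs.
amounts = pd.read_parquet("sales.parquet", columns=["amount"])
print(amounts["amount"].sum())
```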

Rob Collie (16:05): Every time I hear the word Parquet, the first thing I think of is these terrible commercials from the '70s that were advertising margarine that was called Parkay. And there was this thing, the bowl, the tub in the fridge would say butter and Parkay. It's not spelled like that. It's more of the floor type of Parquet.

Justin Mannhardt (16:23): P-A-R-Q-U-E-T. Yeah.

Rob Collie (16:27): Like wood floor pattern. Does that refer to like the columnar compression type of look?

Justin Mannhardt (16:34): Yes.

Rob Collie (16:34): That makes sense to me. Parquet is a Microsoft concept, essentially?

Justin Mannhardt (16:40): I don't know. The point I want to make here is that what arguably started as columnar compression with tabular and VertiPaq, analytical technologies have advanced from there. Microsoft with Fabric is saying, "We've been right about this all along. We want our ecosystem to plug and play with everything else." Okay, great. Let's make OneLake. Let's standardize on Parquet. We're going to make VertiPaq work with that. I'm sure there were challenges involved with this, and so now they're saying this is how it's going to be.

(17:13): This is like the getting up the mountain stuff to me, because OneLake removes so many barriers of just getting your data in and then making it available to build a model in the first place, the opportunity to eliminate a huge major component of refresh cycle times, that's really impactful.

Rob Collie (17:34): Let me try to even further clarify something here. You're right, we've been saying, hey, the storage format for OneLake has now taken on all the good stuff about the VertiPaq storage format that was in Power BI. Even that, why is that significant? If you're storing things in two places, if you're storing it in a Power BI squirreled-away format and you're storing it in a data lake, well, which one's more up to date? You don't want two definitive copies of anything. If OneLake can now natively achieve all of the same things in its storage format that are required to make Power BI operate properly, then Power BI no longer has to hold exclusive possession of the data.

(18:17): It can now say, hey, I'm letting OneLake do the storage part for all of this... the DAX engine has access to that. The DAX engine's going to be pulling data from there to run its calculations, as opposed to pulling it from the private Power BI storage format. So, that's part of the story of why that's important. The other reason why it's important is that there are a million expectations that come with saying, "This is my data lake storage format." As a software vendor, as Microsoft, you're implicitly promising something.

(18:47): You're making an implicit promise that says, hey, all the things you're expecting about data lakes in terms of data ingestion, data fetching, querying, availability, security, all the interfaces you're used to are also going to be there. And the vast majority of those same expectations were not met by the Power BI squirreled-away format.

Justin Mannhardt (19:11): That is correct. Let's take an example, and this will fall under the general category of refresh, and this typically applies when you have either large volumes of data or very frequently changing data. A common pattern and technique in these scenarios is to update only part of a table, either incrementally, just the most recent transactions, or when there was an update to a particular segment. Technically, we refer to these as partitions. And partitions have been around in databases for a very long time, but managing partitions in a Power BI model was wickedly advanced at times.

(19:55): And so, if you've got a situation where you do have an engineering team and they are managing the data coming into OneLake, there is an expectation of, "Oh, I can partition the data and very conveniently update, remove, change only a sliver of it," and then not require a complete full refresh of the Power BI model to get that change in.
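
For a concrete picture of what "changing only a sliver" can look like on the OneLake side, here's a hedged PySpark sketch, not from the episode, that uses Delta Lake's replaceWhere option from a Fabric notebook. The table path, column names, and date range are placeholders, `spark` is the session a notebook environment typically provides, and the exact lakehouse layout may differ.

```python
# A hedged sketch of updating only a "sliver" of a lakehouse table with
# Delta Lake's replaceWhere, so downstream models don't need a full reload.
# All names and paths below are made up for illustration.

# Suppose `new_q1` holds corrected Q1 2024 transactions only.
new_q1 = spark.read.format("delta").load("Files/staging/transactions_q1_fix")

(
    new_q1.write.format("delta")
    .mode("overwrite")
    # Overwrite just the rows in this date range; everything else is untouched.
    .option("replaceWhere",
            "OrderDate >= '2024-01-01' AND OrderDate < '2024-04-01'")
    .save("Tables/Transactions")
)
```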

Rob Collie (20:17): And that sounds simple, because in a way it is. It sounds like just hard work, but not necessarily rocket science, for Microsoft to make this possible. There might be some rocket science in it, honestly. There probably is, but they never had a reason to take on that engineering challenge until they had this epiphany that the semantic model needed to become a lot more central, a lot more available.

Justin Mannhardt (20:45): A macro-level idea for me with Fabric is every possible set of tooling or capability that I might have a need for in the course of building analytics solutions is all there and it's dramatically simplified. I can get up the mountain to that semantic model faster.

Rob Collie (21:07): To really drive home this example, one of the other things we've been saying repeatedly is that pre-Fabric, if you went and built yourself a wonderful semantic model and it was the richest, smartest thing ever about your company and you were using that to power reports and hey, everything was great. But now you wanted to do some machine learning or AI project that essentially covers or is about that same exact domain. Everything that you need to power your AI model, your AI project is in the semantic model. But before Fabric, you would have to go and essentially recreate the intelligence of the semantic model in a completely parallel sense using a completely different tech stack, which by the way, it sounds awful to have to repeat the work.

(22:01): But it's not just repeating the work, because you're repeating the work with a completely different skillset and toolset, which means different people have to be involved who weren't involved in the original, who may or may not even be able to understand the intent of the original, or you've got a communication problem. All of that. And then, another thing about it: the semantic model is able to be asked completely differently shaped questions without rebuilding the semantic model, but these other parallel languages that you'd have to use to build the pipeline of data to fuel your machine learning models, et cetera, they're not built in the same way.

(22:34): If you run into a dead end in that pipeline, you've got more plumbing work to go back and redo in order to provide a different shape of result. It's just not going to ever be as good and it's going to be just a tremendous amount of wasted duplicative work. This is one of the examples we've been giving. This is why the first thing on the list of what Fabric does for you is it gets you to the doorstep of AI overnight, whereas you were on a different continent before. Using that example, we can crystallize some of the things we were just saying.

(23:04): Let's pretend for a moment that this AI thing you were building was going to be completely custom. You were going to write it all from scratch in whatever cool-kid language. Is it PySpark?

Justin Mannhardt (23:12): Yeah.

Rob Collie (23:14): You're not going to use anything off the shelf at all.

Justin Mannhardt (23:17): This is going to be like a life goal for Justin, sit you down and we're going to write a notebook together. And I know you're going to walk away from it being like, "Yeah, I knew I hated this." But I'm just going to make you do it.

Rob Collie (23:24): Hey, what's the worst case scenario? I come out of it going, "Oh, there's something I hate more than..." It's like a championship fight: in this corner...

Justin Mannhardt (23:36): Sign me up.

Rob Collie (23:38): Take this example, building some AI model essentially from scratch. Well, in the OneLake world, you don't have to do any of that. The types of things you were talking about before, like partitions, et cetera, there are so many things that the old Power BI model, while technically not making impossible, was going to make just so incredibly tedious, difficult, and awkward that the people who are capable of doing the work in the first place, the PySpark people, are going to be so turned off by it.

Justin Mannhardt (24:11): Yes, totally.

Rob Collie (24:13): That it's really dead in the water. It never gets started.

Justin Mannhardt (24:15): If you think about big boxes and arrows, like the process of building a machine learning solution, step one is: I need to curate and assemble the data at the right level of granularity, with the right features, all the dimensions, with the right metrics. That process is time-consuming. In a semantic model, we have all this richness and this ability to create interesting calculations that go across different events in the business. Those, we call measures in a semantic model. The answers to those things, they're not materialized in the storage layer. They're produced on read, a query-on-read type situation.

(24:53): And so, in the past, you could still do something like this, but what happened is you would say, "Oh, you can call this API, and one of the parameters you need to pass in the body of this request is a DAX query, a table query," not, "go get me this measure."

Rob Collie (25:12): It's the same exact type of DAX query that the table visual in Power BI has to create for you. You never see it. You author this DAX query without realizing it by dragging things out of the field list by setting filters and things of that sort. Like behind the scenes, it's making this DAX query for you, sending it to the semantic model, getting the results back and populating the table visual.

Justin Mannhardt (25:35): And people like our consultants, we understand all this stuff. We can look at these queries and we understand what's going on. But a data scientist, as soon as you're like, "Oh, we need to put the DAX query here," what's DAX and how does this... and then that's that friction that gets introduced. And it's just like ooh-ah.
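
To ground that friction, here's a hedged sketch, not from the episode, of the pre-Fabric pattern being described: calling the Power BI REST API's executeQueries endpoint with a DAX table query in the request body. The dataset ID, token, and the table and measure names inside the DAX are placeholders, and the response parsing shown is approximate.

```python
# A hedged sketch of the "old way": ship a DAX table query to the
# Power BI REST API and parse the rows out of the JSON response.
import requests

dataset_id = "<dataset-guid>"    # hypothetical
token = "<aad-access-token>"     # acquired separately via Azure AD

dax = """
EVALUATE
SUMMARIZECOLUMNS(
    'Customer'[Region],
    'Date'[Year],
    "Total Sales", [Total Sales]  -- a measure defined in the semantic model
)
"""

resp = requests.post(
    f"https://api.powerbi.com/v1.0/myorg/datasets/{dataset_id}/executeQueries",
    headers={"Authorization": f"Bearer {token}"},
    json={"queries": [{"query": dax}]},
)
resp.raise_for_status()

# Approximate response shape: results -> tables -> rows.
rows = resp.json()["results"][0]["tables"][0]["rows"]
print(rows[:5])
```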

Rob Collie (25:55): But there are two frictions in terms of trying to pull this off with the old Power BI model. And I think one of those two frictions does carry through to the Fabric world and one of them doesn't. This is my hypothesis, so let me test it on you. A lot of the things we've been talking about are things like the partitioning, and the refresh schedule, and all of that. The old Power BI model is sitting there with a big fat refresh button on it that you can call programmatically if you want, but it's so opaque: it's going to go do magic and come back and tell you whether it succeeded. You couldn't do nearly the granular things with it that you would want, because it wasn't built for you.

(26:34): It wasn't built for your programmatic solution scenario. It was built for scheduled refresh and the security model, that Power BI file, that model was published to a workspace. Yes, you can programmatically access that workspace via APIs. But again, just like layer after layer of new languages you have to speak. And not only are they new ones you have to learn, they're not even convenient for the types of tasks you want to perform. They're not optimized for those at all.

(27:04): And all of that stuff goes away in the Fabric world, because now the OneLake storage format is meeting all of the expectations that other data lakes met. Just whoosh, almost overnight, all those different languages and their awkwardness go away. But if you want the intelligence out of the semantic model, which is the whole reason you came to this party, you're going to have to query said semantic model in the same way that the table visual in Power BI does.

(27:36): If you want a rectangle of data back, fine, you can get that. In fact, that's what you're going to want. But that rectangle isn't one of the original tables from the relationship view. It's not this dimension table or that fact table. It's some blend that's created by the measure space in DAX. And so, it seems to me, things that were formerly super, super inconvenient are now going to be convenient, but you're still going to have to plop the DAX query somewhere in your code.

Justin Mannhardt (28:10): This is where I think I get to change your hypothesis or debunk an assumption to an extent. So, yes, that's where the world has been. Here's another place where we can answer the what specifically has changed question. If I am a data scientist and my objective is to build a machine learning solution that's going to help us predict the price of a commodity, if a customer's going to churn, I'm building a machine learning solution for that, 99 times out of 99, I am using PySpark notebooks to write the code for this machine learning solution. The data coming in to my solution very well could and honestly should be sourced from something that has the level of intelligence as a semantic model.

(28:56): One of the reasons why the language Python is so widely popular is the sheer number of libraries that make that programming language relatively easy for people to learn and understand. Within Fabric, Microsoft released a new library, and that library is called semantic link. Someone who writes Python might not know jack about DAX. They now have a library where they can write one line of code that says, what are the tables in this model?

(29:34): What are the measures in this model? Within this table, what are the fields that are available to me? And then, I can build data frames using measures and columns without knowing one lick of DAX.
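
As a reference point, here's a hedged sketch of what that looks like with semantic link (the sempy library) in a Fabric notebook. The model name, measure, and columns below are hypothetical, and the exact function signatures should be treated as approximate rather than definitive.

```python
# A hedged semantic link sketch: explore a model and pull a measure into a
# data frame without writing DAX. All names below are made up.
import sempy.fabric as fabric

dataset = "Sales Semantic Model"   # hypothetical model name

# "What are the tables and measures in this model?" -- the field list, in code.
tables = fabric.list_tables(dataset)
measures = fabric.list_measures(dataset)
print(tables.head(), measures.head())

# Build a data frame from a measure grouped by columns of your choosing.
df = fabric.evaluate_measure(
    dataset,
    measure="Total Sales",
    groupby_columns=["Customer[Region]", "Date[Year]"],
)
print(df.head())
```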

Rob Collie (29:48): In my code, I'm essentially operating the field list.

Justin Mannhardt (29:52): Yes, you're operating the field lists and you're using Python and you're working in a way you would work with any other type of data frame.

Rob Collie (29:59): In the same way that I think of DAX as a calculated column and measure language that's slowly, slowly, slowly bleeding into becoming a query language, in the back of my head, in places I don't even like to think about, I figured in 20 years it would all converge for me. But really, I never write DAX queries. I don't need to. The field list does it for me when I want to assemble my table visual, and all I'm doing is picking from a menu: it could be this, this, and this, and put them in this order, and it's figuring out the DAX query for me.

(30:30): What you're describing to me in semantic link is essentially a programmatic interface where people can write code and pick from that same menu. They're writing code, but their level of DAX knowledge doesn't have to go beyond... well, they don't have to have any, because all they're doing is checking checkboxes.

Justin Mannhardt (30:47): It is the equivalent of dragging things onto the canvas and producing a table. Because the basics of machine learning are: I take a table of data and I feed it through an algorithm to make predictions, inputs in, outputs out. So, yeah, I can now just pick from that list what I want without needing to learn any of that. Some of the measures that you can create in these models are very rich and deep.

(31:10): We use the term base measures when we teach, and then others are built on top, and there are all these layers of abstraction going on. Getting to that same answer from the very beginning, that's the rework you're describing, and that can be very difficult without the semantic model. I just wanted to throw that in there too.

Rob Collie (31:31): The semantic link thing, essentially a programmatic version of the field list, picking from a menu, matches up with one of my other intuitive expectations, which is that Microsoft was going to offer these PySpark developer types some interface equivalent to the field list. If I get to benefit from not needing to know how to write a DAX query, even though I know some subset of DAX, of course those developers should too. I was even thinking, honestly, that it would be some standalone tool where they could author the query using the field list and then copy-paste. Semantic link is more elegant than that.

(32:06): So, now I know the label semantic link that attaches to that void that I was intuitively expecting them to fill. That's really cool.

Justin Mannhardt (32:14): Yeah, it is. When I first was introduced to semantic link and realized what that could do, it reminded me of a few projects I've worked on where we wanted to use the semantic model to find the best opportunity we had for something. The answer to that question was never the same exact set of column combinations. It's like, I want to look at it by customer and geography, compared to customer, geography, and product, or just customer and product, because I'm looking for: where is our opportunity greatest? Where are the variances? Where's the best value here?

(32:49): And so, now we're talking about three, four, five, six different types of DAX queries where I'm trying to compare all this. And I was like, "Wow", semantic link within Python and PySpark, I was like, "Oh, I could do that now," without having to materialize this big gross calculated table in my model.

Rob Collie (33:06): That's really cool. Just to illustrate, one of the things I said a few minutes ago was if you're building up towards some machine learning model in PySpark and you're having to build a completely separate data pipeline in parallel to your semantic model, and you're making some guesses about what your code's going to need, so you're building a pipeline that's going to produce a rectangle of a particular shape. And then, when you get to the finish line, you realize that that's not the rectangle you need, you need something different.

(33:32): Now you've got to go rip up most of your plumbing work and replace it with different plumbing work. Whereas in this new world, you go, "Oh, I've got this semantic link and I've checked these checkboxes, in my code, to select these things from the menu. I'm just going to select different things from the menu." It's like 10 seconds of modifying your code in a relatively trivial way and you're off and running again.

Justin Mannhardt (33:56): And that's really common when you're working through a machine learning exercise. You're trying to figure out the right combination of things that optimizes the prediction at the lowest error rate. You're adding things in, taking things out, comparing those results.
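
Continuing the hypothetical semantic link sketch from earlier, "selecting different things from the menu" amounts to swapping the groupby list; the dataset, measure, and column names here are still made up, and the signature is approximate.

```python
# A hedged sketch of the iteration loop: same measure, different menu picks,
# no new plumbing. All names are hypothetical.
import sempy.fabric as fabric

dataset = "Sales Semantic Model"   # hypothetical

# First attempt: customer and geography.
df1 = fabric.evaluate_measure(
    dataset, measure="Total Sales",
    groupby_columns=["Customer[Name]", "Geography[Country]"],
)

# Wrong rectangle? Swap a column; the measure logic in the model is reused.
df2 = fabric.evaluate_measure(
    dataset, measure="Total Sales",
    groupby_columns=["Customer[Name]", "Product[Category]"],
)
```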

Rob Collie (34:11): There's this whole audience out there, the PySpark machine learning crowd, who are probably by default thinking, "Oh crap, now I've got to go work with this other Microsoft thing. Someone's going to want me to work with one of these semantic models instead of the things I'm used to," but they are about to fall in love.

Justin Mannhardt (34:32): Oh, yeah.

Rob Collie (34:33): If you're taking it from the perspective of the business leader who has supervised or witnessed the creation of the semantic model for Power BI purposes, leveraging that investment to very rapidly and cost-effectively achieve a win in the machine learning space, the AI space, great. If you're the data engineer or the data scientist in that story who's now being told, here, you're going to work with this thing instead of the way you normally do it, this is going to be your source of information, I would expect at first that data scientist to look at that and go, "Ugh, ick, intrusion, interloper." Then some number of days later, they sit back and go, "Oh wow, son of a bitch. This is just better."

Justin Mannhardt (35:17): Well, I think this is a classic point of emphasis even going so many years back and how we've been just teaching Power BI. Once you realize I have this collection of measures now that are rich and they're doing the right comparisons and they're giving me the actual valuable nuggets out of all of this information. When people realize like, "Oh, I can calculate that metric for any conceivable combination of columns in this model without going back and building the model over and over again," that same realization I think is true for this audience too, where they're like, "Wow. If I want to look at a year-over-year Delta at a different granularity, a different group of columns, I don't need to go redefine the calculation of the year-over-year Delta.

(36:03): I'm going to switch out a couple columns in my code. I just have all this power and flexibility to experiment and iterate, not with the anxiety and the pressure of like I really got to get this right from the jump."

Rob Collie (36:14): Yeah, it's the age-old thing. You don't know exactly what you're going to need. So many systems force you to build the rectangle of data before you can find anything out, and the semantic model basically gives you every conceivable rectangle you would ever need. It's just the click of a button to reshape what it's giving you. And that level of flexibility is truly what makes Power BI a better reporting tool than its competitors. It's not the visualization layer.

(36:51): The visualization layer is just a client of the goodness underneath, which is the semantic model, and now that same goodness is about to be brought to the data science world. As a side effect, this is even more brilliant for Microsoft I'm realizing because now this is going to be the reason why all the data science people start to finally become enthusiastic about the semantic model. It's like a unite the clans moment.

Justin Mannhardt (37:11): Again, because the semantic model is this apex artifact in your analytics stack, and we've talked about faucets first and all that stuff. It was just always so hard to get there. Just like people that needed reports were out there just saying, "Just put a faucet in over here." That's what's happening in this side of the world too. So, yeah, I think that's really cool. This is another great example of something specific that's been introduced with the semantic link library.

Rob Collie (37:37): We've now talked about the damn near 100% custom code scenario. One of our other instincts, or one of my instincts, for you I think it's more than instinct, it's confidence and knowledge in your head, is that there are many other low-code slash no-code components in the overall Azure framework, in the overall Fabric framework, where you're not writing things completely from scratch. Can we pick an example of something like this?

Justin Mannhardt (38:09): A quick little shout out to the Power Apps crowd, because Fabric, OneLake, and Dataverse, which is the backend that comes with using Power Apps, those all play nice together now too.

Rob Collie (38:21): And they didn't before?

Justin Mannhardt (38:22): No. Before OneLake, there was really nothing going on there.

Rob Collie (38:27): Wow. I never peeled back that curtain.

Justin Mannhardt (38:30): But I was just like, "Oh God, thank God." Finally, we can build the reporting and the write-back and the application all on the same set of ingredients, instead of, "Oh crap, we forgot to tell Ed that we added a column to the customer table, and they expect that to show up." That's cool that the Dataverse integration is there. The other thing that's top of mind for me in this low-code space, a really cool capability that hasn't been getting a lot of marketing traction lately for whatever reason, and I think there's a lot to be desired still on this, but I'm hopeful it keeps going, is the data activator stuff.

(39:08): Quick reminder, if you haven't heard that term yet, data activator is a workload in Fabric that's designed to produce data-driven triggers and alerts based on your data. And so, this was a pain in the butt, almost like impossible. The idea here is with data activator, you can say, "Hey, whenever this data refreshes, I want you to look at this and this and this, and if these conditions are met, you got to notify Rob about this.

(39:41): You got to notify Luke about that, or you got to kick off this power automate workflow that does some other process." And to be able to do that at scale without writing a whole bunch of code is pretty neat. It hasn't been talked about a whole lot recently, and I think it should be talked about a little more.

Rob Collie (39:57): I agree. The thing about activator that makes it a less impactful example of this is that it's brand new and it's something we could have imagined them building into Power BI itself. You could imagine something like activator just being a feature of Power BI and not requiring the OneLake thing. The Power Apps example, which I didn't expect, is actually perfect, because I already thought Power Apps was about as integrated as it could get. It turns out, not really.

(40:26): If I understand correctly, Power Apps had access to the source tables, the individual tables making up the model. You could write back to a dimension table or something like that, but it didn't have the same intelligence of the semantic model that you would want.

Justin Mannhardt (40:41): So, let's use that example. There is and has been for a while a Power Apps visual in Power BI. And so, that's the idea when... you might've heard me say like, "Oh, we could put a Power App in your Power BI report." That's what that is. All that Power App visual does, it understands the filter context in the report so that it can use that back in the Power App. So, pre-Fabric, you just brought up an example like oh, I assumed it could write back to a dimension table. Okay, so you got a table that's in your tabular model that got its data from somewhere.

Rob Collie (41:22): And you have to know that somewhere like written down in the code.

Justin Mannhardt (41:26): This maybe isn't a great example, maybe we'll find our way there, of, "Oh, the power of the semantic model as much as the power of the integration." When I build a semantic model, of course I'm dealing with common master data, lists of customers, lists of locations, geography, people, transactions. Power Apps needs that too. Now, instead of them also going off and building another thing that has all that same stuff, you just say, "Hey, we've already got that here." And then, they can read your transactional tables, similarly to the way you could use semantic link and PySpark, though I don't think it's as pretty.

(42:03): You can query data. I think it's more in the vein of we can build applications that are interfacing with our OneLake data that may also being read by our models and reporting, one common foundation, one copy of the data.

Rob Collie (42:17): Well, let me ask a really naive question. We've talked about the written-from-scratch PySpark version of machine learning, AI, whatever. I've heard rumblings for years from Microsoft about low-code, no-code approaches to machine learning that are meant to bring heavy-duty data science, some subset of it anyway, within reach of the citizen data scientist, who's not fully down that rabbit hole. First of all, are there such components in the platform?

Justin Mannhardt (42:52): In Fabric, yeah.

Rob Collie (42:54): But before Fabric, there were Fisher Price versions of machine learning in the broader Azure platform. But my expectation is that those things, if you go back three years, let's say, you could sit down and tell it in some vague terms what you're looking for. And it would give you some results without having to write the PySpark. But those things, those components couldn't talk to a semantic model. Again, just like the PySpark example, it needed a completely separate pipeline. You could only point those citizen data scientist tools at data lakes.

(43:30): You couldn't point them at Power BI semantic models. In the new world, if the similar components like the low-code machine learning type stuff, if Microsoft behind the scenes is teaching them how to talk to semantic models, that in itself is a huge deal.

Justin Mannhardt (43:45): What you were referring to is there had been some low code capability to do some machine learning things in Power BI before Fabric, and that's true, and what that was is you could use components of Azure AutoML.

Rob Collie (43:59): AutoML. That's the thing.

Justin Mannhardt (44:01): Or cognitive services, but specifically in the context of Power BI dataflows. We're in the Power Query world. I could build a table and then I could add a column and say, "I want you to do..." There was a small number, I think three to five, types of algorithms you could pick: I want to do classification, I want to do a sentiment analysis. That capability exists, and we should link this in the show notes. A couple of years ago, we built the predictors for the NCAA tournament brackets, the men's and women's. Those were built with what I'm describing right now.

Rob Collie (44:36): Okay, cool.

Justin Mannhardt (44:37): That's an example of that. I don't know this for sure. I don't have any evidence of this. Now we're moving into Fabric. There's what we call Dataflows legacy, and now this thing called Dataflows Gen2. And so, I'm waiting to see: is the AutoML and data wrangler stuff going to all meet up here?

Rob Collie (44:58): Here's where I can confidently answer them and say it's just a matter of time. Whether it's there already or whether that's going to be true in the near future, it's a certainty in my opinion. This is the whole thing. This is the whole philosophy. This is the whole thing that they're doing here.

Justin Mannhardt (45:14): When you think about the mathematics involved in machine learning, I've got to source the data and decide which features I'm going to use to try and train this model. Then as the model gets trained, I'm trying to optimize the error rate, and these are mathematical concepts. When I was talking about semantic link and why Python is popular because of all these libraries, I'm making this up, but it probably used to be 50 lines of code to write a logistic regression function. Now there's syntactic sugar all over the place for this stuff, and nobody needs to worry about how to optimize parameters. It's all there. The building blocks are all in place. I would put my chips on that card with you.
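
To illustrate that point about library shorthand, here's a hedged, self-contained sketch, not from the episode, of a churn-style logistic regression with scikit-learn. The tiny inline DataFrame stands in for a feature table that would, in the scenario being discussed, come out of the semantic model (for example via semantic link), and every column name is made up.

```python
# A hedged sketch: the "50 lines of math" reduced to a couple of library calls.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in for a frame assembled from measures/columns in the semantic model.
df = pd.DataFrame({
    "OrdersLast90Days": [12, 0, 3, 25, 1, 7, 0, 15],
    "SupportTickets":   [0, 4, 2, 1, 5, 0, 3, 1],
    "TenureMonths":     [36, 2, 10, 48, 4, 20, 3, 30],
    "Churned":          [0, 1, 1, 0, 1, 0, 1, 0],
})

X = df[["OrdersLast90Days", "SupportTickets", "TenureMonths"]]
y = df["Churned"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LogisticRegression(max_iter=1000)  # the algorithm, one call
model.fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))
```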

Rob Collie (45:54): And here's the thing. If the semantic model is good enough for the completely custom, highly, highly technical world of writing your own PySpark-driven models, then it's also perfectly good enough for the more citizen-developer type tools in the same space. This is their promise to the world: the semantic model is going to be available as a first-class citizen, a data source and destination, both in and out, to everything that you would ever care about, everything important in their platform. I'm very confident in this prediction, which might already even be true.

Justin Mannhardt (46:32): There's just so much flowing around information-wise. We just had the conference. Social media is always buzzing and hitting the hype cycle about things. The main point here is it's been very clear that the semantic model is the apex of solution building in this whole ecosystem. The real-time analytics or the Kusto database stuff that's in Fabric too, that's different. It's not really part of the semantic model world necessarily. You can get data in and out, but Kusto's meant more for, like, you're Verizon and you've got cell towers all over the world and bazillions of cell phones, and you're querying this incoming data in real time. It's a different application, powerful, but very different.

Rob Collie (47:18): There are two really important, I think, summing up points that I wanted to talk about. One of them is why is this so confusing? Why do we need to have a podcast episode where we talk about this, where we demystify this? Why is it even necessary? It is necessary, but why? I think that's going to be really helpful to people as well. We haven't gotten too deep into it, but it's been a lot of shoptalk, a lot of technical stuff. So, first, let's talk about why it's so damn confusing.

(47:45): I think it's this: in the same way that, earlier, we said that with the difference between having a semantic model versus a flat rectangle of stored data, the distinction at a technical level or even at a conceptual level is pretty nuanced and fuzzy. You almost have to live it to know the difference, but once you've lived it, the impact is anything but subtle. It's hard to understand ahead of time, but then afterwards it's like, "Oh my God, I totally get it now." I think Microsoft, with their whole Fabric initiative, understands that.

(48:19): On the one hand, they could come out and just describe to us in very plain English terms what they're actually doing. Instead of having a podcast like me and you sitting down talking about what's really changed, they could come out. And that could be their whole marketing push. Just be completely transparent. Here's what we're doing. We think it's a big deal, whatever, but people aren't going to get it at the visceral level of how amazing it's going to be.

(48:48): So, instead, they go with the big, splashy, but vague marketing push: Fabric changes the game. All jokes aside about the word Fabric and how it's been used elsewhere in the data platform world before, the fabric of reality, the fabric of the heavens, the fabric of your business.

Justin Mannhardt (49:08): The fabric of our lives.

Rob Collie (49:09): Exactly. They are making a big, bold, emotional statement here. They're trying to evoke the feelings, the ambitions that you are going to have once you try it out and it starts to dawn on you. They're trying to skip ahead, and that is almost certainly 100% the right choice for them to make in terms of how they market this thing.

Justin Mannhardt (49:36): Oh yeah, the data platform built for the age of AI.

Rob Collie (49:39): But that comes at the expense of clarity. The other approach is saying, okay, look folks, here's what we've done: all the good stuff about Power BI, we've put it into our data lake format. That means that it's accessible to these APIs now, which it wasn't before. That would be a really dull freaking press release. It would be the thing that you would put in the release notes of a Power BI monthly update.

Justin Mannhardt (50:03): I wouldn't read it.

Rob Collie (50:04): But it actually is a ton of work. They were working on it for years before they finally unveiled what they were doing publicly. It was a ton of work, and the impact it's going to have is really going to change the world. To roll it out in the clear way would be a mistake for them as a company. They're like, "Okay, look, we're going to roll it out in the vague, splashy way," and people will figure out the clear version over time.

Justin Mannhardt (50:30): What I also think is proper with this, with the way they're communicating about it and encouraging people to get on the wave, so to speak, is there are lots of entry points to it, and I actually think that's really smart. As much as we might say, "Oh yeah, the semantic models," just the idea of, you're working with another data lake platform, oh, this is the same type of stuff and it's easier. It's just very inviting, and so it's about allowing people to come into a world where it's more likely they'll realize that there's this apex asset that everybody can participate with, either in creating them, leveraging them, or getting more business value out of them.

(51:11): That's probably one way of me saying this is my favorite part about this thing called Fabric, is it's brought all of that together.

Rob Collie (51:19): And that again is the right tactical choice. It makes things more confusing, but it's okay. The confusion is temporary. The benefits are forever. Business value, yeah. I think to a certain extent, you can read between the lines here. First of all, if you are well-invested in Power BI currently, or you're about to be, you're embarking down your Power BI journey as an organization, either way, you're going to receive a lot of benefit from that investment that goes beyond just what the reports give you. Getting the reports, the dashboards out of Power BI is more than enough payoff, dramatically so, for the price of admission. It's worth it.

(52:02): Now, Microsoft is saying, "And guess what? There's more." In particular, the things we've been talking about today, very specifically like all the things that have changed, is there any way that we could viscerally translate those things into business value? It's hard because a lot of the things we talked about are activities that businesses aren't currently engaging in. A lot of people are still not "doing" AI. The business benefit of all these things we're talking about is that maybe you can do AI in the near future without great costs, without great risk.

Justin Mannhardt (52:39): Yeah. I think the business value is, you can approach AI and machine learning with a much lower risk profile than you could have before, without much additional staff or resourcing cost. It's in range, as we like to say. The other business value: the agility and the pace that you can operate with has seismically changed. We were already able to go fast, and it's not so much that we can go faster, but that you can also go in different directions. Before, we'd get to a point and be like, "Oh my gosh, wouldn't it be cool if we could take this data and go do this other thing?"

(53:16): We're like, "Yeah, let's go write an XMLA command in PowerShell to do da-da-da-da-da-da-da." And so, then, you ran into these false brick walls and limitations. I think you've got more agility to go in different directions. It's been almost a year since we did our webinar, when I first came out and we had this idea of the light bulbs. You remember this?

Rob Collie (53:37): Yes.

Justin Mannhardt (53:37): That idea, again, you can entertain a wider range of good ideas at higher tempo with lower risk.

Rob Collie (53:45): Yeah, let me amplify that a little bit. Talking to a broad audience, everyone listening to this, everything about their business is different than everyone else's. Oftentimes, most of the time talking about business value impact, you need to talk about that in the context of that particular business. It's very hard to come up with examples that resonate with everyone that are understandable by everyone and also relevant to everyone. That's the age-old challenge in this space. If you just think of the term custom applications, and it doesn't have to even be AI, anything you could imagine that could be realized in software that would benefit from your semantic model. Again, it could be read, write.

(54:21): It doesn't have to be just reading from your semantic model. It can be adding to, it can be making changes. It can be creating brand new tables in it with completely different data sources. Imagine an iOS app on someone's phone that is commercially available, that is powered by a data source that your company has unique access to, or data sources and a semantic model that you've built about an industry or something like that. Fabric makes that thing possible.

(54:50): So, custom application is super, super, super vague. It needs to be. Whatever you can imagine that would benefit from access to a semantic model is now super, super, super practical, where before it would've been very, very awkward.

Justin Mannhardt (55:05): Awkward is the right word.

Rob Collie (55:06): You would've essentially been hacking the platform to make it do what you wanted, as opposed to using it for its intended purpose. The universe of possibility is just completely wide open and unrestrained, and who knows what's going to come out of that. There's at least one person listening right now who's hearing that custom application phrase and going, "Oh, right, we could do that thing." And so, if you're listening right now and you're not having that moment, just know that someone else is, and that means that your version of that moment might be coming.

(55:35): That realization might be coming for you, like, "Oh, we could go do X, Y, Z that has nothing to do with dashboards, has nothing to do with AI even." It's that other thing, and that's now in play. So, building off of that custom apps thing, there's the Power App angle that you brought up, the fact that Power Apps are going to become less awkward to write, easier to integrate more closely. And the fact that business intelligence itself, the art of being informed, is just a means to an end, and it's all about action.

(56:05): And then, even action, it's an intermediate state, an intermediate step to get to improvement. The I in BI should have always stood for improvement. The ability to make your BI estate more read-write, more action-oriented.

Justin Mannhardt (56:20): Huge.

Rob Collie (56:21): The simple examples we talked about: you develop an action plan, you're looking at a dashboard and you say, "Okay, we need to fix that, that needs to be improved," or "that is going so well, we need to replicate that success elsewhere." So, you make an action plan, you assign an owner, you write down what you're going to do about it, and you attach that to the right place in the report, the right bar in the right chart, so that next time you hover over it or whatever, you get that context. Anything that makes even something like that, which is just an annotation, easier, brings it closer. That thing is moving like 50% closer.

(56:56): A lot of the AI scenarios are moving like 95% to 98% closer, it sounds like, but this one was already close, and it's getting even closer. You're only going to be limited by your imagination. Now, imagination is a frilly, non-businessy, non-solid thing. However, imagination is only frilly until it produces a good, solid idea. Imagination's all blue sky and pixie dust and fairytales until the inspiration hits. And when the inspiration hits and you can actually execute on it, that's a big deal.

(57:33): Imagination is like a business requirement, but if it always stays in rainbows and pixie dust land and never gets real, never produces real ideas, then that's where it gets its reputation as being a frilly thing. But wow, it so often doesn't end that way. It so often does produce inspirations. And with the platform now being built in a way that's never going to limit those inspirations, everyone's journey with this is going to be different.

(57:57): The specifics are going to be different, but there's going to be some amazing things to come out of it, and we're going to get better at talking about these as a class because we're going to have more examples.

Justin Mannhardt (58:07): We're going to build some stuff. We're going to find out what's awesome.

Rob Collie (58:10): All right, well, let's go back to discovering those things. Being awesome. Sounds like a good way to spend the rest of the week.

Justin Mannhardt (58:15): Good chat with you, Rob. I'm going to go back to being awesome.

Rob Collie (58:18): Go back to being awesome. Yeah.

Speaker 3 (58:20): Thanks for listening to the Raw Data by P3 Adaptive podcast. Let the experts at P3 Adaptive help your business. Just go to p3adaptive.com. Have a data day.
