episode 154
Finding The Sweet Spot: When Is Your Power BI Model Ready for AI and ML?
Step into the future of data analytics with Rob Collie and Justin Mannhardt in this week’s episode. Together, they unravel the critical intersection of Power BI modeling with the advanced capabilities of Artificial Intelligence (AI) and Machine Learning (ML). Explore what it takes to elevate your data from robust to revolutionary.
Rob and Justin zero in on the crucial factors that signal your Power BI model’s readiness for AI and ML enhancements. They cover everything from the importance of data quality to the need for scalable systems, offering a detailed roadmap for those looking to make the leap. With their combined expertise, they provide real-life examples and practical tips to help you assess and enhance your analytics frameworks.
Plus, the episode dives into the broader implications of AI and ML in the analytics space, revealing how these technologies are reshaping our approach to data and decision-making. Join Rob and Justin for a compelling discussion that not only demystifies advanced analytics but also shows how accessible and impactful they can be.
And, as always, if you enjoy the episode, leave us a review on your favorite podcast platform to help new listeners find our show!
Episode Transcript
Rob Collie (00:00): Hello, friends. Whether you're a business leader or a data practitioner, every day you're exposed to messaging that says something along the lines of, "You're falling behind." Now this is particularly acute when it comes to AI, of course. Sometimes the messaging is explicit, like when some LinkedIn influencer says something like, "If you're not already using AI, your competitors are already pulling away." Other times, it's a bit more subtle. Like another influencer, again, usually on LinkedIn, is rattling off all these amazing things that they're already doing with AI. That flavor is really just as bad, because you immediately connect the dots with your own experience and you say, "Well, I'm not doing those things." Even worse, you often also will say, "I don't think I can do those things."
(00:49): In both cases though, I want to remind you that messaging like that is designed to make you feel inadequate. That's what they want. They want to create an insecurity in you, that then they can exploit later for money. As a bonus for them, on the way to exploiting you for money, that kind of messaging also immediately, right from the get-go, sets up a power dynamic where they hold all the cards. More often than not, if you took the time to dig super deep with a critical eye into the things they were talking about, you'd discover that they are, in large part, bluffing. Their chief skill is the influencer game, rather than the actual subject matter in which they profess to be the experts.
(01:36): This is the thing about confident practitioners. Confident practitioners can communicate with you in ways, whether they're playing the social media game, or talking to you face-to-face, or over email, whatever. They can talk to you in ways that lift you up, rather than making you feel bad about yourself. Try to remember that next time you encounter that deliberate toxicity. It's done for a reason, and the expertise hiding behind the curtain isn't necessarily what it's portrayed to be. You're doing fine.
(02:08): Okay, setting that reminder aside for a moment, you do get plenty of messaging like that, every day, warning you not to be too late to the AI party. In today's episode though, Justin turns the tables on all of that, and explains to me that there is such a thing as jumping off from Power BI and into AI too soon.
(02:29): Did you ever see the movie Inside Out? It's a fantastic movie, by the way. If you haven't seen it, you really should. Anyway, there's this concept in the movie where these marbles representing memories, sometimes they get sent downstairs in the brain to permanent memory storage. Whereas other marbles are just representing fleeting, short-term storage. That turns out to be a pretty good mental model for how I learn, and I have a very stingy filter. The referee in my brain that decides when the marble gets sent downstairs to the long-term storage, it has a very high bar. Very few marbles get sent downstairs for permanent storage. But the ones that do for me are significant, because they represent these succinct moments of clarity that I know will be applicable moving forward.
(03:17): I build my long-term picture of tech perhaps a little bit slower than some people, but I like to think that I more than make up for that in clarity. Well, today during this conversation, I sent another marble downstairs to long-term storage. I only do this maybe every couple of weeks. This conversation changed me. In a modest way perhaps, but I actually suspect it was significant. And like any good lesson worthy of long-term storage, this one comes with clear examples and guidelines on how to practice it. Let's go make that marble.
Speaker 2 (03:51): Ladies and gentlemen, may I have your attention please?
Speaker 3 (03:55): This is the Raw Data by P3 Adaptive Podcast. With your host, Rob Collie, and your cohost, Justin Mannhardt. Find out what the experts at P3 Adaptive can do for your business. Just go to p3adaptive.com. Raw Data by P3 Adaptive. Down-to-Earth conversations about data, tech, and biz impact.
Rob Collie (04:25): Hey, it's been a little while, Justin, since we've done one of these. We've had a series of episodes with guests.
Justin Mannhardt (04:30): That's right.
Rob Collie (04:31): We've had solo podcasts with me. But it's been a little while since it's been just you and I. I think maybe the last one of these that we did, we kicked it off with a little non-data banter.
Justin Mannhardt (04:41): We need a tagline.
Rob Collie (04:43): A sound effect.
Justin Mannhardt (04:43): Banter Time.
Rob Collie (04:46): Or we could get MC Hammer to do a cameo for us. Stop.
Justin Mannhardt (04:50): There you go.
Rob Collie (04:50): Banter Time. Put it on the backlog, yeah. Put it on the backlog.
(04:57): As we were thinking about what we were going to do today, you roughly had an idea.
Justin Mannhardt (05:00): Yeah. I had a chance to listen to the episode that comes right before this one, where you talked about how you're using Power BI and Excel Online to track information related to Jocelyn's health, and how that's helping y'all have a better quality of life, and track, and understand. And even working with medical professionals to get better results with this thing.
Rob Collie (05:20): Yeah.
Justin Mannhardt (05:21): Just wow, really interesting. Also, really interesting parallels to the business world. The clarity of where things are really moving, and where you really need to focus, that was so relatable to me.
(05:33): You said a couple things in there. You mentioned something along the lines of, "Now I think I maybe need to be doing machine learning." It was curious to me. We had a little conversation about that because I think that's a really interesting topic, because we've been talking about how Fabric presents a similar opportunity for business leaders, maybe that have invested in Power BI.
Rob Collie (05:53): Yeah.
Justin Mannhardt (05:54): You and I have used this idea of AI and machine learning coming in range. What's led your mind to some of those thoughts? And then, maybe I might have some follow-on questions.
Rob Collie (06:05): The one thing I ever learned from my computer science education in college that turned out to be useful, and it's only become useful lately, came from one of my professors who nominally taught an AI course. The AI course was not an AI course. It devolved into us writing, I swear to God, we were writing code in Lisp. We were writing code to help him automate his job as a four-year advisor for undergraduates. He was tired of helping undergraduates build their own program for the four years that they were there, like what classes they would take to satisfy their major. He was harvesting the best work from his students, and running with it so that he could automate his job of ... Who wants to touch these humans?
Justin Mannhardt (06:50): He's one of those take the human out of the process people.
Rob Collie (06:53): Ironically, he was the most human, or at least in the top three most human professors I had in college. He was awesome. But he was also awesome at just telling it straight: "Look, this job's a drag. I've got to really get a lot of this automated so I can enjoy my time here."
Justin Mannhardt (07:04): Right.
Rob Collie (07:05): He just really leveled with you. But he was involved with this project where these printing presses at this local print shop ... Hey, you come from a printing background.
Justin Mannhardt (07:15): I did, yes.
Rob Collie (07:17): They make these giant spools engraved with the inverse of what they want to print.
Justin Mannhardt (07:23): Right.
Rob Collie (07:24): These things have to be almost forged and carved out of metal. There's a huge upfront investment in making these. Then they need to be able to use them for a long time to print a jillion of this thing over, and over, and over again.
Justin Mannhardt (07:36): Right.
Rob Collie (07:37): Well, what happens is that every now and then, it would get what they call a band where they would start getting a vertical line that would run through every copy that they printed with one of these rollers. Once it started doing that, it wouldn't stop. That roller had to be thrown away, recycled, retooled. It was very expensive, it slowed the whole process down, raised their cost basis tremendously. They didn't know what was causing it.
(08:02): In the early '90s, this professor helped this print operation not diagnose, in the end they still didn't have an explanation for why it was happening. But they ended up collecting 30-something variables.
Justin Mannhardt (08:16): Okay.
Rob Collie (08:17): Including temperature, RPM, humidity, whatever. Just collected all the variables they could. Then also, whether or not, and how long it took before, they got a band. Did this machine learning. Again, they didn't call it machine learning back then. Even though this guy was an AI professor, it was called data mining. It was this decision tree decomposition behind the scenes, done with the machine learning, that ultimately arrived at a seven-variable prescription for avoiding bands. If you keep the temperature between this and this, and you run the RPMs between this and this, blah, blah, blah, viscosity of this.
(08:56): Again, no explanation. They hadn't found the smoking gun, and it wasn't one variable. If it had been one or two variables, they probably would have been able to spot it. But if they kept those seven variables in range, they didn't get bands. They ran three shifts there. Two of the three shift leaders bought into the prescription.
Justin Mannhardt (09:15): Sure.
Rob Collie (09:16): The third shift said, "No. I refuse to follow this newfangled, BS advice." The two shifts that used it had spectacular results. But then that third shift kept ruining the rollers. Messing it up for everyone. They switched to all shifts doing it, and basically they never had a problem again. They still didn't understand it. Okay.
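As an aside for readers: the decision-tree "data mining" Rob describes maps closely to what scikit-learn's decision trees do today. Here's a minimal, hypothetical sketch, with invented column names standing in for the print shop's 30-something variables, since the real data and thresholds are unknown.

```python
# Hypothetical reconstruction of the print-shop exercise; names are invented.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# One row per press run: process variables plus whether a band appeared.
df = pd.read_csv("press_runs.csv")  # hypothetical log of historical runs

features = ["temperature", "rpm", "humidity", "ink_viscosity"]  # ...and so on
X, y = df[features], df["got_band"]  # got_band: 1 if the run produced a band

# A shallow tree keeps the rules human-readable, like the seven-variable
# prescription the shop ended up with.
tree = DecisionTreeClassifier(max_depth=4).fit(X, y)
print(export_text(tree, feature_names=features))
```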
(09:40): Really long explanation.
Justin Mannhardt (09:42): I loved that detour.
Rob Collie (09:43): You came from a printing world. You came from a number of worlds, it turns out.
Justin Mannhardt (09:47): That's a different podcast.
Rob Collie (09:48): It is a different podcast, yeah.
(09:50): I've got amazing charts, tracking all of these things, these variables and outputs, symptoms, with regard to Jocelyn's health. You haven't lived, folks, until you've gone into a doctor's office, and spun your laptop around and said, "Feast your eyes on this." There's a moment there where you find out what kind of doctor you've got.
Justin Mannhardt (10:12): Right, exactly.
Rob Collie (10:14): If they're dismissive and they're like, "Well, that's nice and all, but put it away," you know they're not really that good of a thinker. They're too caught up in their own reputation. Too much hubris, can't come down from the mountain, and actually interact with something like data that the patient produced.
Justin Mannhardt (10:29): Right.
Rob Collie (10:30): But the good kind go, "Ooh," they lean in.
Justin Mannhardt (10:33): Yeah. I have some family that's in the medical field, my wife included. This is a scientific practice, and they're also just trying to figure things out so much. These people, yes, they're incredibly brilliant and smart. But it's like, "There's so much I don't know."
Rob Collie (10:48): Yeah.
Justin Mannhardt (10:49): You think of the most daunting diseases of our time, we're still trying to figure it out. I think the curiosity you experienced with your provider, it's so great.
Rob Collie (10:58): Growing up, you would expect all doctors, and all scientists, to be that way. The disappointing reality we live in is that a very sizable fraction of them are not.
Justin Mannhardt (11:10): Yes.
Rob Collie (11:11): This actually reminds me of another thing, which is a friend of mine, his older sister used to say ... She had this thing called the Three Stooges test.
Justin Mannhardt (11:17): Okay.
Rob Collie (11:18): When she was dating a new guy, she would put the Three Stooges on. If he liked it, she got rid of him. Maybe, when you're auditioning new medical providers, you just have some reason to show them a chart.
Justin Mannhardt (11:35): Right.
Rob Collie (11:35): Then right there, you know. It's like the acid test.
Justin Mannhardt (11:38): Yeah.
Rob Collie (11:38): If they're dismissive, if they're put off by it, if they immediately try to reestablish their power dynamic, their control over the situation, you just get up and leave. If they lean forward and they're like, "Ooh," you got a good one.
(11:52): I've definitely reached certain conclusions. The things that you would normally reach from charts.
Justin Mannhardt (11:59): Right.
Rob Collie (12:00): Usually, we can find out that treatment X, it can even be an environment variable that we change. We go to bed earlier, something like that. There's so many things that we can change up, and they're not always medication, they're not always supplements.
Justin Mannhardt (12:12): Right.
Rob Collie (12:13): Normally, you can change it up, and watch what happens. You can just generally tell, the chart changes shape at that moment.
(12:24): That's usually as far as we go in business. Of course, in business we would have a lot more subdivisions of it. We'd have the equivalent of a lot of patients. Here, we only have one. This is not big data, and yet the Power BI model is important. It's doing things that we can't do otherwise.
Justin Mannhardt (12:41): Right.
Rob Collie (12:41): But what if, Justin, there's a seven-variable "if you just keep it in this range, everything's going to be fine"? There's no way that my brain is ever going to spot that in the noise.
Justin Mannhardt (12:56): That's right.
Rob Collie (12:57): I'll give you another example. Sometimes when we change a variable, we don't really know. Sometimes it's very subtle. There's other things changing, and even just change over time. Sometimes I find myself just saying, "Hey, I'll just sort by how bad the symptoms are," a particular symptom column in a table. A table visual. Then just scan the inputs and see if they are somehow lined up. If you sort by one, and then it implicitly sorts by another-
Justin Mannhardt (13:24): Yeah.
Rob Collie (13:24): That tends to jump out. But you have to do that, a chart won't do that for you. It's so hard to spot something like that over time, a trend. I'm actually using the table visual as much as I am using charts. It just shows you that the human brain isn't necessarily built to spot these patterns. There might be a seven-variable Holy Grail that we will just never know.
Justin Mannhardt (13:44): Are you saying you need machine learning instead of what you're doing? Or you've reached a point where now, machine learning makes sense?
Rob Collie (13:54): I've reached the point where now, machine learning makes sense.
Justin Mannhardt (13:56): All the work you've done on this model, and all the work people do building models and reporting solutions with Power BI, this combination of people, typically in analyst roles or sometimes in functional roles in companies, we refer to them as datageners. The combination of the datagener with tools like Power BI, with that data model engine, is a superpower for getting to a state where you understand something well enough to know, "Okay, now machine learning might be the next step."
(14:30): We've talked about on the podcast how the most difficult part of machine learning isn't, well, what type of model should I use? Is this a regression problem or a classification problem? That's the easy part. The hard part is wrangling the information into a state where the output of that exercise is actually useful.
(14:47): I just want to highlight for a moment. You've gone through this process with yourself and Jocelyn, and people in companies, business leaders and analysts, have gone through a similar process. Through iteration, week over week, over week, and tweaking, and fine-tuning, and understanding it better, and better, and better. If we jumped back to day zero of this and we said, "Oh, well, this is a machine learning problem," I would guarantee you, we'd be light years behind where you are right now.
Rob Collie (15:13): Yes. By the way, learning one new thing at a time doesn't just apply to data practitioners. It also applies to business leaders who aren't directly executing these things.
(15:28): If you think about it, if you're out there, dear business leader, and you're listening to this podcast, and you've been some length of the distance down the Power BI road, you remember what it was like when you first set off down that road. Is it going to be valuable? You're thinking, "Probably," which is why you started, but you don't know for sure.
Justin Mannhardt (15:46): Yeah.
Rob Collie (15:46): How overwhelmed am I going to be, et cetera, et cetera? What is it going to do for my business? How much? Don't really know.
(15:52): Especially if you're doing it right. For example, one version of doing it right is working with a company like ours. We've fashioned our company to work in the right way. Novel concept. We're talking our book because we built our book around the way it should be. But if you're a business leader involved in any sort of data project, you need to be a close stakeholder of that project. If the implementation process, like the faucets-first approach, is being done the right way, that allows you to remain engaged. It allows you and your domain expertise to constantly inform and improve the quality, twisting and turning, heading in the right direction. Okay.
(16:33): You, business leader, you learn how to mesh with that process. Even if you're not the one building the models, right?
Justin Mannhardt (16:41): Right.
Rob Collie (16:42): I think the same thing is true, if you stay at that level of stakeholder, domain expert, business leader, the same thing is going to be true when you step off into something like machine learning. Having a strong base, and you're only really adding one new thing to the workflow.
(17:02): It's really easy for me to talk about this from the perspective of someone who might someday be cajoled into going and writing some Python, which we will wait and see if that ever happens. I know what you're rooting for.
Justin Mannhardt (17:14): Yeah. This might involve a couple airplanes, and someone named Chris Haas. But, yeah.
Rob Collie (17:19): It might, indeed. But I just don't want people listening to think about this as, "Oh, I'm never going to go write Python."
(17:25): The stage being set in this way, the likelihood that you have business problems that would benefit from this, and the position that you're in now if you've got Power BI models that set you up for it, whether you're a business leader or a data practitioner, you're really, really at the right stepping-off point right now.
Justin Mannhardt (17:43): One of the OG 10 things, fostering a-
Rob Collie (17:47): Virtuous cycle of better questions.
Justin Mannhardt (17:48): Virtuous cycle of better questions. This is actually a really important insight, I think. With all the hype on AI, and business leaders who are like, "What are we doing with AI? What are we doing with advancing our analytical capability, et cetera?" You need to be tightly in tune with: are we sufficiently answering the question we have first? Because it's very easy to get carried away with, "Oh, we'll know this, and then we'll know this, and then we'll know this, and then we'll know this." But you've gone through this process of, "Well, I want to understand this," and then that led you to the next question, and the next question, and the next question.
(18:25): If you're looking at a business problem right now, if you're being asked or your brain is jumping to, "Oh, this obviously seems like a machine learning problem," I would encourage you to challenge how well you understand it at this point.
Rob Collie (18:40): Yes.
Justin Mannhardt (18:41): Before you go off, trying to blaze that trail, hiring a data scientist or a consulting firm.
Rob Collie (18:46): I agree. You're going to have to just take my word for it, dear listener, that I have put this data model through its paces.
Justin Mannhardt (18:54): Yeah, through the wringer.
Rob Collie (18:56): I have really twisted it, turned it. I mentioned in the podcast that I modify the model about once a week. Now, why am I modifying the model? Because I'm trying to answer a new question that it's led me to, and I need to twist and turn the report shape just a certain way. That's what's driving me. I'm not improving the model for its own sake. I'm improving the model because I want a different visual result. I want to be able to line some things up that weren't lined up before.
Justin Mannhardt (19:21): Every project, you reach a point where you realize, "This model's not set up right." Because we've encountered a new layer of question. We need a different fact table, or a different type of granularity, or different types of relationships, whatever it is.
Rob Collie (19:34): And that's the good stuff.
Justin Mannhardt (19:35): It's all good. It's not waste. It's not like, "Oh, we took a left turn at Albuquerque." It's the right way.
Rob Collie (19:42): Yeah. When you get to that point where you're like, "You know what? This model that we've built that has helped us so much so far, it's already win, after win, after win, it's still not perfect. It could be even better." Oh my gosh, when you're in that mode, it's the clear indicator that you are in the pipe, you are in the zone.
(20:00): Again, this doesn't take much work. When I modify this data model once a week, or whatever, I'm not spending hours modifying it. It's a new column, or a different timestamp, or a new inactive relationship that allows me to play with time-shifting a little bit differently, or whatever. It's incremental, it's fast, and then I've got my result.
Justin Mannhardt (20:20): And you're feeling that incremental progress every single week. It's not like going back to the drawing board every single time.
Rob Collie (20:28): If I understand you correctly, your advice here is to reach a certain level of maturity with what you've gotten out of the data model before you get into ML.
Justin Mannhardt (20:36): Right.
Rob Collie (20:37): A, you're missing a lot of the low-hanging fruit that you could just get without ML. And B, if you end up distracted down the machine learning path, you're going to treat the data model, the semantic model, as if it's a finished artifact. If it never gains the richness that would help you discover something amazing with ML, it's going to be harder conceptually to go backwards to the data model to evolve it. Some of the people who are listening are in that spot.
Justin Mannhardt (21:01): Totally. There's a collection of memes around this. If you want to piss off your data scientist, hand them a pile of raw data and say, "Make magic with this." They need that maturity of the data, the cleanliness of the data, the business context to make sense of it, all of these things that you develop when you're iterating through models. I think it's a better way to go about it, and you're much more likely to have success with it.
Rob Collie (21:24): First of all, I needed Power BI to run some of these calculations. One of the most important variables is what is the blood level of a particular medication.
Justin Mannhardt (21:33): Sure.
Rob Collie (21:34): Not how much have you taken, because that ignores the fact that with a lot of these medications, her blood level is going to be double what her most recent dose is. But especially if you're changing dose over time, to try to titrate how you feel, oh my gosh. It is really crazy. Sometimes just to stay at the same level, you need to cut the dose in half. That calculation is something I really could only get out of Power BI efficiently. It's just one example.
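A note for readers on the blood-level calculation Rob mentions: his actual DAX isn't shown in the episode, but the general idea, each dose's contribution decaying over time, can be sketched in a few lines. This is a toy model assuming simple exponential decay and a made-up half-life, not Rob's formula and certainly not medical guidance.

```python
import pandas as pd

# Hypothetical daily dose log (mg); dates and doses are invented.
doses = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=10, freq="D"),
    "dose_mg": [50, 50, 50, 100, 100, 100, 100, 50, 50, 50],
})

HALF_LIFE_DAYS = 3.0                    # assumed; varies by medication
decay = 0.5 ** (1.0 / HALF_LIFE_DAYS)   # fraction remaining after one day

# Each day: yesterday's estimated level decays, today's dose adds on.
level, levels = 0.0, []
for dose in doses["dose_mg"]:
    level = level * decay + dose
    levels.append(level)
doses["est_level_mg"] = levels

print(doses)
# Note how the steady-state level sits well above any single day's dose,
# which is the "blood level is double the most recent dose" effect.
```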
(21:59): But also, just splicing together all of the different convenient data recording formats. The places that we have to input data, there isn't necessarily one place to go. If you force it into one spreadsheet for instance, or one app, it becomes very unwieldy.
(22:16): There's that splice together, and there's calculation. If you view the data model as data prep for ML, it is doing a kind of data prep.
Justin Mannhardt (22:24): Absolutely.
Rob Collie (22:25): It is a much better input. It brings the problem to its actual semantic level, as opposed to the original raw data, which isn't.
(22:34): But the other thing, and I think you're also hinting at this, is that once I have that Power BI model, there's a tremendous amount of low-hanging fruit that can just be plucked by using the existing model, and looking at charts, and asking different questions, and rearranging, filtering, drilling down, all that kind of stuff. I've run up to the limit of what that gives me. I have an intuition that I'll never build just the right chart to discover something. Even if I did, I wouldn't necessarily notice it.
Justin Mannhardt (23:06): In my own experience anyway, I think about the tools I've worked with in my career, things I've gotten good at and subsequently forgotten how to do. I'm pretty good at writing SQL. I was really good at Power Query for a period of time there. I've learned my way around some Python. But if I thought about how to do some of the things I've done with DAX in those other languages, I couldn't. Could not.
Rob Collie (23:31): Yeah.
Justin Mannhardt (23:32): Even if I could, it'd be, "Aha! See? I told you so." You'd be like, "But, why?"
Rob Collie (23:37): Power Query has been huge just in assembling this model as well. Because what makes it a convenient input format when you're recording data is almost the opposite of what makes a good analysis format.
Justin Mannhardt (23:51): That's the classic dilemma between OLTP and OLAP. The way things go into systems is not ... It's the same thing, the same problem.
Rob Collie (24:00): Yeah. The closest Rob gets to SQL is typing data into an Excel spreadsheet, in that case. That's the line-of-business app.
Justin Mannhardt (24:07): Right.
Rob Collie (24:08): The line-of-business app is hopefully optimized for the business user who's using it, filling in the data.
(24:15): Actually, I want to go back to a question I asked you last week. As I finished wrapping up recording that podcast, these thoughts were bouncing around in my head. I asked you, "Hey, that whole semantic link thing we've been talking about in Fabric," that component that allows Python to interact with a published semantic model, a Power BI model. My evolving picture of Fabric keeps coming into clearer and clearer focus. Oh, it turns out Direct Lake mode doesn't use the Power BI VertiPaq format. But at the same time, from a conceptual, understand-what's-going-on perspective, what the intent of Fabric is and everything, that was still not a bad way to understand it.
Justin Mannhardt (24:56): Sure.
Rob Collie (24:57): It's just that Direct Lake isn't going to perform as well as import mode.
Justin Mannhardt (25:00): Import when you can import, yeah.
Rob Collie (25:02): Then after having that revelation, that oh, Direct Lake isn't the same, oh my God! Wait a second. Does semantic link work with import mode? Is my Power BI import mode model available to machine learning just as easily as the Direct Lake one? That's when another light bulb went on for me. Of course, it's got to be, right?
Justin Mannhardt (25:22): Right.
Rob Collie (25:22): They wouldn't penalize you for using import mode.
Justin Mannhardt (25:26): No. Think of semantic link, what it's really doing is the equivalent of a matrix visual on the canvas. That matrix visual is issuing a query to the model.
Rob Collie (25:37): Yeah.
Justin Mannhardt (25:37): Saying, "I need these results back." The matrix doesn't care if your model's import, DirectQuery to SQL, or Direct Lake to OneLake. The model has different limitations across those storage modes, but you can still build a matrix and you can still use semantic link to query your model.
(25:56): The whole idea with semantic link is that convenience option to say, "Hey, I want to retrieve data from this model into a data frame that I can use in whatever I'm doing in Python here."
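For the curious, this is roughly what semantic link looks like in practice. A minimal sketch, assuming a Microsoft Fabric notebook with the SemPy (`semantic-link`) package available; the model, table, and measure names here are invented.

```python
import sempy.fabric as fabric

dataset = "Health Tracking"  # hypothetical published semantic model

# Pull a whole model table into a DataFrame...
df = fabric.read_table(dataset, "Daily Log")

# ...or issue a DAX query, just like a matrix visual would.
dax = """
EVALUATE
SUMMARIZECOLUMNS('Date'[Date], "Avg Symptom Score", [Avg Symptom Score])
"""
result = fabric.evaluate_dax(dataset, dax)
print(result.head())
```

Either way, the query path is the same regardless of the model's storage mode, which is exactly the matrix-visual analogy Justin draws above.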
Rob Collie (26:07): In a previous episode as well, I likened semantic link to the field list.
Justin Mannhardt (26:11): Yeah.
Rob Collie (26:12): The field list for building reports, for building visuals in the Power BI canvas, you don't need to understand DAX, you don't need to understand Power Query. The person who's using the field list is often the same person who built the thing, but isn't always. You don't need knowledge of it necessarily.
Justin Mannhardt (26:29): Yeah.
Rob Collie (26:29): The same goes for Python now, via semantic link. It's essentially a field list equivalent that allows Python to issue queries to the Power BI model in exactly the way that the matrix visual, or the table visual, or the line chart visual does.
(26:44): I'm now understanding ... I've actually discovered a potential place where Fabric is useful in my personal life.
Justin Mannhardt (26:54): Whoa.
Rob Collie (26:54): That's pretty nuts.
Justin Mannhardt (26:55): This could quickly spin off into a series of Rob learns Python every day until he writes his machine learning model. This is how I'm going to get you.
Rob Collie (27:07): One day, I'll be having a colonoscopy and I'll wake up. I'll go, "Oh, I got it! Quick, write this down before you knock me back out."
Justin Mannhardt (27:17): Oh, here he comes, folks.
(27:18): It's not only in range for you in your personal life, it's in range for companies, and it's also in range for datageners.
Rob Collie (27:28): Yes.
Justin Mannhardt (27:28): Sometimes if you haven't gone that direction, Python, PySpark, machine learning, that can feel very intimidating. It's also a very learnable skill.
Rob Collie (27:39): First of all, this is my personal life. The amount of data that is collected in my personal life is so small compared to the amount of data that's collected in a business.
Justin Mannhardt (27:50): Right.
Rob Collie (27:51): The number of opportunities in one's personal life to use data to improve one's life is minuscule by comparison.
Justin Mannhardt (27:59): Yeah.
Rob Collie (28:00): It is one of the least likely places to discover some relevance of Fabric and semantic link. It's the last place you would expect to discover this. If I'm discovering this here, in that spot, that means that almost everything above that line, in terms of real business scenarios, is so much more likely than the one that I found to have value. You would expect this to be the last thing I discovered.
(28:26): The second thing that I wanted to amplify is a lot of people, myself included, when you have one thing to learn, one new thing, you're infinitely more likely to give it a shot, than when you have two or three things to learn that you need to learn simultaneously in order to do something new.
Justin Mannhardt (28:46): Right.
Rob Collie (28:47): I think I'm at one of those precious opportunities where I don't have to go invent some use case, I don't have to go download some freaky dataset that Microsoft gave me that I don't care about in order to do some machine learning on it. I just don't care. I'm sitting here with a problem that is already in a format, import mode, hence my question. Wait a second, import mode has got to be supported, right? I'm sitting here, I've got the import mode model, semantic link is sitting there. Now the one thing I need is Python machine learning.
Justin Mannhardt (29:24): Yeah.
Rob Collie (29:25): That's still pretty daunting, especially for me. I don't like learning new things. I've got a higher distaste for learning new things than most people in the tech field. I'm semi-famous for this. The fact that I liked Power BI, in fact I loved it, was mind-blowing. It was because it was that good, even Rob liked it.
Justin Mannhardt (29:41): Yeah. Code is one of those areas where it does shine a little brighter than other areas. There's some things, not necessarily directly integrated with Fabric all the time. But, "I want to do this type of model. What does that look like?" The code libraries have advanced tremendously in the last few years. If you want to run something like a regression, instead of actually writing out the mathematical computation of that, there's a function that makes it easy.
Rob Collie (30:12): It sounds good.
Justin Mannhardt (30:13): You got to understand what you're doing, but I think we might be able to get you there. We'll see.
Rob Collie (30:17): I'll still need some handholding and some comforting.
Justin Mannhardt (30:20): It's okay, Rob. It's curly braces.
Rob Collie (30:22): Where's my DAX editor?
Justin Mannhardt (30:23): We don't use curly braces in DAX, man. What is happening?
Rob Collie (30:27): Next thing you know, you're going to tell me it's case-sensitive.
Justin Mannhardt (30:32): Yeah, right. Oh, gosh. Just to keep it simple, let's assume there are really two types of machine learning problems in the world.
Rob Collie (30:39): All right, I like it.
Justin Mannhardt (30:40): There are what we would call regression problems.
Rob Collie (30:44): Okay.
Justin Mannhardt (30:45): And/or classification problems. Just to differentiate, in a regression problem, we're trying to predict essentially a probability. What's the likelihood of an output, given a set of inputs? In a classification problem, we want to say something like cancer, not cancer. Good review, bad review. Size one, size two, size three. Classification is putting something into neatly defined groups and categories. Regression is predicting likelihoods of outcomes.
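To make the two families concrete for readers, here's a minimal sketch on synthetic data. In scikit-learn terms, a regressor predicts a continuous value, while a classifier predicts a label (and can report a probability for it); everything below is illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # three input features, synthetic

# Regression: predict a continuous value (e.g., a symptom score).
y_value = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)
reg = LinearRegression().fit(X, y_value)
print("predicted value:", reg.predict(X[:1]))

# Classification: predict a discrete label (e.g., band / no band).
y_label = (y_value > 0).astype(int)
clf = LogisticRegression().fit(X, y_label)
print("predicted class:", clf.predict(X[:1]))
print("class probabilities:", clf.predict_proba(X[:1]))
```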
Rob Collie (31:15): Okay. I think classification is easier for people to understand.
Justin Mannhardt (31:17): Yeah.
Rob Collie (31:17): My experience with regression is that it's usually used to test whether a variable has influenced something else. The confidence in the result, like in a scientific study. Meaning, "We gave this group the real trial medication, we gave this other group the placebo. Here's the incidence of some disease that was observed in the control group versus the test group." They run a regression on that to see how likely it is that the drug made the difference. Or how likely it was that the drug caused a side effect. It could be a negative result, too.
Justin Mannhardt (31:55): Right.
Rob Collie (31:55): I understand that that is a prediction of a probability. It's like saying, "If I give someone this medication, what's the chance it's going to help them?"
Justin Mannhardt (32:03): This comes from deeplearning.ai, a great educational resource. In some of their introductory materials on regression, they'll use a really simple example of predicting the price of a house. We're trying to predict the price of a house, based on things like the square footage of the house, the neighborhood it's in. If you give me all these inputs, I can predict the price of a house with some level of confidence. That's the probability piece. "I predict the house is going to be a half-a-million dollars."
Rob Collie (32:32): Plus or minus, yeah.
Justin Mannhardt (32:34): Right.
Rob Collie (32:34): Yeah.
Justin Mannhardt (32:35): In regression, what we're trying to do is control the error rate. We're trying to get that confidence and probability as high as we want.
(32:42): You are probably looking at something more along the lines of multiple-variable regression, where you're trying to understand, given these inputs of the things you're tracking, what's the likelihood of certain outputs. What you could do in this scenario: you've accumulated a lot of data, you know what's actually happened.
Rob Collie (33:05): Yes.
Justin Mannhardt (33:06): You have history. You can use that history to train a model, "Hey, given these inputs," this is maybe just helping with some of the nouns and verbs that you would hear from a data scientist. You'd refer to those inputs maybe sometimes as features in a dataset. So much of this medication, so much of this whatever you're tracking in your inputs, and the outputs are what you're experiencing as a result of those things.
Rob Collie (33:31): Features and outputs, those are their nouns?
Justin Mannhardt (33:33): Yeah.
Rob Collie (33:34): That's so much better than dependent and independent variable. I eventually figured it out. I'm like, "Oh, right. Totally."
Justin Mannhardt (33:39): The same thing.
Rob Collie (33:40): But, features and outputs. Yeah, that seems humane.
Justin Mannhardt (33:42): Pretty good?
Rob Collie (33:43): Yeah, okay.
Justin Mannhardt (33:45): You have data that you could train with. What you would do is you would set up a regression model, and we'd feed it a certain percentage of your real data, with both the inputs and the outputs. Use, say, 70% of the data to train, and hold out the other 30% to test. Then it would predict on the rest of the real data, so you could teach it where it was wrong.
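The train-and-test cycle Justin describes looks roughly like this in code. A hedged sketch: the file name and column names are invented stand-ins for the tracked history, not the actual dataset discussed in the episode.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical export of the tracked history; file and columns are invented.
df = pd.read_csv("daily_log.csv")

features = ["dose_mg", "sleep_hours", "est_blood_level"]  # inputs ("features")
target = "symptom_score"                                  # the output

# Hold out 30% of the real history to test against; train on the other 70%.
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df[target], test_size=0.3, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
predictions = model.predict(X_test)

# Compare predictions against what actually happened on the held-out days.
print("mean absolute error:", mean_absolute_error(y_test, predictions))
```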
Rob Collie (34:08): By the way, that's an example right there where I might run into a problem applying this to a personal life problem that you won't run into with a business problem.
Justin Mannhardt (34:18): Right.
Rob Collie (34:19): If you take my dataset and start cutting it into 30% and 70%, almost certainly we're going to find ourselves in situations where there isn't enough data to produce any confident outputs.
Justin Mannhardt (34:29): Yeah. Even in your scenario, the good news is you would repeat this cycle numerous times.
Rob Collie (34:34): Okay.
Justin Mannhardt (34:35): What you might consider doing is leaving certain features in or out. I think where you'd eventually get to, Rob, is if you train a model and you say, "Okay, it's kind of getting close to what's actually happening in our lives, based on the real data we've got." Now the question you had posed is, "Well, I want to understand what's optimal?"
Rob Collie (34:53): Yeah. I want the solver.
Justin Mannhardt (34:54): The solver. You've been tracking this stuff for a while, but there's only so many days in a year. I'm guessing you have less than a couple thousand rows?
Rob Collie (35:02): Oh, gosh. Yes.
Justin Mannhardt (35:03): Of entry? Yeah.
Rob Collie (35:04): Unfortunately, other than date, probably many, many, many of those rows are unique. There's not enough at bats of these four symptoms, and seven inputs, or whatever. There's a lot of variation across 11 columns. As you start to subdivide the dataset, you're getting down to samples of size one. In the business world, you're not going to have that problem, not nearly at the rate that we'll probably run into it in my situation.
Justin Mannhardt (35:32): What you can do now is, effectively, you can run, "Here's the possible ranges of this input." And the possible ranges of this input, give it hypothetical data and say, "Predict the outcome. I've given you real world experience, here's a range of possibilities of what we could change or do differently."
Rob Collie (35:49): Go run 400 simulated universes forward, without having to live the consequences of running these experiments in biology. Specifically, in mine and my wife's life.
Justin Mannhardt (36:01): That might help you say, "Hey, these ranges," like the seven variables on the printer drums. "Okay, maybe these ranges seem like they might produce outcomes on the other side. Let's try that." I'm still tracking my experience and what's happening in the real world. That's the cycle you would get into, even in a business context.
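Rob's "400 simulated universes" can be approximated by sweeping a grid of hypothetical inputs through a model trained on real history. A sketch, reusing the invented names from the previous example; the ranges and units are made up for illustration.

```python
import itertools
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Train on the same hypothetical history as in the previous sketch.
df = pd.read_csv("daily_log.csv")
features = ["dose_mg", "sleep_hours", "est_blood_level"]
model = LinearRegression().fit(df[features], df["symptom_score"])

# Hypothetical ranges to explore for each input.
dose_range  = np.linspace(25, 150, 6)    # mg
sleep_range = np.linspace(6, 9, 4)       # hours
level_range = np.linspace(50, 300, 6)    # estimated blood level

# Every combination of the ranges: a small grid of "simulated universes."
grid = pd.DataFrame(
    list(itertools.product(dose_range, sleep_range, level_range)),
    columns=features,
)
grid["predicted_symptom_score"] = model.predict(grid[features])

# The best-scoring combinations suggest ranges worth actually trying.
print(grid.nsmallest(5, "predicted_symptom_score"))
```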
(36:22): It's funny. Kellan and I were actually talking a little bit about this just this week, when you think about the interconnectedness of different core business processes. What's happening in your demand? What's happening in your sales functions, and your execution, and your financial outputs? When you're looking at any one of those things, you understand the patterns and trends that those things experience within themselves. But to understand how they all domino into each other, one of those things is like, "How could you simulate these things in this type of a world?"
Rob Collie (36:53): Kellan is just a number of months away from achieving singularity with the Power BI models behind our business.
Justin Mannhardt (37:00): Right.
Rob Collie (37:01): He's close. He is machine learning.
Justin Mannhardt (37:01): Anybody seen Kellan? Yeah, I think he got published to the tenant.
Rob Collie (37:05): He got ingested into it like Tron. He's running around on a light cycle, heading off business problems in realtime.
Justin Mannhardt (37:17): Right. Anyway, the general ingredients for a potential application of regression are: I have lots of variables. Or maybe not even lots, but I have more than one variable going in, and maybe more than one variable coming out. I kind of understand the interplay, but I don't really. I'm trying to better understand what might be going on here. Companies use these types of things to predict things like customer churn, or to try and predict the lifetime value of a customer, or try and predict whether a credit card charge was fraudulent or not. That's what's happening in these types of applications: they're training on real data where things really happened. Then they're running those types of simulations to try and find where things might be optimal.
Rob Collie (38:04): That answers one of my other pocket questions. Which is, I remember years ago, when I still worked on Excel, and SQL Server Data Mining was this new product-
Justin Mannhardt (38:17): Oh, God.
Rob Collie (38:17): They were fighting to get traction for. One of the people I worked with, Mike Ammerlan, he went and kicked the tires on this stuff. One of the algorithms that it had back then was this thing called clustering. Where it's kind of like classification. You give it, let's say a list of all of your customers, and all the data associated with those customers, and it would cluster them into families that were somewhat alike. But the weird thing was, right off the bat, the first thing it asked you was, "How many clusters do you want?" We were like-
Justin Mannhardt (38:49): We don't know!
Rob Collie (38:50): "We don't know how many clusters we want. We want you to tell us." If we're here to tell you, just randomly, two, three, four. Come on. That seemed like a real miss.
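The "how many clusters do you want?" problem Rob describes still exists in modern tooling, but you can let a quality score answer it instead of guessing. A minimal sketch on synthetic data, using KMeans and the silhouette score, one common approach among several.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Pretend customer data: three loose groups in two dimensions.
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(4, 4), scale=0.5, size=(50, 2)),
    rng.normal(loc=(0, 5), scale=0.5, size=(50, 2)),
])

# Instead of guessing "two, three, four," score each candidate cluster count.
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, "clusters -> silhouette:", round(silhouette_score(X, labels), 3))
# The k with the highest silhouette score is a defensible answer to
# "how many clusters do you want?"
```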
(39:01): The way that I think of regression, from an Excel background, is where I pick the input and the output and say, "Tell me what the relationship is between these two." Whereas this more modern, more sophisticated version of regression is that I can give it all the variables that I've got, and it will go look for correlations that might even be multi-variable correlations. I don't have to tell it which ones to look for, it's going to be trying them all.
Justin Mannhardt (39:30): Yeah. There's some really cool libraries out there. Again, I'm not a data scientist, I don't proclaim to be one. But just really cool libraries out there that do these types of things. Because you want to know things like, okay, if I have 10 features, maybe two of them are really the only ones that matter. Yeah, all the other ones could be moving all over the place, it's these two.
(39:53): That's been my experience, or where I've felt my own limitations working with data models is when I get to that point where I don't really know what's affecting this. I know what's happening, and I can calculate it, I can measure it, I can see it, I can chart it. But I don't really know.
(40:09): I was working on a project with a customer. We were trying to use history to essentially forecast the budget rundown on a project. I can understand clearly what's happening, and I can essentially straight-line out what might happen from there, but I don't really know what's driving the deviations to be on or off track. That's where you start to lose some of your insight and ability to really predict what's going on.
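Justin's "maybe two of the ten features are the only ones that matter" is exactly what feature-importance tooling surfaces. A minimal sketch on synthetic data, using permutation importance; this is one common technique among the "really cool libraries" he alludes to, not a specific one from the episode.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))  # ten candidate features, synthetic
# Only features 2 and 7 actually drive the output; the rest are noise.
y = 3 * X[:, 2] - 2 * X[:, 7] + rng.normal(scale=0.3, size=500)

model = RandomForestRegressor(random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

# Features 2 and 7 should float to the top of the ranking.
for i in np.argsort(result.importances_mean)[::-1]:
    print(f"feature {i}: importance {result.importances_mean[i]:.3f}")
```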
Rob Collie (40:36): We're at that point where you know it's happening, there's a lot of factors. You don't really know how each one of them is influencing things. This is one of those signs, one of those signals to you that maybe now is the time where you could get that extra value out of machine learning.
(40:52): This hearkens back to the thing you said earlier, about don't just slap together a data model and then charge ahead into ML.
Justin Mannhardt (40:58): Right.
Rob Collie (40:58): This signal that you're getting at that point in time tells you that, number one, you've gotten a lot of the value out of the data model that Power BI provides, that Power BI is going to be better at providing than machine learning.
Justin Mannhardt (41:11): Totally.
Rob Collie (41:12): You don't want to miss that. That first 75% tends to be the most valuable anyway.
(41:19): Secondly, it indicates to you that you've also, at this point, evolved your data model to the point where it has the richness and the detail that's going to make your machine learning models even better.
Justin Mannhardt (41:30): Right.
Rob Collie (41:31): A perfect example of what you were saying earlier.
Justin Mannhardt (41:33): You and I, together, individually, have had numerous experiences working with companies and business leaders where you start a Power BI project, and then right away, that first day, or even a few days later, they'll say, "Wow. We're learning things about our business we never really understood we needed to learn." These are people that think they understand their business really well.
Rob Collie (41:55): And deeply thoughtful, curious people, too.
Justin Mannhardt (41:58): Intelligent, deeply thoughtful, not foolhardy at all. I think maybe a sign you could use here is that experience of, "I'm still learning things about my business. I'm still learning, I'm seeing things, I'm applying what I'm learning." If you feel like, "Now I'm perplexed. Now I'm starting to feel stuck," that's a cue that it's time for a change. Maybe it's not machine learning yet. Maybe it's that idea we were talking about earlier too, of now my model maybe needs to morph a little bit to help me break through to the next level.
(42:29): But if you're still discovering, and learning, and applying insights from your business intelligence, learning about your business, keep going. Wait until you find that, "Okay, now I'm perplexed." Now it's becoming clear that it's unclear what to do about it.
Rob Collie (42:45): I think that's great. So much of this conversation, I'm really glad we did this, because even just from my own personal understanding, I intuitively knew that I was at that point with this health data model. But I didn't have a lot of the principles in English. They weren't distilled out into guidelines that I could transmit to someone else. I'm like, "Yeah, I know that I'm in the spot. I don't know how to describe that spot on a map, though." I think I do now. What a valuable thing.
(43:15): That's it. We're never going to release this podcast, Justin. It was just something for me to learn and we're going to hog it. We should release it. We record podcasts to release them, Justin.
Justin Mannhardt (43:26): Yeah. Yeah, that's right. Even myself, having this conversation ... You do projects like this, you see hundreds of them, hundreds of different companies, hundreds of different groups of people. When I step back and realize, even in this conversation, I'm like, "We were going through this high frequency of continuous learning and discovery." Then you experience stalls along the way, and those stalls are good. They're the catalyst of, "We've now learned so much that we've cracked open another layer of the problem."
Rob Collie (43:52): Yeah. It's just really nice that, again, the way the tech works these days is it's not expensive in terms of time, money, or energy to explore ways to get off of that plateau.
Justin Mannhardt (44:03): Right.
Rob Collie (44:04): On occasion, you might be on what you think is a plateau, and it's actually literally the top of that particular mountain.
Justin Mannhardt (44:09): Yeah.
Rob Collie (44:10): How do you know? You can't tell unless you start trying to climb again. If trying to climb again was super expensive and super scary, then you might just stay on a plateau and never know what you were missing.
Justin Mannhardt (44:22): Right.
Rob Collie (44:23): Now we can just start throwing grappling hooks. "Hey, look, I caught a ledge."
(44:29): All right, well, I enjoyed this one and benefited from it.
Justin Mannhardt (44:32): I did, too.
Rob Collie (44:32): Well, we solved that problem.
Justin Mannhardt (44:34): Yeah. That one's taken care of now.
Rob Collie (44:38): But yeah, as much as I enjoyed this format, this probably isn't what we're doing next week. I think we've got a guest next week.
Justin Mannhardt (44:43): Yeah, I think we've got a couple guests coming up. So maybe, a few episodes between now and the next jaw session. Maybe we'll even check in to see if you've written any Python yet. My model says probability zero.
Rob Collie (44:57): Yeah.
Speaker 3 (44:58): Thanks for listening to the Raw Data by P3 Adaptive Podcast. Let the experts at P3 Adaptive help your business. Just go to p3adaptive.com. Have a data day.