One Data Model to Rule them All?

Rob Collie

Founder and CEO Connect with Rob on LinkedIn

Justin Mannhardt

Chief Customer Officer Connect with Justin on LinkedIn

One Data Model to Rule them All?

Listener John sent in a classic question: should you build one all-encompassing data model, or should each part of your business have its own? If you think the answer is black and white, well, welcome to the world of data modeling, where nuance reigns supreme.

Rob and Justin dive deep into the pros, cons, and inevitable mistakes of both approaches. Along the way, they call out the biggest modeling traps, expose the myths that make projects stall, and introduce a revolutionary new acronym: JGS (Just Get Started). Because, spoiler alert, the best data model is the one that actually gets built.

This is an episode you won’t want to miss!

Episode Transcript

Rob Collie (00:00): Hello friends. Today, Justin and I discuss one of the oldest and most central questions in the book. When you're early in your organization's data modernization journey and you're looking at a relatively clean slate in terms of Power BI semantic models, is it best to start building multiple models each specific to a particular segment of the business, or instead attempt to centralize all of those segments into one model to rule them all? Like all old questions, this one has no simple answer. This episode could have a sticker on its label saying, "May contain the following: nuance, reframings of the question, and of course, a healthy dose of, that depends." None of those are allergens, however, that's just how reality works. One of my favorite sayings these days is, "The universe doesn't owe us simple solutions." But the good news is that we can turn the tables.

(00:52): If we embrace the nuance and reframe the question in ways that better suits our actual priorities, oftentimes we do end up with a confident and simple clarity of how to proceed. So I think you will find this episode useful in terms of navigating this age-old decision. Something we didn't talk about, which we probably should have, is that the question of fragmented versus centralized data models is no longer just about powering dashboards. As discussed in multiple episodes last year, the semantic models we build in Power BI are so good for dashboards because they are so good, period. The data they contain combined with the business semantics we encode into them make these semantic models the most authoritative source of information about our business. That bears underlying, so let me say it again. A good Power BI model is a far more comprehensive source of information about our business than anything else we can possibly have today.

(01:51): Nowhere else does everything. The data and the business logic come together so cleanly, so completely as they do in the Power BI semantic model, which makes those models also the springboard for future AI projects at your organization. On our episode with Sean Rogers, for instance, when we asked him, "What should organizations be doing to ready themselves for AI?" He said, "The three most important ingredients to AI are going to be data, data, and more data." But then I suggested that the answer should be data, data and metadata. In this case, using the word metadata to capture all of that business logic. The formulas we use to calculate important metrics like customer retention, the ways in which our date tables are linked to multiple columns in a single fact table, and the ways in which those relationships are themselves utilized by other specific business calculations. Every single thing we add to a semantic model that isn't data, is metadata, and it's incredibly important metadata.

(02:54): Sean liked that, the suggestion that it's data, data, and metadata. I told him he could use it. I'm not sure if he has been using it, but I sure am. So your strategy for data models gains new weight when you view those models as the key to future payoffs that you might not even have contemplated yet. The good news though, is that it does not change our advice at all. Again, semantic models are good for your dashboards only because they are good, period. I think you'll in the future likely need to enhance them further later on to enable specific AI use cases. But the investments you make today to power dashboards will give you the proper foundation for AI projects when that day comes.

(03:36): Now, before we get into the episode, did you know that Justin and I semi-frequently meet with people like you who listen to the show? It's one of the most valuable things we do in a given week. We love hearing where people are at, what they're struggling with, and just having a two-way conversation as a complement to the one-way conversation that is the podcast. These aren't sales meetings. None of our sales team is there. It's just me and Justin. We often learn just as much as the people we're talking to. In fact, the question for this week's episode for instance, came from a listener. Well, we also have a chat scheduled with him coming up. I love it. We did a whole episode from his question and we're still going to talk. If you have a question for us to discuss on the show or you would like to just chat with me and Justin, hit one of us up on LinkedIn. With all of that said, let's get into it.

Speaker 2 (04:24): Ladies and gentlemen, may I have your attention, please?

(04:28): This is the Raw Data by P3 Adaptive podcast with your host Rob Collie and your co-host Justin Mannhardt. Find out what the experts at P3 Adaptive can do for your business. Just go to p3adaptive.com. Raw Data by P3 Adaptive, down-to-earth conversations about data, tech, and biz impact.

Rob Collie (04:57): So Justin, we have the supreme luxury of another listener question today. A couple of weeks ago we had the question of why aren't there more people with the data gene in leadership? And I did a solo podcast on that.

Justin Mannhardt (05:09): It was great.

Rob Collie (05:10): And another question came in from John, the age-old question one data model or many, should you build a separate data model for each operational need operational facet of your company, or do you instead go for the one data model to rule them all approach? So many facets to this question, so many pros and cons.

Justin Mannhardt (05:31): I love it.

Rob Collie (05:31): So much nuance and clearly in the end, the answer is just going to be either completely binary, one or the other. One approach or the other is going to be better. There's not going to be any gray in this.

Justin Mannhardt (05:42): Never. This is going to be a three-minute episode and then we'll go home because it's a clear winning choice.

Rob Collie (05:47): Yep. Beatles or Elvis, can't like both.

Justin Mannhardt (05:53): All right. The question is to give some context, John is at the beginning of a company's Power BI journey, should I try to build one data model that represents our business or should I have many data models? The question is already positioned as an either/or, and so I get to sit here as a consultant and say, "Well, it depends." Of course, but let's break down and explore alternative questions that would be helpful here.

Rob Collie (06:24): For example, rather than asking about what's the desired end state instead, how should you proceed, how you should proceed is in the end far more relevant. It's the verb, not the noun.

Justin Mannhardt (06:38): That's right. Full disclosure, I've been a part of projects and initiatives that have gone both of these directions. I've gone down projects where the goal was to build one model that represented the entirety of a business's operation. I've built models that are more focused than that. I have learned through the School of Hard Knocks that it is all about, "Well, how should you proceed from here?" So a couple of things I believe about models and I think we share these beliefs is if you approach Power BI and building data models like you may have approached your legacy reporting, you're probably doing something wrong already. What I mean by this is if you go to a tool such as SSRS or Cognos or BusinessObjects, you've got some sales reports, some financial reports, some productivity reports, et cetera. Those reports tend to be very one-dimensional. They're all about the sales data or all about the financial data or all about the productivity data as an example.

(07:42): So if you're building a Power BI model, let's say the only fact table in that model was sales, like sales orders, I can guarantee you you're missing analytical value by only having sales data in that model. The magic from Power BI isn't just the way you can visualize it. Visuals are awesome. Humans love charts. We understand patterns, but if that sales data isn't able to integrate with customer service data as an example, I can guarantee you you're missing out on things. So that should be a question from the jump is if you're thinking about your build in that way, you're probably missing a lot of value in the process. So thoughts on that, Rob?

Rob Collie (08:27): Well, yeah, I mean, completely. The ability to splice across silos. Your business runs on a hodgepodge of line of business systems. Each one is purpose-built to do one phase of the business and do it well, and it's the data locked up in those systems. The secondary value of that data, the primary value of the data in those systems is to make the business go, make the business operate, but the secondary value is being able to see what's going on and the ability to splice across those systems and see the journey, a customer journey as they move through different phases of your business.

(09:03): In manufacturing, we've heard about the journey that a part might take through the system, the journey that money takes through the system, like some of your systems track expenses in certain systems, while revenue is tracked in detail in other systems and other forms of customer satisfaction, job performance, things like that are tracked elsewhere. To be able to integrate those into one all-up view where the metrics that power your dashboards are even sensitive to them, what's our margin on things? If you're just doing one silo at a time, you're flying in an airplane with the windows blacked out and no instruments.

Justin Mannhardt (09:38): Yep, and we did a webinar recently talking about the manufacturing scenario, like you were saying, watching a part number move through different parts of the parts processing. It's an inventory, it's being used by worker, etc. And the example I was giving in that webinar, and I see this quite a bit, is even if we've put the data from the warehouse system in the same model with production work orders, in the same model with purchase orders for new materials, you still see a lot of dashboards where they're just showing you all those separate things and we've not connected it enough yet. That's my first indication here is if you're not bringing multiple facets, silos, domains of the business together, even at least two of them, you're missing out on an opportunity to better understand things.

Rob Collie (10:29): I completely agree with that. I am a deep, deep proponent of the multi-fact table, multi-line of business system data model. As an ironic story, when I was first learning about BI, Microsoft sent me to Europe a couple of times actually. I went to the UK and to Denmark on separate occasions, and when I went to the UK, I think we visited HP's countryside campus. This was a huge compound. It was like rolling farmland. I remember being told by the honcho that we were meeting with some Vice Presidenty type of IT or something. This was long before Power BI. We had OLAP cubes, we had SSAS cubes. I remember being told by him that, "We basically build one cube per report." And I didn't even know enough at the time to know why that was ridiculous. It's such a subtle thing.

(11:21): Here is a human being, me, Rob Collie, who was already in the chair. I was already in the chair of being in charge of a lot of Excel's investment in business intelligence, and here I am, not halfway around the world, but a long way from home, hearing something incredibly mind-blowingly like, people aren't adopting the software the way that we expect them to, nor the way that they should, and it's just kind of going over my head. It shows you both how subtle this is. It takes a while for it to set in. It's okay if you have kind of missed this point so far, the old joke like, "Hey, if you're missing this point, guess what? You could be in charge of BI for Excel in 2007."

Justin Mannhardt (12:07): Yeah, that example you gave of effectively seeing a one-to-one relationship between a model and reports in the same environment, I've seen that so many times where that is the case. You go into Power BI and there's basically one model per dashboard or report that's out there. Yeah, we don't want that either.

Rob Collie (12:26): Or it's quote, unquote, "one model", but you look at it and it's just a bunch of separate fact tables with no relationships.

Justin Mannhardt (12:31): Not actually a model.

Rob Collie (12:32): Not a model, but it's like, "Hey, we put it all in the same place." So in terms of how to proceed, even if you're starting from scratch and your first problem that you're going after is, I don't know, manufacturing efficiency or something like that, take into account the whole life cycle because you can't answer the most important questions until you've bridged across multiple different systems of multiple different data sources, and that's a really big deal. Embedded in that is sort of one of my biases, which is pick a real world business use. Even though we are absolutely right off the jump, we are advocating if you're coming from one specific corner of the business and you're thinking about one specific problem right off the bat, we're saying, "Broaden your scope a little bit, but just to do a good job of the thing that you originally thought you were doing. You had a narrow focus on what you were trying to do, but don't lock yourself into one system from the get-go."

(13:29): That's a crucial, crucial, crucial point. At the same time though, I think having a specific set of stakeholders and a specific set of initial problems that you're trying to address to focus your efforts is crucial rather than setting out from the beginning to build the model. The model is, it's not quite as bad as, but it's sort of like the old plumbing first thing that we abhor at P3. You're trying to anticipate future needs everywhere without engaging closely enough with any of them to know whether those needs are real and you're certainly going to miss the real needs. So having that initial set of focusing problems, don't set out to solve all of the problems of your entire business end to end all at once. Pick a good starting point and execute on that until it is excellent or bordering on excellent. Then we can say, "Okay, what next?" And what next gets very interesting.

Justin Mannhardt (14:30): Yeah, it's starting from the right place and it's applying that sense of focus in the right place, so it's very natural for us to want to move from the data forward, something we've talked about before as opposed from the need backwards. So when we move from the data forward, that's where we think, "Oh, let's just focus and build a sales model." But if we go back to the need, it's like, "Oh, we're trying to understand this problem that says, oh, well, of course we need the sales data, the quote data, the delivery data. We need all of those things to address this need." So starting from the right point of focus is really important. You bring up the idea of, we talk about faucets first versus plumbing first, and I've been to this rodeo, I want to bold button this. It's so easy to get stuck in the modeling trap.

(15:18): I'll coin that term here, the modeling trap. When we're asking the question of like, "Well, should we build one model or separate ones?" We start focusing on what seemed like good things to focus on and resolve, but ultimately don't help us as much as we think they do. For example, if we're debating whether we should build a common model or build two separate ones, we might find ourselves having a conversation, "Well, Rob, we're going to have very similar dimensions, and what if you make updates to yours? Those updates aren't going to be in mine and we're going to have similar measures. What if I change the logic in mine and you don't change it in yours?" And these seem like valuable problems to solve, but in the reality we're both just spinning our wheels and nobody's really getting anywhere. Now we're focusing on the asset almost in the same way we would get stuck in a data warehouse project trying to focus on data integrity and all these things, which I'm not dismissing that at all. It's just preventing us from progress more so than it's helping us avoid a mistake.

Rob Collie (16:18): Anticipating that those problems might happen. If we end up with two separate models that have some overlap and changes in one aren't going to make it to the other one, that is a legitimate problem. It's just that you have to rank problems. You're describing a future state where things are already going so well that this is good problem to have. Go create yourself that problem, and it might very well be addressable. You might be able to unite the clans in the future, but getting to the point where both wings of an organization are getting what they need, folks, let's go get there. And oftentimes you wouldn't build these in parallel anyway. You build one then the other.

(16:57): You might be able to start from the model you built for purpose one. So here's something that I do not know. I still do a lot of Power BI work, but I'm not doing big enterprise type client work. I'm not doing even big mid-market, I don't know how far we've come in terms of model inheritance. How effectively can you say, "Hey, I want to inherit from this other model a bunch of stuff and then add on top of it." Have we reached a point in the ecosystem where that's a practical approach?

Justin Mannhardt (17:28): There are different features in the Power BI ecosystem that can lead you to some success there. Now we have the ability where we can take two separate models, stitch them together and use them in the same report. This is not really going to be a solution for people that don't have these types of resources, but Power BI has made a lot of progress with things like code repositories for models so you can have multiple people working on the same model at different times. We've made a ton of progress in that ability to collaborate zone, yes, for sure. Despite all of that, which I think is really positive and it's good for the ecosystem, it also brings a certain level of complexity that not everybody wants to wrestle with, nor do they need it.

Rob Collie (18:10): Yeah.

Justin Mannhardt (18:11): Let's say we solve a clear problem, we build a model, we build some dashboards off that model and everything's great, we love that. And then we say, "Okay, Rob, let's work on the next project now." We start working on the next project and we realize, "Hey, there's a lot of similarities in the data we need for this next project versus from what we already did." And let's say we discover there's a lot of value in us doing a certain type of calculation that requires data at a certain type of granularity, one of my favorite examples. Okay, now we have two fact tables that exist in our world that are slightly different, but for well-intentioned purpose, so what's the right answer?

Rob Collie (18:50): I'm waiting.

Justin Mannhardt (18:51): Yeah, okay. Do we replace the old fact table with this new one? Oh, but that might mess with all of these other measures. My point here is if you're catching yourself in these moments to be like, "Oh, wait. There's going to be technical debt and technical complexity here." My experience far says, "You know what? Do what's best for the project you're working on right now." Like you said before, I want to emphasize this. Let the problem become serious enough where there's value in resolving it.

Rob Collie (19:19): Yeah. It turns out that our ability to anticipate future problems is pretty poor. It's compounded by our ability to anticipate what the solutions to those problems should be, and let's throw in there the ability to anticipate how great of a magnitude the problem will be in practice.

Justin Mannhardt (19:37): Yeah. A great example of this is open records, and so what I mean by that is something that is in an open status. It could be an order that's in an open status or a work request or a project. This type of data usually has dates associated with a start date, an end date, dates that help you understand when something began and ended. And so when data's of a certain size, we can use measures to look at start and end, and derive, "Oh, this is still open based on some business logic." Now we might say like, "Hey, we actually want to model the data differently." So we just have a record a date and an open yes, no, instead of a window of dates. Very different structure of data, but our first problem might have not led us to need that. Our second problem might lead us to needing that, and so rather than go back and redo everything, it's not worth the time and effort.

(20:34): The key principles as a recap this far, start with your focus on the right side of the problem. What's the business application? What are the things the users need to be able to do and see and understand about the data? Work backwards from there, serve that need. And if you do that, you'll see what data you need and you're not going to think about, "Well, should I have a sales model or a financial model?" You're going to think about the business in its full context, and then I'm always going to ask the question, how do we leverage the models we've already got, but I don't ever want to feel like I'm constantly retinkering with everything to keep it all perfect. I think that's a core lesson here.

Rob Collie (21:13): All of this could fall under the heading of artifact-driven thinking versus workflow. This even applies even with the design of a dashboard itself. Ignore the model. Assume you have a perfect model built. You can still go and build a dashboard that's attempting to be too much of a noun, that's trying to be everything to everyone, like the universal source of the place you go to answer all of your questions, and it turns out that it's actually terrible for any specific use. Everyone that comes to this thing ends up either consciously or subconsciously realizing that this thing is their opponent and it's not actually helping them do their job, because it wasn't designed around one of their workflows. One of the things that they actually need to do. By trying to be good at everything, you end up being bad at everything. Much better to have separate dashboards, each one of which caters to a particular portion of the workflow. I continually make this mistake, even though I might be the world's number one proponent of doing things the verb oriented way, even I continue to stub my own toe on my own lesson.

(22:19): I did a whole episode about how I messed up the first iteration in a way of the dashboards for my hockey league. By the way, right now while we're recording, there are text messages being exchanged right now about the data model and the refresh process for this week in the hockey league. This happens every week. I'm still included on the text threads where they're talking about all the different updates they're making and everything. It's really cool that something I built, it lives on after I'm gone. Yeah, so dashboards are artifacts, data models an artifact. If you're not thinking about the human beings and the actual needs and their workflows and everything, you're kind of missing it.

(22:56): Lean in on the verb, all this future anticipation of future problems and future wins, again, you're wrong about it. You're completely ill-equipped to properly anticipate what that future state's going to look like and how it's going to feel and what the solution would be. Plus, along the way, you're just tying yourself up. You're not doing a good job of the thing that you were after because you were distracted by this other thing, so you and I completely agree about this. There's no controversy here. All right.

Justin Mannhardt (23:21): So far.

Rob Collie (23:22): When you reach a point where you've got something working pretty well for finance purposes or whatever, and now you want to move on to like, oh, let's help the operational side of the business out a little bit more, you have so many choices of where to start, and you're going to be pretty well-informed. Right off the bat, 90% of the questions that you would have going into this before you built the first model are going to be so obviously answered to you, and it's not about the technology, it's not about theoretical best practices. You're just going to see like, "Oh, I know what we should do." Whether it's a save as of the original or a start from scratch all the way up to these more sophisticated team collaboration, deriving models from other models and all that kind of stuff, which I imagine myself not even choosing to do that when it was the right decision, because it just sounds awful.

Justin Mannhardt (24:15): There are people at P3 that are way smarter than you and I on all that stuff.

Rob Collie (24:19): Okay, good, good, good. That's how we want it to be.

Justin Mannhardt (24:21): That's the way it should be. There's a second point here. I think it's really related to have your focus in the right place, but when you're doing that, to also make sure you deeply and accurately understand how the user's going to interact with that data. The vast majority of people, they use the dashboards. The vast majority of people don't use the model. So when we have our little arguments about, "Well, we're going to have two measures that are both doing sales, and what if you pull this one in and you have the wrong filter?" Most people are never going to have to worry about that.

Rob Collie (25:00): By the way, when you bring that up, you absolutely use that voice.

Justin Mannhardt (25:03): Yes, every single time because it's silly, but it's also something that you see quite a bit, which is we're expecting people that A, have no interest in, and B, candidly, that's not a skill or a talent that they need to do their regular job. We're expecting them to work with the model. To go into the fields list and understand it completely, it's just not how the majority of the world interacts with the data.

(25:31): Now, there are people that do do that, and that's I think one of the how do you proceed answers is, if you are trying to build a model for ad hoc dashboard development, ad hoc Excel analysis, you got to have a finite group of people that are doing that, and now you're in a state where you are trying to bring it all together to give them a tool where they can explore the things that are coming up, but I feel like we emphasize that use case way more than we need to because the people you're really trying to serve with the data you have, they're going to click on charts and filters and navigate between pages, not build a pivot table from scratch in Excel.

Rob Collie (26:15): So basically everything we've said so far can be distilled down as follows, multi-fact models, multi-fact table models, a single fact table model is going to work, but they are the exception. You should be suspicious of any model that only has one fact table. You should also be very suspicious of models that aren't models, that are just a bunch of separate fact tables in one file, so multi-fact with bridging dimensions, nine times out of 10 if you're not doing that, you're missing something really important.

Justin Mannhardt (26:44): Agreed.

Rob Collie (26:45): The other principle is JGS, just get started.

Justin Mannhardt (26:49): Ooh, put it in the glossary at P3, new term, JGS.

Rob Collie (26:55): To go with now you can, so we're going to take now you can and pair it with just get started.

Justin Mannhardt (27:00): Just get started. Business intelligence, it's unique in that regard, which is just impossible to know everything. You need to discover a heck of a lot of it. Everything from how your model should be built, how the calculations work, what the user really wants in the dashboard itself. All of these things are discoveries. They're all things that need to be discovered by a group of people working together.

Rob Collie (27:26): I'm prepared to revise. I think it's more than two things. So number one, multi-fact. Number two, just get started. Number three, workflows and stakeholders and actual day-to-day problems, focusing on those verbs, not on the artifacts that you're building. And then a close cousin of that, number four is what we call faucets first, and it plugs right in with just gets started as well.

Justin Mannhardt (27:52): Yep.

Rob Collie (27:52): Be working towards delivering the thing that helps people as soon as possible get out of this business of plumbing first and data forward and all of that, just get moving and evolve.

Speaker 2 (28:05): Thanks for listening to the Raw Data by P3 Adaptive Podcast. Let the experts at P3 Adaptive help your business. Just go to p3adaptive.com. Have a data day.

Check out other popular episodes

Get in touch with a P3 team member

  • This field is hidden when viewing the form
  • This field is hidden when viewing the form
  • This field is for validation purposes and should be left unchanged.

Subscribe on your favorite platform.