A Bathtub of Data Holds an Ocean of Information (plus the Midmarket Paradox and Other News)

Rob Collie

Founder and CEO

Justin Mannhardt

Chief Customer Officer

Is the Midmarket Secretly Winning with the Microsoft Data Platform?

In this episode, host Rob Collie explores a surprising quirk in how Microsoft’s powerful data tools like Power BI and Fabric are being used.

It turns out that midsize companies – not big enterprises – are actually getting the most bang for their buck from these game-changing data platforms. The episode explains how the speed and affordability of the new Microsoft tools give smaller organizations a real advantage.

Rob also shares an unexpected lesson he’s recently uncovered through his own personal analytics projects. Even with just two basic data points per player per game, he was able to unlock a crazy number of useful insights and stories. It just goes to show the hidden richness that can be found in even the simplest datasets. Check out the Hockey Dashboard referenced in the episode!

So tune in to find out how a “bathtub” of data can hold an “ocean” of information – plus hear about the midmarket’s secret winning strategy. It’s an episode you won’t want to miss!

Subscribe now for new episodes every week, packed with practical data insights!

Episode Transcript

Rob Collie: Hello friends. In today's episode, I'm going to run through a bit of a grab bag of things that have been on my mind lately. As always, these represent both finished thoughts and thinking in progress. If you listened to the intro of one of our recent episodes, you heard that the podcast often forces me into greater clarity than I would've had otherwise. So the idea here is that everyone benefits.

First, let's talk about the paradox of the Microsoft data platform and what it represents for the mid-market. There's no paradox in its value to the mid-market. On the contrary, things like Power BI and Fabric are actually more valuable to mid-market companies than to companies of any other size. The paradox instead lies in how the mid-market isn't a sales and marketing priority for Microsoft, at least not when it comes to the data platform tools.

Now, there's a handful of reasons why Microsoft data tools are such a boon to the mid-market. One, the software is just crazy affordable. Microsoft has historically priced these tools on a per-seat basis, with enterprises (organizations with thousands of users) in mind. That makes the mid-market a bit of a loophole in Microsoft's licensing strategy, given that mid-market orgs have many fewer people than enterprises, and even then, only a fraction of those people require licenses. So the net-net is that many of our mid-market clients at P3 pay only a few hundred dollars a month in software licensing to Microsoft. We like to say that it rounds to free.

Now, I would be remiss here if I didn't at least mention that with things like Fabric, Microsoft is diversifying away a little bit from that per-user model. With Fabric capacities, it's more of a pay for the capacity as opposed to pay per user. Even then though, the Fabric capacities, if you need them, are still incredibly affordable compared to the vast majority of the competition. So even when Fabric changes the model a little bit, it still doesn't change the message that the pricing and affordability of this software is a major, major, major opportunity for mid-market companies.

Now, that's all about price, but the business impact, the business improvement that you can get from the Microsoft data platform is extraordinary. Once you've seen what it does for you, it's the kind of thing you would've gladly paid 20x for, but you don't have to and it's nice that you didn't have to pay more, pay that 20x, because it makes it so much easier to try it out. If it had cost 20x upfront, you might not have tried it out, you probably wouldn't have tried it out. It's just that once you have seen the impact, you will say, "Yeah, I'd pay 20x for that. I'm glad I don't have to."

Now, why is the impact so great? Why and how do these tools provide such business improvement? We've covered this a lot on this show, so I won't belabor it, but it starts with speed and agility. The prior generation of data tools was a lot more expensive in terms of licensing, but the bigger problem, actually, was that they required massive expenditures of time and money to implement. Those project implementation costs largely priced the mid-market out of the game, much more so even than the software cost. The software was more expensive back then, but it was the project implementation cost that was the primary driver.

Enterprise organizations, with their massive scale, could eat those costs, unpleasant as they were, but the mid-market largely could not. And this led to an inequality between the mid-market and the enterprise level. Now, for perspective, take something like the PC revolution of the 80s and 90s. That lifted all boats, all organizations regardless of size, and, in fact, it disproportionately helped small and mid-sized orgs more than it helped enterprises because the complexity of having a handful of PCs was no big deal, whereas the complexity of having 10,000 desktop PCs was an absolute nightmare for enterprises.

In fact, in 2000, the year 2000, I was working on a team that was created within the Microsoft Office organization specifically to help combat that nightmare for enterprise organizations. I was part of the total cost of ownership (TCO) team on the Microsoft Office product team, and it was a sexy team to work for. Can you believe that was a really good place to work at the time? So it's neat to think of tech in this way, in that sometimes it disproportionately helps certain sizes of organizations more than it helps others. Well, the business intelligence movement of the 2000s and 2010s was the opposite of the PC revolution, more severe than the opposite actually, because really only enterprises could afford to even play. And this quietly widened the gap between enterprise and mid-market capabilities. Enterprises with their economy of scale were able to modernize in ways that the mid-market couldn't, and this happened over a period of nearly two decades.

So if you work for a mid-market organization and you feel like you're stuck in the past, well, A, you're not alone in that feeling and B, it did happen for reasons far outside of your control. But the modern Microsoft data stack, things like Power BI and Fabric, is much more like the PC revolution. It actually benefits the mid-market disproportionately more than it benefits enterprises, whereas the previous generation of data tools benefited from the massive overhead of enterprises, their ability to afford and eat these terrible implementation costs. The new generation of tools doesn't depend on that, because projects are now super affordable and fast. And one of the things we've seen over and over in our consulting practice here at P3 is that the overhead scale of enterprises functions like a speed limit. The faster project timelines of today do benefit enterprise organizations for sure, but the size and scale of their political structures caps the speed at which they can move.

It's almost like the sound barrier. Above a certain speed, you start to get like destructive shock waves in enterprises, so you have to stay below Mach 1, even though the tools are capable of Mach 5. Mid-market orgs, though, have much, much leaner team structures and whereas those were a disadvantage for data-driven improvement in the prior two decades, they're an advantage now. You can move at a much more nimble pace than enterprises can and can take much greater advantage of the headroom and maneuverability offered by this new wave of tools. Also, you're not carrying the weight of all those old investments that enterprises were saddling themselves with over the past 20 years. What might feel like living in the past is now actually the advantage of being unburdened.

These tools meet you exactly where you are today. They don't require your infrastructure to be prepped first. Whatever systems you're using today to run your business, the modern Microsoft data tools are positioned to get you actual useful results, often within the first week or so of engagement. To underline this whole topic of how the new tools disproportionately benefit the mid-market, we've been kicking around the idea of a marketing campaign for our services here at P3 along the lines of "your small team and limited resources are now an advantage." And now, of course, it's time for you to go chase down those slow and lumbering enterprise organizations.

So if all these advantages are true, which they are, why isn't the mid-market a big focus for Microsoft when they go to market with Power BI and Fabric? Well, it actually makes sense. Microsoft's strategy for business software has always been to have a lean sales presence. Even when they're targeting enterprise customers, Microsoft sales reps are often responsible for dozens of accounts, and this goes along with the fact that their software has always been more affordable than their competitors' software, often by a wide margin. For example, it made sense for Tableau to carry a much larger sales team than Microsoft did, and they often even parked representatives on site with their Tableau customers, building dashboards for them, because the licensing costs for Tableau were so crazy expensive relative to the Microsoft offerings.

So to get the right profitability, each customer that Microsoft assigns salespeople to must represent a lot of seats and right off the bat, that makes the mid-market less likely to gain Microsoft's attention as a strategic focus for its data platform tools. But there's even more nuance to it than that. For example, Microsoft does have an apparatus for targeting the mid-market. It's just that the data platform isn't part of that apparatus. Microsoft does have teams whose specific focus is to gain greater penetration into the mid-market with Office 365 for instance. And why is that? Well, Office has already won the enterprise space. Google Docs made a good run at them but came up short and the result is that Office is now even safer than it was before.

So to grow the market for Office, they need to look at smaller orgs, and that's what they've done. Power BI and Fabric, by contrast, have not yet conquered the enterprise. They're on their way, for sure, but there's a lot of growth yet to be had there, and it makes sense to stay focused on the biggest fish until you've got all the biggest fish. Microsoft Dynamics is another Microsoft product that's actually disproportionately focused on the mid-market. Long-term, I do think companies like SAP and Salesforce are at risk from Microsoft Dynamics, but that's a long road, and in the meantime, the mid-market is an absolute hotbed for Dynamics adoption. It also helps that both Office and Dynamics are the kinds of things where basically everyone in the mid-market org will need a license. At our mid-market clients, by contrast, we often find ourselves starting off with only a handful of licensed users for, let's say, Power BI, because that's where the data-driven leverage lies, especially just to start out.

Now, over time, as those orgs shift their cultures toward a more broadly modernized, data-driven DNA, you do start to see things like reports and dashboards reaching wider and deeper throughout the organization. But even then, you can often get away with shared kiosk-style licensing, and your user footprint still isn't going to be so heavy that it adds up to truly significant licensing costs. So it's weird, right? On the one hand, the place where Microsoft's modern data platform makes the most impact, where it's most valuable, where it's most transformative, is not the most valuable place for Microsoft itself. In fact, it's a distant second, for the moment, behind the enterprise. But mid-market organizations don't need to second-guess this. Even though it feels like we're getting away with something, it's not like it's hurting Microsoft that all these mid-market orgs can revolutionize their operations at low cost. Mindshare is super, super important to Microsoft, and converts to their platform are nothing but a win for them, regardless of org size.

Let's start this next thought with a question: which is there more of in the world, data or information? For example, if you have 12,000 data points (6,000 rows of data with two data points each), how many meaningful questions can you answer from those 12,000 data points? Can you answer precisely 12,000 questions? Fewer than 12,000? More than 12,000? And if it's fewer or more, is it by a little or a lot? Now, just ruminate on that for a bit and we'll return to it as we go.

Even after several decades in the data industry, I continue to learn new lessons and reinforce old ones every time I directly engage with a project. As CEO, I haven't been directly involved with our clients' projects for probably almost 10 years, but my back-office analytics projects and the personal analytics projects in my life continue to teach me things. I've spoken recently on the podcast about the analytics we use to help my wife, Jocelyn, manage some of her medical situations, but today I want to briefly return to the dashboards I built for my recreational hockey league. And the thing that's blowing my mind all over again here is the incredible, basically bottomless richness that can be derived from even primitive amounts of data.

For each game, our league, our hockey league, records only two data points for each player. We record the number of goals they scored and the number of assists that they're credited with in that game. That's it. We don't record anything else. We don't record how often a player goes in or out of the game or how many minutes they played. We don't record who else was on the floor when goals were scored. We don't record the time in the game when each goal was scored. And when we record an assist, we don't even record which goal that assist was part of. We also record nothing, no data at all, specifically about the goalies in our league.

So compared to a business where basically everything that happens generates a detailed digital footprint, our hockey league captures only a tiny fraction of the data that we're generating. Most of it's just lost, doesn't even get written down. Now, we'll link the dashboards in the show notes if you haven't seen them, but let's just say that they are shockingly comprehensive and I never really thought about the contrast between the intense level of richness in the dashboards versus the primitive nature of the original source data. I'm accustomed to sitting down with a data set and teasing all kinds of richness out of it. So when that same sort of richness started coming out of the hockey data, I never really circled back to reflect on how simplistic that underlying data was.

A few months ago, though, I posted the dashboards in a hockey players group on Reddit and the top comment on the post stuck with me ever since. Here, I'll read you that comment verbatim, "This is hilarious because they are literally tracking two stats, goals and assists, and somehow turning it into a Bloomberg terminal showing every economic data point that exists." At first I thought that person was making fun of us, but then I pretty quickly realized they were complimenting us. I hadn't really thought about how primitive our source data was until that comment, and they were 100% correct to say what they did, except rather than saying, "This is hilarious," they probably should have said, "This is amazing." And instead of attributing it to us, our achievement in terms of building these dashboards, they should have attributed it to the richness of data period, because there is a tremendous richness of information even in so-called primitive data.

If you have a bathtub full of data, there's an ocean of information in it. And if you have a water glass full of data, there's an Olympic-sized swimming pool of information in it. That might seem counterintuitive, of course. So let's ask the question: where does all that extra information come from if it so wildly outweighs its own source data? For starters, there's a lot of implicit data being recorded that you wouldn't think of. For instance, we know if you showed up to the game because you won't even have a row in the data if you weren't there. We also know what team you were playing for in each game, and we also have a date and time stamp for each game. As a result of that, we know who your teammates were and we know who the players were on the other team.

Now, did you catch that part about teammates and opponents? That information is not captured on the row of data itself. If you go look at the source data row for me for the game I played last week, you aren't going to see a list of my teammates or opposing players on my row. You're just going to see basically my name and my goals and assists, which will probably be zero for me in both cases. So the information I'm talking about, who my teammates and opponents are, is very much relevant to that row, that one game that I played, but the data points in question are captured in other players' rows. And this is where 99.9% or more of the richness of information living in data comes from: every row of data exists in context to all of the others.
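To make the context point concrete, here is a minimal Python sketch. It assumes a hypothetical row layout of (game_id, player, team, goals, assists), where the game and team are the implicit context described above; the player names and numbers are made up. Each player's teammates, opponents, and even the final score and result come entirely from the other rows in the same game. The score logic assumes every goal is credited to some player.

```python
from collections import defaultdict
from typing import NamedTuple


class Row(NamedTuple):
    game_id: int
    player: str
    team: str
    goals: int
    assists: int


# Hypothetical sample: only goals and assists are "recorded";
# game_id and team are the implicit context every row carries.
ROWS = [
    Row(1, "Ana", "Red", 0, 1),
    Row(1, "Ben", "Red", 2, 0),
    Row(1, "Cal", "Blue", 1, 0),
    Row(1, "Dee", "Blue", 0, 1),
    Row(2, "Ana", "Blue", 1, 0),
    Row(2, "Cal", "Blue", 2, 1),
    Row(2, "Ben", "Red", 1, 0),
    Row(2, "Dee", "Red", 0, 1),
]


def game_context(rows):
    """Derive teammates, opponents, score, and result for each (game, player)
    purely from the other rows that share the same game_id."""
    by_game = defaultdict(list)
    for r in rows:
        by_game[r.game_id].append(r)

    context = {}
    for game_id, game_rows in by_game.items():
        # A team's score is the sum of its players' goals
        # (assumes every goal is credited to some player).
        score = defaultdict(int)
        for r in game_rows:
            score[r.team] += r.goals
        for r in game_rows:
            ours = score[r.team]
            theirs = sum(v for t, v in score.items() if t != r.team)
            result = "W" if ours > theirs else "L" if ours < theirs else "T"
            context[(game_id, r.player)] = {
                "teammates": sorted(o.player for o in game_rows
                                    if o.team == r.team and o.player != r.player),
                "opponents": sorted(o.player for o in game_rows if o.team != r.team),
                "score": (ours, theirs),
                "result": result,
            }
    return context


ctx = game_context(ROWS)
print(ctx[(1, "Ana")])
# {'teammates': ['Ben'], 'opponents': ['Cal', 'Dee'], 'score': (2, 1), 'result': 'W'}
```

Nothing on Ana's own row says who she played with or whether her team won; all of it is read off her neighbors' rows.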

So let's keep going with this thought. If you grab all of the rows of data that are about games that I played, so all of my rows of data, you can calculate my lifetime averages in goals and assists and yeah, you're going to get small numbers because I'm not good at hockey, but then you can compare, say, just the rows, my rows from this season and their average to my averages from before this season. And you can reach the surprising conclusion, surprising to me anyway, that this recent season, this current season has been my best season overall so far, which runs quite contrary to the anecdotal impression I would've had otherwise.
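The season comparison is just a filter followed by an average. Here is a minimal sketch with invented per-game rows for a single player. It assumes seasons are stored as four-digit year strings, so plain string comparison orders them correctly.

```python
from statistics import mean

# Hypothetical per-game rows for one player: (season, goals, assists).
# Seasons are four-digit year strings, so string comparison orders them correctly.
MY_ROWS = [
    ("2022", 0, 0), ("2022", 0, 1),
    ("2023", 1, 0), ("2023", 0, 0),
    ("2024", 1, 1), ("2024", 0, 1), ("2024", 1, 0),
]


def per_game_averages(rows):
    """Average goals and assists per game, or None if there are no rows."""
    if not rows:
        return None
    return mean(r[1] for r in rows), mean(r[2] for r in rows)


def compare_season(rows, season):
    """Return (this season's averages, averages across all earlier seasons)."""
    current = [r for r in rows if r[0] == season]
    prior = [r for r in rows if r[0] < season]
    return per_game_averages(current), per_game_averages(prior)


now, before = compare_season(MY_ROWS, "2024")
print(f"this season: {now}, before: {before}")
```

Calling `per_game_averages(MY_ROWS)` on all rows gives the lifetime averages mentioned above.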

One of my league mates that plays with us, he works in data as a data engineer, but he and I don't get too many opportunities to talk shop because when we show up on Wednesdays, he's too busy being good at hockey and I'm too busy being bad at it. But over the summer, we were at a barbecue together and he asked me about the dashboards and asked me if I was using AI to extrapolate and/or invent all of the stats in the dashboards. Now, that conversation strongly underlines the point I'm making here. Now, this guy, Aaron is his name, he's super bright, super thoughtful, but his job is on the back end of data, back end of data plumbing, and at that layer, if you capture a million rows of data with five columns each, well, you're going to have five million cells of data. And from a back end perspective, that's not going to change. And he knows that all I have is about 6,000 rows of hockey data with two columns each.

So when he sees the richness of these dashboards that I've made, he's seeing things like: player X has a 45% win percentage lifetime, but that goes up to 53% when they play with player Y and down to 33% when they play against player Z, and these things are calculated for every single pair of players in the league, both for and against. He's also seeing things like the three players he's most similar to, because I'm calculating the Pythagorean distance between his position on a virtual grid of average goals per game and assists per game and every other player's position on that same grid, and identifying clusters of similar players. He's seeing a universe unfold in front of him that clearly goes far, far beyond 12,000 questions answered, and he instantly, intuitively knows something is off, except it's not off. It's the richness of data, and it's practically infinite.
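Both kinds of numbers Aaron saw can come from the same primitive rows. The sketch below is hypothetical and is not the Power BI model behind the actual dashboards. It derives wins by summing goals per team, tallies each player's win percentage with and against every other player, and ranks similar players by straight-line (Pythagorean) distance on the goals-per-game versus assists-per-game grid. Counting ties as non-wins is an assumption, and `math.dist` needs Python 3.8 or later.

```python
import math
from collections import defaultdict
from itertools import permutations

# Hypothetical rows: (game_id, player, team, goals, assists); numbers are made up.
ROWS = [
    (1, "Ana", "Red", 2, 0), (1, "Ben", "Red", 0, 1),
    (1, "Cal", "Blue", 1, 0), (1, "Dee", "Blue", 0, 0),
    (2, "Ana", "Red", 0, 1), (2, "Cal", "Red", 1, 0),
    (2, "Ben", "Blue", 3, 0), (2, "Dee", "Blue", 0, 2),
    (3, "Ana", "Blue", 1, 1), (3, "Dee", "Blue", 1, 0),
    (3, "Ben", "Red", 0, 0), (3, "Cal", "Red", 1, 0),
]


def game_results(rows):
    """Map (game_id, player) -> (team, won), deriving each score by summing
    goals per team. Ties count as not won (an assumption)."""
    score = defaultdict(int)
    for game_id, _, team, goals, _ in rows:
        score[(game_id, team)] += goals
    results = {}
    for game_id, player, team, _, _ in rows:
        ours = score[(game_id, team)]
        theirs = sum(v for (g, t), v in score.items() if g == game_id and t != team)
        results[(game_id, player)] = (team, ours > theirs)
    return results


def pair_win_pct(rows):
    """Win % of player x in games where y was a teammate ("with") or an opponent ("against")."""
    results = game_results(rows)
    players_by_game = defaultdict(list)
    for game_id, player, *_ in rows:
        players_by_game[game_id].append(player)
    tally = defaultdict(lambda: [0, 0])  # (x, y, relation) -> [wins, games]
    for game_id, players in players_by_game.items():
        for x, y in permutations(players, 2):
            team_x, won = results[(game_id, x)]
            relation = "with" if results[(game_id, y)][0] == team_x else "against"
            tally[(x, y, relation)][0] += int(won)
            tally[(x, y, relation)][1] += 1
    return {key: wins / games for key, (wins, games) in tally.items()}


def per_game_averages(rows):
    """Map player -> (goals per game, assists per game)."""
    totals = defaultdict(lambda: [0, 0, 0])  # goals, assists, games
    for _, player, _, goals, assists in rows:
        t = totals[player]
        t[0] += goals
        t[1] += assists
        t[2] += 1
    return {p: (g / n, a / n) for p, (g, a, n) in totals.items()}


def most_similar(rows, player, k=3):
    """The k players closest to `player` by straight-line distance on the
    (goals per game, assists per game) grid."""
    avg = per_game_averages(rows)
    others = [(math.dist(avg[player], xy), p) for p, xy in avg.items() if p != player]
    return [p for _, p in sorted(others)[:k]]


pct = pair_win_pct(ROWS)
print(pct[("Ana", "Ben", "with")], pct[("Ana", "Ben", "against")])  # 1.0 0.5
print(most_similar(ROWS, "Ana"))
```

With N players, every ordered pair gets both a "with" and an "against" number, so the count of answerable questions grows with the square of the roster even though each row holds just two recorded values.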

You can actually answer billions of questions from 12,000 data points, trillions even, because the relationships between those 12,000 data points and the ways you can calculate and combine them are basically infinite. And we keep discovering new questions to ask. This week on Facebook, when I published the season-ending totals for the league, for example, one of the facts was that the two highest individual scoring games of the season were by Nick and by Garrett, with nine points and eight points, respectively. One of the goalies chimed in on Facebook and said, "Hey, fun fact, both of those, the nine-point game and the eight-point game, were in the same game and they were teammates, and I remember this because they did that to me." Naturally, that spawned a new curiosity we never had before: what are the highest combined scoring games in league history by teammates?

And we discovered that Nick and Garrett's game, that combined nine- and eight-point game, 17 points combined, tied for second place all time. And then, hilariously, we discovered that in three of the top five such performances by teammates in league history, it's been that same goalie being victimized. It's funny, too, because he's actually a pretty good goalie. Along the way, we also wondered aloud whether there was a tendency for people to pass more in the summers, when it's hot in there since it's not air-conditioned. So when people's cardio is under the most duress, are they less likely to try to carry the puck end to end? Are they more likely to pass? And sure enough, the Power BI semantic model was prepared to answer that question with just a few clicks, and yes indeed, assists do go up in the summers.
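Both follow-up questions are one more grouping over the same rows. Here is a sketch with made-up data: combined points by teammate pair within each game, and a summer-versus-rest comparison of passing. Treating June through August as summer and using assists per goal as the measure of passing are assumptions; the episode doesn't say how the real model defines either.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations

# Hypothetical rows: (game_date, game_id, player, team, goals, assists); numbers are made up.
ROWS = [
    (date(2024, 1, 10), 1, "Ana", "Red", 5, 4),
    (date(2024, 1, 10), 1, "Ben", "Red", 3, 5),
    (date(2024, 1, 10), 1, "Cal", "Red", 1, 0),
    (date(2024, 1, 10), 1, "Dee", "Blue", 2, 0),
    (date(2024, 7, 17), 2, "Ana", "Blue", 1, 2),
    (date(2024, 7, 17), 2, "Cal", "Blue", 2, 2),
    (date(2024, 7, 17), 2, "Ben", "Red", 1, 1),
    (date(2024, 7, 17), 2, "Dee", "Red", 0, 1),
]

SUMMER_MONTHS = {6, 7, 8}  # assumption: June through August counts as "summer"


def top_teammate_combos(rows, n=5):
    """Highest combined points (goals + assists) by two teammates in one game."""
    by_game_team = defaultdict(list)
    for _, game_id, player, team, goals, assists in rows:
        by_game_team[(game_id, team)].append((player, goals + assists))
    combos = []
    for (game_id, _), players in by_game_team.items():
        for (p1, pts1), (p2, pts2) in combinations(players, 2):
            combos.append((pts1 + pts2, game_id, tuple(sorted((p1, p2)))))
    combos.sort(key=lambda c: c[0], reverse=True)
    return combos[:n]


def assists_per_goal(rows):
    """Assists per goal in summer vs. the rest of the year (one possible proxy for passing)."""
    totals = {"summer": [0, 0], "other": [0, 0]}  # [assists, goals]
    for game_date, _, _, _, goals, assists in rows:
        key = "summer" if game_date.month in SUMMER_MONTHS else "other"
        totals[key][0] += assists
        totals[key][1] += goals
    return {k: a / g for k, (a, g) in totals.items() if g}


print(top_teammate_combos(ROWS, n=1))  # [(17, 1, ('Ana', 'Ben'))]
print(assists_per_goal(ROWS))
```

Neither question was planned when the scorer's sheet was designed; both fall out of the same two recorded columns plus the implicit game, team, and date context.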

Now, if that all sounds silly to you, well, in this context it is. There's nothing at stake in a recreational hockey league's analytics project other than entertainment and basic human curiosity. But the patterns I'm describing here, specifically of constantly discovering new and ever-richer questions that are answerable, that does 100% hold true when money and careers are on the line. In fact, it holds more true both because the incentives are stronger and because the upstream data capture tends to be more digitized and comprehensive than what we capture with pen and paper at the scorer's table. And by the way, this interaction with my hockey public, if I harken back to last week's episode with Scott Sewell for a moment, it really underlines for me how insanely valuable these semantic models are going to be when we finally get an interface that allows for true self-service by users.

The semantic model here can answer an infinite variety of questions, but the number of dashboards I'm going to build for people is decidedly non-infinite. And the number of dashboards users are going to bother even looking at is even lower. I get asked questions all the time that existing dashboards already answer; people just aren't aware of it. And it's super lame, when you think about it, that most of the time people's questions go unanswered just because answering them requires me in the loop. Even if it's just a few clicks in the field list, it still requires me in the loop to do that for them. But in the not-too-distant future, when they're able to ask these questions of my model themselves in plain English and reliably get the answers they're looking for in graphical form, without doing anything more than what feels like a casual instant messenger conversation with the hockey chatbot, holy cow, that's going to change this business immensely. Buckle up.

The last two things I want to talk about are each shorter than those first two. One of them is an informal milestone that we're celebrating at our company, P3 Adaptive. And the other is a bit of a hint of things to come with the podcast. Let's do the informal milestone first. For a number of years now, we've joked about measuring P3's progress as a company in terms of how many fantasy football leagues do we need, and often specifically calling out getting to three leagues is like a we've arrived kind of milestone. Well, folks, I'm happy to report that day has arrived and for the 2024 NFL season, P3 Adaptive is going to have three separate fantasy football leagues with 32 employees participating. Now, of course, participation in this as a P3 employee is not at all compulsory and we have less than half the company playing, but I'm particularly excited by the number of people playing this year who have never played fantasy football at all, whether at P3 or otherwise.

In fact, 14 of this year's 32 participants will be playing for the first time ever. The thing is, fantasy sports are really just a numbers game masquerading under a sports veneer. Internally at the company, I refer to it as BVVA, Basket of Volatile Virtual Assets, as a nerdy way of expressing the fact that it's a numbers game and not a sports game. In fact, fantasy football, which I wandered into in my first year at Microsoft in 1996, is 100% responsible for me discovering my own latent data gene. I arrived at Microsoft as a computer scientist but morphed into a data-crunching spreadsheet guy thanks to fantasy sports. So it's pretty cool, nearly 30 years later, to still be introducing others to the game that started it all for me. And yeah, we're marking off a long-discussed milestone this year.

Lastly, let's talk about this, the podcast itself. We're two months shy of our four-year anniversary of producing this show, and it's definitely changed and grown over those years. When we launched it in 2020, my only real core idea was to sit down with people who work in data and talk with them about their jobs and their origin stories, knowing full well that my personal Rolodex was basically loaded with hundreds of people like that. And that was a fun ride for the first few years, basically an excuse to reconnect with interesting friends of mine and have interesting conversations with just enough data relevance to get away with it in a professional context.

And while there was plenty of wisdom shared and discovered along the way, it was always nagging at me in the back of my brain that we weren't in our final form yet. It was very difficult in that tour of Rob's old friends format to truly provide our listeners with actionable business-relevant advice. The personal angle was interesting, but it kept us at a little bit too great of a distance from the ways in which the world of data was changing right before our eyes, primarily as a result of Microsoft's reinventing the data toolset.

And that's why, about 10 months ago, we changed things up a bit. I brought Justin in as co-host so that we could get more specific about the kinds of tech and problems that we're seeing and solving every day in our work at P3. And we explicitly set our focus to be on business impact. Whether you're a business leader or a data practitioner or if you're in that Venn diagram overlap of both, business impact is the language you need to be speaking and thinking in. Now, if you're expecting me to say that we're changing that up, well, no, that's precisely where we're going to be staying. That focus has only become more relevant with the disruption and noise coming from AI. So we're going to be sticking with it and not just for AI, because we believe the world is still a long way from reaping the full benefits of things like Power BI.

What we are going to do, however, is change the name of the podcast. Raw Data was a perfect name for a show where people sat down and just talked about their experiences. It was a deliberately vague name chosen to accommodate my desire to be unconstrained until such time as I figured out where we really should go. Well, now that we have figured out our real mission and have proven to ourselves over time that it is a sustainable format, the old name is a bit of an anchor. Think of it from a human factors and/or user experience perspective. If you didn't know about our podcast and your podcast app suggested Raw Data by P3 Adaptive with the mullet logo, would you have any idea it was relevant to you? Would you bother to select it and read the longer description? No, probably not. Ain't nobody got time for that.

That little square graphical thumbnail that appears in the recommendation section on my podcast app occupying less than 10% of the screen is the only chance we get to tell prospective listeners what we're about. And we're super, super proud of the show that we produce. Heck, it even helps us get better at our jobs. So the show deserves to be accurately represented to prospective listeners. So yeah, we're going to rename it and rebrand it so that the thumbnail tells people what we're about while yes, maintaining a sense of fun because we do have fun here, don't we? What are we going to call it? What are we going to rename it to? Well, that's an announcement for another day. Yeah, we're going to end this episode on a bit of a cliffhanger. I'm sorry. Don't hate us for it. There's a lot of work to do and there's no sense revealing it until everything is ready. In the meantime, rest assured that while the name, graphics, and yes, even the theme music is going to be changing, the focus and the format of the show will not. And with that, I'll catch you next week.
