episode 217
Democratized Data Science, Custom Software is the Future, and the Data Gene Rides Again
Every week brings a new AI model, a new benchmark, and a new reason to believe everything just changed. But for most companies, none of that matters if the people closest to the work can’t use these tools to build something real.
In this episode, Rob and Justin walk through what democratized data science really looks like. Not dashboards. Not prompts. Actual analysis and custom software built around a specific problem, driven by someone who knows the data well enough to challenge the answers. The difference isn’t the technology. It’s the person driving it. Someone who understands the data, the domain, and how to spot bad answers before they turn into bad decisions.
That’s where the data gene shows up again. When those people are empowered to build software fitted to how work happens, off-the-shelf tools stop feeling helpful and start feeling like friction. This episode is about noticing that shift while everyone else is still watching benchmarks.
Be sure to subscribe on your favorite podcast platform for weekly reality checks on AI and Analytics delivered straight to your inbox.
Episode Transcript
Announcer (00:04): Welcome to Raw Data with Rob Collie. Real talk about AI and data for business impact. And now, CEO and Founder of P3 Adaptive, your host, Rob Collie.
Rob Collie (00:18): All right, Justin. Well, welcome back. It's December 18th. We're recording an episode that we just realized is probably going to go live after the end of the year, so this is the New Year's episode.
Justin Mannhardt (00:30): Happy New Year.
Rob Collie (00:31): Picking up from last week's recording, the dry ice and pizza story has an epilogue. There's a follow-up chapter in this story. So if you didn't hear last week: for Christmas and our birthdays, my mom tends to send us dry ice packaged in a box that says pizza on it. You take the pizzas out, you put them in the deep freeze, and then you go play with the dry ice. That's what I was talking about last week when we recorded. Well, that night, we had some of that pizza. Now, my mom might be listening to this. So Mom, if you're listening, we love you. Enough time passes between pizza shipments that my mom forgets that Jocelyn doesn't eat meat.
Justin Mannhardt (01:09): Ah.
Rob Collie (01:10): So we get a mixture of cheese pizzas and meat pizzas. This sounds like a problem, but it actually turns out to be a real Yahtzee moment for me, because what we end up doing is we take one of the cheese pizzas that we already have and cook it in parallel with one of the meat pizzas that we just received. Jocelyn can eat the cheese pizza, I can eat the meat pizza, so two really great things happen here. Instead of splitting a pizza, I get pizza two nights in a row for dinner.
Justin Mannhardt (01:33): Full pizza for Rob.
Rob Collie (01:35): Well, no, no, half a pizza each night. I get two nights of pizza, and I get to eat meat on my pizza. These are both guilty pleasures, they're really nice, so it feels like a really good present. Okay, so we do that. We had one cheese pizza left over, and two more showed up in this batch along with three meat pizzas, so it's perfect, three and three. I took the old cheese pizza out, the one that we already had, and put it in the oven alongside one of the new meat pizzas: same temperature, same amount of time. They're side by side, not one above the other.
Justin Mannhardt (02:03): Sure, crucial.
Rob Collie (02:05): And we sit down to eat these pizzas. Jocelyn's pizza is fine. And mine is kind of chewy; it's not nearly as done as hers. Now, still delicious. Still absolutely delicious. I loved every minute of it, right? But we were very confused as to how her pizza could be done and mine wasn't. And I started thinking about it. I'm like, "Okay, well, maybe there's a batch thing." The older pizza is, I don't know, whatever. Maybe they used different materials or different ingredients. But then it occurs to me, oh my God, dry ice is way colder than your freezer.
Justin Mannhardt (02:37): Oh, more frozen.
Rob Collie (02:38): Way more. Dry ice is more than a hundred degrees below zero Fahrenheit. It's so funny: you take these pizzas out of the dry ice and put them in the deep freeze, and they're like, "Oh, it's so warm in here." The deep freeze is 100 degrees warmer than they currently are, which is just hard to even imagine. So when the internal temperature of Jocelyn's pizza reached 100 degrees, and it's already warm at that point, mine was still frozen solid in the oven.
Justin Mannhardt (03:12): Wow.
Rob Collie (03:12): Hers was already at 100 degrees, and mine was 100 degrees behind. It wasn't just frozen, it was hyper-frozen. And those pizzas are probably still coming up to the temperature of the deep freeze a week later, such is the power of dry ice.
Justin Mannhardt (03:29): So the cooking instructions are voided coming out of the dry ice.
Rob Collie (03:35): Are you removing it from the dry ice hyper frozen? When you took it out, was the dry ice still intact?
Justin Mannhardt (03:41): Please wait a week. Wait one week, preheat oven to 400.
Rob Collie (03:49): Again, to be a nerd and to pay attention is to have an extra dimension to life. It was almost halfway annoying that my pizza wasn't quite done, but as soon as I realized it was the dry ice, I was delighted, just enriched by the experience. So I don't think we'll have a dry ice story next week, but you never know.
Justin Mannhardt (04:08): You never know.
Rob Collie (04:10): Let's talk about a data analysis project. I've talked before on the podcast a couple of times about the Power BI model that I use to track a bunch of Jocelyn's health symptoms, all these inputs and outputs. Even when it's done being digested, the results are still overwhelming. I still get so many columns of inputs and outputs, and I'm left asking: are they correlating? Are there trends? Are there patterns? Are there relationships? And one time on a solo podcast a long time ago, I did something that in hindsight I shouldn't have trusted: I exported a CSV file of all that data, gave it to ChatGPT, and said, "Hey, tell me what the correlations are."
Justin Mannhardt (04:48): I remember this.
Rob Collie (04:49): I trusted the answers it gave me, but I shouldn't have. This is exactly the sort of thing that LLMs are bad at: they cannot crunch data. They cannot do that work directly. They can, however, write the code that does that work. They're very reliable at writing that code, and then the code itself is very reliable. But I asked the LLM to just look at the data and tell me, I trusted those results, and I did a podcast about it. Today's version of me would be like, "No, no, no, no, no. You do not trust the LLM to do anything like that directly." You can use code as an intermediary.
(05:29): I sat down with Claude Code and gave it access to that Power BI model. What do I mean by giving it access? It's just running on my machine, in Power BI Desktop. I haven't even published this PBIX file. There's no point in publishing it. I do all my refreshes on the laptop. I do manual data entry every day on the laptop. There's no cloud footprint of it. I don't even tell it the name of the file.
Justin Mannhardt (05:50): It just found the running instance.
Rob Collie (05:52): Claude Code just goes and finds it running, and it downloaded ADOMD or whatever it needed to talk to it, et cetera. And it does a really good job of querying. So now, in the form of Claude Code on my desktop, I basically have a data scientist at my disposal. Claude Code knows how to write the code, and it understands. It's even semantically understanding my data. It understood what these things meant; I only had to clarify a couple of things for it. It recognized which were inputs and which were outputs, which are the things that we directly control. I just asked it, "Go run a bunch of correlations across all of it. Just tell me everything." It did an amazing job at that and helped cement some things for us. There were certain things that we had long believed but never proven to ourselves, and it proved that this does cause that.
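For readers wondering how a tool can "find" an unpublished model: Power BI Desktop hosts the open file in a local Analysis Services engine that listens on a port which changes between sessions. The sketch below assumes the commonly described layout in which that port is written to a `msmdsrv.port.txt` file inside an `AnalysisServicesWorkspaces` folder under the user's local app data. The exact path (it differs for the Microsoft Store install) and the file's encoding are assumptions, and this is not necessarily what Claude Code did. The resulting `Data Source=localhost:<port>` string is the kind of connection string an ADOMD.NET client would use.

```python
import os
from pathlib import Path


def read_port(port_file: Path) -> int:
    """Read the port number the local engine writes to disk."""
    raw = port_file.read_bytes()
    # Assumption: the file is often UTF-16 encoded; fall back to UTF-8 otherwise.
    text = raw.decode("utf-16") if b"\x00" in raw else raw.decode("utf-8")
    return int("".join(ch for ch in text if ch.isdigit()))


def find_local_instances(workspaces_root: Path) -> list[str]:
    """Return one connection string per workspace folder that has a port file."""
    pattern = "AnalysisServicesWorkspace*/Data/msmdsrv.port.txt"
    return [
        f"Data Source=localhost:{read_port(port_file)}"
        for port_file in sorted(workspaces_root.glob(pattern))
    ]


if __name__ == "__main__":
    # Assumed default location for the non-Store install of Power BI Desktop.
    root = (
        Path(os.environ.get("LOCALAPPDATA", ""))
        / "Microsoft"
        / "Power BI Desktop"
        / "AnalysisServicesWorkspaces"
    )
    if root.is_dir():
        for conn in find_local_instances(root):
            print(conn)
```

Each open PBIX file gets its own workspace folder, so more than one connection string can come back when several files are open.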
Justin Mannhardt (06:40): So it was a data-backed conclusion. Yeah.
Rob Collie (06:45): Yeah. And then it started finding all kinds of correlations between variables that I didn't trust, because we've made other changes over time. So I said, "Okay, filter your analysis to start with this date in March, when we made some big, sticky change that we never reverted."
(07:03): Rerunning all of this by hand, writing it all manually, would have been way too heavy. It just reruns it effortlessly and says, "Oh yeah, you're right. A lot of those correlations just vanished when we filter by time."
(07:16): Time itself, this other background variable, was confounding things. At one point I said to it, "Hey, if any of the variables are blank, like we didn't record them that day, please exclude them from any analysis. Don't treat them as zero."
Justin Mannhardt (07:31): Treat it as unknown.
Rob Collie (07:32): Right. And it said, "I already did that, Rob. I've got you."
Justin Mannhardt (07:36): Yeah.
Rob Collie (07:37): And then to add onto all of that, for a long time I've been wanting to track, as one of the variables, whether we worked out that day. It's just really hard to keep up with. Well, guess what? The entire history of every time we've ever gone to Orangetheory is available via API. I've mentioned on the podcast before that I wished all that was available. Oh my God, it is. And so 10 minutes later, I've downloaded her entire workout history all the way back to 2018.
Justin Mannhardt (08:06): Seven years. What, four or five times a week you guys go?
Rob Collie (08:09): Now we do, yeah. We started at three times a week and we took a year off for COVID.
Justin Mannhardt (08:13): Sure. I remember you built out an OTF-style gym in the basement.
Rob Collie (08:18): Yeah, and it didn't work.
Justin Mannhardt (08:18): Didn't work.
Rob Collie (08:19): It just didn't work. Those Pelotons are awesome, but you got to use them.
Justin Mannhardt (08:27): Right.
Rob Collie (08:28): So now I'm asking, "Hey, do certain symptoms get better or worse on workout days?" And it's like, yeah, it turns out they do. And then it went further without my even asking. Because it has access to her heart rate data and all that, it found that, believe it or not, there's a correlation between workout intensity and certain symptoms. This is just bananas. And I'm telling it to remember things. I'm telling it, "Okay, these conclusions we've reached, I want to make sure we don't have to retrace them," because it turned out some of the correlations it was discovering were backwards. In other words, it was saying the more oil you use on your door, the squeakier it gets. That's exactly the opposite of what you'd expect.
Justin Mannhardt (09:09): Yep.
Rob Collie (09:10): And I'm like, "That's not right." So then we dug a little deeper, and it turns out that, well, you use more oil when the door's squeaky. You'd expect an inverse correlation between the fix and the symptom, but here the symptom is driving the fix. Using this Power BI file as a starting point and then running data science statistical calculations like cause-and-effect analyses, these are things that have been beyond my reach. Now I've got command over this, an effortless command over this, and so even for this analysis-type task, we've been benefiting quite a bit.
Justin Mannhardt (09:46): This story is an interesting example of a broader theme that I've been wrestling with, which really has to do with the macro software world. You're talking about Jocelyn's personal health, all her health metrics, what she's experiencing, and how active she is. And you think about something like Apple Health on your iPhone and how that service was designed, or Garmin or Oura or any of these things that are tracking biomarkers and metrics, how well you're sleeping and what you're doing, to try to give you insights, and none of those things could quite solve the riddle like this.
(10:33): We've entered this world where it's very achievable to create an analysis or a set of tooling that's so hyper-specific, and better, that it starts to lead my brain, anyway, down these interesting rabbit holes. You think about something like Apple Health that's trying to appeal to billions and billions of people and be relevant, but it doesn't do what you're describing.
Rob Collie (11:01): Yeah, this is interesting. You're right. This is the third chapter in a story that already had two chapters for us. The first chapter was: you have SaaS software that does something, like Salesforce, your CRM, your ERP, your blah, blah, blah, finance system, et cetera. And those line-of-business systems, we've said for years, are very, very good at what they do; they make the machine go. But when it's time to do reporting or analysis, they suck. Their built-in reports are terrible. They're one size fits all, they're siloed to that one system, and your business lives across systems, right? We've been saying for many, many years that out-of-the-box analysis and out-of-the-box reporting from software are useless. Just forget it. It never gets you anywhere. So that was chapter one of the story.
(11:51): Chapter two of the story, very recently told right here on this show and elsewhere, is that off-the-shelf AI isn't useful for business. You need to customize it. So I think what you're observing here is worthy of being its own chapter three, which is that even line-of-business software itself, off the shelf, is going to be yesterday's news, because we have the opportunity to custom fit the software to precisely your needs, yours. Apple Health, like you said, is designed to work for a million people, a billion people, whatever. Here's an example of that, even from this analysis yesterday. Orangetheory's data, by the way, takes a snapshot of all your biometrics every 25 seconds.
Justin Mannhardt (12:50): In an hour-long workout, that's about 150, right?
Rob Collie (12:54): Yeah, right. It's about 150 data points per workout. If you're using one of the machines, it tells you that you were on that machine. So in addition to your heart rate and all that kind of stuff, it also gives you what was going on on that machine, like what miles per hour you were at on the treadmill when it took that snapshot. Here's one of the things I wanted to know. Jocelyn went through periods of injury with her back and stuff like that, so she was power walking on the treadmill for a long time, but she was also running at other times. There's an era at the beginning when she was running, then she got hurt and was power walking, and now she's back to running, but that's not tagged in the data.
(13:26): So I asked her, "Hey, what's the max speed you ever walked at when you were power walking?"
(13:31): And she goes, "5.8 miles per hour," which by the way, is ridiculously fast for power walking. She's way shorter than me and I can't come close to that.
(13:38): So I told Claude Code, "Hey, let's institute a filter. Any workout where her max speed was greater than 5.8, tag it as a running workout." We're getting so hyper-specific, and we have the ability to do that. What's a good analogy for this? You know those things that get molded 100% to you, like your own custom orthotics? It's stuff like that. Take a mold of your business: press your business into this clay, and then we will render software into that shape.
Justin Mannhardt (14:14): I think it's still a valid, forward-looking thought process, and it's scary. But when I look at my personal use of different things, I realize there are some assumptions here, right? Not everybody can go deploy something to a cloud subscription where, as an individual, you're going to pay for some modest compute and stuff like that. I was thinking about the companies I've worked with or for. You'd buy off-the-shelf things for an ERP or a CRM, but at a lot of the places I worked, we had teams that developed and maintained our own software for different things. Maybe it was software that our customers used to interface with us, or maybe it was software that we used operationally. We were doing that, but it was an intensive effort.
(15:01): What you're describing here, a data scientist on your desk doing all these things: you would either have to hire somebody or commit yourself to a long learning path to figure out how to do these things. The opportunity to go get what you actually want and need is so compelling to me right now. I was talking to someone recently who was describing this future idea of the AI operating system. AI is just like, "Oh, you want a to-do list? Let me just go ahead and build that over here or you need account." It's much more fluid and personalized. If I don't have to be boxed into the way someone thought something should work, why would I be?
Rob Collie (15:46): Yeah. And everywhere I go on this journey, I'm struck by two opposing tensions, two thoughts that pull me in different directions. One is, "Oh my God, the thing I just did is so destructive to profession X." And at the same time, there's this giddy realization that not everyone could have done this. For example, I "replaced" a data scientist with the work I was doing. I don't have those skills, and I don't have the time to go back and relearn which correlation techniques you need to use in which situations and all that. There are really well-established principles there that I just don't remember. But I'm thinking through the problem at the same time that it is, and catching it when it goes astray, which, again, comes back to that data gene thing. So is this destructive to data science, or does it just mean that we're going to do a lot more data science?
(16:50): And I do think there is an elite class of people that's going to emerge from all of this with superpowers. It's the data gene crew, it's people like us. And we have a way of talking about this stuff as if it's super, super accessible, because we're experiencing a feeling of accessibility that we haven't experienced before. Probably seven months ago, I sat down with Amir Netz. He gave me several hours of his time. He was super generous, and we were just talking about what the future looks like, helping me plot the course for our company, and that conversation was pretty influential on my thinking.
(17:30): And one of the things he told me was, "You should go sit down and try vibe coding. At first it's going to feel like everyone can do it, but once you get a little bit into it, Rob, you're going to see it's not for everybody." I'm seeing that over and over again. I'm hesitant to even say it because it sounds so self-congratulatory. It sounds like me talking myself up.
Justin Mannhardt (17:53): Yeah.
Rob Collie (17:53): But no, there's definitely a bifurcation in the world here, at least with the way the tools are today. There are the haves and the have-nots in terms of: do you have the wiring for this kind of work? And you still need it. There's a friend of mine that I met relatively recently out here in Seattle. As far as I can tell, he has no programming experience. He's not been a techie at all. He's been coming at me with all these ideas for applications he wants to build, and some of them are really, really ambitious, and others are ones that I know I can do. I've been kind of a bad friend for a little while because I've kind of stopped talking to him about it. I started to get worried. He's really excited about all this stuff, with all kinds of energy and enthusiasm for it.
(18:43): But I'm like, "If I give him Claude Code, not give it to him, but point him in that direction, is he going to need my support through all of that stuff that I'm good at, but I don't know that he is?"
(18:57): And so for about a week and a half, I went dark on him and wasn't answering his questions. And I finally realized, oh, okay, here's what I need to do. I just need to be honest with him: I don't know whether this is in your makeup or not, and it's okay if it's not. I told him, "Look, I can give you a map and a compass. I can send you into the jungle. When you're in the jungle, you're going to have to live the jungle life. You eat what you kill, what you forage, because I can't help you through the details. But I've been proactively deciding that you couldn't do it."
(19:31): And again, the data gene, right? It's one out of 16 people, and the fact that he's shown this kind of enthusiasm and energy for it is an indication that he might have the data gene. It's not up to me to decide whether he does or not; it's up to him to discover it. So I said, "Here's the map, here's the compass, here's Claude Code." I didn't even tell him how to install it. I'm like, "Look, if you even get to the point where you can launch a Claude Code terminal and it's working, it's time for a party."
(19:57): And yesterday he texted me a screenshot of Claude Code running in VS Code and said, "Party time." So we're going to find out, and I'll report back, but this is just a sample of one: which group is he going to be in?
Justin Mannhardt (20:13): Yeah. I was thinking about voices of dissent I see in my LinkedIn feed. One that's coming to mind is a chat between friends where one says, "Oh my God, I was able to build an application."
(20:24): And the other person's a legit experienced dev and they're like, "Show me."
(20:29): And then there's a screenshot of a web browser pointed at localhost port 5242 or whatever, and it's just a comical "you have no idea what you're doing" type thing. Or there's a guy who built some app and got exposed on the web. It reinforces your point: there's a class of people that are going to be really good at this, but that doesn't mean this way of working isn't going to take hold and make a difference.
Rob Collie (20:54): Right. It's the same pattern as the whole Power BI revolution playing out, just with a different magnitude, a different intensity. "No, the Excel crowd could never do data warehousing, and they don't know Kimball." And I'm just like, "Oh, you're being gatekeeping and self-protective." You cannot be the wrong kind of elitist. You cannot believe that there's some moat around you that isn't there. You have to let yourself feel the fear, feel the threat.
(21:27): Take my friend I'm talking about in this case: either way, if he turns out to be a competent application builder, okay, that's going to happen. And if he doesn't, well, there are others like him who will. There are going to be people who join the software development story who were not part of it before. A whole class of people, just like the Power BI thing, and no one believed back then either that anyone else was going to join the party, right?
Justin Mannhardt (21:59): Yeah.
Rob Collie (22:00): And I had to live through years of disdain and ostracism from the reindeer games of the BI world because I was saying that something else was going to happen and everyone's going, "No, it's just going to be us." Almost everyone who works at our company now is part of that demographic that they didn't believe existed.
Justin Mannhardt (22:18): You and I have approached this differently in our interactions with the things we're building. I've actually found it very educational, to the point where I'm making Claude do this. I have a pretty disciplined way of going about building the things I'm building, and as part of that, I have a requirement that Claude explains why we're doing something and what it's for, because it's using technologies I don't have a clue about, since I've never been a full-time dev. Then it's, "Oh, now I understand what that means," or, "Now I can challenge the idea that it's going to pick option A over option B from an educated point of view."
(22:54): And so as each day goes by, it reminds me of the experience of first seeing something like Power Query, which maybe was scary because someone sent you a big block of M code instead of showing you the UI. But I went from zero to, at one point in my life, writing M code from memory because I was tired of waiting for the thing to spin and resolve. People are going to get there in this type of world. They're going to go from never writing a lick of code to being incredibly competent at delivering value this way.
Rob Collie (23:27): It's interesting: each one of these sea changes in technology has allowed us to discover the latent data gene sitting in people. I wonder what percentage of the world's data geners has already been harvested by the Power BI type of revolution. We did come along and scoop up a bunch of them. Not all of them, for certain, but the expansion of the BI and analytics universe was like adding multiple zeros to the number. We at least 1000X'd the population involved in creating BI, involved in creating dashboards in actually formal, good ways.
(24:16): I don't think there's room to 1000X again. If you take the Power BI type of audience, we might have picked up half of them, so there's that many again to be picked up in this app development sense, right? Which is still an enormous crew. But what are the chances that you're going to turn into a vibe-coding app developer if you weren't already into Excel or Power BI or something like that? My friend Brandon that I'm talking about, I don't think he's been into any of those things, but if I replay all my conversations with him, he just never had a reason to. He hadn't had his collision yet, and now he's having it.
Justin Mannhardt (24:55): We'll be following this story closely.
Rob Collie (24:57): Yeah. As someone said at the end of a terrible movie, but with a great line, "We'll be watching your career with great interest." The vibe, the theme, is real. We welcome the new crowd in, because gatekeeping against them won't keep them out. If you tell yourself that there isn't a new crowd coming, you will make the wrong plans, or fail to make plans, to adapt your career to what's happening. Don't do that. Embrace it. I'm looking forward to the new generation that's joining the story, and I'm excited about it.
Justin Mannhardt (25:30): Yeah. It's going to take that group of humanity to power the transformation forward. This story won't have a satisfactory conclusion if it just was like, "Oh yeah, we created this thing called AI." It's the ability for people to do things with it that are meaningful, that push it forward. And I can't quite think through all the ways, but I think it's very different. I think the future is more custom, more specific, more personalized.
Rob Collie (26:04): All right. So first episode of the new year in the books in advance, look at us.
Justin Mannhardt (26:11): Look at the foresight, realizing the recording to release timeline that we're on.
Rob Collie (26:15): I mean, really, really grown up. It's taken us, what, 200 and something episodes to get this dialed in, but we're getting there.
Justin Mannhardt (26:21): Yeah, we are.
Rob Collie (26:22): It's a pleasure, I will catch you next time.
Justin Mannhardt (26:25): All right, man.