Raw Data By P3 Adaptive
ML & Algorithms Hit the Bigtime, w/ DataRobot’s Diego Oppenheimer
Hello friends! Today we welcome one of the big guns of analytics to our show, Diego Oppenheimer. He is an entrepreneur, product developer, co-founder of Algorithmia, Microsoft alumnus, rugby aficionado, and current Executive Vice President at DataRobot.
Nothing is off the table in this romp through topics that run the gamut from machine learning to quantum computing. Even string theory gets a mention. Make sure you have had your coffee for this one, though, as the conversation takes a technical turn not often seen on this show. Rob and Tom engage with Diego in a lively debate on complexity versus flexibility of tools. Shockingly, this guest manages to tweak a host’s nose (through no fault of his own as times and DBA roles have changed and evolved).
Today’s conversations with Diego have inspired a new classification of the data gene: data enthusiasm. This rare variation is not often seen in the public sector but allows people like Diego to brazenly assert to business owners and stakeholders that he can show you things about your business you don’t even know . . . and be completely accurate.
Buckle up and crank this one up. You won’t want to miss a single minute as the data gets raw.
Rob Collie (00:00:57): And while I did not understand that back at the time when he formed Algorithmia, now that I do understand it, it's really clear that was a great idea. And the real world agrees because they did very, very well at Algorithmia. Another company with a better name, DataRobot, came along and bought them relatively recently. And that's where he works today. And yeah, the conversation winds in and out, primarily through that thread of machine learning and related algorithms. We got into some other topics, of course, including the trade-off between flexibility and complexity on one hand and ease of use and simplicity on the other. And how this is very, very difficult for any software platform to truly get that Goldilocks balance.
Rob Collie (00:01:39): In an entertaining twist, we wandered into quantum computing. And I proffered one of my pet metaphysical sci-fi theories about what quantum computing would really represent. He's a super high energy dude, super successful, and also super down to earth. And it was a fantastic conversation, really enjoyed catching up with him. I hope you enjoy it as well. So let's get into it.
Announcer (00:02:02): Ladies and gentlemen, may I have your attention, please?
Announcer (00:02:07): This is the Raw Data by P3 Adaptive Podcast, with your host Rob Collie and your co-host Thomas LaRock. Find out what the experts at P3 Adaptive can do for your business. Just go to p3adaptive.com. Raw Data by P3 Adaptive is data with the human element.
Rob Collie (00:02:31): Welcome to the show. Diego Oppenheimer, how are you today, sir?
Diego Oppenheimer (00:02:37): I'm doing great. How about yourself? It's been what, 10 years since we talked?
Rob Collie (00:02:39): No. Well when did you leave Microsoft?
Diego Oppenheimer (00:02:43): 2013.
Rob Collie (00:02:44): So it's only been nine years.
Diego Oppenheimer (00:02:46): Yeah, fair enough.
Rob Collie (00:02:47): Maybe we talked once or twice since then, but you're right. It's been a desert of talking.
Diego Oppenheimer (00:02:53): Time is a flat circle.
Rob Collie (00:02:54): Oh, wow. Yeah.
Thomas LaRock (00:02:56): Wow.
Rob Collie (00:02:57): That's it. Throw the microphones down. End of show. We'll let that flat circle take care of itself, and the podcast will just record itself, right? So you were on the Excel team at Microsoft, and that's how we got to know each other. I got to know you as being, if not the only, definitely one of a handful of people on the Excel team who were very keen on the Power Pivot thing. If I wanted traction on something Power Pivot related, I could talk to Diego. You were very enthusiastic about that whole thing. And Excel has a jillion different missions and has to satisfy a jillion different audiences and everything. So it makes sense that not everyone on the Excel team is thinking Power Pivot on the brain, even back then. But the thing is, I don't know much about what you did before that. And I don't know really that much about what you've done after that. Tell us what led you to working at Microsoft? What led you to working on the Excel team?
Diego Oppenheimer (00:03:50): Sure. I grew up in Uruguay, South America, and I moved to the US in 2002 to go to Carnegie Mellon. I did my undergrad there, did my grad school there as well in data analytics. It was actually one of the first information systems management with a specialization in data analytics or whatever you want to call it. I've been passionate about the data space and it actually all came in through, I got randomly an internship. And it's actually how I got introduced to BI and I got excited about it. I was working for, I don't know if you remember PeopleSoft. There was a PeopleSoft reseller and they actually built Crystal Reports in a bunch of different ways. We're talking old school any sort of data reporting. I remember I was interning for this consultant. He said, "Hey, you come with me on this sales call." Whatever.
Diego Oppenheimer (00:04:40): And he shows up, pops up with his laptop and he tells the manager at that point, it's a 7-11 manager type of 7-11 in Uruguay, it's just different brands. He's like, "I'm going to tell you something about your business you don't know." And I'm like, "Holy shit, that's brave." Met this person 10 minutes ago, I'm going to tell you something about your business you didn't know. Asked about his P&L or something, grabbed some data, some spreadsheet, puts it together, builds a little report, says, "Hey, you know that your sales here did blah, blah, blah, and your sales here did something else." The guy's mind blown, and I was like, "I want to do this."
Rob Collie (00:05:11): Wow.
Diego Oppenheimer (00:05:12): This is it, right? This is what I want to do, this is amazing. You can tell a story with the data. I could be that brazen. I've never been a shy person. I could be that brazen and to go and say "I'm going to show you things about your business you don't even know that's in front of your face." That's where my whole journey into the data space led into. Fast forward, got my undergrad grad, got grad school. This is 2007 where I graduated from my grad school.
Diego Oppenheimer (00:05:36): At that point I thought I was going to go be a quant in finance, because it made sense. There's a lot of money to be made. Everybody wanted to do it. I was on the East Coast. This is the way to do it. Get a job on Wall Street, started working on Wall Street. Realize I love financial data, the precision, the amount of it, what can be done with it, not a huge fan of the people, and culture.
Diego Oppenheimer (00:05:58): At that point, a recruiter calls me and says you've been working in this BI space at this bank. You've been building reporting tools. You've been building a bunch of these systems. We're hiring a team for BI in office. Do you want to come join that? I was intrigued. Also it was a free trip to Seattle, so why not? And that was it. I showed up and he said, we're going to actually meet this [inaudible 00:06:18] I can't remember who it was. I think I talked to Dave Gainer at the time and he's like, we're going to go invest in this. This is going to be the visualization layer of the future. This is how the business decisions are going to be made. We already run the world. There's a pretty decent selling point there in terms of going to that.
Diego Oppenheimer (00:06:35): I left right before the financial crisis. I looked like a genius, oh totally saw it coming. I left, went to Excel, Wall Street was going down. No, had no idea. Like literally watched the terror happen after I had moved. I had joined Microsoft, so complete coincidence, but I look really good in the sense that like I left right before. And yeah, that's the origin story of how I like data and analytics and how I ended up at Excel.
Rob Collie (00:06:58): I do remember now talking to you about you having considered the Wall Street, financial quant path. It's worth taking a moment to talk about that whole phenomenon. There's a whole generation seemingly and maybe at this point two generations of incredibly talented engineering types. Huge chunks of them just got absolutely hoovered up by the Wall Street financial beast. And you're right. That's not really a particularly positive corner of humanity. There was a trip that some of the Excel people took to Wall Street and I wasn't on this trip. I did take other trips to Wall Street but I wasn't on this trip. The stories that came back from this trip were just jaw dropping. The people in the room that the Excel people were talking to considered the Excel people to be like one of them. So they started leveling with them, bringing them into the club. And the Excel crew on this trip learned that if someone called you a nice guy, that meant that you were a pushover, that meant that you were an easy meal. They did not respect nice people.
Rob Collie (00:08:03): And if they did think someone was worthy of respect, they would say that this person had sharp elbows. And then to top it all off, one person who worked at the Wall Street firm got up and offered to get them all coffee or whatever. They all gave their orders and everything. As soon as this person left the room, the other people who worked at the Wall Street firm started laughing. The Excel people asked, what are you laughing about? We're firing him next week. He's a nice guy. It's like yuck. So where do all of these nerd bullies come from? I despise nerd bullies. If you're smart enough to work on Wall Street, you probably were growing up, being stuffed in lockers by bullies, right? And then to turn around and to inflict that forward. Oh, those are the people that are first on my list.
Diego Oppenheimer (00:08:54): It's interesting because I've probably worked with financial services in Wall Street more over the last seven years.
Rob Collie (00:09:01): Yeah, I bet.
Diego Oppenheimer (00:09:03): It's a very different world these days. I think what prevailed my original interest in the industry, and there's just a lot of stuff that it's like a little bit of a boys club and there's some of that angle around it, but the thing that really prevailed was the actual data. So when it didn't change with the industry, it's still probably the most, it's the data that collects the vast quantity of structured data that exists out there. They've actually turned data into first class citizens. Their entire business runs on data. And so guess what? The people who actually know how to process that data and work with that data and set that data up, run the business. There's this concept, I can't say the name of the firm, but data is a first class citizen.
Diego Oppenheimer (00:09:45): That attitude, where if you think about you can go from what your perception was, if you haven't worked in the industry a long time, which is boiler room and Wolf of Wall Street, which is very relationship based, who do I call, how do I get deals done, how do I sell stuff to, now it's data driven. The cool kids in the room have changed quite a bit, because the business is driven by this thing. And so I think that's quite changed. I think in general, the financial crisis in 2008 really changed attitudes. And there's a little bit of a humbleness applied to it because I don't know if that's true, given that there's still the excesses and stuff like that in the industry. But there has been quite a bit of change in terms of how these companies operate and what they do, and what's important to them and who they hire.
Diego Oppenheimer (00:10:29): So you asked about the trend of where I wanted to go work and stuff like that. Everyone wants to go work at a Silicon Valley startup right now. That's where all the MBAs want to go to. It used to be all the MBAs want to go to Wall Street and now all the MBAs want to go be a product manager at the next version of Twilio or whatever that is. And they make sense from a career prospect and excitement around it. But I think financial services in general, as an industry has started to kind of work into its promise of hey, all that data we collected, we can actually do stuff with it. And we're actually doing really, really interesting stuff with it and driving the whole business. And so the people who are attached to that and the process are the value of the organization.
Diego Oppenheimer (00:11:09): And nobody's like, you can't really deny it right? Is that salesperson that's picking up the phone 10 times a day, actually moving the needle or is it the guy or gal who's preparing data analytics and saying, call these few people now and offer them this and those sales happen. So there's a big change in terms of how the operating model of the business has gone to and where the profits are coming from. And at the end of the day, it's built on a capitalistic system so profits matter. So I think there's been quite a big change in attitude.
Rob Collie (00:11:33): Interesting. So while I wasn't paying attention, both Microsoft has become a nicer place to work. Although Office was always pretty nice, right? Microsoft is now run with a kinder, gentler sort of more humane hand. And you're telling me that Wall Street also has gone a little bit in that direction. I just don't know.
Diego Oppenheimer (00:11:50): My comment is around, you talked about nerd bashing. I'm saying the nerds took over this year.
Rob Collie (00:11:56): We just hope that the nerds that are running things are being nice to each other instead of mean. So we've already kind of jumped ahead a little bit. 2013 is when you set out on a different journey and let's talk about that. What triggered all that and what did you end up doing? And we'll pick up the trail from there.
Diego Oppenheimer (00:12:14): I had a good friend in undergrad genius programmer, went to do his PhD afterwards in machine learning. And he always used to call me and say, machine learning is the future. The world of descriptive analytics is important. But the world of predictive analytics is going to take over everything that we do. By the way, it's impossible to implement. He had this phrase that I love using, which is the future's already invented, it's just stuck in an academic paper. He was coming up with these machine learning methodologies and these algorithms, these things that were impossible to implement. They were really hard to get at scale. We were good friends. We went to college together, we traveled together, we've known each other forever. I used to joke around, look man whenever you're ready to stop mucking around with academia, you can come get a real job and do real things at Microsoft with us. We can build cool stuff whenever you're done with whatever you're trying to do there.
Diego Oppenheimer (00:13:03): And he'd call me and he was pissed off. I built this machine learning model that does like, blah, blah, blah. I can't get it into production. I can't get anybody to use it. They can't even install it on their computers because none of the libraries exist. And I always joked around and are you sick of academia yet? And then one day we were doing research, do you remember automatic pivot tables?
Rob Collie (00:13:25): I mean, which iteration of it? I mean we've had-
Diego Oppenheimer (00:13:30): V one.
Rob Collie (00:13:31): No, I was on V zero of automatic pivot tables before you were even there. I'm sure there was a V minus one of automatic pivot tables before me.
Diego Oppenheimer (00:13:41): So somehow this ended up on me, right? I ended up owning it. I think the first public version that we actually shipped was to Excel 2013 is when it actually came out. There was a core interesting component to that, which is how do we actually guess what goes into the pivot table? What's the actual heuristics that you use? What's the modeling that you actually use to go look at the data types and the data, go look at how they're going to actually be attached and actually do that guess. And it was never meant to be we're going to get a perfect pivot table. It's just to get you started. It was always about there's two kind of people in the world, the people who love pivot tables and the people who don't know how to use them. How do we get people to know how to use them or get them away from being afraid of them?
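To make the "guess what goes into the pivot table" idea concrete, here is a minimal sketch in Python. It is not the heuristic Excel shipped, which the conversation does not describe; the function name `suggest_pivot_layout`, the `max_row_fields` parameter, and the rule itself (numeric columns become values, low-cardinality text columns become row fields, per-record-unique columns are skipped) are all assumptions chosen for illustration.

```python
from numbers import Number


def suggest_pivot_layout(records, max_row_fields=2):
    """Guess a starting pivot layout from a list of dict records.

    Hypothetical heuristic for illustration only:
    - columns whose values are all numeric become value fields
    - other columns become row fields, lowest cardinality first
    - columns that are constant or unique per record are skipped
    """
    if not records:
        return {"rows": [], "values": []}

    numeric, categorical = [], []
    for col in records[0].keys():
        vals = [r[col] for r in records if r.get(col) is not None]
        is_numeric = bool(vals) and all(
            isinstance(v, Number) and not isinstance(v, bool) for v in vals
        )
        if is_numeric:
            numeric.append(col)
        else:
            distinct = len(set(vals))
            # A column with one value, or one value per record (IDs,
            # free text), does not group anything useful.
            if 1 < distinct < len(records):
                categorical.append((distinct, col))

    categorical.sort()  # lowest cardinality first, ties broken by name
    rows = [col for _, col in categorical[:max_row_fields]]
    return {"rows": rows, "values": numeric}


sales = [
    {"region": "North", "product": "A", "order_id": "o1", "amount": 10.0},
    {"region": "North", "product": "B", "order_id": "o2", "amount": 5.0},
    {"region": "South", "product": "A", "order_id": "o3", "amount": 7.5},
    {"region": "South", "product": "C", "order_id": "o4", "amount": 2.0},
]
print(suggest_pivot_layout(sales))
```

In the spirit of what Diego says, the goal of a sketch like this is only a reasonable starting layout that the user then adjusts, not a perfect pivot table.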
Diego Oppenheimer (00:14:19): And I remember that somebody sent me out to go talk to some person at Microsoft Research. Some professor and had been very [inaudible 00:14:30] on research I think they're out in Cambridge. And I was like, hey I'm trying to figure out how to do this. We probably have some algorithms that we can use somewhere around looking at data sparsity and stuff like that, that we could actually go figure this out. And he's [inaudible 00:14:42] oh, you mean DataPilot? I was like, sure. He's like, well that's the academic name of these things. This is the actual concept and-
Rob Collie (00:14:50): It's OpenOffice.
Diego Oppenheimer (00:14:51): And he's like, yeah, here you go.
Rob Collie (00:14:51): It's OpenOffice. OpenOffice called their pivot tables DataPilot. And I was immediately struck by how much infinitely better that was than pivot table. I was so pissed off and I went off to try to rename pivot tables and I failed.
Diego Oppenheimer (00:15:04): So then to that point and he's like yeah, look here. And he popped open a demo for me. I've already done this. I can't remember, it was written in MATLAB or C sharp or F sharp or some unshippable thing for Excel. He's like, we're already done. Here you go. You can go make it a product. It was at that moment that I realized this disconnect. So here I have a person who, first of all, you remember I had this structure, we got quite a bit of people, maybe not as many as people think are working on the Excel team, but there's a decent amount of us. And this is core to our product. We're in the same company. We have no clue what this person's working on. They actually did have something that completely worked.
Diego Oppenheimer (00:15:43): The algorithm worked, the heuristic worked, all this stuff and then also they're being like, here you go you can just ship it. I'm thinking we have a billion active users. We need to test literally every single... like ship it? I'm just going to grab this random academic code that's not been tested by anything. So that's where my conversations with my co-founder Kenny started coming up and he's like, holy shit, he's right. There's no path from here's something we invented and this type of heuristics, predictive algorithms, to this is shippable. And this is the extreme example because shipping a version of Desktop Excel was the ultimate waterfall.
Rob Collie (00:16:22): The Olympics of waterfall software.
Diego Oppenheimer (00:16:24): Yeah, exactly. That was the complete extreme, but he's like, well why don't we go solve this? Do you believe machine learning is going to be important? Yeah I buy into it. Do you think predictive analytics is going to be [inaudible 00:16:36] so why don't we go solve production? And we started working on this nights and weekends with my co-founder, we started a company called Algorithmia. The point was to solve production for predictive analytics and how you're going to go package analytics, deploy them, ship the code, scale it. Now there's an entire industry around it called MLOps. We were like, shit needs to be shipped was essentially what we needed to work out. And that was actually the genesis of the company and why we went into it. And it was excitement about the space and the fact that this was going to be the future of analytics and why not us to go solve it.
Rob Collie (00:17:14): So when you started the company, the mission at the time, and I'm sure this is the sort of thing that morphs over time, but I watched you leave on a course. You departed the harbor. As you headed out to the horizon I believe the mission was almost to create a marketplace for algorithms so individuals could upload and maybe sell or rent algorithms. Did that end up being the ultimate mission or were there twists and turns?
Diego Oppenheimer (00:17:43): It's actually the other way around. So we think people need to find these algorithms and these models and to be able to consume them. They need to be able to find them. And also they need to be able to quickly integrate them. And so it's the [inaudible 00:17:58] problem of how does this stuff get into production? How do you run it? And so a marketplace where there's a supply and a demand seemed to be the right methodology to showcase, it's really hard to find the right algorithms because there's academics everywhere. And that whole, the future is already invented it's just stuck in a paper. So what if there's a place where you could go find them? And what if not only is there people who can find them, but also a place where you could just very easily adopt them and integrate them and use them.
Diego Oppenheimer (00:18:28): And so this idea of algorithms as APIs was kind of the core concept that, we didn't know that there's no such things as functions as a service. So we started going down the path of algorithms as a service and created the first, what I would say, marketplace for algorithms as a service that existed. But I did have the learnings of how we had miserably failed at the data marketplace build out at Microsoft that we worked on. Remember we just couldn't get any traction on it. So I knew what the problems were there. Well, the consumption model was hard. The data sets weren't really that impressive. The only one that we ever wanted to work on was the stats.com data set, because that thing was awesome and had all the sports stuff and that was cool. And it was really hard to get to.
Diego Oppenheimer (00:19:15): Still one of my favorite data projects I ever worked on, which is the World Cup Power Pivot demo that we did. The core of it was how do we create a consumption, an exploration and a catalog in a discovery methodology and a marketplace setting and then you could actually go and quickly implement it? And so now we had this concept of a marketplace. This is where things got really interesting. So then we started looking at it from so we have this like two sided marketplace. Everybody talks about it when you're building a marketplace, there's a chicken and egg problem. What people don't realize is that pretty much every marketplace that's been successful, there is an asymmetric view to it, in the sense that you can cheat, not in an illegal way, but you can cheat on one side. If you think about Reddit, the founders were creating all the first subreddits and publishing all the content.
Diego Oppenheimer (00:20:02): When you're trying to kickstart that flywheel you can always go on one side. We had this problem where we need models and algorithms that exist on our platform so people can consume them. But then nobody's going to consume them if there's no models and nobody's going to put models on it, if there's nobody consuming them. And so you have that chicken and egg problem. So then we realized well what side of this can we really affect? This is early days, then it kind of took a life of its own. We know how to write code. We know how to write algorithms. My cofounder and I was like, you need to go populate this. Go create a hundred users and populate it with all the natural language processing algorithms, forecasting algorithms, trendline algorithms, everything you can find that can be turned into an API, you just go do it.
Diego Oppenheimer (00:20:46): And so he did. And so he started doing that and building that out to create that gravity area. And the interesting thing, this is kind of where this story's going, is that because he's lazy in a good way, he said, well, if I need to go create all these algorithms and format them and publish them and create them into API, I'm going to automate this. I can automate this because I can grab code from GitHub and automate it. So what actually ended up happening is we created a packaging, deployment and scale-out platform for deploying algorithms, which ended up being the product. The marketplace was cute but the product actually ended up being the automation around packaging, deployment, integration, management of all those APIs, because of my co-founder being lazy.
Rob Collie (00:21:33): It's a meta algorithm. It's an algorithm for algorithms.
Diego Oppenheimer (00:21:38): Well the whole system around the whole platform, that's what Algorithmia ended up being, right? The whole concept of MLOps is how do I productionize with automation machine learning? The entire company came from that. Kenny was like, I'm just going to write packaging technology for this because I don't want to be doing this manually over and over again.
Rob Collie (00:21:59): Cool. So there was a pivot, right. It was sort of that accidental discovery.
Diego Oppenheimer (00:22:03): So the product was inside what we originally created. So yes, pivot, whatever you want to call it. It's kind of, you know, Slack was supposed to be a gaming chat. It was a gaming company that had a chat inside it and that became Slack.
Rob Collie (00:22:14): Yeah. Or like Instagram originally was this one corner of some other app, some sort of travel app or something like that. But check it out. Everyone really, really loves this one section where they can post pictures with filters. They're like, all right, screw it.
Diego Oppenheimer (00:22:28): There's the set of features that ended up being the entire company. And that for us, it was exactly that, it ended up being the whole company.
Rob Collie (00:22:34): So how long did that take for that to dawn on you? I'm sure you're in the weeds doing that for a while before you realize that is the most valuable mission.
Diego Oppenheimer (00:22:46): So I think the data started kind of playing itself out. We had a marketplace. The fastest growing sector of the marketplace was non-public algorithms. So you had two ways of doing it. You could use the technology and you could publish a private API for that model. People would use it, but it wouldn't participate in the marketplace dynamics, it wasn't available. You couldn't find it. It was only available to you. So you and Rob would come in, you'd grab your model and you would publish it. You would get an API, you could use it. And so what we started seeing is this growing number of people that were using the service without participating in the marketplace.
Rob Collie (00:23:27): Yeah. So they were hidden from the catalog. They were just using all those extraneous things, quote unquote this year.
Diego Oppenheimer (00:23:34): Exactly. Like all the tooling behind the scene and that sector, that population became growing and growing and growing. And at one point they were the driver. Cause we used to charge for essentially to compute under us. There's a service for actually deploying it. The concept was always publishing was free, but running and consuming was paid for. So you can publish something, you have to register stuff like that. But you didn't have to pay anything to publish an endpoint. Once you started consuming it, you had to pay for the consumption.
Diego Oppenheimer (00:24:03): So the publisher and the consumer were the same person in all of these private ones, right? So they're using the service and they're paying for the service to run their own stuff. And so that's when you start thinking, "Well, what's happening here? Why are they not participating in the market dynamics? And why are they just doing this?" And, well, you start realizing, "Oh, they're using it for the tool. They don't have a way of publishing these endpoints, they don't have a way of running them, they don't have a way of managing them." We just solved the ops problem for, essentially, their data science workload.
Rob Collie (00:24:33): Okay.
Diego Oppenheimer (00:24:34): And that's kind of how we started realizing that that was where our value prop really, at least the value prop that people were willing to pay us for, was.
Rob Collie (00:24:41): Gotcha. Well, let's dig in there. Let's slow cook that particular story. Because I want to learn a little bit more about it. So you mentioned originally the story about the automatic pivot tables and how it had been implemented, the algorithm had been implemented in some arcane, semi-academic language that didn't really have a shippable path to be part of Excel. So I'm assuming it's the same thing, that these customers that were doing the private, or the public, but let's just focus on the private people that weren't really part of the marketplace, because that turned out to be the real business.
Diego Oppenheimer (00:25:10): Yeah.
Rob Collie (00:25:11): I guess I'm really having a hard time at the moment understanding what the value that they get is. So for example, if I write some C sharp code, well, I don't have a problem running that, right? There's a million places. First of all, I don't write C sharp code, so this is a fantasy. But if I did, I would have a million different places to run it and I wouldn't need some service to run my C sharp code. So what is it about the machine learning stuff that particularly benefited from this productionizing service?
Diego Oppenheimer (00:25:41): Yeah. The people usually writing these algorithms don't have software engineering backgrounds.
Rob Collie (00:25:45): Okay.
Diego Oppenheimer (00:25:46): So if you think about the languages where data science came from, MATLAB, R, statistical packages, Python obviously now is the one that's taken over. They all are very, very statistically oriented packages and they're essentially scripting languages in a lot of cases where you are able to just very quickly write some code to do, you probably took like some stat class back in the day when you were in college that had some stats package or something like that. You're just writing these statistics and that's where the algorithms come from.
Diego Oppenheimer (00:26:16): Going from there to an actual running service is a much bigger leap than going from a random piece of C sharp code that needs to be executed. You look at the population of people who go into data science or traditionally have gone into data science, it's your statistically inclined PhD in physics, right? It's your statistically inclined biophysics person. So they have a very, very strong stats background, but a usually, usually, not always, a very weak programming background, but just enough. Just enough to build a model, get the thing, exactly what I had with that pivot table. So going from there to something that can actually be used operationally beyond your desktop, where you're just running a script to just clean some shit up, going to something like, "Hey, I'm going to run a fraud model that's going to be like processing transactions and needs four nines of uptime, and all this stuff." There's a big leap.
Rob Collie (00:27:03): Yeah. Okay.
Diego Oppenheimer (00:27:03): Between like just having code that runs and something that I can actually rely on as a service. And so we bridge that gap.
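The first step of the gap described here, going from a scoring script to something callable over HTTP, can be sketched with Python's standard library. This is an illustrative sketch under stated assumptions, not Algorithmia's implementation: `score` is a toy stand-in for a real model, and `handle_request`, `ScoringHandler`, and `serve` are hypothetical names. It deliberately leaves out most of what makes a production service hard (scaling, versioning, monitoring, the "four nines" of uptime).

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def score(transaction):
    """Stand-in for a data scientist's model: a toy rule, not a real fraud model."""
    amount = float(transaction["amount"])
    return {"fraud_score": min(amount / 10_000.0, 1.0)}


def handle_request(body):
    """Turn raw request bytes into (HTTP status, JSON bytes).

    Malformed input becomes a 400 response instead of a crash, which is
    one of the first things a desktop script usually does not handle.
    """
    try:
        payload = json.loads(body)
        result = score(payload)
        return 200, json.dumps(result).encode("utf-8")
    except (ValueError, KeyError, TypeError) as exc:
        error = {"error": f"bad request: {exc!r}"}
        return 400, json.dumps(error).encode("utf-8")


class ScoringHandler(BaseHTTPRequestHandler):
    """Expose handle_request as a POST endpoint."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, body = handle_request(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the endpoint locally; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), ScoringHandler).serve_forever()
```

Calling `serve()` starts a local endpoint that accepts a JSON body such as `{"amount": 2500}`. Keeping the request logic in `handle_request`, separate from the HTTP handler, means it can be tested without starting a server.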
Rob Collie (00:27:10): Okay. Very cool. So Tom, by the way, have you ever crossed paths with Tom? Do you know who Tom is? He's kind of a big deal.
Diego Oppenheimer (00:27:17): I don't think we have.
Rob Collie (00:27:19): He brings the Twitter check mark to our show. That's how [Crosstalk 00:27:22]. That's how legit-
Diego Oppenheimer (00:27:25): Do they even give ones of those away or are you like one of the OGs who has one of those?
Thomas LaRock (00:27:29): Yeah. I don't know. I got mine... they had already existed for a period of time. And then they just sort of said, "Everybody can apply." I think it was the first time they opened it up, so I just applied. I woke up one day, they go, "You've now been verified." Which to me just meant that "We believe you are who you say you are in case you harass somebody and we want to come find you." And I'm like, "Oh, okay, whatever, I'm verified." And then, but then it became a thing that people would be like, "I don't understand why so and so is verified, and these other people not." I'm like, "It's not that they don't know it's you, they want to know where to find you." There's a whole thing of, "You're harassing people and we're going to come and we're sending the police."
Rob Collie (00:28:08): We need to make a gif, one of those high quality Reddit gifs of the Goodfellas scene, he's walking in to be supposedly to be made. I'm Joe Pesci. And I'm like, "Oh yeah, I'm going to get my check mark today." And I'm talking to Tom, Tom's like, "Yeah, I remember when I got my check mark," and then I walk in the room and I see no one there I'm like, "Oh shit." And they shoot me in the head. You've been dabbling a little bit in this ML thing lately. Have you been using any MLOps services or you've been staying kind of in the ivory tower, it's inventing the future, but keeping it on a paper.
Thomas LaRock (00:28:44): First, I want to say before I forget, that algorithm as a service, would we pronounce that as aas?
Rob Collie (00:28:50): Apparently. [Crosstalk 00:28:52]. So these guys [crosstalk 00:28:54]. These guys are very successful. [Crosstalk 00:04:56] I wouldn't say that they got there on the basis of their naming. I remember-
Thomas LaRock (00:29:05): I just want to know. I just want to know when people say, "What do you do?" And you say, "I sell aas." Is that a thing?
Diego Oppenheimer (00:29:13): Can get in trouble.
Thomas LaRock (00:29:15): I hear there's a very...
Diego Oppenheimer (00:29:15): It's not legal, not legal in all places.
Rob Collie (00:29:18): But there's an age old demand marketplace for 'aas'. It's...
Thomas LaRock (00:29:21): There is. Right?
Rob Collie (00:29:24): It's never going out of style.
Diego Oppenheimer (00:29:26): People ask me like, "What's like the one learning you had from starting a startup and what were y'all doing?" And I say, "Look, don't ever start a company that doesn't start with an A. You're at the top of every list. It's alphabetical. No matter what, they're going to do the top 50 companies working on blah, blah, blah, the top 20 companies. Guess what? The A name starts first." So that's my, like [crosstalk 00:29:46].
Thomas LaRock (00:29:46): You got a triple A right there.
Rob Collie (00:29:47): Tom, we'll come back to this question. But while we're on the topic of naming, I had a conversation, Diego, when you were going off to do this with someone we both know, I'm not going to dox them, and we were talking about you going off to do this. And this other person said, "Yeah, I have no idea. I don't understand any of that. Rob, we're dinosaurs now. Diego's probably going to go off and be very successful even though we have no idea." I was a little bit more of a believer. I'm like, "Yeah, yeah. I think they're going to be very successful." But then the name of their company sounds like a heart irregularity. I'm not in on that. This didn't hold you back at all. I'm a bit of a naming snob I've become over the years.
Thomas LaRock (00:30:27): So speaking of names though, when I saw the name Diego Oppenheimer, I'm like, "That's an unusual combination." Now you mentioned Uruguay and I'm just going to guess, you look like you've played rugby.
Diego Oppenheimer (00:30:37): I do.
Thomas LaRock (00:30:38): That's what I figured. Although I cheated, I saw it on your LinkedIn profile, but yeah, I'm getting now, this will be the guess. Center back.
Diego Oppenheimer (00:30:48): I actually play in the front row. I play what you call hooker, which actually goes well with the 'aas' as a service part of the conversation, if we're trying to go full circle.
Thomas LaRock (00:30:54): Yeah. So he played hooker and I'm familiar. I played rugby. So I was second row and wing. If you can believe that. So I'm familiar with-
Diego Oppenheimer (00:31:02): That's a combo right there.
Thomas LaRock (00:31:03): ... the hookers, the props, the loose ends. Anyway, sir, as Rob mentioned, I've dabbled a little bit in the past. So let's say going on six years now, I decided I wanted to kind of pivot if you will, my career focus as a data professional and get more into the data science. I have a background in mathematics. And back when I was in school, if we had a career path or an educational path that said data science, I would've been all over it. Back in the day that really didn't exist. If you were doing math and stuff, you might, as you said, get a job as a quant. So, an actuary, things of that nature. But yeah, data science, I was before that time. So you mentioned service Rob. And what I'll mention to you is in the training that I've gone through over the last six years.
Thomas LaRock (00:31:50): So Azure had, I want to say, Azure machine learning studio, the original ML studio made it ridiculously easy for you to consume data, build a model and publish a service API. Ridiculously easy. I still remember doing that in order to pass one of the edX programs where I had to build something, I had to pass in data. I think it was images of bees, pass it in and rate back whether or not it was a bee or a flower or something like that. And or what type of bee it was. And it was just so easy.
Thomas LaRock (00:32:27): So here's the funny thing. I don't want to be bashing anybody, but now they have an updated ML studio, the Azure ML studio that's out there now. And all of a sudden it isn't as easy to publish an endpoint. I'm not even sure if it's possible right now. It's like something they just sort of said, "Ah, we'll get to that later." And then when I started researching it, go, "Hey, how come I can't do this simple thing I used to do before?" They were kind of like, "Well, you can, it's just hard. You got to do a lot more code." And what Diego talked about was that simple thing of you got people who could program and do the analytics using this code, but they didn't know software. They didn't know how to build a service that could be consumed by others in scale and all that. And that first iteration of ML studio got us there. And then I feel like in some aspects kind of a step or two backwards.
Thomas LaRock (00:33:13): Not that it's not a good service ML studio right now. It's just that some of the functionality, I think they missed that point of getting that into the hands of the people, these data scientists, in order to build something that then can be consumed publicly, privately, at scale. So anyway, the short answer for me is, yeah, Rob. Yeah, I've built a service.
Rob Collie (00:33:34): Oh cool. Well then Diego must have in a shady K-street deal, sent his lobbyists to talk to Microsoft and say, "Hey, well, my clients are a little bit upset. A little bit dissatisfied with how easy you're making things. We need you to put up some a little bit taller ramparts, if you will."
Thomas LaRock (00:33:53): Oh, I did think of that. We might have to cut everything I said, I did realize I was getting in on the Goodfellas deal.
Rob Collie (00:33:59): Yeah, everything needs an upgrade, Tom, just that sometimes, upgrades aren't an upgrade.
Diego Oppenheimer (00:34:09): When you make things simple, you usually, when designing to make things simple, you remove flexibility. When you're making things a lot simpler, you're trying to put like the happy path. You kind of remove flexibility. When you build for flexibility, you tend to sacrifice the simplicity. The balance is really hard. I think if you go look at Azure ML is a good example of that, the pendulum has gone back and forth. I think we're on iteration number four. It's a really good product. It's a lot of smart people working on it, but you've probably seen the pendulum go back and forth a couple of times where it's like, "Is this a pro tool? Is this a everybody tool?"
Diego Oppenheimer (00:34:49): I think if you start looking at it in the world of Microsoft, like again, from my outside view, I have no inside view of this, right? Other than my opinion, I can see how they think they're moving the world of Power BI to be more the simpler tool. Power BI is going to end up being much more than BI. They've already started absorbing predictive analytics into it. It's already being built and absorbing Azure machine learning functions into it. So the simplicity of building ML services is starting to surface in what I called the simple tool. The Azure ML tool has gotten more in the like, "We want to be more flexible, but we're more the pro tool."
Diego Oppenheimer (00:35:24): I think that's where you see kind of the duality of that ecosystem start to play out. It's really hard to build the same tool for both. That said, just because I have to do the plug for my new employer, you should totally check out DataRobot because it's actually amazingly simple to go from a data set, to building a model, to deploying it all through the UI, or code, if you want to. You can see where a company's dedicated most of its time to solve that problem. The speed of data to insight and kind of publishing that. So anyway, that'll be my last plug. Maybe not my last plug, I might write a line, but you should check it out.
Thomas LaRock (00:35:59): I want to hear more about DataRobot. I'm very intrigued.
Rob Collie (00:36:04): Yeah, what is podcasting for, if not for guests talking their own game.
Thomas LaRock (00:36:09): I absolutely want to hear more about DataRobot right now. I'm all in. I saw where you worked. I'm like, "DataRobot, I've heard of you. I know that."
Diego Oppenheimer (00:36:19): Yeah.
Rob Collie (00:36:20): Let's slow down for a moment and talk about that simplicity and complexity curve that comes up over and over and over again in software, whether we're aware of it or not. So as much as Power BI, so I want to almost like take exception to something you said, but not really. I think we're going to find ourselves in agreement as usual, but Power BI kind of wants to be a simple tool, but it is too damn flexible to be simple. It's just too good to be simple.
Rob Collie (00:36:44): Like take the DAX language and the M language, for instance, both of those things, in my opinion, just they belong in the all time software hall of fame for end to end capability. They're amazing. You almost never fall off the end of those two things in terms of what their intended usages are, anyway. You don't really use DAX for ML, but within its space and it actually overflows its banks really effectively, like you can end up doing things with DAX that were never, ever, ever imagined by its inventors.
Rob Collie (00:37:12): What will happen though is that people who are transitioning over from something like Tableau or some other tool where they had this one thing that that tool made super, super simple for them, it's just a click. And it happened to work. They happened to be lucky that the simple one size fits all flavor that was offered by that tool actually 100% fit their business needs. They don't realize how lucky they were that they fell into that bucket. And then they get over to something like DAX. They're like, "Why is it so hard?" Again, it's that trade off. It can do basically anything.
Rob Collie (00:37:41): So every time as a company that, this isn't the only thing that we do, but a huge chunk of our business is helping people get the most out of Power BI, every time we see Power BI adding some new like quick measure feature or auto this or auto that, we're like, "Oh yeah, bring it on." All it's going to do is seduce people into thinking that it's easy when it's still... Data is hard. It's just always going to be hard. There's always going to be something hard about it. Sooner or later, either you're going to learn it yourself and become super, super competent or you're going to need some help.
Rob Collie (00:38:12): I view a lot of those auto magic features that they keep trying to add to Power BI, I view them as more marketing than actual functionality for that same reason. Because DAX takes a lot of heat for being complicated.
Diego Oppenheimer (00:38:24): There's a lot to unpack there.
Rob Collie (00:38:25): Let's slow it down even further. We're going to slow down, slow down.
Diego Oppenheimer (00:38:29): I think there's combo here. So features are coming out, people are like, "Hey, this is complex." But there's also a whole new population of people who are actually doing stuff with data who didn't before. The barrier to entry is being lowered. It is complex and some of these things are complex, but you actually have a much wider population of people who are subject matter experts, because they understand the marketing data or the process in sales or whatever it is that you're working on.
Diego Oppenheimer (00:38:52): Suddenly they're given a tool that they can actually be productive with, but there's a learning curve for them, and it's not only a learning curve on the tool, it's a learning curve for them on understanding how to work with data. We have a lot more people in the world who can actually do that now. This is actually kind of one of the things I think is exciting about these industries: it's a reasonable thing. I do buy into this idea that every single information worker can be a data person with the right tool and the right education, the right kind of approach to it.
Diego Oppenheimer (00:39:24): When you have a generic tool like Power BI, where the population's getting wider, faster than they can build, there's more people coming to it from different experiences. There's like the hardcore people writing DAX all the way to the, "This is the first time I've worked with data." As that audience gets wider, that problem that you see, which is like, "Hey, how do I use this, this is too complex, this new feature." I think that's just because you have a wider range of skill sets in the population you're serving, because you have more people who are essentially signing up for being data people in these organizations. So, that's just my kind of uber comment on the whole thing. I think you have a wider population, a wider skillset. This is overall a positive thing, but it also makes it hard to build for a widening population.
Rob Collie (00:40:06): Of course, you and I know that most of those people that were saying that are sort of newcomers to this game, they're coming from Excel. I agree with you that information workers, almost all of them, can become data people. But again, there's that interest factor, there's the enthusiasm factor. It either speaks to you or it doesn't to be using tools to get better at it. That's what I think is actually the data gene is more the enthusiasm for it, rather than the capability. I think it's much more of an enthusiasm thing, and not everyone's got it.
Diego Oppenheimer (00:40:37): Yeah. There's a career to be made about it. There is enthusiasm out, but it's also kind of critical. I mean just think about we rewind to even right when you started, right? DBAs were still a thing, right? That was the-
Rob Collie (00:40:48): Burn, Tom! I'm just kidding.
Diego Oppenheimer (00:40:51): Right? I mean me too, but I'm just saying, and I'm not saying that DBAs, I mean their role has evolved so much. You're not this person stuck in a basement in IT managing fields.
Rob Collie (00:41:01): Now you're speaking Tom's language.
Diego Oppenheimer (00:41:03): Managing fields in a database. I mean maybe they do exist, but I mean, it's pretty rare to see them. They've evolved way more into data engineering principles and to building this. So the people who actually are working with data has expanded quite a ton, right? And the barrier to entry has been lowered quite a bit. Did I offend everybody on the podcast?
Rob Collie (00:41:21): No I think it's- no, no.
Diego Oppenheimer (00:41:22): Is that-
Rob Collie (00:41:24): I mean I saw Tom go through an entire range of emotion.
Thomas LaRock (00:41:27): Did I get a jackpot up there.
Rob Collie (00:41:28): He went from one end of the spectrum to the other in like three seconds. It was awesome. It's called range.
Thomas LaRock (00:41:35): Yeah.
Diego Oppenheimer (00:41:36): Yeah, range. There you go.
Rob Collie (00:41:38): Tom explored the space of his own emotions in that moment. Next up cowbell. Yep. All right. So we slowed down the slowdown. Now we can come back to the original flat circle. When did DataRobot come onto the scene? When did you end up working there as opposed to Algorithmia?
Diego Oppenheimer (00:41:56): Yeah, so DataRobot acquired my company in August of 2021.
Rob Collie (00:42:03): Not that long ago.
Diego Oppenheimer (00:42:04): So, this is pretty recent, but DataRobot's history has come into the scene, so DataRobot started in 2012. It was started by two data scientists who worked in the insurance industry, these original what I would call actuarials plus plus who do data analysis on kind of risk profiles and building it. The core concept behind the company: it was started as what is called an automated machine learning company or auto ML. What auto ML does is a series of techniques where you grab the data and you look for best fit of that data using a bunch of different algorithms and techniques for an optimization that you're trying to do.
Diego Oppenheimer (00:42:42): So let's just say, I'm looking for a model that's going to predict in the best way if this person's going to churn or not. Okay, we're going to grab a whole bunch of the data, what's the data that's relevant, what's the data I have, apply a whole bunch of techniques and go look at the training set, what that's going to do. And they were kind of like the, to a certain degree, if not the inventors, the very, very early pioneers of this auto ML concept. It's interesting because the auto ML concept really opens up to a new population.
Diego Oppenheimer (00:43:09): So we were talking about widening populations. It opens up to what we call the citizen data scientists. I'm not a huge fan of the naming convention. I wouldn't have invented this, which is why I'm not in marketing, but like the core concept of it is like, "Hey, there's these BI plus plus people who are statistically inclined, who understand some of the math of it. And if they were given the tool, they could build machine learning models with their subject matter expertise in a much faster way. If we just give them that tool, we can expand that population."
Diego Oppenheimer (00:43:37): So versus looking at coding, machine learning engineers, or even coding data scientists to a certain degree, we can expand this to people who get statistics and are subject matter experts, which is not super different than what we were thinking about when you were starting kind of the Power BI and Power Pivot journey, which is, "Hey, can we turn any Excel person into a more advanced data analyst?" So kind of that same motion again, playing out. So that's kind of the origins of the company.
Diego Oppenheimer (00:44:03): The company since then has grown quite a bit and now offers tools in the entire life cycle of machine learning. It's a true platform. I joke around because it's like, I'm almost playing the same movie again, right? There's the tool to create [inaudible 00:20:17] to create the models that you had, like Power Pivot, and then those models need to be published and certified and governed. So they need to go back into analysis services. Well, guess what, we now have a tool that builds models automatically and there's this concept called MLOps, which is our productionizing and governing. So that same movie is playing a little bit again, but it's interesting because I think it's the next evolution. It's the next iteration of this same paradigm of widening the population, getting people to be able to work with their own data and be subject matter experts in the predictive world and then managing that whole concept.
Diego Oppenheimer (00:44:49): So that's kind of what DataRobot does. So we have a service you can go sign up for, datarobot.com. You can essentially like upload your data set, even if it's like a basic CSV and you can actually start building machine learning models within seconds and kind of testing those out and building out. So it's really opening up the aperture of people who have accessibility to be doing these kind of predictive analytical workloads.
Rob Collie (00:45:10): That auto ML thing that sort of the original origin of DataRobot. If I understood it, let me see if I can play it back to you. I might not have understood it. So we'll see. I give it a data set and say, "Hey, I know you got lots of algorithms, different kinds of algorithms, different strategies that might be able to predict things based on a training set, but I have no idea which one of those is going to be best for my data. So I'll just sort of give you the inputs and the outputs. Given these inputs, I need to predict this." And just hand it to you, and it then tries lots of different algorithms and sort of like almost has like a competition behind the scenes, and figures out which one is the best fit so I don't have to know going in which one to try. Is that basically it?
Diego Oppenheimer (00:45:52): You got the concept pretty quick.
Rob Collie (00:45:53): That's awesome.
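The competition Rob describes can be sketched in a few lines of Python. This is only a toy illustration of the idea, not DataRobot's implementation: the three candidate "algorithms", the synthetic churn data, and every name here (`run_competition`, `threshold_rule`, and so on) are assumptions made up for the example.

```python
import random

def majority_class(train):
    """Baseline: always predict the most common label seen in training."""
    labels = [label for _, label in train]
    winner = max(set(labels), key=labels.count)
    return lambda x: winner

def threshold_rule(train):
    """Pick the cut-off on x that classifies the most training rows correctly."""
    best_cut = max(
        (cut for cut, _ in train),
        key=lambda cut: sum((x >= cut) == label for x, label in train),
    )
    return lambda x: x >= best_cut

def nearest_neighbor(train):
    """Copy the label of the closest training example."""
    return lambda x: min(train, key=lambda row: abs(row[0] - x))[1]

def accuracy(model, rows):
    """Share of rows where the model's prediction matches the known label."""
    return sum(model(x) == label for x, label in rows) / len(rows)

def run_competition(candidates, train, holdout):
    """Fit every candidate on train, score it on holdout, best score first."""
    scores = [(accuracy(fit(train), holdout), name) for name, fit in candidates.items()]
    return sorted(scores, reverse=True)

# Synthetic data (an assumption): x is months since the last purchase,
# and the label is True if the customer churned.
rng = random.Random(0)
rows = [(x, x >= 6) for x in (rng.uniform(0, 12) for _ in range(200))]
train, holdout = rows[:150], rows[150:]

candidates = {
    "majority_class": majority_class,
    "threshold_rule": threshold_rule,
    "nearest_neighbor": nearest_neighbor,
}
leaderboard = run_competition(candidates, train, holdout)
for score, name in leaderboard:
    print(f"{name}: {score:.2f}")
```

The point of the holdout split is that each candidate is judged on rows it never saw during fitting, so the user only supplies inputs and known outcomes and reads the winner off the leaderboard.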
Thomas LaRock (00:45:55): But here's my question about that. So I've used the auto ML with Azure. Azure has auto ML and I haven't used the one for AWS yet, but I intend to compare and contrast the two services. But here's the thing. When I'm building models, when it comes to say the scoring, usually you're building three, four models, right? You score and then you combine the results in order to come up with something that might be better than if you had chosen any one. Are you doing that as well? Or is the end result, it's actually a blend of models that work best, it's not just one [crosstalk 00:46:29]. That's what I wanted to say.
Diego Oppenheimer (00:46:31): It can be. Yeah. There's multiple double clicks into that. So like the basic concept was what Rob described. Say like V1 was that. Now as you evolve into that space, there's what data's important, what's not, can you automate the cleaning process, "Hey, so for the decisions for the model and for this data, all of this stuff in the data set is completely irrelevant. Get rid of it." Now I got a performance gain. "Hey, by the way, these four things," what are called features, right? There's the whole concept of feature engineering. If you think about it in the table format, what are the columns that matter? Did I buy something in the last three months, is the number one predictor of, "Am I going to churn as a customer or not?" So that is an important column, but what my wife's maiden name is or whatever, like is not.
Diego Oppenheimer (00:47:09): So you can start cleaning up the data set based on importance. A lot of the concept behind machine learning is you're teaching a machine to recognize a pattern based on patterns it's seen before, so that when it sees something it's never seen before, it can assign a probability of what it thinks it is. I mean like that's like the ultimate oversimplification of the situation, right?
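Diego's "which columns matter" question can be illustrated with a deliberately crude relevance score. The sketch below is an assumption-laden toy, not a real feature-importance method from any product: the eight customers are invented, the column names are hypothetical, and the score is simply the gap in churn rate between customers who have a flag and those who do not.

```python
def churn_rate_gap(rows, column, target="churned"):
    """Absolute difference in churn rate between rows where the flag is 1 and 0."""
    flagged = [row[target] for row in rows if row[column] == 1]
    unflagged = [row[target] for row in rows if row[column] == 0]
    return abs(sum(flagged) / len(flagged) - sum(unflagged) / len(unflagged))

def relevant_columns(rows, columns, min_gap=0.1):
    """Keep only the columns whose gap clears a (hypothetical) threshold."""
    return [c for c in columns if churn_rate_gap(rows, c) >= min_gap]

# Invented data: (bought_last_3_months, maiden_name_starts_with_m, churned)
columns = ["bought_last_3_months", "maiden_name_starts_with_m"]
raw = [
    (1, 1, 0), (1, 0, 0), (1, 1, 0), (1, 0, 1),
    (0, 1, 1), (0, 0, 1), (0, 1, 1), (0, 0, 0),
]
customers = [dict(zip(columns + ["churned"], values)) for values in raw]

gaps = {c: churn_rate_gap(customers, c) for c in columns}
kept = relevant_columns(customers, columns)
print(gaps)   # recent purchases separate churners; the maiden-name flag does not
print(kept)
```

In this made-up table the recent-purchase flag shifts the churn rate by half, while the maiden-name flag shifts it not at all, so only the first column survives the cleanup.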
Diego Oppenheimer (00:47:29): So like, I'm saying, "Hey, I've never seen the sample before, but I think it's this based on all these other patterns." So especially when you have a data set where you already know the results and you have the data that you started with, now you can actually be like, "Okay, well, how did it perform against that?" Back testing, like how did I perform against that. You can actually start. So now your techniques, there's an ability to grab those techniques and actually verify which model worked best, what combos worked best, what ensemble of those models worked best. And then what data they'd need. So you can start automating a lot of these pieces as you go, once you kind of get that core concept.
Rob Collie (00:48:03): Wow. This blend thing is fascinating. I'm imagining, like I'm sitting in a room with my six trusted advisors, and they're each named after an algorithm. They each are an algorithm, and I keep throwing a customer up on the whiteboard. They all vote, whether they're going to flee or churn or not churn for instance, right. I put up a customer and one of my experts goes, they're going to churn. And I look at them and go, Jimmy, you always say that about this kind of customer, I know not to trust you. So it's like a meta ML on top of the other algorithms. That's deciding which algorithm is most trustworthy in the circumstances based on history, is there something like that going on?
Thomas LaRock (00:48:39): It's more of an average I would say. Like your six advisors, you would say, okay, five said yes, and one said no, so that's an 83% chance.
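The two ways of combining the six advisors can be put side by side. This is a simplified sketch under stated assumptions: `average_vote` is Tom's plain average, while `weighted_vote` is one naive reading of Rob's "who do I trust" idea, weighting each advisor by a past accuracy figure. The accuracy numbers are invented, and real ensembling methods such as stacking typically train another model on top rather than using raw accuracy as a weight.

```python
def average_vote(votes):
    """Tom's version: the share of advisors voting 'churn' is the churn probability."""
    return sum(votes) / len(votes)

def weighted_vote(votes, track_record):
    """Rob's version, simplified: weight each vote by that advisor's past accuracy."""
    total_weight = sum(track_record)
    return sum(v * w for v, w in zip(votes, track_record)) / total_weight

# 1 means "will churn", 0 means "will not churn".
votes = [1, 1, 1, 1, 1, 0]

# Hypothetical backtest accuracy per advisor: the five who say "churn" have
# been coin-flip reliable, the lone dissenter has been right 90% of the time.
track_record = [0.5, 0.5, 0.5, 0.5, 0.5, 0.9]

print(round(average_vote(votes), 2))                 # 0.83
print(round(weighted_vote(votes, track_record), 2))  # 0.74
```

With equal trust, five of six votes gives the 83% Tom mentions; once the dissenter's stronger track record is counted, the combined estimate drops.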
Rob Collie (00:48:46): I see. I've heard about lots of things like this in the past, like in the space program or in the military, like really mission critical things. They'll always have three different algorithms calculating the same task. And when they disagree, it's a vote between the three, there's always an odd number of algorithms calculating the same thing, so that if one of them has some sort of eccentric bug and some sort of circumstance, hopefully the other two, like maybe the Boeing MCAS system should have had that triple redundancy.
Thomas LaRock (00:49:12): Well, maybe it did. I don't know.
Rob Collie (00:49:14): Yeah, two of the systems say crash the plane.
Thomas LaRock (00:49:18): Yeah, right. We don't know. [crosstalk 00:49:22] We're going to get sued. Boeing is going to sue us.
Rob Collie (00:49:26): Oh, that would really put us on the map, wouldn't it? It's like the original Air Jordans banned in the NBA.
Thomas LaRock (00:49:35): That's right. That was a smart move.
Rob Collie (00:49:37): Yep. Yep. He would've been a terrible basketball player if they hadn't banned those shoes, he would've never been famous. They really messed up there. Okay, so DataRobot obviously they were, when they acquired Algorithmia, they were looking to bolster their offerings in that space. I've personally never been acquired. Obviously there's lots of things that are private about that. What's that been like being absorbed?
Diego Oppenheimer (00:50:01): It's been super interesting. I think we were lucky in the sense that we had approached the same problem from different angles, so our products were really complementary. We could literally stack the products up. There's a bunch of different kinds of acquisitions like you and I have probably looked at a couple of the acquisitions at the Microsoft world, from the inside and you either buy market share is one way of doing acquisition, you're trying to buy a market. So you're buying the customer base and you're buying the contracts and you're doing that. You're buying a technology, you care about this technology because you have a gap in your technology and it's faster to purchase it and integrate it than to get there yourself. And then the third one is where you're just acquiring because you have people that it's faster to buy people who are smart and do things and to get there.
Diego Oppenheimer (00:50:46): That's the more traditional acqui-hire where you get a group of people who know how to work together, and it's like, "Hey, I can put a team on this." We were in this second category, which was, there was a gap in the market for them and the product and we filled that gap really quickly, the products worked together. And so one of the interesting things is we talked a little bit about how Algorithmia itself had really focused on this kind of like deploy software engineering principle, the ops side of getting the stuff up and running, and that was the hard part. Well, DataRobot had looked at what we call MLOps from a very data perspective, and on the data science side, the things that matter are like, "Hey, what happens when my data changes? I need alerts on that."
Diego Oppenheimer (00:51:24): Like, "Hey, I have a model, but now suddenly all the data that it was trained on has changed." It's called model drift, or my model is not being as accurate anymore. So one thing that we didn't discuss here, there's a core difference between coding in the world of machine learning and coding in a traditional world is that, in the traditional world code is deterministic, right?
Rob Collie (00:51:42): Yep.
Diego Oppenheimer (00:51:43): It works just as code. In the world of machine learning it's probabilistic. So it could be different, the results can be very, very different because the data has changed or the statistical probability changes, but the code has not changed. So you have to adapt your systems to say, "Hey, nothing changed, but something changed because the data that's going through these systems is quite different." So it's that difference between deterministic and probabilistic.
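The "nothing changed, but something changed" situation can be shown with a model whose code stays fixed while its inputs move. The sketch below is a minimal illustration under assumptions: the fraud rule, the transaction amounts, and the two-standard-deviation alert threshold are all invented, and comparing means is only one simple way to flag drift.

```python
from statistics import mean, stdev

def fraud_model(amount):
    """A frozen scoring rule (hypothetical): its code never changes below."""
    return amount > 50

def drift_score(training_values, live_values):
    """How many training standard deviations the live mean has moved."""
    return abs(mean(live_values) - mean(training_values)) / stdev(training_values)

def has_drifted(training_values, live_values, threshold=2.0):
    """Raise an alert when the live mean sits far outside the training range."""
    return drift_score(training_values, live_values) > threshold

def share_flagged(amounts):
    """Fraction of transactions the unchanged model marks as fraud."""
    return sum(fraud_model(a) for a in amounts) / len(amounts)

training_amounts = [20, 25, 22, 30, 28, 24, 26, 23]
same_world = [21, 27, 24, 29, 25]   # live data that looks like training data
new_world = [80, 95, 70, 88, 91]    # live data after customer behavior shifted

print(share_flagged(same_world), has_drifted(training_amounts, same_world))
print(share_flagged(new_world), has_drifted(training_amounts, new_world))
```

The same `fraud_model` flags none of the first batch and all of the second, which is why the monitoring has to watch the data rather than the code.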
Diego Oppenheimer (00:52:07): And so DataRobot had built technology for the, what I'd call the data scientist to understand those kinds of things like data drift and model drift and things like trust and bias in AI and a lot of that is like, "Hey, if I train a model that's only seen people of a certain ethnic background and economic social background, it's just going to make predictions inside of that." They had a whole lot of tools built around how do you balance data sets and how do you do things like that, so we were on this IT side of things, they were on the more like science side of things, and when you combine the two things is really what you needed. So that was like the genesis, when they approached us, had a couple conversations with their CEO and their head of product and said, "Hey, this is our vision, this is how we see the world working in the world of machine learning, this is what we want to be as a company."
Diego Oppenheimer (00:52:52): And I got pretty excited because a big part of that mission was very similar to those original vision documents that went around before the genesis of Power Pivot, which is, "Hey, we have this tool, we can make everybody who has Excel a data analyst, we have the server products to operationalize this whole thing." This is a $10 billion business for Microsoft right now. It was a $0 business when we were talking about it. And there was a question, is it going to be a real business, I think you and I probably believed it was always going to be a big business. But I think it's amazing.
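One very simple way to balance a data set is random oversampling: repeat rows from under-represented groups until every group is the same size. The transcript does not say which techniques DataRobot's tools use, so this sketch swaps in that basic technique purely for illustration; the `oversample` function and the `region` column are hypothetical, and balancing counts alone does not guarantee an unbiased model.

```python
import random

def oversample(rows, group_key, seed=0):
    """Duplicate random rows from smaller groups until all groups match the largest."""
    rng = random.Random(seed)
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row)
    target = max(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(members)
        # Top up this group with randomly repeated rows from the same group.
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced

# Invented training rows: nine from one region, one from another.
rows = [{"region": "north", "id": i} for i in range(9)] + [{"region": "south", "id": 9}]
balanced = oversample(rows, "region")

counts = {}
for row in balanced:
    counts[row["region"]] = counts.get(row["region"], 0) + 1
print(counts)
```

After balancing, both regions contribute nine rows, so a model trained on the result no longer sees one group ten times less often than the other.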
Rob Collie (00:53:24): Well, I didn't know. I didn't know. I was doing my usual cavalier software engineering thing. Like, Hey, this is fun, and I believe in the mission, but it's not going to really be that big a deal, is it? And then when I went out and used it on my own for the first time, and I was just like, oh my God, that's when I really knew. But you were still at Microsoft, you and I got to know each other, I think primarily in that era after I'd already completely lit up with it going, this is going to be world changing.
Diego Oppenheimer (00:53:51): Yeah. Well, I bought into this idea that Microsoft was going to be the BI player, like the.
Rob Collie (00:53:59): Yes.
Diego Oppenheimer (00:53:59): It only took 10 years, but top of the Magic Quadrant, whatever you want to call it, if that's important still, it's at the top now. It only took a decade. I saw a lot of similarities. There's a lot of emotion that goes behind, like, selling your company, especially something that you worked on for seven years, you were the CEO, I had 60 plus employees. There's a whole thing around this where you go through a little bit of Jekyll and Hyde, like every day: this is great, this is terrible, oh my God, what am I doing. So there's a lot of, like, emotion attached to that.
Diego Oppenheimer (00:54:25): But in general, I think the biggest thing for me was I saw the same ingredients of being able to actually do almost the same movie, but again, in what I would call the future of data, which is this predictive analytics. And so for me, that's what got me really excited about the acquisition and going through it, and then ultimately where I was like, if we have a shot at building that system that we built at Microsoft again in this space, like sky's the limit. And I still believe that, obviously I'm six months in, but I really believe that.
Rob Collie (00:55:02): The part about being very emotionally attached to the company and all of that, I can understand that for sure. The other thing though, is that the big change for you is that you've gone from being in charge to now you're back to being part of a bigger organization. I'm wondering how that feels?
Diego Oppenheimer (00:55:18): I have a pretty strong personality and I'm very opinionated and I've never been shy about that. I never really cared for being a CEO, I just like getting shit done. That was my attitude at Microsoft, that was my attitude when I started the company, I really like getting stuff done. And obviously I don't want to work for assholes, and I'm lucky enough that I will never need to work on something I don't believe in or don't want to work on, ever again. And that's awesome. That to me is the ultimate privilege, being able to just work on the stuff that you want. That's winning in my opinion, like things that expand your horizons and that you want to do.
Diego Oppenheimer (00:55:50): I get along really great with my boss, who's the CEO of the company. There's a ton to learn from him as a person and what he does, and I have empathy for his role.
Rob Collie (00:56:00): Yeah.
Diego Oppenheimer (00:56:00): And what he does. And I can see it, and I know the parts that suck and the parts that don't suck and like what you have to do. But to me being CEO was just a consequence of what needed to be done at the time, not the role I pursued. I didn't leave Microsoft to be my own boss. That was never even, like, in the top 10 things I cared about. Being my own boss was not one of them.
Rob Collie (00:56:21): Rewinding now. At Microsoft, you didn't seem terribly overwhelmed or intimidated by the management structure above you. And that's not just me politely saying something like you never listened or something like that. No, you weren't that way, but you just floated along, did your thing, people didn't really get in your way.
Diego Oppenheimer (00:56:37): If you could show results, people get out of the way, that was the thing, so I always just find situations where I can actually produce the results. And as long as I can produce the results, people just get out of your way.
Rob Collie (00:56:46): Good strategy, yeah. Don't interfere. Don't go get in his way, whatever you do, just stand back. You've got that track record. You're not going to find yourself Chafing under the yoke so much are you, as long as people who are managing you are smart. That's cool to hear. So machine learning, we used to talk about AI, I know that we still do a little bit. ML is sort of like become the dominant phrase and there's some machine learning and it sounds like the majority of it, even in this conversation today, the majority of it is "just math", just the statistics, just the things that the actuary plus plus crowd was doing, and that's different from the things that intentionally try to mimic biology. Like neural networks are a different thing, genetic algorithms are a different thing.
Rob Collie (00:57:37): And those were the things that when I was in college in 1993, those things were exciting. I took a neural network course from the psychology department. The computer science department didn't offer this, the math department didn't offer this, this was only in the psychology department. And you could tell that the professor teaching this course wasn't regarded as one of the reindeer games crowd in the psychology department, this was a black sheep of the psychology department. Like, oh yeah, that's that guy again, he's off doing that neural networky stuff in the computer lab, like why can't he be doing the same things as the rest of us, like attaching electrodes to people's palms and showing them movies and seeing how they respond, that's psychology. Then along the way, I also encountered SSAS data mining.
Rob Collie (00:58:22): And I think a lot of the algorithms that SSAS data mining was pushing back in the day are some of these big, effective ML algorithms of today. That product had a really hard time gaining any traction. It might be that really we didn't understand it at the time, but it might have been just exactly this productionizing thing. Predictive algorithms are generally at their best when they're operating on very, very, very low grain, highly granular data, and making very specific judgments about individual records essentially, you can generally speaking almost immediately put those judgment into production. They can go and result in action. You had to wire these things in, it wasn't a great fit to like publish reports from these.
Rob Collie (00:59:05): But even then though, that whole corner of that product was just so neglected. It was always struggling for attention. And now we have ML, it's everywhere, and we've got AutoML and MLOps and algorithms as a service, we have all these things. Do you have any perspective on why, maybe it's just all the things we've already talked about, but why SSAS data mining didn't end up being like the apex predator in the ML world? Because they were first, damn it. In big ticket software they were first, right?
Diego Oppenheimer (00:59:39): Yeah.
Diego Oppenheimer (00:59:39): So I can't comment on that, like the product at all to be honest, because I don't know enough about it or the history or anything, but as an industry in general. So first of all, it's like, Hey, I took in 1993, I took this, well guess what? The most common machine learning algorithms, they haven't changed since the 80s, if not before that. The common algorithms in machine learning are linear regressions, logistic regressions, support vector machines, Naive Bayes, K-means clustering, Random Forest. There's not like some like super thing that got invented, there's like gradient boosting algorithms, these are all [crosstalk 01:00:11]
Rob Collie (01:00:11): I have to say XGBoost.
Diego Oppenheimer (01:00:12): Yeah, yeah. Like these are statistical methods that have been known forever. Ever since we've had computers, like the algorithms in themselves have existed. The things that have changed are the quality of what I call signals data. So all of these algorithms are based on signals, and how do you process those signals and what do you do with them. And so the quality of the data has changed because you can prepare it, you can reduce it. And the availability and effectiveness of compute has changed. I can do more, and so like neural nets have existed since the 80s. Why weren't neural nets everywhere? Well, because we couldn't actually get the computational power to be able to even look at all the data that needed to be looked at to actually do the learning that is needed behind them. The methodologies that we use for computer vision today have not changed since like what they call the AI winter in the 80s, where like all this stuff came out and people were like, well, how do I use this? Well, you don't have the computational power, you don't have the data. It was insanely expensive, just think about storage.
Rob Collie (01:01:16): Yeah.
Diego Oppenheimer (01:01:16): Like compute processing, all this stuff has essentially in the last 10 years become like a tax on electricity, that is about the cost of doing computation at high scale. Think about CPU power, GPU power, like storage, networking, like all this stuff. You can go today and spin up the computing power that an entire national lab would've had 20 years ago on a $700 AWS bill, if you wanted to right now.
Rob Collie (01:01:43): That's crazy.
Diego Oppenheimer (01:01:44): And so like that's one big differentiator, same method, same algorithms with much more computational power, one big differentiator. The other one, I talked about quality of signal. Everybody talked about big data, we were all in it. I'm sure you and I were in the discussions where like, we probably had to stop some executives at Microsoft being like, don't call that big data, that's just a large Excel file, that's not big data, I'm sorry. I remember stopping a demo, a very public demo, because like, please don't call this big data, you're just going to make an ass of yourself.
Rob Collie (01:02:14): Wait, wait, which kind of that, algorithms as a service or?
Diego Oppenheimer (01:02:17): Yeah, yeah. Not that one. Right.
Rob Collie (01:02:19): Okay, the other. Okay. Okay. Yeah. Yeah.
Diego Oppenheimer (01:02:21): And so the core of it was the world of big data and where it had brought us to, and a bunch of these technologies, we've stored so much signal. And so now these algorithms, now we have the computational power with the application of these algorithms to be able to process that signal, we can actually do stuff with it. The reality is that we can now actually do real stuff with all that data that we collected. And then the final concept of that is that we've also created abstraction layers. Nobody goes in and writes their own K-means algorithm. They grab a tool that will apply K-means and prepares the data and sets it up and absorb it. So like there's an abstraction layer on top of like just, I like forced myself into this hour.
Diego Oppenheimer (01:03:07): So the combo of those three things have made it that now we're rushing into this new era, and so a lot of that stuff to your original point didn't exist when the data mining. So we didn't have the computational power, the companies didn't have access to it, you couldn't apply the data, the algorithms were probably not as well implemented from an abstraction perspective, they were too complex. So it's none of these three things on their own, you have this perfect storm. Like, that's why the era of machine learning's been ushered in right now. It's like, I have the computing power, I always had the algorithms, I have the abstraction layers and I have the accessibility, and all those three things now combined actually allow me to do this stuff.
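To make the abstraction-layer point concrete, here is a minimal sketch of the kind of loop a K-means tool hides from its users. It is not from the episode and not any particular library's implementation: the function name `kmeans_1d`, the one-dimensional data, and the fixed iteration count are all assumptions chosen to keep the example short.

```python
import random


def kmeans_1d(values, k, iterations=20, seed=0):
    """Hypothetical helper: Lloyd's algorithm on a list of numbers.

    Returns the k centroids in ascending order.
    """
    rng = random.Random(seed)
    # Start from k distinct data points picked at random.
    centroids = rng.sample(values, k)
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centroids)


# Made-up data with two obvious groups, one near 1 and one near 9.5.
data = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centers = kmeans_1d(data, k=2)
print(centers)
```

A packaged tool wraps this same assign-and-update loop together with data preparation, initialization strategy, and convergence checks, which is the part most practitioners no longer write by hand.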
Rob Collie (01:03:42): That's a phenomenal answer. I really appreciate that. I just got smarter. I seriously learned something there. I appreciate that. It reminds me of probably like a Reddit thread or a Quora thread or something. But like someone asked, if you could go back 200 years in history and take with you all, like 100% of modern society's knowledge, would the country you pledged your sword to, like, dominate? And the answer is no, you could go back with all that knowledge and you couldn't even make a pencil because all of the infrastructure required to make a pencil isn't just going to like materialize out of thin air. I think it's in Guns, Germs, and Steel or one of those.
Rob Collie (01:04:22): At one point in time in a steel mill, the only thing in the entire steel mill made out of steel was the thing that was crushing the rock, and the rest of it was made out of wood. Everything in there was made out of wood because they didn't have any steel, you had to bootstrap. And it took a very, very, very long time for the society around you to, it sounds very familiar, you can have the knowledge, and it also harkens back to "the future's been invented, but it's stuck in a paper." There's a lot of supporting infrastructure that needs to come into play.
Diego Oppenheimer (01:04:54): It's that gap between idea and execution.
Rob Collie (01:04:56): Which is a huge gap, huge gap, ideas are worth nothing it turns out, as a young nerd, you think ideas are everything. Nope, nope, nope. Cheapest thing on the planet. I got to ask, of all of the workloads that are running through DataRobot right now, what percentage of those workloads ultimately are running on GPU versus CPU? Just a rough guess?
Diego Oppenheimer (01:05:21): So I don't know what the split is, to be honest, I wouldn't even know where to guess. I know it's likely to be over-rotated to CPUs by a bunch, but here's the reason why. The AutoML strategies and the modeling that's evolved over the last 10 years have been like hyper optimized for being super efficient, cheap, fast, and light to run on CPUs. There's been a lot of work that's been done to get those things like just, they scream in a good way on CPUs. And so CPUs in general are like four times cheaper than GPUs. The reason why they use GPUs in a lot of cases, how deeply do you want me to get into [crosstalk 01:05:58]
Rob Collie (01:05:58): Oh yeah, let's go, let's go, let's buckle up.
Diego Oppenheimer (01:06:00): Yeah, okay. So like neural nets at the core of it are all about matrix algebra and GPUs are simplified processing units. CPUs can do more instructions than GPUs, but essentially you remove all the fluff on GPUs, this is why they're good for graphics. At the end of the day, GPUs are really good at doing matrix algebra, which is very, very fast because they don't have to do a bunch of the other operations. So because neural nets, which are very good for unstructured data, so unstructured data being like voice, video, images, like that type of stuff, freeform text, that instruction set allows them to be a lot faster at computation, which gives you efficiencies, but they're a lot more expensive. And so in a lot of cases, when you're making that decision between CPU and GPU, it's like, okay, the actual thing I'm doing, is it worth paying four times more to get it an X amount times faster? Sometimes that X is 60X, sometimes that X is 2X, depends on the methodology.
Rob Collie (01:06:54): Let's pause on that point, because I thought the whole reason that GPUs exist is because they were cheaper for the thing that they were made to do. Why not just throw a whole bunch more CPUs in my computer to render the graphics? The GPU still has to be more cost effective. Is it just the crypto craze that's making GPUs more expensive? Is that what we're talking about, or have they always fundamentally been somehow more expensive in some dimension I don't understand?
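As a rough illustration of the "neural nets are matrix algebra" remark, the sketch below computes one dense neural-network layer in plain Python. The weights, bias, and input are made-up numbers, and the helper names (`matvec`, `relu`, `dense_layer`) are assumptions for this example only. The point is that the whole layer is a matrix-vector product plus a simple elementwise step, which is the multiply-and-add pattern GPUs are built to run in bulk.

```python
def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(v):
    """Elementwise max(0, value), a common activation function."""
    return [max(0.0, a) for a in v]


def dense_layer(weights, bias, x):
    """One fully connected layer: relu(W @ x + b)."""
    z = matvec(weights, x)
    return relu([zi + bi for zi, bi in zip(z, bias)])


# Illustrative numbers only: 2 inputs feeding 2 neurons.
W = [[1.0, -2.0],
     [0.5, 0.5]]
b = [0.0, 1.0]
out = dense_layer(W, b, [3.0, 1.0])
print(out)  # [1.0, 3.0]
```

Every output element is an independent dot product, so a real network with thousands of rows can compute them all at once on hardware with many simple arithmetic units.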
Diego Oppenheimer (01:07:19): I think it depends on the dimension that you're looking at. So like from a cheaper, like if you're looking for like flops, maybe like, and for specific instruction sets, but in general CPUs are cheaper than GPU period. GPUs are just faster and more efficient at doing the math that's necessary for computing this. So a good example, you talked about crypto, it would take 10, a hundred times longer to do in CPUs, and guess what? It's a race in crypto because if you don't actually compute the block, somebody else computes the block, you don't get the block. The cheaper part is really a speed, like I need to get there faster, otherwise somebody else takes my money.
Thomas LaRock (01:07:54): To me and this could be completely wrong. I've always understood as Diego was saying about matrix multiplication, but I try to make things simpler. And I just say, if you're doing deep learning, if you're doing image classifications, things of that nature, you're going to want a GPU it'll process things faster for you.
Thomas LaRock (01:08:14): But if you're not doing that, and you're just doing, let's say regular machine learning, chances are the CPU is going to work for you. Like Diego says, at this point people have optimized code and you're really just grabbing a package and you're saying, KNN, you pass in three parameters and things are just done for you, you don't even have to do it by hand anymore. There's just all sorts of packages that just run for you and in most cases by now have been optimized for performance. Not always, but sometimes, you can always improve things, depends on the data that you're working with, depends on the problem you're trying to solve. But for me, that's where I start. I say, unless I'm doing deep learning, start with a CPU and see how it works. If I'm doing deep learning, I just go right to the GPU. I need a GPU, right?
Diego Oppenheimer (01:08:58): I'll add a level of complexity to that answer, which is, it's not always in deep learning either as like black and white, because you have two different things that you do in deep learning. One, which is when you're training the model, grabbing a ton of data and you have to actually essentially compute through the entire space and create that kind of like algebra where GPUs are like super effective. And essentially something that could take you 100 days on CPUs could take you 10 days on a GPU. So running a GPU for 10 days is cheaper than running a CPU for 100 days, whatever the math looks like, that's one potential. But then you actually have the scoring, the scoring of those models, where I could train a model on GPUs, get the model and now score it on CPUs, because the scoring acceleration that I get on GPU is only 2X, and it's not worth the money.
Diego Oppenheimer (01:09:42): There is an actual pretty big "it depends", it is not black and white in those terms, but in general, we talked about the time and place in the computing power. GPUs being able to be applied to deep learning algorithms is one of the reasons why we're in this era right now where so much of this deep learning is actually making it out, because without those chip sets, we would still have just algorithms with underpowered, like, chips that can't actually do this stuff or take years. Now it's feasible to train a model in hours or days and run them, versus months and years.
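For readers who have not seen what "grabbing a package and saying KNN" replaces, here is a minimal k-nearest-neighbors classifier in plain Python. It is a sketch, not the API of any real package: the function name `knn_predict`, its parameters, and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Hypothetical helper: classify `query` by majority vote of its k nearest points."""
    # Rank every training point by Euclidean distance to the query.
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data: three points near the origin labeled "a", three near (5, 5) labeled "b".
points = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5), (5, 6)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (0.5, 0.5)))  # a
print(knn_predict(points, labels, (5.5, 5.5)))  # b
```

Nothing here needs a GPU: the work is sorting and counting on a modest number of records, which fits the "start with a CPU" rule of thumb for classical machine learning.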
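The training-versus-scoring trade-off can be written as a few lines of arithmetic. The sketch below uses only the rough figures mentioned in the conversation (GPU time priced at about four times CPU time, a 10x speedup for training, a 2x speedup for scoring). The function names and the normalization of the CPU rate to 1.0 are assumptions for illustration, not DataRobot's pricing or method.

```python
def job_cost(hours, hourly_rate):
    """Total cost of running a job for `hours` at `hourly_rate`."""
    return hours * hourly_rate


def gpu_is_cheaper(cpu_hours, speedup, price_ratio):
    """Return True if the GPU run costs less than the CPU run.

    cpu_hours:   how long the job takes on CPU
    speedup:     how many times faster the job runs on GPU
    price_ratio: GPU hourly price divided by CPU hourly price
    The CPU hourly rate is normalized to 1.0.
    """
    cpu_cost = job_cost(cpu_hours, 1.0)
    gpu_cost = job_cost(cpu_hours / speedup, price_ratio)
    return gpu_cost < cpu_cost


# Training: 100 days on CPU versus 10 days on GPU (10x), GPU priced at 4x.
training = gpu_is_cheaper(cpu_hours=100 * 24, speedup=10, price_ratio=4)
# Scoring: only a 2x speedup, same 4x price.
scoring = gpu_is_cheaper(cpu_hours=1.0, speedup=2, price_ratio=4)
print(training, scoring)  # True False
```

On pure cost, the GPU wins whenever the speedup exceeds the price ratio. A 10x speedup at 4x the price is a saving, while a 2x speedup at 4x the price doubles the bill, which is why a model can be trained on GPUs and still scored on CPUs.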
Thomas LaRock (01:10:14): Okay. So how long before we get to the quantum chips?
Diego Oppenheimer (01:10:21): So I wish I knew more about like, I'll be the first one to admit that my knowledge of quantum computing is like superficial at best. While I understand the general concept of it, my black holes of knowledge get pretty big in terms of how we implement. So I actually don't know the answer to that. I think we have an era in between, which is specialized chips, like ASICs, where for specific machine learning tasks, we remove even more instructions than from the general GPU and we get faster, specialized chips that allow us to do certain tasks a lot quicker. And I think there's a whole era there, which is kind of more where my head has been than the quantum because as I said, like, I think it's out there, I get the general concept behind it, but I actually don't know how to draw the line between where we are right now and when quantum becomes relevant in the space.
Thomas LaRock (01:11:08): Diego, that's what I want. I want you to tell me when we will achieve quantum singularity?
Rob Collie (01:11:13): Yeah, no, no. So I think Diego's demonstrating that he has a perfect mastery of it. Quantum computing is something that you both understand and don't understand at the same time.
Thomas LaRock (01:11:24): There you go.
Rob Collie (01:11:24): That's how it works.
Diego Oppenheimer (01:11:25): Yeah. Good.
Thomas LaRock (01:11:26): Why can't we make a prediction? I just want a simple prediction when will the singularity happen? I don't understand.
Diego Oppenheimer (01:11:33): Yeah.
Rob Collie (01:11:33): Let me share a little bit of a personal sci-fi theory of my own about quantum computing. You've all heard the theory and oftentimes there's like compelling math behind this saying we're living in a simulation running on someone else's hardware. You've heard this thought experiment before.
Diego Oppenheimer (01:11:51): Yeah. I've seen the matrix.
Rob Collie (01:11:52): Okay. Well this, but then there's, like, it's actually a lot of serious scientists sitting around, astrophysicists and those sorts of people, going, "Yeah, it's probably..."
Rob Collie (01:12:03): More likely than not. When most sorts of people start saying those sorts of things, it's a little eye-raising. Uh-oh. So the way I think of it is that, let's say we're the ones running the simulation that the universe is running on, it's our machine learning algorithm. It's like DataRobot 20 years from now is unknowingly or intentionally running a simulated universe. Okay? And we've got these self-replicating patterns that are going on in this universe that are starting to think of themselves as alive. We're not really paying attention to that, because all we're doing is just using it for some purpose, it's helping us predict something. We have no idea that these things running in this simulation have actually started to develop any sort of self-awareness, right? And then one day we notice that the thing in the machine, in the software, has started to learn to execute instructions in the software in such a sequence, down at such a micro, micro, micro level that it's actually tunneling down out of the software layer, and tricking the hardware that's running the simulation into performing calculations on its behalf.
Rob Collie (01:13:01): It's going extra-universe. It's going outside the universe and tricking the base hardware into performing tasks, right? And that is the first time that we notice, the ones who are running the simulation, that is the first time we notice and start to think about, oh my God, this thing's quietly gotten out of hand, right? It's now exploiting the hardware.
Rob Collie (01:13:22): I think that the moment we start to do quantum computing in a way that's actually successful is the moment where the people running the simulation go, oh shit. Because that's the only way it works, right? You're actually using, essentially the equivalent of sending off your calculations into billions of parallel universes to achieve faster-than-possible calculation speeds. Because you're not just existing in one timeline anymore. It's really, really freaking bizarre. End of sci-fi theory.
Thomas LaRock (01:13:49): That's not how it works. [crosstalk 01:13:53] out into multiple dimensions.
Rob Collie (01:13:54): No, no. Tom, if you start to pay really close attention to stuff, that is exactly how it works. Or, it's something every bit as crazy as what I just said, but just different, even crazier. My explanation of it is the simplest version. There are even more terrifying ones.
Thomas LaRock (01:14:10): No. You should think of it as a folding-up map. Think of it as query folding, okay? Think of it as query folding. That's what it is.
Rob Collie (01:14:21): When something goes from can't-break RSA to can-break RSA, something very, very, very special and impossible has happened. And very famously. And again, I don't understand quantum theory either. I understand it to the level of like, could talk about it in my dorm room at three o'clock in the morning with my friends all going, whoa. That's my level of mastery, okay? So that makes me an expert. But there was a very, very, very specific statement, which was, I think it was Bohr that said this, "Anyone who is not shocked by quantum theory has not understood it." Quantum computing is going to be every bit as shocking in how it works as quantum theory. If you're pushing back on my crazy, crazy, crazy sci-fi explanation. Anyone that, that resists that, then you don't get it. You don't get it.
Thomas LaRock (01:15:09): I am. Cause Bohr was talking about quantum physics, not quantum computing.
Rob Collie (01:15:14): Quantum computing is exploiting quantum physics. That's what it's doing. It's not just they're using the same...
Thomas LaRock (01:15:20): That's not my understanding, no.
Rob Collie (01:15:22): Oh, it's totally it. It's totally it. They're using entanglement. They're using quantum entanglement. They're using all of the Schrodinger weirdness, the faster-than-light transmission of information, they're using all of that shit. All of those things. [crosstalk 01:15:36] All of those things that break classical Newtonian physics are all being exploited. Hell, you don't even have to go farther than this. Like quantum tunneling makes the transistor possible. Even the transistor is a violation of everything that we would consider normal and humane.
Thomas LaRock (01:15:51): Yeah. Right. It's crazy.
Thomas LaRock (01:15:53): And so is a bumblebee, but they exist too. So we're good, right?
Rob Collie (01:15:57): Yeah. It's all right. All of that Schrodinger cat, half-dead, half-alive, modern society is already built on it anyway. So, it's not even really worth arguing about.
Thomas LaRock (01:16:07): There's no argument here. We're just trying to understand what it is.
Rob Collie (01:16:09): I just explained to you what quantum computing is. It's tunneling into the base hardware of the universe and exploiting it for our purposes. Yep.
Thomas LaRock (01:16:21): Did you really just explain that to me?
Rob Collie (01:16:21): Well in the dumbest-downed way possible. I mean, I just try to make it understandable, Tom. I'm not the one who's going to be doing it. Oh, hell no. I really don't get it.
Diego Oppenheimer (01:16:31): I'm kind of curious for Rob's next episode on string theory. I'm definitely tuning in for that one.
Rob Collie (01:16:38): Yeah, I don't get that either. I get that even less. Basically it's where all science devolves to magic, like literal magic. And non-determinism, we talk about that also, right? This is really amazing. If you examine two atoms of a fissile isotope, they have a half-life in terms of how fast they decay. At the moment before one of them decays, there is literally no way we could ever tell the difference between the one that's about to decay and the one that isn't. It's not like one of them starts to like vibrate funny or anything like that. It just spontaneously pops.
Rob Collie (01:17:13): And the information, the probability of when that's going to happen comes from outside of our knowable universe. Everything is weird, when you get down the fundamentals, everything makes no sense at all. It's just crazy. Quantum computing is where we're going to, if we're even able to do it, I'm not even sure it's going to work. If we are able to do it, that is the next huge breakthrough in like us breaking the magic. It's unbelievable that we're even trying to do this, and that there are people who actually believe it's going to work. I'm just like, okay. It's so weird.
Rob Collie (01:17:48): We're not used to things that happen with no cause, right? But it's weird. One isotope has a different half-life than the other. So there's something out there that's deciding that one of these is more stable than the other. Can't tell though. Now we've gotten really weird.
Thomas LaRock (01:18:02): So how does DataRobot solve for string theory?
Rob Collie (01:18:06): Yeah how does it, you know...
Diego Oppenheimer (01:18:09): I don't think that's part of our mandate here.
Rob Collie (01:18:12): So what are they called? The quantum CPUs? Is it QPUs? Or what is...
Diego Oppenheimer (01:18:16): Well it's measured in qubits, right?
Rob Collie (01:18:19): Qubits, yeah. That's their capacity.
Thomas LaRock (01:18:21): It's the measurement.
Rob Collie (01:18:21): That's their capacity, right? I can't wait for the data center that's full of quantum... [crosstalk 01:18:27]
Diego Oppenheimer (01:18:26): I think, if I'm not mistaken, didn't AWS actually put the first, you know, rental...
Thomas LaRock (01:18:33): No, I think IBM had it before Azure did. Azure had one a few years ago.
Diego Oppenheimer (01:18:38): No but I'm just saying Azure, like AWS, just did it, yeah. It's called Amazon Braket.
Thomas LaRock (01:18:42): Yeah. Azure's had it, and IBM had it.
Rob Collie (01:18:47): Are we going to be hearing about this incrementally forever? And we're going to completely miss the point where we passed, where it actually became truly productionized? Or is there going to be some moment where we go, oh my God, now it's real?
Diego Oppenheimer (01:19:00): It's actually interesting because the process of building algorithms for quantum computing is actually very similar to how you do this for machine learning. You essentially use these notebook experiences, right, where you can grab the data and munge it and apply the algorithms, send it to go compute. It actually runs it and brings back like the analysis. It's not super different from how the data science process goes today, but it's now with a different set of algorithms and a different computational factor under it. In terms of the actual process, it's actually quite similar, if not identical.
Rob Collie (01:19:36): A fellow student of mine in college, when we were learning about sorting algorithms and we learned about merge sort, he immediately went to the SPARC mainframe and wrote a merge sort routine that forked the process on every merge, on every subroutine. And so he basically gained infinite, relative to what he was doing, infinite computing power, because that thing was just sitting there idle. He was basically turning a logarithmic time algorithm, an O(N log N) algorithm, into one that ran in damn near linear time because he had infinite, it was almost like the CPU was free. He was just able to use more of it, right? I think there's something similar to how algorithms will be written, to take advantage of quantum computing if possible. It needs to be something that can be done massively in parallel. That's kind of how they work, but I'm way out over my skis again.
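The forking merge sort in this anecdote can be sketched as follows. This is not the classmate's code. It is a hypothetical Python version that hands one half of every split to a new thread instead of a forked process, with a `depth` limit (an assumption added here) so the number of threads stays bounded. In CPython, threads will not actually speed up this CPU-bound work because of the global interpreter lock, so treat it as a picture of the structure rather than a benchmark.

```python
import threading


def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def parallel_merge_sort(items, depth=3):
    """Merge sort that sorts the left half in a new thread while depth > 0."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    if depth <= 0:
        # Out of parallel budget: fall back to ordinary recursion.
        left = parallel_merge_sort(items[:mid], 0)
        right = parallel_merge_sort(items[mid:], 0)
        return merge(left, right)

    result = {}

    def sort_left():
        result["left"] = parallel_merge_sort(items[:mid], depth - 1)

    worker = threading.Thread(target=sort_left)
    worker.start()                                        # left half runs in the worker
    right = parallel_merge_sort(items[mid:], depth - 1)   # right half runs here
    worker.join()
    return merge(result["left"], right)


print(parallel_merge_sort([5, 2, 9, 1, 5, 6, 0]))  # [0, 1, 2, 5, 5, 6, 9]
```

With enough processors that both halves truly run at once, the elapsed time follows T(n) = T(n/2) + O(n) because only the final merge at each level is sequential, and that recurrence solves to O(n). That matches the "damn near linear time" described above, while the total work across all processors is still O(n log n).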
Thomas LaRock (01:20:25): Rob, that's just called Serverless. It's just magic.
Rob Collie (01:20:28): Is that what it is?
Thomas LaRock (01:20:28): Everything just works. You don't need a server at all. It's all good.
Diego Oppenheimer (01:20:32): That was another concept of something that showed up. So we were laughing about algorithms as a service, but essentially we didn't know it was called Serverless. We didn't know what functions as a service was, but clearly the concept of grabbing a piece of computation or an algorithm, being able to create the computational paradigm, feed it data, compute, and then tear that down, that was algorithms as a service, which is functions as a service, which is Serverless. And you know, I get the joke, but I mean that's what it is, right? This is why nobody's ever going to hire me to do branding or marketing. I always liked "just-in-time computing," that was actually something that made sense. Right? Because really what serverless is, it's really just-in-time computing. I get it up, I can do it just for the time, and then I get out. So it's like that concept. But again, nobody's hired me for doing marketing tasks, so here we go.
Rob Collie (01:21:16): You've done just fine without that. There was something you said very, very, very early in the conversation that your co-founder Kenny said, which is that descriptive analytics are great and all, but predictive is the future. I'd like to adjust that. I think it's part of the future. There's so many things that get labeled like 'the future.' And you have to, right? That's the simplest version of the sentence. It's the one that gets people's attention. It's good marketing. I still think 'the future,' if we want to use that same false dichotomy, 'the future' is still the world getting its descriptive analytics right for the first time ever. And that's a big part of our business here.
Rob Collie (01:21:53): Now at the same time though, and we definitely do some ML work for our clients, people that work for us who are much more savvy with it than I, because I don't really learn new things. It's one of the hallmark confessionals of this show, that I'm a terrible learner. I'm not terribly interested in learning new things. I'm especially not interested in learning things where there's already a really established community of expertise because I'm always comparing myself to them. So DAX was great, because I got to run unopposed. There were no DAX experts.
Diego Oppenheimer (01:22:21): Small fish, big pond, or versus big fish, small pond type thing?
Rob Collie (01:22:24): Yeah. I would be very intimidated trying to join the DAX party today because of just how advanced everyone's gotten. My question is, even though our business is still, I think, primarily in the descriptive analytics business, and in the taking action on that, where it intersects the human plane and the ways that you do your analytics and the ways that you apply them. There's a real art to that that goes way beyond just the left-brain nature of computation and aggregation and all that kind of stuff. At the same time, I do have a little bit of FOMO. Most companies that are in the space of P3 have decided to just go and rebrand themselves as ML. Like the first paragraph on their homepage talks about AI or machine learning or whatever. And we haven't done that. We've reached the size and the capability where we could deliver on that promise, so it'd be a change in marketing message that we could back up. We haven't gone that way.
Rob Collie (01:23:21): Basically we're experiencing a gold rush of epic proportions within the descriptive analytics space. At this point in time, I don't see a reason why there's any limit to how big our company can get. Infinite demand, and an infinite number of professionals being manufactured, who aren't being properly valued, right, and so in a way our business model is connecting that demand with the supply. So should we spin up a second brand that's like a sister brand, and just declare it an ML shop? Because again, you are firmly planted in this other world. We both, back in the day, were more in the descriptive world together. You've gone off into a parallel universe, joke intended, and you're looking back in to the universe that you left behind, what advice would you have for someone like me? And actually all of our listeners, most of our listeners, are in a similar spot. Their center of mass is still descriptive, even if they're dabbling in the other things.
Diego Oppenheimer (01:24:20): I don't think it's a zero sum game.
Rob Collie (01:24:21): No, it's not. Right.
Diego Oppenheimer (01:24:23): Right? It really isn't a zero sum game. Right? So I think descriptive analytics are just as important as ever, if not more. And just because predictive analytics are where a lot of the new innovation is, and stuff like that. It's a well-known problem, and it has a definition and it's super important and more people need to get there, but there's also this other thing, right? So it's really one plus one, not if you do one or the other. What's going to happen, there's going to be data-first companies in the future. And then there's going to be the companies that don't exist. Every single industry, every single company is going to be powered by signals, and the data that you collect, and you have two factors to that, right? I need to be able to analyze the past and ask questions and have hypotheses and go be able to answer that, which is really what BI is really good at.
Diego Oppenheimer (01:25:14): At the core of it, what BI allows me to do is ask a question, have a hypothesis about it, and then go answer that based on the historical data that I've actually seen. What happened here, what happened there. There's a storytelling aspect to it, but it really is around I start from a hypothesis. I have a question. You probably get this even more than I do. I used to laugh, right? Especially when BI was a new concept, it's like, I just want you to make my business intelligent. I'm like, I don't even know what that means and it's in my title, we're called business intelligence. I don't know how to make your business intelligent. What I can tell you is like, what do you want to answer? What questions do you have? What hypotheses do you have about your business? And then let's go find data that either confirms or dissuades that hypothesis that you have so that then you can actually go make business decisions.
Diego Oppenheimer (01:25:53): Because at the core of it, it's all about decision-making. Descriptive analytics and predictive analytics have two completely, not completely different, but two parallel paths in terms of how you do decision making. Because you're going to go make investments, you're going to go answer questions, and you need BI and descriptive analytics to be able to go make those bets on your business, how you're going to go approach it, what you need to do. But also the predictive part gives you like, well, what happens when I don't have the historical data? What happens when I'm in the net-new situation? And the world of real-time data and the world of changing and everything being online and everything big signals tells us that the unknown space is just as big, if not bigger, than the known space of my business. How do I actually address both?
Diego Oppenheimer (01:26:38): So these are completely complementary technologies in my opinion. I think a lot of companies are like, ML's hot, every single boardroom's like, how do we add AI to it? It's the same movie. Right? Which is like, how do you make my business intelligent? And I'm like, I don't know what you're talking about. What I can tell you is how to optimize a process, how to do better decision making using machine learning techniques. It's an implementation detail, not a goal state. The goal state is to understand your business better, make wiser decisions, and react at the speed of the market, whatever that market looks like.
Rob Collie (01:27:10): I love your point about the same movie. I think the data industry, and especially the customers of the data industry, run on FOMO. There was business intelligence. You hear about it, you hear about it, you hear about it. And you just get this nagging sense of FOMO. And so I'm sure there were a million conversations that took place around the world where some C-suite executive turned to IT and said, "Hey, what are we going to do about business intelligence?" It's this means looking for an end. And then big data. What are we going to do about big data? What are we going to do about ML? It just sort of repeats over and over again. It's all about the end goal. What are you trying to do? It's the improvement that really should be driving the conversation.
Diego Oppenheimer (01:27:48): Going back to the beginning of it, and this is truly how I feel, I feel like it's my life mission to like help build tools to understand data better. I love it. I love everything about it. I've made an entire career, now it's going to be almost 20 years of career, just building tools on how to understand data better and how to work with data. Because at the end of the day, there's so much that goes into our everyday decision-making and life, and everything about everything's being powered by like some level of data stream. And so it's pretty exciting to be in the tooling space of that. I used to joke around, we worked on the seventh wonder of technology.
Rob Collie (01:28:21): Yeah.
Diego Oppenheimer (01:28:22): Right? Excel. I mean, it's amazing, right? And where do you go from there? There's so much more in terms of like the data processing and tools that can be built and like what people are doing. And I don't like getting into cliches like, oh my God, we're changing people's lives for the better. But we are moving to a society that is purely powered by not the physics version of signal processing, but data processing and data understanding and how we do it. And it powers everything. And so it's pretty exciting to work on any concept of that. We're making humans better at whatever they want to do to a certain degree.
Rob Collie (01:28:52): It's a productivity gain. Right. And that tends to contribute more to the world than what each individual working on it consumes.
Diego Oppenheimer (01:29:00): It's impressive. Like what you can do and what we've moved over the last decade in terms of advances in this space and how businesses are getting better. And the outcome can be in any way, right?
Diego Oppenheimer (01:29:12): But, to your point around the productivity gain, it can be all the way down to, all we care about is like squeezing out more margins, to like, somebody was asking me, would you rather work with a big airline? And they have a lot of issues now with COVID and stuff like that, but they have a lot of data problems, right? And so it's really interesting. I always tell them, I'd rather work with a budget airline than one of the traditional airlines. Not because of a judgment of them, but the budget airlines are really interesting, because they're trying to optimize everything. They're usually the most advanced data airlines, which is kind of contrary to what you would think.
Diego Oppenheimer (01:29:43): But if you think about budget airlines, they're going at every single thing and looking at every single margin, every optimization. So their data literacy is usually through the roof, because that's how you build the budget airline to a certain degree. They're pushing the limits of what you can do with data, which I find interesting. And it's not a comment on like, I'd rather go on a budget airline than not. It's just interesting from that perspective.
Rob Collie (01:30:04): I never thought of it that way, because for many, many years I've been going around saying like, hey, when economic conditions get chaotic, when there's a recession, or a pandemic or whatever, the need for data processing, the need for data, the need for BI, all that kind of stuff goes up, because the pressure goes up. When the pressure is higher, there's a premium on intelligent decision-making. You're describing a corner of the market that is sort of perpetually in that state. And I had never really thought of it that way before. So again, I got smarter.
Thomas LaRock (01:30:35): Rob, you keep talking about how you're getting smarter. And I think at some point we're going to just want to take a baseline of where you are and then...
Rob Collie (01:30:42): Well, it's not hard to get smarter, you know?
Thomas LaRock (01:30:46): See? Now I was being polite.
Rob Collie (01:30:48): I know, I know. But Tom, someday you'll also understand quantum computing at the same level I do, you know? I mean, there's, there's always something to strive for.
Thomas LaRock (01:30:54): Maybe, you know what? You and I will always be entangled.
Rob Collie (01:30:58): That's right. That's right. In fact, all the atoms making us up will be. It's just, ever since the big bang, man...
Thomas LaRock (01:31:03): Hey, before I go, software hall of fame, put a pin in that. You and I should figure out what belongs in the software hall of fame. Start making a list-
Rob Collie (01:31:12): Oh I like this...
Thomas LaRock (01:31:13): -of how to qualify people.
Rob Collie (01:31:14): I like this.
Thomas LaRock (01:31:14): And then vote people in. The power, the P3 software hall of fame.
Rob Collie (01:31:19): I love it. I love it.
Thomas LaRock (01:31:19): And every year we can have an awards ceremony.
Rob Collie (01:31:23): Yeah. And the controversy of the ones we've left out...
Thomas LaRock (01:31:26): Oh yeah.
Rob Collie (01:31:27): It's going to be delicious.
Thomas LaRock (01:31:29): Tableau? Tableau? Go fuck yourself. You're out.
Rob Collie (01:31:31): No, we give them the honorary. There's an honorary tier that's sort of like faint praise.
Thomas LaRock (01:31:36): Okay.
Thomas LaRock (01:31:38): Diego's like, 'I don't work in marketing.' I'm like, 'Well I do.' So I'm on this, the software hall of fame. We can ride that way.
Rob Collie (01:31:44): That's basically my job too, Diego. So it's the podcast. Hey, listen, I really appreciate this, catching up with you after basically 10 years.
Diego Oppenheimer (01:31:53): Well, appreciate you having me, it's awesome to chat and catch up. There's probably not a day that I don't talk about Power BI and Power Pivot. I still think today it's one of the most amazing software stories out there for anybody who's in data, and what the simplicity was. And I kind of laugh about it because I think internally, outside of all the politics between server tools and Office at the time and stuff like that, it was a very different Microsoft. It was a few people who understood what it could be.
Diego Oppenheimer (01:32:20): And like, I didn't come up with a strategy, right? I just got like invited into it. So I don't even pretend like this was some genius moment. Like I knew nothing at the time, right? But I love it. I mean, I look back, I'm super fond of the time, and the period. And looking back now, and looking at the powerhouse, pun intended, that like that entire platform is? I mean, it's awesome. I love it. I'm a huge fanboy still today. I still use it. I have it on my desktop. Anytime somebody's like, oh, how do I do this in Excel? Just give me a second, [crosstalk 01:32:50]
Rob Collie (01:32:55): All right, man, again though. What a great conversation. Thank you so much. And congratulations, Diego.
Diego Oppenheimer (01:32:59): Appreciate it, thank you.
Thomas LaRock (01:33:00): Thanks for listening to the Raw Data by P3 Adaptive podcast. Let the experts at P3 Adaptive help your business. Just go to P3adaptive.com. Have a data day!