Raw Data By P3 Adaptive
SQL Batman Didn’t Want to Be a Data Janitor, w/ Thomas LaRock
Head Geek-SolarWinds & Co-Host of Raw Data By P3
In our inaugural episode, Rob and Tom cover all this and more:
- The slow decline of the “storage only” professional
- Tom has some unkind words for a rock and roll superstar
- Why SQL Batman had to retire
- The rise of hybrid IT/Business professionals
- Why are database administrators (DBAs) such a miserable crew?
- Rob shares a great story about a former colleague who pissed off Steve Ballmer
- Why storage has gone “curly” but analysis remains “rectangular”
- How discovering Power BI feels like you’re the first person to discover fire
Rob Collie (00:00): Okay, welcome. So this is our inaugural episode and our first guest is also going to be our ongoing 75% co-host; 75% only because his schedule won't always allow him to join us, but he's Tom, or Thomas LaRock. Now, I've known Tom for over a decade and he's just a fantastic human.
Rob Collie (00:22): I hope you're going to find that to be a theme here on Raw Data with our guests and various participants. But data-wise, the thing I've always found so compelling about Tom is his crossover status. Now, here's a guy who branded himself publicly as SQL rockstar for years, and he kind of still does. And you'd think that pretty much cements him as a storage professional.
Rob Collie (00:44): But basically the whole time I've known him, he's been trumpeting the idea that analytics are the real show. Now, in my experience, that sort of crossover is atypical and it's super valuable and it speaks to why I'm thrilled to have him as my fractionally available co-host and as our first guest. So how do we get this started? Luke, do you think we could do one of those like fancy produced intros with music and stuff?
Luke (01:11): Yes. The budget did allow for a fancy produced intro.
Rob Collie (01:14): Oh yeah?
Luke (01:15): Yep.
Rob Collie (01:16): Well, let's do it then.
Announcer (01:18): Ladies, may I have your attention please?
Announcer (01:22): This is the Raw Data by P3 podcast, with your host, Rob Collie, and your co-host, Thomas LaRock. Find out what the experts at P3 can do for your business. Go to powerpivotpro.com. Raw Data by P3 is data with the human element.
Rob Collie (01:40): Welcome to Raw Data. I'm your host, Rob Collie, CEO and founder of P3, powerpivotpro.com. And Tom.
Thomas LaRock (01:50): Hi, Thomas LaRock here. I am a head geek at SolarWinds.
Rob Collie (01:56): Thomas. He's Thomas LaRock.
Thomas LaRock (01:58): Thomas.
Rob Collie (01:59): Yeah. And who else do we have here with us?
Luke (02:01): My name is Luke. I'm a talk radio guy in South Florida. And I'm the guy that knows nothing about data.
Rob Collie (02:09): So welcome to the first ever edition of Raw Data. I'm really excited by the crew we've got here. So let's jump right in. Tom, we've... Or should I call you Thomas?
Thomas LaRock (02:23): You call me Tom. I usually introduce myself as Thomas just so that people don't think it's like John or Dom or something like that.
Rob Collie (02:30): I see. Thomas is easier to parse. I get it. But I just wonder if you were one of those guys that changed his name at some point, like you grew up. Because I've always known you as Tom. But Thomas. So we've known each other now for... We had our 10 year anniversary recently, didn't we?
Thomas LaRock (02:49): Yes, we did. It was 10 years ago, this past June.
Rob Collie (02:53): Yeah. It's crazy. It was like it was just yesterday. So we have interesting complementary backgrounds, you and I, and that's why I think I was really excited to do this with you. What's your handle on Twitter?
Thomas LaRock (03:07): SQL Rockstar.
Rob Collie (03:08): SQL rockstar. Now, Luke here, in his day job, he actually interviews real rock stars. He just had Sammy Hagar.
Thomas LaRock (03:21): Oh, I thought you said rock stars.
Rob Collie (03:22): Oh, cold. All right. Well, he had Jason Newsted. How's that?
Thomas LaRock (03:30): I don't even know who that is.
Rob Collie (03:34): I'm starting to regret my selection of co-host. He was the bassist for Metallica for a while.
Thomas LaRock (03:42): Oh, yeah. Okay.
Rob Collie (03:44): Not anymore. Luke is really kind of slumming it with us. He's gone from real rock stars to SQL rock stars.
Thomas LaRock (03:53): Wow.
Rob Collie (03:54): I know, I know. I mean, it's not like I'm any better. I'm not Sam Hagar. Anyway, in a previous conversation, I said, hey, well, I come from sort of like the analytics and business intelligence side of the world. And then I said that your background originated more on the storage side and you kind of recoiled just a little bit. It didn't seem like it was right to you. And then you thought about it. So what does storage mean to you? Is it the right word? Am I using the right word?
Thomas LaRock (04:32): Well, I think you are using the right word. I just never thought of it in that manner. To me, when I hear storage, I think of the guys in charge of the SANs or racking servers and things of that nature. The storage admin, they're doing storage. I was a database administrator. I never really thought about it.
Thomas LaRock (04:53): But in a way it is storage because it's the storing of the data. So the data has to go into an engine and then to disk and then from disk back through the engine and back to the client. So yeah, you could think of it as storage. I usually think of it or I usually tell people the focus was on the internals of the database engine itself. In this case it would be Microsoft SQL server.
Rob Collie (05:18): Yeah. So a lot of the history of the analytics industry and the business intelligence industry is we're still, I think, in the middle of a multi decade hangover of the influence of the storage industry on the way that a lot of analytics were, even from a software industry perspective. By necessity, there have been so many storage professionals. Storage and retrieval.
Rob Collie (05:51): If you can't store data and recall it, you can't run a business. You can't even execute a transaction. Whether you're doing any reporting or analytics or not, storage is table stakes. And so when we met, I had recently joined Twitter within the last six months before we met, we met in what? It was like May of 2010, somewhere in there anyway. We just celebrated our 10-year anniversary, I should know this. But I forget. I'm that guy.
Thomas LaRock (06:23): You forgot our anniversary?
Luke (06:26): Shame.
Rob Collie (06:27): Yeah, I know. I know. And so I had joined like the data channels on Twitter. I had done the right thing. I had gone and I joined the data channels and I quickly discovered that almost everyone on those channels were storage professionals. They were primarily database administrators, DBAs. So when I walked up to you and I got introduced at that live tweet event that was, I don't know, it was kind of a funny thing that people used to do.
Thomas LaRock (06:53): Remember we used to meet people and go places?
Rob Collie (06:57): Yeah. That was radical stuff. But do you remember the first question I had for you?
Thomas LaRock (07:02): I do. I do. It was essentially the clean version is, why are DBAs so miserable?
Rob Collie (07:13): It's quite an opener.
Thomas LaRock (07:15): It is quite an opener. I was stunned because I remember looking at you and going, first of all, who the hell is this guy? And secondly, how does he know us so well?
Rob Collie (07:27): The power of Twitter, man.
Thomas LaRock (07:30): Yeah, you've been stalking us. And clearly I had no defense. I wasn't about to sit there and tell you, "Oh no, we're the happiest bunch of people. What are you talking about?" I sort of looked at you and I'm like, "I don't know why we're so miserable. I have some ideas as to why we're so miserable." And I think we talked through some of those ideas.
Thomas LaRock (07:52): And if I recall, towards the end of that initial conversation, your comment to me was basically we were just like the Excel community. We had a lot of the same traits. Part of our misery was rooted in working, not just with data, but with users of data.
Thomas LaRock (08:17): And it was interesting to find the parallel between what I thought was a unique group of individuals, this database administration community, and all of a sudden the Excel community. I'm like, what do we have in common? We actually have so much more in common than I had ever realized. So your question really opened up a brand new perspective for me at that moment in time and going forward.
Rob Collie (08:47): Yeah. The common thread there, I think, is a community. Although really the Excel people don't really have a community. They're a demographic, if you will, but they don't really have a community in the same way that the DBAs do. They make the world go round. The world runs because of... And I know there's lots of people that make the world run, but DBAs and Excel people, people who are good at Excel, these are incredibly essential roles for the world that no one really sees or appreciates what really goes into it.
Rob Collie (09:25): And so when they interface with the rest of the business world, they tend to be taken for granted, even though what they do is some pretty arcane skills that are developed there. But yeah, the Excel people don't have a water cooler in the same way. I have been more recently monitoring things like the accounting subreddit. Okay. Now here we go. Here's where the grumpiness is. Yeah. There's some Excel grumpiness. It's the same kind of blowing off steam outlet as what I was seeing on Twitter back 10 years ago.
Thomas LaRock (10:03): Yeah. I've often referred to just the internet in general as a cesspool of misery. But then you get into that dark corner called Reddit and you're going to just find I think a lot of people more... Maybe it's the anonymity. You don't have to really use your real name and you can just sort of vent.
Thomas LaRock (10:24): And I think for some people, Reddit is just a place where they can vent. But for an outsider like me, I'm not really active in Reddit. I can go there and I'm like, "Wow, these people are really miserable." No. Actually, they just need to vent.
Rob Collie (10:37): Maybe. Although honestly I find Reddit, and again, I curate what I consume from Reddit rather than just like taking the default feed, but I find it to be the most intelligent and civil corner of the internet. But again, it's probably because I've tuned it.
Thomas LaRock (10:55): Oh yeah. Absolutely. There are corners of Reddit where people are civil. Absolutely. And then there are some horrible, horrible places. But for all that, nothing's as bad as YouTube comments.
Rob Collie (11:11): That's what Joe Rogan says.
Thomas LaRock (11:12): That is the worst, worst thing.
Rob Collie (11:16): Yeah. I don't think I've ever really gone down that rabbit hole, so I'm going to probably stay away. Circling back, where you and I sort of found I think almost immediate common ground was in the notion that I had come originally from the Excel community. That's what I worked on at Microsoft for a very long time, was worked on Excel before I got involved in the business intelligence side of software.
Rob Collie (11:44): And now for the last 10 years been running a company in that space, a consulting company. Where these two worlds meet is where an extra kind of value is created from data. So there's the primary usage of data, which is running the transaction. Someone buys something, for example, you've got to record the transaction. You've got to process it. You've got to make sure that they paid, all that kind of stuff.
Rob Collie (12:11): Obviously that's primary usage of data in business is to make the actual transactions operate. But then there's this secondary value of data and the secondary value is mining it, if you will, for insights about your business to improve and optimize. And that's where I've been. For a while out there, I was a SQL server MVP. Microsoft had knighted me as a SQL server MVP and I don't know SQL at all because the BI stuff was looped in with it.
Rob Collie (12:44): But one of the things I found super, super, super compelling about you over the years, Tom, is that you don't view where you came from as the only thing. You're evolving. And you've been very, very open and enthusiastic about the world of analytics, BI, whatever you want to call it. Whereas not everyone in your original community, your community of origin, where I met you, not everyone in your community that you came from is like that. And you are exceptional in this regard. You're not the only one.
Thomas LaRock (13:18): Right. I don't think I'm exceptional, but you are absolutely right. There are lots of people that would have you understand that let's say you were at an event, a large three-day conference, and it was a SQL Server-focused event. Then everything should be core engine, deep dive, 500 level sessions.
Thomas LaRock (13:39): And wait, what's this thing? Business intelligence? We don't need any of those sessions here. Go somewhere else. And those people absolutely exist. They still exist today. There's fewer I would say today, but they're out there. There used to be a lot more.
Rob Collie (13:54): Where'd they go? If they're not there anymore, where did they go?
Thomas LaRock (13:59): I think they've started just disengaging with the community as a whole. Because the middle ground, the middle class, they have kind of embraced a little bit more of the analytics space because it's everywhere now. It's prevalent all over. You can't get away from it. I think some of those more extreme people just don't find comfort in being around or say they just don't feel that's the group for them anymore.
Thomas LaRock (14:25): They want to be with people that are really just core engine. That's it. So we certainly had that, like when you and I first met, those people were out there. I wouldn't say I was one of them, but I would say I was kind of stuck in my own little silo. I had my own blinders on for my own reasons. It was just stuff that I was working with and I wasn't really exposed to as much. But over time I was exposed to it.
Thomas LaRock (14:51): I have a background in mathematics. And so for a lot of it, it was kind of interesting and familiar to me. And of course, a few years after that, that's when data science started becoming an actual term, which was interesting again. I see a lot of these people saying, "Oh, well..." I'd also worked on my Six Sigma certifications at the time. I think I got green belt at the time. And that was a lot of stats.
Thomas LaRock (15:22): And these people were just mesmerized by being able to understand what a standard deviation was and how to apply it and how to use it. It was the application of these tools in order to get insights from your data. And I liked it. So I continue to kind of try to absorb a little bit of that. Meanwhile, I still have one leg in, hey, what's happening inside the engine? Somebody's going to come to me and say, "My query is slow. I need to make it faster." And I want to be able to help them too.
Thomas LaRock (15:54): But I also want people to come to me and say, "Hey, sales are down. What can we do?" "Oh, well, what data do we have?" "I don't know." "Let me help you sort this out and figure out if I can find any value for you." So yeah, it was an interesting time, I think, around 2010. And I think that's when BI really started getting more mainstream. There was a lot of work by Microsoft for reporting services.
Thomas LaRock (16:21): Of course, when Power BI came out, I want to say it was about five years ago now. So there's a lot of work by Microsoft to help turn the corner. I mean, they even changed the name of, I'm no longer a SQL server MVP. I'm now a data platform MVP. So even the wording and everything about it has kind of changed and been a little bit more welcoming is what I would say. So these days I think most people are very comfortable with the idea that they might have to have one foot in the analytics space as well as inside the database engine.
Rob Collie (16:55): Yeah. It seems like such obvious, low hanging fruit, if you're already up to your eyeballs in the data platform in various ways. Why not even just from a career standpoint go and pick up that secondary value? You're like nine tenths of the way there maybe. The challenge though, of course, is always the human element. The more someone identifies as an IT professional, typically the less business interested they happen to be. And this is a very, very broad brush.
Rob Collie (17:28): So people are listening to this right now going, "Wait, I'm in IT and I'm obsessed with business." Well, yes. And that's great. And that's actually a rising trend, this idea of the hybrid. I'm running into all kinds of people these days with job titles that so clearly scream one foot in IT and one foot in the business. And that's the way of the future, I think.
Rob Collie (17:51): But there's still a pretty strong center of mass there for if you think of yourself as IT, you're focused on the tech and not necessarily even as interested in the business problem. And that's the human component of it. I think that if you view BI and analytics as just another part of the stack, just another part of the technology toolkit, well, that's where we've come from. The entire BI industry has always been like that.
Rob Collie (18:22): And spoiler alert, it's never worked. That mindset has never once worked. It's this hybrid mindset and the tools that enable it that are really changing things right now. Boy, I've really buried the lead here. Here's the question I have for you, and it's a two parter.
Thomas LaRock (18:41): Oh, you didn't say there'd be a quiz.
Rob Collie (18:43): Yeah. You can't be wrong about this. I'm going to ask you both parts at the same time so you have an opportunity to contemplate both answers simultaneously. So you're SQL Rockstar on Twitter. That's your brand. That's in many ways synonymous with you. And you know how rebranding is. Rebranding is very difficult. So when did you first get into storage? When did you first get into SQL server?
Thomas LaRock (19:08): Oh, early two thousands. Let's just put a stake and say 2003 ish. I was a programmer/developer before that and using Sybase and Oracle and SQL server. But around 2003 was when I started doing more the database administration role.
Rob Collie (19:27): Okay. So let's set 2003 as just semi arbitrary milestone and say you could go back to 2003 and tell your 2003 self, "Hey, self, when you get around to branding yourself, here's what you should call yourself." Would it still be SQL Rockstar? That's question number one.
Thomas LaRock (19:51): Oh, I'm sorry. Do you want me to wait? Okay. I'll wait.
Rob Collie (19:54): Well, I was hoping for a little bit more than a one word answer as well, but I'm sure. You're as long-winded as I am, so we're a good pairing. The second question is if you could instantaneously rebrand your Twitter, for example, today, you could pick another handle today and have it be retroactively what you always had been, would it be the same as your answer for 2003? If you could just pivot today with no switching cost.
Thomas LaRock (20:21): Here's something that I guess you didn't know about me. I've already changed my Twitter handle once.
Rob Collie (20:26): Oh, I did remember that. Yeah. You were just Thomas the Rock for a while, weren't you?
Thomas LaRock (20:30): No.
Rob Collie (20:30): What?
Thomas LaRock (20:31): I was SQL Batman.
Rob Collie (20:33): No, you weren't.
Thomas LaRock (20:34): I was SQL Batman. And I was SQL Batman because in my role, that's basically what you are. You're Batman. Something goes wrong, they call you and you come in and you're superhero and you have to fix it. You're just like, I'm Batman. And I had that for maybe almost a full year of being on Twitter at first. I had stickers that said SQL Batman with the bat logo. I had all this stuff going on. That's what I was originally doing.
Thomas LaRock (21:05): And I ran into some issues with licensing, if you can imagine. If I wanted to get stickers printed up and use a service, they'd be like, "We're not doing this. You can't use that." And I'm like, "Come on. I'm harmless." But no, there's a whole thing about protecting trademark and copyrights. So I got tired of trying to utilize the SQL Batman. I even had sqlbatman.com.
Thomas LaRock (21:38): I just made a change and I said, you know what? I'm going to change my blog to be thomaslarock.com. I'll just use my name for that. But for Twitter, I'll use the handle SQL Rockstar. And the reason I chose rockstar, my last name is LaRock. I've had the nickname, the moniker, Rockstar since I was about 16. My friends in high school, like LaRock, rockstar. It's just the way it was.
Thomas LaRock (22:03): So I decided I would just call myself rockstar. But at the time the rockstar movie was out and that was yet another problem. So I just said put SQL in front of it and then it's what I own and I'll move forward with things. And if I could go back, I'd probably tell myself, "No, do data rockstar instead." But honestly, these days I look and I go, "Just use your name."
Thomas LaRock (22:25): There's no reason to have the thing. It was what pretty much most of the cool kids were doing on Twitter at the time. If you were inside of the database community, you were using SQL in front of everything and being cute. And I did it and I've never bothered to change it. So yeah, if I could go back in time, I would tell myself to have more of a different focus. If I wanted to use a cute moniker, it'd probably be Data Rockstar, Data Pro, or something like that.
Thomas LaRock (22:58): If you notice, actually for a while, I didn't use my real name. It was at SQL Rockstar and the name was also SQL Rockstar. I changed a few years back to put my real name in there. So you can see Thomas LaRock and at SQL rockstar. That was kind of my compromise for myself instead of just changing my handle, which I don't think I could do, because once I got verified, you can't touch things. Otherwise they take away your check mark. You don't want to lose the check mark. The check mark makes me legit.
Rob Collie (23:25): Oh yeah. That's right. That's why I invited you. It's the check mark. We needed more check mark. Infinite percentage increase in check marks on the show. So yeah, data rockstar. That would've been good. I could get behind that. Here's a hypothesis of mine. That's more than a hypothesis. This is an opinion of mine. It's been really interesting. I've watched this evolve over probably almost a full 20 years I've been watching this story and it's basically that storage changes all the time, but analysis is...
Rob Collie (24:00): Actually, there's all kinds of technological improvements that allow us to do things fast or do things better, lower friction, et cetera. There's been a lot of big changes on that front. That's really like the reason why my company even exists is because of how much change is happening and already has happened in that space. But at Microsoft in the early two thousands, about the same time that you were getting into SQL server for the first time, Microsoft was really struggling with this, starting to wake up to the idea that data might not be just stored in tables.
Rob Collie (24:39): That data might not always be table shaped. And this was really causing almost like an existential crisis in the data world at Microsoft. And it's really funny. I got to read white papers written by like the architects that even at the time were being paid like $3 million a year. These really highfalutin white papers that sounded really smart. And from what I remember of them now, they have no bearing on where the world actually went.
Rob Collie (25:11): They were just completely off. Basically the answer was, oh, well, we'll just make it so that we can also store XML blobs in SQL server and that'll take care of it. I mean, there are all kinds of funny things. There are so many funny things to reflect on. But while the storage half of Microsoft was freaking out about this and while off in the shadows things like Hadoop were being born that were the real answer to this crisis. Not XML blobs in SQL.
Rob Collie (25:49): The analysis world was also panicking about it at Microsoft. Something really fundamental about the way that we work was potentially at risk. And there was another architect at Microsoft who had decided to kind of crash the Excel team for about a year. He just needed a place to land. This guy had been around forever. He came to the Excel team to tell us that the Excel grid, rows and columns, that was old fashioned, that was outdated, and that was going to go away. We needed all of that sort of jagged, heterogeneous content, where like, how do you store a webpage?
Rob Collie (26:32): That the content of a webpage doesn't fit into a row oriented storage that well, does it? And each web page is different. Every different piece of information might have different columns if you were going to try to store it in a column oriented way. And he was convinced that that same phenomenon was coming for Excel. That every row of data in Excel might have a different column set than the previous row and Excel's whole formula language, everything needed to be redone to accommodate this.
Rob Collie (27:11): Now, of course, we now have the benefit of hindsight, 15, 17 years later, this hasn't happened and no one's dying for it either. But remember, this is like a main man at Microsoft. This is someone that, Gates himself, he was almost like a lieutenant of his. And at that point in my career, I'd already finally learned enough to know when someone was just like totally off the rails. And he was so much more senior than I. The only card I had to play against him was just to repeatedly say like Tom Hanks in Big, to say, "I don't get it," just over and over and over again.
Rob Collie (27:55): Rows and columns. We've had rows and columns in Excel forever. Everyone's played Battleship. They know how to line up a row and a column and give it a coordinate. Your thing. I don't get it.
Thomas LaRock (28:07): It's a great strategy.
Rob Collie (28:11): I also knew that it would work because my superiors had made it very clear to me that we weren't going to do stupid, like crazy computer science things with the Excel product. We had more responsible things to do. But none of them really wanted to go toe to toe with this guy. So they kind of put me out there. But I knew who was writing my review.
Rob Collie (28:34): And this guy would go around and tell everybody behind my back that, "Whenever I bring this up with Rob, that guy, that guy, he's just done. He's just done." He was so obnoxious. He's basically telling everyone that I was too stupid to understand what he was going for. But again, even him now, all these years later, he would probably admit that analysis is rows.
Rob Collie (29:00): The only things that you analyze are the things that are in common between like if you've got like 15 rows of data or 5 million rows of data, and some of them don't have certain columns, well, you wouldn't be including those in your analysis unless you had common attributes in each row of data that was interesting to analyze. Otherwise that row wouldn't even be involved.
Rob Collie (29:29): So the way I've been boiling this down for people lately is that... And this architect, he described this odd heterogeneous storage, he described it as curly data. And I liked that. I like that idea, curly data. And like you need to go store the internet for a search engine, damn straight that's curly data. That is not nice, clean tables.
Rob Collie (29:55): But when it's analysis time, analysis you're always pulling rectangles. You're always extracting rectangular table shaped row sets in order to perform the analysis. That's separate from the storage. So the query engines that have been built over time that allow you to retrieve data from things like Hadoop. Well, how many SQL-like interfaces have now been built to pull regular shapes out of those sources?
Rob Collie (30:32): So my world of analysis has, at least until now, been very well insulated from the storage revolution in terms of, what do you want to call it? Curly data. The curly data storage revolution. And the same way that analysis wasn't disrupted terribly much by the transition from tape reels to hard drives, the fundamentals of what you were doing, the technology was different, but the fundamentals of what you were doing were not rewritten just because we started storing things differently. That was a monologue. That's one of my things. What's your reaction? I've never told you that story before. I don't think so.
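To make the curly-versus-rectangular point concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the record fields, the `to_rectangle` and `total_by` helper names, and the sample values do not come from the conversation. It shows records stored with differing field sets, from which only the rows that share the attributes of interest are pulled into a rectangle before any analysis happens.

```python
# Hypothetical "curly" records: each one carries a different set of fields.
curly_records = [
    {"id": 1, "region": "East", "amount": 120.0, "coupon": "SPRING"},
    {"id": 2, "region": "West", "amount": 80.0},
    {"id": 3, "amount": 45.5, "tags": ["gift", "rush"]},  # no region recorded
    {"id": 4, "region": "East", "amount": 60.0, "notes": {"channel": "web"}},
]


def to_rectangle(records, columns):
    """Keep only the requested columns, and only records that have all of them."""
    return [
        tuple(rec[col] for col in columns)
        for rec in records
        if all(col in rec for col in columns)
    ]


def total_by(rows, key_index, value_index):
    """Sum one column of the rectangle, grouped by another column."""
    totals = {}
    for row in rows:
        key = row[key_index]
        totals[key] = totals.get(key, 0.0) + row[value_index]
    return totals


# Storage stays curly; the analysis only ever sees the rectangle.
rectangle = to_rectangle(curly_records, ["region", "amount"])
totals = total_by(rectangle, 0, 1)
print(rectangle)
print(totals)
```

The record with no region simply drops out of the rectangle, which mirrors Rob's remark that a row lacking the common attributes would not be involved in the analysis in the first place.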
Thomas LaRock (31:14): No, I don't. And my first reaction is, is that person still at Microsoft?
Rob Collie (31:19): No.
Thomas LaRock (31:19): And I need to know right now. I was going to say later you're going to tell me who that is.
Rob Collie (31:26): Before we move on then, I won't tell you who it is, but A, he left Microsoft in a huff shortly after that, after he did not get his way on that. Right. I kind of get to almost like paint a silhouette of him on my airplane. And he famously when he told Ballmer he was leaving, Ballmer threw a chair across the room. So now you know everything you need to know to look up who this was.
Thomas LaRock (31:53): Threw a chair across the room, because Ballmer wanted him to stay. Huh?
Rob Collie (31:58): Yeah. And he went to Google and all of us on the Excel team were just sitting chuckling like, "Eat it up. Yeah, you should take him."
Thomas LaRock (32:11): Well, that explains a lot about Google Sheets. Okay. So here's the thing, when you were just describing to me about the curly data and you're talking about the analytics and you got to the point that was in the back of my mind as you were speaking, which is that you say row, I'll say it's... What's the fancy word? Observation. That row, the observation of a data event, you may not have information for all those columns or attributes.
Thomas LaRock (32:43): And that's totally normal to me right now. I'm like, yeah, I get it. One of the things I say is nobody goes to school to become a data janitor. Hmm. I didn't. There was no course. And I think your response to that was here we are. This is what we do. We are the data janitors of the world, whether you're Excel or you're a DBA, this was the common ground we had.
Thomas LaRock (33:12): I didn't know it 10 years ago. It took me a while to get up to it. But why are we miserable? Because we're data janitors all day. This is what we do. And why don't we have the observations for all this? Are you kidding me? I don't know. A sensor went down. Oh, okay. Or we just didn't think to ask that question. And so it's not included in 10,000 survey results. We didn't think that question was worthwhile. It's like, but there was data, and I had this whole model built and it needed that.
Thomas LaRock (33:42): Now what am I supposed to do with these 10,000, 20,000 records? It can be very frustrating. I can see what the man was trying to describe, but he really wasn't able to articulate what he thought was coming. And more importantly, he didn't really understand the tool. He thought the tool had to change, but the reality is the tool itself didn't have to change.
Thomas LaRock (34:12): It was the application of the tool was going to become different. And he couldn't see that even at the time. I mean, Python was a thing. You could have done so much more. There's a bigger world out there than just the Microsoft data platform, as great as it is. And I love it.
Rob Collie (34:30): Come on, come on.
Thomas LaRock (34:30): It's true. But there's still stuff out there, stuff out there. But yeah, that was kind of my thought was this guy was not a data janitor and he wanted the tool to do this specific thing. So you guys were going to have to go and reinvent it, which would've been a huge waste of time. Whoever was really in charge there for you guys, thank God they knew not to try to shift gears.
Rob Collie (34:55): Yeah. I completely agree. The thing that he was missing, and it's not like I knew it then either, if I'd known it, I would've told him: this storage revolution that they saw coming is decoupled from analysis. Again, up until this point, you never know what's around the corner. But up until this point, analysis has been insulated. It's kind of like I just need to know...
Rob Collie (35:25): Another way to say it is that I don't actually know truly deep down how a SQL database is structured. I don't need to. I think of the table that I pull from it, which is oftentimes a view written by a friendly person, such as yourself. That view is reconstituting my rectangle of data that I need from all kinds of other tables that I don't necessarily see.
Rob Collie (35:53): I don't care. It's beautiful. I don't need to. And so if the view, the rectangle that I'm getting happens to be stored out there on many different hard drives and in a hive farm or something and in curly format, but I get a rectangle back, my job doesn't really change. Take the analysis hat off for a moment. What are your observations of this?
Rob Collie (36:22): When I call it like a revolution in storage, is it really? How much has the curly storage model, Data Lakes, Hadoop, all that kind of stuff. How much is that... I don't know. I was going to use the word invaded to sound dramatic. How much has that stuff kind of invaded your world?
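The "curly in, rectangle out" idea Rob describes can be illustrated with a small sketch. This is only an illustration, not anything shown on the episode: the nested JSON payload, the field names (`customer`, `orders`, `sku`, `qty`), and the helper `to_rectangle` are all hypothetical, and it uses only the Python standard library.

```python
import json

# Hypothetical "curly" payload: nested JSON of the kind a document store might return.
CURLY = """[
  {"customer": "Acme", "orders": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]},
  {"customer": "Globex", "orders": [{"sku": "A1", "qty": 5}]}
]"""


def to_rectangle(curly_text):
    """Flatten nested customer/orders records into one flat row per order."""
    rows = []
    for record in json.loads(curly_text):
        for order in record["orders"]:
            rows.append(
                {
                    "customer": record["customer"],
                    "sku": order["sku"],
                    "qty": order["qty"],
                }
            )
    return rows


# The analyst only ever sees this flat list of rows, not the nesting behind it.
rectangle = to_rectangle(CURLY)
for row in rectangle:
    print(row)
```

However the data is stored, the analysis step still works on the flat rows, which is the sense in which the analyst's job "doesn't really change."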
Thomas LaRock (36:43): Well, in terms of say the Microsoft data platform, it was years ago when they introduced the concept of PolyBase. So PolyBase is just a simpler way for you to link to almost any other data structure and to pull the data into SQL Server. And they're trying to make it very easy to connect basically from their data platform and extend into any other platform in order to get the data into one place and then build your rectangle for you.
Thomas LaRock (37:17): So it's there and it comes up every now and then and somebody says I've built this. It's not working as well or things of that nature. So it's definitely part of the ecosystem these days. And the latest one, what is it called? Big Data Clusters that Microsoft just rolled out. They're making efforts to build into their ecosystem something that is equivalent in other ecosystems.
Thomas LaRock (37:46): So if you are a Microsoft customer and you need certain functionality inside that data platform, it actually exists somewhere. It's a framework, it's a tinker set. All the pieces are there. You might have to build something more or less than other things. But a lot of that functionality is really there, especially in Azure. There's just so much these days.
Rob Collie (38:09): Azure. This is not SQL Server. Now, there is SQL Azure.
Thomas LaRock (38:19): No. There's Azure SQL Database. Microsoft marketing would not want to hear you say SQL Azure.
Rob Collie (38:24): Well, if they're listening, I'll call it a win. If we reach the point where we can upset people with the way we describe things by using the... I'll start calling Power Pivot, I'll start calling it Power BI in Excel. It's just really the only rational name for it.
Thomas LaRock (38:44): It used to be called SQL Azure and I love that name.
Rob Collie (38:49): Look how dated I am. It makes sense why they call it data platform, because there's just so many things in there now. And so many of them, as you were hinting at, are clones in a way, improved clones in many cases of things that we see on the Linux platform. If you go look at AWS, so much of AWS, the services available there, it's what I call the Linux cool kids stack.
Rob Collie (39:21): If you're launching a startup in Silicon Valley, you're issued your MacBook and here's your AWS subscription. These are like the starter kit. Microsoft licensed, what is it HDInsight? That's basically like a Linux distribution. And so there's a lot of literal Linux services available on Azure.
Rob Collie (39:46): And at the same time, you also see these more Windows-based services in the Azure platform and you start almost like lining them up. You start saying, "Oh, this one's kind of like that one from over in the Linux stack," but it's Microsoft taking a look at it going, "Oh, we can do better."
Rob Collie (40:02): And so it's a really interesting ecosystem going on over there. Let me put you on the spot here. Have you done any technical hands-on work with, I wonder what we call it, modern storage, the curly storage, or have you been in sort of the chief geek role long enough that you haven't gotten your hands dirty with that?
Thomas LaRock (40:23): So it's head geek.
Rob Collie (40:24): Head geek. I'm so sorry.
Thomas LaRock (40:27): I do joke that I haven't had a real job in a long time. I'm very far removed from my production DBA days. However, in my role as head geek, I get my hands on the things, but not for production purposes. It's more for I've got to learn to understand what these things are doing, how certain things work, because I need to be able to explain some stuff to others.
Thomas LaRock (40:49): But what I have done, and it's been a few years, Microsoft partnered with edX and they put together some certification programs. So you would take like 10 classes online through edX and they would align with a certification. I got a certification in, let me think now, well, one was in big data, one was in machine learning, I think, and another one in artificial intelligence. So have I put my hands on the curly data? I'd say yes.
Thomas LaRock (41:24): But those being Microsoft focused programs, it was touching a lot of areas of Azure. So did I have to go into Azure Data Factory, consume some data, transform it, write some U-SQL to pull some insights out of it? Yeah, I had to do all those things. It's been a while. If I had to do it again, I could probably go back and figure it all out again.
Thomas LaRock (41:47): But once I did it for the program, there was really no need for me to touch it again. Lately what I have been doing is I've been spending a lot of time learning Python. Sometimes people say, "What should I learn, Python or R?" And I kind of view it as two different things. I think R is very much focused on being a tool for data scientists. And I think if you're a data scientist, you want to use it. That's great.
Thomas LaRock (42:12): I think Python is a little more extendable. It can do all the same data science things that R can do, but it can also do some other things. So that's why I chose to dive into Python and I've been spending a lot of time on it. And then there's this little website called Kaggle. Have you ever heard of Kaggle?
Rob Collie (42:27): I have.
Thomas LaRock (42:28): Yes. So I've started doing some learning and competitions in Kaggle. And again, focused on using Python, but I can also go use other things. If I need to drag some data into Excel to be a data janitor for a little bit, then, yeah, I can do that. So it's more of an ecosystem than, say, a toolkit that I've built up for myself now. And that's where I've kind of been spending some time and getting my hands on that curly data.
Rob Collie (42:58): I'm all kinds of angry now.
Thomas LaRock (42:59): Why is that?
Rob Collie (43:00): I've got a couple of things to straighten out. First of all, in the answer to the question of, should I learn R or should I learn Python? The answer to that question is nine times out of 10, DAX.
Thomas LaRock (43:14): All right. You're wrong, but that's okay.
Rob Collie (43:19): Come on. There's a lot of trendiness in it. Now, there's still a tremendous usage of it. I'm not saying that learning Python is a bad thing. I think it's actually a really good thing. It's so often people's actual needs are better served by something that might not have that same kind of cool kids edginess to it.
Thomas LaRock (43:39): Yeah. I wouldn't want to do a lot of... A lot of times I see Python being used as all these examples. Some of the ways they're manipulating data, to me, I'm not sure I would really want to do it that way. I would want to use a different type of tool like Excel or Power BI or something, because I'm a little more comfortable with that than what these lines of code are doing.
Thomas LaRock (44:01): But if I want to build a model in machine learning, I could use Azure ML Studio. But under the hood, it's kind of just running the same code I could just do for myself. So I don't know. It's either/or, but I just feel that at the end of the day, Python just has a little bit more.
Rob Collie (44:18): Yeah. I mean, it's just so often a lot of Python will be written to draw a chart.
Thomas LaRock (44:26): Yeah, exactly. Oh no, you're right.
Rob Collie (44:29): Or to do a very fundamental aggregation that would've been so much more powerful and flexible if you built a DAX data model around it. I even go to developer conferences on occasion now and the whole goal is to say, "Hey, look, you know so many things that I don't know, you're so much more technical than I, and yet I'm going to do some things up here on stage that you can't do, really important everyday things that you can't do and I want you to be upset about it."
Rob Collie (45:06): Because I'm really just not that technical. I'm the least technical person at the company. Everyone we hire is so much better even at the things that I am good at. They're so much better at those than I am. A lot of things we're talking about like in the Azure platform, for instance, we have people who are very good at those things. I've never seen them. I haven't gotten the certification that I could even forget, like what you were talking about.
Rob Collie (45:34): And then you said, "If I need to drag some data into Excel and be a data janitor." Come on now. Modern Excel that has the DAX engine and the Power Query engine in it. We've escaped that. We've escaped the janitor hood as long as we work in an organization that understands what we can do, which, again, that human factor. Most companies are very, very, very slow to wake up to the fact that their resident Excel guru has now become a completely new species.
Rob Collie (46:03): The person who discovers what I call modern Excel, which is really the Power BI engines, the under-the-hood engines baked into Excel, when they discover that or they discover Power BI itself, they feel like they're the first person to discover fire. They sit back at their desk and go, "Oh my God." And they say things like the equivalent of, "Did you see that?" And everyone looks at them like, "No, we didn't see anything. In fact, maybe you should get back to work." It's very unsatisfying.
Rob Collie (46:31): And then some period of time later, those people end up working for us. That's where our employees are made, is in those trenches. I was mostly just joking. You just equated Excel and data janitor hood so glibly that I had to circle back. I had to say something.
Thomas LaRock (46:52): I think Excel is the tool of choice for most data janitors. We should make a commercial.
Rob Collie (46:58): That's true. That's true. We've experimented with some advertising like on Facebook. It's not running at the moment. The ad says, "Are you running a spreadsheet sweatshop?"
Thomas LaRock (47:11): Yes. I think I've seen that.
Rob Collie (47:13): Yeah. We have these people sitting in what looks like a bombed out factory, but there's all these spreadsheets on these monitors and everything. The reason I don't like the janitor term is because it sticks to the person more than it sticks to the org. That's why the spreadsheet sweatshop, I prefer that nomenclature. That's not the preferred nomenclature, dude.
Thomas LaRock (47:40): So I totally get how you have that apprehension about using the data janitor term. But I want you to know that in those courses I was doing, to earn that certification, one, or actually more than one, was taught by a friend of yours, Wayne Winston.
Rob Collie (47:56): Oh, The Wayne.
Thomas LaRock (47:57): And Wayne opened my eyes to how to use Excel in so many wonderful ways with descriptive statistics. And that's the type of stuff I'm talking about. I'm talking about, hey, I have these columns. How many are missing values? How many are null? Stuff of that nature that a lot of people would use Python for, but for me, I might just use Excel for that from time to time.
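For readers curious what the Python version of that check might look like, here is a minimal sketch using only the standard library. It is an illustration under assumptions, not anything from the course Tom mentions: the sample data, the `MISSING` markers, and the helper `count_missing` are all hypothetical, and what counts as "missing" will vary by dataset.

```python
import csv
import io

# Hypothetical sample data; empty cells and the literal "NULL" are treated as missing.
SAMPLE = """name,region,sales
Ann,East,100
Bob,,250
Cy,West,NULL
,West,75
"""

# Assumed set of markers that mean "no value"; adjust for the data at hand.
MISSING = {"", "null", "na", "n/a"}


def count_missing(csv_text):
    """Return a dict mapping each column name to its count of missing cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {col: 0 for col in reader.fieldnames}
    for row in reader:
        for col in reader.fieldnames:
            value = row[col]
            # DictReader yields None for cells absent from a short row.
            if value is None or value.strip().lower() in MISSING:
                counts[col] += 1
    return counts


missing = count_missing(SAMPLE)
print(missing)
```

In Excel the same question is typically answered per column with a function such as COUNTBLANK, which is the kind of quick check Tom is describing.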
Rob Collie (48:19): Yeah. Good old rectangles.
Thomas LaRock (48:21): Wayne was so good. Such great courses.
Rob Collie (48:24): Was it live?
Thomas LaRock (48:26): No, it wasn't live. It was recorded.
Rob Collie (48:30): I tell you, a live course with Wayne would be another experience altogether. He is such a character. I bet they had to edit him down to 20% of what... He used to visit Microsoft and believe it or not, teach classes to Microsoft's finance departments.
Thomas LaRock (48:54): I believe that.
Rob Collie (48:55): But then he'd come hang out with the Excel team in the evening and just like hold court. Oh man. It was like drinking from a fire hose. It was awesome. He lives near me. I mean, I'm in central Indiana now. I'm in Indianapolis and he's in Bloomington. I've been here for five years and we still haven't gotten together. That's on me. I'm probably not going to see him...
Thomas LaRock (49:19): Can't get together now either.
Rob Collie (49:20): Can't get together now either. Yeah.
Thomas LaRock (49:23): So it's not all on you, like the last three months.
Rob Collie (49:25): Oh yeah. I mean, I've got a good three or four month excuse now. I mean, I did reach out. We were going to get together, but then, you know, following up, that's the trick, isn't it? I think that's probably a pretty good place to wrap episode one. What do you think?
Thomas LaRock (49:38): I think so. I think we babbled long enough about nothing. Raw Data is really a podcast about nothing.
Rob Collie (49:44): Is that what we're going to do? It's a podcast without substance. I look forward to doing more of these. We have not come close to talking about everything. We've got lots of ground to cover.
Announcer (49:59): Thanks for listening to the Raw Data by P3 podcast. Find out what the experts at P3 can do for your business. Go to powerpivotpro.com. Interested in becoming a guest on the show? Email [email protected] Have a data day.