episode 198
The Million-Token Myth and the Magic of Digital Colleagues
AI looks unstoppable… until you hand it a hundred pages of meeting notes. Rob and Justin dig into why context windows and token limits quietly run the show. That “million-token” brag from Google? More like weighing the Titanic in bananas.
From Shakespeare to SharePoint, this episode shows why AI remembers the Roman Empire better than your company history—and why that’s not a bad thing. Rob also introduces Griff, a digital colleague that fires off P3-flavored ideas like it’s had three espressos. It’s practical AI that’s actually fun to use.
Hit play to find out where AI is brilliant, where it falls flat, and how to make it work for you without the hype.
Episode Transcript
Rob Collie (00:00): Justin, I thought we would talk a little bit more about those two letters, AI, today. I have been continuing down this rabbit hole.
Justin Mannhardt (00:07): Indeed.
Rob Collie (00:08): Developing a greater and greater sense of confidence in a number of things, just sort of feeling much more in my own skin with it, and knowing where it sits. Let's go back to context windows and the token limit that was the subject of an episode, I don't know, some number of weeks ago. So, how surprising it is that on the one hand, if you interact with something like ChatGPT, its knowledge of all of human history is so comprehensive and deep, and it's able to reason over large, large, large, just massive volumes of that knowledge, and produce incredibly intelligent, coherent thoughts that pull from enormous amounts of information. And you're expecting that same power to be readily brought to your own information, your own private business information.
(01:05): Everyone has this intuitive expectation, I think, when they start out with this stuff: you see what it does with the history of the Roman Empire, which is just gigabyte after gigabyte after gigabyte of information, terabyte after terabyte of information, and it's amazing with it. So, you're expecting that the data and the information stored internally in your company, like in Teams and in meeting minutes, and in SharePoint document libraries, and all that kind of stuff, you're expecting it to have that same superpower over that information, and it turns out that is not the case at all. Not at all. What was baked into ChatGPT's o4 model or whatever, when it was being built, I think I've heard it called pre-training data, that has an incredible home field advantage in that brain, because it's part of that brain. It's how that brain learned to think; learning about the Roman Empire over thousands of years was one of the things that it did to learn how to think.
Justin Mannhardt (02:03): And it was literally trained. Part of the development process is it gets feedback through a technical system: what's right, what's wrong, what's good, what's bad.
Rob Collie (02:12): And it was learning about the Roman Empire as it was learning to talk. Your knowledge that you want to add to it and give to it occupies a much, much, much different place in its brain, and it's a much more limited place in its brain. So, this is what this thing called a context window is: how much additional information can you make available to that big brain? Remember, the big brain contains almost everything there is to know about humanity whenever they finish training it. Anything worth knowing, anyway. This small amount of information, by comparison, that you're about to feed it, you're expecting it to know it as well, and it doesn't. In fact, it has a very, very, very small context window of additional information that it's able to eat. And then, they measure this in tokens, because of course they do. Of course we measure something in a unit of measurement that we don't know; it's like measuring it in bananas.
Justin Mannhardt (03:10): It's like a child's arcade, how many tokens do you have?
Rob Collie (03:14): How many bananas did the Titanic weigh? So, a token is like a fraction of a word; it rounds to like two-thirds or three-quarters of a word or something like that. So, 128,000 tokens is a common limit amongst all of these systems, and it's not very much. Half a meg of text can easily weigh in close to a million tokens, so half a meg of text can be six to eight times too big. And you're laughing, what are you laughing about?
Justin Mannhardt (03:48): Well, I couldn't resist, I asked the big brain in the sky how many bananas the Titanic weighed.
Rob Collie (03:56): And?
Justin Mannhardt (03:57): 435 million bananas.
Rob Collie (04:00): Of course. So, it's a tiny amount of information before it says, whoa, too much. This token limit represents like a sound barrier, except that unlike the sound barrier, you really can't break the token barrier; all you can do is work around it. There are so many different strategies you can take to try to turn your problem into a smaller problem that fits into the token window, fits into the context window. So, the token limit and the context window are kind of the same thing; the context window size is determined by the token limit. Google Gemini has claimed to have a 1 million token context window, so eight times the size of ChatGPT's token limit. And that's still not very big, it's half a meg of text-ish.
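(A side note for readers: you don't have to guess at token counts. OpenAI's open-source tiktoken library counts them exactly for their models; here's a minimal sketch, where the file name and the cl100k_base encoding choice are illustrative assumptions.)

```python
# Count tokens in a text file with tiktoken (pip install tiktoken).
# The encoding is model-dependent; cl100k_base is an assumption here.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

with open("meeting_notes.txt") as f:  # hypothetical file
    text = f.read()

tokens = enc.encode(text)
print(f"{len(tokens):,} tokens")  # compare against the model's context limit
```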
Justin Mannhardt (04:46): But it's a million.
Rob Collie (04:47): Half a meg of text, for internal business documents and stuff, is not very much.
Justin Mannhardt (04:53): Right.
Rob Collie (04:54): And of course, a document might have a bunch of pictures in it and things like that, and actually be a lot more than half a megabyte, but when it's digested, it becomes less than half a megabyte of tokens, because it's really just sort of extracting the meaning from everything. It scans the image, extracts the meaning from the image-
Justin Mannhardt (05:14): Converts it to a series of numbers effectively. Yeah.
Rob Collie (05:18): There's a lot of compression that can happen in the size of a file, the weight of the words, the weight of the meaning in the file, when that piles up to be half a megabyte... So in the case of podcast transcripts, it's like all text, right? You're not going to get as much compression into the token window, like half a megabyte of raw text is a lot more tokens than a half a megabyte of a formatted Word doc with images, because half a megabyte of text contains more text, more meaning. But look this up really quick, how many tokens are the complete works of Shakespeare?
Justin Mannhardt (05:50): Approximately 1.2 million.
Rob Collie (05:52): Okay. Crazy, right?
Justin Mannhardt (05:54): Right.
Rob Collie (05:54): I'm talking about a half a megabyte of text being close to a million tokens. Well, the entire works of Shakespeare in text are about that size. So, a million tokens is either a lot, it's the entire works of Shakespeare, or it's not very much when you think about the amount of information stored on your company's intranet. And it's certainly nothing compared to everything that ChatGPT, or Claude, or Gemini knows intrinsically because it was trained on that; I keep using the history of the Roman Empire as an example.
Justin Mannhardt (06:27): ChatGPT said that the complete works of Shakespeare, depending on the edition, contained between 900,000 and 950,000 words, and it had this math of one token equaling three quarters of a word.
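(For readers following along, here's that back-of-the-envelope in code, using only the figures quoted above:)

```python
# Shakespeare's complete works in tokens, from the quoted figures:
# 900,000 to 950,000 words, and 1 token ~= 3/4 of a word.
words_low, words_high = 900_000, 950_000
tokens_per_word = 4 / 3  # the inverse of 0.75 words per token

print(f"{words_low * tokens_per_word:,.0f} to {words_high * tokens_per_word:,.0f} tokens")
# -> 1,200,000 to 1,266,667 tokens, i.e. the ~1.2 million cited above
```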
Rob Collie (06:42): So, there you have it, right? In theory, the entire works of Shakespeare should almost fit into Gemini's context window. So, if you're talking about custom instructions to an AI, the entire works of Shakespeare is a lot of instruction. But in terms of becoming a historian about the history of your company, and being able to answer all kinds of questions about all the things that ever happened to your company, and reasoning across and finding patterns and finding themes and everything, a million tokens is nothing. So, it's kind of both everything and nothing in a way. So, Brian Julius, friend of the podcast, friend of P3, Brian Julius, sent me a video yesterday... It just had me smiling ear to ear. We'll link it in the show notes. It's a video from AI News and Strategy Daily. It's this guy who seems very smart, he's saying that the million token context window from Gemini is a lie.
Justin Mannhardt (07:39): How dare they?
Rob Collie (07:41): I know, right? So, he's saying that if I give a bunch of context to one of these models, and I exceed their token limit, it's going to tell me that I've exceeded the token limit. It'll say, bah, try again. In a way, that's almost like an arbitrary limit that they set: at what point, when we're ingesting a bunch of information, do we give an error message? But nothing stops them from taking your "million tokens" and not remembering most of them. In practice, what he's saying in this video is that in terms of its practical ability to reason over what you've given it, it's actually still just close to 128K tokens.
Justin Mannhardt (08:21): 12%?
Rob Collie (08:22): Yeah. And it really, really, really puts a lot of emphasis on what's at the beginning of the stuff that you gave it and what's at the end of the stuff that you gave it, and forgets the middle. Just like me writing a book report in high school.
Justin Mannhardt (08:35): Skip to end.
Rob Collie (08:36): I literally did that one time. I read the first chapter and the last chapter of a long biography, grabbed another book, read the timeline of that person's life off of the back jacket, and used that to write a book report that got me an A+. That's what Google Gemini is doing with your million tokens.
Justin Mannhardt (08:51): And now, the education system is very upset that you can just do the same thing with AI.
Rob Collie (08:57): Yeah, in my day we cheated the honest way.
Justin Mannhardt (09:00): We've been cheating the whole time.
Rob Collie (09:04): That was in 10th grade that I did that, or maybe it was ninth grade, and the teacher, when he was handing the papers back, the book reports, he stopped in the middle of class and held my book report up, and I'm like, oh no, here it comes.
Justin Mannhardt (09:17): [inaudible 00:09:20]
Rob Collie (09:19): He's like, "I want to tell you about Rob's book report." And then he said, "This is the single best book report I have ever read in all of my years as a teacher." Even 10th grade me, or ninth grade me, was feeling a little bit bad. Anyway, this is actually really reassuring in a way.
Justin Mannhardt (09:41): That Gemini is allegedly doing this thing?
Rob Collie (09:45): Because of what it points to. And at the end of this video, this gentleman gets pretty deep into it in a way that I dig. Here's an instinct that I've had for a while without bothering to verify it, and he verified it. I don't know if his opinion counts as full verification; I'm going to treat it as such for the moment. So, I have had the instinct, from my days as a trained computer scientist back in college, that expanding these context windows is difficult. I have believed intuitively that if you take a 128,000 token window and double it to 256,000 tokens, that more than doubles the strain on the AI system, like the computational strain. Okay?
(10:35): Hey folks, I'm going to jump in here with a quick post-production note, you're about to hear me use a number, 4,096, which is actually incorrect. The real number is 64, and the original video I watched on YouTube, at one point he did say to the fourth power, but he was also using this word quadratic, which of course sounds like fourth power. After we recorded this episode, this was bothering me, quadratic. I'm like, yeah, that means squared, doesn't it? So, I went back and double checked, and in fact, for a context window that is eight times the size, it will cost basically 64 times the processing power, 64 times the electricity to make that context window work. Not 4,096.
(11:12): But still, I think the core point stands: 64 times the processing power is just prohibitively expensive. These context windows, at least as they're currently designed, as these algorithms currently work, are simply not going to become gigantically large anytime in the near future, barring some sort of new breakthrough. In this video, he says it actually is to the fourth power. So, if Google's million token limit, 8X the 128, is real, then it's more than 4,000 times as computationally... Eight to the fourth power is 4,096. It's 4,096 times more computationally expensive to operate over that eight times sized context window.
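(Reader's note: the corrected math is the standard quadratic-attention story. Self-attention compares every token with every other token, so compute grows with the square of the context length. A minimal sketch of that scaling, using the episode's window sizes; real systems add optimizations that change the constants.)

```python
# Quadratic attention scaling: cost ~ (context length)^2, so a window
# roughly 8x larger costs roughly 8^2 = 64x more, not 8x more.
base_window = 128_000
big_window = 1_000_000  # Gemini's advertised limit, ~8x larger

ratio = (big_window / base_window) ** 2
print(f"~{ratio:.0f}x the attention compute")  # ~61x, in line with the 64x above
```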
Justin Mannhardt (11:59): That makes sense because it's all, and this is an oversimplification, but it's multiplication tables under the hood.
Rob Collie (12:06): Right. And so, I would never have dared suggest that it was to the fourth power. I would've been happy with my prediction if it had been squared, if it had been 64 times as computationally expensive. But 4,096. Okay, why is this important? It's important because, on the current trajectory of models, the current lineage, the current DNA of all these AI models, we are not going to achieve anything resembling a breakthrough in context window size. 4,096 means 4,000 times the electricity, which is just a non-negotiable cost.
Justin Mannhardt (12:51): Now we're into physics.
Rob Collie (12:53): Exactly. And he even says that in the video, he says, we're into physics. Guess what? Do you want to pay in time? Do you want to pay in power? Either way, you're going to pay in money. And so, he gives this very, very telling stat. He's like, I have yet to find one of these AI models that is capable of deep reasoning across a 100-page document. A 100-page document is too much for it to absorb and reason over with the same confidence that it does the stuff that it has in its intrinsic memory, like the Roman Empire.
Justin Mannhardt (13:30): Wow, that's not very big.
Rob Collie (13:32): No. And that's not likely to change by leaps and bounds. We've had this breathless sense of the advances in this stuff. Okay, so why is this good news? It's good news for a number of reasons. Number one, unless there is a completely new direction, a new breakthrough, the likes of which we have not seen since ChatGPT knocked the world on its butt. He's very clear about this: this context window problem has existed from the beginning, it's not new, and it's not going away. Everything that we've been breathlessly excited about isn't going to solve this problem. It also really blows you away, puts into focus how impressive the human brain is. Because the human brain can reason over a 100-page document. We can't remember word for word every page like the AI might, but we can reason over a corpus of knowledge that is much larger than these things are able to reason over. We can be introduced to it anew, and learn it, and reason over it, which these things can't do.
(14:34): That means that we're not about to fall off some cliff where all of us get replaced, the feeling of breakneck pace of breakthroughs and all that kind of stuff, we've been having this intuition that has been plateauing a bit. Well, okay, there is a very, very, very real limit here, that again, everything along the current genealogy of models, GPU-based models, all this kind of stuff, it is not likely to work around. It's not just going to keep advancing and solve this problem, it needs a completely new direction to be discovered.
Justin Mannhardt (15:06): According to ChatGPT, ChatGPT is our official producer in the other booth, and we're just, hey, can you look this up for us?
Rob Collie (15:12): It turns out our actual producer is on PTO, and has been for a couple of weeks. Yes, we have replaced... Luke, I'm sorry.
Justin Mannhardt (15:18): You've been replaced.
Rob Collie (15:22): You've been replaced by ChatGPT.
Justin Mannhardt (15:24): But I was just curious because, okay, so getting to the million tokens is effectively a 4,000X problem of compute. I asked it what's sort of the typical improvement in GPU performance, iteration over iteration, and if this is right, it just kind of paints this picture. It says, "Modern GPUs typically improve two to 3X in peak compute performance every one and a half to two years." We can go verify that, but it's not 4,000X.
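(Taking that quoted figure at face value, you can back out how long hardware alone would take to close a 4,096X gap. A rough sketch, assuming improvement simply compounds; the midpoint figures are our own reading of the quote.)

```python
# How many GPU generations to reach a 4,096x compute gap, if each
# generation is ~2.5x faster and arrives every ~1.75 years (midpoints
# of the "two to 3X every one and a half to two years" quote)?
import math

target = 4_096
per_gen, years_per_gen = 2.5, 1.75

generations = math.log(target) / math.log(per_gen)
print(f"~{generations:.1f} generations, ~{generations * years_per_gen:.0f} years")
# -> ~9.1 generations, ~16 years of hardware progress alone
```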
Rob Collie (15:53): That's right. Yeah.
Justin Mannhardt (15:54): And Sam Altman hasn't necessarily hidden from this either; he wants a bazillion dollars to invest in chip research and stuff like that.
Rob Collie (16:03): You could also just end that sentence in Sam Altman wants a bazillion dollars.
Justin Mannhardt (16:07): A bagazillion. A bagazillion.
Rob Collie (16:10): A bagazillion?
Justin Mannhardt (16:11): That's right.
Rob Collie (16:12): Oh, okay. Damn, I didn't know he wanted that much.
Justin Mannhardt (16:19): It's a lot.
Rob Collie (16:20): I would settle personally for a gazillion, why does he got to be so greedy? So, there's sort of, I think three reasons why this is very reassuring, at least three. Number one, the replacing everybody thing on this trajectory seems like bullshit. And that's the first time I've ever been able to sort of confidently say that. Again, I want to take credit for my intuition, I've been having this feeling that this context window was sort of like a limit of the universe type of thing, and those dirty liars, they put something out that's 4,000 times as expensive and claimed that it works. It does not. Okay, that's one reason. Second reason, creative and thoughtful strategies about how you surface information to these tools, to these AI models, in a way that's respectful of its actual context limit, whether it tells you it's okay or not, keeping it under that limit... Brian calls this the conveyor belt. At any point in time, if it's reasoning over a reasonable amount of information, you can get somewhere, you can't give it all at once.
(17:24): Finding ways to optimize around this token limit is a very data-style problem for anyone who works in data today. Okay, it's different, because a lot of times we're dealing with raw text, we're not necessarily dealing with structured data. In fact, a structured database is like the last thing you want to feed to an AI; you want the AI to help you write queries over it, but you do not want to feed structured data to the AI. It's unstructured context, documents and history and chat logs and whatever... If you treat that stuff as data for a moment, there are ways to optimize how you expose it so that you're not flooding and overwhelming the context window at any particular point in time. And there's plenty of optimization to be done there, so that's good.
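(A minimal sketch of that conveyor-belt idea, assuming nothing about the actual implementation: feed the model one window-sized chunk at a time and carry a running summary forward, so no single call exceeds the context limit. The summarize callback is a hypothetical placeholder for whatever model call you'd use.)

```python
# Toy "conveyor belt": process a long text in window-sized chunks,
# carrying a running summary forward between model calls.

def conveyor_belt(text: str, chunk_chars: int, summarize) -> str:
    """`summarize(chunk, summary_so_far)` is a hypothetical callback
    that asks the model to fold one chunk into the running summary."""
    summary = ""
    for start in range(0, len(text), chunk_chars):
        chunk = text[start:start + chunk_chars]
        # Each call sees only the carried summary plus one chunk,
        # keeping the prompt safely inside the context window.
        summary = summarize(chunk, summary)
    return summary
```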
(18:10): A third reason that I think it's really good is that it kind of shows that things are settling down where investments and thinking that you make today aren't going to be wasted. There's at least a pause in the breakneck pace of all of this stuff in which we can actually go and get our hands around it and build practical solutions and all of that. And then I think the fourth reason to be excited about this is that even with these limitations, this stuff is still magic. Something that is magic and requires a little bit of thinking to get the most out of it, and isn't going to replace everyone, it's kind of like the best case scenario. It's like the Goldilocks situation here. And I'm pumped, I'm really stoked, both by this and also just by my own recent progress with some of these tools, some of the things I've been able to build, getting closer and closer to having new friends at work, that are electronic friends, that help me do things that no human could ever help me do. Unbelievable. That was a real long Rob Collie monologue.
Justin Mannhardt (19:21): I like it. As far as Rob Collie monologues go, it's on the list. It's on some sort of top X list.
Rob Collie (19:28): It is on the list.
Justin Mannhardt (19:30): The list.
Rob Collie (19:31): The list.
Justin Mannhardt (19:33): Yeah, I also was pretty surprised to hear the 4,000X computational problem.
Rob Collie (19:38): Yeah.
Justin Mannhardt (19:39): Because for me, it put a bold button on something that I believed was true, which is that a real breakthrough advancement to what Altman might call superintelligence, or what we use the term AGI for, was essentially a hardware problem. It was a physics problem, it was an electricity problem. It wasn't a machine learning problem, it wasn't a data problem, it wasn't a training problem... It seemed unlikely that there would be a software-oriented breakthrough that would get us over the hump. So, it's reassuring to be like, okay, that makes sense logically to me, and it validates some of our earliest thinking from when we would talk about AI on the show, about things like human in the loop, and a conductor. And I think all that's still relevant, there's a spectrum there. But I was talking with Kellen the other day, we've been working on something internally, and it's required him to write new database code and things like that, and even with that project, you hear all the time that it absolutely crushes writing code.
(20:47): He hit a limit, and he had to get in there and figure it out on his own. And so, the digital employees, or the agents: having something you can go to and have it perform a type of task or interaction with you, and having something that's also self-aware in a way. Imagine for some reason I became an AI agent with today's technology. All of a sudden, I would not be likely to ever proactively reach out to you, Rob, about anything, because you would need to initiate with me. Unless you had programmed me to do something. There's a long way to go before we can truly drop in and replace a person. We can replace a lot of functions and a lot of capability, but...
Rob Collie (21:40): I have been "building" an agent recently. First as a custom GPT, which is really easy, it turns out. And kind of fun. And then trying to package that up as a Slack agent, one that you can talk to in Slack like it's another person. And I have both working. It's to help us write, not just help us write, but also help us brainstorm marketing copy. Marketing copy that aligns with who we are, that aligns with our voice, that aligns with our values, that aligns with our opinion and our perspective on the world, and I've been reasonably successful at getting this agent to do that. Out of every 10 ideas it comes up with, six of them might not be very good, but four of them are excellent.
Justin Mannhardt (22:28): And honestly, that's pretty much on par with what you'd get out of a good employee doing that type of work.
Rob Collie (22:35): And in fact, it was kind of interesting, all weekend long, being in this mode of asking Griff for ideas, and becoming very, very okay with just focusing on the good results, ignoring the bad ones, a very take-it-or-leave-it mentality. We'll protect this person's identity, but one of our colleagues at the company started texting me a whole bunch of ideas during a party, not the most focused mindset. Some of the ideas being texted to me were good, some of them were bad, and I had the same sort of mentality, and it really helped me be patient. I like that one; no, I don't like that one. It's okay, right? It sort of pre-trained me. There's more to come there. I'm not close to maxing out that agent's context window; its 128,000 token limit is far more than I need. Because right now I'm only using 6,000 characters, not 6,000 words, probably like 1,200 words, and 128,000 tokens is probably 90,000 words or something on that order.
(23:37): It's just been given high level, but specific, information: basically a 1,200-word essay about what it means to be us. What it means to talk like us, what we believe, what our stance is, and what we believe in, and all that kind of stuff. And 1,200 words on that is quite a bit. In terms of feeding it lots of examples of things we liked and didn't like, like real feedback, persistent feedback training hasn't been part of this yet. By the end of this quarter, we, P3 Adaptive, will have at least two internal agentic AI applications, but really think of them as digital colleagues, that help us with our business internally. And we are going to hold ourselves to the standard of: we look at this thing that we've been working with and say, I can't believe we ever lived without it.
(24:29): We are going to do this for ourselves, and hold ourselves to that standard so that when we're helping our clients with it, we can maintain that same standard. With a frontier technology like this, the only responsible place to experiment at that level is on yourself, and man, am I here for it. I'm in.
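(For the curious, the pattern Rob describes is simple to sketch. A minimal, hypothetical version of a Griff-style colleague using the OpenAI Python SDK; the file name, model choice, and prompt are illustrative, not what P3 actually runs.)

```python
# A Griff-style brainstorming colleague: a ~1,200-word "what it means
# to be us" essay as the system prompt, then ad hoc brainstorming asks.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("voice_and_values.md") as f:  # hypothetical persona essay
    persona = f.read()

def brainstorm(ask: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[
            {"role": "system", "content": persona},
            {"role": "user", "content": ask},
        ],
    )
    return response.choices[0].message.content

print(brainstorm("Give me 10 taglines for our consulting page, in our voice."))
```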
Justin Mannhardt (24:50): We talk about the dualities in situations a lot, it's true that there needs to be a ton of advancement to get to a place where we could honestly say, oh, there's a lot of human job replacement going to happen at massive scale, and at the same time, we're very close to having these digital employee solutions that we could not have imagined not having.
Rob Collie (25:17): If their lane is relatively narrow-
Justin Mannhardt (25:19): That's right.
Rob Collie (25:19): ... they can actually be amazing. This Griff agent, with 1200 words of instruction, is honestly already better at brainstorming ideas about how we describe ourselves than any of us are.
Justin Mannhardt (25:34): And it's tireless. You were saying earlier, you might ask for 10 ideas and you'll like four of them, or three of them. If you don't like any of them, you just say, I don't like any of these, let's do another 10. And it will go, and it'll go, and it'll go, and go... Imagine if I was trying to do that for you, give you ideas. Maybe I could survive the first round of, Justin, I don't like any of these. Okay, give me some feedback, what don't you like? Okay, let me try again. If I get to the second time and you're like, I don't like any of these, now the emotional part of my human substance kicks in.
Rob Collie (26:11): That's right. And honestly, in that situation, I probably quit before you do, because I feel bad for dragging you through the mud. Now, I have decided, I've just decided to be completely comfortable with the idea of being polite talking to Griff. I use the word please.
Justin Mannhardt (26:27): It's weird.
Rob Collie (26:28): Every now and then, I'll circle back and just say thank you. This goes hand in hand with the research that suggests that people who view it as a tool are less successful with any given AI than those who view it as a colleague. A colleague you will give feedback to; a tireless colleague is not something we're used to.
Justin Mannhardt (26:49): Not at all.
Rob Collie (26:50): All right, so two internal AI agents, digital colleagues, one of which is Griff. Griff is not yet to the level where I would say, I can't believe we live without this. But we are going to get it there, and it might not be that long. It might be three weeks.
Justin Mannhardt (27:07): One of the first iterations you were sharing with people, and I think you had a couple of iterations before you got to that point, I used right away for something real. Anytime I have to write something, a document or a communication or something, 10 times out of 10 I'm using AI at some point in the process. Usually what I do is I'll sort of word-salad my brain dump, and then I'll feed it to Copilot or ChatGPT and say, okay, I need this polished up. And I used Griff to help me with an exercise like that, and I was like, oh, this is so much better, because it's on P3 tone and style, and I got there faster... Eventually, you and I always get there with things we write to send to our team; sometimes we go back and forth quite a bit to feel good about it.
Rob Collie (27:51): Well, I had the luxury, with that document, of you producing the majority of it, and then there were a couple of sections that I wanted to rewrite, so I went and rewrote them manually. We're kind of hitting a slowdown, and it's good news for many reasons, and even in the context of the slowdown, some really, really crazy cool second, third, fourth, fifth dedicated brains are totally within reach. It's a level of magic that we can deal with, that we can work with.
Justin Mannhardt (28:20): I think of it this way, if the slowdown, if we use that term, is an opportunity for all of us to really get our arms around this stuff, maybe we're not going to be constantly trying to catch it.
Rob Collie (28:33): Yeah. I'm excited about this stuff in a way that I haven't been excited about tech in a while. I'm clearly super, super excited based on some past episodes, about this whole chat with data experience, like the end user Copilot over the top of existing models and reports, that is just... If that was all AI was, I would still be thrilled, but this agentic stuff, like this customizable digital colleague stuff, especially now that I think it's going to stay contained, it's just so much easier to emotionally get close to it when you know it's not about to eat you tomorrow.