What We Learned Building Before Buying
On the surface, it sounds a little reckless: why build your own AI data preview when Microsoft and others are already weaving Copilot into everything? Feels like building your own toaster while standing in an appliance store.
But vendor AI isn’t fully baked yet. Copilot today can be brilliant one minute and confidently wrong the next. Ask it why pivot tables were named that way, and it’ll spin a backstory full of imaginary campaigns and protests. Amusing? Sure. Reliable? Not yet.
That’s why we started experimenting. Not to replace Copilot, but to build a prototype: a homemade preview that let us ask questions of our own data, map where “safe” ends and “hallucination” begins, and learn faster than waiting on product updates.
The Technical Approach: What “Prototyping Our Own” Actually Means
This wasn’t about reinventing enterprise software. It was about wiring just enough plumbing to see what would happen.
- MCP servers as connectors. We plugged language models directly into our semantic models (see the sketch below). No dashboard middlemen, no filter confusion.
- Bypassing the report layer. Reports are fine for humans but maddening for AI. We cut out the noise and let the models hit the source.
- Multi-LLM testing. OpenAI, Claude, Gemini—each with different quirks. Running them side by side gave us perspective you can’t get from a single tool.
- Safe sandboxes. Experiments stayed in zones where a bad answer was a data point, not a disaster.
So “prototyping our own” meant creating a controlled environment where different LLMs could query real semantic models and show us what worked, what broke, and what looked promising.
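To make that concrete, here’s a minimal sketch of the connector idea. It assumes the official `mcp` Python SDK (FastMCP) and Power BI’s executeQueries REST endpoint; the dataset ID, token handling, and error handling are placeholders, not our production setup.

```python
import os

import requests
from mcp.server.fastmcp import FastMCP  # official MCP Python SDK

mcp = FastMCP("semantic-model-preview")

# Power BI's executeQueries endpoint runs DAX against a published semantic
# model; PBI_DATASET_ID and PBI_TOKEN are assumed to come from the environment.
PBI_URL = (
    "https://api.powerbi.com/v1.0/myorg/datasets/"
    f"{os.environ['PBI_DATASET_ID']}/executeQueries"
)


@mcp.tool()
def run_dax(query: str) -> list[dict]:
    """Run a DAX query directly against the semantic model.

    The LLM calls this tool itself: no report layer, no dashboard
    filters in between, just the model's measures and relationships.
    """
    resp = requests.post(
        PBI_URL,
        headers={"Authorization": f"Bearer {os.environ['PBI_TOKEN']}"},
        json={"queries": [{"query": query}]},
        timeout=30,
    )
    resp.raise_for_status()
    # executeQueries nests its output: one result, one table, many rows.
    return resp.json()["results"][0]["tables"][0]["rows"]


if __name__ == "__main__":
    mcp.run()  # stdio transport; point an MCP-capable client at this script
```

Point Claude Desktop, or any other MCP-capable client, at that script and the model pulls answers straight from the semantic layer: the “no dashboard middlemen” setup from the list above.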
What We Discovered
Some things just worked. “What were sales last quarter by region?” landed clean almost every time. Others fell apart fast. “Since we changed the pricing model, which customers shifted tiers?” led to hallucinations and wild guesses.
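Running those questions side by side looked roughly like the harness below. It’s a stripped-down illustration using the official openai and anthropic Python SDKs (Gemini trimmed for brevity); the model names and system prompt are stand-ins, and in the real experiments the models reached the data through an MCP tool like the one sketched earlier.

```python
from openai import OpenAI
import anthropic

QUESTION = "What were sales last quarter by region?"
SYSTEM = "Answer using only the semantic model's measures. Say so if you can't."


def ask_openai(question: str) -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o",  # stand-in model name
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content


def ask_claude(question: str) -> str:
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
    resp = client.messages.create(
        model="claude-3-5-sonnet-latest",  # stand-in model name
        max_tokens=1024,
        system=SYSTEM,
        messages=[{"role": "user", "content": question}],
    )
    return resp.content[0].text


if __name__ == "__main__":
    # Same question to each model, answers printed side by side.
    for name, ask in [("openai", ask_openai), ("claude", ask_claude)]:
        print(f"--- {name} ---\n{ask(QUESTION)}\n")
```

When one model guesses and another declines to answer, the divergence itself tells you where the boundary sits.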
Key lessons:
- AI handles structure well. Straightforward aggregations? Easy wins.
- Context is harder. Business rules and time-based logic still cause stumbles.
- Boundaries are real. The value isn’t making AI answer everything; it’s knowing where it’s trustworthy and growing from there.
One pleasant surprise: semantic models traveled well. Once measures and relationships were defined properly, multiple LLMs could make sense of them without extra handholding. That’s a quiet endorsement of investing in the model layer.
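To illustrate what “traveled well” means in practice (all names below are invented for the sketch): given nothing but the model’s own metadata, each LLM converged on essentially the same query, because a well-named measure carries the business logic with it.

```python
# Invented names, illustrative only: the kind of metadata a well-built
# semantic model exposes to an LLM.
MODEL_METADATA = """
Tables:   Sales(OrderDate, Region, CustomerKey, Amount)
Measures: [Total Sales] = SUM(Sales[Amount])
"""

# Given only that metadata, "what were sales by region?" reliably came back
# as a query shaped like this; the measure name does the semantic work.
SALES_BY_REGION_DAX = """
EVALUATE
SUMMARIZECOLUMNS(
    Sales[Region],
    "Total Sales", [Total Sales]
)
"""
```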
Practical Next Steps
So when’s the right time to experiment versus wait?
- Experiment now when the questions are repetitive, structured, and low-risk. That’s where you’ll build team literacy fast.
- Wait when workflows are mission-critical and the cost of error is high. Guardrails matter there.
Either way, the lesson is clear: your data foundation sets the ceiling. Weak models equal weak answers, no matter how advanced the AI sitting on top.
The Bigger Picture
Semantic models are enjoying a renaissance. Long a BI best practice, they’re now also the bridge between structured data and AI’s conversational layer. They give large language models something solid to stand on.
That bridge is where the real convergence is happening: structured numbers, unstructured text, and AI reasoning, all in the same workflow. Prototyping your own AI preview isn’t rebellion—it’s rehearsal. It’s the fastest way to be fluent before vendor tools fully catch up.
🎧 Want the unfiltered version, straight from the conversation? Catch the podcast episode here.
Get in touch with a P3 team member