
Me:  “Can you PLEASE stop answering the question before I even ask it?  It’s kinda freaking me out.”

Jamie:  “I knew you were going to say that.”

It is a good time to be an Excel pro.  The world is waking up to the fact that Excel pros pretty much run things.  Some of the most visionary work being done in the software industry is targeting Excel pros as a result – PowerPivot is of course the #1 example.

True story:  near the end of the PowerPivot project, a faction within the team splintered off to form a new company.  Their goal was to build a brand-new class of tool for Excel users – one that doesn’t just crunch numbers, but that can predict them.

That new company is Predixion, and Jamie MacLennan is their CTO (though I like to think of him as “Leader of the PowerPivot Rogue Faction”).  When PowerPivot was getting underway, Jamie was the manager of the entire programming team.

Brief Aside:  A Peek Inside Jamie’s Mind

My favorite Jamie story:  every year, Jamie would get up in front of the entire Analysis Services team and deliver an hour-long presentation on the employee annual review system at Microsoft.  He would explain in great detail EXACTLY how everything worked behind the scenes.

In short, he basically told all of the secrets.  Not official secrets, because the review model was never officially secret.  But in practice, only the managers knew how it worked.  And by pulling back the curtain on all of that, he helped all of the individual employees greatly, while making the jobs of every manager MUCH harder.

No one at Microsoft EVER did that, but Jamie did it every year.  And no one could stop him.  No one could even openly acknowledge that they wanted him to stop.  Since I was a former manager myself, I knew the kind of empowering effect his presentation was having.

On to the interview!

Kasper has already posted on some of the things that Predixion can do.  Continuing on that vibe, I sat down “virtually” with Jamie to ask him some questions that I thought you might find relevant.

ROB:  OK – Predixion Software.  If I’m the kind of person who uses PowerPivot (which I am), why should I be aware of what you guys are doing?

JAMIE:  I have to be careful not to answer this purely in terms of why *I* should be excited about the PowerPivot integration, since, as we discussed at length, we come from significantly different backgrounds in how we look at data.

So from the PowerPivot guy’s point of view, there are some significant things that we’re doing, and planning on doing, with PowerPivot that are really cool. First of all, we’re a predictive analytics vendor that really cares about and really believes in the PowerPivot vision – over half of our development team, myself included, helped design and develop that product – so we design and develop a predictive analytics solution that makes sense in a PowerPivot world.

Predixion Insight consumes data from PowerPivot for the purpose of predictive analytics. This means that all of the data consolidation and business modeling power that PowerPivot provides to the Excel power user is now available to feed an equally powerful predictive analytics engine. Furthermore, PowerPivot users can use Predixion Insight to write the predictive insights they gain back into PowerPivot. This means that for the first time you can perform your BI/PowerPivot magic, sprinkle in some predictive fairy dust to enhance your data, and seamlessly continue to be a true Excel Superman by integrating those predictive results directly back into your PowerPivot application.

This is what I show in our extended demo when I add predicted results to a PowerPivot table, use a PivotChart report to analyze those results, then use Insight Now to determine the best analytical attribute, and even apply Insight Analytics to reformat that attribute to make it useful in reporting scenarios. This is the true back-and-forth mashup of predictive and traditional analytics that we’ve been dreaming of for ages.


Some of the Predictive Tools in the Predixion Addin

On top of all that, Predixion’s roadmap includes deeper and more substantial integration with PowerPivot than we have even today. There are a lot of exciting things on the horizon for Predixion, and they all include PowerPivot.

ROB:  In a sense, what Excel and BI pros do every day is “mine data.”  So how should we think of actual Data Mining tools as they relate to our existing toolset – calculations, aggregations, pivots, queries, etc.?  And is there a difference between Data Mining and Predictive Analytics?

JAMIE:  This is one of those problems where the terminology has been so misused and morphed over time that the label “Data Mining” doesn’t mean much of anything anymore – similar to the labels “Democrat” and “Republican”. The term “Data Mining” originally meant what we now call “Predictive Analytics”, but it has since been co-opted to include anything involving looking at data – from ad-hoc query, to search, to OLAP, to triggered eventing, even Amazon’s Mechanical Turk. Interestingly enough, most of the public opposition to “data mining” focuses on these newer interpretations of the term, not on its original intended meaning.

In hindsight, I think the term “Data Mining” really is more applicable to “what Excel and BI pros do every day,” because you are essentially digging through data to find meaningful or useful tidbits. Predictive Analytics is a much better term to describe what we do because it eliminates the connotation of manual labor to accomplish information discovery. I honestly believe what we do goes beyond “predictive,” so I like to call it “predictive and behavioral analytics.” Some disagree, since “behavioral” has a human-psych connotation, whereas I apply it to the behavior of the data – so maybe “predictive and descriptive analytics” is a better term. In any case, it’s too long, and “predictive analytics” seems to be the terminology that wins.

More Goodies

ROB:  I *love* that all I have to install is an addin.  All I need is my username/password, and the Predixion addin uses YOUR servers to do all the heavy lifting.  When I run a model using the addin, how much server horsepower am I tapping into?

JAMIE:  How much server horsepower do you want?

The beta cloud that you ran against had an array of about 10-12 4-core, 4 GB machines that we can grow or shrink dynamically based on usage patterns. The initial cloud at launch is around the same size. We’ve found that that machine configuration gives us the best bang for the buck for these types of tasks.

ROB:  Speaking of that, my 10 Million row data set uploaded fast, in about a minute.  Are you guys taking advantage of PowerPivot compression to make upload faster, too?

JAMIE:  No, even though our architect co-invented and implemented the PowerPivot compression mechanism, we aren’t using that method for our upload/download scheme. We are using different compression, as well as 128-bit encryption, of course, and it doesn’t hurt that you have a big fat internet pipe on your Pivotstream data center 🙂

ROB:  In my experience, the type of data set is important – some data sets don’t yield any new insights, whereas others reveal some pretty astounding stuff.  Can you describe the sorts of data sets we should focus on with the Predixion tools?

JAMIE:  That is an interesting question – the typical goal of the report writer or OLAP jockey is to reduce large datasets into key numbers through massive aggregation, producing a high-level view onto the data.

However, with this approach all of the marvelous gory details of the data are lost. Predictive analytics thrives on those details, so you need to come up with data sets that are a little different from what you would use for slicing and dicing.

Typically the problem is with the granularity of the data. For example, if you were looking at medical prescription records summarized by zip code, the best you could determine is patterns at the zip code level – you’re not likely to find actionable information an individual doctor could apply for treatment. Whereas for aggregate analysis, details are the devil, for predictive analytics, the devil’s in the details. I don’t want to make a blanket statement that you always want the lowest level of granularity, but you have to make sure you have the appropriate granularity to solve your problem.

One example that comes to mind is a cancer study where a variety of tumor measurements were taken for both benign and malignant tumors. For each aspect, such as diameter, several measurements were taken. Since the final analysis was at the tumor level, the individual measurements weren’t interesting – there’s no particular reason to believe that the first diameter measurement has any different importance than the second – so aggregate values of the measurements were used. In this particular case, the researcher took the average of all measurements, the deviation of the measurements (that is, a measure of the variety of results), and the most extreme measurement.
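The aggregation step Jamie describes – collapsing repeated raw readings into one row of features per tumor – can be sketched in a few lines of Python (the data values and names here are made up for illustration, not from the actual study):

```python
from statistics import mean, stdev

# Hypothetical per-tumor diameter readings: several measurements each.
measurements = {
    "tumor_a": [14.2, 14.8, 13.9, 15.1],
    "tumor_b": [22.5, 24.1, 23.0],
}

# Collapse each tumor's repeated readings into the three aggregate
# features described above: the average, the spread, and the extreme.
features = {
    tumor: {
        "mean": round(mean(vals), 2),
        "stdev": round(stdev(vals), 2),
        "max": max(vals),
    }
    for tumor, vals in measurements.items()
}

print(features)
# e.g. tumor_a -> mean 14.5, stdev 0.55, max 15.1
```

The result is one feature row per tumor – the unit of analysis – which is exactly the “appropriate granularity” point: you aggregate away the detail that is noise at your level of analysis, while keeping summary features that preserve its variety.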

So, in summary, I would say that detailed data is much more important for predictive analytics than for traditional analytics, but the answer really lies in understanding what you’re analyzing and the answers you’re trying to get. Predictive analytics is a slightly different discipline, but in reality, if you understand your data, it’s really just a small mental hop away from where you’ve been working all the time.

ROB:  What’s it like at Predixion Software on an average day?  Does the vibe feel like Bungie Studios, part two?

JAMIE:  Yes! Definitely! We even have a life-size replica of a rules engine in our lobby – no, wait – that’s the “No solicitors” sign…. Actually a typical Predixion Software day is a lot of fun. I’ve worked with and have known most of the development team for many years so verbal and mental bandwidth is very high – we can spin on ideas extremely quickly without having to get bogged down in procedures because we all know where each other is coming from and we deeply understand each other’s capabilities. We moved out of a shared office space into a private space a little over a month ago so ad-hoc communication is simply shouting between offices – it’s actually very reminiscent of the early Analysis Services days when discussions were loud and aggressive and decisions were made quickly and acted upon immediately. No hurt feelings – just common goals and shared responsibilities driving the company.

That, and punctuating the day with coffee, pinball, and nearby parks and restaurants makes Predixion the best place I’ve worked so far.


The Official Programming Pose of Predixion Software

ROB:  I’ve heard you have some pretty interesting early adopters.  What can you tell me about who they are and how they are using Predixion?

JAMIE:  Well, I can’t say who they are, but there has been wide industry representation in our early adopters. Along with over 50 consulting firms, we’ve had telco, retail, law enforcement, manufacturing, finance, and healthcare customers all participate in our beta.

ROB:  What is the most useful, surprising, etc. thing you’ve ever found with Predixion?

JAMIE:  Once I performed a segmentation analysis on a large corporation’s internal employee satisfaction data, and a name popped up in the results – and not in a happy way. Usually these types of analyses are highly generalized, and individual manager names would be anonymous noise relative to the overall trends, so it was pretty surprising to see – there had to be a lot of employees naming this individual for it to show up like that. To validate the results I performed some classification analysis, and the name showed up again as a top-level indicator for employee satisfaction. (As a side note, this kind of data analysis is what I would call “behavioral” rather than “predictive”, since I wasn’t trying to predict anything – just seeing the pictures that can be painted through data analysis.)

It turned out, through examining the surrounding circumstances, that there were compensation issues impacting employee satisfaction in that manager’s region that weren’t represented in the data – when those factors were added to the data, the manager’s name fell out and the true satisfaction issues could be addressed. In the end, by using predictive (and behavioral) analytics, the corporation was able to find and address a key driver of employee dissatisfaction.

JAMIE:  That’s all the questions you have, isn’t it Rob?

ROB:  Showoff.