Data mining using PowerPivot and Predixion Insight

As of this week the public beta of Predixion Software’s data mining in the cloud for Excel is available. Those of you who are familiar with the Microsoft SSAS Data Mining Add-ins should be very comfortable with what is inside Predixion data mining for Excel. I have done a previous blog post on data mining using PowerPivot with the MS data mining add-in, where you can see how it currently works.

Predixion Insight for Excel is like a new version of the current SSAS add-in. The Predixion Insight team consists of the folks who previously built the add-in for MS and have now started on their own.

The biggest change is that you no longer need an SSAS server installed. All the action happens on the Predixion servers in the cloud. The second biggest (for me) is that you can use PowerPivot data as a data source for your data mining. Using it in combination with PowerPivot requires nothing more than Excel and a Predixion subscription for data mining. Furthermore, the overall UI has been improved to make data mining a more user-friendly experience. And it supports 64-bit.

From the Predixion site:

Predixion’s intuitive and easy-to-use solution allows users to run predictive analytics in the familiar environments of Microsoft Excel® and PowerPivot. Whether you are an existing SQL Server® Data Mining user, a BI specialist or a newcomer to the arena of Predictive Analytics, Predixion Insight™ will enable you to easily create, manage and run powerful and accurate predictive models without extensive training or specific knowledge of the methodologies currently required to create successful predictive projects.

In this blog post we are going to see what the key influencers are of the number of items in stock, using the Contoso sample database.

First we need to install Predixion Insight for Excel: just run setup and the client is installed within Excel. The next time you open Excel the client will be there. We have two tabs, “Insight analytics”:

and “Insight Now”:

The “Insight analytics” tab is mainly for advanced data mining, while “Insight Now” lets you get started immediately. Before we can do anything we need to connect to the Predixion servers with the account we created on the website:

After logging into the Predixion cloud service we can start data mining. I have loaded information about my stock from my data warehouse into PowerPivot for Excel. I have loaded the fact table FactInventory, which contains the actual stock numbers in 8 million rows. The fact table is related to a lot of descriptive tables that surround it, called the dimension tables. I have loaded a few of these descriptive tables into PowerPivot as well. What do we know about an item that is in stock?

  • When was it in stock? Year/month/day
  • What product?
  • What product cost?
  • Aging of the product in inventory
  • The country of the store it is in

Of all these properties we want to know which influences the number of days in stock the most. For this I want to use the “Analyze key influencers” function, so I click on it.

This gives us a screen where I can select what my source is, Excel or PowerPivot. I select PowerPivot. Now I can select which table I want to analyze, and I select the fact table. We could place filters here, but I decided to hit the Predixion server with my full 8 million rows :).

Next we can select the column we want to determine the key influencers for:

Of course we don’t need all the columns to be analyzed, so we can select the columns we want to include in our analysis:

And this is where we notice something is not right. As you can see, we can select DateKey, StoreKey and ProductKey. But if we analyze these columns they are treated as key values: instead of the year 2009 it would test for the value 1-1-2009, and instead of the store “Amsterdam” it would check for the integer 12. So we need to do something first: prepare our PowerPivot table so that it contains descriptive values. Luckily for us this is not that hard, just add a column in the PowerPivot window using the =RELATED function:
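To make the difference between key values and descriptive values concrete, here is a small Python sketch of the same preparation step. It is not PowerPivot or DAX code, only an illustration of what the lookup achieves. The table contents and the names `fact_inventory`, `dim_store`, `DaysInStock`, `StoreName` and `Year` are hypothetical; only the store key 12 for “Amsterdam” and the date 1-1-2009 come from the example above.

```python
# Hypothetical miniature versions of the fact table and a store dimension.
fact_inventory = [
    {"DateKey": "2009-01-01", "StoreKey": 12, "DaysInStock": 30},
    {"DateKey": "2009-06-15", "StoreKey": 7, "DaysInStock": 5},
]
dim_store = {12: "Amsterdam", 7: "Seattle"}  # StoreKey -> descriptive name


def add_descriptive_columns(rows, stores):
    """Return copies of the fact rows with StoreName and Year columns added."""
    enriched = []
    for row in rows:
        new_row = dict(row)
        # Look up the descriptive store name through the key relationship;
        # in this sketch a missing key simply yields None.
        new_row["StoreName"] = stores.get(row["StoreKey"])
        # Replace the full date by a coarser value that is useful to analyze.
        new_row["Year"] = int(row["DateKey"][:4])
        enriched.append(new_row)
    return enriched


for row in add_descriptive_columns(fact_inventory, dim_store):
    print(row["StoreName"], row["Year"], row["DaysInStock"])
```

After this step the analysis can work with “Amsterdam” and 2009 instead of the integer 12 and a single calendar day.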

Now we can select these columns in the data mining add-in:

Now we are good to go: just click Run and the data mining starts.
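To get a feeling for what a key influencer analysis looks for, here is a very simplified Python sketch. This is an assumption for illustration only and not the algorithm Predixion runs on its servers: it just compares how often a target value occurs for each attribute value against how often it occurs overall (the lift). The sample rows and the column names `Country`, `CostBand` and `Aging` are made up.

```python
from collections import Counter, defaultdict


def key_influencers(rows, target, attributes):
    """Rank (attribute, value, target_value, lift) tuples by lift, highest first."""
    total = len(rows)
    baseline = Counter(row[target] for row in rows)
    results = []
    for attr in attributes:
        # Count target values separately for every value of this attribute.
        groups = defaultdict(Counter)
        for row in rows:
            groups[row[attr]][row[target]] += 1
        for value, counts in groups.items():
            group_size = sum(counts.values())
            for target_value, n in counts.items():
                p_given_value = n / group_size
                p_overall = baseline[target_value] / total
                results.append((attr, value, target_value, p_given_value / p_overall))
    results.sort(key=lambda r: r[3], reverse=True)
    return results


# Made-up sample rows; the real table in this post has 8 million of them.
rows = [
    {"Country": "Netherlands", "CostBand": "High", "Aging": "Long"},
    {"Country": "Netherlands", "CostBand": "High", "Aging": "Long"},
    {"Country": "Netherlands", "CostBand": "Low", "Aging": "Short"},
    {"Country": "USA", "CostBand": "Low", "Aging": "Short"},
    {"Country": "USA", "CostBand": "Low", "Aging": "Short"},
    {"Country": "USA", "CostBand": "High", "Aging": "Short"},
]

for attr, value, target_value, lift in key_influencers(rows, "Aging", ["Country", "CostBand"]):
    print(f"{attr}={value} favors Aging={target_value} (lift {lift:.2f})")
```

A lift above 1 means the attribute value makes that target value more likely than average, which is roughly the kind of relationship a key influencers report puts at the top.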

The great thing here is that everything happens on the server, so I can start multiple operations at the same time. And of course, it being in the cloud, I can open this up on another machine and immediately access the results.

One thing I noted is that the information is sent to the cloud through an encrypted tunnel, so there is no need to worry that your data can be read by someone sniffing your network.

When I click on Minimize to Task pane, a new Predixion pane shows up where you can see all your tasks:

As you can see I ran this demo before, so I can use those results to show the outcome of the mining Predixion did. Just click “Results” and the report below appears:

As you can see it is pretty easy to combine the information you have in PowerPivot with the enormous power of data mining. The new user interface and the availability of the Predixion servers in the cloud really make data mining available for anyone, just as PowerPivot makes data analytics available for everyone. Predixion Insight for Excel works with Excel 2007 and Excel 2010, both 32-bit and 64-bit. Of course PowerPivot won’t be available with Excel 2007.

Predixion is also working on an on-premise and dedicated off-site cloud solution that leverages SQL Server, SSAS and SharePoint, which they call Enterprise Insight.