I’m currently talking to a client who is very charmed by the possibilities of PowerPivot to analyze data, one thing led to another and we came to data mining .. I got thinking wouldn’t it be GREAT if we could use the Excel data mining add-in on PowerPivot data (with DAX at our disposal). After some sparring with Rob Collie I found a way! Which I am going to show to you today in this screencast.
Think about the possibilities we have, make some great calculations with DAX to make a calculated column or measures and then unleash the power of the data mining add ins !
A few prerequisites: since the data mining add-ins are 32-bits you have to install the 32 bits office 2010 with 32 bits PowerPivot. I hope they launch a 64 bits Excel 2010 version soon!
You have to have a SSAS (2005/2008 /2008R2) instance installed on your network, not the in memory PowerPivot instance but a regular version.
I used this video to guide me in the demo’s Introducing the Table Analysis Tool for Excel 2007. You can download the Excel data mining add-ins here (more videos in there to see the possibilities).
UPDATE: The SQL Server Data Mining team is working to extend the power and ease of use of SQL Server Data Mining to the Cloud. You can use this add-in to run datamining against a service in the cloud, you don’t need a SSAS instance installed ! Check it out on this page:http://www.sqlserverdatamining.com/cloud/
Aaaah, here is the screencast already 🙂 Thank you so much, great work Kasper !
Thanks for presenting on this topic. I am a consultant with statistics teaching experience, and I have presented on the topic of using Excel for Microsoft Data Mining. My web link has references to some of my past presentations. I believe we need more presentations on data mining, since some people are confused about the mechanics, and even more are confused about creatively determining a data analysis strategy.
In this post, I will make specific comments to the excellent video (thanks for producing it)
1:08 I like the use of DAX to create a calculated column. In this case, the goal was to make a binary variable to apply logistic regression. However, another approach would be to put the number of bikes into ranges and apply a decision tree to understand the different groups (instead of choosing a nominal cutoff value of 9). Any bike represents a sale to Adventure Works.
2:44 I noticed on the right the panel was labeled “Gemini Task Pane” — that name might be leftover from the demo days, but perhaps it is intentional, I do not know. I would expect it to be named “PowerPivot Task Pane”.
5:38 You commented on a married woman who made a lot of money: “Don’t want to be rude, but the tool says it is not OK”. Statistics and data mining do not imply value judgments, such as “not OK”. People, however, apply interpretations based on their past experiences. I believe it is a good thing that a married woman can make more money than her peers in this data sample. That exception provides an organization (Adventure Works) with insight that income is not likely to be an issue for this specific customer. Using the exception analysis can provide salespeople with insights into who is either above or below their peer group on individual variables. It is accurate, therefore, and value-neutral to call values in the bright yellow as “exceptional”.
8:45 The graph demonstrates that the forecasting algorithm bases its decisions more heavily on the more recent data points. How far to go back depends on several factors: 1) what is the purpose of the forecast, and 2) what variability or stability patterns show in the past data. When I was a graduate student at GA Tech, time series analysis was an entire class because there are subtleties which even the SQL Server tools allows an Excel analyst to surface (through parameter setting). On this topic, the more people can learn about it, the better their results and interpretation can be.
9:12 You mentioned about the difficulty in typically using data mining. Mechanically, some have yet to feel comfortable with the ribbon (or have not used it yet), and I would encourage that group to jump into Office 2007 or 2010. I have believed that the leverage of the Microsoft tools pairs accessible data mining (particularly through Excel) with a world-class database. There are other free tools on the market which perform data mining. However, many people would not know the names of these alternative choices. On the other hand, Microsoft has earned its brand recognition of the Excel product, which is a platform for showcasing newer technologies like PowerPivot, DAX, and SQL Server Data Mining. The true leverage will come in corporate environments who have to have large volumes of data analyzed by a number of skilled analysts, a task which all the open source data mining solutions struggle with.
I encourage your readers to see my free data mining resource at http://www.marktab.net. Also, I have produced several YouTube videos illustrating data mining at:
http://www.youtube.com/marktabnet