Ok truth be told I am a big time Lakers fan and this season is a big disappointment for me, one of the things that irks me is that the Lakers are being paid the most of all the teams in NBA but lack the results. In this blog post I am going to compare the winning pct per team per dollar paid so we can see the results.
First I want to import the data that I need. I am going to use Data Explorer to import the data into Excel and then I am going to use PowerPivot to analyze the results.
First data source that I am going to use is http://espn.go.com/nba/salaries/_/page/1
this I want to combine with http://espn.go.com/nba/standings/_/group/1
I start by importing the standings, go to Excel and the Data Explorer tab and select “from web”
Paste in the URL and select Table 0
Now I want to remove the first row so I can select Remove Top Rows and select the first row and I also select Use first Row as Headers:
Now I am only interested in the Team name and win – loss pct so I am removing all other rows by selecting Hide Columns:”
Now I want to clean up the Team name to remove unwanted characters:
I select Replace Values and remove y – and x- with an empty value.
Ok I am happy with the result and I press Done to import the data into Excel:
I want to rename the Query to NBA Standings and Load it to the Data model.
Next up is import a list of all the players and their Salaries. Again the same thing Import from web, this time as is:
I press OK to import and rename the Table:
Now we only see 40 players here as this webpage we import from actually has multiple pages. The cool thing about Data Explorer is that under the covers its is a special language called M. What we are going to do now is make a function out of the import function that we can call iteratively for each page.
First I need to make sure I can actually edit queries, I go to the Data Explorer tab and select “Enable Advanced Query Editing”:
Now I go back to the table I just imported and click on “Filter & Shape” and click on this little “script” icon that now appears:”
This pops open the script dialog:
I change the script to the following:
(page) => let
Source = Web.Page(Web.Contents(“http://espn.go.com/nba/salaries/_/page/” & Number.ToText(page))),
Data0 = Source{0}[Data]
in
Data0
This will add a empty sheet to excel with just this function, nothing can be shown on the table but it needs to be there, I also renamed the table to Query.
Ok now I want to use that function for all 14 pages. I go to Data Explorer and select import from other sources, “Write Query”.
This opens a blank query window:
Here I paste in the following:
let
Source = Table.Combine(List.Transform({1..14}, Query))
in
Source
this will combine the results of a the query function executed from 1 to 14
Press OK and the all data (527 rows instead of 40) will be added to the sheet:
Pretty awesome right ? Now lets add this to the model as well by clicking load to data model:
Quick tip, don’t rename the tables. This doesn’t work in the current build of Data Explore and will screw up the connection to the model.
Now I want to create a relationship but we have problem, how do I relate:
with
We need to make the values in the column the same. I will plan to extract the city from team in dataset 2 to combine it with dataset 1 and remove LA from dataset1.
So I click on filter and reshape again at the salaries. I select the column and select by delimiter, select space and at the right-most delimiter.
That will give me:
Now I rename the Team.1 to City and Team.2 to Team. The last problem is that LA has two teams. You can see that in the first dataset they solved that by deviating from the norm by adding LA Lakers and LA Clippers. I need to do the same here.
First I rename Los Angeles to LA. Now I add a new column that based on my expression add the team name if the city is Lakers. I do right mouse click on the Salary column and select Insert column, Custom… This opens a new window where I can type a expression. What I added was:
if [City]=”LA” then [City] & ” ” & [TEAM] else [City]
This will give me the result that I want (I also renamed the columns):
Except that the steps did not anticipate a NBA team with two words “Trail Blazers” . So I added a special step for that to the formula”:
if [City]=”LA” then [City] & ” ” & [TEAM] else if [City]=”Portland Trail” then “Portland” else [City]
I press OK and reimport the data.
Now I am able to create a relationship in PowerPivot:
and start my Analysis by creating a Pivottable. I also formatted PCT and Salary in PowerPivot to get a better view.
There is a large amount of players in there that are not assigned to any team. I want to filter those out of the result set. So I go back to Data Explorer and deselect players with an empty city:
This gets rid of 100+ players that I don’t care about.
Now lets look at the data and add a measure that determine the PCT of Salary compared to all other teams.
I add the following measure to PowerPivot:
Salary all teams:=CALCULATE(sum(Query1[SALARY]),ALL(Query))
The first shocking result was when this measure returned the value of 1.8 billion ..:
Now I add a percentage of salary for each team against all salaries.
pct Salary:=sum(Query1[SALARY])/[Salary all teams]
To make it a little bit more clear instead of PCT I also create a rank:
Rank by PCT of winnings:=RANKX(all(Query[NBA]),[Sum of PCT])
this really gives you an idea that money doesn’t buy you everything:
The most interesting one is Indiana all the way down with almost half the salary costs as the Lakers at 7th best team in the NBA:
and the teams above and below the Lakers earn less together then the Lakers themselves..
Next up I would like to add a measure that will calculate the salary per player. So add the following two measures:
nr of players:=COUNTROWS(Query1)
Salary per player:=sum(Query1[SALARY]) / [nr of players]
Unfortunately that didn’t give us much interesting information, besides what we already know. It is interesting to see that a few teams have almost 20 players on their team:
While Chicago and the Lakers are not even capable of doing 5 on 5 in training ..
Hope that gives you some idea of the power of Data Explorer and PowerPivot together.
Hello Kasper.
I did the same steps as you, and I have the same end result. Just wanted to warn other readers of your blog to when copying / pasting the queries, remember to rewrite the quotes symbol because it does not copy properly formatted. I also had some trouble to convert the data type of column “salary” and “pct” text to decimal. My solution was to use the command “Replace values” to change dot (.) by comma (,).
I am very excited about the new possibilities that the Data Explorer can provide. Very good your post (i am a sport fan, too).
Hi @Marcos Vinicius Fontes ,
Thanks for the feedback ! and glad you are loving it !
Kasper
Dear Kasper.
I am thrilled about this tool!!
Looking forward diving into this script language. I have therefore tested your example against the Danish Real Estate database – see http://www.boliga.dk.
I would like to collect all deals from 2006 towards Today. This query –
http://www.boliga.dk/salg/resultater?so=1&sort=omregnings_dato-d&maxsaledate=today&iPostnr=&gade=&minsaledate=2006
– generates a result set of 537139 house sales in DK for that period.
To collect the complete result into Excel using Data Explorer I have created 2 queries – see those below.
The problem is that there is a time Out even with smaller page intervals set to eg. 1..150.
Do you see any method to optimize the script in order to get the complete dataset in one go?
Thanks in advance.
/Jens Ole
Query1:
(page) => let
Source = Web.Page(Web.Contents(“http://www.boliga.dk/salg/resultater?so=1&sort=omregnings_dato-d&maxsaledate=today&iPostnr=&gade=&minsaledate=2006&page=”&
Number.ToText(page))),
Data1 = Source{1}[Data]
in
Data1
Query2:
let
Source = Table.Combine(List.Transform({1..150}, Query1)),
ChangedType = Table.TransformColumns(Source,{{“Rum”, Number.From}},Value.FromText),
ReplacedValue = Table.ReplaceValue(ChangedType,null,””,Replacer.ReplaceValue,{“%”}),
ReplacedValue1 = Table.ReplaceValue(ReplacedValue,”%”,””,Replacer.ReplaceText,{“%”}),
ChangedType1 = Table.TransformColumns(ReplacedValue1,{{“%”, Text.Clean}})
in
ChangedType1
Kasper,
Great write-up on a practical use for Data Explorer. The example was right up my alley so I’ve been testing it out.
One question I have: When you first pulled in the W/L data from ESPN you hid all of the other columns because all you wanted was winning % and team names. There are columns in there for Home Record, Road Record, etc. that are in the 12-9 format on the webpage. Excel sees that value as a date and automatically changes it. If I wanted to keep that column but did not want it converted to a date what would I need to do in Data Explorer to transform that column? Thanks for your help!
Hi Bryan,
I believe the Data Explorer team made some changes to this in their current update to improve the typing, please update your data Explorer.
Kasper,
Does this still work with the current version of Data Exploer? I had gotten it to work before, but now my queries that worked previously are displaying errors. Has there been an update that has caused these steps to change from your original post? Thanks
Kasper,
The function below is no longer valid in Data Exploer: (page) => let
Source = Web.Page(Web.Contents(“http://espn.go.com/nba/salaries/_/page/” & Number.ToText(page))),
Data0 = Source{0}[Data]
in
Data0
Do you have any ideas on what’s changed or how to achieve the same result (getting data from paginated webpages into a single query). Thanks!
Bryan
Hi Bryan,
it looks like the web site has changed so the particular function doesn’t work anymore but the function structure still does. I recreated it by using
(page) => let
Source = Web.Page(Web.Contents(“http://espn.go.com/nba/salaries/_/page/” & Number.ToText(page))),
Data0 = Source{0}[Data],
ChangedType = Table.TransformColumnTypes(Data0,{{“RK”, type number}, {“NAME”, type text}, {“TEAM”, type text}, {“SALARY”, type number}})
in
ChangedType
That does the trick.
Thanks Kasper! Got it working now
Hi Kasper and other.
Maybe this is too custom specific and you there will not spend time on it – and that is of cause ok.
As I described earlier do like the NBA example and I am testing it against a Danish Real Estate database.
When I do invoke query 1 with any page e.g. page=10 the query return 40 lines (the website provide 40 lines per result set) in a snap – very fast!
Query 1 look like this:
= (page) => let
Source = Web.Page(Web.Contents(“http://www.boliga.dk/salg/resultater?so=1http://www.boliga.dk/salg/resultater?so=1&minsaledate=2006&p=” & Number.ToText(page))),
Data1 = Source{1}[Data]
in
Data1
The problem appears in the second query. In the query editor I would like to return 5000 pages. The script would look like this:
= let
Source = Table.Combine(List.Transform({1..5000}, Query1))
in
Source
If I run the query as:
Source = Table.Combine(List.Transform({1}, Query1))
then the editor will return the result fast and as expected.
If I add another page that is –
Table.Combine(List.Transform({1, 2}, Query1)) – then it works fine too.
However, adding more pages like – Table.Combine(List.Transform({1..12}, Query1)) – the result set is often not returned to the Excel sheet.
It seems to me that either 1) the webpage does not respond (I believe it does) og 2) Power Query fail combining many table sets.
Any idea how to debug this?
If you like I can forward the spreadsheet to those who would like to see the example.
Hope to get some troubleshooting idea – thanks in advance.
/Jens Ole
Hi Jens,
I have forwarded your question to the Power Query team.
Thanks,
Kasper