I got an interesting question: is it possible to figure out which measures and columns are being used across all tabular models on a server, whether by known reports or by self-service users in Power BI? To solve this I decided to use Power BI :).
To get the required information I decided to capture the queries being sent to AAS over a period of time and parse the results in Power BI Desktop. In this post I describe how to do it.
Start capturing data
First I started capturing the queries in AAS using Azure Log Analytics, as I described here. This can be done using Profiler or XEvents as well, but I decided to be modern :).
In Log Analytics I can see all the queries that get sent:
For my analysis I don’t need all the columns and events, so let’s filter that down in Log Analytics to something like this:
AzureDiagnostics
| where OperationName == "QueryBegin"
| project TimeGenerated, TextData_s, DatabaseName_s
| where TimeGenerated > ago(30d)
You can change the time range to anything you want, up to the maximum range you have data for. Keep the TimeGenerated column, as we want to filter on it.
This will get us the data we can use in Power BI:
To get this data into Power BI you can click Export > "Export to Power BI" in the query screen.
Bringing it into Power BI
This downloads a txt file with the M expression to your PC. You can then paste the M query into a blank Power BI query. Running it gets me the data I want:
From this I need to parse out the columns and measures that were used over the time period of the trace. This requires some tricks in Power Query (and it might not work for all scenarios). These are the steps I took (you can download the full M query here):
- I got rid of all diagnostic queries by filtering out Select.
- The time column is not needed for the analysis, so I remove it (it was only needed in Log Analytics to filter on time).
- Now comes the main trick: I need to extract the column, measure and table names from the query text. I do this by replacing a number of special characters and keywords like ( , [ ] and EVALUATE with ~ and then splitting the column on ~. Thanks to Imke for the idea :). That starts to get us somewhere:
- Next I unpivot and clean up the data a bit.
- I filter to only keep rows that start with ' or [, which keeps all table, column and measure references.
- To add more information I add a column that shows the type of each field: measure, table or column.
- Finally I group the table and add a count.
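To make the steps above concrete, here is a rough Python sketch of the same logic (this is an illustration of the idea, not the actual M code; the sample DAX query and the exact list of replaced tokens are made up for the example):

```python
from collections import Counter

# A made-up DAX query of the kind captured by the trace.
query = "EVALUATE SUMMARIZECOLUMNS('Date'[Year], \"Sales\", [Total Sales])"

# The main trick: replace special characters and keywords with ~,
# then split on ~ (mirroring the Replace + Split steps in Power Query).
for token in ["EVALUATE", "SUMMARIZECOLUMNS", "(", ")", ",", '"']:
    query = query.replace(token, "~")
parts = [p.strip() for p in query.split("~") if p.strip()]

# Keep only references that start with ' (table) or [ (measure);
# a token like 'Date'[Year] is a column reference.
refs = [p for p in parts if p.startswith("'") or p.startswith("[")]

def field_type(ref):
    # Classify each reference, like the added "type" column.
    if ref.startswith("["):
        return "measure"
    if "[" in ref:
        return "column"
    return "table"

# Group and count, like the final Group By step.
counts = Counter((ref, field_type(ref)) for ref in refs)
for (ref, kind), n in counts.items():
    print(ref, kind, n)
# prints:
# 'Date'[Year] column 1
# [Total Sales] measure 1
```

Real queries use many more keywords and functions than this, which is why the full M query replaces a longer list of tokens.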
This gets me the data I want:
With the data I can start doing some reporting:
Now you have an overview of which fields are used and how often they are referenced. You can extend this solution much further, of course. For example, you could add the username to see who accessed what and how often, or you could compare it with the model schema (which you can get from other DISCOVER requests) to find the fields that are never used, so you can clean them up.
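The clean-up idea boils down to a set difference between the schema and the captured usage. A minimal sketch, assuming you already have both lists (the field names below are made-up examples, and in practice the schema side would come from the Analysis Services DMVs):

```python
# Hypothetical field lists: the model schema (e.g. queried from the
# $SYSTEM DMVs) and the fields that appeared in the captured trace.
schema_fields = {
    "'Date'[Year]",
    "'Date'[Month]",
    "[Total Sales]",
    "[Total Cost]",
}
used_fields = {"'Date'[Year]", "[Total Sales]"}

# Fields that never appear in any captured query are clean-up candidates.
unused = sorted(schema_fields - used_fields)
for f in unused:
    print(f)
# prints:
# 'Date'[Month]
# [Total Cost]
```

Keep in mind that a field missing from the trace only means it was not queried during the captured window, so capture a long enough period before deleting anything.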
You can download the full solution here as a PBIT. When you open it for the first time it asks for the URL, which you can get from the M query that you downloaded from Log Analytics.