I got an interesting question: how can you figure out which measures and columns are being used across all the tabular models on a server, whether by known reports or by self-service Power BI users? To solve this I decided to use Power BI :).
To get the required information I decided to capture the queries being sent to AAS over a period of time and parse the results in Power BI Desktop. In this post I describe how to do it.
Start capturing data
First I started capturing the queries in AAS using Azure Log Analytics, as I described here. This can also be done using Profiler or XEvents, but I decided to be modern :).
In Log Analytics I can see all the queries that get sent:

For my analysis I don’t need all the columns and events, so let’s filter it down in Log Analytics to something like this:
AzureDiagnostics
| where OperationName == "QueryBegin"
| project TimeGenerated, TextData_s, DatabaseName_s
| where TimeGenerated > ago(30d)
You can change the time range to anything up to the maximum range you have data for. Keep the TimeGenerated column, as we want to filter by it.
This will get us the data we can use in Power BI:

To get this data into Power BI, click Export and then "Export to Power BI" in the query log screen.

Bringing it into Power BI
This downloads a txt file with the M expression to your PC. You can then paste the M query into a blank query in Power BI. Running it gets me the data I want:

From this I need to parse the columns and measures that are used over the time period of the trace. This requires some tricks in Power Query (and it might not work for all scenarios). These are the steps I took (you can download the full M query here; a minimal sketch of the flow follows the list):
- I got rid of all diagnostic queries by filtering out the ones containing Select.
- The time column is not needed for the analysis, so I remove it; it was only needed in Log Analytics to filter on time.
- Now comes the main trick: I need to extract the column, measure and table names from the query text. I do this by replacing a list of special characters and keywords (things like parentheses, commas and EVALUATE) with ~ and then splitting the column on it. Thanks to Imke for the idea :). That starts to get us somewhere:

- Next I unpivot and clean up the data a bit.
- I filter to keep only the rows that start with ' or [, which keeps all column, table and measure references.
- To add more information I add a column that shows the type of field: measure, table or column.
- Finally I group the table and add a count.
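To make the steps above concrete, here is a minimal Power Query sketch of the same flow. The sample input row, the token list and all the step names are placeholders of mine; the downloadable M query is the full version and differs in the details:
let
    // Made-up sample input; in the real solution this is the query text
    // exported from Log Analytics (the TextData_s column).
    Source = #table(
        {"TextData_s"},
        {{"EVALUATE SUMMARIZECOLUMNS('Date'[Year], ""Sales"", [Sales Amount])"}}),
    // Drop diagnostic queries, which contain SELECT rather than EVALUATE.
    DaxOnly = Table.SelectRows(Source, each
        not Text.StartsWith(Text.Upper([TextData_s]), "SELECT")),
    // The main trick: replace delimiters and keywords with ~ so the query
    // text can be split on a single character. Extend the token list as needed.
    Replaced = Table.TransformColumns(DaxOnly, {{"TextData_s", each
        List.Accumulate({"EVALUATE", "(", ")", ","}, _, (state, token) =>
            Text.Replace(state, token, "~"))}}),
    Split = Table.SplitColumn(Replaced, "TextData_s",
        Splitter.SplitTextByDelimiter("~"), 50),
    // Unpivot so every token becomes its own row, then trim whitespace.
    Unpivoted = Table.UnpivotOtherColumns(Split, {}, "Attribute", "Token"),
    Trimmed = Table.TransformColumns(Unpivoted, {{"Token", Text.Trim}}),
    // Keep only tokens starting with ' or [: table, column and measure references.
    Fields = Table.SelectRows(Trimmed, each
        Text.StartsWith([Token], "'") or Text.StartsWith([Token], "[")),
    // Classify the reference: 'Table'[Column], bare 'Table', or [Measure].
    Typed = Table.AddColumn(Fields, "FieldType", each
        if Text.StartsWith([Token], "'") and Text.Contains([Token], "[")
        then "Column"
        else if Text.StartsWith([Token], "'") then "Table"
        else "Measure"),
    // Group and count how often each field is referenced.
    Grouped = Table.Group(Typed, {"Token", "FieldType"},
        {{"Count", each Table.RowCount(_), type number}})
in
    Grouped
Splitting into 50 columns is an arbitrary upper bound; the unpivot simply drops the empty ones. Note that this sketch will label a column referenced without its table prefix as a measure, the same caveat raised in the comments below.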
This gets me the data I want:

With the data I can start doing some reporting:

Now you have an overview of which fields are used and how often they are referenced. You can of course extend this solution much further: for example, you could add the username to see who accessed what and how often, or you could compare the results with the schema you can get from other discover (DMV) queries to see which fields are never used, so you can clean them up (a rough sketch of that idea follows below).
You can download the full solution here as a PBIT. When you open it for the first time it asks for the URL; you can get that from the M query you downloaded from Log Analytics.
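As a rough sketch of that last idea, here is a hypothetical M query against the model's $SYSTEM.MDSCHEMA_MEASURES DMV; the server URL, model name and the sample of used fields are all placeholders:
let
    // Placeholder server and model names; MDSCHEMA_MEASURES is a standard
    // Analysis Services DMV listing every measure in the model.
    ModelMeasures = AnalysisServices.Database(
        "asazure://westeurope.asazure.windows.net/myserver",
        "MyModel",
        [Query = "SELECT MEASURE_NAME FROM $SYSTEM.MDSCHEMA_MEASURES"]),
    // Made-up sample of the grouped output from the solution above.
    UsedFields = #table({"Token"}, {{"[Sales Amount]"}, {"'Date'[Year]"}}),
    // Keep only the measures that never appear in any captured query.
    Unused = Table.SelectRows(ModelMeasures, each
        not List.Contains(UsedFields[Token], "[" & [MEASURE_NAME] & "]"))
in
    Unused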
Great idea. Only concern would be columns referenced in measures that never appear in queries (e.g. helper columns in a date table). I suppose you could parse the measure definitions and bounce that list against the measures in this solution.
Very good idea. Was already getting queries to find slow running ones, but never thought to parse for the column names.
Yeah, it would be great to also bring in the schema of the cube as another data source and identify all the dependencies of the measures used, ultimately to find out which columns/measures are never used, so they can be removed from the model.
This was very helpful. However, I simplified this approach by using RegEx in the original Kusto query, which simplifies the clean-up in Power BI:
AzureDiagnostics
| where OperationName == "QueryBegin"
| project TimeGenerated, TextData_s, DatabaseName_s, ApplicationContext_s
| where TimeGenerated > ago(1h)
| extend MeasuresAndColumns = extract_all(@"([\[\'][\w ]+[\]\']\.?[\[\'][\w ]+[\]\'])|\[(.*?)\]", TextData_s)
You can then just parse the MeasuresAndColumns column in PBI.
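A minimal sketch of that parsing step in M, assuming the dynamic array produced by extract_all() arrives through the Log Analytics export as JSON text; the sample row is made up:
let
    // Made-up sample: one JSON-encoded array per row, with one array of
    // regex capture groups per match (unmatched groups come back empty).
    Source = #table({"MeasuresAndColumns"},
        {{"[[""'Date'[Year]"",""""],["""",""Sales Amount""]]"}}),
    // Parse the JSON, flatten the capture groups and drop the empty entries,
    // leaving one field reference per row.
    Parsed = Table.TransformColumns(Source, {{"MeasuresAndColumns", each
        List.Select(
            List.Combine(List.Transform(Json.Document(_), (m) =>
                if m is list then m else {m})),
            (t) => t <> null and t <> "")}}),
    Expanded = Table.ExpandListColumn(Parsed, "MeasuresAndColumns")
in
    Expanded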
I further combined this with the $SYSTEM.MDSCHEMA_MEASURES DMV to try to isolate measures that are not being used.
Happy to receive feedback on the RegEx to improve this as it doesn’t cover all cases. Some columns will get labelled incorrectly as measures.