Determine columns you don’t need using DMV’s in Power BI

Today a quick one that I came across while writing a different blog post that I will blog later. I know we have talked about it again and again but a good best practice is to remove any high carnality columns you don’t necessary need. This trick is not new and has been blogged about before in different places but I wanted to emphasize it again due to the importance.

I was finishing up my next blog post and wanted to upload the sample file. While doing that I noticed the file was 150MB large. That is rather large for such a simple file. The largest table has 500,000 rows and none of them are unique. What is going on?

DAX studio

So I whipped open DAX Studio and connected to my Power BI desktop file. There I switched to DMV in the query pane. DMV stands for Dynamic Management Views and come to us from the Analysis Services world giving us a ton of interesting information. about our model.  

To figure out what is going on I started out by looking at the storage tables. You can double click the query name in the pane and it generates a query.

What I need here is to look at the row count field and find out which one is the largest. Row count will give us the unique row count for a column. Unique rows do not compress well and are responsible for a large percentage of the memory usage of the model. The good thing is that the query langue used for DMV’s is like SQL. Therefor I can change the query to the following: “select * from $SYSTEM.DISCOVER_STORAGE_TABLES order by rows_count desc”.

Running this query gives me the following result:

Immediately we spot the culprit, we have an ID column in the sales table that is responsible for the majority of the space needed. After removing the column the model size went from 156 MB to 13.6 MB. Talk about a simple optimization here! 

More DMV’s?

This is just the tip of the iceberg on DMV’s, you can also do things like:
– Getting all your measures with:
$SYSTEM.TMSCHEMA_MEASURES
– Determining the measure dependency: 
$SYSTEM.DISCOVER_CALC_DEPENDENCY
– Get all the tables: 
$SYSTEM.DBSCHEMA_TABLES
– Get all the columns: 
$SYSTEM.DBSCHEMA_COLUMNS

Finally as mentioned before you can write SQL like queries, for example if you want to get a list of all the columns in the dimDate table you can do: 
select * from $SYSTEM.DBSCHEMA_COLUMNS where table_name=’$Dimdate’

Pretty cool right!

Update: Daniel Otykier on twitter also suggests using TabularEditor’s Best Practice Analyzer. It considers if columns are being referenced in any DAX expressions (even RLS filters), if they participate in relationships, or are used as hierarchy levels etc. Then you know for sure if you can remove a column. Great idea!

Finally you can also use VertiPaq Analyzer to analyze the data in your model in more details. Go check it out!

2 Replies to “Determine columns you don’t need using DMV’s in Power BI

  1. Hi Kasper,

    I am new to DAX Studio and an intermediate user of PowerBI. I’m working through Dax Studio to find inefficiencies and found this post very helpful. However; I can’t find a simple query in Dax Studio to show fields/columns/attributes in my PowerBI report that I’m importing and whether they are or aren’t being used in my report. I think I want what you’ve mentioned above “Daniel Otykier on twitter also suggests using TabularEditor’s Best Practice Analyzer,” but I’d like to do it in Dax Studio. Thanks for any assistance.

    1. Hi Carly, thanks for stopping by. Unfortunately you cannot do it in DAX studio. The model doesn’t know which fields are being used by the report. Daniels tool does allow you to see if there are hidden columns that are not referenced by any other calculation in the model though the DISCOVER_CALC_DEPENDENCY described above.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.