Today a quick one that I came across while writing a different blog post that I will blog later. I know we have talked about it again and again but a good best practice is to remove any high carnality columns you don’t necessary need. This trick is not new and has been blogged about before in different places but I wanted to emphasize it again due to the importance.
I was finishing up my next blog post and wanted to upload the sample file. While doing that I noticed the file was 150MB large. That is rather large for such a simple file. The largest table has 500,000 rows and none of them are unique. What is going on?
So I whipped open DAX Studio and connected to my Power BI desktop file. There I switched to DMV in the query pane. DMV stands for Dynamic Management Views and come to us from the Analysis Services world giving us a ton of interesting information. about our model.
To figure out what is going on I started out by looking at the storage tables. You can double click the query name in the pane and it generates a query.
What I need here is to look at the row count field and find out which one is the largest. Row count will give us the unique row count for a column. Unique rows do not compress well and are responsible for a large percentage of the memory usage of the model. The good thing is that the query langue used for DMV’s is like SQL. Therefor I can change the query to the following: “select * from $SYSTEM.DISCOVER_STORAGE_TABLES order by rows_count desc”.
Running this query gives me the following result:
Immediately we spot the culprit, we have an ID column in the sales table that is responsible for the majority of the space needed. After removing the column the model size went from 156 MB to 13.6 MB. Talk about a simple optimization here!
This is just the tip of the iceberg on DMV’s, you can also do things like:
– Getting all your measures with:
– Determining the measure dependency:
– Get all the tables:
– Get all the columns:
Finally as mentioned before you can write SQL like queries, for example if you want to get a list of all the columns in the dimDate table you can do:
select * from $SYSTEM.DBSCHEMA_COLUMNS where table_name=’$Dimdate’
Pretty cool right!
Update: Daniel Otykier on twitter also suggests using TabularEditor’s Best Practice Analyzer. It considers if columns are being referenced in any DAX expressions (even RLS filters), if they participate in relationships, or are used as hierarchy levels etc. Then you know for sure if you can remove a column. Great idea!
Finally you can also use VertiPaq Analyzer to analyze the data in your model in more details. Go check it out!