A lot has already been written on aggregations and it is one of the most exciting features of Power BI in a while. The guys at SQLBI did a session on it and my colleague Phil Seamark wrote an amazing host of articles with many details and tips and tricks on it. So why another article? Most people think aggregations is only applicable to petabyte scale data sets. That doesn’t have to be the true though, in this blog post I wanted to give my take on it and why it might be applicable to your current, smaller, model.
Here are 5 reasons you want to look at using aggregations:
- This is the obvious one. The data you want to analyze or report on doesn’t fit in memory. Imagine your users want to be able to see the details of every transaction and you have billions of rows. With aggregations you can store the aggregated values by product \ date etc in memory. You then leave the underlying transactions in the data source. However the magic here is that your users won’t even notice as it is one model and report.
- You want your data to be “real time” but not use DirectQuery all the time. When you are using DirectQuery for 100% of your queries are running against your data source. This might get you into performance and concurrency problems. With aggregations you can load a small set of aggregated data in memory that you can update and reload every minute or so. That aggregated data can take care of the majority of queries but will still be quick to load. You will have to be careful to make sure the data is in sync.
- Potentially you want to save cost. By determining what data is needed for 95% of the reporting cases you can trim down the data stored in memory. When data is loaded in memory it cost money when using Premium. Of course by offloading some queries to the data source you are adding load there too. So it depends on the ROI if this one is worth it.
- You want to optimize certain calculations. With aggregations you can use aggregation tables for DistinctCount calculations and use DirectQuery for lowest level calculations. Not just DisctinctCount will work but many other calculations too. This might be interesting to investigate but your mileage might vary. The team told me at one point we might have Aggregations over VertiPaq. When this happens performance optimizations using aggs will be even more interesting.
- Now the final one is to go read these blogs by Phil Seamark and try to understand what he is doing there. They show so many new ways This will allow you to solve many interesting scenario’s and allow you to”
I hope this has given you some idea’s for using aggregations even when you don’t have petabyts of scale.
Thank you so much for sharing such amazing content and it was so helpful.