I was working on a PoC that needed a measure where I had to use some more advanced DAX functions. In this case I had two fact tables and the customer wanted to have the sales of a specific store, but only if they had a booking in the same period.
To cut straight to the DAX, this formula does the trick:
=SUMX( FILTER(salesperstore,
COUNTROWS(FILTER(Bookings,
MONTH(Bookings[FlightDate]) = MONTH(salesperstore[DateKey]) &&
YEAR(Bookings[FlightDate]) = YEAR(salesperstore[DateKey])))
> 0)
, salesperstore[SalesAmount]
)
This was a very slow statement and ran for 23 seconds, so i decided to try and tune it.
What happens in this statement: do a SUM of salesperstore[SalesAmount] of a filtered Salesperstore table, only the values of Salesperstore table are returned where the number of rows of a filtered table of Bookings on the same month and year of the current salesperstore context are greater than zero. The SUMX will create a copy of the filtered table and loop through it and evaluate every row and only sum the values that are true.
I started by replacing the SUMX with a calculate. The calulcate doesn’t loop through every row but rather filters the dataset. This made a quicker function:
=Calculate(sum(salesperstore[SalesAmount]),
FILTER(salesperstore,
COUNTROWS(FILTER(Bookings,
MONTH(Bookings[FlightDate]) = MONTH(salesperstore[DateKey]) &&
YEAR(Bookings[FlightDate]) = YEAR(salesperstore[DateKey])))
> 0)
)
But this still was very slow, and i knew why, the FILTER functions use the table supplied to the function and loops through the entire table to return the requested result. I complained on twitter on how slow it was when Vidas send me a message that you should avoid using a filter over an entire table. He got a tip from Marius Dumitru from the PowerPivot team how to solve this, check it out here.
The explanation of the tip is that FILTER is very slow when done over a entire table. So this means in this case that it loops through the Bookings table and salesperstore table while creating copies of the entire table in memory while we only need one column. And that is the key to tuning this query. We don’t want to drag with us the entire table to determine the filter, we only want to have the specific column. This can be done by the VALUES function, as BOL defines it:
Returns a one-column table that contains the distinct values from the specified column. In other words, duplicate values are removed and only unique values are returned. This function cannot be used to return values into a cell or column on a worksheet; rather, you use it as an intermediate function, nested in a formula, to get a list of distinct values that can be counted, or used to filter or sum other values.
And that is exactly what we are going to do, we replace the parameter of filter to be a column of a table instead of an entire table:
=Calculate(sum(salesperstore[SalesAmount]),
FILTER(values(salesperstore[DateKey]),
COUNTROWS(FILTER(Values(Bookings[FlightDate]),
MONTH(Bookings[FlightDate]) = MONTH(salesperstore[DateKey]) &&
YEAR(Bookings[FlightDate]) = YEAR(salesperstore[DateKey])))
> 0)
)
This brought the execution time of the query with 100% processor usage (on my dual core laptop) from 23 seconds to 2 seconds. So as you can see it is well worth to take a look at your query’s. Most of the time a lot of performance profit can be gained when you use other formula’s, especially when dealing with SUMX and FILTER. I can understand that end users will not so easily be able to tune it like this, so we need to educate and help them with this.
Hi,
I’ve found your article very interesting, it gives a simple in depth information thru an everyday example of a big performance issue in PowerPivot. I’m finishing to read my second book about PP and I haven’t found this clever approach on the subject “performance” in any of these.
Thanx!
Can this be used with the ALL() function, i.e. when filtering ALL(‘Date’) .. not sure, can’t seem to get it to work 😮