A scenario that is quite common is the following: as part of my analysis I want a flat list of all customers and their order numbers, filtered by a parameter such as year, or which customers purchased particular products, and so on.
Let's take a scenario where I want to see customers and their orders, filtered by year and month.
We have the following schema:
Let's start by looking at what happens in an Excel PivotTable. I drag LastName and SalesOrderNumber into the rows of a PivotTable. Excel takes a long time to return results, even though there are only about 60,000 rows in FactInternetSales and 18,000 in DimCustomer. Excel will eventually show a message asking if you want to see a subset of the results. If you click OK, Excel will give you results, but probably not what you expect: you get a Cartesian product, where the PivotTable shows ALL the SalesOrderNumbers for ALL the customer last names, not the order numbers for each person. This can produce a potentially huge result set that takes a long time to return, and because so many rows come back, performance is usually quite bad.
You will need to add a measure to the PivotTable for the filtering to take place. This is often not very intuitive for users who just want to see the values that are joined together. If you add the measure, you get the results you expect:
This result set will also give you much better performance, since it is no longer a cross join.
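The measure itself can be very simple; a plain row count is enough to drive the filtering. As a minimal sketch (assuming the AdventureWorks table names used above):

```
-- Counts the fact rows for the current combination of LastName and
-- SalesOrderNumber. When no rows exist for a combination, COUNTROWS
-- returns BLANK, and the PivotTable suppresses that combination.
Order Row Count :=
COUNTROWS ( FactInternetSales )
```

Any measure that returns blank for combinations without data will have the same filtering effect.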
The reason a measure is needed is that SSAS doesn't have the concept of inner joins like SQL has. PivotTables only show values on rows and columns for combinations where the measure does not return blank. This is by design in Excel PivotTables. The important thing to realize is that the measure determines whether the values for LastName and SalesOrderNumber will show up for each combination of values. Say you were to use "Sum of Cost" as the measure, and the cost column is empty in the FactInternetSales table for some customers: you would NOT see the SalesOrderNumber for those last names that you would see if you used "Sum of SalesAmount" instead.
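To illustrate, compare two measures side by side (a sketch; the TotalProductCost column name is an assumption based on the AdventureWorks schema, and the empty-cost scenario is hypothetical):

```
-- If TotalProductCost were blank for all of a customer's fact rows,
-- this measure would return BLANK for that customer, and the PivotTable
-- would hide that customer's SalesOrderNumbers.
Sum of Cost :=
SUM ( FactInternetSales[TotalProductCost] )

-- SalesAmount is populated for every order row, so this measure keeps
-- every customer/order combination visible.
Sum of SalesAmount :=
SUM ( FactInternetSales[SalesAmount] )
```

The rows you see are therefore a function of the specific measure you drop in, not of the relationship alone.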
Power View behaves differently: when you put the same fields into a table in Power View, you get the SalesOrderNumbers for each LastName. It is not a cross join but behaves like a join, which gives much better performance than doing the same in Excel.
But how does this work, and why? Analysis Services does not have a way to follow relationships without measures. What Power View automatically (magically) does is look at the relationships between the tables of the fields used in the report, and in the DAX query it sends, it automatically creates a measure on the table on the many side of the relationship to force the values to show up as we expect. The measure is not persisted in the model; it is used only for this visualization.
This is the query it generates:
EVALUATE
FILTER (
    KEEPFILTERS (
        GENERATE (
            KEEPFILTERS ( VALUES ( 'DimCustomer'[LastName] ) ),
            CALCULATETABLE (
                FILTER (
                    KEEPFILTERS ( VALUES ( 'FactInternetSales'[SalesOrderNumber] ) ),
                    NOT ( ISBLANK ( CALCULATE ( COUNTROWS ( 'FactInternetSales' ) ) ) )
                )
            )
        )
    ),
    OR (
        NOT ( ISBLANK ( 'DimCustomer'[LastName] ) ),
        NOT ( ISBLANK ( 'FactInternetSales'[SalesOrderNumber] ) )
    )
)
ORDER BY
    'DimCustomer'[LastName],
    'FactInternetSales'[SalesOrderNumber]
So the Power View behavior makes more sense and matches what most customers expect, but it relies on a generated, throwaway measure. When you add a real measure, the combination of rows shown could change, which may be confusing to some users. Still, it provides an out-of-the-box, performant experience that I personally like.
In my recent experience, I found that it works fine when you combine SalesOrderNumber with columns of lower cardinality (like LastName here). But the moment you bring in another column of the same or higher cardinality (say a SalesOrderDescription alongside SalesOrderNumber), it completely kills the performance. I suspect the reason is that the measure is evaluated multiple times.
Quoting from the Performance Tuning of Tabular Models whitepaper: "Another performance bottleneck often seen in DAX queries generated by Power View is that measures are evaluated multiple times. Typically the innermost part of a query will perform a FILTER and a NOT(ISBLANK()) on the calculated measure. Then one or more ADDCOLUMNS functions will reevaluate the measure in the proper filter context. This type of query displays the correct results on the screen but does prevent the VertiPaq cache from being hit during the multiple evaluations of the same measure, causing performance to be slower than expected."
So I am currently telling my clients to avoid using Power View for returning large datasets or for detailed operational reporting. What is your view on it?
Great call-out, Kasper. I've frequently needed to shift users to Power View for any flat-table-style reports sourced from an SSAS cube, as Excel is geared toward aggregation-style analysis.
As you would know, users can indeed choose to show results in tabular form in Excel, but it still relies on a measure being present, and it still doesn't give that simple experience of generating rows and columns.
Flat table reports still have their place in reporting, and ironically, after being ushered over to Power View for these reports, users are looking for the "Export to Excel" feature!
I am mostly developing Tabular models, and the drillthrough capabilities need to catch up with Multidimensional. I would love to be able to define custom drillthrough, as the default is very poor.