Aggregations and Time Series calculations in Power BI

I have received a couple of questions about Aggregations in Power BI and whether they can be used to cover time series calculations such as Year to Date, Quarter to Date, and Month To Date. The answer is yes. Since time series calculations break down into calculations over a series of days, an aggregation table defined at day level with the basic summarization methods (min, max, sum, count) and the right relationship with a Date dimension, can answer Year to Date, Quarter to Date, and Month To Date calculations.

Let’s take a quick look at one such calcualtion and how it can be covered with an aggration. I am going to use the same version of AdventureWorks sample database and Power BI model that I used in my previous blog post on aggregations, with a few changes.

Sample Queries

My goal is to cover the following two queries that both return Year to Date Sales Amount by Date so that they can be answered from an Aggregation table vs. the main Sales table:

The first query uses the DATESYTD function to generate a set of dates to sum Sales Amount over.

EVALUATE
SUMMARIZECOLUMNS (
    'Date'[Date],
    "Sales Amount", 
    CALCULATE(SUM ( 'Internet Sales'[Sales Amount] ) ),
    "YTD Sales Amount",
    CALCULATE(SUM ( 'Internet Sales'[Sales Amount] ),
              ALL('Date'),
              DATESYTD('Date'[Date])
              )
)
ORDER BY 'Date'[Date]

The above query can also be written using the TotalYTD function instead and returns the same results.

EVALUATE
SUMMARIZECOLUMNS (
    'Date'[Date],
    "Sales Amount", 
    CALCULATE(SUM ( 'Internet Sales'[Sales Amount] ) ),
    "YTD Sales Amount",
    TOTALYTD(SUM ( 'Internet Sales'[Sales Amount] ), 'Date'[Date])
)
ORDER BY 'Date'[Date]

If you were to graph these results, you would see something like this:

Creating the Aggregation table

I created an aggregation table at Day level in AdventueWorks database using the following SQL code:

SELECT	fs.DateKey,
		SUM([Sales Amount]) AS [Sales Amount]
INTO dbo.Sales_Agg4_By_Day
FROM [dbo].[vFactInternetSales] fs
GROUP By fs.DateKey

This table has 1,024 rows which is much smaller than the original sales table with 60,398 rows.

Power BI Model

I imported this table into the model and connected it to the Date table on DateKey. Notice how in the model below, this relationship is “1 to many” with the aggregation table on the many side. I will explain this in more details in the next section.

Relationship between the Sales_Agg4_By_Day and Date tables

When I first connected the Sales_Agg4_By_Day and Date tables, Power BI created a 1-to-1 relationship between the two tables with a bi-directional cross filter. I had to change this before I could configure the aggregation table, otherwise the interface did not allow it and gave a series of warning messages, including:

An aggregation table cannot have a bidirectional filter.
An aggregation table cannot be referenced by a one to one relationship.

Table storage modes

I explained table storage modes in detail in my previous blog post on Aggregations. Here everything is the same as it was in that post with the exception of the Date table:

  • The Internet Sales and Sales Territory tables are in Direct Query storage mode.
  • The Product tables are in Dual storage mode.
  • The Sales_Agg4_By_Day table has Import storage mode.

The Date dimension table is in Dual storage mode which is a change from my previous post where it was left in Direct Query. Here, its storage mode must be set to Dual; otherwise, even after the Sales_Agg4_By_Day table is configured as an aggregation, it won’t be used to cover queries. This is because the relationship between the Sales_Agg4_By_Day and Date tables is considered to be a weak relationships:
One side in Import, the other side in Direct Query.

It is interesting that in this case, Power BI won’t stop you from configuring the aggregation, it just won’t be used. If you use DAX Studio or another tool to check the usage of the aggregation, you will see a failed attempt at using the aggregation table and the only message you get is that no mapping was found for the DateKey column even though you have a relationship in place. Even though there is no mention of a weak relationship in the reason for the miss, the weak relationship is the main culprit here.

It is important to remember:

Agggreation hits based on relationships are only considered when the tables involved in the query have the same storage mode or one side is at least set to Dual where it can change its nature on demand.

Configuring the Sales_Agg4_By_Day table

Configuring the Sales_Agg4_By_Day table is easy. The following picture shows the details:

Since there is already a relationship between the Date and Sales_Agg4_By_Day tables, there is not need to fill out the row for DateKey in the aggregation properties window.

Profiling the sample queries

I used DAX Studio to connect to the Power BI model. I ran the sample queries after configuring the aggregation with the Server Timings option selected so that I can see the details of how the query was answered. (Alternatively, you could use SQL Server Profiler/Extended Events and run a trace against your PowerBI model to monitor the same event but that is a lot more work.)

Both queries were able to use the aggregation table instead of the main sales table. The following shows the details of the “Aggregate Rewrite Attempt” event. It shows that Power BI was smart enough to use the aggregation after it broke down the TotalYTD and DatesYTD functions into a series of dates.

Conclusion

Time series functions that break down to a series of dates, can be covered by an aggregation table defined at day level.

Aggregation Precedence in Power BI: How does Power BI choose between candidate aggregation tables with the same Precedence?

Precedence is one of the aggregation properties that you can define on an aggregation table in Power BI. In a model with multiple aggregation tables, you can use Precedence to define the order in which aggregation tables will be considered by Power BI to answer queries. The higher the Precedence number, the sooner the aggregation table will be considered. Very simple and easy to understand. but the question is:

What does Power BI do when there are multiple aggregation tables configured with the same Precedence value and they can all answer the same query? Which one is considered first? Does it choose the smallest one? or is there another rule in place?

Spoiler Alert: It does not seem to choose the smaller aggregation table.

Background

In my previous post on Aggregations in Power BI, I used an example where the model had two aggregation tables:

One table at Product and Year level with 288 rows:

CREATE TABLE [dbo].[Sales_Agg1_By_Product_Year](
	[ProductKey] [int] NOT NULL,
	[Calendar Year] [smallint] NOT NULL,
	[Sales Amount] [money] NULL,
	[Unit Price Sum] [money] NULL,
	[Unit Price Count] [int] NULL,
	[InternetSalesRowCount] [int] NULL
) ON [PRIMARY]


INSERT INTO dbo.Sales_Agg1_By_Product_Year
SELECT	fs.ProductKey,
		dd.[Calendar Year],
		SUM([Sales Amount]) AS [Sales Amount],
		SUM([Unit Price]) AS [Unit Price Sum],
		COUNT([Unit Price]) AS [Unit Price Count],
		COUNT(*) AS InternetSalesRowCount

FROM [dbo].[vFactInternetSales] fs
JOIN [dbo].[vDimDate] dd
ON fs.DateKey = dd.DateKey
GROUP By fs.ProductKey, dd.[Calendar Year]

One table at Year level with 5 rows:

CREATE TABLE [dbo].[Sales_Agg2_By_Year](
	[Calendar Year] [smallint] NOT NULL,
	[Sales Amount] [money] NULL
) ON [PRIMARY]

INSERT INTO dbo.Sales_Agg2_By_Year
SELECT	dd.[Calendar Year],
		SUM([Sales Amount]) AS [Sales Amount]
FROM [dbo].[vFactInternetSales] fs
JOIN [dbo].[vDimDate] dd
ON fs.DateKey = dd.DateKey
GROUP By dd.[Calendar Year]

Obviously, before configuring the second aggregation table, Power BI used the first and only aggregation table available to answer the test query.

Before configuring the second Aggregation table.

After configuring the second aggregation table, Power BI picked the second one.

After configuring the second aggregation table.
Before setting the Precedence property.

Since the second table was smaller, this seemed like a desired behavior but the question is: Did Power BI pick it because it was smaller or was there another factor in place? Unfortunately as you will see shortly, there seems to be something else at play.

Before I show you that, for reference, when I set the Precedence property of the larger aggregation tableto 10, Power BI chose the larger aggregation table to answer the respective queries.

After configuring the second aggregation table.
After setting the Precedence property of the first aggregation table to 10.

This last example was only used to demonstrate that the Precedence property can direct Power BI to consider one aggregation table over another one.

In a real case, if there is more than one aggregation table configured and they can all be used to answer the same query, you do want the smallest aggregation table to be considered first.

What does Power BI do when aggregation tables have the exact same Precedence

To find out why Power BI picked the second aggregation table in the absence of precedence (0), I added a third aggregation table to the mix.

CREATE TABLE [dbo].[Sales_Agg3_By_ProductCategory_Year](
[ProductCategoryKey] [int] NOT NULL,
[Calendar Year] [smallint] NOT NULL,
[Sales Amount] [money] NULL
) ON [PRIMARY]

INSERT INTO dbo.Sales_Agg3_By_ProductCategory_Year --49 rows
SELECT	dp.[ProductSubcategoryKey],
		dd.[Calendar Year],
		SUM([Sales Amount]) AS [Sales Amount]
FROM [dbo].[vFactInternetSales] fs
JOIN [dbo].[vDimDate] dd
ON fs.DateKey = dd.DateKey
JOIN [dbo].[DimProduct] dp
ON fs.ProductKey = dp.ProductKey 
GROUP By dp.[ProductSubcategoryKey],
	 dd.[Calendar Year]

This table has 49 rows. It is smaller than the first aggregation table with 288 rows but larger than the second one with 5 rows.

I configured the third aggregation table similar to others.

Since the query in question, is by Calendar Year, I did not configure ProductCategoryKey and did not connect the ProductCategoryKey column between the Sales_Agg3_By_ProductCategory_Year and the Product Category tables in the model.

EVALUATE 
SUMMARIZECOLUMNS(
	'Date'[Calendar Year],
	"Sales Amount", SUM ('Internet Sales'[Sales Amount])
)

Multiple Aggregation tables with the same Precedence

After adding and configuring the third aggregation table in the model, the test query was answered from the third aggregation table:

It seems like Power BI chose the last configured Aggregation table to answer the query.

(Note: As expected, connecting ProductCategoryKey between the Sales_Agg3_By_ProductCategory_Year and Product Category tables in the model, does not make any difference in this example. This is because that column is not in use for the query in question.)

I repeatetest the same test as before but this time I changed the order of the opeations. I added Sales_Agg3_By_ProductCategory to the model first, configured it, then added Sales_Agg2_By_Year and configured it. As I was suspecting at this point, Power BI chose Sales_Agg2_By_year which was the last table added to the model.

At this point I had yet another question in mind: In the absence of different Precedence properties, is it the order in which aggregation tables are added to the model or the order in which they are configured that affects the order in which they are considered to answer queries?

To find out, I tested the following combinations:

Agg2 configured 1st.
Agg3 configured 2nd.
Agg 2 configured 2nd.
Agg 3 configured 1st.
Agg 2 added first. Close and Apply.
Agg 3 added second. Close and Apply.
Query uses Agg3.Query uses Agg3.
Agg 3 added first. Close and Apply.
Agg 2 added second. Close and Apply. 
Query uses Agg2.Query uses Agg2.

The aggregation table last added to the model wins in all combinations.

A couple of notes:

(1) Agggration table one was not considered in either combinaton.

(2) I repaeted the same test with the Precedence set to 10 instead of 0 and got the same results.

Conclusion

It seems like Power BI is favoring the last aggregation table added to a model when there is a tie in the Precedence property between them. At the moment, there are not many details available about the inner workings of how Power BI decides which aggregation tables to use to answer queries if there are more than one candidate table available in the model. It is important to note, the dataset I worked with in this blog post was very small. It is not clear if the same scenario would happen in an actual large dataset.

SQL Server Analysis Services (SSAS) Multidimensional, (the other system I used agggrations extensively in) has a complex set of rules when considering aggregations to answer queries. For example, one of the rules was the 1/3 rule. If an aggregation was larger than 1/3 of the main table (fact table) it would not be created at all. I wonder if Power BI uses a similar set of rules based on either the number of rows in a table or the table’s compressed size. Hopefully Power BI team will provide more details over time.

Regardless of which rules Power BI uses to choose agggration tabltes, always check the important queries you intend to answer from the agggration tables, to make sure the aggregation table you intended to be used is the one being used. You can use either DAX Studio as I showed in my previous post or Extended Events for this.