I will be speaking at #DataWeekender on Saturday May 2nd.

I will be presenting “Using Power Apps in Power BI reports: Enabling writeback”. Join me to learn how to build a simple Power Apps app that can be integrated into a Power BI report to update data and have it refresh the Power BI report automatically after the data updates to reflect the changes.

Register here: https://www.dataweekender.com/

Here are the slides:

Previous blog post on this topic here.

Using Power Apps in Power BI reports: Enabling writeback

The Power Apps visual is a powerful tool for adding writeback and other functionality from Power Apps into Power BI reports. This visual makes it possible for users to take action directly from Power BI reports and update the data in real time from the reports which then can be immediately reflected back in the reports. This ability to edit data directly from a Power BI report, brings another layer of functionality to Power BI reports that was not previously possible.

The Power Apps visual first became available as a custom visual in 2018 and then as one of the default visuals as of the October 2019 release of Power BI Desktop.

The Power Apps visual provides an important functionality to refresh a Power BI report page automatically which eliminates the need for the end user to manually refresh the page by clicking on the Refresh option from the Power BI menu to see changes in the data.

In this post, I will show you how to add a simple app to a Power BI report to update the data in the report and have the app automatically refresh the page. All of this can be done with a few lines of code thanks to all the work that has been done in Power Apps to make the app creation experience extremely user friendly and relatively easy to learn.

Design

Before we get started, let’s take a look at the overall design. Our goal is to create a Power BI report and add a Power Apps app (referred to as “app” for the rest of this post) so that we can update the report’s data and have the page refreshed automatically to reflect the changes we make.

In order for the page refresh to reflect the changes immediately, the table being updated by the app, must have its storage mode set to DirectQuery. If its storage mode is set to Import, the changes won’t be reflected with a page refresh in Power BI service until the dataset (data imported in the model) is updated by a dataset refresh. To read more about table storage modes, see this.

Process

Step 1: The user makes some selections in the Power BI report which provides context (current filter and row selections) to the app. This way the app can have some fields prefilled so that when the user wants to update records, the user does not have to type everything form scratch.

Step 2: The user submits the updates by clicking on a button(or icon) in the app. This will update the underlying SQL table.

Step 3: The app triggers a page refresh through a function called PowerBIIntegration.Refresh(). This function is the key to this design. In fact, the Power Apps have been available for use in Power BI for a while, but the integration part is a new feature (within the last year) and takes their usefulness to a whole new level.

Step 4: Once the page is refreshed, the user will see newly updated data reflected in the report.

Data Source

The examples shown here are based off of the AdventureWorksDW sample database. I added a sample sales table based on FactInternetSales. To keep things simple, this table has all the fields that the report needs. It does not contain any foreign keys to other tables. This way the focus of this post can remain on the report/app integration. (To learn more about setting up an app that uses more than one SQL table, see this post by Brian Knight.)

CREATE TABLE [dbo].[FactSmallSales](
	[SalesOrderNumber] [nvarchar](20) NOT NULL,
	[SalesOrderLineNumber] [tinyint] NOT NULL,
	[ProductName] [nvarchar](250) NULL,
	[ProductSubcategoryName] [nvarchar](250) NULL,
	[CustomerName] [nvarchar](250) NULL,
	[SalesAmount] [money] NULL,
	[OrderDate] [date] NULL,
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[FactSmallSales] ADD  CONSTRAINT [PK_FactSmallSales] PRIMARY KEY CLUSTERED 
(
	[SalesOrderNumber] ASC,
	[SalesOrderLineNumber] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]
GO

This table has a primary key on SalesOrderNumber and SalesOrderLineNumber. I will explain later why this is needed for the simple app to work. You can work with a table without a primary key in Power Apps but you have to do more work to update it vs. if you work with a table that has a primary key.

Insert into [dbo].[FactSmallSales]
Select
top 1000 
SalesOrderNumber,
SalesOrderLineNumber,
EnglishProductName  as ProductName,
dpsc.EnglishProductSubcategoryName as ProductSubCategoryName,
dc.LastName  + ', ' + dc.FirstName as CustomerName,
SalesAmount,
OrderDate
from [dbo].[FactInternetSales] fs 
Join DimProduct dp on fs.ProductKey = dp.ProductKey
join DimProductSubcategory dpsc on dpsc.ProductSubcategoryKey = dp.ProductSubcategoryKey
join DimCustomer dc on dc.CustomerKey = fs.CustomerKey

Power BI report

Once you have the FactSmallSales table ready, open Power BI Desktop and use Get Data to point to the FactSmallSales table. When prompted for storage mode, make sure you choose DirectQuery; otherwise the refresh won’t work as expected. Note: The model can have other tables in Import or Dual storage modes but the table to be updated must be in DirectQuery.

The following picture shows a sample report I created. I have one chart showing Sales by ProductSubCategory and a card showing total sales. The table visual at the bottom shows details of FactSmallSales.

At this point, it is very tempting to click on the Power Apps visual from the visualizations and add it; however, it is very important that you follow the recommended order of operations and deploy the report to Power BI service first.

After you add the visual in the service and connect it to a Power Apps app, you can download a copy to your machine and save it. Even then, the Power Apps visual cannot refresh a page in Power BI Desktop. The page refresh only works in the service.

Make sure to deploy the report to Power BI service first.

Adding a Power Apps app

Once you open the report in Edit mode in Power BI service, click on the Power App visual.

The visual provides very clear instructions on how to configure it.

First select the fields you want to make available to the app. In this case, select all the fields from FactSmallSales since we want the user to be able to edit all the fields in this table.

This next step is critical. You are given a choice of using an existing app or creating a new one.

A new app will always work properly for both data integration between the Power BI report and the app as well as the automatic page refresh by the app.

The existing apps fall into two categories and only one will work properly for the page refresh. If you watch videos posted online, you will see that some people can get an existing app to work while others say that you must create a new app.

  • An existing app only works if it were previously created from the Power BI service.
  • An existing app that was not previously created from Power BI will not integrate properly with Power BI. If you try to use such an app, Power BI will actually open the app and make some changes and it appears that it is going to work but even if you get the data integration part to work by manually adding some code, the page refresh will not work. The PowerBIIntegration.Refresh() is not available for apps previously created in Power Apps.

The moral of the story is: Always create a new app or use an app that you have previously created from Power BI service.

Here we will create a new app. To continue, click on Create New. This will open Power Apps Studio with a newly created app. The app has one screen with a Gallery on it that shows one item per row of FactSmallSales. You can preview the app by clicking on the play button on the top right corner of the studio.

As you can see, you already have a fully functional app showing one field (Customer Name) from the table. At this point, if you save the app and go back to Power BI, you will have an app that will interact with the report. If you click on the bar chart in the report, the app will show the customers for that ProductSubCatergory. This works because Power BI is passing context to the app.

In Power Apps Studio, take a look at what is created for you. On top of Screen1, you see PowerBIIntegration. This is the connector that makes it possible for the context to flow from Power BI to the app.

If you take a closer look at the Gallery, you will notice that its Items property is set to ‘PowerBIIntegration’.Data.

Let’s make some changes to the app to make it more useful and be able to update some records.

First let’s work on the Gallery. A gallery is very easy to work with. You make changes to the first item in the Gallery and the rest of it will mimic the changes. In this case, you can drag the bottom of the first item to make it larger and then copy and paste the Customer Name field several more times. Since each item in the gallery gets its data from the source of the Gallery (in this case PowerBIIntegration.Data), you can change the source for each field from CustomerName to a new field by modifying its Text property. There is inttelisence built in that helps you choose the fields.

The following picture shows how I designed the first tile of the Gallery.

Next we need to add a second screen to edit selected items from the Galley. Go to the Home tab and click on New Screen and choose Form.

This creates a new screen with one Form for editing. To learn more about Forms in Power Apps see this

A form has two important properties:

  • Data Source: This is the table that will get updated.
  • Items: This is where the form gets its fields populated from.

Note: Even though the fields are already coming from the Power BI report, you must connect the form to your SQL data source for it to be able to write changes back to the table

To set the Data Source, click on the Connect url in the form. This will guide you to create a new connection. Choose SQL database and connect to FactSmallSales.

(Reminder: If you are working with an On-premises database, you need to have an entry for the database in the On-premises Data Gateway)

Once you make the new connection, choose the connection in the Data Source drop down. Next click on Edit Fields and choose all the fields. The order you click on the fields is the order in which they will be added to the form. You can change that later from the Fields window.

At this point, you get a nice screen with a form with all the fields on it but the form is empty. To get the form to prepopulate from the selected item from the Gallery from the other screen, you need to set the form’s Items property. With the form selected, find the Items property and set it to Gallery1.Selected. This will populate the form.

You can change the title of the form by clicking on it. Change it from “Title” to “Edit Sales Order”.

So far we have a form that is showing the selected item from the Gallery. Let’s get the form to save its values to FactSmallSales. If you click on the checkmark icon on the top right corner of the form, you will see that Power Apps has added a line of code to its OnSelect property:

SubmitForm(EditForm1)

This one line of code is all you need to save the data back to the table! This is amazingly simple! This function works well if you have a primary key on the table. You can add more advanced code for error handling and other cases where you want to update many records at once (See Patch), but this one line of code is sufficient when you are working with one table with a primary key.

Once the data is saved, we want the app to refresh the Power BI report and the app to navigate to the first screen. Modify the OnSelect property to look like this.

SubmitForm(EditForm1);PowerBIIntegration.Refresh();Navigate(Screen1);

PowerBIIntegration.Refresh() is what refreshes the page. (If you get red squiggly lines under this function, you are most likely editing an old app that cannot refresh the page.)

Navigate is a is very simple function and self explanatory. You can read more about it here. It lets you move between screens.

The last thing left to do on this screen is to set the OnSelect property of the X icon to let the users cancel out of the edit form if they change their mind. Set the OnSelect property of the X icon to Navigate(Screen1).

This finishes our work on the second screen. The last step left is to add a way for users to navigate from Screen 1 to Screen2.

On Screen1, select the gallery. Click no the “>” icon on the first item in the gallery. Set its OnSelect property to Navigate(Screen2).

To make the first screen look nicer, you can add a header to it. The easiest way to do this is to select the header from Screen2 and copy it over to screen 1 and change its text to “Sales Orders”.

You are almost ready to use the app in your Power BI report. You must first save the app and publish it. Go to File and choose Save and follow the instructions.

The creator of the app gets access to the app by default. If you want other users to use the app, you must give them access by sharing it.

Now if you go back to your Power BI report, the Power Apps visual will load your app. You report will look like this:

Select the first item from the Gallery and click on the “>” icon to go to the Edit screen. Change its ProductSubCategory to “Test Bikes” and change its sales amount to 327800.27. Click on the checkmark icon to save the changes.

Your report will update to reflect the changes. You will have a new bar in the chart for “Test Bikes” and the SalesAmount shown in the card updates to $4M.

This is a very useful feature for users who want to analyze data and update data all from Power BI.

Future app changes

As long as you keep the Power BI report and the Power Apps windows open, you can go back and forth between them and make changes to the app and see the changes back in the report once you save the app.

If you make a change to the app and don’t see your changes immediately in the Power Apps visual, save the report, close it and reopen it.

For all future edits to the app, you should start the process by editing the Power BI report and then editing the Power Apps visual to get to the Power Apps studio. This is required if you are adding new fields to the app. Additionally, this way you will have data in the Power Apps studio while you work on the app.

Limitations

The Power Apps visual has some limitations.

  • In order to add new fields to the app, the app must be edited from Power BI Service after changes to the Power BI report fields.
  • Can’t trigger a refresh of Power BI reports from Power BI Desktop.
  • Can’t filter a Power BI report. It receives context from a report but cannot send context back.
  • Not supported in Power BI Report Server.

Always Publish the report to PBI Service first.

Summary

The order of operations matter to get the automatic page refresh to work:

The following shows the different scenarios I tested with the results:

Report Created In Power BI DesktopReport Published and  Edited In ServiceNew App from Power BI ServiceExisting Power App Not Created from Power Apps visualExisting Power APP Created from Power Apps visualPage Refresh Works?
YesNoNo
YesYesYesYes
YesYesYesNo
YesYesYesYes

To summarize:

  • You must first publish the report to the service, open the report from the web interface and edit it.
  • Add the Power Apps visual.
  • Select fields to add them to the visual.
  • Follow the visual’s instructions.
  • Save and share the App.

Download Sample Code

I have uploaded the Power BI report and app that I used for this post to my GitHub repository. Once you open the reports, you must go to Edit Queries –> Data Source settings or follow the prompts if you click on Apply Changes to point the reports to your database. Similarly you have to connect the Power Apps sample app to your database for it to start working. Go to Screen2 and click on the Connect url on the form and follow the instructions.

Useful links and videos

Aggregations and Time Series calculations in Power BI

I have received a couple of questions about Aggregations in Power BI and whether they can be used to cover time series calculations such as Year to Date, Quarter to Date, and Month To Date. The answer is yes. Since time series calculations break down into calculations over a series of days, an aggregation table defined at day level with the basic summarization methods (min, max, sum, count) and the right relationship with a Date dimension, can answer Year to Date, Quarter to Date, and Month To Date calculations.

Let’s take a quick look at one such calcualtion and how it can be covered with an aggration. I am going to use the same version of AdventureWorks sample database and Power BI model that I used in my previous blog post on aggregations, with a few changes.

Sample Queries

My goal is to cover the following two queries that both return Year to Date Sales Amount by Date so that they can be answered from an Aggregation table vs. the main Sales table:

The first query uses the DATESYTD function to generate a set of dates to sum Sales Amount over.

EVALUATE
SUMMARIZECOLUMNS (
    'Date'[Date],
    "Sales Amount", 
    CALCULATE(SUM ( 'Internet Sales'[Sales Amount] ) ),
    "YTD Sales Amount",
    CALCULATE(SUM ( 'Internet Sales'[Sales Amount] ),
              ALL('Date'),
              DATESYTD('Date'[Date])
              )
)
ORDER BY 'Date'[Date]

The above query can also be written using the TotalYTD function instead and returns the same results.

EVALUATE
SUMMARIZECOLUMNS (
    'Date'[Date],
    "Sales Amount", 
    CALCULATE(SUM ( 'Internet Sales'[Sales Amount] ) ),
    "YTD Sales Amount",
    TOTALYTD(SUM ( 'Internet Sales'[Sales Amount] ), 'Date'[Date])
)
ORDER BY 'Date'[Date]

If you were to graph these results, you would see something like this:

Creating the Aggregation table

I created an aggregation table at Day level in AdventueWorks database using the following SQL code:

SELECT	fs.DateKey,
		SUM([Sales Amount]) AS [Sales Amount]
INTO dbo.Sales_Agg4_By_Day
FROM [dbo].[vFactInternetSales] fs
GROUP By fs.DateKey

This table has 1,024 rows which is much smaller than the original sales table with 60,398 rows.

Power BI Model

I imported this table into the model and connected it to the Date table on DateKey. Notice how in the model below, this relationship is “1 to many” with the aggregation table on the many side. I will explain this in more details in the next section.

Relationship between the Sales_Agg4_By_Day and Date tables

When I first connected the Sales_Agg4_By_Day and Date tables, Power BI created a 1-to-1 relationship between the two tables with a bi-directional cross filter. I had to change this before I could configure the aggregation table, otherwise the interface did not allow it and gave a series of warning messages, including:

An aggregation table cannot have a bidirectional filter.
An aggregation table cannot be referenced by a one to one relationship.

Table storage modes

I explained table storage modes in detail in my previous blog post on Aggregations. Here everything is the same as it was in that post with the exception of the Date table:

  • The Internet Sales and Sales Territory tables are in Direct Query storage mode.
  • The Product tables are in Dual storage mode.
  • The Sales_Agg4_By_Day table has Import storage mode.

The Date dimension table is in Dual storage mode which is a change from my previous post where it was left in Direct Query. Here, its storage mode must be set to Dual; otherwise, even after the Sales_Agg4_By_Day table is configured as an aggregation, it won’t be used to cover queries. This is because the relationship between the Sales_Agg4_By_Day and Date tables is considered to be a weak relationships:
One side in Import, the other side in Direct Query.

It is interesting that in this case, Power BI won’t stop you from configuring the aggregation, it just won’t be used. If you use DAX Studio or another tool to check the usage of the aggregation, you will see a failed attempt at using the aggregation table and the only message you get is that no mapping was found for the DateKey column even though you have a relationship in place. Even though there is no mention of a weak relationship in the reason for the miss, the weak relationship is the main culprit here.

It is important to remember:

Agggreation hits based on relationships are only considered when the tables involved in the query have the same storage mode or one side is at least set to Dual where it can change its nature on demand.

Configuring the Sales_Agg4_By_Day table

Configuring the Sales_Agg4_By_Day table is easy. The following picture shows the details:

Since there is already a relationship between the Date and Sales_Agg4_By_Day tables, there is not need to fill out the row for DateKey in the aggregation properties window.

Profiling the sample queries

I used DAX Studio to connect to the Power BI model. I ran the sample queries after configuring the aggregation with the Server Timings option selected so that I can see the details of how the query was answered. (Alternatively, you could use SQL Server Profiler/Extended Events and run a trace against your PowerBI model to monitor the same event but that is a lot more work.)

Both queries were able to use the aggregation table instead of the main sales table. The following shows the details of the “Aggregate Rewrite Attempt” event. It shows that Power BI was smart enough to use the aggregation after it broke down the TotalYTD and DatesYTD functions into a series of dates.

Conclusion

Time series functions that break down to a series of dates, can be covered by an aggregation table defined at day level.

Aggregation Precedence in Power BI: How does Power BI choose between candidate aggregation tables with the same Precedence?

Precedence is one of the aggregation properties that you can define on an aggregation table in Power BI. In a model with multiple aggregation tables, you can use Precedence to define the order in which aggregation tables will be considered by Power BI to answer queries. The higher the Precedence number, the sooner the aggregation table will be considered. Very simple and easy to understand. but the question is:

What does Power BI do when there are multiple aggregation tables configured with the same Precedence value and they can all answer the same query? Which one is considered first? Does it choose the smallest one? or is there another rule in place?

Spoiler Alert: It does not seem to choose the smaller aggregation table.

Background

In my previous post on Aggregations in Power BI, I used an example where the model had two aggregation tables:

One table at Product and Year level with 288 rows:

CREATE TABLE [dbo].[Sales_Agg1_By_Product_Year](
	[ProductKey] [int] NOT NULL,
	[Calendar Year] [smallint] NOT NULL,
	[Sales Amount] [money] NULL,
	[Unit Price Sum] [money] NULL,
	[Unit Price Count] [int] NULL,
	[InternetSalesRowCount] [int] NULL
) ON [PRIMARY]


INSERT INTO dbo.Sales_Agg1_By_Product_Year
SELECT	fs.ProductKey,
		dd.[Calendar Year],
		SUM([Sales Amount]) AS [Sales Amount],
		SUM([Unit Price]) AS [Unit Price Sum],
		COUNT([Unit Price]) AS [Unit Price Count],
		COUNT(*) AS InternetSalesRowCount

FROM [dbo].[vFactInternetSales] fs
JOIN [dbo].[vDimDate] dd
ON fs.DateKey = dd.DateKey
GROUP By fs.ProductKey, dd.[Calendar Year]

One table at Year level with 5 rows:

CREATE TABLE [dbo].[Sales_Agg2_By_Year](
	[Calendar Year] [smallint] NOT NULL,
	[Sales Amount] [money] NULL
) ON [PRIMARY]

INSERT INTO dbo.Sales_Agg2_By_Year
SELECT	dd.[Calendar Year],
		SUM([Sales Amount]) AS [Sales Amount]
FROM [dbo].[vFactInternetSales] fs
JOIN [dbo].[vDimDate] dd
ON fs.DateKey = dd.DateKey
GROUP By dd.[Calendar Year]

Obviously, before configuring the second aggregation table, Power BI used the first and only aggregation table available to answer the test query.

Before configuring the second Aggregation table.

After configuring the second aggregation table, Power BI picked the second one.

After configuring the second aggregation table.
Before setting the Precedence property.

Since the second table was smaller, this seemed like a desired behavior but the question is: Did Power BI pick it because it was smaller or was there another factor in place? Unfortunately as you will see shortly, there seems to be something else at play.

Before I show you that, for reference, when I set the Precedence property of the larger aggregation tableto 10, Power BI chose the larger aggregation table to answer the respective queries.

After configuring the second aggregation table.
After setting the Precedence property of the first aggregation table to 10.

This last example was only used to demonstrate that the Precedence property can direct Power BI to consider one aggregation table over another one.

In a real case, if there is more than one aggregation table configured and they can all be used to answer the same query, you do want the smallest aggregation table to be considered first.

What does Power BI do when aggregation tables have the exact same Precedence

To find out why Power BI picked the second aggregation table in the absence of precedence (0), I added a third aggregation table to the mix.

CREATE TABLE [dbo].[Sales_Agg3_By_ProductCategory_Year](
[ProductCategoryKey] [int] NOT NULL,
[Calendar Year] [smallint] NOT NULL,
[Sales Amount] [money] NULL
) ON [PRIMARY]

INSERT INTO dbo.Sales_Agg3_By_ProductCategory_Year --49 rows
SELECT	dp.[ProductSubcategoryKey],
		dd.[Calendar Year],
		SUM([Sales Amount]) AS [Sales Amount]
FROM [dbo].[vFactInternetSales] fs
JOIN [dbo].[vDimDate] dd
ON fs.DateKey = dd.DateKey
JOIN [dbo].[DimProduct] dp
ON fs.ProductKey = dp.ProductKey 
GROUP By dp.[ProductSubcategoryKey],
	 dd.[Calendar Year]

This table has 49 rows. It is smaller than the first aggregation table with 288 rows but larger than the second one with 5 rows.

I configured the third aggregation table similar to others.

Since the query in question, is by Calendar Year, I did not configure ProductCategoryKey and did not connect the ProductCategoryKey column between the Sales_Agg3_By_ProductCategory_Year and the Product Category tables in the model.

EVALUATE 
SUMMARIZECOLUMNS(
	'Date'[Calendar Year],
	"Sales Amount", SUM ('Internet Sales'[Sales Amount])
)

Multiple Aggregation tables with the same Precedence

After adding and configuring the third aggregation table in the model, the test query was answered from the third aggregation table:

It seems like Power BI chose the last configured Aggregation table to answer the query.

(Note: As expected, connecting ProductCategoryKey between the Sales_Agg3_By_ProductCategory_Year and Product Category tables in the model, does not make any difference in this example. This is because that column is not in use for the query in question.)

I repeatetest the same test as before but this time I changed the order of the opeations. I added Sales_Agg3_By_ProductCategory to the model first, configured it, then added Sales_Agg2_By_Year and configured it. As I was suspecting at this point, Power BI chose Sales_Agg2_By_year which was the last table added to the model.

At this point I had yet another question in mind: In the absence of different Precedence properties, is it the order in which aggregation tables are added to the model or the order in which they are configured that affects the order in which they are considered to answer queries?

To find out, I tested the following combinations:

Agg2 configured 1st.
Agg3 configured 2nd.
Agg 2 configured 2nd.
Agg 3 configured 1st.
Agg 2 added first. Close and Apply.
Agg 3 added second. Close and Apply.
Query uses Agg3.Query uses Agg3.
Agg 3 added first. Close and Apply.
Agg 2 added second. Close and Apply. 
Query uses Agg2.Query uses Agg2.

The aggregation table last added to the model wins in all combinations.

A couple of notes:

(1) Agggration table one was not considered in either combinaton.

(2) I repaeted the same test with the Precedence set to 10 instead of 0 and got the same results.

Conclusion

It seems like Power BI is favoring the last aggregation table added to a model when there is a tie in the Precedence property between them. At the moment, there are not many details available about the inner workings of how Power BI decides which aggregation tables to use to answer queries if there are more than one candidate table available in the model. It is important to note, the dataset I worked with in this blog post was very small. It is not clear if the same scenario would happen in an actual large dataset.

SQL Server Analysis Services (SSAS) Multidimensional, (the other system I used agggrations extensively in) has a complex set of rules when considering aggregations to answer queries. For example, one of the rules was the 1/3 rule. If an aggregation was larger than 1/3 of the main table (fact table) it would not be created at all. I wonder if Power BI uses a similar set of rules based on either the number of rows in a table or the table’s compressed size. Hopefully Power BI team will provide more details over time.

Regardless of which rules Power BI uses to choose agggration tabltes, always check the important queries you intend to answer from the agggration tables, to make sure the aggregation table you intended to be used is the one being used. You can use either DAX Studio as I showed in my previous post or Extended Events for this.

Aggregations in Power BI

Aggregations became generally available in Power BI in July 2019. Aggregations are a modeling optimization technique that can improve user experience when dealing with large datasets. Combined with Power BI composite models, they make it possible to analyze big datasets efficiently by allowing high level analytical queries to be answered from memory while sending more detailed queries back to the source database. The trillion-row demo by Christian Wade is one example of this scenario that shows how aggregations can make it possible for big datasets to be analyzed in Power BI. Even though this demo is based on a trillion rows (quarter peta bytes) of data, any dataset that is expensive to cache in memory because of space, CPU, or time, can benefit from aggregations. Additionally, sometimes even Power BI reports built off of smaller datasets can benefit from faster load time if they can use aggregations.

The concept of aggregations

The idea of Aggregations is not a new concept in the world of data. In fact, you may have used them in the past without knowing it. Essentially an aggregation is a summary object created from a detail object and it can answer some summary reports in the place of the detail object.

Summary tables in SQL Server

For example, consider the following example. Let’s assume you have a Sales table containing sales amounts for 5000 products sold at 2000 stores for 10 years and that every product is sold at every store on every day. This results in a table with 3.65 billion rows. If you were to write a SQL query against this data that returned Sales for each Store for each year, the query would take a long time to run. Instead, you could create a smaller summary (aggregation) table that already stores the data at a higher level than the original table and query that. For example, assuming there are 200 Product Groups, a summary table by Product Group, Year, and Store would have 104,000 rows which is a small fraction of the main table. Querying this table is much faster than the main table.

Even though this design improves end query performance, it makes the report development more complex. The report developer should decide which table to use for each report. The high level queries can use the summary table whereas more detailed queries must go back to the detail table.

In this design, the Aggregation table needs to be maintained as part of the overall database maintenance. When the main table is updated, the aggregation table needs to update as well.

Aggregations in SQL Server Analysis Services multidimensional (SSAS MD)

In SSAS MD, the predecessor to SSAS Tabular, aggregations are explicitly defined as actual objects that became part of the database. When designed properly, aggregations could significantly improve end user query performance. For example in larger cubes, you could see a difference between 1 minute vs. 5 seconds in query response times. Even though SSAS’s MD storage engine is disk based (vs. Memory base in Tabular), the size of aggregation files on disk is a fraction of the size of the measure group (fact table) files so they can be loaded into memory faster and searched faster.

Like the previous SQL server example, theses aggregations have a cost in terms of disk storage space and the amount of time it takes to process them to keep them updated. The more aggregations are created, the more queries they cover but they will take more space and need more time to process. There is also a point when adding more aggregations can have a negative impact on performance. This is like having too many indexes on a SQL table. I had a case where removing one aggregation that was automatically created by the wizard, improved performance because the queries could hit another manually designed aggregation.

Ultimately the SSAS developer must come up with the right balance of aggregations and make sure that they are being used. In SSAS, this is an advanced task that many developers did not want to get into. If you used the wizard, you had to have all attribute counts specified correctly, the correct relationships in place, … and even then, you may have not got what you wanted. There were many rules in place, that even if you requested an aggregation using hints in the wizard, the engine could have refused to create it. Usually there was a good reason for this decision as the engine weighted out the cost of using the potential aggregation vs. going directly to the fact objects; however, if this was not the case, you could manually create the aggregation. For complex cubes (lots of dimensions), sometimes it was better to skip the wizard entirely and manually create a custom set of aggregations that covered a certain set of queries.

Once aggregations are created, report developers or end users do not need to know about their existence. The engine automatically uses them to answer queries when possible.

At the time of writing this post, SSAS Tabular does not offer aggregations; however, since SSAS Tabular is the engine behind Power BI, it would make sense to expect aggregations to make their way into SSAS Tabular and Azure Analysis at some time in future.

Aggregations in Power BI

In Power BI, Aggregations start as tables just like any other table in a model. They can be based off a view or table in the source database, or created in Power BI with Power Query. They can be in Import or Direct Query storage mode.

Once in the model, these tables can be configured so that the engine can use them instead of a detail table to answer queries when possible. The process of creating and configuring aggregations in Power BI is significantly easier than the process of creating aggregations in SSAS multidimensional.

Once an aggregation table is configured, it becomes hidden from end users. Report developers and end users don’t know that it exists and don’t need to change anything in how they query the dataset.

At this time, aggregation tables can only be used with detail tables that are in Direct Query storage mode themselves. It is not possible to configure aggregation tables the work with details tables that are in Import storage mode themselves.

Like SQL Server Summary tables and aggregation objects in SSAS multidimensional, aggregations in Power BI have a maintenance cost. Once they are configured, it is important to make sure they are actually being used.

Comparing Aggregations in different products

The following table compares aggregations as described in previous examples.

  Power BI SSAS MD SQL Server
Summary object explicitly called an Aggregation Yes Yes No
Maintenance Cost Yes Yes Yes
Reports must be aware of aggregations to use them. No No Yes
Modeler has full control over their design. Yes No: if designed by using the Wizard.
Yes: if designed manually.
Yes
Easily configured Yes No N/A
Modeler can control the order in which aggregations are considered by the engine to answer queries Yes No N/A
Modeler can create a set of aggregations to approximately cover a set of reports at once No Yes N/A
Modeler can create a set of aggregations based on a set of input queries to be covered (Usage Bases Optimization) No Yes N/A

Setting up aggregations in Power BI

For the rest of this blog post, I use the AdventureWorks sample database to review a couple of examples that show how to configure aggregation tables in Power BI and I will use DAX Studio to see how they are being used.

I created a few views on top of a the fact/dimension tables to simplify the model and take out the clutter. I also created two summary tables in the SQL database to be used as aggregation tables. You can download this version of AdventureWorks, the queries I use, and the pbix file from here.

This is how the model looks like:

The Internet Sales table has 60,398 rows, the Sales_Agg1_By_Product_Year has 288 rows, and the Sales_Agg2_By_Year has 5 rows.

Here is the SQL code I used to create the two aggregation tables:

CREATE TABLE [dbo].[Sales_Agg1_By_Product_Year](
	[ProductKey] [int] NOT NULL,
	[Calendar Year] [smallint] NOT NULL,
	[Sales Amount] [money] NULL,
	[Unit Price Sum] [money] NULL,
	[Unit Price Count] [int] NULL,
	[InternetSalesRowCount] [int] NULL
) ON [PRIMARY]


INSERT INTO dbo.Sales_Agg1_By_Product_Year
SELECT	fs.ProductKey,
		dd.[Calendar Year],
		SUM([Sales Amount]) AS [Sales Amount],
		SUM([Unit Price]) AS [Unit Price Sum],
		COUNT([Unit Price]) AS [Unit Price Count],
		COUNT(*) AS InternetSalesRowCount

FROM [dbo].[vFactInternetSales] fs
JOIN [dbo].[vDimDate] dd
ON fs.DateKey = dd.DateKey
GROUP By fs.ProductKey, dd.[Calendar Year]

CREATE TABLE [dbo].[Sales_Agg2_By_Year](
	[Calendar Year] [smallint] NOT NULL,
	[Sales Amount] [money] NULL
) ON [PRIMARY]


INSERT INTO dbo.Sales_Agg2_By_Year
SELECT	dd.[Calendar Year],
		SUM([Sales Amount]) AS [Sales Amount]
FROM [dbo].[vFactInternetSales] fs
JOIN [dbo].[vDimDate] dd
ON fs.DateKey = dd.DateKey
GROUP By dd.[Calendar Year]

For the rest of this blog post, I will start with all the tables in Direct Query storage mode and configure the two aggregation tables. I will use DAX studio to see if the queries I plan for the aggregations to cover get answered from them. Then I will move the aggregation tables into memory by changing their storage mode to Import. You can read more about storage modes and composite models here. In short, the Direct Query storage mode keeps the actual table data in the data source, the Import storage mode loads the table data into Power BI’s in-memory engine called VertiPaq. Finally there is the Dual storage mode where a table can behave either as a Direct Query table or as an Import table based on a query. In Dual, a copy of the table data is loaded into VertiPaq but it is not always used.

Example 1: No Aggregations defined

To establish a baseline, let’s first take a look at what happens when a query runs against the model that is entirely in Direct Query and no Aggregation tables have been configured yet.

Here is a sample query that returns Sales Amount for each ProductKey.

EVALUATE 
 SUMMARIZECOLUMNS(
     'Product'[ProductKey],
     "Sales Amount", SUM ('Internet Sales'[Sales Amount])
 ) 

If you run this query in DAX Studio (make sure the Server Timings option is selected in the top ribbon), you will see the following:

This query generates a SQL event which sends a SQL query back to SQL Server. This is because this model is in Direct Query. If you look at the details of the query, you can see that it is joining the views that Internet Sales and Product are based off of.

Example 2: Detail and Aggregation tables in Direct Query mode.

Even though the ultimate use case of an aggregation table is when it is in Import mode, as you will see, queries can benefit from an aggregation table even if it is in Direct Query itself.

Configuring Sales_Agg1_By_Product_Year

You can configure an aggregation table by right clicking on any table in the model and choosing Manage Aggregations. In the pop up menu, you will see one row for each column of the aggregation table. Not all rows need to be filled out for an aggregation table to work.

Example of an Aggregation based on relationships

To cover the query mentioned before, the only row that needs to be filled out is the Sales Amount. Here you are telling Power BI desktop that if a query comes in asking for Sales Amount column from the Detail table of Internet Sales, it can use the Sum of Sales Amount from the Aggregation table of Sales_Agg1_By_Product_Year.

The reason this is all that is needed to cover the previous query is that the aggregation table is connected via a relationship to the Product table. In the absence of relationships you have to use a different method that I will explain later in another example.

Now if you run the same query in DAX Studio, you will see that it runs in 27 ms vs. 49 ms. You can see that even in such a small dataset, being able to use a smaller table to answer the query made a difference in total run time. In this example, the detail table had 60,398 rows and the aggregation table had 288 rows. You can imagine what a different in run time it would make if the detail table has millions of rows and the aggregation table has thousand of rows.

This time a RewriteAttempt event is generated before the SQL event. In this case, the Rewrite is successful.

In the details of the Rewrite Attempt you can see how Power BI was able to make the match.

If you look at the SQL event now, you will see that this time it used the Sales_Agg1_By_Product_Year instead of the detail table. You can also see how it is able to join the aggregation table with vDimProduct based on productkey.

At this point with this simple configuration, the aggregation table is able to answer any query asking for Sales Amount by any of the Product, Product Subcatergory and Product Category dimensions. This is possible because of the existing relationship between the aggregation table and Product.

Example of an Aggregation Miss

If you are wondering how an aggregation miss looks like, if you run the following query, it will result in a miss since Order Quantity is not included in the aggregation table. Note that I did this for showing how a miss looks like. In real life, you would typically include all quantity columns from the detail table in the aggregation table.

EVALUATE 
 SUMMARIZECOLUMNS(
     'Product'[ProductKey],
     "Order Quantity", SUM ('Internet Sales'[Order Quantity])
 ) 

If you run this query in DAX Studio, you will see the following:

This time the RewriteAttempt failed expected. If you look in the Details, you will see the following telling you it could not find any information about Order Quantity in the aggregation table:

Obviously, the SQL event reverts back to using the detail table in this case.

Example of covering an Average calculation with an aggregation

If you look at the drop down for the aggregation summarization options, it does not include an option for Average. It only includes Sum,Min,Max,Count, and Count table rows. (Ignore GroupBy for now)

So how would you cover the following query?

SUMMARIZECOLUMNS(
	'Product'[Product Name],
	"Avg Unit Price", AVERAGE('Internet Sales'[Unit Price])
)

Thankfully Power BI is able to use these basic summarization functions to cover a lot more queries than what you may think it can. If it can break a calculation into these basic calculations, then it can use an aggregation to answer the more complex calculation. In the case of Average, it is able to break it down to Sum/Count. As long as you provide a Sum and Count entry in the aggregation table, it is able to use it.

Consider the following two queries:

Query 1:

EVALUATE 
SUMMARIZECOLUMNS(
	'Product'[Product Name],
	"Unit Price Sum", SUM ('Internet Sales'[Unit Price])
)

Query 2:

EVALUATE 
 SUMMARIZECOLUMNS(
     'Product'[Product Name],
     "Unit Price Count", COUNT('Internet Sales'[Unit Price])
 ) 

The aggregation table can be easily configured to cover each one of the above queries. All you have to do is to fill out the rows for Unit Price Sum and Unit Price Count in the Aggregation properties as shown below:

Once this is done, both of the queries are able to benefit from the aggregation as shown in the following pictures:

Now if you run the Average query, you will see how Power BI is able to break it down into Sum/Count.

If you look in the Details, you will see the following:

Finally note that you cannot use the “Count table rows” in place of defining an summarization for Unit Price Count. Even though the two are semantically the same in this example, Power BI looks for a count summarization to calculate Average.

Example of an aggregation not based on relationships

So far the queries we looked at have been using an attribute of the Product table. Now consider this query:

EVALUATE 
SUMMARIZECOLUMNS(
	'Date'[Calendar Year],
	"Sales Amount", SUM ('Internet Sales'[Sales Amount])
)

If you run this query at this point, it will not use the Sales_Agg1_By_Product_Year since this aggregation table does not have a relationship with the Date table. The aggregation table is at Year level whereras the Date table is at Day level so it is not possible to connect the two. This is the design situation you would run into with some of the big data systems.

In order to define aggregations not based off of relationships, you must use the GroupBy summarization. Note that the Detail table here is set to the Date table vs. Internet Sales. The GroupBy summarization has another use case later that will use Internet Sales as the Detail table.

If you run the previous query after you configure the GroupBY for Calendar Year, you will see the following:

Example of covering a Distinct Count calculation with an aggregation

The Distinct Count summarization if not available as a built in summarization; however, you can use the Group By summarization to implement it. Remember that Group By in SQL is the same as Distinct.

Consider the following query:

EVALUATE 
SUMMARIZECOLUMNS(
	'Product'[Color],
	"Distinct Products Sold", DISTINCTCOUNT( 'Internet Sales'[ProductKey])
)

To cover this query, you must add a Group By summarization for ProductKey to Sales_Agg1_By_Product_Year:

Note how the Group By for Calendar Year has the Date table as its Detail Table whereas the Group By for ProductKey has Internet Sales as its Detail Table. This is because the Group By is serving a different purpose in each scenario. In the latter case, its purpose is to provide a distinct count of ProoductKey.

If you run the previous query in DAX Studio, you will get a match. The following shows the data request:

If you look at the SQL query submitted, you will see that it uses the ProductKey column from Sales_Agg1_By_Product_Year

Perfromance Consideration: This method can be useful in scenarios where there are billions of rows in the detail table and two to five million distinct values in the column to be distinctly counted. If there are more than two to five million distinct values, this can affect query performance, meaning the aggregation will not help a query run faster.

Example 3: Adding a second Aggregation table, all tables still in Direct Query mode.

You can have more than one aggregation table in a model based on the same Detail table. For example, consider the second aggregation table of Sales_Agg2_By_Year. This table only has 5 rows compared to 288 rows of Sales_Agg1_By_Product_Year. For a query asking for Sale Amount by Year, it would be much faster if Power BI could use this smaller table. Keep in mind that this just an example and in real life you will not see a tangible difference between using a table with 5 rows versus using table with 288 rows. You would see a difference if the Sales_Agg1_By_Product_Year had 288,000 rows.

Configuring Sales_Agg2_By_Year

Before configuring Sales_Agg2_By_Year, if you run the following query you will see that it will be answered by Sales_Agg1_By_Product_Year,

EVALUATE 
SUMMARIZECOLUMNS(
	'Date'[Calendar Year],
	"Sales Amount", SUM ('Internet Sales'[Sales Amount])
)

Now let’s configure Sales_Agg2_By_Year. This aggrgation table does not have relationship with the Date table in the model so the Calendar Year column must be configured using the GroupBY summarization as shown below:

Now if you run the same query as before, you will see that it will be answered from Sales_Agg2_By_Year.

Aggregation Precedence

You can control the order in which, aggregation tables are considered by Power BI to answer queries with a precedence property on each aggregation table.

For example, let’s say that for some reason you wanted the previous query to be answered from Sales_Agg1_By_Year_Product_Year. Note that this does not makes sense since it is a larger table but I am only using this as an example.

You can set the precedence property on Sales_Agg1_By_Product_Year. The higher the number the earlier it will be considered by Power BI.

Now if you run the previous query, it will be answered from Sales_Agg1_By_Product_Year

Example 4: Detail table in Direct Query mode and Aggregation table in Import mode.

So far all tables have were in Direct Query mode. In most cases, you would get the best benefit from aggregation tables if you can load them into memory.

You can change the storage mode of a table by right clicking on the table in the Model view and getting the properties. You will see Storage Model under Advanced.

You can change the storage mode from Direct Query to Import. Note that this is not a reversible change.

Example of an Aggregation Hit

Let’s change the storage modes for both Aggregation tables to import. Power BI will give a set of warnings when you do this. It will warn you that this is not a reversible change and that it may take some time for the data to load. It will also suggest that you change the storage mode of the tables connected to Sales_Agg1_By_Product_Year to Dual. If they are not set to dual, the relationship between them and the aggregation table will be considered weak and the aggregation table will not be used to answer queries by the set of Product tables.

In order for an aggregation to be considered to answer a query based on relationships, the aggregation table and tables related to it that are involved in the query must have the same storage mode or one side has to be set to dual.

The following shows all the combinations that a relationship is considered Strong and an aggregation table is considered:

Many Side (Aggregation table) One Side (Dimension table)
Dual Dual
Import Dual or Import
Direct Query * Dual or Direct Query *

* Both sides in Direct Query have to be from the same data source.

 Finally note that Many-to-many relationships are always considered weak regardless of the storage mode of tables.

Once you change the storage mode of the Sales_Agg1_By_Product to Import, if you run the following query:

EVALUATE 
SUMMARIZECOLUMNS(
	'Product'[Color],
	"Sales Amount", SUM ('Internet Sales'[Sales Amount])
)

You will see that this time, the RewriteAttempt succeeds similar to before as expected. Also note how this query ran in 5 ms vs.

This event is followed by a Scan event instead of a SQL event. This is beause the Aggregation table is in import mode. If you look at the Details you will see the following code which is the internal language Power BI uses to query its in memory data.

Example of an Aggregation Miss

To see what an Aggregation Miss looks like, run the following query:

EVALUATE 
SUMMARIZECOLUMNS(
	'Product'[Color],
	"Order Qty", SUM ('Internet Sales'[Order Quantity])
)

Order Quantity does not exist in the Aggregation tables so this results in a RewriteAttempt failure.

Note how this query took 101 ms versus 5 ms for the previous query that was answered from the aggregation table in memory.

The RewriteAttempt is followed by a SQL event.

Summary

Aggregations are a great optimization technique that can improve end user experience significantly even for datasets that are way smaller than a trillion rows.

Currently there are some limitations. The main one is that they can only be used on top of detail tables that are in Direct Query. Hopefully this will be lifted in future.

Aggregations are not useful for all calculations. For example, they cannot answer queries with calculations that iterate the data row by row. In these cases, a potential workaround is to see if the calculation that is being done row by row can be done with a calculated column instead.

Aggregations have a cost no matter how/where they are created so their actual usage should be monitored. This can be done with DAX Studio for individual queries or with monitoring the ‘Query Processing\Aggregate Table Rewrite Query’ event with SQL Server Profiler/Extended Events for a larger set of queries over a period of time.

Finally, even though Aggregations are a great technique for improving query performance, following best practices for a good model design should be the first step in optimizing a model.

Video

Here you can watch a video of a presentation I did for PASS BA Virtual Chapter on the same topic covered in this post.

A simple Power BI report to help you choose which PASS Summit 2019 activities to attend.

PASS Summit 2019 is about to start. There are many social ane educational events during this week. You can find a complete list here. It is somewhat difficult to get an overall look of events per day. I put together this simple PBI report to get a list of events by day/time/sponsor.

Here is a screenshot of the report:

See the report in Power BI Service here.
The PBIX file can be downloaded here.

Note: There are a few more community events that are not listed on PASS’s website. The best way to keep up with these events is to follow #sqlfamily and #PASSSummit on Twitter.