Looking forward to SQL Saturday Atlanta #521

This Saturday I will be teaching MDX basics. This one hour session is geared towards BI developers who are comfortable with SQL queries and have an good understanding of data warehouse concepts.

I will post the slide deck soon.

sqlsat521_web

Advertisements

MDX keyword colors are missing in SQL Server Data Tools for Visual Studio 2015 Preview

MDX keywords are not changing color in the calculation script in SQL Server Data Tools – preview version for Visual Studio 2015. Hopefully this is a temporary issue and will be fixed in the release version. Having different keyword colors than the rest of the code makes it much easier to read, write and maintain the calculation script especially in larger implementations.

SSDT Preview for VS 2015

The following picture shows how the Adventure Works calculation script looks like when using SSDT Preview for VS 2015 at the time of writing this post. Everything is in black with a grey background.

SSDT_2015_No_MDX_Keyword_Highlighting

I am using SQL Server 2016 Release Candidate (RC) 1 and the latest version of  the SSDT which can be found here: https://msdn.microsoft.com/en-us/library/mt204009.aspx. The latest version of SSDT tested for this post is 14.0.60316.0. Please note that unlike previous versions there seems to be only one single download/install available for SSDT. Previous versions made a distinction between SSDT and SSDT for Business Intelligence (SSDT-BI).  Previously SSDT only supported SQL Server Relational database projects and SSDT – BI supported all business intelligence project types. This does not seem to be the case at least with the preview version.

SSDT – BI for VS 2013

Compare the keyword colors between the above picture and the following picture which shows the calculation script in SSDT – BI for VS 2013:

SSDT_2013_With_MDX_Keyword_Highlighting

This script is much easier to read! Fortunately previous versions of SSDT are still available for download from here: https://msdn.microsoft.com/en-us/library/mt674919.aspx. Since SSDT – BI for VS 2013 seems to work fine with SQL Server 2016 (RC) 1, that’s what I will be using for SSAS development until the keyword colors are fixed in SSDT for VS 2015.

SSDT – BI for VS 2012

A similar problem existed with SSDT – BI in Visual Studio 2012. With SSDT – BI 2012, the script changed the color of comments and all the MDX keywords to green! (See next picture).  As a result most developers opted to use Visual Studio 2008 even against SQL Server 2012 to avoid using the black and grey script. (SSDT – BI was called Business Intelligence Development Studio or BIDS in VS 2008.)

This issue was reported to Microsoft Connect and is currently marked as closed: https://connect.microsoft.com/SQLServer/feedback/details/841790/ssdt-2012-analysis-services-cube-project-calculation-page-color-formatting

I installed SSDT -BI for VS 2012 to see if the problem is fixed as reported by Microsoft Connect. The following shows the Adventure Works calculation script in SSDT -BI  for VS 2012:

SSDT_2012_With_Partial_MDX_Keyword_Highlighting

It seems like the green MDX keyword colors still exist at least in the version I was able to download.

Versions of SSDT for VS tested for this post

The following pictures show the version information for the different SSDT that I tested for this post. As mentioned previously, there seems to be only one download of SSDT available for VS 2015 that has combined the database project and BI project support.

SSDT for VS 2015

Here you can see the BI version under SQL Server Analysis Services and SQL Server Reporting Services and the relational database project version under SQL Server Data Tools.

SSDT_Feb_2016_Preview_For_VS2015_Version_Info

The following pictures show the types of project that become available with the download of the latest version of SSDT for VS 2015:

 SSDT – BI for Visual Studio 2013

I have not installed the non-BI SSDT.

SSDT_VS2013_Version_Info

 SSDT – BI for Visual Studio 2012

Here I have not installed the non-BI SSDT.

SSDT_VS2012_Version_Info

 

 

 

Setting the slice for SSAS partitions

When SSAS 2005 came out the recommendation for setting the slice for MOLAP partitions changed. This is what the “SQL Server Best Practices Article” said: “For MOLAP partitions, you do not have to specify the slice because the server will be able to figure out the relevant partitions after the data has been processed.” Most people have seen the slice property of a partition which you can see by looking at the properties of a partition in BIDS or SSMS. In addition to this slice property, behind the scenes when SSAS processes a partition it records which attribute members exist in each partition and stores this data as ranges of Data IDs (internal surrogate keys) for these attribute members into an XML file. This is called auto-slice. When queries come along, it checks this file for each partition to see if it contains relevant data and whether or not to scan it from disk if not already in memory. 

It sounds great but it does not always work as expected. There are cases that when a query comes to SSAS with specific attributes in the where clause (for example a particular year in question),  SSAS scans extra partitions with other unwanted data (for example other years that the query did not ask for).  As a result,  many people now recommend to set the slice manually to avoid the possibility of extra unwanted partition scans. Setting the slice also safeguards against loading wrong data into a partition. Even though a robust ETL and row  count QA should not let such a scenario happen, if it does happen and the data in the partition does not match the slice specified, the partition processing will fail.

My recommendation is to go with the auto-slice and do not set the slice initially, then watch for extra partition scans. If you see that these extra partition scans are happening frequently and are causing an impact on query performance, then you should consider setting the slice manually if possible.

In this blog I provide a couple of scenarios that these extra partition scans can occur and show you how to read the related XML files and Data IDs. At the end I will give an example when setting the slice manually may not be always possible.

 

Unwanted partition scans

These are partitions that their data is not required to answer the current query. The unwanted partition scans can happen in at least the following cases:

(1) Partitions with Less than 4096 rows

Sometimes a partition has fewer than 4096 rows.  4096 is the default number for SSAS to build indexes for a partition and it is defined by IndexBuildThreshold parameter in the msmdsrv.ini file. When the number of rows falls below 4096 SSAS does not build any indexes for the partition. In the absence of indexes, information about ranges of data ids of related attributes for the partition is not stored by SSAS. If you see unwanted partition scans this could be the case. However, since these partitions are small (unless if you have many of them being scanned together) the impact on performance is usually negligible. Most fact tables in a data warehouse have more than 4096 rows so this is not very common. However it is possible to see this condition temporarily even with larger fact tables if you load them in smaller chunks for example daily. In this case, the first couple days of a month, the row count could be less than 4096 and as a result until the partition count gets larger than 4096, there will be no indexes and auto-slice information available. On the other hand chances are that most users ask for the most recent data in their queries so this kind of partition would be part of most queries anyways.

 

(2) Partitions with overlapping data ids

When indexes are built for each partition, ranges of data ids are recorded for each related attribute with a min and a max value. There are cases where the ranges of data ids for one attribute can have overlap between two or more partitions.  Mosha gives several examples of how this overlap can happen. He provides a query to look at the data ids generated by SSAS in MDX. I used this query to look at the data ids for several of the Date dimension attributes in AdventureWorks on my machine. You can also see the ranges of Data IDs recorded for each partition by SSAS by looking at the info XML files created for each partition.  I will go over the info file and the results of the queries from AdventureWorks next.

You can find these XML files under the data folder where SSAS stores the cube information. In AdventureWorks, the [Internet Sales] measure group is partitioned by Calendar Year. Interestingly the slice property is set for each partition. I removed all the slices from all Fact Internet Sales partitions to investigate how the auto-slice works.

The following picture shows the path to the info file for “Fact Internet Sales – CY 2008” partition from the AdventureWorks sample database for SSAS 2012. This partition has more than 4096 rows and you can see many files under this folder. Most of these files are index files (out of scope for this blog). There is also an info file in this folder. The extension of this file is a number that represents some kind of internal versioning to SSAS. Note how the number matches the prefix to all the other index files. This number increases every time the measure group is processed.

 

image

 

When you open the info file, you can see the Data IDs recorded for each related attribute. You can see the data id ranges under the tag section of <MapDataIndices>. Note that the partitioning dimension for Fact Internet Sales is the Date dimension in the cube and it is connected to Order Date in the fact table.

 

image

 

  If you look at the folder for partition “Internet Sales – CY 2005” , you see far fewer files than you see under the folder for the partition “Internet Sales – CY 2008”.

image

 

Plus the info file does not contain any <MapDataIndices> tags. There are no ranges of Data IDs recorded. The reason is that the row count for the “Internet Sales – CY 2005” partition is below 4096.

 

select

count(*) from [dbo].[FactInternetSales]

where

[OrderDateKey] between 20050101 and 20051231 –Returns 1013 rows.

 

In the following section I go over the ranges of Data IDs stored for the following attributes: Calendar Year, Calendar Quarter, Calendar Month and Calendar Week. As you will see soon there is no range overlap cases until we get down to the Week level.

 

(1) Calendar Year

The following query returns the Data IDs of the [Calendar Year] attribute.

 

WITH

MEMBER [Measures].CalendarYearDataID AS

DataId

([Date].[Calendar].CurrentMember)

SELECT

[Measures].[CalendarYearDataID] ON 0

, [Date].[Calendar Year].Members ON 1

FROM

[Adventure Works];

Here are the query results:

image

Note how the data IDs are not in the order you would expect, they don’t have the same order as the natural order of the years. If you look at the info files for the 2007 and 2008 partitions, this is what you find:

 

The following is a section of the info file for the 2007 partition.

image

 

The following is a section of the info file for the 2008 partition.

image

 

Let’s assume a query comes in asking for data for all of 2008.

 

SELECT

{[Measures].[Internet Sales Amount]} ON COLUMNS,

{[Product].[Product Line].[Product Line].Members} ON ROWS

FROM

[Adventure Works]

Where

[Date].[Calendar Year].&[2008]

 

If you run this query with a clear cache, the only partition that should be scanned is  the “Internet Sales – CY 2008” partition. However the partitions for 2006 and 2007 get scanned as well. This is because they have less than 4096 rows and no indexes and slice information is stored for them. (I removed their slice property as well, else in the absence of the index files, SSAS can still use the Slice property).

 

image

 

Note that the overlapping of the ranges of Data IDs cannot happen for an attribute level that you use only of member of which to partition with.  In the above example, there is only one year in each partition so the ranges for Year Data IDs  will not overlap, since they only contain one member. Setting the the slice at the Year level manually would however prevent the other small partitions from getting scanned.

 

(2) Calendar Quarter

 The following query returns the Data IDs of the [Calendar Quarter] attribute.

 

WITH

MEMBER

[Measures].CalendarQuarterDataID AS 

DataId

([Date].[Calendar].CurrentMember)

SELECT

[Measures].[CalendarQuarterDataID] ON 0

, [Date].[Calendar Quarter].Members ON 1

FROM

[Adventure Works];

 

Here are the query results:

image

The Data IDs for Calendar Quarters are in the same order as you expect them to be, all Quarters in 2007 have Data IDs less than Data IDs of Quarters in 2008. This means there is still no chance for a range overlap to happen for Calendar Quarter either.

If you look at the info files for the 2007 and 2008 partitions, this is what you find:

 

The following is a section of the info file for the 2007 partition.

 image

The range has a min value of 10 and a max value of 13 as expected.

 

The following is a section of the info file for the 2008 partition.

 image

The range has a min value of 14 as expected but note how the max value is 16 instead of 17. Why? The last day with data in Fact Internet Sales for 2008 is  July 31. The last Quarter is empty and hence is not recorded in the range.

Select

max(OrderDateKey) from [dbo].[FactInternetSales]

where

[OrderDateKey] between 20080101 and 20081231

–Retuns 20080731

 

(3) Calendar Month

The following query returns the Data IDs of the [Calendar Month] attribute.

WITH

MEMBER

[Measures].MonthDataID AS

DataId

([Date].[Calendar].CurrentMember)

SELECT

[Measures].[MonthDataID] ON 0

, [Date].[Calendar].[Month].Members ON 1

FROM

[Adventure Works];

 

Here are the query results:

image

 

The Data IDs of Months are the same order as you expect them to be. For example all Months in 2007 have Data IDs that are less than Data IDs of all months in 2008.  If you look at the info files you will see the corresponding Month Data IDs stored. The last Month Data ID stored for 2008 belongs to July.

 

The following is a section of the info file for the 2007 partition.

image

The range has min value of 26 and max value of 37 as expected.

 

The following is a section of the info file for the 2008 partition.

image

The range has a min value of 38 as expected. The max value is 44 which is the Data ID of July.  July is the last month with data in this partition.

Let’s assume a query comes in asking for data for January of 2008 which is 38. In this case, the “Internet Sales – CY 2008” partition will be scanned because 38 fits between its min and max range for Month data IDs.  Partitions “Internet Sales – CY 2008” and “ “Internet Sales – CY 2008” will also be scanned since they have no auto slice information stored. (I removed the slice property as well.)

 

(4) Calendar Week

This is where it gets interesting: You will see how the ranges of Week Data IDs recorded for multiple partitions overlap.

The following query returns the Data IDs of the [Calendar Week] attribute.

 

WITH

MEMBER

[Measures].WeekDataID AS

DataId

([Date].[Calendar Week].CurrentMember)

SELECT

[Measures].[WeekDataID]  ON 0

, [Date].[Calendar Week].[Calendar Week].Members ON 1

FROM

[Adventure Works];

 

Here are the query results:

imageimage 

The Data IDs of the [Calendar Week] are not what you expect. Weeks 1 of all years are listed first, then weeks 2 and so on.  If all partitions had row counts larger than 4096, the following ranges would have been stored for Data IDs of [Calendar Week] for each partition. We know this is not true in AdventureWorks and nothing is stored for partitions of 2005 and 2006. For partitions of 2007 and 2008, you can see how the ranges formed by Min and Max values have overlap. A query asking for the first week of 2008 with a Data ID of 5, only needs data from the 2008 partition however the Data ID of 5 fits in the Min and Max values of both of the partitions, as a result 2007 will be scanned as well (with a cold cache). Partitions 2005 and 2006 will also be scanned since they don’t have any ranges stored for them (and I removed the slice property).

 

Partition

Min Calendar Week Data ID

Max Calendar Week Data ID

Internet Sales – CY 2005 2 314
Internet Sales – CY 2006 3 315
Internet Sales – CY 2007 4 316
Internet Sales – CY 2008 5 317
Internet Sales – CY 2009 6 318
Internet Sales – CY 2010 7 319

 

This is what you find in the info files:

 

The following is a section of the info file for the 2007 partition.

image

 

The following is a section of the info file for the 2008 partition.

image

Once again the Max is 185 not 317. 185 is the ID of the last week with data in 2008.

If you have frequent queries coming at week level  and you can confirm that these extra scans are putting a burden on your system, then follow Mosha’s blog to specify the slice at the week level manually.

 

When setting the slice makes partition processing fail

 

If you set a slice property manually to a member that does not exist in a dimension yet, the partition processing fails with this message:

 

"Errors in the OLAP storage engine: The restrictions imposed on partition slice where violated."

 

SSAS takes what is specified as a slice and turns it into a set by running StrToSet(“The Slice Specified”,CONSTRAINED) on it. The Constrained parameter is what does not allow members that don’t exist to be part of the set. 

For example, the following query runs successfully.

 

–This works

Select {[Measures].[Internet Sales Amount]} ON COLUMNS, StrToSet("{[Date].[Calendar Year].[CY 2008]}",CONSTRAINED) ON ROWS

FROM

[Adventure Works]

–Returns $9,770,899.74

 

The following query fails:

 

–This fails.

Select {[Measures].[Internet Sales Amount]} ON COLUMNS, StrToSet("{[Date].[Calendar Year].[CY 2013]}",CONSTRAINED) ON ROWS

FROM

[Adventure Works]

The above query fails with the following message:

 

Executing the query …

Query (3, 2) The restrictions imposed by the CONSTRAINED flag in the STRTOSET function were violated.

Execution complete

 

Typically partition are created by some sort of program or XMLA script that runs along with ETL in a Production environment and creates new partitions as new data arrives. The slice property can be set programmatically at the same time if needed. Some implementations don’t have this mechanism in place for one or another reason and the partitions are created manually in a development environment and then moved to a production environment. In this case usually several future partitions are created to catch future data for several months or years until someone recreated the partitions. Now assume that the Date dimension that is used for partitioning is only populated up until the current date. There are no future months, quarters, … dates in it. In this scenario, it is not possible to set the slice for future partitions as their processing fails with the messages shown previously.

 

Bottom Line

Auto Slice works fine in most cases, the rest of the time you have to watch for unwanted partitions scans and consider setting the slice if you see extra partition scans that have a significant impact on performance.