I had the honor of speaking at PASS summit this year and I had a great experience. Thank you PASS organizers, board and volunteers for managing and running this event. Thank you everyone who joined my session at PASS Summit. It was great to hear your feedback and know that you found the material useful. I hope you find this blog post useful and would love to hear your comments.
This blog post is a high level summary of the session and points to the sample material accompanying the session. You can find the slides and sample code files at PASS website here. Search for SSAS to find my session.
You can also find sample code here: https://github.com/ShbWatson/PASSSummit2017
In this post I show you how to build a performance monitoring tool for SQL Server Analysis Services (SSAS) using free, out-of-the-box Windows tools and Power BI Desktop. Along the way, I will share some of the metrics you can measure to evaluate an SSAS server’s performance and health.
Why monitor SSAS for performance?
In many cases, SSAS works efficiently with default settings right out of the box. However, when you have large databases, a substantial number of concurrent users, insufficient resources on your server, or an SSAS database designed without following best practices, you can run into performance bottlenecks and problems. In these scenarios, you need to know what to measure and how to measure it, what’s normal for your environment (a benchmark), and you need enough historical measurements to see the events that led up to a particular failure or period of bad performance. Once you have this data, you can improve your server’s performance by addressing the problem(s).
Here is an example of what I ran into with a client:
One day I got emails from users saying they could not connect to an SSAS server. My first thought was that the SSAS service had crashed and was stopped, or that the server’s CPU was so busy that no one could get a connection. This was a very busy server, answering about 3 queries per second. However, by the time I logged into the server (15 minutes after the initial user reports) the server looked fine to me. The SSAS service was running. CPU usage was around 20%. SSAS’s memory usage was around 30 GB. This server had around 160 GB of RAM. There were about 20 users happily querying. Something had happened to the SSAS service, but it had gracefully (minus some lost connections and upset users) recovered to a good state by the time I got there. But what?
Let’s think about this: I said that the SSAS service was using 30 GB of RAM on a server that had 160 GB of RAM. The question is: is this a good thing or a bad thing? For this server, normal SSAS memory usage should have been about 80 to 90 GB, so 30 GB was not a good place to be. The 30 GB was an indicator that SSAS had dumped a large amount of memory (along with user connections) and was now coming back up.
In this case, if I had had benchmarks I would have known what the normal expected numbers should be, and if I had had history I would have seen the memory trend along with the lost connections and been able to correlate them with other pieces of information (perhaps a troublesome query, or another process starving SSAS’s cleaner thread) to figure out how to prevent this from happening again.
Why build a tool from scratch vs. buy a commercial tool?
If you work in an environment where you already have access to commercial tools for monitoring SSAS, then use them. SentryOne and Quest are examples of vendors that offer such tools. At the time of writing this post, I have not used or evaluated them enough to be able to comment on their strengths and weaknesses.
If you work in an environment where you cannot purchase a commercial product and are stuck with what comes with Windows for free, then you have the option of putting a reporting tool together using free Windows tools and Power BI Desktop. This was my motivation for putting this session together. I also like the flexibility of having Power BI as my dashboard because I can customize it.
I had used Windows Performance Monitor (Perfmon), SQL Server Profiler and Extended Events, and SSAS Dynamic Management Views (DMVs) all individually. I wanted one central tool that gathers all the information I could get from these separate tools. In this post I’ll show you how I collected information from all the above sources and stored it in a SQL Server database. Once all this data was in a SQL Server database, I built a Tabular SSAS model on top of it. Having all this information in a Tabular model allowed me to quickly put together a set of reports in Power BI Desktop.
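As a quick illustration of the DMV piece of this collection, here is the kind of query you can run on a schedule and land in the SQL Server database. This is a sketch: $SYSTEM.DISCOVER_SESSIONS is a standard SSAS DMV, but verify the column names against your SSAS version before relying on them.

```sql
-- Sketch: snapshot of current SSAS sessions via a DMV.
-- Run this against the SSAS instance (for example from an MDX window in SSMS,
-- or through a linked server) and save the results to a SQL Server table.
SELECT SESSION_ID,
       SESSION_USER_NAME,
       SESSION_LAST_COMMAND,
       SESSION_CPU_TIME_MS
FROM $SYSTEM.DISCOVER_SESSIONS
```

Capturing this every few minutes gives you a history of who was connected and what they were running when a problem hit.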
Let’s get started by learning about SSAS as a Windows process.
Task Manager: Quick review of SSAS as a process
SSAS runs as a process on Windows and requests resources from Windows. You can find the SSAS process using Task Manager and quickly get a picture of how many resources it is using. If you have benchmarks to compare against and know your SSAS instance’s normal numbers, you can tell right away whether it is under any kind of resource pressure. You should open Task Manager on your server throughout a typical working day and get yourself familiar with the numbers (while users are not complaining).
Once you open Task Manager, open the Performance tab and then Resource Monitor. The SSAS process is called msmdsrv. This name is the same for both multidimensional and tabular instances. The following screenshot shows two instances of SSAS running on my test machine: SSAS multidimensional running as the default instance and SSAS tabular running as a named instance. Of course, you should not see this on a production server; it is not recommended to put the two instances on the same server.
The default instance, in this case the multidimensional server, shows as MSSQLServerOLAPService. A named instance shows as MSOLAP$NameOfYourInstance. In this case, the tabular server is my named instance and it shows under MSOLAP$ABI_TAB.
Using Task Manager you can get some information about the resources that SSAS is using. For example, you can see how much CPU an instance of SSAS is using. If the CPU usage is pegged close to 90% to 100% for long periods of time, you have at least a CPU bottleneck. It is also important to monitor the server’s overall CPU usage, because other processes could be running and consuming CPU, not allowing SSAS to run smoothly. (Think middle-of-the-day virus scanners, or SQL Server or SSIS running on the same server.)
Under the memory tab, you can see how much memory SSAS is using. As long as this number is within an acceptable range (below or around the Low Memory Limit and below the High/Hard Memory Limit; more on this later), you know that your server is in a good place. Note that if SSAS’s memory is abnormally low, similar to the example I mentioned before, you are most probably witnessing the moments after a memory dump or a service crash.
You can also get an idea of how much SSAS is accessing the disk. This tab is more useful if you have a multidimensional instance. Since SSAS multidimensional is disk based, when you go to this tab you can see which measure group/aggregation files are being read from disk. If you see some files constantly being read from disk, this can potentially lead you to problem areas in your design (for example, a large intermediate many-to-many measure group, or dimensions that don’t fit in memory). In the case of a tabular server with enough memory, you should only see disk activity during processing and initial loading phases.
Another useful point to remember: if you suspect someone is running a sync/process command against one of your databases in the middle of the day, you can use the Disk tab, which will show writes being committed. (Hmm… no one would know if I quickly did a sync of my changes into the production server in the middle of the day 🙂 )
Windows Performance Monitor
Most of the information you can see in Task Manager is also available from Windows Performance Monitor (Perfmon). You can use Perfmon to collect information about the state of an SSAS server and its host machine over a period of time on a predefined schedule, and save it to a SQL Server database.
Before we get started with Perfmon, I would like to mention that you can get to Perfmon in two different ways, which impacts what menu options you get:
(1) Open Perfmon by clicking the Windows button and searching for Perfmon.
(2) Open Perfmon by clicking the Windows button and searching for mmc.exe. This opens the Microsoft Management Console, and you can add Perfmon as a snap-in. The benefit of getting to Perfmon this way is that you can save your current list of Perfmon counters on a machine and then load them back from the file in an instant, versus having to add them one by one. If you don’t see the Save/Open options under the File menu, you have opened Perfmon directly.
Once you add Perfmon, you can see it in the console and if you click on Performance Monitor you can start adding counters.
Note: mmc might already be associated with SQL Server Configuration Manager. In that case, when you search for mmc, go to the containing folder of mmc.exe and create a shortcut for it. For example, on my machine mmc.exe is at C:\Windows\System32.
Using Perfmon Counters
Once you have Perfmon open, you can add counters by clicking on the green plus sign.
Note: By default, you see the counters for whatever machine you are running mmc on. You can point to a remote machine instead, but you need to be a member of that machine’s Performance Log Users group.
When you click on the plus sign, the default counter group is Processor. You can click on a whole group and add it or you can expand the group and select individual counters from the list.
In the above screenshot I chose all the Processor metrics for all of my logical CPUs (_Total vs. the individual instances 0 through 3). My test machine has 4 logical processors.
Now I get a nice graph started that shows me all the processor counters:
This graph display is not very usable once you have more than a few counters. Fortunately you can switch to Report mode by clicking on the third icon from the left in the menu:
Now you can see the counters in a list:
SSAS Perfmon Counters
You can find a complete list of SSAS Perfmon counters here:
Many of the counters apply to both multidimensional and tabular instances, while some are specific to only one of the two. For example, there are aggregation counters that only apply to multidimensional, and there are VertiPaq counters that only apply to tabular. Regardless of which instance type they apply to, they all show up in Perfmon for both instance types.
Everyone has a slightly different list of counters they like to use. Here is a list of Perfmon counters that I like to start with for monitoring SSAS during query processing (the full list is in the session slides; the core set covered in this post includes Memory Usage KB, the Memory Limit Low/High/Hard KB counters, Cleaner Current Price, Cleaner Memory Shrunk KB/Sec, Process\Private Bytes for msmdsrv, and Memory\Available KBytes).
To understand these counters, I’ll give you some background on how SSAS processes queries and manages memory. (I skip looking at SSAS during cube/model processing, but you can use the same methods described here to log counters related to processing time.) If you are already familiar with these concepts and are only interested in seeing how I got these metrics into a SQL Server database, skip ahead to the “Perfmon Data Collectors” section.
If you would like to read more about SSAS Perfmon counters, here are a couple of blogs to learn from:
SSAS multidimensional Query Processing Architecture
At a high level, SSAS has two main parts responsible for answering queries: Formula Engine and Storage Engine.
When a query comes in to SSAS, it is sent to the Formula Engine first. The Formula Engine is responsible for performing calculations and it is single threaded. The Formula Engine has a cache that stores frequently accessed calculations and dimension data. The fastest possible run time for a query is when it can be answered from this cache. If a query cannot be answered from the Formula Engine cache, the Formula Engine generates a set of requests to the Storage Engine to get the data it needs for its calculations.
The Storage Engine is multi-threaded, meaning it can fetch multiple data sets at the same time. The Storage Engine has its own cache, which holds the most frequently/recently used data sets (subcubes). (Note: there are some rules about which subcubes can be cached; read this article by Thomas Kejser about arbitrary-shaped subcubes here.) The second-best scenario for a query is when its data is available in the Storage Engine cache. If the data is not available in this cache, it has to be fetched from disk.
SSAS multidimensional is a disk-based system. You can actually see the measure group and aggregation files if you look at the Data folder under SSAS using Windows Explorer. Measure group files are significantly larger than aggregation files (with some exceptions for badly designed aggregations: the aggregation wizard has a 30% rule, whereby an aggregation larger than 30% of the size of its measure group is not worth creating, but you can get around this rule by designing aggregations by hand).
It is much faster to read aggregation files than measure group files. If you have useful aggregations built, they can improve your query performance greatly. However, if you have hundreds of unusable aggregations, they only add overhead to your queries, since the Storage Engine has to consider each one to see if it can be used. Even if an initial set of aggregations is useful, they have to be maintained over time; changes to the data or design can make some of them obsolete. Aggregations are similar to indexes in SQL Server: if you add too many indexes to a table, you can actually hurt query performance. As a result, in SSAS multidimensional it is worth taking a look at how many aggregations are actually getting hit and used. If this number is close to zero, you have unusable aggregations. To save processing and query time, you should consider redesigning or removing them. (Some queries cannot benefit from aggregations at all, for example if you are overwriting fact-level data in the cube script using Scope statements.)
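One way to check aggregation usage without running a Profiler trace is the object activity DMV. This is a sketch; the DISCOVER_OBJECT_ACTIVITY rowset and its aggregation hit/miss columns exist on the versions I have worked with, but confirm the column names on yours.

```sql
-- Sketch: per-object activity since the service started,
-- including how often aggregations were hit vs. missed.
-- Verify column names against your SSAS version's DMV reference.
SELECT OBJECT_PARENT_PATH,
       OBJECT_ID,
       OBJECT_AGGREGATION_HIT,
       OBJECT_AGGREGATION_MISS
FROM $SYSTEM.DISCOVER_OBJECT_ACTIVITY
```

If the hit counts stay near zero on a busy server, that is a strong hint your aggregations are not earning their processing cost.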
One caveat to remember is that the measure group and aggregation files may be sitting in the Windows file system cache rather than on disk. This becomes important when you are tuning queries for performance and want to clear the caches: you have to clear both the SSAS caches and the Windows file system cache to get accurate results. See this blog post by Chris Webb: https://blog.crossjoin.co.uk/2009/03/04/analysis-services-and-the-system-file-cache/
The Formula Engine and Storage Engine caches have a limited size. SSAS manages which calculations/data sets get to stay in the caches based on a set of rules. For example, as the caches fill up, the least recently used calculations/data sets are flushed. Also, when SSAS’s overall memory consumption goes above a certain limit (Low Memory Limit; see the Memory Management sections), a cleaner thread wakes up and starts cleaning the caches, starting with the least frequently used calculations/data sets.
Note: There is not just one Formula Engine cache and one Storage Engine cache. The caches are separate objects per measure group per security role, and if you have dynamic user security defined, the caches are specific to each user. (Thanks to Akshai Mirchandani for reminding me of this at the Microsoft clinic at PASS Summit this year!)
SSAS Multidimensional Memory Management
SSAS Multidimensional controls its memory using a couple of settings that you can access using SSMS.
When you get the properties, you can see these settings:
These numbers are by default less than 100, which means they are interpreted as percentages of the total physical memory on the server. For example, on a server with 100 GB of memory, the default Low Memory Limit of 65 translates to 65 GB. By default, the Total Memory Limit is 80% of the total physical memory on the server.
If these values are above 100, they are interpreted as actual bytes, so be careful if you want to specify exact values instead of percentages: 128 means 128 bytes, not 128 MB.
Every one of these settings has a corresponding counter in Perfmon. Here is the mapping (the counters live under the MSOLAP Memory counter object):
LowMemoryLimit -> Memory Limit Low KB
TotalMemoryLimit -> Memory Limit High KB
HardMemoryLimit -> Memory Limit Hard KB
(Current consumption shows up as Memory Usage KB.)
These settings control when the cleaner thread wakes up and how aggressively it does its job. SSAS uses an economy model to manage memory. Initially, memory is free and every data set/calculation can stay in memory. Once memory consumption reaches the Low Memory Limit, memory gets a price and it is no longer free to stay. Every time a data set/calculation answers a query, it earns income. Every time the cleaner thread wakes up and checks in with a data set/calculation in memory, the data set/calculation has to pay tax. (Reminds me of the Sheriff of Nottingham from the original Disney version of Robin Hood!)
If a data set/calculation does not have any money left to pay the tax, because it has not been used recently to earn income or because the price of memory has increased exponentially (inflation!), it has to leave. As memory pressure increases, the cleaner thread asks for more money every time it wakes up.
You can actually look at the current price of memory in Perfmon and get an idea of how much memory pressure you have. The counter is called Cleaner Current Price, and I find it very interesting to watch and graph on a busy server. The price of memory is normalized to 1000. When SSAS first starts, the price is zero. Once overall memory reaches the Low Memory Limit, the price reaches 2. The price increases linearly between the Low Memory Limit and the halfway point between the Low Memory Limit and the Total (High) Memory Limit. From there it increases exponentially, reaching 1000 when memory reaches the Total Memory Limit.
Under normal circumstances the cleaner does a great job, and you should see SSAS’s memory hovering below the Low Memory Limit, or between the Low Memory Limit and the Total Memory Limit.
If for some reason the cleaner cannot keep up, and SSAS’s memory reaches the Hard Memory Limit (which by default is 0, meaning halfway between the Total Memory Limit and the total physical RAM on the server; on the 160 GB server from my story with default settings, that is halfway between 128 GB and 160 GB, i.e. 144 GB), then all connections are forced to close and everyone gets error messages. This is when you get the calls from upset users. This should not normally happen; however, I have witnessed it with SSAS 2014 on an older service pack under certain conditions that happened to exist on the server I was working on. The problem had been fixed in the next cumulative update.
Back to the story I mentioned at the beginning of this post: by the time I got to look at the server, SSAS had already dumped most of its memory along with the user connections and had started to come back up (think the bottom left part of the above graph). I was thankful that the SSAS service had not completely crashed and stopped, but since it had not crashed, it had not generated dump files, so there wasn’t any evidence of what had happened (at least none that I could find!). If I had known what the normal memory consumption for that SSAS server was supposed to be, or if I had had even half an hour of memory logs similar to the above graph, I would have known what had happened.
Before moving on to the Tabular concepts, I would like to mention one more Perfmon counter that I like to monitor: Cleaner Memory Shrunk KB/Sec. In spite of its name, this is a cumulative counter. It starts at zero when SSAS starts and keeps going up every time the cleaner cleans more memory. If you graph this counter, you can tell the points in time at which the cleaner has woken up. If you look at this counter on a server in the morning and then check again at the end of the day, you can get an idea of how much memory was cleaned from the caches.
Finally, it is important to make sure the server itself has enough memory. SSAS is unaware of the available memory on the server; it works off of the settings above. If you have SQL Server, SSIS, and a bunch of other applications running on the same machine, you should monitor overall system memory to make sure SSAS has enough memory available, and revisit the memory limits above to set them appropriately. By default, SSAS assumes its memory can grow to 65% of the total physical RAM before it should start cleaning. If other software is taking half the RAM, this will cause a performance hit for SSAS. You can monitor a machine’s available memory by watching the Memory\Available KBytes counter.
SSAS Tabular Query Processing Architecture
SSAS Tabular shares some of its architecture with SSAS multidimensional. Similar to SSAS multidimensional, when a query arrives it is sent to the Formula Engine first. The Formula Engine has a cache which holds the most frequently accessed calculations but it only caches MDX calculations. It also holds some materialized data sets. If a query cannot be answered using what is available in the Formula Engine cache, the Formula Engine generates a set of data requests and sends them to the Storage Engine.
The Formula Engine is still single-threaded and the Storage Engine is multi-threaded. The Storage Engine in SSAS Tabular is either VertiPaq, a column-based in-memory storage engine, or Direct Query, where the data stays in the source database. The memory Perfmon counters I mention in this post only apply to VertiPaq.
If the VertiPaq engine holds all the data in memory, then what does a cache mean? It is true that VertiPaq data is stored in memory (assuming you have enough memory; more on this later), but VertiPaq compresses the data in each column before storing it. The cache holds uncompressed data, so a query that can be answered from the cache performs better, since it requires less CPU. If the VertiPaq data does not fit in memory, you have the option of letting SSAS page data to disk, which is the default behavior. Even though this option exists, you should not rely on it for a production server, since the added I/O hurts performance. It is important to monitor how much data you are paging to disk. The VertiPaq Paged KB counter shows the amount of pageable VertiPaq data (“paged” here means it is split into pages, not that it has been paged to disk). You can monitor the Memory\Pages/Sec counter to see overall memory paging to disk.
SSAS Tabular Memory Management
Similar to SSAS multidimensional, SSAS tabular uses an economy model to manage the caches. At a high level, once overall memory reaches a certain limit (Low Memory Limit), a cleaner thread wakes up and starts cleaning the least frequently used data sets and calculations out of memory. As memory pressure increases, the cleaner gets more aggressive. You can see these limits in the properties of the SSAS server in SSMS; they have the same default values as in multidimensional.
In addition to these settings, SSAS tabular has a couple more settings:
Since VertiPaq is a memory-based engine, there have been some changes in the way the cleaner thread runs so that it does not interact badly with the VertiPaq data. Here is what the above settings mean:
VertiPaqPagingPolicy controls whether SSAS can page data to disk if needed. By default it is set to 1 which means paging is allowed.
VertiPaqMemoryLimit has two different meanings based on the value of the VertiPaqPagingPolicy:
If VertiPaqPagingPolicy = 0, then VertiPaqMemoryLimit is the maximum size to which the VertiPaq data can grow. If the VertiPaq data needs more memory, an out-of-memory error is thrown. You can try this on a development server. For example, if you have 32 GB of memory on a server and your database takes 16 GB, set VertiPaqPagingPolicy to 0 and try to process the database; you will most likely get an out-of-memory error. You can even submit a large query to the database while it is being processed to get the error faster! If you want to make it even worse, drop VertiPaqMemoryLimit to a much smaller number (only on a development box!).
If VertiPaqPagingPolicy is set to 1 (the default), then VertiPaq memory can be paged to disk. VertiPaqMemoryLimit is then no longer a hard memory limit; it is the maximum amount of VertiPaq memory that is visible to the cleaner process. In other words, this limit is only used by the cleaner when calculating the price of memory, to decide how aggressively it needs to clean.
For example, let’s assume the server has 100 GB of physical RAM and VertiPaqPagingPolicy is 1. With the default memory settings, this translates to the following numbers:
Low Memory Limit = 65 GB
Total Memory Limit = 80 GB
Hard Memory Limit = 90 GB
VertiPaqMemoryLimit = 60 GB
Let’s assume all non-VertiPaq data takes 5 GB of memory. This includes things such as sessions, connections, global variables, and other non-data structures that SSAS needs for its internal operations.
As queries come in, SSAS’s memory increases. You can track SSAS’s memory by looking at Process\Private Bytes (filtered to msmdsrv). This counter shows the total amount of memory used by SSAS. Not all of this memory is visible to the cleaner.
Let’s assume memory keeps growing to 150 GB, with VertiPaq data taking 145 GB. (This is possible since paging to disk is allowed.) How much memory does the cleaner thread see now? At this point the cleaner sees the 5 GB of non-VertiPaq data plus at most the 60 GB VertiPaqMemoryLimit. It sees only 65 GB, which is just at the Low Memory Limit, so it starts cleaning.
Below you can see a rough estimate of how I think this would look if you graphed it. The darker blue line shows SSAS’s total memory. The lighter blue line shows the memory visible to the cleaner for the purpose of calculating the price of memory. (Disclaimer: I have not actually recorded this scenario on a real server!)
Playing with memory settings
I learned a lot about SSAS’s memory management by changing memory settings on a test machine (never on Prod!). For example, you can drop the VertiPaqMemoryLimit to zero and see how it impacts Memory Usage KB.
Here is a screenshot of Perfmon from my test server where I intentionally dropped the memory limits very low to simulate a memory pressure situation:
Notice how the price of memory is 882 but nothing has been shrunk yet. Also notice how Memory Usage KB is above Memory Limit Low KB but below Memory Limit High KB.
And then I ran some queries:
Note how the price of memory increased to 1000 and how Cleaner Memory Shrunk KB/Sec jumped up. Now Memory Usage KB is above Memory Limit High KB. At this point the server is frantically cleaning memory, but it is not in panic mode yet since Memory Usage KB has not reached Memory Limit Hard KB.
When you look at your server, if you find the Cleaner Current Price constantly high, you should look into adding more memory to your server, or reviewing your database designs to see if you can reduce their size. For example, eliminate columns that are not actually being used and optimize your design. Think about reducing the cardinality of a column if you can: split DateTime columns into a Date column and a Time column. I once left a datetime column in a model and it very soon became the largest consumer of memory in the model. A great source for these best practices is the SQLBI site by Marco Russo and Alberto Ferrari.
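For example, the DateTime split can happen at the source, in the view or query that feeds the model. This is a sketch with hypothetical table and column names:

```sql
-- Sketch: split a datetime into a date column and a time column
-- to reduce cardinality. dbo.FactSales and OrderDateTime are made-up names.
SELECT CAST(OrderDateTime AS date)    AS OrderDate,
       CAST(OrderDateTime AS time(0)) AS OrderTime
FROM dbo.FactSales;
```

Importing the two lower-cardinality columns instead of the raw datetime usually compresses much better in VertiPaq.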
Perfmon Data Collectors
It is great that you can see all these counters in Perfmon, but how do you save them to a SQL Server table over time? You can use Perfmon Data Collectors, which can record counter values over time using a predefined schedule and stop criteria.
To create a Data Collector, right-click on Data Collector Sets and choose “New”.
Then give it a name and choose “Create manually (Advanced)”.
From the next screen, choose “Performance counter”.
The following screen allows you to choose the list of counters.
You can set the sampling interval here. If you are troubleshooting a problem, choose a shorter sampling interval. If you are monitoring usage over a longer period of time, choose a longer sampling interval, or have a plan to retire/manage the older samples; otherwise the size of your data can grow very fast.
Follow the default settings through the next couple of screens. We will change this data collector’s target to a SQL Server table shortly, so you can ignore the disk folder in the next screen. If you were writing the results to a comma-separated file, this is where it would go.
Fortunately, we can write the results directly to a SQL Server table. To do this, first we need to create an ODBC data source. This is very easy to do! I learned how by reading this blog post by Michael Otey.
Create an ODBC Data Source
Search for ODBC Data Sources from Windows Search. Once it opens, under System DSN, click Add. For the driver, choose “SQL Server”.
In the next screen, you get to name your ODBC data source and choose the SQL Server where it will write to. In this case, I chose my SQL Server instance called ABI1.
I kept the default settings for the connection properties so that Windows NT authentication is used. In the last screen, you get to choose the database that the results will be written into. In this case, I chose an empty database called Test.
Follow the default setting for the rest of the screens and do a Test connection at the end to make sure your ODBC data source was created properly.
Now back to the Data Collector: get the properties of the Data Collector (not the Data Collector Set).
Change Log format from Binary to SQL. As soon as you do this, a Data Source Name drop-down becomes active. From the drop down, choose the ODBC data source you just created. In this case, I chose Test.
There is one last step before you can start the Data Collector Set. This time, right-click on the Data Collector Set (not the Data Collector itself) and get the properties. It should look like this:
This is an important step. You have to change Run As from System to an account that has write permissions on your SQL database. Once you have this set up, you can right-click on the data collector set and choose Start. A green arrow next to it indicates it started successfully. At this point, the data collector is writing to your SQL database at the interval you gave it. If you check the database, you will see three tables created for you. If you don’t see the tables, chances are the account you provided does not have the required permissions.
dbo.CounterData: This table contains one row per counter per each sampling interval.
dbo.CounterDetails: This table has a list of all the counters that are being recorded.
dbo.DisplayToID: This table contains one row per Data Collector.
Now you can record any of the Perfmon counters over time to your SQL database.
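To read the samples back out, join the first two tables. This is a sketch using the default columns the data collector creates; the two counter names in the WHERE clause are just examples from this post.

```sql
-- Sketch: pull collected samples for a couple of SSAS memory counters.
SELECT d.ObjectName,
       d.CounterName,
       c.CounterDateTime,
       c.CounterValue
FROM dbo.CounterData AS c
JOIN dbo.CounterDetails AS d
    ON d.CounterID = c.CounterID
WHERE d.CounterName IN (N'Memory Usage KB', N'Cleaner Current Price')
ORDER BY c.CounterDateTime;
```

Note that CounterDateTime is stored as a character column, so convert it to datetime2 before doing any date math or retention cleanup.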
Extended Events (Collect Query Statistics)
Extended Events (XEvents) is a lightweight tracing framework that became available for SSAS starting with version 2012. XEvents traces have a much smaller footprint than SQL Server Profiler traces, so they are suitable for monitoring production boxes. Before SSAS 2016 you had to use scripts to start and stop XEvents traces. Starting with SSAS 2016, you can create and delete XEvents traces from SSMS, which makes them a lot easier to use.
You can read this blog post from Adam Saxton to learn about Extended Events. Here I will only explain the settings related to our goal of collecting query statistics. The only event we need is called Query End; this is true for both multidimensional and tabular instances. This event provides statistics such as query duration, CPU time, the user who ran the query, the database it was run against, etc. If you collect this information over time, you can build a benchmark for average query duration on your SSAS server.
To start an Extended Events session, connect to your SSAS server and look under the Management folder. (If you don’t see this folder, you are working with an earlier version of SSAS and your only option is to use scripts. You can use a development instance of SSAS 2016 to generate sample scripts and then edit the server name.) Right-click on Sessions and choose New Session.
Give the session a name.
Click on Events and type “Query End” in the box under Event Library. The list by default filters all the events that have Query End in their name. Choose “Query End” and use the right arrow to move it to Selected events.
Click on Data Storage. Here you have to choose at least one Target which is where the results will be stored. You have three choices:
Event_File: Writes results to a file.
Event_Stream: You can see the results in SSMS using Watch Live Data.
Ring_Buffer: Results are written into a temporary memory-based structure that is erased when the SSAS service restarts. I have not used this option myself, but I can see how it can be useful if you have a program that can read directly from memory.
For the Event_File, pay attention to where the file is being written. By default, the files are written to the Log directory of your SSAS instance.
Once you click OK, the trace starts. If you run a couple of queries and either open the file or use Watch Live Data in SSMS, you can see queries coming in as they are answered by your SSAS instance.
In order to stop the trace, you have to delete it. I really wish there were an option to pause and restart a trace, similar to SQL Server Profiler.
How to read from an Extended Event (XEL) file
You can open an XEL file from SSMS and see individual events. You can also use SQL to read from XEL files. Using SQL, you can read the file using sys.fn_xe_file_target_read_file function. See this example:
SELECT CAST(event_data AS XML) AS XE
FROM sys.fn_xe_file_target_read_file('C:\Program Files\Microsoft SQL Server\MSAS13.ABI_TAB\OLAP\Log\Test Events.xel', NULL, NULL, NULL)
This query returns one XML row for each event in the file or files specified. From here, you can get each element from each row by using the following method:
This may seem complicated at first, but the syntax is pretty much the same for extracting the different columns from the event. Look at the sample code posted on PASS Summit's site or on my GitHub.
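As a sketch of the technique, each column comes out of the event XML via the value() method. The data element names below (Duration, NTUserName, TextData) are what I would expect for the Query End event; verify them against your own trace output:

```sql
-- Sketch: extract individual columns from the Query End event XML.
-- The data element names are assumptions; check them in your XEL file.
SELECT
    x.XE.value('(event/@timestamp)[1]', 'datetime2')                         AS EventTime,
    x.XE.value('(event/data[@name="Duration"]/value)[1]', 'bigint')          AS Duration,
    x.XE.value('(event/data[@name="NTUserName"]/value)[1]', 'nvarchar(256)') AS UserName,
    x.XE.value('(event/data[@name="TextData"]/value)[1]', 'nvarchar(max)')   AS QueryText
FROM (
    SELECT CAST(event_data AS XML) AS XE
    FROM sys.fn_xe_file_target_read_file(
        'C:\Program Files\Microsoft SQL Server\MSAS13.ABI_TAB\OLAP\Log\Test Events.xel',
        NULL, NULL, NULL)
) AS x;
```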
Finally, you can put this into a view. I also ended up creating two views to split today's queries from historical queries so that I could use them in the partitions of the corresponding table in my SSAS model. This made it possible to process only today's queries, rather than all of history, throughout the day.
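A minimal sketch of that split, assuming a base view named dbo.vQueryEvents with an EventTime column (both names are hypothetical):

```sql
-- Today's queries: backs the "current day" partition, processed frequently.
CREATE VIEW dbo.vQueryEvents_Today AS
SELECT * FROM dbo.vQueryEvents
WHERE CAST(EventTime AS date) = CAST(GETDATE() AS date);
GO
-- Historical queries: backs the history partition, processed rarely.
CREATE VIEW dbo.vQueryEvents_History AS
SELECT * FROM dbo.vQueryEvents
WHERE CAST(EventTime AS date) < CAST(GETDATE() AS date);
```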
In order to use these files long term on a production server, you need a way of managing them and deciding how long you want to keep them. A traditional ETL approach with SSIS might work best if you plan to archive data for a long period. The method in this post works fine for a couple of days' worth of data (depending on how many queries your server gets), but beyond that, reading from the view will get slow since it has to read the files every time.
Dynamic Management Views
Dynamic Management Views (DMVs) are query structures that you can access by running DMX queries against SSAS. They provide useful information about the current state of sessions, connections, traces, memory allocation, etc. The DMX format is similar to SQL but not the same; for example, you cannot join DMVs together. For the purpose of this work, you only need simple Select * From queries against each DMV.
In order to use the DMV data in SQL, you can use OPENQUERY and a linked server. Once you have a linked server pointing at your SSAS instance, you can wrap an OPENQUERY call in a view.
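For example, with a linked server pointing at the SSAS instance (SSAS_TAB below is a hypothetical name), the view is roughly:

```sql
-- SSAS_TAB is a hypothetical linked server name; replace with your own.
-- The inner query is DMX, passed through to SSAS unchanged.
CREATE VIEW dbo.vSsasSessions AS
SELECT *
FROM OPENQUERY(SSAS_TAB, 'SELECT * FROM $System.Discover_Sessions');
```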
I have used the following DMVs and created views for them in my database:
$System.Discover_Object_Memory_Usage : To get information about which columns/databases are taking the most amount of memory. You’ll be surprised to see how some columns can grow much faster than others. For example, a dateTime column can grow very large. (Follow best practices 🙂 )
$System.Discover_Sessions: To find out who's connected to your server.
$System.Discover_Traces: To find out which traces are running against your server. For example, SQL Server Profiler sometimes crashes and leaves a trace running before you have a chance to stop/delete it. Any trace that runs against the server has a (hopefully small) impact on it, so you don't want to see a lot of traces running. You may also find that someone has forgotten to turn off the FlightRecorder trace on a production server.
Putting it all together: From a SQL Server Database to a SSAS Tabular Database
If you have followed the previous steps in this blog, at this point you have the following information available to you in a SQL Server database:
- Perfmon counter data in the form of three tables.
- Query information from the Extended Events trace in the form of views.
- Sessions/Connections/Users/Memory Holders information from DMVs in the form of views.
At this point you can directly go to Power BI desktop and create a dashboard. However, I chose to do the rest of the modeling and creating calculations in SSAS Tabular for two reasons:
- I have more experience with SSAS Tabular than Power BI at this point.
- With SSAS Tabular I am not limited in terms of how big the data can grow. I also have the option of using partitions to only process the most recent data for example, current day vs. all historical data.
Here is what my SSAS Tabular model looks like:
You can download this SSAS solution from my GitHub. In this model, I have created two perspectives to limit the view to tabular and multidimensional objects. Most of the calculations in this model follow the same pattern. Remember that the [Perfmon Data] table has one row per counter per measurement. For example, if you have taken 100 measurements, you will have 100 rows for Memory Usage KB. In order to get Memory Usage KB for a time period longer than your sample rate, you have to decide how to summarize it. In most cases, it makes sense to take the average.
To do this, I created five base measures that I use in the rest of the calculations:
- Counter Value SUM := SUM('Perfmon Data'[Counter Value])
- Counter Value Min := MIN('Perfmon Data'[Counter Value])
- Counter Value Max := MAX('Perfmon Data'[Counter Value])
- Counter Value AVG := AVERAGE('Perfmon Data'[Counter Value])
- Count Of Rows := COUNT('Perfmon Data'[Counter Value])
Most of the remaining calculations use [Counter Value AVG]. For example, here is the calculation for Memory Usage KB:
This calculation is doing a simple pivot on the Perfmon Data table. It fetches the [Counter Value AVG] for the rows in [Perfmon Counters] where the Counter Name is "Memory Usage KB" and the Object Name is MSOLAP$ABI_TAB:Memory. The [Object Name] criterion is necessary because I am collecting information from two instances, so there are two Memory Usage KB counters being collected.
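A sketch of what that measure might look like in DAX (table and column names are inferred from the description above and may differ in your model):

```dax
// Pivot one counter out of the tall Perfmon Data table.
// The Object Name filter disambiguates the two monitored instances.
Memory Usage KB :=
CALCULATE (
    [Counter Value AVG],
    'Perfmon Counters'[Counter Name] = "Memory Usage KB",
    'Perfmon Counters'[Object Name] = "MSOLAP$ABI_TAB:Memory"
)
```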
Most of the calculations in this model follow the above pattern, just the counter name and object change.
To make creating reports on the dashboard easier, I have added a couple of columns to my date and time tables to flag Current Day, Last 15 Minutes, etc. I used the code in this blog post by Joshua Walker to generate my Date and Time tables.
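As an illustration, the Current Day flag can be a simple calculated column on the date table (names here are hypothetical, and the value is refreshed each time the table is processed):

```dax
// Calculated column on the Date table; re-evaluated on process.
Is Current Day = IF ( 'Date'[Date] = TODAY (), 1, 0 )
```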
Power BI Desktop
Once all the modeling was done, creating a set of Power BI reports was easy.
The following screenshot shows one of the reports, where I could see SSAS's CPU usage vs. the box's CPU usage over the previous 15 minutes. The spike occurred while a very small database was being processed. I could also see that the Avg Query Duration for that day was higher than the historical data I had collected.
I have uploaded the template to PASS Summit’s site as well as my GitHub.
The following screenshot shows object memory usage report. I had limited the top left graph to the top 5 largest items using a filter on a Rank measure that I created in the tabular model.
The following screenshot shows the Sessions report where you can see the active sessions and some information about the last command they ran. You can also see the historical trend of connections by day.
The following screenshot shows the queries report. Here you can see the number of queries by day along with AVG Query Duration by database.
On the right-hand side, I used the Forecast feature of Power BI Desktop. This was really easy to do: all I had to do was create a line chart and then enable Forecast from the Analytics pane! I saw this demonstrated earlier in the conference.
Keep in mind that if you set your data collector to run every day and collect a large number of counters every second, your data volume is going to explode over a few days. You have to think about archiving this data if you need to keep it beyond a few days. For example, I chose to mark certain days as benchmark days (by adding a column to my dimDate table) and to keep only those benchmark days past 30 days. Note that you don't need to take samples every second from a server that is not in trouble. Perhaps you can have a data collector wake up every morning for about an hour, take measurements every 15 seconds, and then go back to sleep.
SSAS 2017 Perfmon counters are missing
Shortly before my session, someone posted on Twitter that they could not see SSAS 2017's Perfmon counters. I installed SSAS 2017 (both multidimensional and tabular) and none of the counters showed up in Perfmon. I tried removing and reloading the SSAS counters, as several people had suggested, but I was not able to load them. At this point this is a known issue, and according to Kasper de Jonge, the SSAS team is working on a fix.
You have to keep an eye on your server to make sure that adding a monitoring solution is not introducing unacceptable overhead. Check the CPU usage with and without the monitoring traces/data collectors running. Both the Extended Events traces and the data collectors should have little impact on any sizable server, but you have to see how they behave on yours.
I would like to thank all my #SQLFamily friends who supported me. Special thanks to Leila Etaati (@leila_etaati) and Reza Rad (@Rad_Reza) for answering all my Power BI questions, to Sabrina Motta (@Sabmotta) and Sergio Zenatti (@SergioZenatti) for attending my session and being my supporting friendly faces in the audience :), to Patrick Leblanc and other folks at SQL clinic for answering my SSAS and Perfmon questions, and to Chris Webb for reviewing my slides and demos.