
I am working on a server that houses both an OLTP workload and a data warehousing/reporting workload. The OLTP system requires sub-second (in the milliseconds) response time while the reporting workload has a threshold of minutes. The issue is some users will run multiple reports during peak transactional activity in our OLTP environment, thereby slowing the OLTP queries down. On their own, the OLTP queries tend to run 20-30ms, but when many reporting/data warehouse queries are fired up simultaneously the OLTP queries begin to run in the 4-5 second range. Our bottleneck appears to be CPU, as this is where all of our waits are.

I would of course like to separate the workloads onto two different servers, but I’m wondering whether we could get a quick improvement by using Resource Governor.

I’m considering two approaches: (1) set a minimum allotment of CPU for the OLTP system so that it is guaranteed 25% of the resources, or (2) cap the reporting/data warehouse queries so they cannot take more than 50% of the CPU.
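
For reference, a minimal sketch of what each option could look like in T-SQL (oltpPool and reportingPool are placeholder names, and the workload groups and classifier function that actually route sessions into the pools are omitted here):

    -- Option 1: guarantee the OLTP workload a minimum CPU share
    CREATE RESOURCE POOL oltpPool WITH (MIN_CPU_PERCENT = 25);

    -- Option 2: limit the reporting workload's CPU share
    -- (MAX_CPU_PERCENT is a soft limit that only applies under CPU contention)
    CREATE RESOURCE POOL reportingPool WITH (MAX_CPU_PERCENT = 50);

    ALTER RESOURCE GOVERNOR RECONFIGURE;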

Does anyone here have any practical experience with Resource Governor that would guide me in one of these two directions? Or is there another usage pattern that you’d suggest instead?

  • I think this is too broad / opinion based to work here, but your overall approach is sound. We can't speak to whether or not this will work for your specific situation, but if you lack the resources / time to set up a dedicated reporting copy, this could help hold you over. I'd personally still lean towards setting up a dedicated reporting copy eventually, as Resource Governor only has so much granularity/configurability. Commented May 14, 2020 at 21:24
  • Thanks for your input. I get that this is a bit opinion based, but I'm wondering what people think on this front. It is hard to find information on the subject (i.e., cap the heavy queries vs. guarantee the important ones). We may even go with both, but we would like to start simple and build up from there. I was leaning towards guaranteeing the minimum for our more important workload as a starting point, and wanted to hear alternative arguments (or supporting statements). Commented May 15, 2020 at 0:42
  • Set up one method, see if it works. If not, tune it, change it. There are no "best practices" for stuff like this because every environment and load pattern is different. What works for one environment won't work for another. Commented May 15, 2020 at 12:50
  • We are going to test out method 2 first. Commented May 15, 2020 at 17:17

1 Answer


Supporting a hybrid workload requires more than just resource governance.

The docs call this scenario Real-Time Operational Analytics, and it's a bunch of features all working together.

First you need row versioning to ensure that the report users don't create blocking in the database. So set the database to READ COMMITTED SNAPSHOT or force the reports to run with SNAPSHOT isolation.
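
A minimal sketch, assuming a placeholder database name of YourDb (note that turning on READ_COMMITTED_SNAPSHOT needs exclusive access to the database, so it is usually done during a quiet period or maintenance window):

    -- Make read committed readers use row versioning database-wide
    ALTER DATABASE YourDb SET READ_COMMITTED_SNAPSHOT ON;

    -- Or allow snapshot isolation and have the reports opt in per session
    ALTER DATABASE YourDb SET ALLOW_SNAPSHOT_ISOLATION ON;
    -- then, at the start of each reporting session:
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;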

With locking solved, you then need resource governance and isolation to prevent the report users from using excessive server resources. For many workloads the OLTP part doesn't need data file physical IO, but it does depend on log file IO, so physical separation of data and log files can help prevent report IO from interfering with OLTP. For resource governance I would start by capping the reporting workload's CPU, IOPS, memory, and perhaps MAXDOP.
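
A hedged sketch of what that governance could look like; the pool, group, login name, and limit values are all placeholders to tune for your workload, and MAX_IOPS_PER_VOLUME requires SQL Server 2014 or later:

    USE master;
    GO

    -- Pool that limits what reporting sessions can consume
    CREATE RESOURCE POOL reportingPool WITH
    (
        MAX_CPU_PERCENT = 50,        -- soft CPU limit, enforced under contention
        MAX_MEMORY_PERCENT = 50,
        MAX_IOPS_PER_VOLUME = 500
    );

    -- Group that also limits parallelism and memory grants per request
    CREATE WORKLOAD GROUP reportingGroup WITH
    (
        MAX_DOP = 4,
        REQUEST_MAX_MEMORY_GRANT_PERCENT = 25
    ) USING reportingPool;
    GO

    -- Classifier that routes the reporting login into that group
    CREATE FUNCTION dbo.rgClassifier() RETURNS sysname
    WITH SCHEMABINDING
    AS
    BEGIN
        DECLARE @grp sysname = N'default';
        IF SUSER_SNAME() = N'reporting_login'  -- placeholder login name
            SET @grp = N'reportingGroup';
        RETURN @grp;
    END;
    GO

    ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.rgClassifier);
    ALTER RESOURCE GOVERNOR RECONFIGURE;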

The next thing you need is to make the reporting workload less expensive. For this use some combination of updatable nonclustered Columnstore indexes, filtered indexes, indexed views, and Columnstore compression delay.
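
For instance, a sketch of an updatable nonclustered columnstore index with a compression delay, plus a filtered index; dbo.Orders and its columns are stand-ins for one of your hot OLTP tables:

    -- Columnstore copy of the reporting columns; the compression delay keeps
    -- recently modified rows in the delta store so OLTP changes stay cheap
    CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_Orders_Reporting
        ON dbo.Orders (OrderDate, CustomerId, Amount)
        WITH (COMPRESSION_DELAY = 10 MINUTES);

    -- Filtered rowstore index aimed at a common report predicate
    CREATE NONCLUSTERED INDEX ix_Orders_Open_OrderDate
        ON dbo.Orders (OrderDate)
        WHERE Status = N'Open';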

And the final thing is to use Query Store to monitor both workloads and give the report developers visibility into their query plans.
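
A minimal sketch of enabling it, again with YourDb as a placeholder database name:

    ALTER DATABASE YourDb SET QUERY_STORE = ON;

    ALTER DATABASE YourDb SET QUERY_STORE
    (
        OPERATION_MODE = READ_WRITE,
        QUERY_CAPTURE_MODE = AUTO,
        MAX_STORAGE_SIZE_MB = 1024
    );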

  • Thank you for your feedback. We are going to test, and we are going to go the max/cap route first. We would like to start slower, so the plan for us is to have a soft max and a hard cap after that. So, for example, the research workload can max out at 50% CPU, and I know that is a soft max. We would cap so that researchers cannot consume more than 75% CPU (a rough sketch of that combination follows these comments). Commented May 15, 2020 at 17:17
  • And create some test scenarios where the research workload generates queries that drive table scans and large memory grants. Commented May 15, 2020 at 17:23
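
A sketch of that soft-max-plus-hard-cap combination (CAP_CPU_PERCENT is available from SQL Server 2012 onward; reportingPool is a placeholder pool name):

    -- 50% soft limit under contention, 75% hard ceiling at all times
    ALTER RESOURCE POOL reportingPool
        WITH (MAX_CPU_PERCENT = 50, CAP_CPU_PERCENT = 75);
    ALTER RESOURCE GOVERNOR RECONFIGURE;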
