Is Your SQL Server Overindulging on Parallelism?


By Steve Le Monnier, 28 September 2020

Recently, Corefocus was asked to help a company with a poorly performing SQL Server that was starting to affect business operations. This was despite the server being reasonably well spec’d.


Troubleshooting performance issues is always a challenge: problems are difficult to analyse retrospectively, and very subjective without a baseline of what normal looks like.


When troubleshooting SQL Server performance there are always the usual suspects:


  • Does the SQL Server only have a single TempDB data file?

  • Are databases growing 1 MB at a time, or growing by a percentage that makes each growth increment larger, and slower, than the last?

  • Are indexes and memory buffer pool settings correct?
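
As an illustration, the first check can be answered with a quick query against TempDB’s file catalog (a sketch; one row is returned per data file):

```sql
-- Count TempDB data files (type_desc = 'ROWS' excludes the log file).
-- A single row here means TempDB has only one data file.
SELECT name, physical_name
FROM tempdb.sys.database_files
WHERE type_desc = 'ROWS';
```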


Having checked and ruled out all the above we were left looking for a more significant and deep-rooted cause.  Thankfully, SQL Server comes packed with statistics and performance counters on just about everything.


Querying sys.dm_os_wait_stats allows us to review the wait times for the various locks and performance counters used within the SQL Engine. Sorting these results from the longest cumulative wait to the shortest allowed us to target our investigation.
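
The query we ran was along these lines (a sketch using the documented DMV columns):

```sql
-- Top waits accumulated since the last restart
-- (or since the wait statistics were last cleared),
-- longest cumulative wait first
SELECT TOP (10)
       wait_type,
       wait_time_ms,
       waiting_tasks_count,
       signal_wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;
```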


However, we had a problem, because the longest waiting event in the list was CXPacket, an indication that the SQL Engine has used parallel query execution, more commonly referred to as parallelism, and this is normally a good thing. It indicates that the SQL Engine has decided to utilise multiple CPUs and processing threads to get the job done.


To understand what the CXPacket performance counter represents, we need an analogy to explain what the SQL Engine is doing.


Pretend you’re a teacher in a classroom with a dozen students. You’ve been given a task by the head teacher to count the number of times the school is mentioned in the 500-page education assessment report that has just been dumped on your desk.


Rather than scanning the pages and counting these references yourself, you decide it would be “fun” to get your students to do the work for you.

So, you randomly distribute a handful of pages of the report to each student and ask them to report back how many times the school is mentioned. Meanwhile, you pull out today’s newspaper and do the crossword while you wait for the students to return their individual counts.


Once all the students have reported back, you add up the results and give the head teacher the total, accepting their thanks for a difficult job well done!


This analogy maps directly onto the SQL Engine. The CXPacket performance counter is the time the query spent waiting for all the CPUs to process their individual part of the query and report back. So CXPacket waits are normal, and a good sign that the SQL Server was using the full power of those multi-core CPUs, but in this instance the size of the CXPacket number didn’t look right.


The SQL Server had been operating for about 10 days since the last reboot. During this time the CXPacket counter had accumulated over 28 hours waiting for parallel queries to complete. That’s more than a day of waiting in ten days of operation!
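For reference, converting the cumulative counter into hours is a one-liner against the same DMV (a sketch):

```sql
-- Cumulative CXPACKET wait, in hours, since the last restart
-- (or since the wait statistics were last cleared)
SELECT wait_type,
       wait_time_ms / 3600000.0 AS wait_time_hours
FROM sys.dm_os_wait_stats
WHERE wait_type = 'CXPACKET';
```
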

Further investigation confirmed this was indeed the root cause of the client’s poor SQL Server performance. But how can something that is supposed to be good actually end up being bad?


The problem lies with the default settings that ship with every copy of SQL Server. Some of these settings made perfect sense back in the nineties, but on faster modern hardware they don’t, and can be detrimental to performance.


The default threshold for deciding when a query should go parallel is set so low that almost every query qualifies, even when the cost of waiting (the CXPacket wait) is longer than processing the query without invoking parallelism.


The default “cost threshold for parallelism” in the SQL Server configuration is 5. It’s widely suggested that organisations with multi-CPU servers (physical or virtual) should set this value to 50, and then undertake another performance review once a new baseline has been established at that level.

If your SQL Server is performing poorly, then review your “cost threshold for parallelism” and “max degree of parallelism” (MAXDOP) settings. Microsoft has recently updated its advice for these to reflect virtualised SQL Servers; you can find it in Microsoft KB article #2806535.
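
Checking the current values of both settings is straightforward with sp_configure (called without a value argument it reports rather than changes; “show advanced options” must be on to see them):

```sql
-- Show configured and running values for both parallelism settings
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism';
EXEC sp_configure 'max degree of parallelism';
```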


As with any configuration change, you should always test in a non-production environment first. That said, adjusting the cost threshold for parallelism is a simple on-the-fly change that can deliver a significant performance improvement, and at no point would I recommend leaving it at the default value.
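
The on-the-fly change itself is a couple of sp_configure calls; the value of 50 here is the starting point suggested above, not a universal answer:

```sql
-- Raise the cost threshold for parallelism from the default of 5 to 50
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;  -- takes effect immediately; no restart required
```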


Database Steve