I recently ran into a situation where a new SQL Server would crash hard every time it would get under a load.
Here is a synopsis of what we were seeing:
This is a physical server and has 512GB of RAM installed. We have SQL Server 2016 installed, and fully patched (SP2 CU15 at this time). When load testing the server, it would start throwing errors such as:
“Failed allocate pages: FAIL_PAGE_ALLOCATION”
“There is insufficient system memory in resource pool ‘default’ to run this query.”
“Failed to allocate BUFs”
It would then write a memory dump to the log, and in most cases the server would become completely unresponsive and would have to be rebooted.
In our testing, we began experimenting with different Max Memory settings in SQL Server. We found that if we set Max Memory to 400GB everything ran fine, but if we set it to 425GB it would crash. We then honed it in to discover that 410GB of RAM usage was the magic number, once SQL Server hit that it would crash. Every time.
A little bit of math showed us that this was also 80% of total server RAM. So as soon as SQL Server’s RAM usage hit 80%, the crash would happen. We did multiple checks on the hardware, and even engaged Dell support and couldn’t find any issues there.
After a lengthy, and very unproductive, engagement with Microsoft support, I inadvertently discovered the problem. While looking at some settings in the SQL Configuration Manager I noticed that SQL Server Analysis Services (SSAS) had been installed on this box by mistake. Even though we were not using it, the SSAS service was active and running. I disabled the SSAS service and tried another load test on the server, and was shocked to find that this time everything worked correctly!
We ran numerous tests after that, some with SSAS enabled and some with it disabled. We got the exact same results every time: SQL Server would run perfectly fine when SSAS was disabled, and crash at 80% RAM usage when SSAS was running. Every time.
By this time our case had been escalated to tier 3 support at Microsoft, and they proved to be extremely helpful. In reporting our findings regarding SSAS, here is what they were able to determine:
In short, you are seeing this memory pressure issue when SSAS is running (even if it is not being used) because by default SSAS allocates 20% of physical RAM on the machine for its minimum working set. That combined with SQL Server’s Max Server Memory being above 80% of physical RAM and having LPIM enabled (which prevents the OS from paging this memory to free up more physical RAM) can lead to memory exhaustion at the OS level and result in the exact symptoms you described.
To summarize, our problem was a result of the combination of the following:
- SSAS was installed and running (when it shouldn’t have been)
- The default SSAS config is to take 20% of the server’s RAM
- SQL Server Max Memory was set above 80%.
- Lock Pages In Memory (LPIM) was enabled
In our case, the solution was simple: Get rid of SSAS since we didn’t need it. Once we did that, everything ran fine and we haven’t had this problem since.
But What If I Need SSAS?
If you do need SSAS on your SQL Server, there a few options:
- Don’t run SSAS and SQL Server on the same box. Microsoft even recommends having them separated.
- Do the math and figure out the 80% point and set SQL Server’s Max Memory below that.
- If you need more RAM for SQL Server, adjust SSAS’s Maximum Working Set to be less than 20%
- Disable Lock Pages In Memory (LPIM) so that SQL and SSAS and Windows will all play nicely together and share the RAM