Last week I spent a good bit of time dealing with a corruption issue, and as it turns out I may actually have found a bug in SQL Server 2017. I say “may” because even though I have been working a Microsoft SQL Server PFE (Premier Field Engineer) and providing information to the SQL Server Engineering Team, I have not yet gotten a definite confirmation that this is in fact a new bug, although I have not been able to find any mention of this particular problem anywhere else.
As part of my Azure migration, I am moving from SQL Server 2014 to SQL Server 2017. Since all of our databases use TDE, the new enhancement in backup compression of TDE databases (released in SQL Server 2016) is of particular interest to me. While none of our databases are very large, I like the idea of getting backups as small as possible since you have to pay per GB of storage in Azure.
The problem that I was having was that using backup compression on my TDE database would cause the backups to become corrupt. The backup would complete successfully, but RESTORE VERIFYONLY would throw the following error:
“Damage to the backup set was detected.”
Not only would RESTORE VERIFYONLY throw an error, but attempting to restore the backup resulted in an error as well:
“RESTORE detected an error on page (1:92454)”
The page being referenced is in the clustered index of the sys.sysmultiobjrefs table. Not good! I did quite a bit of testing around this, and wound up posting my results on the DBA Stack Exchange site hoping to find some help. I’m not going to go back through all of the technical details here, since everything is listed in my DBA Stack Exchange (DBA.SE) post. In the comment thread on that post (which has since been deleted by the moderators) I was contacted by Microsoft SQL Server PFE Sean Gallardy.
Sean was a huge help in further troubleshooting this issue, and we wound up continuing our investigation of this issue over email and off of DBA.SE. Sean was able to pinpoint the problem, to the point I was then able to create or prevent the corruption at will. I later came back and updated the DBA.SE post with our findings.
It looks like this issue is with databases that have had SHRINK operations run on them. In my case, I was taking a copy of one of our production databases on SQL Server 2014 (which is already encrypted with TDE), running DBCC SHRINKFILE on both the data and log files, then taking a backup and restoring it on my new SQL 2017 Server. (The reason for the shrink was to reduces the size to make transferring the backup faster.)
As a test, I restored a copy of the database that I did not run DBCC SHRINKFILE on, and it did not have the corruption problems when compressing backups. Everything worked as expected.
So, to summarize, the results of my testing are as follows:
- Normal backup/restore operations on this “shrunken” TDE database work correctly in SQL 2017
- Compressing backups of the “shrunken” TDE database seem to cause corruption in the sys.sysmultiobjrefs table. I later tried this on other databases, and while I always ended up with corruption, it wasn’t always a sys table. Sometimes the corruption would be in regular user tables instead
- Compressing backups of the regular TDE database (not having DBCC SHRINKFILE run) worked correctly and did not report corruption
As I said, I do not know if this is a confirmed bug in SQL Server 2017, but I have sent my findings to Microsoft for them to look over.
But, the moral of this story is: DO NOT SHRINK YOUR DATABASES! EVER! 🙂