I have a confession. When I was a developer, I didn’t understand indexes in SQL Server. I rarely added them to my tables, and when I did there was not really any logic behind my decision, other than it was usually just because I thought a column might be queried a lot. But after a few years working as a DBA I have a much better understanding of indexes and realize just how big of a mess I was making before! In this installment of my Index Basics series, I’m going to cover some of the costs associated with the indexes you add to your tables, which is something I think a lot of developers don’t really have a good knowledge of. (I know I didn’t!)
When you create an index on a table, SQL Server actually creates a copy of the table’s data needed to populate the index. This is not a temporary copy either, the indexes are written to disk and the data in them is kept up to date with the data in the table.
At my day job I work with a heavily used OLTP database. Because the data in a lot of the tables is regularly modified via INSERT, UPDATE, and DELETE statements, this means the data in the indexes on those tables is also modified. For example, if one of the tables has 5 nonclustered indexes on it, this means that there are 6 copies of that data (the 1 table + the 5 indexes). Every time a new record is INSERTED into that table, it is also inserted into those 5 indexes, turning 1 INSERT statement into 6 actual INSERTS. The same thing applies to DELETES, when you remove a record from the table it also has to be removed from all of the indexes on that table. UPDATES work a little differently, and will only UPDATE the indexes that contain the data that was modified.
When you create an index, SQL Server only creates a copy of the data in the columns specified in the index, not a copy of the entire table. Take the following table for example:
CREATE TABLE [dbo].[Users] (
[UserID] INT IDENTITY PRIMARY KEY NOT NULL,
[FirstName] VARCHAR(50) NOT NULL,
[LastName] VARCHAR(50) NOT NULL,
[Email] VARCHAR(50) NOT NULL,
[Phone] VARCHAR(50) NOT NULL,
[Address] VARCHAR(50) NOT NULL,
[City] VARCHAR(50) NOT NULL,
[State] CHAR(2) NOT NULL
If we were to add the following index on just a few columns of that table, SQL Server would only copy the data from the columns specified in the index: FirstName, LastName, and Email*
CREATE NONCLUSTERED INDEX [IX_Users] ON [dbo].[Users] ([FirstName],[LastName],[Email]);
(*Note: The Clustered Index key is also stored in your nonclustered index, but that is a topic for another time.)
With the above index, the following scenarios could occur:
- If we ran an UPDATE statement to modify the Phone or Address column in our dbo.Users table, the IX_Users index would not need to be modified as part of that UPDATE (because those columns are not in the index).
- If an UPDATE came in that modified the Email column, the IX_Users index would be modified as part of the UPDATE.
- If an INSERT or DELETE were to occur on our dbo.Users table, that data for the IX_Users index would also have to be added/removed.
So as you can see, the more indexes you have on a table the more work SQL Server has to do to keep them up to date, and all of this additional overhead could impact the performance of your INSERTS, UPDATES, and DELETES. That’s why it is important to choose your indexes wisely, and not over-index your tables.
Another hidden cost associated with indexes is the ongoing maintenance of them. Over time, your indexes will become fragmented from all of the data modifications. Fragmented indexes perform slower, so it is important to have a regular maintenance process that removes the fragmentation and keeps your indexes nice and tidy. (If you don’t have a DBA handy to take care of this for you, I highly recommend Ola Hallogren’s maintenance script)
However, index maintenance can be a resource intensive operation, so you want to make sure you are doing it during a scheduled maintenance window. You also want to make sure that you are only doing maintenance on the indexes that you are using, which means getting rid of any unused indexes.
Unused indexes are unnecessary overhead, and need to be cleaned up whenever possible. Even though SQL Server may never use an index for your query, it still has to maintain the data in the index, it still has to perform maintenance on the index, and the index is sitting there taking up valuable hard drive space in your database and backups. By regularly checking your index usage statistics, you can determine which indexes are used heavily, and which ones you need to get rid of.
The Bottom Line…
Well designed indexes can improve query performance, but there are costs associated with them, such as hard drive space and ongoing maintenance. Unused indexes can become a drain on your performance, so it is important to regularly keep them cleaned up. There is more to indexes than just speeding up queries, and if you are not careful you can cause yourself some unnecessary grief.