Monday, January 12, 2009

Why is SQL Server scanning the clustered index when I have an index on the column I am filtering on?

I ran into a very simple SQL Server performance related question the other day. A puzzled DBA was staring at a simple query “SELECT * FROM table1 WHERE col_n = x”. Since this was a fairly common query that was executed very frequently he had appropriately created an index on col_n. Yet, for some reason SQL Server was simply ignoring the index on col_n and instead scanning the clustered index. So the puzzling question on this DBAs mind was “why is SQL Server scanning the table? Isn’t the SQL Server Query Optimizer smart enough to see that it will be more efficient to use the col_n index?

The answer lies in the fact that the SQL Server Query Optimizer is smarter than that (albeit not as smart as I wish it would be). You see, when deciding on a particular execution plan the Query Optimizer has to use an actual value for the parameter x to estimate the number of rows that will be returned on each step. It further assumes that the rows you are looking for are randomly distributed and that it will need to do a page read for every row being returned in addition of the reads it needs to do on the index pages. Depending on this number it makes a determination whether it will be more efficient to just scan the whole table and pull out the rows it needs or go to the col_n index first to get a list of addresses and then go and pull those rows.

So, now that we know why should we just leave it there since it appears that SQL Server is correctly picking the most efficient path? Not so fast! Remember all that evaluation is being done based on a certain value of parameter x. It could be that for most of the values of parameter x scanning the table is more efficient. In that case you can simply drop the index on col_n if it is not needed for any other queries and leave it at that. However, it could be that for 99% of the possible values of x it would be a lot more efficient to utilize the col_n index – it just so happened that unfortunately when generating the plan the value of x happened to fall on that 1% for which scanning is more efficient. In this case you have to options:
  • simply force SQL Server to dispose the bad plan that it has cashed and generate a new one; OR
  • use a HINT to force SQL Server to use the index. The danger with this however is that with time as the data changes the use of that index may not be optimal so I would recommend that you avoid using HINTS whenever possible and let SQL Server do its job.

One last note: it is often not necessary to return all the columns – in other words instead of SELECT * FROM table1 WHERE col_n = x you may only need to return a couple of columns like SELECT col_1, col_2 FROM table1 WHERE col_n = x in which case it could be worth it to include col_1 and col_2 in the col_n index. That way SQL Server would not need to go to the table at all but instead get all it needs from the index pages. In certain scenarios where you have a rarely updated but very frequently queried table the above approach of including other columns in the col_n index may make sense even if the query returns all the columns.

No comments: