Thursday, March 17, 2011

Why is SQL Server doing a clustered index scan?

This is a simple real life scenario I ran into recently. Client is experiencing performance issues with SQL Server. A quick investigation reveals that a certain table, let's call it MyTable which contains over 50 million rows seems to be the source of contention. In particular there is a frequently run, innocent looking query that is causing the trouble. The query looks something like this

SELECT c1, c2 FROM MyTable WHERE MyTable_PK BETWEEN 5 and 9

The execution plan shows that the SQL Server query optimizer has decided to do a clustered index scan on the primary key (terribly expensive on a large table like this). Why can't the query optimizer "see" that an index seek would be significantly more efficient than the scan? Upon closer examination I realized that an implicit conversion of the PK is happening. Now I can see the light – I check the table structure and realize that the Primary Key column has been defined as a varchar and that is being compared with integer values. That is what's "tricking" the query optimizer into picking the wrong plan. A little tweak:

SELECT * FROM MyTable WHERE MyTable_PK IN (' 5', '6', '7',' 8', '9')

Success - now instead of scanning 50 million rows the query optimizer has picked the right plan, that is, it is doing a quick index seek and getting the rows we need in a fraction of the time. All the contention disappears – everybody is happy!

Note: the real query was much more complex and the tweak is not necessarily as simple, but the point here is to highlight the fact that if the type of the index column does not match the type of the values with which it is being compared then the index is rather useless.