>>Optimizing Database Queries

When it comes to understanding a how a query is resolved there is nothing better than using the ‘Show Execution Plan’ feature of SQL Query analyzer. I am certain you are aware that using OR operators in a SQL where clause can be potentially devastating to the performance of the query. In general when using the OR operator the SQL query engine tends to ignore indexes and revert back to using table scans.

There are basically two alternatives to improve this situation. In my particular case one worked slightly better than the other, but I would urge you to make sure that you test your results before assuming one solution is generally better than the other. Things are seldom what they seem in the land of optimization.

The first and most common solution is the use of a union to resolve the ‘OR’ expression, this technique has been around for a number of years and is very well publicized so I am going to just briefly gloss over it.

The following overly simple example should suffice to explain the concepts.

Given a table of 1,000,000 rows, with two numeric columns value1 and value2, each holding random values between 1 and 1000. The columns are individually indexed. We will evaluate and attempt to optimize the following contrived query.

select value1, value2 
  from numbers
 where ( value1 = 1 AND value2 < 100 ) 
    OR ( value2 = 500 AND value1 < 75 )

As it stands, this query takes a total of 10 seconds to return 155 rows on my very underpowered little SQL server box. Using the UNION technique to try to optimize these results gave the following query.

select value1, value2 from numbers where value1 = 1 and value2 < 100
UNION ALL
select value1, value2 from numbers where value2 = 500 and value1 < 75

In this query, the first select returns a result set containing the records matching the criteria of the left side of the OR operator in the original query. The second select statement returns a second result set containing records that match the expression to the right of the OR operator in the original query. Then a UNION ALL operation is performed between these two result sets, yielding a new result set that matches the results from the original query. Only this time the query optimizer uses the relevant indexes to perform the individual queries improving the performance by a staggering 70% bringing the queries execution time from 10 seconds down to 3 seconds.

There is another technique that can be applied to the original query that will improve on the performance. That is to convert the original query’s where clause, which is in disjunctive normal form (DNF) to conjunctive normal form (CNF). To explain disjunctive and conjunctive normal form is beyond the scope of this post, but in very basic terms, the OR operator would be the root of the expression tree for an expression in disjunctive normal form, while the AND operator is at the root of the expression tree in conjunctive normal form. To convert this to conjunctive normal form there is a law called DeMorgan’s law, which can be applied to convert the expression from disjunctive to conjunctive normal form. To apply DeMorgan’s law to the original query, take the where clause and break it up into its unique atoms (literals).

a -> value1 = 1
b -> value2 < 100
c -> value2 = 500
d -> value1 < 75

Using these atoms, rewrite the Boolean expression where addition (+) represents OR and multiplication represents AND operator as follows.

           ( ab ) + ( cd )

Applying DeMorgan’s law the following expansions can be made

ð ( ab + c )( ab + d )

ð ( a + c )( a + d )(b + c)(b + d)

The resulting query for the above expansion is

select value1, value 2
  from numbers
 
where ( value1 = 1 OR value2 = 500 )
   AND ( value1 = 1 OR value1 < 75 )
   AND ( value2 < 100 OR value2 = 500 )
   AND ( value2 < 100 OR value1 < 75 )

This version of the query yielded better performance than the original query since the query optimizer was able to use the indexes available to the query. It did not however match the performance of the UNION optimization, which used the indexes and entirely removed the dependency on the OR operator.

Still the results given even by the UNION solution did not feel satisfactory to me, so I decided to alter the existing indexes and see what the result would be if the index covered the query. For an index to cover a query, the index should include the columns being retrieved by the query, so I replaced the two individual indexes with a single composite index consisting of both value1 and value2 columns. This is where the real excitement came in, suddenly all versions of the query were returning there results in sub second times, even on my under powered overworked SQL server box. With the covering index the most efficient query in terms of query cost was the conjunctive normal form with the UNION optimization coming in a very close second and the original query performing the worst. But the execution times told a different story. In this case the original query was completing in 180ms, the conjunctive normal form optimization in 650ms and the UNION optimization 600ms.

Conclusion

Whenever I am consulted on the performance of a database query, the first thing I ask is what do the indexes look like. In variably the other party looks strangely at me and assures me that there are indexes. On inspection of the indexes, sure enough there are indexes that on every column that is ever considered in a query. But are they the correct indexes?

And that is the point of this exercise, to demonstrate that before considering complex optimization techniques go back to the basics. All to often I come across code that I can barely read because of the number of obscure optimizations applied, making the code difficult to understand and even more difficult to debug. These optimizations only temporarily conceal the underlying problem, which will eventually comeback to haunt your software.

 

 


This Blog Hosted On: http://www.DotNetJunkies.com/
文章来源:http://dotnetjunkies.com/WebLog/chris.taylor/archive/2004/02/15/7268.aspx
posted @ 2004-02-16 08:42  dudu  阅读(904)  评论(0)    收藏  举报