【SQL Server Inside the Optimizer】: Constructing a Plan - Part 1
For today’s entry, I thought we might take a look at how the optimiser builds an executable plan using rules. To illustrate the process performed by the optimiser, we’ll configure it to produce incrementally(adv. 递增地;增值地) better plans by progressively(adv. 渐进地;日益增多地) applying the necessary rules.
Here’s a simple query (using the AdventureWorks sample database) that shows the total number of items in the warehouse for each of a small number of products:
SELECT P.ProductNumber, P.ProductID, total_qty = SUM(I.Quantity) FROM Production.Product P INNER JOIN Production.ProductInventory AS I ON I.ProductID = P.ProductID WHERE P.ProductNumber LIKE N'T%' GROUP BY P.ProductID, P.ProductNumber;
Heaps of work is done by SQL Server to parse(vt. 解析;从语法上分析) and bind the query before it hits the query optimiser, but when it does, it arrives as a tree of logical relational operators, like this:

The optimiser needs to translate this into plan that can be executed by the Query Processor. If the query optimiser did nothing more than translate the logical relational operators to the first valid form it found, we would get a plan like this:

This executable plan features two full table scans, a Cartesian(笛卡儿积) product, a Filter on the two predicates, and an Aggregate. This is a long way from being an optimal(adj. 最佳的;最理想的) query plan, but it does produce correct results. (In case you are wondering, the Compute Scalar is there to ensure that the SUM aggregate returns NULL instead of zero when no rows are processed).
Matching and Applying Rules
The optimiser found this plan by replacing logical operations with one of the physical alternatives(n. 替代选择(alternative的复数);可供选择的事物) it knows about. This type of transformation is performed by an implementation rule within the optimiser. For example, the logical operation “Inner Join” was physically implemented by Nested Loops (other implementation rules exist for Merge and Hash join).
To get from the logical tree to the executable plan shown, a total of five implementation rules were successfully matched and run by the optimiser:
- GET to Scan
- JOIN to Nested Loops
- SELECT to Filter
- Group By Aggregate to Stream Aggregate
- Group By Aggregate to Hash Aggregate
The first rule replaces the logical GETs with table scans. The second rule implements the logical JOIN using Nested Loops as mentioned before. The third converts all the predicates (including the join predicate) into a Filter operator. The fourth and fifth rules represent two alternative strategies to physically perform the aggregation. In this case, the optimiser chose a Stream Aggregate over a Hash Aggregate, based on costing.
Ok, but no-one would buy SQL Server if it produced that sort of plan on a regular basis. Luckily, there are many other rules that the optimiser can use; in fact there are nearly four hundred(n. 一百;许多) in total. Rest(vt. 使休息,使轻松;) assured(adj. 确定的;自信的) that the query optimiser fought hard to resist(vi. 抵抗,抗拒;忍耐) producing such a dismal(adj. 凄凉的,忧郁的;阴沉的,沉闷的) plan: more than one dozen(n. 十二个,一打) separate rules had to be turned off to do so.
As well as more implementation rules, there are a number of exploration(n. 探测;探究;踏勘) and substitution(n. 代替;[数] 置换;代替物) rules. These transform([数] [电] 变换) some part of the logical request into an equivalent(adj. 等价的,相等的;同意义的) form, based on mathematical(adj. 数学的,数学上的;) equivalences or heuristics(n. 启发法;). A simple example of an exploration rule is 'Join Commute'. This rule exploits(v. 利用;) the fact that A JOIN B is the same as B JOIN A (for inner joins).
The optimiser also has simplification(n. 简单化;单纯化) rules and enforcer(n. 实施者;强制执行者) rules. Application of all these rules generates alternative strategies, the best of which will be incorporated(编入) in the final plan. As an aside, a single simplification rule was responsible for the transformation shown in my Segment Top post. That amazing transformation goes by the understated name "Generate Group By Apply".
Improving the Plan
An obvious deficiency(n. 缺陷,缺点;缺乏;) in the executable plan above is that it performs a Cartesian product of the two source tables followed by a Filter. It would be much more efficient(adj. 有效率的;有能力的;生效的) to evaluate(vt. 评价;估价;) the join predicate at the same time as performing the physical join. The rule that performs this task is called SELonJN (SELECT on JOIN). I should stress(vt. 强调;) that this is not the T-SQL SELECT statement, it is the relational SELECT operation: a filter applied to rows. By enabling the SELonJN optimisation rule, we get a better plan:

The Cartesian product has been replaced by a more normal-looking Nested Loops operator, which is a result of the join predicate being moved from the Filter into the join. In fact, the Filter operator has disappeared completely - where has the predicate “ProductNumber LIKE 'T%'” gone? It has also been pushed down the plan - all the way to the Product table scan.
That’s still not a great plan: we are scanning the whole Inventory table once for every row from the Product table that matches on the ProductNumber predicate. We need rules that know about non-clustered indexes - the basic rule we have relied(vi. (rely的过去式、过去分词形式)信任; 信赖; 依赖, 依靠) on so far simply translates a GET to a table scan. Enabling a couple more rules produces:

That's a little better, we are now using the correct non-clustered indexes but the predicates are being applied to a scan - not the index seek we might have hoped for. There are quite a few rules left to apply before we get a really efficient plan. I'll cover those next time.
浙公网安备 33010602011771号