Complete Showplan Operators – Chapter 6: Spools – Lazy Spool

The Lazy Spool is actually very similar to the Eager Spool; the difference is that the Lazy Spool reads data only when individual rows are required. It creates a temporary table and builds this table in a "lazy" manner; that is, it reads and stores the rows in a temporary table only when the parent operator actually asks for a row, unlike the Eager Spool, which reads all rows at once. To refer back to some material I covered in the Eager Spool explanation, the Lazy Spool is a non-blocking operator, whereas the Eager Spool is a blocking operator.
To highlight the Lazy Spool, we'll create a table called Pedido (which means "Order" in Portuguese). The following script will create a table and populate it with some garbage data.

IF OBJECT_ID('Pedido') IS NOT NULL
    DROP TABLE Pedido
GO
CREATE TABLE Pedido
    (
      ID INT IDENTITY(1, 1)
             PRIMARY KEY ,
      Cliente INT NOT NULL ,
      Vendedor VARCHAR(30) NOT NULL ,
      Quantidade SMALLINT NOT NULL ,
      Valor NUMERIC(18, 2) NOT NULL ,
      Data DATETIME NOT NULL
    )
DECLARE @I SMALLINT
SET @I = 0
WHILE @I < 50
    BEGIN
        INSERT  INTO Pedido
                ( Cliente ,
                  Vendedor ,
                  Quantidade ,
                  Valor ,
                  Data
                )
                SELECT  ABS(CHECKSUM(NEWID()) / 100000000) ,
                        'Fabiano' ,
                        ABS(CHECKSUM(NEWID()) / 10000000) ,
                        ABS(CONVERT(NUMERIC(18, 2), ( CHECKSUM(NEWID())
                                                      / 1000000.5 ))) ,
                        GETDATE() - ( CHECKSUM(NEWID()) / 1000000 )
        INSERT  INTO Pedido
                ( Cliente ,
                  Vendedor ,
                  Quantidade ,
                  Valor ,
                  Data
                )
                SELECT  ABS(CHECKSUM(NEWID()) / 100000000) ,
                        'Amorim' ,
                        ABS(CHECKSUM(NEWID()) / 10000000) ,
                        ABS(CONVERT(NUMERIC(18, 2), ( CHECKSUM(NEWID())
                                                      / 1000000.5 ))) ,
                        GETDATE() - ( CHECKSUM(NEWID()) / 1000000 )
        INSERT  INTO Pedido
                ( Cliente ,
                  Vendedor ,
                  Quantidade ,
                  Valor ,
                  Data
                )
                SELECT  ABS(CHECKSUM(NEWID()) / 100000000) ,
                        'Coragem' ,
                        ABS(CHECKSUM(NEWID()) / 10000000) ,
                        ABS(CONVERT(NUMERIC(18, 2), ( CHECKSUM(NEWID())
                                                      / 1000000.5 ))) ,
                        GETDATE() - ( CHECKSUM(NEWID()) / 1000000 )
        SET @I = @I + 1
    END
SET @I = 1
WHILE @I < 3
    BEGIN
        INSERT  INTO Pedido
                ( Cliente ,
                  Vendedor ,
                  Quantidade ,
                  Valor ,
                  Data
                )
                SELECT  Cliente ,
                        Vendedor ,
                        Quantidade ,
                        Valor ,
                        Data
                FROM    Pedido
        SET @I = @I + 1
    END
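The arithmetic behind the script above: the first loop runs 50 times and inserts 3 rows per pass (150 rows), and the second loop then doubles the table twice (150 → 300 → 600). If you want a quick sanity check (my own addition, not from the original text):

```sql
-- Should return 600 if the population script ran exactly once.
SELECT  COUNT(*) AS TotalPedidos
FROM    Pedido
```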

This is what the data looks like:

To understand the Lazy Spool, I wrote a query that returns all orders whose value is lower than the average value of all the relevant customer's orders. That sounds a little convoluted, so let's just look at the query.

SELECT Ped1.Cliente, Ped1.Valor
FROM Pedido Ped1
WHERE Ped1.Valor < (
SELECT AVG(Ped2.Valor)
FROM Pedido Ped2
WHERE Ped2.Cliente = Ped1.Cliente)

Before we see the execution plan, let's make sure we understand the query a little better.
First, for each customer in the FROM table (Ped1.Cliente), the subquery returns the average value of all of that customer's orders (AVG(Ped2.Valor)). After that, the average is compared in the principal query and used to filter just each customer's orders with values lower than their average.
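As an aside, the same result can be expressed with a window aggregate (available since SQL Server 2005). This is a hypothetical rewrite of my own, not from the original text, and the optimizer may well produce a different plan for it (often one without a Spool):

```sql
-- Hypothetical equivalent of the query above, using AVG() OVER.
-- The optimizer is free to choose a different plan for this form.
SELECT  Ped.Cliente ,
        Ped.Valor
FROM    ( SELECT    Cliente ,
                    Valor ,
                    AVG(Valor) OVER ( PARTITION BY Cliente ) AS MediaCliente
          FROM      Pedido
        ) AS Ped
WHERE   Ped.Valor < Ped.MediaCliente
```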

So, now we have the following plans:

 

Graphical execution plan. 

|--Nested Loops(Inner Join)
     |--Table Spool
     |    |--Segment
     |         |--Sort(ORDER BY:([Ped1].[Cliente] ASC))
     |              |--Clustered Index Scan(OBJECT:([Pedido].[PK__Pedido__6E01572D] AS [Ped1]))
     |--Nested Loops(Inner Join, WHERE:([Pedido].[Valor] as [Ped1].[Valor]<[Expr1004]))
          |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1012]=(0) THEN NULL ELSE [Expr1013]/CONVERT_IMPLICIT(numeric(19,0),[Expr1012],0) END))
          |    |--Stream Aggregate(DEFINE:([Expr1012]=Count(*), [Expr1013]=SUM([Pedido].[Valor] as [Ped1].[Valor])))
          |         |--Table Spool
          |--Table Spool

Text execution plan.

As we can see, the Spool operator is displayed three times in the execution plan, but that doesn't mean that three temporary tables were created. All the Spools are actually using the same temporary table, which can be verified if you look at the operators' properties displayed in the graphical execution plan below.

As you can see, the first Spool has a Node ID equal to 2, and the other two Spool operators reference Primary Node ID 2 as well. Now let's look at a step-by-step description of the execution plan so that we can understand exactly what it is doing. Note that this is not the exact order of execution of the operators, but I think you'll understand things better this way.
The first step in the execution plan is to read all of the data that will be used in the query, and then group that data by customer.
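One way to check the shared node IDs yourself (my own suggestion, not shown in the original text) is to capture the per-operator plan with SET STATISTICS PROFILE, whose output includes a NodeId column for every operator:

```sql
-- Enable per-operator plan output; the NodeId column identifies
-- each operator, and the three Table Spools share one primary node.
SET STATISTICS PROFILE ON;

SELECT Ped1.Cliente, Ped1.Valor
FROM Pedido Ped1
WHERE Ped1.Valor < (
    SELECT AVG(Ped2.Valor)
    FROM Pedido Ped2
    WHERE Ped2.Cliente = Ped1.Cliente);

SET STATISTICS PROFILE OFF;
```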

The Clustered Index Scan operator reads all rows from the Cliente and Valor (Client and Value) columns. So, the Input of this operator is the following:

… and the Output is just the Client and Value columns:

When the SORT operator receives the rows from the clustered index scan, its output is all the data, ordered by the Client column:

 

The Segment operator divides the data into many groups; in this case, it receives all the rows, ordered by customer, and divides them into groups that share the same customer.
So, the first segment produced by this operator will be all the rows where "Client = 0". Given that the data is already sorted by customer, the operator just needs to read down the rows until it finds a different value in order to create a segment.
When the value it is reading changes, it finishes its job and the next operator immediately receives the segment of all the data for "Customer 0". This process repeats until all segments are completely read. The final output of the Segment operator is a series of segments dividing all the data according to customer, such that each segment contains all the rows for a particular customer. In this walkthrough, we'll look at all the rows where "Client = 0".
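To visualize the groups the Segment operator walks through, you can run something like the following (my own illustrative query, not part of the original text); each result row corresponds to one segment:

```sql
-- One output row per segment: the customer, and how many
-- order rows fall into that customer's segment.
SELECT    Cliente ,
          COUNT(*) AS RowsInSegment
FROM      Pedido
GROUP BY  Cliente
ORDER BY  Cliente
```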

 

Here we get on to the Table Spool operator, working as a "Lazy" Spool. It will create a temporary table in the TempDb database, and store all data returned from the Segment operator; in this case, all the data for customer 0. The output of the Spool operator is just all data stored in the TempDb table.

The Nested Loops operator joins the first and second parts of the execution plan, or rather, the principal query with the subquery. As we know, nested loops scan a table and join it with another table one row at a time, and so for each row in the Table Spool (Item 4) the nested loop will join the result of Item 11. To give you a quick preview, this result will be the rows where the Value column in the Spool table is lower than the value calculated in the aggregation (Item 8 – the average value of the customer's orders). When this step is finished, the Spool operator (Item 4) is called again, and it in turn calls the Segment operator, which reads another segment of rows (i.e. processes another customer). This cycle repeats until all rows are read.
Now, let's go to the second part of this plan, which runs the subquery returning the average order value for one customer.


To start with, the execution plan reads the data from the Lazy Spool and passes the results to the aggregate to calculate the average. Remember that the rows in the Spool operator are currently only the rows for "Customer 0."

The Stream Aggregate operator will calculate the average of the value column, returning one row as an Output value.

The Compute Scalar operator, covered in an earlier chapter, will convert the result of the aggregation into a Numeric Datatype, and pass the Output row to the Nested Loops operator in Step 9.

The last Table Spool is used to once again read the "Client=0" rows from the Spool table, which will be joined with the result of the compute scalar. 

The Nested Loops operator performs an iterative inner join; in this case, for each row returned by the Compute Scalar, it scans the Spool table and returns all rows that satisfy the condition of the join. Specifically, it returns the rows where the Value column in the Spool table is lower than the value calculated in the aggregation.

.........................................................................................................................................................................................

Tip
If you create an index on the Pedido table keyed on the Client column, with the Value column included, you will optimize the query: the SORT operator, which accounts for 62% of the whole query's cost, will no longer be necessary.

.........................................................................................................................................................................................
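Following the tip above, a minimal sketch of such an index might look like this (the index name is my own choice; Cliente and Valor are the actual column names behind "Client" and "Value"):

```sql
-- Keyed on Cliente so rows arrive pre-sorted by customer,
-- with Valor included so the index covers the query.
CREATE NONCLUSTERED INDEX IX_Pedido_Cliente_incl_Valor
ON Pedido ( Cliente )
INCLUDE ( Valor )
```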

We saw that the Query Optimizer can use the Lazy Spool operator to optimize some queries by avoiding having to read the same values multiple times.

Because SQL Server uses the Lazy Spool, it works with just one chunk of the data at a time in all operations, as opposed to having to constantly fetch new data with each iteration. Clearly, that translates into a great performance gain.

posted @ 2017-07-14 15:57  FH1004322