TPC-H 分析

TCPH的表结构

最小的且不变的,

- Region,Nation(FK,REGION记录属于哪个区域)

1万级别,

- Supplier(供应商),Skey,Name,地址,NKey(FK,记录所属国家key),电话,acctbal(Account Balance,账户余额)

10万级别,

Customer(15万),Skey,Name,地址,NKey(FK,记录所属国家key),电话,acctbal,MKTSEGMENT(marktsegment,细分市场)

Part(零件)(20万),Pkey,Name,MFGR(manufacturer,制造商),品牌,类型,大小,container(包装,容器),零售价

PartSupp(零件供应商关系表)(80万),Pkey,Skey,AVAILQTY(供应量),SupplyCost(供应价格,成本)

100万级别,

Orders(订单)(150万),Okey,Ckey(FK,记录订单来自客户的key),订单状态,订单总价格,订单优先级,收银员,发货优先级

LineItem(订单流水明细,订单,零件,供应商的关系表)(600万)

Okey,PKey,Skey,流水号,数量,extendedprice(扩展成本,单位成本*数量),折扣,税,退回标记,流水状态,

shipdate(发货日期),commitdate(预计到货日期),receiptdate(实际收货日期),shipinstruct(运输策略),shipmode(运输途径)

 

Pricing Summary Report Query (Q1) 

对于lineitem的groupby的能力

groupby, CP 1.3,small group by keys

表达式,

- CP 4.1a,d

- CP 4.2a,公共子表达式

 

Minimum Cost Supplier Query (Q2) 

找出特定区域中,为某些part提供最低供应价的suppliers

Join,CP 2.2(sparse foreign key),这里对于part的筛选会导致sparse

字符串表达式,CP 4.3a: Rewrite LIKE(X%) into Range Query

子查询,CP5.2: Moving Predicates into a Subquery

 

Shipping Priority Query (Q3) 

大于某日期未发货的订单,订单中未发货的的流水的总revenue,并按revenue,订单日期排序,并筛选customer的细分市场

Join,3表join,((customer,order),lineitem)

Groupby,

- CP1.2,考虑sortAgg,HashJoin,以order为probe side

- CP1.4

Locality,CP3.2,shipdate推导到orderdate

 

Order Priority Checking Query (Q4) 

每个订单优先级,客户收货有延迟的订单数量

子查询,

Groupby,CP1.2,CP1.4

 

 

Local Supplier Volume Query (Q5) 

Local Supplier,即supplier和customer属于同一个nation

一年内,某个region中,每个nation的满足local supplier的订单流水的revenue的和,并按revenue排序

Join,

- 6表连接,CP 2.3

CP2.4(late projection)

Forecasting Revenue Change Query (Q6) 

单纯的lineitem的过滤

常量折叠

算术表达式,CP4.1c,压缩执行,CP 4.2d: Evaluation Order in Conjunctions and Disjunctions.

Volume Shipping Query (Q7) 

两年间,两国之间的交易量

CP4.2b,join相关表达式下推

 

National Market Share Query (Q8) 

6表join,筛选出满足条件的流水的年份,折扣后金额,供应商名

非关联子查询,

表达式,

select
    o_year,
    sum(case
        when nation = '[NATION]'
        then volume
        else 0
    end) / sum(volume) as mkt_share
from (
    select
        extract(year from o_orderdate) as o_year,
        l_extendedprice * (1-l_discount) as volume,
        n2.n_name as nation
    from
        part,
        supplier,
        lineitem,
        orders,
        customer,
        nation n1,
        nation n2,
        region
    where
        p_partkey = l_partkey
        and s_suppkey = l_suppkey
        and l_orderkey = o_orderkey
        and o_custkey = c_custkey
        and c_nationkey = n1.n_nationkey
        and n1.n_regionkey = r_regionkey
        and r_name = '[REGION]'
        and s_nationkey = n2.n_nationkey
        and o_orderdate between date '1995-01-01' and date '1996-12-31'
        and p_type = '[TYPE]'
    ) as all_nations
group by
    o_year
order by
    o_year;

 

Product Type Profit Measure Query (Q9) 

profit,收益,流水的extended的价格 - 供应总价格

对满足条件的part的订单流水的收益,按nation和年份进行groupby,得到某nation,某年的收益

Join,CP2.1(large join),6表join,CP2.3

字符串表达式,CP 4.3a: Rewrite LIKE(X%) into Range Query

 

Returned Item Reporting Query (Q10) 

3个月内,有退货的客户,总退货款的倒序

GroupBy,CP 1.4(Dependent Group-By Keys),虽然groupby那么多字段,其实都是depend on custKey,所以可以消除

Join,CP2.4(late projection)

Important Stock Identification Query (Q11) 

零件的存货价值,超过nation所有零件的存货价值的一定比例,大宗货品;

CP5.3: Overlap between Outer- and Subquery. 去关联重写

select
    ps_partkey,
    sum(ps_supplycost * ps_availqty) as value
from
    partsupp,
    supplier,
    nation
where
    ps_suppkey = s_suppkey
    and s_nationkey = n_nationkey
    and n_name = '[NATION]'
group by
    ps_partkey having
        sum(ps_supplycost * ps_availqty) > (
            select
                sum(ps_supplycost * ps_availqty) * [FRACTION]
            from
                partsupp,
                supplier,
                nation
            where
                ps_suppkey = s_suppkey
                and s_nationkey = n_nationkey
                and n_name = '[NATION]'
        )
order by
    value desc;

 

Shipping Modes and Order Priority Query (Q12) 

一年内收货的,按时shipping的,按shipmode groupby,订单优先级的sum

Locality,CP3.2,receiptdate推导到orderdate

CP 4.2c: Large IN Clauses.

 

Customer Distribution Query (Q13) 

客户订单数,customer和order join,并count groupby orderkey获取,并对订单有字符串匹配的筛选

根据客户订单数进行groupby和排序,统计不同订单数的客户数,获取客户在订单数上的分布

Left outer join = left join

非关联子查询,

字符串表达式,CP 4.3a: Rewrite LIKE(X%) into Range Query,匹配comment字段的性能会成为瓶颈

Join,CP2.1(Group Join),在customer和orders基于custKey join的同时,完成对于custKey的groupby count

Promotion Effect Query (Q14) 

全局aggregation

CP 4.3a: Rewrite LIKE(X%) into Range Query

select 
    100.00 * sum(case 
               when p_type like 'PROMO%' 
               then l_extendedprice*(1-l_discount) 
               else 0 
    end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue 
from 
    lineitem, 
    part 
where 
    l_partkey = p_partkey
    and l_shipdate >= date '[DATE]'
    and l_shipdate < date '[DATE]' + interval '1' month;

Top Supplier Query (Q15) 

创建view revenue,一个季度,所有供应商,各自的总供应金额;view既是对于非关联子查询的优化

然后join,找出所有供应金额等于最大的供应金额的供应商

CP5.3: Overlap between Outer- and Subquery. 找出最大供应金额的子查询,做一遍即可

create view revenue[STREAM_ID] (supplier_no, total_revenue) as
    select
        l_suppkey,
        sum(l_extendedprice * (1 - l_discount))
    from
        lineitem
    where
        l_shipdate >= date '[DATE]'
        and l_shipdate < date '[DATE]' + interval '3' month
    group by
        l_suppkey;
select
    s_suppkey,
    s_name,
    s_address,
    s_phone,
    total_revenue
from
    supplier,
    revenue[STREAM_ID]
where
    s_suppkey = supplier_no
    and total_revenue = (
        select
            max(total_revenue)
        from
            revenue[STREAM_ID]
    )
order by
    s_suppkey;
drop view revenue[STREAM_ID];

 

Parts/Supplier Relationship Query (Q16) 

表达式,

CP 4.2c: Large IN Clauses.

CP 4.3a: Rewrite LIKE(X%) into Range Query

Small-Quantity-Order Revenue Query (Q17) 

小订单流水,该流水的量低于该零件订单平均量的20%

某种零件,小订单年平均销售总额

关联子查询,

- CP5.2: Moving Predicates into a Subquery

- CP5.3: Overlap between Outer- and Subquery.

select
    sum(l_extendedprice) / 7.0 as avg_yearly
from
    lineitem,
    part
where
    p_partkey = l_partkey
    and p_brand = '[BRAND]'
    and p_container = '[CONTAINER]'
    and l_quantity < (
        select
            0.2 * avg(l_quantity)
        from
            lineitem
        where
            l_partkey = p_partkey
    );

 

Large Volume Customer Query (Q18) 

大订单,即customer的某个order的总quantity大于某个阈值,

列出所有客户的大订单,并且按订单总价格排序

子查询,

GroupBy,CP 1.2,CP1.4

Join,CP2.1(large join)

Discounted Revenue Query (Q19) 

过滤条件比较复杂,

CP4.2b,join相关表达式下推

CP4.2c: Large IN Clauses.

CP4.2d: Evaluation Order in Conjunctions and Disjunctions.

 

Potential Part Promotion Query (Q20) 

对特定nation的供应商的筛选,

供应商有满足某种特性的part,这里是name中包含某种color

且当前part的带供货的数量,超出该供应商一年中对于该零件的供应量的一半

字符串表达式,CP 4.3a: Rewrite LIKE(X%) into Range Query

关联子查询,

- CP5.2: Moving Predicates into a Subquery

- CP5.3: Overlap between Outer- and Subquery. 去关联重写

 

Suppliers Who Kept Orders Waiting Query (Q21) 

supplier超期,lineitem.receiptdate > lineitem.commitdate

supplier block order, 订单中该supplier的流水超期,并且该订单存在其他的supplier,并且其他的supplier没有超期

找出所有超期的supplier,并统计超期的订单数

子查询,CP5.1: Flattening Subqueries,exists转成equiv-join,not exists转成anti-join

 

Global Sales Opportunity Query (Q22) 

CP 4.2c: Large IN Clauses.

 

posted on 2021-09-28 17:29  fxjwind  阅读(485)  评论(0编辑  收藏  举报