AMDP实现计算结果关联查询的写法效率对比
1、需求介绍
需求说明:查询acdoca,需要使用acdoca.rbukrs = znefit_cfl02.bukrs,判断当前公司在znefit_cfl02中是否维护。
如果没有维护,就获取acdoca.racct;
如果维护了,就获取acdoca.lokkt;
经过判断获取计算后的值作为acdoca的科目号zracct,并且这个科目号要和znefit_cfl01.racct匹配,确保科目在znefit_cfl01中。
测试数据:本次测试数据中acdoca一共有4亿7千万条数据,自身条件过滤后有1亿8千万条,与znefit_cfl01过滤后有2万6千条(最终结果)。
文章说明:因为要用判断后的racct结果关联znefit_cfl01表进行查询,因此出现以下几种不同的写法,
文章将一步步优化,展示优化的整个过程。
2、实现方法
方法1、分段获取
首先获取znefit_cfl02表中去重后的公司代码,得到公司代码的结果lt_bukrs
然后查询acdoca和lt_bukrs的结果集,并计算出科目racct,得到结果lt_acdoca
查询lt_acdoca和znefit_cfl01表,得到最终结果
最终耗时372044986微秒(6.2分钟)(后续都是相同的测试数据)
CLASS-METHODS get_cfl_data "版本1 AMDP OPTIONS CDS SESSION CLIENT current IMPORTING VALUE(iv_mandt) TYPE mandt VALUE(iv_bukrs) TYPE bukrs VALUE(iv_budat) TYPE budat EXPORTING VALUE(et_data) TYPE tt_cfl01. METHOD get_cfl_data BY DATABASE PROCEDURE FOR HDB LANGUAGE SQLSCRIPT OPTIONS READ-ONLY USING acdoca znefit_cfl01 znefit_cfl02 t001 ekko kna1. -- 获取公司代码列表 lt_bukrs = SELECT bukrs FROM znefit_cfl02 WHERE bukrs <> '' and mandt = iv_mandt GROUP BY bukrs; lt_acdoca = select acdoca.rldnr, --总账会计中的分类账 acdoca.rbukrs as bukrs, --公司代码 acdoca.gjahr, --会计年度 acdoca.belnr, --会计凭证的凭证编号 acdoca.docln, --分类账 6 字符过账项目 right(acdoca.docln,3) as zbuzei, --过账项目 case when b.bukrs = '' or b.bukrs is null then acdoca.racct --总账科目 else acdoca.lokkt --备选科目 end as zracct, acdoca.koart, --科目类型 acdoca.xopvw, --未清项管理 acdoca.kunnr, --客户 kna1.name1 as zkhmc, case when acdoca.lifnr = '' and acdoca.ebeln <> '' then ekko.lifnr else acdoca.lifnr end as lifnr, --供应商 acdoca.ebeln, --采购订单号 acdoca.ebelp, --采购订单行项目 acdoca.budat, --过账日期 acdoca.augdt, --清账日期 acdoca.rwcur, --交易货币 acdoca.wsl, --交易货币金额 acdoca.rhcur, --本位币 acdoca.hsl, --本位币金额 acdoca.blart, -- acdoca.poper, -- t001.butxt --公司名称 from acdoca left outer join :lt_bukrs as b on acdoca.rbukrs = b.bukrs left outer join t001 on acdoca.rbukrs = t001.bukrs and t001.mandt = iv_mandt left outer join ekko on acdoca.ebeln = ekko.ebeln and ekko.mandt = iv_mandt left outer join kna1 on acdoca.kunnr = kna1.kunnr and kna1.mandt = iv_mandt where acdoca.rldnr = '0L' and acdoca.rbukrs = iv_bukrs and acdoca.budat <= iv_budat and( acdoca.augdt = '00000000' or acdoca.augdt is null or acdoca.augdt > iv_budat ) and acdoca.blart <> '' and acdoca.hsl <> 0 and acdoca.RCLNT = iv_mandt; et_data = select a.*, znefit_cfl01.zfw, --范围 znefit_cfl01.zfwms, --范围 znefit_cfl01.zzfw, --子范围 znefit_cfl01.txt20, -- a.bukrs as zbukrs from :lt_acdoca as a inner join znefit_cfl01 on a.zracct = znefit_cfl01.racct and znefit_cfl01.mandt = iv_mandt and znefit_cfl01.zfw not in ( 'E', 'F' ); --如果不需要从关联表中获取字段,只判断是否存在,可以用 -- where EXISTS ( SELECT 1 FROM lt_clf01 AS f WHERE a.calc_racct = f.racct ); ENDMETHOD.
推测查询缓慢原因:
获取lt_acdoca时,acdoca表未经过滤全部关联lt_bukrs、t001、kna1、ekko表,造成浪费。
优化措施:
先查询acdoca并做排除
将lt_bukrs、t001、kna1、ekko放在排除后再关联
优化后的结果是315499123微秒(5.2分钟)
METHOD get_cfl_data4 BY DATABASE PROCEDURE FOR HDB LANGUAGE SQLSCRIPT OPTIONS READ-ONLY USING acdoca znefit_cfl01 znefit_cfl02 t001 ekko kna1. -- 创建临时表来存储中间结果 -- 获取公司代码列表 lt_bukrs = SELECT bukrs FROM znefit_cfl02 WHERE bukrs <> '' and mandt = iv_mandt GROUP BY bukrs; lt_acdoca_all = SELECT RCLNT, rldnr, rbukrs, gjahr, belnr, docln, racct, lokkt, koart, xopvw, kunnr, lifnr, ebeln, ebelp, budat, augdt, rwcur, wsl, rhcur, hsl, blart, poper FROM acdoca WHERE rldnr = '0L' AND rbukrs = iv_bukrs -- 参数绑定 AND budat <= iv_budat -- 参数绑定 AND (augdt = '00000000' OR augdt IS NULL OR augdt > iv_budat) -- 参数绑定 AND blart <> '' AND hsl <> 0 and acdoca.RCLNT = iv_mandt; lt_acdoca = SELECT a.*, CASE WHEN b.bukrs IS NOT NULL THEN a.lokkt ELSE a.racct END AS zracct from :lt_acdoca_all as a left outer join :lt_bukrs as b on a.rbukrs = b.bukrs; et_data = SELECT a.rldnr, --总账会计中的分类账 a.rbukrs as bukrs, --公司代码 a.gjahr, --会计年度 a.belnr, --会计凭证的凭证编号 a.docln, --分类账 6 字符过账项目 right(a.docln,3) as zbuzei, --过账项目 a.zracct as racct, a.koart, --科目类型 a.xopvw, --未清项管理 a.kunnr, --客户 kna1.name1 as zkhmc, case when a.lifnr = '' and a.ebeln <> '' then ekko.lifnr else a.lifnr end as lifnr, --供应商 a.ebeln, --采购订单号 a.ebelp, --采购订单行项目 a.budat, --过账日期 a.augdt, --清账日期 a.rwcur, --交易货币 a.wsl, --交易货币金额 a.rhcur, --本位币 a.hsl, --本位币金额 a.blart, -- a.poper, -- t001.butxt, --公司名称 znefit_cfl01.zfw, --范围 znefit_cfl01.zfwms, --范围 znefit_cfl01.zzfw, --子范围 znefit_cfl01.txt20, -- a.rbukrs as zbukrs from :lt_acdoca as a inner join znefit_cfl01 on a.zracct = znefit_cfl01.racct and znefit_cfl01.mandt = iv_mandt and znefit_cfl01.zfw not in ( 'E', 'F' ) left outer join t001 on a.rbukrs = t001.bukrs and t001.mandt = iv_mandt left outer join ekko on a.ebeln = ekko.ebeln and ekko.mandt = iv_mandt left outer join kna1 on a.kunnr = kna1.kunnr and kna1.mandt = iv_mandt; ENDMETHOD.
分析数据集,acdoca排除之后,会在lt_acdoca_all中获取1亿8千万条数据,通过与lt_bukrs关联,计算科目racct。
如果在和lt_bukrs关联时,加入DISTINCT
lt_acdoca =
SELECT DISTINCT
......
from :lt_acdoca_all as a
left outer join :lt_bukrs as b on a.rbukrs = b.bukrs;
整个查询过程,耗时减少到11879548微秒(11秒)
推测原因:虽然结果不会出现重复(lt_acdoca_all中的bukrs会有重复的,但是lt_bukrs的bukrs是唯一的),但使用DISTINCT查询耗时11秒,不使用DISTINCT查询耗时5分钟。
这是因为结果虽然不会重复,但是HANA优化器会根据此处的DISTINCT,进行预先的去重,减小连接过程中出现的巨大中间结果集,减少内存开销,大大提高查询效率。
进一步优化:如果对后续的查询也使用DISTINCT是不是效率更高?
①比如在最终查询时加上DISTINCT
et_data = SELECT DISTINCT
②或者先与znefit_cfl01关联,结果lt_base再与t001等表关联,在关联znefit_cfl01时加上DISTINCT,如下:
-- 第三步:最终关联配置表 lt_base = SELECT DISTINCT ...FROM :lt_acdoca AS a inner join znefit_cfl01 as f on a.racct = f.racct and f.mandt = iv_mandt and f.zfw not in ( 'E', 'F' ); et_data = select ... from :lt_base as a LEFT JOIN t001 AS t ON a.rbukrs = t.bukrs and t.mandt = iv_mandt LEFT JOIN ekko AS e ON a.ebeln = e.ebeln and e.mandt = iv_mandt LEFT JOIN kna1 AS k ON a.kunnr = k.kunnr and k.mandt = iv_mandt;
①②两种优化最终结果还是11秒左右,并没有变化,为什么多加一个DISTINCT没有效果?
推测原因:HANA优化器在执行时,识别第一次已经将数据去重,而第二次的去重操作,将不会改变数据结果,因此优化器识别冗余操作后,将消除冗余的 DISTINCT,将其合并为一个更优的去重操作,因此,第二次的DISTINCT并没有实现第一次的优化操作。
方法2、使用WITH AS (CTE)
步骤与上述一致,只是使用了WITH AS表达式
最终耗时需12941073微秒(约12秒)并没有什么效果
WITH AS表达式(即通用表表达式 CTE - Common Table Expressions),其主要作用在于提高代码的可读性、可维护性和结构化程度,并允许使用递归查询。对查询效率方面并无较大作用。
METHOD get_cfl_data2 BY DATABASE PROCEDURE FOR HDB LANGUAGE SQLSCRIPT OPTIONS READ-ONLY USING acdoca znefit_cfl01 znefit_cfl02 t001 ekko kna1. et_data = WITH lt_bukrs AS ( SELECT DISTINCT bukrs FROM znefit_cfl02 WHERE bukrs <> '' and mandt = iv_mandt GROUP BY bukrs ), lt_acdoca_all AS ( -- 第一步:快速过滤ACDOCA SELECT RCLNT, rldnr, rbukrs, gjahr, belnr, docln, racct, lokkt, koart, xopvw, kunnr, lifnr, ebeln, ebelp, budat, augdt, rwcur, wsl, rhcur, hsl, blart, poper FROM acdoca WHERE rldnr = '0L' AND rbukrs = iv_bukrs -- 参数绑定 AND budat <= iv_budat -- 参数绑定 AND (augdt = '00000000' OR augdt IS NULL OR augdt > iv_budat) -- 参数绑定 AND blart <> '' AND hsl <> 0 and acdoca.RCLNT = iv_mandt ), lt_acdoca AS ( -- 第二步:关联基础信息 SELECT DISTINCT a.*, CASE WHEN c.bukrs IS NOT NULL THEN a.lokkt ELSE a.racct END AS zracct FROM lt_acdoca_all AS a LEFT JOIN lt_bukrs AS c ON a.rbukrs = c.bukrs ) select ...... from lt_acdoca as a inner join znefit_cfl01 on a.zracct = znefit_cfl01.racct and znefit_cfl01.mandt = iv_mandt and znefit_cfl01.zfw not in ( 'E', 'F' ) left outer join t001 on a.rbukrs = t001.bukrs and t001.mandt = iv_mandt left outer join ekko on a.ebeln = ekko.ebeln and ekko.mandt = iv_mandt left outer join kna1 on a.kunnr = kna1.kunnr and kna1.mandt = iv_mandt;ENDMETHOD.
方法3、计算结果关联查询
以上两种方法是分步查询,而本方法是直接用计算结果进行关联 znefit_cfl01 on ( case when b.bukrs is null then acdoca.racct else acdoca.lokkt end ) = znefit_cfl01.racct
耗时13058358 微秒(13秒)相对还慢一点
推测原因:可能由于关联条件的计算,加剧了逻辑复杂度,没有起到优化作用
METHOD get_cfl_data5 BY DATABASE PROCEDURE FOR HDB LANGUAGE SQLSCRIPT OPTIONS READ-ONLY USING acdoca znefit_cfl01 znefit_cfl02 t001 ekko kna1. -- 获取公司代码列表 lt_bukrs = SELECT bukrs FROM znefit_cfl02 WHERE bukrs <> '' and mandt = iv_mandt GROUP BY bukrs; lt_acdoca_all = SELECT RCLNT, rldnr, rbukrs, gjahr, belnr, docln, racct, lokkt, koart, xopvw, kunnr, lifnr, ebeln, ebelp, budat, augdt, rwcur, wsl, rhcur, hsl, blart, poper FROM acdoca WHERE rldnr = '0L' AND rbukrs = iv_bukrs -- 参数绑定 AND budat <= iv_budat -- 参数绑定 AND (augdt = '00000000' OR augdt IS NULL OR augdt > iv_budat) -- 参数绑定 AND blart <> '' AND hsl <> 0 and acdoca.RCLNT = iv_mandt; lt_clf01 = select * from znefit_cfl01 where mandt = iv_mandt and znefit_cfl01.zfw not in ( 'E','F' ); et_data = select disTINCT ... from :lt_acdoca_all as a left outer join :lt_bukrs as b on a.rbukrs = b.bukrs inner join :lt_clf01 as znefit_cfl01 on ( case when b.bukrs is null then a.racct else a.lokkt end ) = znefit_cfl01.racct left outer join t001 on a.rbukrs = t001.bukrs and t001.mandt = iv_mandt left outer join ekko on a.ebeln = ekko.ebeln and ekko.mandt = iv_mandt left outer join kna1 on a.kunnr = kna1.kunnr and kna1.mandt = iv_mandt; ENDMETHOD.
方法4、UNION ALL
以上3种方法,都需要通过计算后的zracct连接znefit_cfl01进行过滤,造成需要设计成分步获取结果进行关联,或在关联条件中增加判断的方式。
而当acdoca.rbukrs不在znefit_cfl02中时,其实获取的就是acdoca.racct,即acdoca.racct = 计算后的zracct,反之则acdoca.lokkt = 计算后的zracct。
测试,如果直接用acdoca.racct = znefit_cfl01.racct
et_data =
SELECT
...
from :lt_acdoca as a
inner join znefit_cfl01 on a.racct = znefit_cfl01.racct
执行查询只需要646501微秒(0.6秒)
对于lt_acdoca来说,racct和zracct都是表中的字段,而且内容完全一致,为什么不同的字段连接,会导致20倍的效率差异?
推测原因:
acdoca.racct是一个原始、稳定的列。HANA优化器执行高效的哈希连接或索引扫描,从而实现快速连接。
acdoca.zracct是一个表达式列,缺少相应信息,HANA优化器不得不选择效率低的连接策略,导致性能极具下降。
所以据此可以改变一下思路,将连接分为两种情况,并用union all连接两部分内容:
第一部分:获取acdoca.lokkt = znefit_cfl01.racct
union all
第二部分:获取acdoca.racct = znefit_cfl01.racct
耗时514004微秒(0.5秒)
METHOD get_cfl_data3 BY DATABASE PROCEDURE FOR HDB LANGUAGE SQLSCRIPT OPTIONS READ-ONLY USING acdoca znefit_cfl01 znefit_cfl02 t001 ekko kna1. lt_bukrs = SELECT bukrs FROM znefit_cfl02 WHERE bukrs <> '' and mandt = iv_mandt GROUP BY bukrs; et_data = select ...... acdoca.racct, --总账科目 ...... b.bukrs as zbukrs-- from acdoca inner join znefit_cfl01 on acdoca.racct = znefit_cfl01.racct and znefit_cfl01.zfw not in ( 'E', 'F' ) and znefit_cfl01.mandt = iv_mandt left outer join :lt_bukrs as b on acdoca.rbukrs = b.bukrs left outer join t001 on acdoca.rbukrs = t001.bukrs and t001.mandt = iv_mandt left outer join ekko on acdoca.ebeln = ekko.ebeln and ekko.mandt = iv_mandt left outer join kna1 on acdoca.kunnr = kna1.kunnr and kna1.mandt = iv_mandt where acdoca.rldnr = '0L' and acdoca.rbukrs = iv_bukrs and acdoca.budat <= iv_budat and( acdoca.augdt = '00000000' or acdoca.augdt is null or acdoca.augdt > iv_budat ) and acdoca.blart <> '' and acdoca.hsl <> 0 and acdoca.RCLNT = iv_mandt and b.bukrs is null union all select ...... acdoca.lokkt as racct, ...... b.bukrs as zbukrs -- from acdoca inner join znefit_cfl01 on acdoca.lokkt = znefit_cfl01.racct and znefit_cfl01.zfw not in ( 'E', 'F' ) and znefit_cfl01.mandt = iv_mandt inner join :lt_bukrs as b on acdoca.rbukrs = b.bukrs left outer join t001 on acdoca.rbukrs = t001.bukrs and t001.mandt = iv_mandt left outer join ekko on acdoca.ebeln = ekko.ebeln and ekko.mandt = iv_mandt left outer join kna1 on acdoca.kunnr = kna1.kunnr and kna1.mandt = iv_mandt where acdoca.rldnr = '0L' and acdoca.rbukrs = iv_bukrs and acdoca.budat <= iv_budat and( acdoca.augdt = '00000000' or acdoca.augdt is null or acdoca.augdt > iv_budat ) and acdoca.blart <> '' and acdoca.hsl <> 0 and acdoca.RCLNT = iv_mandt; ENDMETHOD.
综上,当面临需要使用计算之后的结果进行关联查询时,最快的方法是通过分情况查询再合并的方式,充分利用HANA优化器的优化作用,高效率进行数据关联查询。
本人能力有限,读者对上述AMDP实现过程有更好方式,望不吝赐教。
以上就是关于计算结果关联查询的介绍,希望对你有所帮助
定期更文,欢迎关注


浙公网安备 33010602011771号