一:场景描述

表A,B都是如下结构:

标识 名称 删除状态
id na'me delete

现要查出A表中name字段在B表中的内容,然后找出其中name有重复的内容,并逻辑删除名称相同的行中id最大之外的所有行

二:查出重复内容

SELECT * 
FROM A
WHERE A.NAME IN(
  SELECT A.NAME
  FROM A,B
  WHERE A.NAME=B.NAME
  AND A.DELETE=0
  GROUP BY A.NAME
  HAVING COUNT(A.NAME)>1
);

三:查出重复内容中ID最大的行

SELECT * 
FROM A
WHERE A.NAME IN(
  SELECT A.NAME
  FROM A,B
  WHERE A.NAME=B.NAME
  AND A.DELETE=0
  GROUP BY A.NAME
  HAVING COUNT(A.NAME)>1)
AND A.ID IN(
  SELECT MAX(A.ID)
  FROM A,B
  WHERE A.NAME=B.NAME
  AND A.DELETE=0
  GROUP BY A.NAME
  HAVING COUNT(A.NAME)>1
);

四:以update方法进行更新

UPDATE A
SET A.DELETE=1
WHERE A.NAME IN(
  SELECT A.NAME
  FROM A,B
  WHERE A.NAME=B.NAME
  AND A.DELETE=0
  GROUP BY A.NAME
  HAVING COUNT(A.NAME)>1)
AND A.ID IN(
  SELECT MAX(A.ID)
  FROM A,B
  WHERE A.NAME=B.NAME
  AND A.DELETE=0
  GROUP BY A.NAME
  HAVING COUNT(A.NAME)>1
);

该方法效率极低,从30W条数据中更新3万条耗时80分钟尚未处理完成

五:当有建表权限时,通过新建中转表实现数据更新,器具体步骤如下:

       1.用上述查询语句查询出需要更新的内容

       2.新建表C,复制表A结构,并接受步骤一中超讯出来的数据

       3.通过表C更新表A 代码如下:

UPDATE A
SET A.DELETE=1
WHERE A.ID IN(
  SELECT C.ID
  FROM C
);

通过新建表更新,代码的执行时间为8S,但前提是需要有建表权限

六:使用merge into进行更新

      merge into在实际更新中效率最高,且不占用其他空间,也不需要其他权限,但语法较为复杂,容易对表进行误操作,具体代码如下

MERGE INTO A
USING(
  SELECT A.NAME,MAX(A.ID) AS ID
  FROM A,B
  WHERE A.NAME=B.NAME
  AND A.DELETE=0
  GROUP BY A.NAME
  HAVING COUNT(A.NAME)>1
) Q
ON(A.NAME=Q.NAME AND A.ID<>Q.ID)
WHEN MATCHED THEN
UPDATE SET A.DELETE=1;

同样的数据量执行时间3S

posted on 2018-04-28 11:46  xiangxiantao  阅读(135)  评论(0编辑  收藏  举报