对每个hall_id单独做计算,计算顺序依据start_day升序(体现在窗口函数OVER部分中);
作辅助列,对每一行,计算截至该行的最大的end_day时间值,记作max_end_day_so_far;
检查当前行start_day是否小于上一行的max_end_day_so_far,(注意到我们的计算都基于按start_day作升序排序)如果是说明这一行的start_day被包含在上一行的时间区间中,那么设置值为0,反之,设置值为1,记作range_start;
窗口累加range_start,记作range_grp,用于作为分组的标志;
按hall_id和range_grp进行分组,对于每个分组,MIN(d1)即为最终结果区间的开始,MAX(d2)即为最终结果区间的结束。

SELECT
hall_id,
start_day,
end_day,
MAX(end_day) OVER (PARTITION BY hall_id
ORDER BY
start_day) AS max_end_day_so_far
FROM
HallEvents
 
hall_id start_day end_day max_end_day_so_far
1 2023-01-13 2023-01-14 2023-01-14
1 2023-01-14 2023-01-17 2023-01-17
1 2023-01-18 2023-01-25 2023-01-25
2 2022-12-09 2022-12-23 2022-12-23
2 2022-12-13 2022-12-17 2022-12-23
3 2022-12-01 2023-01-30 2023-01-30
WITH t1 AS (
SELECT
hall_id,
start_day,
end_day,
MAX(end_day) OVER (PARTITION BY hall_id
ORDER BY
start_day) AS max_end_day_so_far
FROM
HallEvents)
SELECT
*,
CASE
WHEN start_day <= LAG(max_end_day_so_far) OVER (PARTITION BY hall_id
ORDER BY
start_day) THEN 0
ELSE 1
END AS range_start
FROM
t1
 
    end_day max_end_day_so_far range_start
1 2023-01-13 2023-01-14 2023-01-14 1
1 2023-01-14 2023-01-17 2023-01-17 0
1 2023-01-18 2023-01-25 2023-01-25 1
2 2022-12-09 2022-12-23 2022-12-23 1
2 2022-12-13 2022-12-17 2022-12-23 0
3 2022-12-01 2023-01-30 2023-01-30 1
WITH t1 AS (
SELECT
hall_id,
start_day,
end_day,
MAX(end_day) OVER (PARTITION BY hall_id
ORDER BY
start_day) AS max_end_day_so_far
FROM
HallEvents),
t2 AS (SELECT
*,
CASE
WHEN start_day <= LAG(max_end_day_so_far) OVER (PARTITION BY hall_id
ORDER BY
start_day) THEN 0
ELSE 1
END AS range_start
FROM
t1)
SELECT *,SUM(range_start) OVER (PARTITION BY hall_id ORDER BY start_day) AS range_grp FROM t2


hall_id start_day end_day max_end_day_so_far range_start range_grp
1 2023-01-13 2023-01-14 2023-01-14 1 1
1 2023-01-14 2023-01-17 2023-01-17 0 1
1 2023-01-18 2023-01-25 2023-01-25 1 2
2 2022-12-09 2022-12-23 2022-12-23 1 1
2 2022-12-13 2022-12-17 2022-12-23 0 1
3 2022-12-01 2023-01-30 2023-01-30 1 1
SELECT hall_id,
       MIN(start_day) AS start_day,
       MAX(end_day) AS end_day
FROM (
    SELECT *,
           SUM(range_start) OVER (PARTITION BY hall_id ORDER BY start_day) AS range_grp
    FROM (
        SELECT *,
               CASE WHEN start_day <= LAG(max_end_day_so_far) OVER (PARTITION BY hall_id ORDER BY start_day) THEN 0 
                    ELSE 1 END AS range_start
        FROM (
            SELECT hall_id,
                   start_day,
                   end_day,
                   MAX(end_day) OVER (PARTITION BY hall_id ORDER BY start_day) AS max_end_day_so_far
            FROM HallEvents
        ) t
    ) t1
) t2
GROUP BY hall_id, range_grp;

 

posted on 2023-04-16 10:07  白的枫叶  阅读(14)  评论(0)    收藏  举报