数据开发工程师面试题-窗口函数

表名: 交易明细表

字段: 产品、交易时间(精确到时分秒)、交易金额。

进行以下四个统计需求的实现:

1)、统计每个产品每天的总交易额

2)、统计每个产品每天的当月总交易额

3)、统计每天当日交易额排名前三的产品

4)、统计每个产品每天的最近7天总交易额

 

例如建表语句如下:

CREATE TABLE transaction_detail (
product string COMMENT '产品',
transaction_time datetime COMMENT '交易时间',
amount decimal(20,2) COMMENT '交易金额'
) ;

以下是使用HiveSQL实现四个统计需求的完整解决方案,结合窗口函数和聚合函数实现:

1. 统计每个产品每天的总交易额

SELECT 
    product,
    to_date(transaction_time) AS trans_date,
    SUM(amount) AS daily_total
FROM transaction_detail
GROUP BY product, to_date(transaction_time);

说明​​:

  • 按产品和交易日期分组,使用SUM()聚合函数计算每日交易总额
  • to_date()函数提取交易时间的日期部分

 

2. 统计每个产品每天的当月总交易额

WITH daily_summary AS (
    SELECT 
        product,
        to_date(transaction_time) AS trans_date,
        SUM(amount) AS daily_total
    FROM transaction_detail
    GROUP BY product, to_date(transaction_time)
)
SELECT 
    product,
    trans_date,
    daily_total,
    SUM(daily_total) OVER (
        PARTITION BY product, month(trans_date) 
        ORDER BY trans_date 
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS monthly_total
FROM daily_summary;

​​说明​​:

  • 先创建CTE计算每日交易额
  • 使用窗口函数SUM() OVER()按产品和月份分区,计算整个月的累计交易额
  • ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING确保计算整个分区。
  • 如果需要计算截至当天的累计交易额,需要将UNBOUNDED FOLLOWING替换为CURRENT ROW。

 

3. 统计每天交易额排名前三的产品

WITH daily_product_summary AS (
    SELECT 
        to_date(transaction_time) AS trans_date,
        product,
        SUM(amount) AS daily_total
    FROM transaction_detail
    GROUP BY to_date(transaction_time), product
),
ranked_products AS (
    SELECT 
        trans_date,
        product,
        daily_total,
        RANK() OVER (
            PARTITION BY trans_date 
            ORDER BY daily_total DESC
        ) AS rank
    FROM daily_product_summary
)
SELECT 
    trans_date,
    product,
    daily_total
FROM ranked_products
WHERE rank <= 3;

​​说明​​:

  • 先按日期和产品分组计算每日交易额
  • 使用RANK() OVER()窗口函数按日期分区、交易额降序排名
  • 筛选排名≤3的记录

 

4. 统计每个产品每天的最近7天总交易额

WITH daily_summary AS (
    SELECT 
        product,
        to_date(transaction_time) AS trans_date,
        SUM(amount) AS daily_total,
        unix_timestamp(to_date(transaction_time)) AS date_ts  -- 转换为时间戳
    FROM transaction_detail
    GROUP BY product, to_date(transaction_time)
)
SELECT 
    product,
    trans_date,
    daily_total,
    SUM(daily_total) OVER (
        PARTITION BY product 
        ORDER BY date_ts 
        RANGE BETWEEN 6 * 86400 PRECEDING AND CURRENT ROW  -- 6天*86400秒
    ) AS last7d_total
FROM daily_summary;

​​说明​​:

  • 先计算每日交易额并转换为Unix时间戳
  • 使用RANGE BETWEEN指定时间范围窗口(7天=604,800秒)
  • 按产品分区,计算滚动7天交易总额
     

关键函数说明:

    1. ​​窗口函数结构​​:
      函数 OVER (PARTITION BY ... ORDER BY ... [窗口范围])
       
    2. ​​时间范围窗口​​:
      • RANGE BETWEEN:按实际时间值计算范围(需数值类型)
      • ROWS BETWEEN:按物理行数计算范围
         
    3. ​​排名函数区别​​:
      • RANK():并列排名会跳过后续名次(1,1,3)
      • DENSE_RANK():并列不跳名次(1,1,2)
      • ROW_NUMBER():连续编号(1,2,3)
posted @ 2025-05-29 10:17  业余砖家  阅读(173)  评论(0)    收藏  举报