12.25随笔

今天研究在虚拟机上清洗数据以及文件的可视化。
以下是命令行。
说起来今天是圣诞节,虽然我一般不过洋节

1. 启动Hive

hive

================== Hive 交互式命令 ==================

2. 开启本地模式(加速小数据量查询)

set hive.exec.mode.local.auto=true;

3. 清空历史表数据(如果表已存在)

TRUNCATE TABLE data;

4. 导入清洗后的clean_final.txt到data表(核心步骤,之前遗漏的关键)

LOAD DATA LOCAL INPATH '/home/hadoop/clean_final.txt' INTO TABLE data;

5. 验证数据导入成功

SELECT * FROM data LIMIT 10;

6. 统计1:视频/文章访问量Top10 → 导出到本地目录

INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/stat_top10_type_id'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT type, id, COUNT(*) AS visit_count FROM data GROUP BY type, id ORDER BY visit_count DESC LIMIT 10;

7. 统计2:地市维度Top10 → 导出到本地目录

INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/stat_top10_city'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT city, type, id, COUNT(*) AS visit_count FROM data GROUP BY city, type, id ORDER BY visit_count DESC LIMIT 10;

8. 统计3:视频/文章流量Top10 → 导出到本地目录

INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/stat_top10_traffic'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT type, id, SUM(traffic) AS total_traffic FROM data GROUP BY type, id ORDER BY total_traffic DESC LIMIT 10;

9. 退出Hive

exit;

================== Hive 命令结束 ==================

10. 检查Hive导出的统计结果文件

ls -lh /home/hadoop/stat_top10_type_id/
cat /home/hadoop/stat_top10_type_id/000000_0

ls -lh /home/hadoop/stat_top10_traffic/
cat /home/hadoop/stat_top10_traffic/000000_0

1. 以开启local_infile模式登录MySQL(密码:Hive@123456)

mysql -uroot -pHive@123456 --local-infile=1

================== MySQL 交互式命令 ==================

2. 创建统计数据库

CREATE DATABASE IF NOT EXISTS hive_stat DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
USE hive_stat;

3. 创建3张统计表

CREATE TABLE IF NOT EXISTS top10_type_id (
type VARCHAR(20),
id VARCHAR(20),
visit_count INT
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

CREATE TABLE IF NOT EXISTS top10_city (
city VARCHAR(20),
type VARCHAR(20),
id VARCHAR(20),
visit_count INT
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

CREATE TABLE IF NOT EXISTS top10_traffic (
type VARCHAR(20),
id VARCHAR(20),
total_traffic BIGINT
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

4. 导入统计数据(关键:INFILE 而非 INPATH)

LOAD DATA LOCAL INFILE '/home/hadoop/stat_top10_type_id/000000_0'
INTO TABLE top10_type_id
FIELDS TERMINATED BY ',';

LOAD DATA LOCAL INFILE '/home/hadoop/stat_top10_city/000000_0'
INTO TABLE top10_city
FIELDS TERMINATED BY ',';

LOAD DATA LOCAL INFILE '/home/hadoop/stat_top10_traffic/000000_0'
INTO TABLE top10_traffic
FIELDS TERMINATED BY ',';

5. 验证数据导入成功

SELECT * FROM top10_type_id;
SELECT * FROM top10_traffic;

6. 退出MySQL

exit;

================== MySQL 命令结束 ==================

1. 切换到可视化脚本目录

cd /home/hadoop

2. 运行修复后的可视化脚本(生成柱状图)

python /home/hadoop/visualize_data.py

3. 验证可视化图片生成

ls -lh /home/hadoop/*.png

posted @ 2025-12-25 22:04  Thanatos。syts  阅读(3)  评论(0)    收藏  举报