Post archive "July 26, 2024": hive03_Advanced Operations ... - Stitches

July 26, 2024

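Summary: Hive partitioned tables https://blog.csdn.net/weixin_41122339/article/details/81584110 When a table is stored, its data can be placed in subdirectories of the table directory according to the values of the partition key columns. Once the data is split across separate directories this way, queries and filters on the partition key fields run faster; by specifying a filter in the query condition (read more)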

posted @ 2024-07-26 19:16 Stitches Views (85) Comments (0) Recommendations (0)

hive02_SQL Operations

Summary: Hive DDL operations. Before running them, make sure Hive has started successfully: # start HiveServer2 hive --service hiveserver2 & # start MetaStore hive --service metastore & # enter the Hive command-line interface beeline -u jdbc (read more)

posted @ 2024-07-26 19:15 Stitches Views (43) Comments (0) Recommendations (0)

06_sparkStreaming

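Summary: SparkStreaming. sparkStreaming is used to process streaming data. Input sources include Kafka, Flume, HDFS, and others; output destinations include HDFS and databases. SparkCore corresponds to RDD; SparkSQL corresponds to DataFrame/DataSet; SparkStreaming (read more)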

posted @ 2024-07-26 14:51 Stitches Views (37) Comments (0) Recommendations (0)
