SPARK - Post category - dch_21

Distinct window functions are not supported

Summary: Symptom: Spark does not support count(distinct) as a window function. Workaround: size() + collect_set() over(). Read more

posted @ 2021-10-18 23:06 dch_21 Views (764) Comments (0) Recommendations (0)
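
The workaround named in the summary above can be sketched as follows. This is a minimal illustration, not the post's own code: the query string shows the Spark SQL shape of `size(collect_set(...) OVER (...))`, and the table and column names (`orders`, `user_id`, `item_id`) are hypothetical. The plain-Python function only mimics the query's semantics so the expected result can be checked without a Spark cluster.

```python
from collections import defaultdict

# Spark SQL shape of the workaround: collect_set() gathers the distinct values
# in each window partition, and size() counts them.
# Table and column names here are hypothetical, chosen for illustration.
WORKAROUND_SQL = """
SELECT user_id,
       item_id,
       size(collect_set(item_id) OVER (PARTITION BY user_id)) AS distinct_items
FROM orders
"""


def distinct_count_over(rows, partition_key, value_key):
    """Plain-Python emulation of size(collect_set(value) OVER (PARTITION BY key))."""
    seen = defaultdict(set)
    for row in rows:
        if row[value_key] is not None:  # null values are not collected
            seen[row[partition_key]].add(row[value_key])
    # Every input row is kept and annotated with its partition's distinct count.
    return [dict(row, distinct_items=len(seen[row[partition_key]])) for row in rows]


rows = [
    {"user_id": "u1", "item_id": "a"},
    {"user_id": "u1", "item_id": "a"},
    {"user_id": "u1", "item_id": "b"},
    {"user_id": "u2", "item_id": "c"},
]
result = distinct_count_over(rows, "user_id", "item_id")
```

Unlike a GROUP BY, the window form keeps one output row per input row, which is why each `u1` row carries the same distinct count.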

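Summary: Scenario: after setting up a Windows environment with Scala (2.11) and Spark (2.2.0), I wrote a wordcount example. Add the following dependencies, then write wordcount: <properties> <spark-version>2.2.0</spark-version> <scala-version>2.11</ Read more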

posted @ 2021-10-08 23:54 dch_21 Views (338) Comments (0) Recommendations (0)

spark-submit parameters: executor-memory

Summary: https://blog.csdn.net/zhuiqiuuuu/article/details/86539385 How executor-memory is allocated under the hood; the source code is as follows: // default value, 1024 MB var executorMemory = 1024 // overhead ratio parameter, default 0.1 v Read more

posted @ 2020-11-07 20:06 dch_21 Views (183) Comments (0) Recommendations (0)
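
As a rough illustration of the arithmetic the summary above hints at, the Python sketch below combines the 1024 MB default and the 0.1 overhead ratio quoted there. The 384 MB minimum overhead does not appear in the truncated summary; it is an assumption based on Spark-on-YARN's documented default, and the function name is hypothetical. Resource managers may also round the request up further, which this sketch ignores.

```python
EXECUTOR_MEMORY_MB = 1024  # default value, as quoted in the summary
OVERHEAD_FACTOR = 0.1      # default overhead ratio, as quoted in the summary
OVERHEAD_MIN_MB = 384      # assumption: minimum overhead, not shown in the summary


def container_request_mb(executor_memory_mb=EXECUTOR_MEMORY_MB,
                         overhead_factor=OVERHEAD_FACTOR,
                         overhead_min_mb=OVERHEAD_MIN_MB):
    """Executor heap plus overhead, where overhead = max(factor * heap, minimum)."""
    overhead_mb = max(int(executor_memory_mb * overhead_factor), overhead_min_mb)
    return executor_memory_mb + overhead_mb


default_request = container_request_mb()     # 1024 + max(102, 384)
large_request = container_request_mb(8192)   # 8192 + max(819, 384)
```

With the defaults the fixed minimum dominates; the ratio only takes over once the executor heap exceeds roughly 3.8 GB.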

Spark dual-stream join

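Summary: https://blog.csdn.net/dinghua_xuexi/article/details/107943242 Background: when building a real-time data warehouse, two real-time data sources sometimes need to be joined to produce wide-table data, which calls for a dual-stream join. Scenario: for example, there is a real-time order data source and a real-time order-item data source. Orders Read more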

posted @ 2020-11-07 16:46 dch_21 Views (571) Comments (0) Recommendations (0)
