Spark and Kafka integration - Post category - icecola

Two ways to convert a DStream to a DataFrame (getting past the 22-element tuple limit in map)

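Summary: When developing with Spark Streaming, we often need to convert a DStream to a DataFrame for further processing. There are two ways to do this. Method 1: use the map operator with a tuple, which is sufficient for ordinary cases. Sometimes, however, there are more than 22 columns, and then we run into the restriction that a Scala tuple cannot have more than 22 ... Read more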

posted @ 2019-07-12 16:01 icecola Views (2470) Comments (0) Recommendations (0)
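The first approach named in the summary above (map to a tuple, then convert) might look roughly like the sketch below. It is not the post's own code: the socket source on localhost:9999, the "name,age" line format, the column names, and the helper `parse` are all assumptions made for illustration. Because Scala 2 tuples stop at 22 elements, this style only works while the row has at most 22 columns.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

import scala.util.Try

object DStreamToDataFrame {
  // Hypothetical parser: turns "name,age" into Some((name, age)).
  // Lines with the wrong number of fields or a non-numeric age yield None.
  def parse(line: String): Option[(String, Int)] =
    line.split(",").map(_.trim) match {
      case Array(name, age) => Try(age.toInt).toOption.map(n => (name, n))
      case _                => None
    }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("DStreamToDataFrame")
    val ssc  = new StreamingContext(conf, Seconds(5))

    // Assumed input: text lines arriving on a local socket.
    val lines = ssc.socketTextStream("localhost", 9999)

    lines.foreachRDD { rdd =>
      // Reuse (or lazily create) a SparkSession for each micro-batch.
      val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
      import spark.implicits._

      // Map each record to a tuple, then name the columns with toDF.
      val df = rdd.flatMap(parse(_)).toDF("name", "age")
      df.show()
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The conversion happens inside foreachRDD because toDF is defined on RDDs (through spark.implicits), not on the DStream itself.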

java.lang.reflect.InvocationTargetException at shade.com.datastax.spark.connector.google.common.base.Throwables.propagate(Throwables.java:160)

Summary: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 29.1 failed 4 times, most recent failure: Lost task 1.3 in stage 29 ... Read more

posted @ 2019-07-11 20:36 icecola Views (510) Comments (0) Recommendations (0)

A stateful wordCount grouped by first letter, implemented with mapWithState

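Summary: Use the mapWithState operator to implement a stateful wordCount that uses the first letter of each word as the key, while requiring the output to take the form (word,1). Read more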

posted @ 2019-07-07 13:22 icecola Views (1192) Comments (0) Recommendations (0)
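One possible reading of the mapWithState summary above is sketched below; it is an illustration under assumptions, not the post's own solution. State is keyed by the first letter of each word and holds a running count per letter, while every record emitted downstream has the form (word, 1). The socket source, the checkpoint path, and the helper `firstLetterKey` are hypothetical.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

object FirstLetterWordCount {
  // Hypothetical helper: key a word by its first letter, keep the word as the value.
  def firstLetterKey(word: String): (String, String) = (word.take(1), word)

  // Called once per record: bump the per-letter running count kept in state,
  // and emit the record itself in (word, 1) form.
  val mappingFunc = (letter: String, word: Option[String], state: State[Int]) => {
    state.update(state.getOption.getOrElse(0) + 1)
    (word.getOrElse(""), 1)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("FirstLetterWordCount")
    val ssc  = new StreamingContext(conf, Seconds(5))
    // mapWithState needs a checkpoint directory; this path is an assumption.
    ssc.checkpoint("/tmp/first-letter-wordcount")

    // Assumed input: whitespace-separated words arriving on a local socket.
    val keyed = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(firstLetterKey)

    val mapped = keyed.mapWithState(StateSpec.function(mappingFunc))

    mapped.print()                  // records in (word, 1) form
    mapped.stateSnapshots().print() // current (firstLetter, runningCount) pairs

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Splitting the result this way keeps the emitted stream in the required (word, 1) shape while the per-letter totals remain available through stateSnapshots().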
