Spark Pitfall Notes

A summary of common problems with Spark, Spark tuning, and Spark Streaming.

1. server.TransportChannelHandler: Exception in connection from xxxxxx
java.io.IOException: Connection reset by peer

The problem was resolved after increasing memory.
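In YARN mode this error usually means the executor on the other end of the connection died, often killed for exceeding its memory limits, which is why more memory fixes it. A minimal sketch of the relevant spark-submit flags (illustrative values, not from the original run; tune to your workload):

spark-submit \
--executor-memory 8G \
--conf spark.yarn.executor.memoryOverhead=4096 \
...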

 

2. In cluster mode the Spark application ID should start with application_; if it starts with local instead, check whether setMaster in the code is hardcoded to local mode.
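A minimal Scala sketch of the fix (app name is a placeholder): build the SparkConf without calling setMaster, and let spark-submit supply the master instead.

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("MyApp")
// Do not call conf.setMaster("local[*]") here: it silently forces local mode
// even when the job is submitted with --master yarn.
val sc = new SparkContext(conf)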

 

3. Final parameters after tuning

spark-submit \
--master yarn \
--num-executors 50 \
--executor-memory 4G \
--executor-cores 2 \
--driver-memory 8G \
--conf spark.default.parallelism=200 \
--conf spark.yarn.executor.memoryOverhead=2048 \
--conf spark.storage.memoryFraction=0.5 \
--conf spark.shuffle.memoryFraction=0.3 \
--class com.xxx.rc.enginetask.service.article.ArticleHBaseALS
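A quick sanity check on these numbers (my arithmetic, not part of the original notes): 50 executors × 2 cores = 100 concurrent tasks, so spark.default.parallelism=200 keeps roughly two tasks per core; each YARN container requests 4G heap + 2048M overhead = 6G, about 300G across the 50 executors.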


4. Error:(18, 31) type arguments [String,String,io.netty.handler.codec.string.StringDecoder,io.netty.handler.codec.string.StringDecoder] conform to the bounds of none of the overloaded alternatives of

value createDirectStream: [K, V, KD <: kafka.serializer.Decoder[K], VD <: kafka.serializer.Decoder[V]](jssc: org.apache.spark.streaming.api.java.JavaStreamingContext, keyClass: Class[K], valueClass: Class[V], keyDecoderClass: Class[KD], valueDecoderClass: Class[VD], kafkaParams: java.util.Map[String,String], topics: java.util.Set[String])org.apache.spark.streaming.api.java.JavaPairInputDStream[K,V] <and> [K, V, KD <: kafka.serializer.Decoder[K], VD <: kafka.serializer.Decoder[V]](ssc: org.apache.spark.streaming.StreamingContext, kafkaParams: Map[String,String], topics: Set[String])(implicit evidence$19: scala.reflect.ClassTag[K], implicit evidence$20: scala.reflect.ClassTag[V], implicit evidence$21: scala.reflect.ClassTag[KD], implicit evidence$22: scala.reflect.ClassTag[VD])org.apache.spark.streaming.dstream.InputDStream[(K, V)]
val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicSet)

Wrong import: import io.netty.handler.codec.string.StringDecoder

Correct import: import kafka.serializer.StringDecoder
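With the correct import, the call from the error above compiles; a minimal sketch, assuming the spark-streaming-kafka 0.8 artifact and the ssc, kafkaParams, and topicSet values from the original code:

import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topicSet)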

 

5. Fetching the logs of a given application

yarn logs -applicationId xxxxxxxxxxxxxxxx
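The output can be long, so it is often convenient to redirect it to a file (application ID elided as in the original):

yarn logs -applicationId xxxxxxxxxxxxxxxx > app.log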
