Spark Submit Command and Parameter Tuning

Reposted from: https://blog.csdn.net/Q1059081877Q/article/details/106428301

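1. num-executors (number of executors): usually set between 50 and 100. Always set it explicitly; otherwise only a small default number of executors is started, the cluster's resources are underused, and the job runs slowly.
2. executor-memory (memory per executor): reference value 4g-8g. num-executors multiplied by executor-memory must not exceed the queue's maximum memory, and the total requested is best kept within 1/3 to 1/2 of that maximum.
3. executor-cores (CPU cores per executor): each core runs one task thread at a time, so more cores let an executor work through its assigned tasks faster. Reference value 2-4. num-executors * executor-cores is best kept within 1/3 to 1/2 of the queue's total CPU cores.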
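As a rough illustration of the budget rule above, the following shell sketch checks a planned request against a queue's limits. The queue limits (1000 GB of memory, 400 cores) and the chosen executor settings are made-up numbers for illustration only; they do not come from the original post. The check also ignores per-executor memory overhead, which YARN adds on top of executor-memory.

```shell
# Hypothetical queue limits (assumed values, replace with your queue's real limits).
queue_mem_gb=1000
queue_cores=400

# Planned request, picked from the reference ranges in the list above.
num_executors=50
executor_mem_gb=8
executor_cores=4

# Total resources the job would ask for (memory overhead not included).
total_mem_gb=$((num_executors * executor_mem_gb))
total_cores=$((num_executors * executor_cores))

# Rule of thumb from the text: stay within about half of the queue.
mem_limit_gb=$((queue_mem_gb / 2))
core_limit=$((queue_cores / 2))

if [ "$total_mem_gb" -le "$mem_limit_gb" ] && [ "$total_cores" -le "$core_limit" ]; then
  verdict="within budget"
else
  verdict="over budget"
fi

echo "memory: ${total_mem_gb}G of ${queue_mem_gb}G, cores: ${total_cores} of ${queue_cores}: ${verdict}"
```

With these assumed numbers the request is 400G of memory and 200 cores, which sits at or below half of the queue in both dimensions.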

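1. spark-submit: the command that submits a Spark application.
2. --queue spark: submit to the YARN queue named spark.
3. --master yarn: use YARN as the cluster manager.
4. --deploy-mode client: choose client mode or cluster mode. In client mode the driver runs on the machine that issues spark-submit; in cluster mode the driver runs on a node inside the cluster. Use client when submitting from a node of the cluster itself, and cluster when submitting from a different machine.
5. --executor-memory=4G: memory per executor. Reference value 4g-8g. num-executors multiplied by executor-memory must not exceed the queue's maximum memory, and the total requested is best kept within 1/3 to 1/2 of that maximum.
6. --conf spark.dynamicAllocation.enabled=true: whether to enable dynamic resource allocation.
7. --executor-cores 2: CPU cores per executor. Each core runs one task thread at a time, so more cores let an executor work through its assigned tasks faster. Reference value 2-4. num-executors * executor-cores is best kept within 1/3 to 1/2 of the queue's total CPU cores.
8. --conf spark.dynamicAllocation.minExecutors=4: minimum number of executors.
9. --conf spark.dynamicAllocation.maxExecutors=10: maximum number of executors.
10. --conf spark.dynamicAllocation.initialExecutors=4: initial number of executors when dynamic allocation is enabled.
11. --conf spark.executor.memoryOverhead=2g: off-heap memory overhead per executor. This is a frequent source of trouble when processing large volumes of data: the Spark job crashes repeatedly and cannot finish. When that happens, raise this parameter to at least 1G (1024M), or even 2G or 4G.
12. --conf spark.speculation=true: speculative execution. Do not use it when consuming from Kafka, since a speculative duplicate of a task can process the same data twice; decide case by case.
13. --conf spark.shuffle.service.enabled=true: enables the external shuffle service, which keeps shuffle files available after an executor is removed (useful with dynamic allocation) and can improve shuffle performance.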
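Put together, the numbered parts above form a single spark-submit invocation. The sketch below assembles them in that spirit; the application class `com.example.MyApp` and the jar path `/path/to/my-app.jar` are hypothetical placeholders, since the original post does not name an application. By default the script only prints the command (a dry run); setting `RUN=1` is an assumption of this sketch for actually launching it.

```shell
# Hypothetical application class and jar (placeholders, not from the original post).
APP_CLASS="com.example.MyApp"
APP_JAR="/path/to/my-app.jar"

# Build the argument list first so it can be inspected before anything is launched.
set -- \
  --queue spark \
  --master yarn \
  --deploy-mode client \
  --executor-memory=4G \
  --executor-cores 2 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=4 \
  --conf spark.dynamicAllocation.maxExecutors=10 \
  --conf spark.dynamicAllocation.initialExecutors=4 \
  --conf spark.executor.memoryOverhead=2g \
  --conf spark.speculation=true \
  --conf spark.shuffle.service.enabled=true \
  --class "$APP_CLASS" \
  "$APP_JAR"

# Dry run: show the full command line.
cmd="spark-submit $*"
echo "$cmd"

# Only invoke spark-submit when explicitly requested with RUN=1.
if [ "${RUN:-0}" = "1" ]; then
  spark-submit "$@"
fi
```

Because dynamic allocation is enabled here, the executor count is bounded by minExecutors and maxExecutors rather than fixed with --num-executors. If the job reads from Kafka, the spark.speculation line would be dropped, in line with the caution in item 12.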

Posted 2022-08-12 09:34 by 每天都要进步啊
