Spark: pyspark and spark-sql errors
I am using Spark 2.3.0.
Error 1: pyspark fails to start
root@hadoop102:/opt/module/spark/bin# pyspark
Python 3.8.10 (default, Feb 4 2025, 15:02:54)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "/opt/module/spark/python/pyspark/shell.py", line 31, in <module>
    from pyspark import SparkConf
  File "/opt/module/spark/python/pyspark/__init__.py", line 46, in <module>
    from pyspark.context import SparkContext
  File "/opt/module/spark/python/pyspark/context.py", line 31, in <module>
    from pyspark import accumulators
  File "/opt/module/spark/python/pyspark/accumulators.py", line 97, in <module>
    from pyspark.cloudpickle import CloudPickler
  File "/opt/module/spark/python/pyspark/cloudpickle.py", line 146, in <module>
    _cell_set_template_code = _make_cell_set_template_code()
  File "/opt/module/spark/python/pyspark/cloudpickle.py", line 127, in _make_cell_set_template_code
    return types.CodeType(
TypeError: an integer is required (got type bytes)
Solution: run pyspark with an older Python version. The cloudpickle module bundled with Spark 2.3.0 calls types.CodeType with the pre-3.8 argument list, so it breaks under Python 3.8. Either change the system default Python, or add a line such as

export PYSPARK_PYTHON=/usr/bin/python2.7

to spark-env.sh.
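A minimal sketch of the spark-env.sh change, assuming a Python 2.7 interpreter exists at /usr/bin/python2.7 (adjust the path for your machine); setting PYSPARK_DRIVER_PYTHON as well keeps the driver shell and the executors on the same interpreter:

# in $SPARK_HOME/conf/spark-env.sh
export PYSPARK_PYTHON=/usr/bin/python2.7          # interpreter used by executors
export PYSPARK_DRIVER_PYTHON=/usr/bin/python2.7   # interpreter used by the pyspark driver shell

After saving the file, launching pyspark again should drop into the interactive shell without the cloudpickle traceback.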
Error 2: spark-sql fails to start
root@hadoop102:/opt/module/spark/bin# spark-sql
2025-03-05 20:05:37 WARN  NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2025-03-05 20:05:37 INFO  SecurityManager:54 - Changing view acls to: root
2025-03-05 20:05:37 INFO  SecurityManager:54 - Changing modify acls to: root
2025-03-05 20:05:37 INFO  SecurityManager:54 - Changing view acls groups to:
2025-03-05 20:05:37 INFO  SecurityManager:54 - Changing modify acls groups to:
2025-03-05 20:05:37 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
java.lang.ClassNotFoundException: org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:235)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:836)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Failed to load main class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.
You need to build Spark with -Phive and -Phive-thriftserver.
2025-03-05 20:05:37 INFO  ShutdownHookManager:54 - Shutdown hook called
2025-03-05 20:05:37 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-e7b42c80-ccd8-4337-8542-41495c392ec5
Solution: the Spark package in use was built without Hive support, so the SparkSQLCLIDriver class that spark-sql needs is missing. Download the Spark 2.3.0 source code and rebuild it with the -Phive and -Phive-thriftserver profiles, exactly as the error message suggests, or switch to a prebuilt distribution that already includes Hive.
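A minimal sketch of the rebuild, assuming the Spark 2.3.0 sources are unpacked in the current directory and using the bundled build/mvn wrapper (add whatever Hadoop/YARN profiles match your cluster; the Hive profile names come straight from the error message):

# build Spark with Hive and the Thrift JDBC/ODBC server enabled
./build/mvn -Phive -Phive-thriftserver -DskipTests clean package

Once the build finishes, a distribution produced from this tree contains org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver, and spark-sql should start normally.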