大叔问题定位分享(46)spark2.4升级3.1后执行spark-sql报错

背景

2个节点分别升级spark版本,从2.4升级到3.1,升级后一个节点执行spark-sql正常,另一个节点执行报错,报错信息如下:

spark-sql> select * from $table where dt = '$dt' limit 5;
Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table $table. Invalid method name: 'get_table_req'
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table $table. Invalid method name: 'get_table_req'
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:112)
        at org.apache.spark.sql.hive.HiveExternalCatalog.tableExists(HiveExternalCatalog.scala:854)
        at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.tableExists(ExternalCatalogWithListener.scala:146)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:462)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.requireTableExists(SessionCatalog.scala:197)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableRawMetadata(SessionCatalog.scala:488)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:474)
        at org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.loadTable(V2SessionCatalog.scala:65)
        at org.apache.spark.sql.connector.catalog.CatalogV2Util$.loadTable(CatalogV2Util.scala:282)
...
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table $table. Invalid method name: 'get_table_req'
        at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1282)
        at org.apache.spark.sql.hive.client.HiveClientImpl.getRawTableOption(HiveClientImpl.scala:392)
        at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$tableExists$1(HiveClientImpl.scala:406)
        at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
        at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:291)
        at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:224)
        at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:223)
        at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:273)
        at org.apache.spark.sql.hive.client.HiveClientImpl.tableExists(HiveClientImpl.scala:406)
        at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$tableExists$1(HiveExternalCatalog.scala:854)
        at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102)
        ... 126 more
Caused by: org.apache.thrift.TApplicationException: Invalid method name: 'get_table_req'
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table_req(ThriftHiveMetastore.java:1567)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table_req(ThriftHiveMetastore.java:1554)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1350)
        at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getTable(SessionHiveMetaStoreClient.java:127)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
        at com.sun.proxy.$Proxy31.getTable(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2336)
        at com.sun.proxy.$Proxy31.getTable(Unknown Source)
        at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1274)
        ... 137 more

定位

以上报错说明spark连接hive metastore时存在版本兼容问题。
在两个节点上分别使用spark-sql执行同样的sql,对比两个spark应用后发现,有个配置不一样

正常的节点有一行配置如下:

spark.sql.hive.metastore.jars=/opt/cloudera/parcels/CDH/lib/hive/lib/*

在异常节点上修改spark-default.conf后,测试正常

原因

spark有一个buildin的hive,3.1.1中对应的是2.3.6,如果想连接其他版本的hive metastore,不需要重新打包,只需要修改变量配置

官方文档参考

posted @ 2022-05-19 16:51  匠人先生  阅读(650)  评论(0编辑  收藏  举报