数仓项目常见问题汇总

数仓项目常见问题汇总

Hive常见错误总结

一、启动hive,执行sql插入数据时出现 error 30041

Error while processing statement: FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask.

日志文件:提示找不见main方法

原因:spark环境变量没有配

解决办法:配置spark的环境变量

二、启动hive,执行sql插入数据时提示类和方法找不见

 

Job failed with java.lang.ClassNotFoundException: org.apache.spark.AccumulatorParam

FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Spark job failed during runtime.

日志文件: status.SparkJobMonitor: Job failed with java.lang.ClassNotFoundException: org.apache.spark.AccumulatorParam

原因:hive和spark的兼容性问题

解决方案:

hive 安装目录的lib下查看spark-core的版本

lib删除,将符合版本的hive安装包中的lib文件添加进来,这样不影响元数据库(记得在lib中添加mysql驱动)

三、HDFS上的文件名的定义(默认为时间戳)出错

原因:flume中的时间拦截器Maxwell输出的数据中的ts字段时间戳单位为秒,Flume HDFSSink要求单位为毫秒

解决办法:将通过Maxwell输出的数据的时间拦截器中的时间的值转换为毫秒

四、启动Hiveserver2客户端多次执行sql之后提示堆内存满了

 

 

 

启动hiveserver2,通过jps -ml 查看hiveserver2的进程号,使用jinfo -flag MaxHeapSize +hiveserver2的进程号,查看MaxHeapSize的值是否为默认值(256M)

原因:Hive默认的MaxHeapSize:XX:MaxHeapSize=268435456(256M),比较小

解决办法:

修改hive-env.xml文件中的 HADOOP_HEAPSIZE值(根据自己的内存增大值)为:

 

#The heap size of the jvm stared by hive shell script can be controlled via:

 export HADOOP_HEAPSIZE=2048

五、hiveserver2启动慢

原因:缺少tez导致重复连接60s

 

 

 

解决方案:找到hive-default.xml.template文件中的hive.server2.sleep.interval.between.start.attempts配置,修改时间为2秒并复制到hive-site.xml文件中

 

<property>

    <name>hive.server2.sleep.interval.between.start.attempts</name>

    <value>60s</value>

    <description>

      Expects a time value with unit (d/day, h/hour, m/min, s/sec, ms/msec, us/usec, ns/nsec), which is

msec if not specified.      The time should be in between 0 msec (inclusive) and 9223372036854775807 msec (inclusive).

      Amount of time to sleep between HiveServer2 start attempts. Primarily meant for tests

    </description>

  </property>

六、向Hive数据库导入数据时,出现FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict

 

 

 

原因:动态分区时,hive默认是严格模式的,不允许动态分区的数据插入

解决方案:

方案一:在hive中执行:set hive.exec.dynamic.partition.mode=nonstrict,只对本次会话有效

方案二:在hive的安装目录conf下的hive-site.xml中加入:

<property> <name>hive.exec.dynamic.partition.mode</name> <value>nonstrict</value></property>

六、navicat执行sql出现异常

[Err] 1055 - Expression #1 of ORDER BY clause is not in GROUP BY clause and contains nonaggregated column 'information_schema.PROFILING.SEQ' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

 

原因:

MySQL5.7之后,sql_mode中默认存在ONLY_FULL_GROUP_BY,SQL语句未通过ONLY_FULL_GROUP_BY语义检查所以报错。

解决方案:

 

posted @ 2022-08-28 11:28  新界王  阅读(1264)  评论(0)    收藏  举报