数仓项目常见问题汇总

Hive常见错误总结

一、启动hive，执行sql插入数据时出现 error 30041

Error while processing statement: FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask.

日志文件：提示找不见main方法

原因：spark环境变量没有配

解决办法：配置spark的环境变量

二、启动hive，执行sql插入数据时提示类和方法找不见

Job failed with java.lang.ClassNotFoundException: org.apache.spark.AccumulatorParam

FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Spark job failed during runtime.

日志文件： status.SparkJobMonitor: Job failed with java.lang.ClassNotFoundException: org.apache.spark.AccumulatorParam

原因：hive和spark的兼容性问题

解决方案：

在hive 安装目录的lib下查看spark-core的版本

将lib删除，将符合版本的hive安装包中的lib文件添加进来，这样不影响元数据库（记得在lib中添加mysql驱动）

三、HDFS上的文件名的定义（默认为时间戳）出错

原因：flume中的时间拦截器Maxwell输出的数据中的ts字段时间戳单位为秒，Flume HDFSSink要求单位为毫秒

解决办法：将通过Maxwell输出的数据的时间拦截器中的时间的值转换为毫秒

四、启动Hiveserver2客户端多次执行sql之后提示堆内存满了

启动hiveserver2,通过jps -ml 查看hiveserver2的进程号，使用jinfo -flag MaxHeapSize +hiveserver2的进程号，查看MaxHeapSize的值是否为默认值（256M）

原因：Hive默认的MaxHeapSize：XX:MaxHeapSize=268435456（256M），比较小

解决办法：

修改hive-env.xml文件中的 HADOOP_HEAPSIZE值（根据自己的内存增大值）为：

#The heap size of the jvm stared by hive shell script can be controlled via:

export HADOOP_HEAPSIZE=2048

五、hiveserver2启动慢

原因：缺少tez导致重复连接60s

解决方案：找到hive-default.xml.template文件中的hive.server2.sleep.interval.between.start.attempts配置，修改时间为2秒并复制到hive-site.xml文件中

<name>hive.server2.sleep.interval.between.start.attempts</name>

Expects a time value with unit (d/day, h/hour, m/min, s/sec, ms/msec, us/usec, ns/nsec), which is

msec if not specified. The time should be in between 0 msec (inclusive) and 9223372036854775807 msec (inclusive).

Amount of time to sleep between HiveServer2 start attempts. Primarily meant for tests

</description>

</property>

六、向Hive数据库导入数据时，出现FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict

原因：动态分区时，hive默认是严格模式的，不允许动态分区的数据插入

解决方案：

方案一：在hive中执行：set hive.exec.dynamic.partition.mode=nonstrict，只对本次会话有效

方案二：在hive的安装目录conf下的hive-site.xml中加入:

<property> <name>hive.exec.dynamic.partition.mode</name> <value>nonstrict</value></property>

六、navicat执行sql出现异常

[Err] 1055 - Expression #1 of ORDER BY clause is not in GROUP BY clause and contains nonaggregated column 'information_schema.PROFILING.SEQ' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

原因：

在MySQL5.7之后，sql_mode中默认存在ONLY_FULL_GROUP_BY，SQL语句未通过ONLY_FULL_GROUP_BY语义检查所以报错。

解决方案：

posted @ 2022-08-28 11:28 新界王阅读(1331) 评论(0) 收藏举报

刷新页面返回顶部

新界王

数仓项目常见问题汇总

数仓项目常见问题汇总

公告