Hive Installation and Configuration

1 Introduction to Hive

2 Downloading and Installing Hive

2.1 Download

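Download page: http://www.apache.org/dyn/closer.cgi/hive/

Version used in this article: hive-3.1.2

Example download command: wget 'https://dlcdn.apache.org/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz' --no-check-certificate

Note that --no-check-certificate makes wget skip TLS certificate verification; omit it if your system already trusts the server's certificate.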

2.2 Installation

Reference 1:

2.2.1 Choose the Hive installation path

Installation path: /home/Hadoop/Hive/

2.2.2 Extract Hive

Step 1: cp apache-hive-3.1.2-bin.tar.gz /home/Hadoop/Hive/

Step 2: cd /home/Hadoop/Hive/ && tar -zxvf apache-hive-3.1.2-bin.tar.gz

Step 3: mv apache-hive-3.1.2-bin hive-3.1.2
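
The three steps above can also be wrapped in one small shell function. This is only a sketch: the function name extract_hive and its arguments are made up for illustration, and it assumes the archive unpacks to a top-level folder named after the file without its .tar.gz suffix.

```shell
# Hypothetical helper: unpack a Hive binary tarball into an install
# directory and rename the unpacked folder to a shorter name.
# Usage: extract_hive <tarball> <install_dir> <new_name>
extract_hive() {
  tarball=$1
  install_dir=$2
  new_name=$3

  # Assumption: the archive unpacks to a folder named like the file
  # without its .tar.gz suffix, e.g. apache-hive-3.1.2-bin.
  unpacked=$(basename "$tarball" .tar.gz)

  mkdir -p "$install_dir" || return 1
  tar -zxf "$tarball" -C "$install_dir" || return 1
  mv "$install_dir/$unpacked" "$install_dir/$new_name"
}

# Example call (paths taken from this article):
# extract_hive apache-hive-3.1.2-bin.tar.gz /home/Hadoop/Hive hive-3.1.2
```

Using tar -C avoids having to copy the tarball into the install directory first.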

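2.2.3 Create the Hive environment variables

The Hive environment variables can be defined in several places; the author defines them in ~/.bash_profile. After editing the file, apply the changes with: source ~/.bash_profile

The required environment variables are: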

export HIVE_HOME=/home/Hadoop/Hive/hive-3.1.2
export PATH=$PATH:$HIVE_HOME/bin
export hive_dependency=$HIVE_HOME/conf:$HIVE_HOME/lib/*:$HIVE_HOME/hcatalog/share/hcatalog/hive-hcatalog-core-3.1.2.jar
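
To avoid adding the same lines twice when the setup is repeated, the exports can be appended through a guard. The sketch below is illustrative only: add_hive_env is a hypothetical name, and it handles just HIVE_HOME and PATH, not hive_dependency.

```shell
# Hypothetical helper: append the Hive variables to a profile file once.
# Usage: add_hive_env <profile_file> <hive_home>
add_hive_env() {
  profile=$1
  hive_home=$2

  # Skip if HIVE_HOME is already exported in the file.
  if [ -f "$profile" ] && grep -q '^export HIVE_HOME=' "$profile"; then
    echo "HIVE_HOME already set in $profile" >&2
    return 0
  fi

  {
    echo "export HIVE_HOME=$hive_home"
    # Single quotes keep $PATH and $HIVE_HOME unexpanded in the file.
    echo 'export PATH=$PATH:$HIVE_HOME/bin'
  } >> "$profile"
}

# Example call, then reload and check (adjust the path to your install):
# add_hive_env ~/.bash_profile /home/Hadoop/Hive/hive-3.1.2
# source ~/.bash_profile && echo "$HIVE_HOME" && command -v hive
```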

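2.2.4 Create the Hive metastore database in MySQL

(1) Create an account for Hive (optional; root or another existing account also works)

Method 1: CREATE USER 'Hive'@'%' IDENTIFIED BY '******';

Method 2: insert into mysql.user(Host,User,Password) values("%","Hive",password("******"));

Method 2 only works on older MySQL servers whose mysql.user table still has a Password column; MySQL 5.7.6 and later removed that column, so use Method 1 there.

In the examples above, 'Hive' is the account name; '%' is the host from which the account may log in, where % allows any host and a fixed IP can be given instead; '******' is the account password.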

(2) Create the metastore database

create database hive;

(3) Grant privileges to the account

Step 1: grant all privileges on hive.* to 'Hive'@'%';

Step 2: flush privileges;
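
The statements from steps (1) to (3) can be generated in one go and reviewed before they reach the server. This is a sketch under assumptions: hive_metastore_sql is a hypothetical helper, it uses Method 1 for the account, and it does not escape quotes, so it only suits simple names and passwords.

```shell
# Hypothetical helper: print the SQL from steps (1) to (3) for a given
# account, password, and database name.
# Usage: hive_metastore_sql <user> <password> <database>
hive_metastore_sql() {
  cat <<EOF
CREATE USER '$1'@'%' IDENTIFIED BY '$2';
CREATE DATABASE $3;
GRANT ALL PRIVILEGES ON $3.* TO '$1'@'%';
FLUSH PRIVILEGES;
EOF
}

# Example: review the statements, then pipe them to the mysql client.
# hive_metastore_sql Hive 'your-password' hive
# hive_metastore_sql Hive 'your-password' hive | mysql -u root -p
```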

2.2.5 Copy the MySQL driver used by Hive

The author's JDBC driver for connecting Hive to MySQL: mysql-connector-java-8.0.25.jar

Copy the driver into "${HIVE_HOME}/lib", i.e. "./hive-3.1.2/lib/".

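3 Hive environment and parameter files (key section)

The main files in which Hive parameters are configured are:

hive-env.sh, hive-default.xml, hive-site.xml, hive-exec-log4j2.properties, hive-log4j2.properties

hive-env.sh: environment settings for Hive itself.

hive-default.xml: lists every Hive parameter with its default value; it does not need to be modified.

hive-site.xml: holds user-defined parameter values. Most entries can be copied from hive-default.xml and only their values changed, although some setups need additional parameters.

hive-exec-log4j2.properties: logging configuration for Hive execution jobs.

hive-log4j2.properties: logging configuration for Hive itself.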

3.1 Rename the template files

Hive's parameter files live in "${HIVE_HOME}/conf", i.e. "./hive-3.1.2/conf/".

The shipped default files end in ".template".

Step 1: Rename the files directly

The renaming is done as follows:

[grid@master conf]$ ls
beeline-log4j2.properties.template ivysettings.xml
hive-default.xml.template llap-cli-log4j2.properties.template
hive-env.sh.template llap-daemon-log4j2.properties.template
hive-exec-log4j2.properties.template parquet-logging.properties
hive-log4j2.properties.template
[grid@master conf]$ mv hive-env.sh.template hive-env.sh
[grid@master conf]$ mv hive-default.xml.template hive-default.xml
[grid@master conf]$ mv hive-exec-log4j2.properties.template hive-exec-log4j2.properties
[grid@master conf]$ mv hive-log4j2.properties.template hive-log4j2.properties
[grid@master conf]$ ls
beeline-log4j2.properties.template ivysettings.xml
hive-default.xml llap-cli-log4j2.properties.template
hive-env.sh llap-daemon-log4j2.properties.template
hive-exec-log4j2.properties parquet-logging.properties
hive-log4j2.properties
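
A variation on the session above is to copy the templates instead of moving them, so the originals stay available for reference. The sketch below assumes that variation; activate_templates is a hypothetical name and the list covers only the four files renamed in this article.

```shell
# Hypothetical helper: create working copies of the four template files,
# leaving the .template originals in place.
# Usage: activate_templates <conf_dir>
activate_templates() {
  conf_dir=$1
  for name in hive-env.sh hive-default.xml \
              hive-exec-log4j2.properties hive-log4j2.properties; do
    src="$conf_dir/$name.template"
    dst="$conf_dir/$name"
    # Copy only when the template exists and the target does not yet.
    if [ -f "$src" ] && [ ! -e "$dst" ]; then
      cp "$src" "$dst"
    fi
  done
}

# Example: activate_templates "$HIVE_HOME/conf"
```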

Step 2: Create a new file (very important): hive-site.xml

Command to create the file: touch hive-site.xml

Add the following content to hive-site.xml:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

</configuration>
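
Creating the file and writing the skeleton can be done in one step with a here-document. This is a minimal sketch: init_hive_site is a hypothetical helper, and it deliberately refuses to overwrite an existing hive-site.xml.

```shell
# Hypothetical helper: write an empty hive-site.xml skeleton unless the
# file already exists.
# Usage: init_hive_site <conf_dir>
init_hive_site() {
  target="$1/hive-site.xml"
  if [ -e "$target" ]; then
    echo "$target already exists; leaving it untouched" >&2
    return 0
  fi
  # The quoted 'EOF' keeps the shell from expanding anything in the body.
  cat > "$target" <<'EOF'
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

</configuration>
EOF
}

# Example: init_hive_site "$HIVE_HOME/conf"
```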

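3.2 Configure hive-env.sh

Locate the commented lines shown below and add an uncommented export line after each, adjusting the paths to your own installation.

Example: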

# Set HADOOP_HOME to point to a specific hadoop install directory
# HADOOP_HOME=${bin}/../../hadoop
export HADOOP_HOME=/home/Hadoop/Hadoop/hadoop-3.3.1
# Hive Configuration Directory can be controlled by:
# export HIVE_CONF_DIR=
export HIVE_CONF_DIR=/home/Hadoop/Hive/hive-3.1.2/conf
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
# export HIVE_AUX_JARS_PATH=
export HIVE_AUX_JARS_PATH=/home/Hadoop/Hive/hive-3.1.2/lib

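3.3 Configure hive-site.xml

This file can hold many parameters, depending on what is needed; typically a group of parameter values is configured together for one specific feature.

3.3.1 Configure the basic metastore database parameters

(1) Parameter description:

**Table 3.3.1-1 Hive metastore database parameters**
Parameter	Default value	Configured value	Description
javax.jdo.option.ConnectionURL	jdbc:derby:;databaseName=metastore_db;create=true	jdbc:mysql://master/hive?createDatabaseIfNotExist=true	JDBC URL of the metastore database; the default is an embedded Derby database. It should point to the metastore database in MySQL. The "hive" in the URL is the database created in MySQL specifically to store Hive metadata.
javax.jdo.option.ConnectionDriverName	org.apache.derby.jdbc.EmbeddedDriver	com.mysql.cj.jdbc.Driver	Java driver class that Hive uses to connect to MySQL. It must match the version of the MySQL JDBC driver in use. The author's driver is mysql-connector-java-8.0.25.jar.
javax.jdo.option.ConnectionUserName	APP	Hive	User name for connecting to MySQL; the author uses Hive.
javax.jdo.option.ConnectionPassword	mine	******	Password for connecting to MySQL.

(2) Parameter example: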

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://master/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>Hive</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>******</value>
<description>password to use against metastore database</description>
</property>

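3.3.2 Configure Hive output display parameters (the defaults work, but changing them is recommended)

(1) Parameter description:

**Table 3.3.2-1 Hive output display parameters**
Parameter	Default value	Configured value	Description
hive.cli.print.header	false	true	Whether the Hive CLI prints column names in query output. Set to true to show them.
hive.resultset.use.unique.column.names	true	false	Whether column names in query output are qualified with the table name. When true, column names carry the table name as a prefix.
hive.cli.print.current.db	false	true	Whether the Hive CLI prompt includes the current database name. When true, the prompt shows the database name as a reminder.

(2) Parameter example: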

<property>
<name>hive.cli.print.header</name>
<value>true</value>
<description>Whether to print the names of the columns in query output.</description>
</property>
<property>
<name>hive.resultset.use.unique.column.names</name>
<value>false</value>
<description>
Make column names unique in the result set by qualifying column names with table alias if needed.
Table alias will be added to column names for queries of type "select *" or
if query explicitly uses table alias "select r1.x..".
</description>
</property>
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
<description>Whether to include the current database in the Hive prompt.</description>
</property>

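3.3.3 Configure Hive directories on HDFS (critical)

(1) Parameter description:

**Table 3.3.3-1 Hive directory parameters**
Parameter	Default value	Configured value	Description
hive.metastore.warehouse.dir	/user/hive/warehouse	/user/hive/warehouse	HDFS location of the data warehouse, where all table data is stored. The default can be kept.
hive.exec.scratchdir	/tmp/hive	/user/hive/tmp	HDFS directory where Hive stores the map/reduce execution plans of the different stages, as well as intermediate output.
hive.querylog.location	${system:java.io.tmpdir}/${system:user.name}	/tmp/hive/Log	Location of Hive's run-time structured log files.
hive.exec.local.scratchdir	${system:java.io.tmpdir}/${system:user.name}	/tmp/hive/Local	Local scratch space for Hive jobs.
hive.downloaded.resources.dir	${system:java.io.tmpdir}/${hive.session.id}_resources	/tmp/hive/Resources	Temporary local directory for resources added from the remote file system.

(2) Parameter example: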

<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/user/hive/tmp</value>
<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/<username> is created, with ${hive.scratch.dir.permission}.</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/tmp/hive/Log</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/tmp/hive/Local</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/tmp/hive/Resources</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
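
After editing hive-site.xml it can be useful to read a value back and confirm it is what was intended. The sketch below is not a general XML parser: get_hive_property is a hypothetical helper that assumes each name and value element sits on its own line, as in the examples in this article.

```shell
# Hypothetical helper: print the <value> of a named property.
# Assumes <name> and <value> each sit on their own line.
# Usage: get_hive_property <hive-site.xml> <property_name>
get_hive_property() {
  awk -v key="$2" '
    /<name>/ {
      name = $0
      sub(/.*<name>/, "", name)
      sub(/<\/name>.*/, "", name)
      found = (name == key)      # remember whether this is our property
    }
    found && /<value>/ {
      value = $0
      sub(/.*<value>/, "", value)
      sub(/<\/value>.*/, "", value)
      print value
      exit
    }
  ' "$1"
}

# Example:
# get_hive_property "$HIVE_HOME/conf/hive-site.xml" hive.exec.scratchdir
```

If the property is absent, the function prints nothing, which makes it easy to test with [ -z ... ].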

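(3) Create the Hive directories on HDFS:

Enter the following commands in a Bash shell (-p creates missing parent directories and does not fail if the directory already exists):

hadoop fs -mkdir -p /tmp

hadoop fs -mkdir -p /user/hive/warehouse

hadoop fs -chmod g+w /tmp

hadoop fs -chmod g+w /user/hive/warehouse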
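
The same commands can be looped over a list of directories, with a dry-run switch to preview them first. This is a sketch under assumptions: run_or_print, create_hive_dirs, and the DRY_RUN variable are hypothetical names, and adding /user/hive/tmp to the list is a suggestion based on the hive.exec.scratchdir value configured above, not a step from the original instructions.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run_or_print() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: create each HDFS directory and make it
# group-writable.
# Usage: create_hive_dirs <dir>...
create_hive_dirs() {
  for dir in "$@"; do
    run_or_print hadoop fs -mkdir -p "$dir" || return 1
    run_or_print hadoop fs -chmod g+w "$dir" || return 1
  done
}

# Example: preview first, then run for real.
# DRY_RUN=1 create_hive_dirs /tmp /user/hive/warehouse /user/hive/tmp
# create_hive_dirs /tmp /user/hive/warehouse /user/hive/tmp
```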

3.3.4 Configure HiveServer and HiveServer2

(1) Parameter description:

(2) Parameter example:

(3) Parameters

3.4 Initialize Hive

Initialization is the step in which Hive creates its metadata tables in the MySQL database.

Command: schematool -initSchema -dbType mysql
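
A small wrapper can check that schematool is reachable before running the initialization, and print the schema information afterwards. This is only a sketch: init_hive_schema is a hypothetical name, and the follow-up call relies on schematool's -info option to report the metastore schema version.

```shell
# Hypothetical helper: initialize the metastore schema in MySQL, then
# show the schema information as a quick verification.
init_hive_schema() {
  if ! command -v schematool >/dev/null 2>&1; then
    echo "schematool not found; is \$HIVE_HOME/bin on PATH?" >&2
    return 127
  fi
  schematool -dbType mysql -initSchema && schematool -dbType mysql -info
}

# Example: init_hive_schema
```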

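3.5 Start the Hive services

Hive provides two services to start: the metastore service and HiveServer2.

Command 1: hive --service metastore

Command 2: hive --service hiveserver2

Command 1 is the foundation for normal use of Hive and must be started.

Command 2 needs to be started only when Hive is to provide the HiveServer2 service.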
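
Both commands run in the foreground, so in practice they are often started in the background with their output sent to a log file. The sketch below shows one way to do that; start_hive_service and the log directory are hypothetical choices, and the beeline example assumes HiveServer2 listens on its default port 10000.

```shell
# Hypothetical helper: start one Hive service in the background and
# write its output to <log_dir>/<service>.log.
# Usage: start_hive_service <service> <log_dir>
start_hive_service() {
  service=$1
  log_dir=$2
  if ! command -v hive >/dev/null 2>&1; then
    echo "hive not found; is \$HIVE_HOME/bin on PATH?" >&2
    return 127
  fi
  mkdir -p "$log_dir" || return 1
  nohup hive --service "$service" > "$log_dir/$service.log" 2>&1 &
  echo "started $service (pid $!)"
}

# Example: metastore first, then HiveServer2.
# start_hive_service metastore /tmp/hive/Log
# start_hive_service hiveserver2 /tmp/hive/Log
# Then connect with beeline (default port assumed):
# beeline -u jdbc:hive2://localhost:10000
```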

3.6 Parameter tuning

See the author's page below:

https://www.cnblogs.com/LankeHome/articles/15531753.html

posted on 2021-11-17 01:09 LankeHome