Hive Study Notes

These notes cover Hive's basic concepts, common usage, and its integration with other big-data tools.

1. Introduction to Hive

Hive is a data warehouse tool built on top of Hadoop. It maps structured data files to database tables and provides HQL (Hive Query Language), a SQL-like query language. Hive executes queries by translating HQL statements into MapReduce jobs. It also supports user-defined functions (UDFs) and plugins, which make it easy to extend.
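
To illustrate extensibility, here is a minimal sketch of a custom UDF in Java, using the classic org.apache.hadoop.hive.ql.exec.UDF base class (the class name LowerCaseUDF is made up for this example):

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// A trivial UDF that lower-cases a string column.
public class LowerCaseUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().toLowerCase());
    }
}

After packaging the class into a JAR, it would be registered in the Hive shell with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HQL.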

2. Installing and Deploying Hive

2.1 Single-Node Deployment

2.1.1 Install Hadoop

First install Hadoop, either from the official binary distribution or by compiling it from source.

2.1.2 Download and Extract Hive

Download the Hive tarball from the official website and extract it.

2.1.3 Configure Environment Variables

Add the following to your .bashrc file:

export HADOOP_HOME=/path/to/hadoop
export HIVE_HOME=/path/to/hive
export PATH=$PATH:$HADOOP_HOME/bin:$HIVE_HOME/bin

2.1.4 Start Hive

Run the following command in a terminal to start the Hive CLI:

hive

2.2 Cluster Deployment

2.2.1 Install a Hadoop Cluster

Hive can be deployed on a Hadoop cluster either by installing it on every node, or by installing it on a single node that connects remotely to the Hadoop cluster running on the other nodes (for example, through HiveServer2's JDBC interface); a connection sketch follows below.
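
As a minimal sketch of the remote-access mode (the hostname hive-node.example.com is a placeholder assumption; 10000 is HiveServer2's default port):

import java.sql.Connection;
import java.sql.DriverManager;

public class RemoteHiveConnection {
    public static void main(String[] args) throws Exception {
        // Connect from any client machine to HiveServer2 running on a cluster node
        Connection con = DriverManager.getConnection(
                "jdbc:hive2://hive-node.example.com:10000/default", "", "");
        System.out.println("Connected: " + !con.isClosed());
        con.close();
    }
}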

2.2.2 Download and Extract Hive

Download the Hive tarball from the official website and extract it.

2.2.3 Configure Environment Variables

Add the following to your .bashrc file:

export HADOOP_HOME=/path/to/hadoop
export HIVE_HOME=/path/to/hive
export PATH=$PATH:$HADOOP_HOME/bin:$HIVE_HOME/bin

2.2.4 Start Hive

Run the following command in a terminal to start the Hive CLI:

hive

3. Common Hive Commands

3.1 HQL Statements in the Hive Shell

3.1.1 Create a Database

CREATE DATABASE database_name;

3.1.2 Drop a Database

DROP DATABASE database_name;

3.1.3 Create a Table

CREATE TABLE table_name (column_name data_type, column_name data_type, ...)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';

3.1.4 Drop a Table

DROP TABLE table_name;

3.1.5 Load Data

LOAD DATA INPATH 'path/to/data' INTO TABLE table_name;

Note that INPATH refers to a path on HDFS; to load a file from the local filesystem, use LOAD DATA LOCAL INPATH instead.

3.1.6 Query Data

SELECT column_name FROM table_name;

3.2 Java (JDBC) Operations

3.2.1 Create a Database

import java.sql.*;

public class HiveCreateDatabase {
   private static String driverName = "org.apache.hive.jdbc.HiveDriver";

   public static void main(String[] args) throws SQLException, ClassNotFoundException {
      // Register driver and create connection
      Class.forName(driverName);
      Connection connection = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "");

      // Create database
      Statement statement = connection.createStatement();
      String query = "CREATE DATABASE IF NOT EXISTS database_name";
      statement.executeUpdate(query);

      // Close connection and statement
      statement.close();
      connection.close();
   }
}

3.2.2 Create a Table

import java.sql.*;

public class HiveCreateTable {
   private static String driverName = "org.apache.hive.jdbc.HiveDriver";

   public static void main(String[] args) throws SQLException, ClassNotFoundException {
      // Register driver and create connection
      Class.forName(driverName);
      Connection connection = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "");

      // Create table
      Statement statement = connection.createStatement();
      String query = "CREATE TABLE IF NOT EXISTS table_name (column_name data_type, column_name data_type, ...)";
      statement.executeUpdate(query);

      // Close connection and statement
      statement.close();
      connection.close();
   }
}

3.2.3 Create and Configure a Hive Connection

import java.sql.*;

public class HiveJdbcClient {
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) {
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            System.exit(1);
        }

        Connection con = null;
        Statement stmt = null;
        ResultSet rs = null;

        try {
            con = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "");
            stmt = con.createStatement();
            String tableName = "test";
            stmt.execute("CREATE TABLE IF NOT EXISTS " + tableName + " (key INT, value STRING)");
            rs = stmt.executeQuery("SELECT * FROM " + tableName);

            while (rs.next()) {
                System.out.println(rs.getInt(1) + " " + rs.getString(2));
            }
        } catch (SQLException e) {
            e.printStackTrace();
        } finally {
            try {
                if (rs != null) {
                    rs.close();
                }
                if (stmt != null) {
                    stmt.close();
                }
                if (con != null) {
                    con.close();
                }
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
    }
}

3.2.4 Execute HQL Queries

import java.sql.*;

public class HiveJdbcClient {
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) {
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            System.exit(1);
        }

        Connection con = null;
        Statement stmt = null;
        ResultSet rs = null;

        try {
            con = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "");
            stmt = con.createStatement();
            String tableName = "test";
            stmt.execute("CREATE TABLE IF NOT EXISTS " + tableName + " (key INT, value STRING)");
            stmt.execute("LOAD DATA LOCAL INPATH '/path/to/data' INTO TABLE " + tableName);
            rs = stmt.executeQuery("SELECT * FROM " + tableName);

            while (rs.next()) {
                System.out.println(rs.getInt(1) + " " + rs.getString(2));
            }
        } catch (SQLException e) {
            e.printStackTrace();
        } finally {
            try {
                if (rs != null) {
                    rs.close();
                }
                if (stmt != null) {
                    stmt.close();
                }
                if (con != null) {
                    con.close();
                }
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
    }
}

4. Hive Integration

4.1 Integrating HBase with Hive

HBase is a distributed, column-oriented NoSQL database. Integrating HBase with Hive makes it possible to combine real-time lookups with batch analysis. The detailed integration steps are omitted here; a brief sketch follows below.
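
As a rough illustration (assuming the Hive HBase handler JARs are on HiveServer2's classpath and an HBase table named hbase_table already exists; the table and column names are hypothetical), an HBase-backed Hive table can be declared over JDBC like this:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveHBaseIntegrationSketch {
    public static void main(String[] args) throws Exception {
        // Connect to HiveServer2 as in the earlier examples
        Connection con = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "", "");
        try (Statement stmt = con.createStatement()) {
            // Map a Hive table onto the existing HBase table via the
            // HBaseStorageHandler: ":key" maps to the HBase row key and
            // "cf:val" to the column "val" in column family "cf".
            stmt.execute(
                "CREATE EXTERNAL TABLE IF NOT EXISTS hbase_backed "
                + "(key STRING, value STRING) "
                + "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' "
                + "WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:val') "
                + "TBLPROPERTIES ('hbase.table.name' = 'hbase_table')");
        } finally {
            con.close();
        }
    }
}

Rows written to the HBase table then become queryable from Hive, while HBase continues to serve low-latency reads and writes.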

4.2 Integrating Spark with Hive

Spark is a fast, general-purpose distributed computing engine. Integrating Spark with Hive enables efficient analysis of data managed in the Hive metastore, as sketched below.
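
A minimal sketch in Java (assuming a Spark build with Hive support and the spark-sql dependency on the classpath; the table name test reuses the one from the JDBC examples above):

import org.apache.spark.sql.SparkSession;

public class SparkHiveSketch {
    public static void main(String[] args) {
        // enableHiveSupport() makes Spark SQL read tables from the Hive metastore
        SparkSession spark = SparkSession.builder()
                .appName("SparkHiveSketch")
                .enableHiveSupport()
                .getOrCreate();

        // Run an HQL-style query against an existing Hive table
        spark.sql("SELECT key, value FROM test").show();

        spark.stop();
    }
}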

5. Install Script and Dockerfile

5.1 Install Script

The following script installs Hive automatically:

#!/bin/bash

# Install Java
sudo apt-get update
sudo apt-get install -y openjdk-8-jdk

# Download Hadoop
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz
tar zxvf hadoop-3.2.2.tar.gz

# Configure Hadoop (single quotes defer $PATH/$HADOOP_HOME expansion until .bashrc is sourced)
echo "export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64" >> ~/hadoop-3.2.2/etc/hadoop/hadoop-env.sh
echo "export HADOOP_HOME=$HOME/hadoop-3.2.2" >> ~/.bashrc
echo 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' >> ~/.bashrc
source ~/.bashrc

# Download Hive
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
tar zxvf apache-hive-3.1.2-bin.tar.gz

# Configure Hive
echo "export HIVE_HOME=$HOME/apache-hive-3.1.2-bin" >> ~/.bashrc
echo 'export PATH=$PATH:$HIVE_HOME/bin' >> ~/.bashrc
source ~/.bashrc

# Format HDFS and start it
cd ~/hadoop-3.2.2
bin/hdfs namenode -format
sbin/start-dfs.sh

# Initialize the metastore schema (embedded Derby) and start the Hive metastore
cd ~/apache-hive-3.1.2-bin
bin/schematool -initSchema -dbType derby
nohup bin/hive --service metastore > metastore.log 2>&1 &

5.2 Dockerfile

Alternatively, a Dockerfile can be used to build a Hive image:

FROM openjdk:8-jdk

# Set the container time zone to Asia/Shanghai
RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

# Switch the apt sources to the Aliyun mirror
RUN sed -i 's/deb.debian.org/mirrors.aliyun.com/g' /etc/apt/sources.list
RUN sed -i 's/security.debian.org/mirrors.aliyun.com/g' /etc/apt/sources.list

# Install required packages
RUN apt-get update && \
    apt-get install -y wget net-tools procps && \
    rm -rf /var/lib/apt/lists/*

# Download Hadoop
RUN wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz && \
    tar zxvf hadoop-3.3.0.tar.gz && \
    rm hadoop-3.3.0.tar.gz

# Configure Hadoop
ENV JAVA_HOME /usr/local/openjdk-8
ENV HADOOP_HOME /hadoop-3.3.0
ENV PATH $PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

RUN echo "export JAVA_HOME=/usr/local/openjdk-8" >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh && \
    echo "export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin" >> /root/.bashrc

# Download Hive
RUN wget https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz && \
    tar zxvf apache-hive-3.1.2-bin.tar.gz && \
    rm apache-hive-3.1.2-bin.tar.gz

# Configure Hive
ENV HIVE_HOME /apache-hive-3.1.2-bin
ENV PATH $PATH:$HIVE_HOME/bin

RUN echo "export HIVE_HOME=/apache-hive-3.1.2-bin" >> /root/.bashrc && \
    echo "export PATH=$PATH:$HIVE_HOME/bin" >> /root/.bashrc