Building a Container Image for Any Hive 3 Minor Version

Code Repository

https://gitee.com/ysxz2025/hive3-docker.build

Overview

Builds a container image for any Hive 3 minor version. Hive 4 has official images that can be used directly; for custom development, follow the official Hive documentation and build the code and container image from a US network environment.
The method in the official documentation suits building container images from a US network environment; this document targets network conditions in mainland China.
For container-registry mirror acceleration, see the site 1ms.run; a configuration sketch follows.
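A minimal mirror-setup sketch for the Docker host (the endpoint https://docker.1ms.run is an assumption; check 1ms.run for the current address):

# Write the registry mirror config and restart the Docker daemon
sudo tee /etc/docker/daemon.json <<'EOF'
{
  "registry-mirrors": ["https://docker.1ms.run"]
}
EOF
sudo systemctl restart docker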

References

Hive official documentation

Resources

entrypoint.sh is the standard built-in file shipped by upstream Hive; do not modify it.
software holds the software packages.
conf holds the standard upstream Hive configuration files; do not modify them. For Docker deployments, mount configuration files into the container separately; for Kubernetes deployments, configure them via a ConfigMap.
Dockerfile.hub holds a ready-to-use Dockerfile; copy it into the current directory to use it.
build.sh is the official Hive 4 script and is used only to build Hive 4.

Image Build Workflow

  1. Choose a Hive version and place the required software in the software directory.
  2. Build the container image.
  3. Test the container image.
  4. Upload the container image: once testing confirms it works, push it to an image registry for production use (see the sketch after this list).
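A sketch of step 4 (the registry address and repository path are placeholder assumptions; substitute your own):

docker tag hive:dev1 registry.example.com/bigdata/hive:dev1
docker push registry.example.com/bigdata/hive:dev1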

Prepare the Software

See the README file in the software directory for details; the software must be prepared before building. A download sketch follows.
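A sketch matching the versions shown in the directory tree later in this document (the versions are examples; check compatibility and the release lists in the software README before downloading):

cd software
wget https://archive.apache.org/dist/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.1/hadoop-3.1.1.tar.gz
wget https://archive.apache.org/dist/tez/0.9.2/apache-tez-0.9.2-bin.tar.gz
wget https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-hive-runtime/1.4.2/iceberg-hive-runtime-1.4.2.jar
wget https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.0.33/mysql-connector-j-8.0.33.jar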

Build the Hive Image

hive:dev1 can be replaced with a custom image name.

docker build -t hive:dev1 .

If the image has been rebuilt many times and tests keep failing oddly, use the following command to skip the cache and do a clean build:

docker build --no-cache -t hive:dev1 .

Quick Image Test

Quick-start the metastore:

docker run -d -p 9083:9083 --env SERVICE_NAME=metastore --name metastore-standalone hive:dev1
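By default the metastore initializes an embedded Derby schema inside the container. A sketch of backing it with an external MySQL database instead, via the DB_DRIVER and SERVICE_OPTS variables handled by entrypoint.sh below (host, database name, and credentials are placeholder assumptions; the mysql-connector-j jar from software/ must be present in the image):

docker run -d -p 9083:9083 \
  --env SERVICE_NAME=metastore \
  --env DB_DRIVER=mysql \
  --env SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver -Djavax.jdo.option.ConnectionURL=jdbc:mysql://mysql-host:3306/metastore_db -Djavax.jdo.option.ConnectionUserName=hive -Djavax.jdo.option.ConnectionPassword=hive" \
  --name metastore-standalone hive:dev1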

Quick-start HiveServer2:

docker run -d -p 10000:10000 -p 10002:10002 --env SERVICE_NAME=hiveserver2 --name hiveserver2 hive:dev1
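The command above runs HiveServer2 with its own embedded metastore. A sketch of attaching it to the standalone metastore started earlier instead (assumes both containers share a user-defined Docker network, e.g. created with docker network create hive-net, so the name metastore-standalone resolves; IS_RESUME=true skips a second schema initialization):

docker run -d -p 10000:10000 -p 10002:10002 \
  --env SERVICE_NAME=hiveserver2 \
  --env IS_RESUME=true \
  --env SERVICE_OPTS="-Dhive.metastore.uris=thrift://metastore-standalone:9083" \
  --network hive-net \
  --name hiveserver2 hive:dev1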

Check the container status and logs: the status must be Up and the logs must be free of errors.

docker ps -a
docker logs metastore-standalone 2>&1 | grep -i error
docker logs hiveserver2 2>&1 | grep -i error
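As a functional check, run a simple query through Beeline inside the HiveServer2 container (a sketch assuming the default port and no authentication):

docker exec -it hiveserver2 beeline -u 'jdbc:hive2://localhost:10000/' -e 'SHOW DATABASES;'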

Use in Kubernetes

The following configuration may be needed:

securityContext:
  runAsUser: 1000
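A fuller sketch of where this sits in a pod spec (runAsGroup and fsGroup are assumptions that match the uid/gid 1000 hive user created in the Dockerfile; adjust to your storage setup):

spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000
  containers:
    - name: metastore
      image: hive:dev1
      env:
        - name: SERVICE_NAME
          value: metastore
      ports:
        - containerPort: 9083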

Directory Structure

.
├── conf
│   ├── hive-log4j2.properties
│   ├── hive-site.xml
│   └── README.md
├── Dockerfile
├── entrypoint.sh
├── README.md
└── software
    ├── apache-hive-3.1.2-bin.tar.gz
    ├── apache-tez-0.9.2-bin.tar.gz
    ├── hadoop-3.1.1.tar.gz
    ├── iceberg-hive-runtime-1.4.2.jar
    ├── mysql-connector-j-8.0.33.jar
    └── README.md

The Dockerfile contents are as follows:

# Base image: Ubuntu 22.04 LTS
FROM ubuntu:22.04

# Disable interactive apt prompts during the build
ENV DEBIAN_FRONTEND=noninteractive

# Core environment variables (fixed paths, no hard-coded versions)
ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
ENV HADOOP_HOME=/opt/hadoop
ENV HIVE_HOME=/opt/hive
ENV TEZ_HOME=/opt/tez
ENV HIVE_LOG_DIR=/var/log/hive
# Consolidated system PATH
ENV PATH="${JAVA_HOME}/bin:${HIVE_HOME}/bin:${HADOOP_HOME}/bin:${TEZ_HOME}/bin:${PATH}"

# Step 1: install base dependencies (fixes the envsubst dependency and a syntax error)
# Note: envsubst ships in the gettext-base package; Ubuntu 22.04 has no standalone envsubst package
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        openjdk-8-jdk \
        sudo \
        findutils \
        gettext-base \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Step 2: copy the software directory (all tarballs and jars in one place)
COPY software/ /tmp/software/

# Step 3: unpack the packages + core fixes (Guava/SLF4J)
# Key point: handle nonzero exits from find under set -eux, and widen the Guava deletion scope
RUN set -eux; \
    # ===================== Unpack Hadoop (required; fail the build if missing) =====================
    HADOOP_TAR=$(find /tmp/software -name "hadoop-*.tar.gz" -type f | head -n1); \
    if [ -z "$HADOOP_TAR" ]; then echo "ERROR: Hadoop package not found"; exit 1; fi; \
    tar -xzf "$HADOOP_TAR" -C /opt/ && \
    mv /opt/hadoop-* /opt/hadoop && \
    rm -rf /opt/hadoop/share/doc/*; \
    \
    # ===================== Unpack Hive (required; fail the build if missing) =====================
    HIVE_TAR=$(find /tmp/software -name "apache-hive-*-bin.tar.gz" -type f | head -n1); \
    if [ -z "$HIVE_TAR" ]; then echo "ERROR: Hive package not found"; exit 1; fi; \
    tar -xzf "$HIVE_TAR" -C /opt/ && \
    mv /opt/apache-hive-*-bin /opt/hive; \
    # Back up Hive's bundled newer Guava first (so Tez's older copy cannot clobber it)
    HIVE_ORIG_GUAVA=$(find /opt/hive/lib -name "guava-*.jar" | head -n1); \
    if [ -n "$HIVE_ORIG_GUAVA" ]; then \
        cp -f "$HIVE_ORIG_GUAVA" /tmp/hive_guava_backup.jar; \
    fi; \
    \
    # ===================== Unpack Tez (optional; skip if missing) =====================
    TEZ_TAR=$(find /tmp/software -name "apache-tez-*-bin.tar.gz" -type f | head -n1); \
    if [ -n "$TEZ_TAR" ]; then \
        tar -xzf "$TEZ_TAR" -C /opt/ && \
        mv /opt/apache-tez-*-bin /opt/tez && \
        rm -rf /opt/tez/share/*; \
        # Copy Tez dependencies into Hive lib (this drags in an older Guava; removed below)
        cp $TEZ_HOME/*.jar $HIVE_HOME/lib/; \
        cp $TEZ_HOME/lib/*.jar $HIVE_HOME/lib/; \
    else \
        echo "INFO: Tez package not found, skipping Tez setup"; \
    fi; \
    \
    # ===================== Copy every jar under software into Hive lib (skip if none) =====================
    cp /tmp/software/*.jar $HIVE_HOME/lib/ 2>/dev/null || true; \
    \
    # ===================== Hardened Guava version-conflict fix (the core fix) =====================
    # 1. Fixed find invocation: skip missing directories and do not abort when nothing matches
    #    -ignore_readdir_race: ignore directory-read races (a global option, so it goes before the tests)
    #    || true: force a zero exit so set -e does not kill the script
    find /opt/hadoop /opt/tez -ignore_readdir_race -name "guava-*.jar" -delete || true; \
    # 2. Delete every Guava in Hive lib (including the older copy brought in by Tez)
    find /opt/hive/lib -ignore_readdir_race -name "guava-*.jar" -delete || true; \
    # 3. Restore Hive's bundled newer Guava
    if [ -f "/tmp/hive_guava_backup.jar" ]; then \
        cp -f /tmp/hive_guava_backup.jar /opt/hive/lib/; \
        HIVE_GUAVA="/opt/hive/lib/$(basename /tmp/hive_guava_backup.jar)"; \
        # 4. Copy the newer Guava into every core directory
        cp -f "$HIVE_GUAVA" /opt/hadoop/share/hadoop/common/lib/; \
        cp -f "$HIVE_GUAVA" /opt/hadoop/share/hadoop/hdfs/lib/; \
        cp -f "$HIVE_GUAVA" /opt/hadoop/share/hadoop/mapreduce/lib/; \
        cp -f "$HIVE_GUAVA" /opt/hadoop/share/hadoop/yarn/lib/; \
        # Copy into Tez lib (if present)
        [ -d "$TEZ_HOME" ] && cp -f "$HIVE_GUAVA" /opt/tez/lib/; \
        echo "✅ Guava versions unified: $(basename $HIVE_GUAVA)"; \
        # Remove the temporary backup
        rm -f /tmp/hive_guava_backup.jar; \
    else \
        echo "❌ No Guava jar found in Hive lib, aborting the build!"; \
        exit 1; \
    fi; \
    \
    # ===================== Remove SLF4J multiple-binding jars (optional; cleaner logs) =====================
    rm -f /opt/hive/lib/slf4j-log4j12-*.jar || true; \
    rm -f /opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-*.jar || true; \
    \
    # ===================== Clean up temporary files (smaller image) =====================
    rm -rf /tmp/software /opt/*.tar.gz;

# Step 4: copy the configuration files and the entrypoint script
COPY entrypoint.sh  /
COPY conf           ${HIVE_HOME}/conf

# Step 5: create the user + fix permissions (security best practice)
RUN set -eux; \
    # Create the hive group (gid=1000)
    groupadd --gid 1000 hive; \
    # Create the hive user (uid=1000, primary group hive, home directory /opt/hive)
    useradd --uid 1000 --gid 1000 --home /opt/hive --create-home hive; \
    # Allow passwordless sudo for the hive user (avoids permission problems)
    echo "hive ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers; \
    # Make the entrypoint script executable
    chmod +x /entrypoint.sh; \
    # Create the directories Hive needs at runtime (avoids missing directories at startup)
    mkdir -p \
        /opt/hive/data/warehouse \
        /home/hive/.beeline \
        ${HIVE_LOG_DIR} \
        /data \
        /tmp/hive \
        /tmp/hive/_resultscache_; \
    # Set ownership uniformly so the hive user can read and write
    chown -R hive:hive \
        /opt/hadoop \
        /opt/hive \
        /opt/tez \
        ${HIVE_LOG_DIR} \
        /data \
        /home/hive \
        /tmp/hive \
        /opt; \
    # Set directory permissions
    chmod -R 755 \
        /opt/hadoop \
        /opt/hive \
        /opt/tez \
        ${HIVE_LOG_DIR} \
        /data \
        /home/hive \
        /tmp/hive;

# Restore the default DEBIAN_FRONTEND value
ENV DEBIAN_FRONTEND=dialog

# Switch to the hive user (run as non-root, per security practice)
USER hive
WORKDIR /opt/hive

# Expose core ports: HiveServer2 (10000), Web UI (10002), Metastore (9083)
EXPOSE 10000 10002 9083

# Entrypoint script (exec inside it preserves signal delivery for graceful shutdown)
ENTRYPOINT ["/entrypoint.sh"]

The entrypoint.sh contents are as follows:

#!/bin/bash

set -x

# =========================================================================
# DYNAMIC JAR LOADER (AWS/S3 Support)
# =========================================================================
STAGING_DIR="/tmp/ext-jars"

# Checks if /tmp/ext-jars is mounted (via Docker volume).
if [ -d "$STAGING_DIR" ]; then
  if ls "$STAGING_DIR"/*.jar 1> /dev/null 2>&1; then
    echo "--> Copying custom jars from volume to Hive..."
    cp -vf "$STAGING_DIR"/*.jar "${HIVE_HOME}/lib/"
  else
    echo "--> Volume mounted at $STAGING_DIR, but no jars found."
  fi
fi

# =========================================================================
# REPLACE ${VARS} IN THE CONFIG TEMPLATES
# =========================================================================
: "${HIVE_WAREHOUSE_PATH:=/opt/hive/data/warehouse}"
export HIVE_WAREHOUSE_PATH

# Render templates only if they exist (the bundled conf ships plain XML files)
if [ -f "$HIVE_HOME/conf/core-site.xml.template" ]; then
  envsubst < "$HIVE_HOME/conf/core-site.xml.template" > "$HIVE_HOME/conf/core-site.xml"
fi
if [ -f "$HIVE_HOME/conf/hive-site.xml.template" ]; then
  envsubst < "$HIVE_HOME/conf/hive-site.xml.template" > "$HIVE_HOME/conf/hive-site.xml"
fi
# =========================================================================

# Fix 1: give every key variable a default so nothing is left empty
: "${DB_DRIVER:=derby}"
: "${HIVE_VER:=3.1.3}"  # default Hive version for images that do not set HIVE_VER
: "${VERBOSE:=false}"
: "${IS_RESUME:=false}"
: "${SCHEMA_COMMAND:=initSchema}"
: "${SERVICE_NAME:=metastore}"  # start the metastore by default
: "${SERVICE_OPTS:=}"
: "${TEZ_HOME:=/opt/tez}"  # tolerate setups without Tez

SKIP_SCHEMA_INIT="${IS_RESUME}"
VERBOSE_MODE=""
[[ $VERBOSE = "true" ]] && VERBOSE_MODE="--verbose"

function initialize_hive {
  # Fix 2: replace the invalid initOrUpgradeSchema argument and adapt to the Hive version
  local COMMAND=""
  HIVE_MAJOR_VER=$(echo "$HIVE_VER" | cut -d '.' -f1)

  # Handle an empty or non-numeric HIVE_MAJOR_VER to avoid integer-comparison errors
  if ! [[ "$HIVE_MAJOR_VER" =~ ^[0-9]+$ ]]; then
    echo "WARN: cannot determine the Hive major version, defaulting to -initSchema"
    COMMAND="-${SCHEMA_COMMAND}"
  elif [ "$HIVE_MAJOR_VER" -lt "4" ]; then
    COMMAND="-${SCHEMA_COMMAND}"
  else
    # Hive 4.x also uses -initSchema (there is no initOrUpgradeSchema argument)
    COMMAND="-${SCHEMA_COMMAND}"
  fi

  # Run schematool, with error handling
  if [[ -n "$VERBOSE_MODE" ]]; then
    "$HIVE_HOME/bin/schematool" -dbType "$DB_DRIVER" "$COMMAND" "$VERBOSE_MODE"
  else
    "$HIVE_HOME/bin/schematool" -dbType "$DB_DRIVER" "$COMMAND"
  fi
  
  if [ $? -eq 0 ]; then
    echo "Initialized Hive Metastore Server schema successfully.."
  else
    echo "Hive Metastore Server schema initialization failed!"
    exit 1
  fi
}

export HIVE_CONF_DIR=$HIVE_HOME/conf
if [ -d "${HIVE_CUSTOM_CONF_DIR:-}" ]; then
  find "${HIVE_CUSTOM_CONF_DIR}" -type f -exec \
    ln -sfn {} "${HIVE_CONF_DIR}"/ \;
  export HADOOP_CONF_DIR=$HIVE_CONF_DIR
  export TEZ_CONF_DIR=$HIVE_CONF_DIR
fi

export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Xmx1G $SERVICE_OPTS"
if [[ "${SKIP_SCHEMA_INIT}" == "false" ]]; then
  # handles schema initialization
  initialize_hive
fi

# Fix 3: tolerate setups without Tez + handle an unknown SERVICE_NAME
if [ "${SERVICE_NAME}" == "hiveserver2" ]; then
  # Add Tez to the CLASSPATH only when the TEZ_HOME directory exists
  if [ -d "$TEZ_HOME" ]; then
    export HADOOP_CLASSPATH="$TEZ_HOME/*:$TEZ_HOME/lib/*:$HADOOP_CLASSPATH"
  fi
  exec "$HIVE_HOME/bin/hive" --skiphadoopversion --skiphbasecp --service "$SERVICE_NAME"
elif [ "${SERVICE_NAME}" == "metastore" ]; then
  export METASTORE_PORT=${METASTORE_PORT:-9083}
  if [[ -n "$VERBOSE_MODE" ]]; then
    exec "$HIVE_HOME/bin/hive" --skiphadoopversion --skiphbasecp "$VERBOSE_MODE" --service "$SERVICE_NAME"
  else
    exec "$HIVE_HOME/bin/hive" --skiphadoopversion --skiphbasecp --service "$SERVICE_NAME"
  fi
else
  # Fix 4: reject an unknown SERVICE_NAME with a message and exit
  echo "ERROR: unsupported SERVICE_NAME: ${SERVICE_NAME}; only hiveserver2/metastore are supported"
  exit 1
fi
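The dynamic jar loader at the top of the script means extra jars (for example, a different JDBC driver) can be injected at run time without rebuilding the image; a usage sketch (the host path is an assumption):

docker run -d -p 9083:9083 \
  --env SERVICE_NAME=metastore \
  -v /path/to/extra-jars:/tmp/ext-jars \
  --name metastore-standalone hive:dev1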

The software directory README is as follows:

# Overview
Download the required software into this directory. Decide first which Hive version you will use.

# Checking version compatibility
Hadoop/Hive compatibility: [Hive official documentation](https://hive.apache.org/general/downloads/)  
Hive/Tez compatibility: no official matrix available  
Tez/Hadoop compatibility: [Tez official documentation](https://tez.apache.org/install.html)

# Downloading the software
Unless noted otherwise, everything should be downloaded into this directory.
## Hive (required)
Download the Hive package into this directory; the file name format is `apache-hive-<version>-bin.tar.gz`.    
Only one release package may sit in this directory. For custom builds, the package must still be named `apache-hive-<version>-bin.tar.gz`.
Hive: [release list](https://archive.apache.org/dist/hive/)  

## Hadoop (required)
Download a Hadoop package within the compatible version range into this directory; the file name format is `hadoop-<version>.tar.gz`.  
Only one release package may sit in this directory. For custom builds, the package must still be named `hadoop-<version>.tar.gz`.  
Hadoop: [release list](https://archive.apache.org/dist/hadoop/common/)  

## Tez (optional)
Download a Tez package within the compatible version range into this directory; the file name format is `apache-tez-<version>-bin.tar.gz`.
Tez: [release list 1](https://tez.apache.org/releases/index.html), [release list 2](https://dlcdn.apache.org/tez/)  

## Iceberg (optional, recommended)
Download an Iceberg runtime jar within the compatible version range into this directory; the file name format is `iceberg-hive-runtime-<version>.jar`. Skip the download if you will not use Iceberg.  
> [Iceberg official documentation](https://iceberg.apache.org/docs/nightly/hive/#feature-support)   
[SeaTunnel official documentation](https://seatunnel.apache.org/zh-CN/docs/2.3.12/connector-v2/sink/Iceberg/)  
When packaging a version below Hive 4, `iceberg-hive-runtime` must be downloaded. `libfb303` may also be needed in some setups; it is not needed by default.  

Iceberg download: [release list](https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-hive-runtime/)  
Example: `https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-hive-runtime/1.4.2/iceberg-hive-runtime-1.4.2.jar`  
libfb303 download: [release list](https://repo1.maven.org/maven2/org/apache/thrift/libfb303/), [alternative download](https://mvnrepository.com/artifact/org.apache.thrift/libfb303)  
Example: `https://repo1.maven.org/maven2/org/apache/thrift/libfb303/0.9.3/libfb303-0.9.3.jar`
## JDBC drivers (optional, recommended)
Choose per project needs; used by Hive to connect to its backing database. Most such files are named along the lines of `connector.jar`.
### MySQL
[Release list](https://downloads.mysql.com/archives/c-j/)  
When picking a version, set `Operating System` to `Platform Independent` and `Product Version` to the version you want (the latest is recommended). Extract the downloaded archive and place only the file matching `mysql-connector-j-*.jar` in this directory.
### Oracle
[Release list](https://www.oracle.com/database/technologies/appdev/jdbc-downloads.html)  
Download and place it in this directory.
### PostgreSQL
[Release list](https://jdbc.postgresql.org/download/)  
Download and place it in this directory.

The conf/hive-site.xml contents are as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<configuration>
    <property>
        <name>hive.server2.enable.doAs</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.tez.exec.inplace.progress</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.tez.exec.print.summary</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.exec.scratchdir</name>
        <value>/opt/hive/scratch_dir</value>
    </property>
    <property>
        <name>hive.user.install.directory</name>
        <value>/opt/hive/install_dir</value>
    </property>
    <property>
        <name>tez.runtime.optimize.local.fetch</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.exec.submit.local.task.via.child</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.compactor.worker.threads</name>
        <value>1</value>
    </property>
    <property>
        <name>mapreduce.framework.name</name>
        <value>local</value>
    </property>
    <property>
        <name>tez.local.mode</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/opt/hive/data/warehouse</value>
    </property>
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
</configuration>
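As noted in the resources section, these bundled files should not be edited; override them at run time instead. The entrypoint symlinks every file found under HIVE_CUSTOM_CONF_DIR into the conf directory, so a custom configuration can be mounted like this (the host path is an assumption):

docker run -d -p 9083:9083 \
  --env SERVICE_NAME=metastore \
  --env HIVE_CUSTOM_CONF_DIR=/hive_custom_conf \
  -v /path/to/custom/conf:/hive_custom_conf \
  --name metastore-standalone hive:dev1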

The conf/hive-log4j2.properties contents are as follows:

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

name = HiveLog4j2

# list of properties
property.hive.log.level = INFO
property.hive.root.logger = stdout
property.hive.perflogger.log.level = INFO

# console appender
appender.console.name = stdout
appender.console.type = Console
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{ISO8601} %5p [%t] %c{2}: %m%n

logger.NIOServerCnxn.name = org.apache.zookeeper.server.NIOServerCnxn
logger.NIOServerCnxn.level = WARN

logger.ClientCnxnSocketNIO.name = org.apache.zookeeper.ClientCnxnSocketNIO
logger.ClientCnxnSocketNIO.level = WARN

logger.DataNucleus.name = DataNucleus
logger.DataNucleus.level = ERROR

logger.Datastore.name = Datastore
logger.Datastore.level = ERROR

logger.JPOX.name = JPOX
logger.JPOX.level = ERROR

logger.AmazonAws.name=com.amazonaws
logger.AmazonAws.level = INFO

logger.ApacheHttp.name=org.apache.http
logger.ApacheHttp.level = INFO

logger.PerfLogger.name = org.apache.hadoop.hive.ql.log.PerfLogger
logger.PerfLogger.level = ${sys:hive.perflogger.log.level}

# root logger
rootLogger.level = ${sys:hive.log.level}
rootLogger.appenderRefs = root
rootLogger.appenderRef.root.ref = ${sys:hive.root.logger}
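Because the levels resolve through system properties such as ${sys:hive.log.level}, the log level can be raised for a single run without editing this file. A sketch via SERVICE_OPTS, which the entrypoint appends to HADOOP_CLIENT_OPTS (assumption: the property reaches the JVM before Log4j2 initializes):

docker run -d -p 10000:10000 -p 10002:10002 \
  --env SERVICE_NAME=hiveserver2 \
  --env SERVICE_OPTS="-Dhive.log.level=DEBUG" \
  --name hiveserver2 hive:dev1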