ZhangZhihui's Blog  

Download Hive 4.0.1 from https://dlcdn.apache.org/hive/hive-4.0.1/apache-hive-4.0.1-bin.tar.gz .

 .bashrc:

export HIVE_HOME=$sfw/hive-4.0.1
export HIVE_CONF_DIR=$HIVE_HOME/conf
export PATH=$PATH:$HIVE_HOME/bin

export HADOOP_CLASSPATH=$(hadoop classpath):$HIVE_HOME/lib/*

HADOOP_CLASSPATH lists additional Java classes for Hadoop to load. Each entry is either a directory containing class files or a JAR file.

Accepted values:

  1. Directories containing compiled class files:

    export HADOOP_CLASSPATH=/path/to/classes/
  2. JAR files (you can specify multiple JARs using : as a separator):

    export HADOOP_CLASSPATH=/path/to/lib/some-library.jar:/path/to/another-lib.jar
  3. Wildcards for multiple JARs:

    export HADOOP_CLASSPATH=/path/to/lib/*

    This includes all JAR files in /path/to/lib/.
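The three forms above can be combined in one value. A minimal sketch, assuming a hypothetical helper named join_classpath (it is not part of Hadoop) that joins its arguments with ":". The wildcard entry is quoted so the shell passes it through unchanged and Java expands it later:

```shell
# join_classpath is a hypothetical helper, not a Hadoop command.
# It joins all of its arguments with ':' to form a classpath string.
# The function body runs in a subshell so the IFS change does not leak.
join_classpath() (
  IFS=:
  printf '%s\n' "$*"
)

# Directory, single JAR, and wildcard entries in one value.
classpath=$(join_classpath /path/to/classes /path/to/lib/some-library.jar '/path/to/lib/*')
echo "$classpath"
# prints: /path/to/classes:/path/to/lib/some-library.jar:/path/to/lib/*
```

The result can then be exported as HADOOP_CLASSPATH in .bashrc.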

 

$HIVE_CONF_DIR/hive-env.sh:

export HADOOP_OPTS="$HADOOP_OPTS \
  --add-opens java.base/java.net=ALL-UNNAMED \
  --add-opens java.base/java.lang=ALL-UNNAMED \
  --add-opens java.base/java.io=ALL-UNNAMED \
  --add-opens java.base/java.nio=ALL-UNNAMED \
  --add-opens java.base/java.math=ALL-UNNAMED \
  --add-opens java.base/java.text=ALL-UNNAMED \
  --add-opens java.base/java.util=ALL-UNNAMED \
  --add-opens java.base/java.util.concurrent=ALL-UNNAMED \
  --add-opens java.base/java.util.concurrent.atomic=ALL-UNNAMED"

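These --add-opens options avoid errors like the one below, which is raised when Hive uses reflection on JDK internals under the Java module system (this setup runs on Java 21):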

java.lang.reflect.InaccessibleObjectException: Unable to make field private volatile java.lang.String java.net.URI.string accessible: module java.base does not "opens java.net" to unnamed module @10163d6
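The option list is long and has to be repeated in several files later on, so it can help to generate it. A minimal sketch, assuming a hypothetical helper named add_opens_flags that turns java.base package names into the corresponding options:

```shell
# add_opens_flags is a hypothetical helper for this post.
# Each argument is a package in java.base; the output is one
# "--add-opens java.base/<package>=ALL-UNNAMED" option per package,
# separated by single spaces.
add_opens_flags() (
  flags=""
  for pkg in "$@"; do
    flags="$flags --add-opens java.base/$pkg=ALL-UNNAMED"
  done
  # Drop the leading space added by the first iteration.
  printf '%s\n' "${flags# }"
)

add_opens_flags java.net java.lang
# prints: --add-opens java.base/java.net=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED
```

In hive-env.sh the output could be appended with export HADOOP_OPTS="$HADOOP_OPTS $(add_opens_flags java.net java.lang ...)", listing the same packages as above.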

 

zzh@ZZHPC:~/Downloads/sfw/hive-4.0.1/conf$ cp hive-default.xml.template hive-site.xml

 

1. Replace all occurrences of ${system:java.io.tmpdir} with /tmp/hive. If you don't, you'll get the error below:

java.lang.RuntimeException: Error applying authorization policy on hive configuration: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D

2. Replace all occurrences of ${system:user.name} with your username (the one you log in with). The affected properties then look like this:

  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp/hive/zzh</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/tmp/hive/${hive.session.id}_resources</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>

 

  <property>
    <name>hive.querylog.location</name>
    <value>/tmp/hive/zzh</value>
    <description>Location of Hive run time structured log file</description>
  </property>

 

  <property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/tmp/hive/zzh/operation_logs</value>
    <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
  </property>
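Steps 1 and 2 can be done in an editor, or scripted. A minimal sketch with sed, assuming a hypothetical helper named replace_placeholders and a username that contains no characters special to sed (such as | or &). It is demonstrated on a scratch file rather than on the real hive-site.xml:

```shell
# replace_placeholders FILE USER  (hypothetical helper for this post)
# Rewrites ${system:java.io.tmpdir} to /tmp/hive and ${system:user.name}
# to USER. It writes to a temporary file and moves it into place, so it
# does not depend on the non-portable "sed -i".
replace_placeholders() {
  sed \
    -e 's|\${system:java\.io\.tmpdir}|/tmp/hive|g' \
    -e 's|\${system:user\.name}|'"$2"'|g' \
    "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Demo on a scratch file containing one placeholder line.
demo=$(mktemp)
printf '%s\n' '<value>${system:java.io.tmpdir}/${system:user.name}</value>' > "$demo"
replace_placeholders "$demo" zzh
cat "$demo"
# prints: <value>/tmp/hive/zzh</value>
rm -f "$demo"
```

For the real file, the call would be replace_placeholders "$HIVE_CONF_DIR/hive-site.xml" zzh, ideally after making a backup copy.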

 

3. Change the properties below:

  <property>
    <name>hive.druid.basePersistDirectory</name>
    <value>/tmp/hive</value>
    <description>Local temporary directory used to persist intermediate indexing state, will default to JVM system property java.io.tmpdir.</description>
  </property>

 

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:/home/zzh/hive/metastore_db;create=true</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>

 

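The log file is /tmp/zzh/hive.log. I did not find a setting for this location in hive-site.xml; it is probably controlled by property.hive.log.dir in $HIVE_CONF_DIR/hive-log4j2.properties, whose template defaults to ${sys:java.io.tmpdir}/${sys:user.name}.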

 

4. In $HADOOP_HOME/etc/hadoop/core-site.xml, add the properties below:

<property>
    <name>hadoop.proxyuser.zzh.groups</name>
    <value>*</value>
</property>

<property>
    <name>hadoop.proxyuser.zzh.hosts</name>
    <value>*</value>
</property>

This configuration allows the zzh user to impersonate any user. If you want to restrict this, specify particular groups and hosts instead of *.

Without this setting, you can try username 'scott' with password 'tiger'.

 

  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>hive.metastore.warehouse.external.dir</name>
    <value/>
    <description>Default location for external tables created in the warehouse. If not set or null, then the normal warehouse location will be used as the default location.</description>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value/>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>

 

Create Hive Warehouse Directories

Hive stores its data files in HDFS, so a few directories must exist in HDFS before Hive can work.

First, create the Hive data warehouse directory on HDFS.

zzh@ZZHPC:~$ hdfs dfs -mkdir -p /user/hive/warehouse

Then create the temporary directory.

zzh@ZZHPC:~$ hdfs dfs -mkdir /user/tmp

Hive requires read and write access to these directories, so grant group write permission on both.

zzh@ZZHPC:~$ hdfs dfs -chmod g+w /user/tmp
zzh@ZZHPC:~$ hdfs dfs -chmod g+w /user/hive/warehouse
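The four commands above can be wrapped in one function. A minimal sketch, assuming a hypothetical variable HDFS_CMD that defaults to the real hdfs client and can be set to echo for a dry run that only prints the commands. Unlike the session above, it uses -mkdir -p for both directories so it can be re-run safely:

```shell
# HDFS_CMD is an assumption of this sketch, not a Hadoop setting.
# Leave it unset to run the real client; set HDFS_CMD=echo for a dry run.
HDFS_CMD=${HDFS_CMD:-hdfs}

# prepare_hive_dirs (hypothetical helper): create the warehouse and tmp
# directories and grant group write permission on each of them.
prepare_hive_dirs() {
  for dir in /user/hive/warehouse /user/tmp; do
    $HDFS_CMD dfs -mkdir -p "$dir" || return 1
    $HDFS_CMD dfs -chmod g+w "$dir" || return 1
  done
}
```

Running HDFS_CMD=echo first and then calling prepare_hive_dirs prints the four dfs commands without touching HDFS.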

Create Hive Metastore Derby Database

After installing Hive and before using it, you need to initialize the Metastore database for the database type you choose. By default Hive uses Derby; you can also use another RDBMS for the Metastore.

Run schematool -initSchema -dbType derby to initialize Derby as the Metastore database for Hive.

zzh@ZZHPC:~/hive$ schematool -initSchema -dbType derby
Initializing the schema to: 4.0.0
Metastore connection URL:	 jdbc:derby:/home/zzh/hive/metastore_db;create=true
Metastore connection Driver :	 org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User:	 APP
Starting metastore schema initialization to 4.0.0
Initialization script hive-schema-4.0.0.derby.sql

......(many blank lines)

Initialization script completed

zzh@ZZHPC:~/hive$ ls
derby.log metastore_db

 

zzh@ZZHPC:~/hive$ schematool -validate -dbType derby
Starting metastore validation

Validating schema version
[SUCCESS]

Validating sequence number for SEQUENCE_TABLE
[SUCCESS]

Validating metastore schema tables
[SUCCESS]

Validating DFS locations
[SUCCESS]

Validating columns for incorrect NULL values.
[SUCCESS]

Done with metastore validation: [SUCCESS]

 

zzh@ZZHPC:~/hive$ beeline -u jdbc:derby:/home/zzh/hive/metastore_db
Connecting to jdbc:derby:/home/zzh/hive/metastore_db
Connected to: Apache Derby (version 10.14.2.0 - (1828579))
Driver: Apache Derby Embedded JDBC Driver (version 10.14.2.0 - (1828579))
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 4.0.1 by Apache Hive
0: jdbc:derby:/home/zzh/hive/metastore_db> SELECT tablename FROM sys.systables WHERE tabletype = 'T' ORDER BY tablename;
+--------------------------------+
|           TABLENAME            |
+--------------------------------+
| AUX_TABLE                      |
| BUCKETING_COLS                 |
| CDS                            |
| COLUMNS                        |
| COLUMNS_V2                     |
| COMPACTION_METRICS_CACHE       |
| COMPACTION_QUEUE               |
| COMPLETED_COMPACTIONS          |
| COMPLETED_TXN_COMPONENTS       |
| CTLGS                          |
| DATABASE_PARAMS                |
| DATACONNECTORS                 |
| DATACONNECTOR_PARAMS           |
| DBS                            |
| DB_PRIVS                       |
| DC_PRIVS                       |
| DELEGATION_TOKENS              |
| FUNCS                          |
| FUNC_RU                        |
| GLOBAL_PRIVS                   |
| HIVE_LOCKS                     |
| I_SCHEMA                       |
| KEY_CONSTRAINTS                |
| MASTER_KEYS                    |
| MATERIALIZATION_REBUILD_LOCKS  |
| METASTORE_DB_PROPERTIES        |
| MIN_HISTORY_LEVEL              |
| MIN_HISTORY_WRITE_ID           |
| MV_CREATION_METADATA           |
| MV_TABLES_USED                 |
| NEXT_COMPACTION_QUEUE_ID       |
| NEXT_LOCK_ID                   |
| NEXT_WRITE_ID                  |
| NOTIFICATION_LOG               |
| NOTIFICATION_SEQUENCE          |
| NUCLEUS_TABLES                 |
| PACKAGES                       |
| PARTITIONS                     |
| PARTITION_EVENTS               |
| PARTITION_KEYS                 |
| PARTITION_KEY_VALS             |
| PARTITION_PARAMS               |
| PART_COL_PRIVS                 |
| PART_COL_STATS                 |
| PART_PRIVS                     |
| REPLICATION_METRICS            |
| REPL_TXN_MAP                   |
| ROLES                          |
| ROLE_MAP                       |
| RUNTIME_STATS                  |
| SCHEDULED_EXECUTIONS           |
| SCHEDULED_QUERIES              |
| SCHEMA_VERSION                 |
| SDS                            |
| SD_PARAMS                      |
| SEQUENCE_TABLE                 |
| SERDES                         |
| SERDE_PARAMS                   |
| SKEWED_COL_NAMES               |
| SKEWED_COL_VALUE_LOC_MAP       |
| SKEWED_STRING_LIST             |
| SKEWED_STRING_LIST_VALUES      |
| SKEWED_VALUES                  |
| SORT_COLS                      |
| STORED_PROCS                   |
| TABLE_PARAMS                   |
| TAB_COL_STATS                  |
| TBLS                           |
| TBL_COL_PRIVS                  |
| TBL_PRIVS                      |
| TXNS                           |
| TXN_COMPONENTS                 |
| TXN_LOCK_TBL                   |
| TXN_TO_WRITE_ID                |
| TXN_WRITE_NOTIFICATION_LOG     |
| TYPES                          |
| TYPE_FIELDS                    |
| VERSION                        |
| WM_MAPPING                     |
| WM_POOL                        |
| WM_POOL_TO_TRIGGER             |
| WM_RESOURCEPLAN                |
| WM_TRIGGER                     |
| WRITE_SET                      |
+--------------------------------+
84 rows selected (0.142 seconds)
0: jdbc:derby:/home/zzh/hive/metastore_db> SELECT * FROM version;
+---------+-----------------+-----------------------------+
| VER_ID  | SCHEMA_VERSION  |       VERSION_COMMENT       |
+---------+-----------------+-----------------------------+
| 1       | 4.0.0           | Hive release version 4.0.0  |
+---------+-----------------+-----------------------------+
1 row selected (0.012 seconds)
0: jdbc:derby:/home/zzh/hive/metastore_db>

 

zzh@ZZHPC:~$ hive --version
Hive 4.0.1

 

zzh@ZZHPC:~/hive$ schematool -upgradeSchema -dbType derby
Upgrading from the version 4.0.0
No schema upgrade required from version 4.0.0

 

zzh@ZZHPC:~/hive$ hive --service metastore
2025-03-01 10:43:47: Starting Hive Metastore Server

 

No errors in the log file:

zzh@ZZHPC:/tmp/zzh$ grep ERROR hive.log

 

zzh@ZZHPC:~$ hiveserver2
2025-03-01 10:53:57: Starting HiveServer2
Hive Session ID = eb62a419-9f8f-43f5-8e58-55ddc8462bf9
Hive Session ID = d9996cfc-d4a3-4a28-a807-e6bb7f2764d6

 

Got error:

ERROR XSDB6: Another instance of Derby may have already booted the database /home/zzh/hive/metastore_db.

Both the metastore service and HiveServer2 are configured to use the embedded Derby database, so the second service you start hits this error: embedded Derby lets only one JVM process open a given database at a time.

 

Can I use Hive Server 2 without starting the metastore service?

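It depends on hive.metastore.uris. If that property is set, HiveServer2 connects to a remote Metastore service, which must already be running. If it is empty (as in the hive-site.xml above), HiveServer2 runs an embedded metastore inside its own process and no separate service is needed.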

Why Does HiveServer2 Need Metastore?

  • HiveServer2 is responsible for query execution.
  • It relies on the Metastore to fetch metadata (tables, partitions, schemas).
  • If no Metastore is available (neither embedded nor remote), HiveServer2 cannot retrieve metadata, and queries fail.

When Can HiveServer2 Run Without a Metastore Service?

You can avoid manually starting the Metastore service in these scenarios:

  1. Using Embedded Metastore Mode (Not Recommended)

    • By default (hive.metastore.uris is empty), HiveServer2 starts its own embedded metastore.
    • This works, but it locks the Derby database, preventing other services from accessing it.
    • Only recommended for single-user setups.
  2. Using an External Database (MySQL/PostgreSQL)

    • If Hive Metastore is configured to use MySQL or PostgreSQL, HiveServer2 can directly connect to it without requiring a separately running Metastore service.
    • This is the recommended setup for production.
  3. Using a Remote Metastore

    • If another Metastore instance is running on a different machine (or a different process), HiveServer2 can connect to it remotely.

How to Check Your Configuration?

Look at your hive-site.xml:

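  • If hive.metastore.uris is empty and javax.jdo.option.ConnectionURL points to embedded Derby (jdbc:derby), HiveServer2 runs its own embedded metastore. Do not start the Metastore service as well, or the second process fails with ERROR XSDB6.
  • If it points to MySQL or PostgreSQL, HiveServer2 can work without manually starting Metastore.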
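Whether a separate Metastore service is needed comes down to two values from hive-site.xml: hive.metastore.uris and javax.jdo.option.ConnectionURL. A minimal sketch of that decision, assuming a hypothetical helper named metastore_mode; the three labels it prints are this sketch's own names, not Hive terms, and the thrift URI in the usage line is only an example value:

```shell
# metastore_mode URIS JDBC_URL  (hypothetical helper for this post)
#   URIS     = value of hive.metastore.uris (may be empty)
#   JDBC_URL = value of javax.jdo.option.ConnectionURL
metastore_mode() {
  if [ -n "$1" ]; then
    # A Thrift URI is configured: a Metastore service must be running there.
    echo remote
  else
    case "$2" in
      # Derby network server URLs start with jdbc:derby:// and are
      # treated here like any other external database.
      jdbc:derby://*) echo embedded-external-db ;;
      # Embedded Derby: only one Hive process can open the database.
      jdbc:derby:*)   echo embedded-derby ;;
      *)              echo embedded-external-db ;;
    esac
  fi
}

metastore_mode '' 'jdbc:derby:/home/zzh/hive/metastore_db;create=true'
# prints: embedded-derby
metastore_mode 'thrift://localhost:9083' 'jdbc:derby:/home/zzh/hive/metastore_db;create=true'
# prints: remote
```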

 

Start hiveserver2 without starting metastore service first:

zzh@ZZHPC:~$ hiveserver2
2025-03-01 11:19:38: Starting HiveServer2
Hive Session ID = eef9fe9b-c8fc-40bd-878c-6ae0e530e7f8
Hive Session ID = 6d9c9d1d-7c53-4d59-b77e-eeaf95efd9b5

HiveServer2 started successfully:

2025-03-01T11:19:43,261  INFO [main] server.HiveServer2: Starting Web UI on port 10002
2025-03-01T11:19:43,276  INFO [HiveMaterializedViewsRegistry-0] metadata.HiveMaterializedViewsRegistry: Materialized views registry has been initialized
2025-03-01T11:19:43,292  INFO [main] util.log: Logging initialized @4115ms to org.eclipse.jetty.util.log.Slf4jLog
2025-03-01T11:19:43,354  INFO [main] http.HttpServer: ASYNC_PROFILER_HOME env or -Dasync.profiler.home not specified. Disabling /prof endpoint..
2025-03-01T11:19:43,357  INFO [main] service.AbstractService: Service:OperationManager is started.
2025-03-01T11:19:43,357  INFO [main] service.AbstractService: Service:SessionManager is started.
2025-03-01T11:19:43,357  INFO [main] service.AbstractService: Service:CLIService is started.
2025-03-01T11:19:43,357  INFO [main] service.AbstractService: Service:ThriftBinaryCLIService is started.
2025-03-01T11:19:43,511  INFO [main] thrift.ThriftCLIService: Starting ThriftBinaryCLIService on port 10000 with 5...500 worker threads
2025-03-01T11:19:43,511  INFO [main] service.AbstractService: Service:HiveServer2 is started.
2025-03-01T11:19:43,512  INFO [main] server.Server: jetty-9.4.45.v20220203; built: 2022-02-03T09:14:34.105Z; git: 4a0c91c0be53805e3fcffdcdcc9587d5301863db; jvm 21.0.6+7-Ubuntu-122.04.1
2025-03-01T11:19:43,580  INFO [main] server.session: DefaultSessionIdManager workerName=node0
2025-03-01T11:19:43,580  INFO [main] server.session: No SessionScavenger set, using defaults
2025-03-01T11:19:43,580  INFO [main] server.session: node0 Scavenging every 660000ms
2025-03-01T11:19:43,603  INFO [main] handler.ContextHandler: Started o.e.j.w.WebAppContext@74159dc9{hiveserver2,/,file:///tmp/jetty-0_0_0_0-10002-hive-service-4_0_1_jar-_-any-14826812884783587992/webapp/,AVAILABLE}{jar:file:/home/zzh/Downloads/sfw/hive-4.0.1/lib/hive-service-4.0.1.jar!/hive-webapps/hiveserver2}
2025-03-01T11:19:43,603  INFO [main] handler.ContextHandler: Started o.e.j.s.ServletContextHandler@2ffaa711{static,/static,jar:file:/home/zzh/Downloads/sfw/hive-4.0.1/lib/hive-service-4.0.1.jar!/hive-webapps/static,AVAILABLE}
2025-03-01T11:19:43,604  INFO [main] handler.ContextHandler: Started o.e.j.s.ServletContextHandler@29ebaf2f{logs,/logs,file:///tmp/zzh/,AVAILABLE}
2025-03-01T11:19:43,608  INFO [main] server.AbstractConnector: Started ServerConnector@13f10967{HTTP/1.1, (http/1.1)}{0.0.0.0:10002}
2025-03-01T11:19:43,608  INFO [main] server.HiveServer2: Web UI has started on port 10002
2025-03-01T11:19:43,608  INFO [main] server.Server: Started @4431ms
2025-03-01T11:19:43,608  INFO [main] http.HttpServer: Started HttpServer[hiveserver2] on port 10002

After running for 5 minutes, there are still no errors in the log file:

zzh@ZZHPC:/tmp/zzh$ grep ERROR hive.log

 

Use netstat to check that HiveServer2 is running and listening on port 10000.

zzh@ZZHPC:/tmp/zzh$ sudo netstat -anp | grep 10000
tcp6       0      0 :::10000                :::*                    LISTEN      12934/java
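A plain grep 10000 also matches lines where 10000 appears in another column (a PID, for example). A minimal sketch of a stricter check, assuming a hypothetical helper named is_listening that reads netstat -an style lines on stdin and looks only at the port of the local address column:

```shell
# is_listening PORT  (hypothetical helper for this post)
# Reads "netstat -an" style lines on stdin and exits 0 if some LISTEN
# line has PORT as the port of its local address (4th column).
is_listening() {
  awk -v port="$1" '
    /LISTEN/ {
      # The port is the last ":"-separated piece of the local address,
      # which also works for IPv6 forms such as :::10000.
      n = split($4, parts, ":")
      if (parts[n] == port) found = 1
    }
    END { exit !found }
  '
}
```

Usage would be sudo netstat -anp | is_listening 10000 && echo "HiveServer2 port is open".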

 

zzh@ZZHPC:~$ beeline -u jdbc:hive2://localhost:10000
Connecting to jdbc:hive2://localhost:10000
Connected to: Apache Hive (version 4.0.1)
Driver: Hive JDBC (version 4.0.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 4.0.1 by Apache Hive
0: jdbc:hive2://localhost:10000> show databases;
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+
1 row selected (1.319 seconds)
0: jdbc:hive2://localhost:10000> show tables;
+-----------+
| tab_name  |
+-----------+
+-----------+
No rows selected (0.054 seconds)
0: jdbc:hive2://localhost:10000> create table t1(id int);
ERROR : Failed
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=anonymous, access=WRITE, inode="/user/hive/warehouse":zzh:supergroup:drwxrwxr-x

 

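The attempt above ran as user anonymous, which has no write access to /user/hive/warehouse. Passing -n zzh connects as zzh instead:

zzh@ZZHPC:~$ beeline -u jdbc:hive2://localhost:10000 -n zzh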
Connecting to jdbc:hive2://localhost:10000
Connected to: Apache Hive (version 4.0.1)
Driver: Hive JDBC (version 4.0.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 4.0.1 by Apache Hive
0: jdbc:hive2://localhost:10000> create table t1(id int);
No rows affected (0.202 seconds)

 

zzh@ZZHPC:/tmp/zzh$ hdfs dfs -ls /user/hive/warehouse
Found 1 items
drwxr-xr-x   - zzh supergroup          0 2025-03-01 11:45 /user/hive/warehouse/t1

 

0: jdbc:hive2://localhost:10000> insert into t1 values(1);
INFO  : Compiling command(queryId=zzh_20250301123012_cd0453b9-6936-467f-a5c4-be54ccfab829): insert into t1 values(1)
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:col1, type:int, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=zzh_20250301123012_cd0453b9-6936-467f-a5c4-be54ccfab829); Time taken: 2.225 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=zzh_20250301123012_cd0453b9-6936-467f-a5c4-be54ccfab829): insert into t1 values(1)
WARN  : Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez) or using Hive 1.X releases.
INFO  : Query ID = zzh_20250301123012_cd0453b9-6936-467f-a5c4-be54ccfab829
INFO  : Total jobs = 3
INFO  : Launching Job 1 out of 3
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks determined at compile time: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1740801980634_0003
INFO  : Executing with tokens: []
INFO  : The url to track the job: http://ZZHPC:8088/proxy/application_1740801980634_0003/
INFO  : Starting Job = job_1740801980634_0003, Tracking URL = http://ZZHPC:8088/proxy/application_1740801980634_0003/
INFO  : Kill Command = /home/zzh/Downloads/sfw/hadoop-3.4.1/bin/mapred job  -kill job_1740801980634_0003
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO  : 2025-03-01 12:30:20,453 Stage-1 map = 0%,  reduce = 0%
INFO  : 2025-03-01 12:30:33,723 Stage-1 map = 100%,  reduce = 100%
ERROR : Ended Job = job_1740801980634_0003 with errors
ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
INFO  : MapReduce Jobs Launched: 
INFO  : Stage-Stage-1: Map: 1  Reduce: 1   HDFS Read: 0 HDFS Write: 0 HDFS EC Read: 0 FAIL
INFO  : Total MapReduce CPU Time Spent: 0 msec
INFO  : Completed executing command(queryId=zzh_20250301123012_cd0453b9-6936-467f-a5c4-be54ccfab829); Time taken: 21.401 seconds
Error: Error while compiling statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
0: jdbc:hive2://localhost:10000>

 

Error in /tmp/zzh/hive.log:

Error: java.lang.reflect.InaccessibleObjectException: Unable to make field private volatile java.lang.String java.net.URI.string accessible: module java.base does not "opens java.net" to unnamed module @1f7030a6
	at java.base/java.lang.reflect.AccessibleObject.throwInaccessibleObjectException(AccessibleObject.java:391)
	at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:367)
	at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:315)
	at java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:183)
	at java.base/java.lang.reflect.Field.setAccessible(Field.java:177)
	at org.apache.hadoop.hive.common.StringInternUtils.<clinit>(StringInternUtils.java:57)
	at org.apache.hadoop.hive.ql.plan.TableDesc.setProperties(TableDesc.java:132)
	at org.apache.hadoop.hive.ql.plan.TableDesc.<init>(TableDesc.java:69)
	at org.apache.hadoop.hive.ql.exec.Utilities.<clinit>(Utilities.java:708)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:483)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFiltersAndAsOf(HiveInputFormat.java:1009)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:708)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:176)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:445)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:350)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:714)
	at java.base/javax.security.auth.Subject.doAs(Subject.java:525)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1953)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)

To fix this error related to MapReduce, do the following:

1. In $HADOOP_HOME/etc/hadoop/yarn-env.sh, add the same HADOOP_OPTS as in hive-env.sh.

zzh@ZZHPC:~/Downloads/sfw/hadoop-3.4.1/etc/hadoop$ cat yarn-env.sh
export HADOOP_OPTS="$HADOOP_OPTS \
  --add-opens java.base/java.net=ALL-UNNAMED \
  --add-opens java.base/java.lang=ALL-UNNAMED \
  --add-opens java.base/java.io=ALL-UNNAMED \
  --add-opens java.base/java.nio=ALL-UNNAMED \
  --add-opens java.base/java.math=ALL-UNNAMED \
  --add-opens java.base/java.text=ALL-UNNAMED \
  --add-opens java.base/java.util=ALL-UNNAMED \
  --add-opens java.base/java.util.concurrent=ALL-UNNAMED \
  --add-opens java.base/java.util.concurrent.atomic=ALL-UNNAMED"

2. Add the content below to yarn-site.xml:

    <property>
        <name>yarn.app.mapreduce.am.command-opts</name>
        <value>--add-opens java.base/java.net=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.io=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/java.math=ALL-UNNAMED --add-opens java.base/java.text=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/java.util.concurrent=ALL-UNNAMED --add-opens java.base/java.util.concurrent.atomic=ALL-UNNAMED</value>
    </property>
    <property>
        <name>yarn.app.tez.am.command-opts</name>
        <value>--add-opens java.base/java.net=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.io=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/java.math=ALL-UNNAMED --add-opens java.base/java.text=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/java.util.concurrent=ALL-UNNAMED --add-opens java.base/java.util.concurrent.atomic=ALL-UNNAMED</value>
    </property>

3. Add the content below to mapred-site.xml:

  <property>
    <name>mapreduce.map.java.opts</name>
    <value>--add-opens java.base/java.net=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.io=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/java.math=ALL-UNNAMED --add-opens java.base/java.text=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/java.util.concurrent=ALL-UNNAMED --add-opens java.base/java.util.concurrent.atomic=ALL-UNNAMED</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>--add-opens java.base/java.net=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.io=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/java.math=ALL-UNNAMED --add-opens java.base/java.text=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/java.util.concurrent=ALL-UNNAMED --add-opens java.base/java.util.concurrent.atomic=ALL-UNNAMED</value>
  </property>

4. Restart the services.
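The same option string is pasted into several properties above, which makes it easy for the copies to drift apart. A minimal sketch that prints the property elements from a single string, assuming a hypothetical helper named emit_property and a value with no characters that need XML escaping (the --add-opens options have none). The opts value is shortened here to two packages:

```shell
# emit_property NAME VALUE  (hypothetical helper for this post)
# Prints one Hadoop-style <property> element; VALUE is not XML-escaped.
emit_property() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# Shortened option string; the real one lists all nine packages.
opts='--add-opens java.base/java.net=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED'

# Both mapred-site.xml properties get exactly the same value.
for name in mapreduce.map.java.opts mapreduce.reduce.java.opts; do
  emit_property "$name" "$opts"
done
```

The printed elements can be pasted inside the configuration element of mapred-site.xml.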

The error is gone:

0: jdbc:hive2://localhost:10000> insert into t1 values(1);
INFO  : Compiling command(queryId=zzh_20250301132145_65f7cd1a-36ed-490d-939e-f358f9b67eea): insert into t1 values(1)
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:col1, type:int, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=zzh_20250301132145_65f7cd1a-36ed-490d-939e-f358f9b67eea); Time taken: 2.678 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=zzh_20250301132145_65f7cd1a-36ed-490d-939e-f358f9b67eea): insert into t1 values(1)
WARN  : Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez) or using Hive 1.X releases.
INFO  : Query ID = zzh_20250301132145_65f7cd1a-36ed-490d-939e-f358f9b67eea
INFO  : Total jobs = 3
INFO  : Launching Job 1 out of 3
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks determined at compile time: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1740806384983_0001
INFO  : Executing with tokens: []
INFO  : The url to track the job: http://ZZHPC:8088/proxy/application_1740806384983_0001/
INFO  : Starting Job = job_1740806384983_0001, Tracking URL = http://ZZHPC:8088/proxy/application_1740806384983_0001/
INFO  : Kill Command = /home/zzh/Downloads/sfw/hadoop-3.4.1/bin/mapred job  -kill job_1740806384983_0001
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO  : 2025-03-01 13:21:56,172 Stage-1 map = 0%,  reduce = 0%
INFO  : 2025-03-01 13:22:01,317 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.15 sec
INFO  : 2025-03-01 13:22:05,407 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.15 sec
INFO  : MapReduce Total cumulative CPU time: 4 seconds 260 msec
INFO  : Ended Job = job_1740806384983_0001
INFO  : Starting task [Stage-7:CONDITIONAL] in serial mode
INFO  : Stage-4 is selected by condition resolver.
INFO  : Stage-3 is filtered out by condition resolver.
INFO  : Stage-5 is filtered out by condition resolver.
INFO  : Starting task [Stage-4:MOVE] in serial mode
INFO  : Moving data to directory hdfs://localhost:9000/user/hive/warehouse/t1/.hive-staging_hive_2025-03-01_13-21-45_796_1665654529337096155-1/-ext-10000 from hdfs://localhost:9000/user/hive/warehouse/t1/.hive-staging_hive_2025-03-01_13-21-45_796_1665654529337096155-1/-ext-10002
INFO  : Starting task [Stage-0:MOVE] in serial mode
INFO  : Loading data to table default.t1 from hdfs://localhost:9000/user/hive/warehouse/t1/.hive-staging_hive_2025-03-01_13-21-45_796_1665654529337096155-1/-ext-10000
INFO  : Starting task [Stage-2:STATS] in serial mode
INFO  : Executing stats task
INFO  : Table default.t1 stats: [numFiles=1, numRows=1, totalSize=2, rawDataSize=1, numFilesErasureCoded=0]
INFO  : MapReduce Jobs Launched: 
INFO  : Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.26 sec   HDFS Read: 17776 HDFS Write: 195 HDFS EC Read: 0 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 4 seconds 260 msec
INFO  : Completed executing command(queryId=zzh_20250301132145_65f7cd1a-36ed-490d-939e-f358f9b67eea); Time taken: 20.39 seconds
1 row affected (23.146 seconds)
0: jdbc:hive2://localhost:10000>

 

0: jdbc:hive2://localhost:10000> CREATE TABLE employees (emp_id INT, emp_name STRING, emp_salary DOUBLE);
No rows affected (0.283 seconds)
0: jdbc:hive2://localhost:10000> INSERT INTO TABLE employees VALUES (1, 'Alice', 65000.0), (2, 'Bob', 75000.0), (3, 'Charlie', 60000.0);
WARN  : Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez) or using Hive 1.X releases.
3 rows affected (20.547 seconds)
0: jdbc:hive2://localhost:10000> SELECT emp_name, emp_salary FROM employees;
+-----------+-------------+
| emp_name  | emp_salary  |
+-----------+-------------+
| Alice     | 65000.0     |
| Bob       | 75000.0     |
| Charlie   | 60000.0     |
+-----------+-------------+
3 rows selected (0.091 seconds)
0: jdbc:hive2://localhost:10000> UPDATE employees SET emp_salary = 70000.0 WHERE emp_id = 3;
Error: Error while compiling statement: FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations. (state=42000,code=10294)

 

Download Tez 0.10.4 from https://dlcdn.apache.org/tez/0.10.4/apache-tez-0.10.4-bin.tar.gz .

 
zzh@ZZHPC:~/Downloads$ mv apache-tez-0.10.4-bin.tar.gz tez-0.10.4.tar.gz
zzh@ZZHPC:~/Downloads$ tar -xzf tez-0.10.4.tar.gz
zzh@ZZHPC:~/Downloads$ cd sfw
zzh@ZZHPC:~/Downloads/sfw$ rm -rf tez-0.10.4
zzh@ZZHPC:~/Downloads/sfw$ cd ..
zzh@ZZHPC:~/Downloads$ mv apache-tez-0.10.4-bin tez-0.10.4
zzh@ZZHPC:~/Downloads$ mv tez-0.10.4 sfw

 

zzh@ZZHPC:~/Downloads$ hdfs dfs -mkdir -p /apps/tez
zzh@ZZHPC:~/Downloads/sfw$ hdfs dfs -copyFromLocal tez-0.10.4 /apps/tez
zzh@ZZHPC:~/Downloads/sfw/tez-0.10.4/lib$ mv slf4j-reload4j-1.7.36.jar ../../..
zzh@ZZHPC:~/Downloads/sfw/tez-0.10.4/lib$ mv slf4j-api-1.7.36.jar ../../..

 

.bashrc:
export TEZ_HOME=$sfw/tez-0.10.4
export TEZ_CONF_DIR=$TEZ_HOME/conf
export HADOOP_CLASSPATH=$(hadoop classpath):$HIVE_HOME/lib/*:$TEZ_CONF_DIR:$TEZ_HOME/*:$TEZ_HOME/lib/*

If $TEZ_CONF_DIR is not included, you will get this error:

org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configuration

However, if you put tez-site.xml under $HADOOP_HOME/etc/hadoop, you don't need to put TEZ_CONF_DIR in the classpath.

If CLASSPATH is set by hand, please make sure at least the following has been added:

${HADOOP_HOME}/share/hadoop/common
${HADOOP_HOME}/share/hadoop/common/lib
${HADOOP_HOME}/share/hadoop/hdfs
${HADOOP_HOME}/share/hadoop/hdfs/lib

 

zzh@ZZHPC:~/Downloads/sfw/tez-0.10.4/conf$ vi tez-site.xml:

<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/apps/tez/tez-0.10.4,${fs.defaultFS}/apps/tez/tez-0.10.4/lib</value>
  </property>
</configuration>

Various ways to configure tez.lib.uris

The tez.lib.uris configuration property supports a comma-separated list of values. The types of values supported are:

  • Path to a simple file
  • Path to a directory
  • Path to a compressed archive (tarball, zip, etc.)

For simple files and directories, Tez adds the files and the first-level entries of the directories (recursive traversal of directories is not supported) to the working directory of the Tez runtime, and they are automatically included in the classpath. Archives, i.e. files whose names end with a commonly known compressed-archive suffix such as 'tgz', 'tar.gz', or 'zip', are also uncompressed into the container working directory. However, because the archive structure is not known to the Tez framework, the user is expected to configure tez.lib.uris.classpath so that the nested directory structure of an archive is added to the classpath. These classpath values should be relative, i.e. the entries should start with "./".
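
To make the distinction concrete, here is a small sketch that splits a tez.lib.uris value on commas and labels each entry by its suffix. The helper name describe_tez_lib_uris and the suffix list are assumptions for illustration; Tez's own list of recognized archive suffixes may differ.

```shell
# Classify each comma-separated tez.lib.uris entry by its file-name suffix.
# Output: one line per entry, "<entry><TAB><kind>".
# The suffix list is an illustrative assumption, not Tez's exact list.
describe_tez_lib_uris() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r entry; do
    case "$entry" in
      *.tar.gz|*.tgz|*.zip|*.tar)
        printf '%s\tarchive\n' "$entry" ;;
      *)
        printf '%s\tfile-or-directory\n' "$entry" ;;
    esac
  done
}

# The value used in tez-site.xml above: both entries are plain directories.
describe_tez_lib_uris '${fs.defaultFS}/apps/tez/tez-0.10.4,${fs.defaultFS}/apps/tez/tez-0.10.4/lib'
```

Entries reported as archive are the ones for which tez.lib.uris.classpath would also need to be set.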

 

hive-site.xml:

  <property>
    <name>hive.execution.engine</name>
    <value>tez</value>
    <description>
      Expects one of [mr, tez].
      Chooses execution engine. Options are: mr (Map reduce, default), tez. While MR
      remains the default engine for historical reasons, it is itself a historical engine
      and is deprecated in Hive 2 line. It may be removed without further warning.
    </description>
  </property>

 

mapred-site.xml:

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn-tez</value>
  </property>

 

0: jdbc:hive2://localhost:10000> DROP TABLE employees;
No rows affected (1.637 seconds)
0: jdbc:hive2://localhost:10000> CREATE TABLE employees (emp_id INT, emp_name STRING, emp_salary DOUBLE) STORED AS ORC TBLPROPERTIES ('transactional'='true');
Error: Error while compiling statement: FAILED: SemanticException [Error 10265]: This command is not allowed on an ACID table default.employees with a non-ACID transaction manager. Failed command: CREATE TABLE employees (emp_id INT, emp_name STRING, emp_salary DOUBLE) STORED AS ORC TBLPROPERTIES ('transactional'='true') (state=42000,code=10265)

1. Enable ACID Transactions in Hive

Before creating an ACID table, ensure that Hive is configured for transactions. Set the following parameters in your Hive session:

SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.compactor.initiator.on=true;
SET hive.compactor.worker.threads=2;

Alternatively, you can add these settings permanently in hive-site.xml:

  <property>
    <name>hive.support.concurrency</name>
    <value>true</value>
    <description>
      Whether Hive supports concurrency control or not.
      A ZooKeeper instance must be up and running when using zookeeper Hive lock manager
    </description>
  </property>

 

  <property>
    <name>hive.txn.manager</name>
    <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
    <description>
      Set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager as part of turning on Hive
      transactions, which also requires appropriate settings for hive.compactor.initiator.on,hive.compactor.cleaner.on,
      hive.compactor.worker.threads, hive.support.concurrency (true),
      and hive.exec.dynamic.partition.mode (nonstrict).
      The default DummyTxnManager replicates pre-Hive-0.13 behavior and provides
      no transactions.
    </description>
  </property>

 

  <property>
    <name>hive.compactor.initiator.on</name>
    <value>true</value>
    <description>
      Whether to run the initiator and cleaner threads on this metastore instance or not.
      Set this to true on one instance of the Thrift metastore service as part of turning
      on Hive transactions. For a complete list of parameters required for turning on
      transactions, see hive.txn.manager.
    </description>
  </property>
  <property>
    <name>hive.compactor.worker.threads</name>
    <value>1</value>
    <description>
      How many compactor worker threads to run on this metastore instance. Set this to a
      positive number on one or more instances of the Thrift metastore service as part of
      turning on Hive transactions. For a complete list of parameters required for turning
      on transactions, see hive.txn.manager.
      Worker threads spawn MapReduce jobs to do compactions. They do not do the compactions
      themselves. Increasing the number of worker threads will decrease the time it takes
      tables or partitions to be compacted once they are determined to need compaction.
      It will also increase the background load on the Hadoop cluster as more MapReduce jobs
      will be running in the background.
    </description>
  </property>
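
Because hive-site.xml here was copied from the full template, every property is already present and only the values matter. The sketch below prints the value currently set for each ACID-related property. hive_site_value is a hypothetical helper, and it assumes the template layout in which the value line follows the name line; it is not a real XML parser.

```shell
# Print the <value> that follows <name>PROP</name> in a hive-site.xml file.
# Usage: hive_site_value FILE PROP
# Assumes <name> and <value> sit on separate lines, as in the template.
hive_site_value() {
  awk -v name="$2" '
    index($0, "<name>" name "</name>") { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, ""); sub(/<\/value>.*/, ""); print; exit
    }
    found && /<value\/>/ { print ""; exit }   # empty value element
  ' "$1"
}

site="${HIVE_CONF_DIR:-.}/hive-site.xml"
if [ -f "$site" ]; then
  for prop in hive.support.concurrency hive.txn.manager \
              hive.compactor.initiator.on hive.compactor.worker.threads; do
    printf '%s=%s\n' "$prop" "$(hive_site_value "$site" "$prop")"
  done
fi
```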

2. Ensure the Table is Bucketed

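Hive releases before 3.0 required ACID tables to be bucketed. Since Hive 3 bucketing is optional, so the CLUSTERED BY clause below is not strictly needed on Hive 4.0.1; the earlier CREATE TABLE failed because of the transaction manager, which step 1 fixes. The bucketed form of the statement is: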

CREATE TABLE employees (
    emp_id INT,
    emp_name STRING,
    emp_salary DOUBLE
)
CLUSTERED BY (emp_id) INTO 2 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

 

zzh@ZZHPC:~$ beeline -u jdbc:hive2://localhost:10000 -n zzh
Connecting to jdbc:hive2://localhost:10000
Connected to: Apache Hive (version 4.0.1)
Driver: Hive JDBC (version 4.0.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 4.0.1 by Apache Hive
0: jdbc:hive2://localhost:10000> CREATE TABLE employees (
. . . . . . . . . . . . . . . .>     emp_id INT,
. . . . . . . . . . . . . . . .>     emp_name STRING,
. . . . . . . . . . . . . . . .>     emp_salary DOUBLE
. . . . . . . . . . . . . . . .> )
. . . . . . . . . . . . . . . .> CLUSTERED BY (emp_id) INTO 2 BUCKETS
. . . . . . . . . . . . . . . .> STORED AS ORC
. . . . . . . . . . . . . . . .> TBLPROPERTIES ('transactional'='true');
No rows affected (0.273 seconds)
0: jdbc:hive2://localhost:10000> SET hive.execution.engine;
+----------------------------+
|            set             |
+----------------------------+
| hive.execution.engine=tez  |
+----------------------------+
1 row selected (0.05 seconds)

 

0: jdbc:hive2://localhost:10000> INSERT INTO employees VALUES (1, 'Alice', 65000.0), (2, 'Bob', 75000.0), (3, 'Charlie', 60000.0);
Error: Error while compiling statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask. TezSession has already shutdown. Application application_1741000600982_0002 failed 2 times due to AM Container for appattempt_1741000600982_0002_000002 exited with  exitCode: 1
Failing this attempt.Diagnostics: [2025-03-03 19:22:29.281]Exception from container-launch.
Container id: container_1741000600982_0002_02_000001
Exit code: 1

[2025-03-03 19:22:29.283]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.tez.dag.app.DAGAppMaster
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/service/AbstractService


[2025-03-03 19:22:29.283]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.tez.dag.app.DAGAppMaster
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/service/AbstractService


For more detailed output, check the application tracking page: http://ZZHPC:8088/cluster/app/application_1741000600982_0002 Then click on links to logs of each attempt.
. Failing the application. (state=08S01,code=1)
0: jdbc:hive2://localhost:10000>

 

zzh@ZZHPC:~/Downloads/sfw/tez-0.10.4$ hadoop jar tez-examples-0.10.4.jar orderedwordcount /input /output
2025-03-03 14:32:07,094 INFO shim.HadoopShimsLoader: Trying to locate HadoopShimProvider for hadoopVersion=3.4.1, majorVersion=3, minorVersion=4
2025-03-03 14:32:07,096 INFO shim.HadoopShimsLoader: Picked HadoopShim org.apache.tez.hadoop.shim.DefaultHadoopShim, providerName=null, overrideProviderViaConfig=null, hadoopVersion=3.4.1, majorVersion=3, minorVersion=4
2025-03-03 14:32:07,261 INFO counters.Limits: Counter limits initialized with parameters:  GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=1200
2025-03-03 14:32:07,261 INFO counters.Limits: Counter limits initialized with parameters:  GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=1200
2025-03-03 14:32:07,261 INFO client.TezClient: Tez Client Version: [ component=tez-api, version=0.10.4, revision=5b5fff619683a204d0c62f76ddab50e3ddc38760, SCM-URL=scm:git:https://gitbox.apache.org/repos/asf/tez.git, buildTime=2024-08-05T12:46:09Z, buildUser=laszlobodor, buildJavaVersion=1.8.0_292 ]
2025-03-03 14:32:07,318 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8032
2025-03-03 14:32:07,734 INFO examples.OrderedWordCount: Running OrderedWordCount
2025-03-03 14:32:07,830 INFO client.TezClient: Submitting DAG application with id: application_1740983045829_0002
2025-03-03 14:32:07,833 INFO client.TezClientUtils: Using tez.lib.uris value from configuration: hdfs://localhost:9000/apps/tez/tez-0.10.4,hdfs://localhost:9000/apps/tez/tez-0.10.4/lib
2025-03-03 14:32:07,833 INFO client.TezClientUtils: Using tez.lib.uris.classpath value from configuration: null
2025-03-03 14:32:08,092 INFO client.TezClient: Stage directory /tmp/zzh/tez/staging doesn't exist and is created
2025-03-03 14:32:08,097 INFO client.TezClient: Tez system stage directory hdfs://localhost:9000/tmp/zzh/tez/staging/.tez/application_1740983045829_0002 doesn't exist and is created
2025-03-03 14:32:08,116 INFO conf.Configuration: resource-types.xml not found
2025-03-03 14:32:08,116 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2025-03-03 14:32:08,287 INFO client.TezClient: Submitting DAG to YARN, applicationId=application_1740983045829_0002, dagName=OrderedWordCount, callerContext={ context=TezExamples, callerType=null, callerId=null }
2025-03-03 14:32:08,320 INFO impl.YarnClientImpl: Submitted application application_1740983045829_0002
2025-03-03 14:32:08,323 INFO client.TezClient: The url to track the Tez AM: http://ZZHPC:8088/proxy/application_1740983045829_0002/
2025-03-03 14:32:10,836 INFO client.TezClient: App did not succeed. Diagnostics: Application application_1740983045829_0002 failed 2 times due to AM Container for appattempt_1740983045829_0002_000002 exited with  exitCode: 1
Failing this attempt.Diagnostics: [2025-03-03 14:32:10.330]Exception from container-launch.
Container id: container_1740983045829_0002_02_000001
Exit code: 1

[2025-03-03 14:32:10.331]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.tez.dag.app.DAGAppMaster
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/service/AbstractService

......

For more detailed output, check the application tracking page: http://ZZHPC:8088/cluster/app/application_1740983045829_0002 Then click on links to logs of each attempt.
. Failing the application.]

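The missing class, org.apache.hadoop.service.AbstractService, is part of hadoop-common, which suggests that the Tez application master was launched without the Hadoop JARs on its classpath. I could not fix this issue, so I had to use mr instead of tez.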

 

0: jdbc:hive2://localhost:10000> INSERT INTO employees VALUES (1, 'Alice', 65000.0), (2, 'Bob', 75000.0), (3, 'Charlie', 60000.0);
3 rows affected (45.227 seconds)
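
With the ACID settings in place, the UPDATE and DELETE statements that failed with error 10294 earlier can be retried. The sketch below writes a few statements to a file and runs them through beeline when it is available. The file name acid_smoke_test.hql and the chosen salary value are made up for illustration; the connection URL and user are the ones from the session above.

```shell
# Write a small ACID smoke test and run it if beeline is on the PATH.
hql_file="${TMPDIR:-/tmp}/acid_smoke_test.hql"

cat > "$hql_file" <<'EOF'
UPDATE employees SET emp_salary = 70000.0 WHERE emp_id = 1;
DELETE FROM employees WHERE emp_id = 3;
SELECT * FROM employees ORDER BY emp_id;
EOF

if command -v beeline >/dev/null 2>&1; then
  # -f runs the statements in the file, one after another.
  beeline -u jdbc:hive2://localhost:10000 -n zzh -f "$hql_file"
else
  echo "beeline not found; statements written to $hql_file"
fi
```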

 

 

 Old notes:

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>

 

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.EmbeddedDriver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>

 

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>APP</value>
    <description>Username to use against metastore database</description>
  </property>

 

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>mine</value>
    <description>password to use against metastore database</description>
  </property>

 

  <property>
    <name>hive.server2.transport.mode</name>
    <value>binary</value>
    <description>
      Expects one of [binary, http, all].
      Transport mode of HiveServer2.
    </description>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value/>
    <description>Bind host on which to run the HiveServer2 Thrift service.</description>
  </property>

When the property hive.server2.thrift.bind.host is not set, HiveServer2 will bind to all available network interfaces on the host machine by default. This means that it will listen on all IP addresses assigned to the machine, including localhost (127.0.0.1) and any external network interfaces.
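
One way to see which address HiveServer2 actually bound to is to filter the output of ss -ltn on Linux. This is only a sketch: listen_addresses_for_port is a hypothetical helper, and it assumes the usual ss column layout in which the fourth column is the local address and port.

```shell
# Filter `ss -ltn` output (read from stdin) down to the local addresses
# that are listening on the given port (default 10000).
listen_addresses_for_port() {
  awk -v port="${1:-10000}" \
    '$1 == "LISTEN" && $4 ~ (":" port "$") { print $4 }'
}

# Only run the live check when ss is available.
if command -v ss >/dev/null 2>&1; then
  ss -ltn | listen_addresses_for_port 10000
fi
```

A wildcard address such as *:10000, 0.0.0.0:10000 or [::]:10000 means the service is listening on all interfaces.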

  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
    <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'binary'.</description>
  </property>

 

  <property>
    <name>hive.server2.thrift.http.port</name>
    <value>10001</value>
    <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'http'.</description>
  </property>
  <property>
    <name>hive.server2.thrift.http.path</name>
    <value>cliservice</value>
    <description>Path component of URL endpoint when in HTTP mode.</description>
  </property>

 

  <property>
    <name>hive.server2.authentication</name>
    <value>NONE</value>
    <description>
      Expects one of [nosasl, none, ldap, kerberos, pam, custom, saml, jwt].
      Client authentication types.
        NONE: no authentication check
        LDAP: LDAP/AD based authentication
        KERBEROS: Kerberos/GSSAPI authentication
        CUSTOM: Custom authentication provider
                (Use with property hive.server2.custom.authentication.class)
        PAM: Pluggable authentication module
        NOSASL:  Raw transport
        SAML: SAML 2.0 compliant authentication. This is only supported in http transport mode.
        JWT: JWT based authentication. HS2 expects JWT contains the user name as subject and was signed by an
             asymmetric key. This is only supported in http transport mode.
    </description>
  </property>

 

Remote Metastore Database:

Run a MySQL Docker container without creating a database:

zzh@ZZHPC:~$ docker run --name mysql -p 3306:3306 -e MYSQL_ROOT_PASSWORD=root -d mysql

 

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hivemeta?createDatabaseIfNotExist=true</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>

 

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>

 

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>Username to use against metastore database</description>
  </property>

 

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
    <description>password to use against metastore database</description>
  </property>

 

Download the MySQL JDBC driver and put it in $HIVE_HOME/lib:

zzh@ZZHPC:~/Downloads/mysql-connector-j-9.2.0$ cp mysql-connector-j-9.2.0.jar $HIVE_HOME/lib
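
A quick sanity check that the driver JAR actually landed in the lib directory might look like the following. find_mysql_driver is a hypothetical helper, and the file-name pattern assumes the Connector/J naming used above.

```shell
# Print the first MySQL Connector/J JAR found in the given lib directory.
# Returns non-zero if none is present.
find_mysql_driver() {
  lib_dir="$1"
  for jar in "$lib_dir"/mysql-connector-j-*.jar; do
    # If the glob matched nothing, $jar is the literal pattern and -e fails.
    if [ -e "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  return 1
}

if find_mysql_driver "${HIVE_HOME:-/nonexistent}/lib"; then
  echo "MySQL JDBC driver found"
else
  echo "MySQL JDBC driver not found in \$HIVE_HOME/lib"
fi
```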

 

The schematool command below will:

  • Create the hivemeta database and necessary tables in it.
  • Set up the Hive metastore schema version information.

zzh@ZZHPC:~$ schematool -initSchema -dbType mysql
Initializing the schema to: 4.0.0
Metastore connection URL: jdbc:mysql://localhost:3306/hivemeta?createDatabaseIfNotExist=true
Metastore connection Driver : com.mysql.cj.jdbc.Driver
Metastore connection User: root
Starting metastore schema initialization to 4.0.0
Initialization script hive-schema-4.0.0.mysql.sql

......(many blank lines)

Initialization script completed

 

zzh@ZZHPC:~/Downloads/sfw/hive-4.0.1/conf$ beeline -u "jdbc:mysql://localhost:3306/hivemeta" -n root -p root
Connecting to jdbc:mysql://localhost:3306/hivemeta
Connected to: MySQL (version 9.2.0)
Driver: MySQL Connector/J (version mysql-connector-j-9.2.0 (Revision: a3909bfeb62d5a517ab444bb88ba7ecf26100297))
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 4.0.1 by Apache Hive
0: jdbc:mysql://localhost:3306/hivemeta>

 

posted on 2025-02-25 20:24  ZhangZhihuiAAA