Impala 1.2.4 Installation and Configuration

Impala 1.2.4 Installation Guide

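Before you install:

1. For security reasons, we use the cup account that Hive already uses to run Impala (start, stop and so on) instead of a separate impala account. This requires the directory permission changes and the user settings in configuration files described later.

2. For performance reasons, the impala-state-store and impala-catalog services are installed on the NameNode of the Hadoop cluster, and the impala-server and impala-shell services are installed on each DataNode. The NameNode does not run impala-server.

3. Install the Impala packages as root, then change the owner of the relevant files to the cup account.

4. Starting and stopping the Impala services requires an account with root privileges.

5. The installation steps follow the official documentation: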

http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/Installing-and-Using-Impala.html

Installing the Impala packages

Download the required packages, choosing the version that fits your environment (we run CDH4.2.1, so we chose Impala 1.2.4):

http://archive.cloudera.com/impala/redhat/6/x86_64/impala/

 

On the NameNode of the Hadoop cluster, install the following packages in this order:

rpm -ivh ./bigtop-utils-0.4+300-1.cdh4.0.1.p0.1.el6.noarch.rpm

rpm -ivh ./impala-1.2.4-1.p0.420.el6.x86_64.rpm

rpm -ivh ./impala-state-store-1.2.4-1.p0.420.el6.x86_64.rpm

rpm -ivh ./impala-server-1.2.4-1.p0.420.el6.x86_64.rpm

rpm -ivh ./impala-catalog-1.2.4-1.p0.420.el6.x86_64.rpm

rpm -ivh ./impala-udf-devel-1.2.4-1.p0.420.el6.x86_64.rpm

rpm -ivh ./impala-shell-1.2.4-1.p0.420.el6.x86_64.rpm

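Note: Impala depends on the package bigtop-utils-0.4+300-1.cdh4.0.1.p0.1.el6.noarch.rpm. It is not in the 1.2.4 directory on the official site and has to be downloaded from the 1.2.3 directory or the directory of another version.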

 

On each of the other DataNodes, install the following packages in this order:

rpm -ivh ./bigtop-utils-0.4+300-1.cdh4.0.1.p0.1.el6.noarch.rpm

rpm -ivh ./impala-1.2.4-1.p0.420.el6.x86_64.rpm

rpm -ivh ./impala-server-1.2.4-1.p0.420.el6.x86_64.rpm

rpm -ivh ./impala-catalog-1.2.4-1.p0.420.el6.x86_64.rpm

rpm -ivh ./impala-udf-devel-1.2.4-1.p0.420.el6.x86_64.rpm

rpm -ivh ./impala-shell-1.2.4-1.p0.420.el6.x86_64.rpm
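
The two lists above differ only in the impala-state-store package, so the installation can be scripted per node role. The following is a minimal sketch, not part of the original procedure: the function names impala_rpms_for_role and install_impala_rpms and the DRY_RUN variable are made up for illustration, and the RPM files are assumed to sit in the current directory.

```shell
#!/bin/bash
# Hypothetical helper: print the RPM files to install for a node role,
# in the same order as the lists above.
impala_rpms_for_role() {
    local role="$1"
    local ver="1.2.4-1.p0.420.el6.x86_64"
    local pkg
    echo "bigtop-utils-0.4+300-1.cdh4.0.1.p0.1.el6.noarch.rpm"
    echo "impala-${ver}.rpm"
    # Only the NameNode list contains the state store package.
    if [ "$role" = "namenode" ]; then
        echo "impala-state-store-${ver}.rpm"
    fi
    for pkg in server catalog udf-devel shell; do
        echo "impala-${pkg}-${ver}.rpm"
    done
}

# Install the packages for a role; with DRY_RUN=1 (an assumption of this
# sketch) the rpm commands are only printed.
install_impala_rpms() {
    local role="$1" rpm_file
    impala_rpms_for_role "$role" | while read -r rpm_file; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "rpm -ivh ./${rpm_file}"
        else
            # Leave the loop at the first package that fails to install.
            rpm -ivh "./${rpm_file}" || exit 1
        fi
    done
}
```

Running `DRY_RUN=1 install_impala_rpms namenode` first prints the commands so the order can be checked before anything is installed as root.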

 

Check the Impala paths after installation:

[root@cup-slave-11 cup]# find / -name impala

/etc/alternatives/impala

/etc/impala

/etc/default/impala

/var/log/impala

/var/lib/alternatives/impala

/var/lib/impala

/var/run/impala

/usr/lib/impala

Configuring Impala

Add the following to hdfs-site.xml:

<property>

    <name>dfs.client.read.shortcircuit</name>

    <value>true</value>

</property>

<property>

    <name>dfs.domain.socket.path</name>

    <value>/var/run/hadoop-hdfs/dn._PORT</value>

</property>

<property>

  <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>

  <value>true</value>

</property>

<property>

   <name>dfs.client.use.legacy.blockreader.local</name>

   <value>false</value>

</property>

<property>

   <name>dfs.datanode.data.dir.perm</name>

   <value>750</value>

</property>

<property>

   <name>dfs.block.local-path-access.user</name>

   <value>cup</value>

</property>

<property>

   <name>dfs.client.file-block-storage-locations.timeout</name>

   <value>3000</value>

</property>
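
Because a missing property only shows up later as a startup error, it can help to check the file right after editing it. The sketch below is an illustration under assumptions: missing_hdfs_props is a hypothetical helper name, and it does a plain text search for the `<name>` elements rather than parsing the XML, so it does not validate the values.

```shell
#!/bin/bash
# Property names taken from the hdfs-site.xml snippet above.
REQUIRED_HDFS_PROPS="
dfs.client.read.shortcircuit
dfs.domain.socket.path
dfs.datanode.hdfs-blocks-metadata.enabled
dfs.client.use.legacy.blockreader.local
dfs.datanode.data.dir.perm
dfs.block.local-path-access.user
dfs.client.file-block-storage-locations.timeout
"

# Hypothetical helper: print every required property whose <name> element
# is missing from the given file. Returns non-zero if any is missing.
missing_hdfs_props() {
    local file="$1" prop missing=0
    for prop in $REQUIRED_HDFS_PROPS; do
        # -F matches the string literally, so the dots are not wildcards.
        if ! grep -qF "<name>${prop}</name>" "$file"; then
            echo "$prop"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `missing_hdfs_props /etc/impala/conf/hdfs-site.xml` prints nothing when all seven properties are present.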

 

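Add the configuration files:

The configuration directory for impalad is set by the environment variable IMPALA_CONF_DIR and defaults to /etc/impala/conf. Copy the already configured hive-site.xml, core-site.xml, hdfs-site.xml and hbase-site.xml files into /etc/impala/conf.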

 

Copy the relevant .so files into Hadoop's native lib directory (skip this step if the target directory already contains them):

cp /usr/lib/impala/lib/*.so* $HADOOP_HOME/lib/native/

 

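Replace the files in /usr/lib/impala/lib whose names contain "datanucleus" with the corresponding files from $HIVE_HOME/lib, renaming each copy so that it keeps the file name originally used in /usr/lib/impala/lib. Otherwise starting impala-state-store and impala-catalog fails; see Exception 3 and Exception 5.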
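
The replace-and-rename step is easy to get wrong by hand, so here is a minimal sketch of one way to script it. It is an illustration, not the author's exact procedure: sync_datanucleus_jars is a hypothetical name, jars are matched by the part of the file name before the version number, and keeping a .bak copy of each original is an addition of this sketch.

```shell
#!/bin/bash
# Hypothetical helper: for every datanucleus jar in the Impala lib directory,
# copy the jar with the same artifact name from Hive's lib directory over it,
# keeping the file name Impala already uses.
sync_datanucleus_jars() {
    local hive_lib="$1" impala_lib="$2"
    local target name prefix candidate source
    for target in "$impala_lib"/datanucleus-*.jar; do
        if [ ! -e "$target" ]; then continue; fi   # glob matched nothing
        name=$(basename "$target")
        # Strip the version: datanucleus-core-3.2.2.jar -> datanucleus-core
        prefix="${name%%-[0-9]*}"
        # Take the first Hive jar with the same artifact name, any version.
        source=""
        for candidate in "$hive_lib/$prefix"-[0-9]*.jar; do
            if [ -e "$candidate" ]; then source="$candidate"; break; fi
        done
        if [ -z "$source" ]; then
            echo "no Hive jar for $prefix, left unchanged" >&2
            continue
        fi
        cp -p "$target" "$target.bak"   # keep the original for rollback
        cp "$source" "$target"
        echo "$name <- $(basename "$source")"
    done
}
```

With the paths used in this guide the call would be `sync_datanucleus_jars "$HIVE_HOME/lib" /usr/lib/impala/lib`, run as root. Jars that have no counterpart in Hive are reported on stderr and left alone, so they still need a manual check.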

 

Copy mysql-connector-java.jar from $HADOOP_HOME/lib to /usr/share/java, because Impala's catalogd uses it (note that the MySQL driver file must be named exactly mysql-connector-java.jar):

[root@cup-slave-11 native]# more /usr/bin/catalogd

#!/bin/bash

 

export IMPALA_BIN=${IMPALA_BIN:-/usr/lib/impala/sbin}

export IMPALA_HOME=${IMPALA_HOME:-/usr/lib/impala}

export HIVE_HOME=${HIVE_HOME:-/usr/lib/hive}

export HBASE_HOME=${HBASE_HOME:-/usr/lib/hbase}

export IMPALA_CONF_DIR=${IMPALA_CONF_DIR:-/etc/impala/conf}

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/impala/conf}

export HIVE_CONF_DIR=${HIVE_CONF_DIR:-/etc/impala/conf}

export HBASE_CONF_DIR=${HBASE_CONF_DIR:-/etc/impala/conf}

export LIBHDFS_OPTS=${LIBHDFS_OPTS:--Djava.library.path=/usr/lib/impala/lib}

export MYSQL_CONNECTOR_JAR=${MYSQL_CONNECTOR_JAR:-/usr/share/java/mysql-connector-java.jar}

 

Adjust the Impala settings to match your environment:

[root@cup-master-1 ~]# vi /etc/default/impala

 

IMPALA_STATE_STORE_HOST=10.204.193.10

IMPALA_STATE_STORE_PORT=24000

IMPALA_BACKEND_PORT=22000

IMPALA_LOG_DIR=/var/log/impala

 

IMPALA_CATALOG_ARGS=" -log_dir=${IMPALA_LOG_DIR} "

IMPALA_STATE_STORE_ARGS=" -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT}"

IMPALA_SERVER_ARGS=" \

    -log_dir=${IMPALA_LOG_DIR} \

    -state_store_port=${IMPALA_STATE_STORE_PORT} \

    -use_statestore \

    -state_store_host=${IMPALA_STATE_STORE_HOST} \

    -be_port=${IMPALA_BACKEND_PORT}"

 

ENABLE_CORE_DUMPS=false

 

# LIBHDFS_OPTS=-Djava.library.path=/usr/lib/impala/lib

MYSQL_CONNECTOR_JAR=/usr/share/java/mysql-connector-java.jar

IMPALA_BIN=/usr/lib/impala/sbin

IMPALA_HOME=/usr/lib/impala

HIVE_HOME=/home/cup/hive-0.10.0-cdh4.2.1

HBASE_HOME=/home/cup/hbase-0.94.2-cdh4.2.1

IMPALA_CONF_DIR=/etc/impala/conf

HADOOP_CONF_DIR=/etc/impala/conf

HIVE_CONF_DIR=/etc/impala/conf

HBASE_CONF_DIR=/etc/impala/conf

 

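Adjust the Impala init scripts /etc/init.d/impala-state-store, /etc/init.d/impala-server and /etc/init.d/impala-catalog to match your environment. Two user-related places need to change in each of them: the SVC_USER line and the owner and group in the install line. The impala-catalog script is shown here: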

DAEMON="catalogd"

DESC="Impala Catalog Server"

EXEC_PATH="/usr/bin/catalogd"

SVC_USER="cup"  ### Editor's note: the default here is impala

DAEMON_FLAGS="${IMPALA_CATALOG_ARGS}"

CONF_DIR="/etc/impala/conf"

PIDFILE="/var/run/impala/catalogd-impala.pid"

LOCKDIR="/var/lock/subsys"

LOCKFILE="$LOCKDIR/catalogd"

 

install -d -m 0755 -o cup -g cup /var/run/impala 1>/dev/null 2>&1 || :

[ -d "$LOCKDIR" ] || install -d -m 0755 $LOCKDIR 1>/dev/null 2>&1 || :
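
Since the same two edits are needed in all three init scripts, they can be applied with sed. This is a minimal sketch under assumptions: set_impala_service_user is a hypothetical helper, and the stock install line is assumed to read `-o impala -g impala` (the excerpt above only shows it after the change).

```shell
#!/bin/bash
# Hypothetical helper: rewrite the two user-related places in an Impala init
# script: the SVC_USER line and the owner/group of the install line that
# creates /var/run/impala.
set_impala_service_user() {
    local script="$1" user="$2" tmp
    tmp=$(mktemp) || return 1
    sed -e "s/^SVC_USER=\"[^\"]*\"/SVC_USER=\"${user}\"/" \
        -e "/^install -d .*\/var\/run\/impala/ s/-o [^ ]* -g [^ ]*/-o ${user} -g ${user}/" \
        "$script" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back with cat rather than mv so the script keeps its owner
    # and permission bits.
    cat "$tmp" > "$script"
    rm -f "$tmp"
}
```

A loop such as `for s in impala-state-store impala-server impala-catalog; do set_impala_service_user "/etc/init.d/$s" cup; done` would cover all three scripts; it is worth diffing the result against a backup before restarting anything.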

 

 

Create the impala directory on HDFS:

hadoop fs -mkdir /user/impala

 

On every node, create /var/run/hadoop-hdfs, because the dfs.domain.socket.path parameter in hdfs-site.xml points into this directory:

[root@cup-slave-11 impala]# mkdir /var/run/hadoop-hdfs

 

Give ownership of /var/run/hadoop-hdfs and /var/log/impala to the cup user and the cup group; otherwise starting impala-server fails with Exception 4:

chown -R cup:cup /var/log/impala

chown -R cup:cup /var/run/hadoop-hdfs

Starting the Impala services

Start the Impala state-store service on the NameNode:

sudo service impala-state-store start

Start the Impala catalog service on the NameNode:

sudo service impala-catalog start

Start the impala-server service on each DataNode:

sudo service impala-server start

Stop the Impala state-store service on the NameNode:

sudo service impala-state-store stop

Stop the Impala catalog service on the NameNode:

sudo service impala-catalog stop

Stop the impala-server service on each DataNode:

sudo service impala-server stop
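
The commands above can be wrapped so that each node only needs to know its role. The following is a minimal sketch: impala_services_for_role, impala_ctl and the DRY_RUN variable are hypothetical names for illustration, and the script is assumed to run as root in place of the sudo calls.

```shell
#!/bin/bash
# Hypothetical helper: print the Impala services that run on a node role,
# in the order used above.
impala_services_for_role() {
    case "$1" in
        namenode) echo "impala-state-store impala-catalog" ;;
        datanode) echo "impala-server" ;;
        *) echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# Run "service <name> <action>" for every service of the role.
# With DRY_RUN=1 (an assumption of this sketch) the commands are only printed.
impala_ctl() {
    local role="$1" action="$2" svc services
    services=$(impala_services_for_role "$role") || return 1
    for svc in $services; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "service $svc $action"
        else
            service "$svc" "$action" || return 1
        fi
    done
}
```

For example, `impala_ctl namenode start` on the NameNode and `impala_ctl datanode start` on each DataNode.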

 

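Note: occasionally an Impala service reports no obvious error on startup yet has not actually started. Check the logs in /var/log/impala for entries containing "error" and investigate any that appear.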
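
The log check described in the note can be done with grep. This is a minimal sketch: scan_impala_logs is a hypothetical helper, and it simply counts lines that contain "error" in any letter case, so harmless lines that mention the word are counted too.

```shell
#!/bin/bash
# Hypothetical helper: list the files directly under a log directory that
# contain the word "error" (any letter case), with the number of such lines.
scan_impala_logs() {
    local log_dir="${1:-/var/log/impala}" file count
    for file in "$log_dir"/*; do
        if [ ! -f "$file" ]; then continue; fi
        # grep -c exits non-zero on zero matches; "|| true" ignores that.
        count=$(grep -ci "error" "$file" || true)
        if [ "$count" -gt 0 ]; then
            echo "$file: $count"
        fi
    done
    return 0
}
```

Called without arguments it scans /var/log/impala; an empty result means no file mentions an error.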

Verifying that Impala works

Check that the Impala processes exist on the NameNode:

[cup@cup-master-1 ~]$ ps -ef|grep impala

cup       5522 45968  0 08:58 pts/25   00:00:00 grep impala

cup       8292     1  0 Mar27 ?        00:01:06 /usr/lib/impala/sbin/statestored -log_dir=/var/log/impala -state_store_port=24000

 

Check that the impala-server process exists on a DataNode:

[cup@cup-slave-11 ~]$ ps -ef|grep impala

cup       15630  15599  0 09:24 pts/0    00:00:00 grep impala

cup      112216      1  0 Mar27 ?        00:01:15 /usr/lib/impala/sbin/impalad -log_dir=/var/log/impala -state_store_port=24000 -use_statestore -state_store_host=10.204.193.10 -be_port=22000

 

Open the Impala state-store web page on the NameNode (default port 25010):

  

Open the impalad web page on a DataNode (default port 25000):

 

Run SQL statements on a node where impala-shell is installed:

[cup@cup-slave-11 ~]$ impala-shell

Starting Impala Shell without Kerberos authentication

Connected to cup-slave-11:21000

Server version: impalad version 1.2.4 RELEASE (build ac29ae09d66c1244fe2ceb293083723226e66c1a)

Welcome to the Impala shell. Press TAB twice to see a list of available commands.

 

Copyright (c) 2012 Cloudera, Inc. All rights reserved.

 

(Shell build version: Impala Shell v1.2.4 (ac29ae0) built on Wed Mar  5 07:05:40 PST 2014)

[cup-slave-11:21000] > show databases;

Query: show databases

+---------+

| name    |

+---------+

| cloudup |

| default |

| xhyt    |

+---------+

Returned 3 row(s) in 0.01s

[cup-slave-11:21000] > use cloudup;

Query: use cloudup

[cup-slave-11:21000] > select * from url_read_typ_rel limit 5;

Query: select * from url_read_typ_rel limit 5

+----------------------+---------+---------+---------+---------+--------+-----+

| urlhash              | rtidlv1 | rtyplv1 | rtidlv2 | rtyplv2 | isttim | url |

+----------------------+---------+---------+---------+---------+--------+-----+

| 2160609062987073557  | 3       | 股票    | NULL    |         | NULL   |     |

| 8059679893178527423  | 3       | 股票    | NULL    |         | NULL   |     |

| -404610021015528651  | 2       | 房产    | NULL    |         | NULL   |     |

| -6322366252916938780 | 5       | 教育    | NULL    |         | NULL   |     |

| -6821513749785855580 | 12      | 游戏    | NULL    |         | NULL   |     |

+----------------------+---------+---------+---------+---------+--------+-----+

Returned 5 row(s) in 0.61s

 

Common exceptions:

Exception 1:

Starting or stopping the state-store reports an error:

[root@cup-master-1 ~]# service impala-state-store start

/etc/init.d/impala-state-store: line 35: /etc/default/hadoop: No such file or directory

Starting Impala State Store Server:[  OK  ]

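Solution:

Several Impala startup scripts try to load /etc/default/hadoop, but that file does not exist in our setup. The message has no real effect and can be ignored.

Exception 2:

Starting the impala-server service reports errors (the error log is under /var/log/impala):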

ERROR: short-circuit local reads is disabled because

  - Impala cannot read or execute the parent directory of dfs.domain.socket.path

  - dfs.client.read.shortcircuit is not enabled.

ERROR: block location tracking is not properly enabled because

  - dfs.client.file-block-storage-locations.timeout is too low. It should be at least 3000.

Solution:

Make sure the following parameters are configured in hdfs-site.xml:

dfs.client.read.shortcircuit

dfs.domain.socket.path

dfs.datanode.hdfs-blocks-metadata.enabled

dfs.client.use.legacy.blockreader.local

dfs.datanode.data.dir.perm

dfs.block.local-path-access.user

dfs.client.file-block-storage-locations.timeout

Exception 3:

Starting impala-state-store reports an error:

java.lang.ClassNotFoundException: org.datanucleus.jdo.JDOPersistenceManagerFactory

        at com.cloudera.impala.catalog.MetaStoreClientPool$MetaStoreClient.<init>(MetaStoreClientPool.java:51)

        at com.cloudera.impala.catalog.MetaStoreClientPool$MetaStoreClient.<init>(MetaStoreClientPool.java:41)

/* Editor's note: some output omitted here */

Caused by: javax.jdo.JDOFatalUserException: Class datanucleus.jdo.JDOPersistenceManagerFactory was not found.

NestedThrowables:

java.lang.ClassNotFoundException: org.datanucleus.jdo.JDOPersistenceManagerFactory

Caused by: java.lang.ClassNotFoundException: org.datanucleus.jdo.JDOPersistenceManagerFactory

        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)

        at java.security.AccessController.doPrivileged(Native Method)

        at java.net.URLClassLoader.findClass(URLClassLoader.java:205) javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1155)

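Solution:

This happens because the datanucleus packages in /usr/lib/impala/lib differ in version from those in $HIVE_HOME/lib. Copy the datanucleus files from $HIVE_HOME/lib into /usr/lib/impala/lib, renaming them to the file names originally present in /usr/lib/impala/lib (some configuration files refer to those names explicitly).

Exception 4:

If /var/run/hadoop-hdfs and /var/log/impala are not owned by the user that runs Impala, startup reports an error: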

[root@cup-slave-11 impala]# service impala-server start

/etc/init.d/impala-server: line 35: /etc/default/hadoop: No such file or directory

Starting Impala Server:[  OK  ]

/bin/bash: /var/log/impala/impala-server.log: Permission denied

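Solution:

Give ownership of /var/run/hadoop-hdfs and /var/log/impala to the cup user and the cup group, and make sure the user and group in /etc/init.d/impala-state-store, /etc/init.d/impala-server and /etc/init.d/impala-catalog are set to cup.

Exception 5:

Starting the impala-catalog service reports an error: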

E0327 16:02:46.283989 45718 Log4JLogger.java:115] Bundle "org.datanucleus.api.jdo" requires "org.datanucleus" version "3.2.0.m4" but the resolved bundle has version "3.2.1" which is outside the expected range.

Solution:

As the error message indicates, rename datanucleus-api-jdo-3.2.1.jar in /usr/lib/impala/lib to datanucleus-api-jdo-3.2.0.m4.jar; this resolves the problem.

 

----end

Link to this article: http://www.cnblogs.com/chenz/articles/3629698.html

Author: chenzheng

Contact: vinkeychen@gmail.com

posted @ 2014-03-27 23:27 Mr.chenz