impala - jeasonchen001

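1. Introduction to Impala

1. Impala is a query tool from Cloudera that queries data through SQL. Impala is tightly coupled with Hive. Its queries run roughly 3 to 10 times faster than Hive's. It drops MapReduce and uses its own execution engine written in C++ to retrieve data quickly.

	Impala performs its computation in memory.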

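2. The relationship between Impala and Hive

	Impala and Hive are tightly coupled: Impala can only run queries when Hive is present.

	The reason is that Impala queries are written in SQL, which needs table definitions, and those come from the Hive metastore.

	Prerequisite: Hive must be installed and its metastore service must be running.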

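3. Advantages:

	1) Queries run in memory, so they are fast.

	2) It drops MapReduce and executes queries in C++, using short-circuit local reads (a feature provided by Hadoop).

	3) It has the characteristics of a data warehouse.

	4) It supports JDBC and ODBC.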

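Disadvantages:

	1) It depends heavily on memory.

	2) It is written in C++, which makes it hard to maintain.

	3) It is tightly coupled with Hive.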

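	4. Impala architecture

1) impala-server: the impalad service, which runs on the worker nodes. It generates query plans and executes queries.

				It is recommended to run impalad on the same nodes as the DataNodes.

2) Impala State Store: a master service that stores and manages the state of the impalad daemons in the cluster.

3) Impala Catalog: a master service that stores the metadata and keeps it in sync with the Hive metadata.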

The work done by impalad is split into two parts:

	frontend and backend:

	frontend: generates the query plan.

	backend: executes the query plan.

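How Impala executes a query:

	1) Each impalad registers with the state store.

	2) The client submits SQL to an impalad.

	3) The impalad that receives the SQL first generates a single-node query plan, then distributes the plan to the other nodes so the query runs across the cluster.

	4) The partial results are merged and the final result is returned to the client.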
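The four steps above follow a scatter-gather pattern. The sketch below is a toy model of that pattern only, not Impala's actual code: the node names, the `PARTITIONS` data, and the `run_fragment` and `coordinate_avg` functions are all hypothetical, and threads stand in for remote impalad daemons.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each "node" holds one local partition of a score column.
PARTITIONS = {
    "node01": [70, 85, 90],
    "node02": [60, 95],
    "node03": [88],
}


def run_fragment(node):
    """Backend role: scan the node's local rows, return a partial (sum, count)."""
    rows = PARTITIONS[node]
    return sum(rows), len(rows)


def coordinate_avg():
    """Coordinator role: send the fragment to every node, then merge the partials."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(run_fragment, PARTITIONS))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


average = coordinate_avg()
print(average)
```

Each node only returns a small partial aggregate, so the coordinator merges a few numbers instead of pulling every row over the network.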

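2. Installing Impala

	Start the metastore service:

	It is suggested to start hiveserver2 first and then start the metastore service.

	If the metastore service fails to come up after hiveserver2 has started,

	swap the order: start the metastore first, then start hiveserver2.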

	

See the documentation for the remaining installation steps.

3. Using Impala

1. impala-shell syntax

Options:

-h, --help show this help message and exit

-i IMPALAD, --impalad=IMPALAD

                    host:port of impalad to connect to

                    [default: node03.hadoop.com:21000]

-q QUERY, --query=QUERY

                    Execute a query without the shell [default: none]

-f QUERY_FILE, --query_file=QUERY_FILE

                    Execute the queries in the query file, delimited by ;.

                    If the argument to -f is "-", then queries are read

                    from stdin and terminated with ctrl-d. [default: none]

-k, --kerberos Connect to a kerberized impalad [default: False]

-o OUTPUT_FILE, --output_file=OUTPUT_FILE

                    If set, query results are written to the given file.

                    Results from multiple semicolon-terminated queries

                    will be appended to the same file [default: none]

-B, --delimited Output rows in delimited mode [default: False]

--print_header Print column names in delimited mode when pretty-

                    printed. [default: False]

--output_delimiter=OUTPUT_DELIMITER

                    Field delimiter to use for output in delimited mode

                    [default: \t]

-s KERBEROS_SERVICE_NAME, --kerberos_service_name=KERBEROS_SERVICE_NAME

                    Service name of a kerberized impalad [default: impala]

-V, --verbose Verbose output [default: True]

-p, --show_profiles Always display query profiles after execution

                    [default: False]

--quiet Disable verbose output [default: False]

-v, --version Print version information [default: False]

-c, --ignore_query_failure

                    Continue on query failure [default: False]

-r, --refresh_after_connect

                    Refresh Impala catalog after connecting

                    [default: False]

-d DEFAULT_DB, --database=DEFAULT_DB

                    Issues a use database command on startup

                    [default: none]

-l, --ldap Use LDAP to authenticate with Impala. Impala must be

                    configured to allow LDAP authentication.

                    [default: False]

-u USER, --user=USER User to authenticate with. [default: root]

--ssl Connect to Impala via SSL-secured connection

                    [default: False]

--ca_cert=CA_CERT Full path to certificate file used to authenticate

                    Impala's SSL certificate. May either be a copy of

                    Impala's certificate (for self-signed certs) or the

                    certificate of a trusted third-party CA. If not set,

                    but SSL is enabled, the shell will NOT verify Impala's

                    server certificate [default: none]

--config_file=CONFIG_FILE

                    Specify the configuration file to load options. The

                    following sections are used: [impala],

                    [impala.query_options]. Section names are case

                    sensitive. Specifying this option within a config file

                    will have no effect. Only specify this as an option in

                    the commandline. [default: /root/.impalarc]

--live_summary Print a query summary every 1s while the query is

                    running. [default: False]

--live_progress Print a query progress every 1s while the query is

                    running. [default: False]

--auth_creds_ok_in_clear

                    If set, LDAP authentication may be used with an

                    insecure connection to Impala. WARNING: Authentication

                    credentials will therefore be sent unencrypted, and

                    may be vulnerable to attack. [default: none]

--ldap_password_cmd=LDAP_PASSWORD_CMD

                    Shell command to run to retrieve the LDAP password

                    [default: none]

--var=KEYVAL Defines a variable to be used within the Impala

                    session. Can be used multiple times to set different

                    variables. It must follow the pattern "KEY=VALUE", KEY

                    starts with an alphabetic character and contains

                    alphanumeric characters or underscores. [default:

                    none]

-Q QUERY_OPTIONS, --query_option=QUERY_OPTIONS

                    Sets the default for a query option. Can be used

                    multiple times to set different query options. It must

                    follow the pattern "KEY=VALUE", KEY must be a valid

                    query option. Valid query options  can be listed by

                    command 'set'. [default: none]	
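The `--config_file` option above reads the sections `[impala]` and `[impala.query_options]`. As a rough sketch, such a file could be generated with Python's `configparser`. The key names `impalad` and `verbose` are assumed to mirror the long option names, and `mem_limit=2g` is only an illustrative query option value; check them against your impala-shell version before relying on them.

```python
import configparser
import io

config = configparser.ConfigParser()

# Shell options. Key names are assumed to match the long option names.
config["impala"] = {
    "impalad": "node03.hadoop.com:21000",
    "verbose": "true",
}

# Defaults for query options. The value here is illustrative only.
config["impala.query_options"] = {"mem_limit": "2g"}

# Write to an in-memory buffer. Swap in open(path, "w") to create a real file.
buf = io.StringIO()
config.write(buf, space_around_delimiters=False)
impalarc_text = buf.getvalue()
print(impalarc_text)
```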



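1) impala-shell -i node02    connect to the impalad on the given node

2) impala-shell -q "select * from myhive.course;"    run a single SQL query

3) impala-shell -f impala.sql    run the queries in a file

4) impala-shell -r    refresh the metadata after connecting

5) impala-shell -f impala.sql -o a.txt    save the query results to the given file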
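When these commands are launched from a script, it can help to build the argument list in one place. The helper below is a minimal sketch that uses only flags from the help listing (`-i`, `-q`, `-f`, `-o`, `-B`). The function name `impala_shell_cmd` and its parameters are made up for illustration, and it only builds the command without running it.

```python
import shlex


def impala_shell_cmd(host=None, query=None, query_file=None,
                     output_file=None, delimited=False):
    """Build an impala-shell argument list. Does not execute anything."""
    argv = ["impala-shell"]
    if host:
        argv += ["-i", host]         # impalad to connect to
    if query:
        argv += ["-q", query]        # run one query without the interactive shell
    if query_file:
        argv += ["-f", query_file]   # run the queries in a file
    if output_file:
        argv += ["-o", output_file]  # write results to this file
    if delimited:
        argv.append("-B")            # delimited output instead of pretty-printed
    return argv


cmd = impala_shell_cmd(host="node02",
                       query="select * from myhive.course;",
                       output_file="a.txt",
                       delimited=True)
print(shlex.join(cmd))  # shell-quoted form, for logging or copy-paste
```

Passing the list to `subprocess.run(cmd)` on a machine with impala-shell installed avoids quoting problems with the SQL text.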

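impala-shell internal commands

refresh: refreshes the metadata of a single table.

invalidate metadata: an internal command that, when run without a table name, refreshes all metadata.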

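Keeping databases in sync:

1) Tables created in Hive are not visible in Impala until the metadata is refreshed (invalidate metadata).

2) Tables created in Impala can be viewed directly in Hive; no metadata refresh is needed.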
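The choice between the two commands can be written down as a small rule. The helper below is a sketch: `metadata_sync_statement` and its parameters are hypothetical names, and it only returns the statement text to run in impala-shell.

```python
def metadata_sync_statement(table=None, new_in_hive=False):
    """Pick the statement to run in Impala after a change made through Hive."""
    if table is None:
        # No table given: reload metadata for everything (expensive).
        return "INVALIDATE METADATA;"
    if new_in_hive:
        # Impala has never seen this table, so REFRESH alone is not enough.
        return f"INVALIDATE METADATA {table};"
    # Table already known to Impala: pick up its new data or partitions.
    return f"REFRESH {table};"


print(metadata_sync_statement("myhive.course", new_in_hive=True))
print(metadata_sync_statement("myhive.course"))
```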

Queries in Impala:

1) Views: their main purpose is to make queries more convenient.

	create view view_name as select columns from table_name;
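To try the view syntax without a cluster, the sketch below uses Python's built-in SQLite as a stand-in for Impala. The `course` table, its columns, and the `course_names` view are made-up examples; the `create view ... as select ...` statement has the same shape in both systems.

```python
import sqlite3

# In-memory SQLite database standing in for Impala.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (id INTEGER, name TEXT, teacher TEXT)")
conn.executemany(
    "INSERT INTO course VALUES (?, ?, ?)",
    [(1, "hadoop", "t01"), (2, "impala", "t02")],
)

# The view exposes only the columns that later queries need.
conn.execute("CREATE VIEW course_names AS SELECT id, name FROM course")

rows = conn.execute("SELECT * FROM course_names ORDER BY id").fetchall()
print(rows)
conn.close()
```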

Loading data into Impala:

1) load: Impala's load statement has no local variant; data can only be loaded from HDFS.
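As a sketch of that restriction, the helper below builds a `LOAD DATA INPATH` statement and rejects paths on the local filesystem. The function name, the `file:` check, and the example path are assumptions for illustration; the statement text is what you would run in impala-shell.

```python
def load_data_statement(hdfs_path, table, overwrite=False):
    """Build an Impala LOAD DATA statement for a file or directory in HDFS."""
    if hdfs_path.startswith("file:"):
        # Impala has no LOAD DATA LOCAL; copy the file into HDFS first.
        raise ValueError("Impala LOAD DATA reads from HDFS, not the local filesystem")
    mode = "OVERWRITE " if overwrite else ""
    return f"LOAD DATA INPATH '{hdfs_path}' {mode}INTO TABLE {table};"


# Hypothetical HDFS path, used only for illustration.
stmt = load_data_statement("/user/impala/course.csv", "myhive.course")
print(stmt)
```

LOAD DATA moves the source files into the table's directory rather than copying them, so the original path is empty afterwards.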

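2. Introduction to HUE

HUE = Hadoop User Experience

HUE provides a visual interface that brings several big data frameworks together in one place, which makes them easy to operate.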

posted on 2019-08-04 23:18 jeasonchen001