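Chapter 2: Basic Usage of Impala

1. Using Impala

1.1 impala-shell syntax

1.1.1 External command-line options of impala-shell

These options are passed on the command line and run without entering the interactive impala-shell prompt.

impala-shell accepts many options:

-h  Show the help text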

impala-shell -h

[root@node03 hive-1.1.0-cdh5.14.0]# impala-shell -h
Usage: impala_shell.py [options]

Options:
  -h, --help            show this help message and exit
  -i IMPALAD, --impalad=IMPALAD
                        <host:port> of impalad to connect to
                        [default: node03.hadoop.com:21000]
  -q QUERY, --query=QUERY
                        Execute a query without the shell [default: none]
  -f QUERY_FILE, --query_file=QUERY_FILE
                        Execute the queries in the query file, delimited by ;.
                        If the argument to -f is "-", then queries are read
                        from stdin and terminated with ctrl-d. [default: none]
  -k, --kerberos        Connect to a kerberized impalad [default: False]
  -o OUTPUT_FILE, --output_file=OUTPUT_FILE
                        If set, query results are written to the g

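-r  Refresh all metadata after connecting (it runs invalidate metadata). With a large amount of metadata this puts a heavy load on the servers, and the option is deprecated, as the warning in the output below shows.

impala-shell -r

# Result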
[root@node03 hive-1.1.0-cdh5.14.0]# impala-shell -r
Starting Impala Shell without Kerberos authentication
Connected to node03.hadoop.com:21000
Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
Invalidating Metadata
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v2.11.0-cdh5.14.0 (d682065) built on Sat Jan  6 13:27:16 PST 2018)

The HISTORY command lists all shell commands in chronological order.
***********************************************************************************
+==========================================================================+
| DEPRECATION WARNING:                                                     |
| -r/--refresh_after_connect is deprecated and will be removed in a future |
| version of Impala shell.                                                 |
+==========================================================================+
Query: invalidate metadata
Query submitted at: 2019-08-22 14:45:28 (Coordinator: http://node03.hadoop.com:25000)
Query progress can be monitored at: http://node03.hadoop.com:25000/query_plan?query_id=ce4db858e1dfd774:814fabac00000000
Fetched 0 row(s) in 5.04s

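-B  Print plain delimited output instead of the pretty-printed table; this speeds up queries that return a lot of rows
--print_header  Print the column names as well when using delimited (-B) output
--output_delimiter  Set the field delimiter used in delimited (-B) output
-v  Show the version

impala-shell -v -V

# Result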
[root@node03 hive-1.1.0-cdh5.14.0]# impala-shell -v -V
Impala Shell v2.11.0-cdh5.14.0 (d682065) built on Sat Jan  6 13:27:16 PST 2018

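-f, --query_file  Execute the queries in the given file

cd /export/servers
vim impala-shell.sql

# Put the following two statements in the file
use hivesql;
select * from ods_click_pageviews limit 10;

# Grant execute permission (optional: impala-shell -f only needs to read the file)
chmod 755 impala-shell.sql

# Run the query file with -f
impala-shell -f impala-shell.sql

# Result (in the captured run the file was named imapala-shell.sql)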
[root@node03 hivedatas]# impala-shell -f imapala-shell.sql 
Starting Impala Shell without Kerberos authentication
Connected to node03.hadoop.com:21000
Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
Query: use hivesql
Query: select * from ods_click_pageviews limit 10
Query submitted at: 2019-08-22 15:29:54 (Coordinator: http://node03.hadoop.com:25000)
Query progress can be monitored at: http://node03.hadoop.com:25000/query_plan?query_id=6a4d51930cf99b9d:21f02c4e00000000
+--------------------------------------+-----------------+-------------+---------------------+----------------------------+------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+--------+----------+
| session                              | remote_addr     | remote_user | time_local          | request                    | visit_step | page_staylong | http_referer                                                                                                                                                                                                                                                                                                                    | http_user_agent                                                                                                                                                                                   | body_bytes_sent | status | datestr  |
+--------------------------------------+-----------------+-------------+---------------------+----------------------------+------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+--------+----------+
| d1328698-d475-4973-86ee-15ad9da8c860 | 1.80.249.223    | -           | 2013-09-18 07:57:33 | /hadoop-hive-intro/        | 1          | 60            | "http://www.google.com.hk/url?sa=t&rct=j&q=hive%E7%9A%84%E5%AE%89%E8%A3%85&source=web&cd=2&ved=0CC4QFjAB&url=%68%74%74%70%3a%2f%2f%62%6c%6f%67%2e%66%65%6e%73%2e%6d%65%2f%68%61%64%6f%6f%70%2d%68%69%76%65%2d%69%6e%74%72%6f%2f&ei=5lw5Uo-2NpGZiQfCwoG4BA&usg=AFQjCNF8EFxPuCMrm7CvqVgzcBUzrJZStQ&bvm=bv.52164340,d.aGc&cad=rjt" | "Mozilla/5.0(WindowsNT5.2;rv:23.0)Gecko/20100101Firefox/23.0"                                                                                                                                     | 14764           | 200    | 20130918 |
| 0370aa09-ebd6-4d31-b6a5-469050a7fe61 | 101.226.167.201 | -           | 2013-09-18 09:30:36 | /hadoop-mahout-roadmap/    | 1          | 60            | "http://blog.fens.me/hadoop-mahout-roadmap/"

-i, --impalad  Connect to the given impalad (host:port) and run the queries on it

-o, --output_file  Save the query results to the given file

impala-shell -f impala-shell.sql -o fizz.txt

# Result
[root@node03 hivedatas]# impala-shell -f imapala-shell.sql -o fizz.txt
Starting Impala Shell without Kerberos authentication
Connected to node03.hadoop.com:21000
Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
Query: use hivesql
Query: select * from ods_click_pageviews limit 10
Query submitted at: 2019-08-22 15:31:45 (Coordinator: http://node03.hadoop.com:25000)
Query progress can be monitored at: http://node03.hadoop.com:25000/query_plan?query_id=7c421ab5d208f3b1:dec5a09300000000
Fetched 10 row(s) in 0.13s

# The current directory now contains a new file, fizz.txt
[root@node03 hivedatas]# ll
total 2592
-rw-r--r-- 1 root root     511 Aug 21  2017 dim_time_dat.txt
-rw-r--r-- 1 root root    9926 Aug 22 15:31 fizz.txt
-rwxr-xr-x 1 root root      57 Aug 22 15:29 imapala-shell.sql
-rwxrwxrwx 1 root root     133 Aug 20 00:36 movie.txt
-rw-r--r-- 1 root root   18372 Jun 17 18:33 pageview2
-rwxr-xr-x 1 root root     154 Aug 20 00:32 test.txt
-rw-r--r-- 1 root root     327 Aug 20 02:37 user_table
-rw-r--r-- 1 root root   10361 Jun 18 09:00 visit2
-rw-r--r-- 1 root root 2587511 Jun 17 18:05 weblog2
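
The -B, --print_header, --output_delimiter, and -o options are typically combined when query results are exported to a file. The Python sketch below only assembles such a command line and does not contact any impalad. The helper name build_export_command and the output file name fizz.csv are made up for illustration.

```python
import shlex


def build_export_command(sql_file, output_file, delimiter=","):
    """Assemble an impala-shell command that writes delimited results to a file.

    Hypothetical helper for illustration: it only builds the argument list.
    """
    return [
        "impala-shell",
        "-B",                               # plain delimited output
        "--print_header",                   # include the column names
        f"--output_delimiter={delimiter}",  # field separator
        "-f", sql_file,                     # query file to execute
        "-o", output_file,                  # file the results are written to
    ]


cmd = build_export_command("impala-shell.sql", "fizz.csv")
# Shell-quoted form that can be pasted into a terminal (needs Python 3.8+)
print(shlex.join(cmd))
```

To actually run the export, the list can be passed to subprocess.run(cmd, check=True) on a host where impala-shell is installed.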

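-p, --show_profiles  Display the query profile (the plan plus runtime details) after each query

impala-shell -f impala-shell.sql -p

-q, --query  Execute the SQL passed on the command line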

impala-shell -q "use hivesql;select * from ods_click_pageviews limit 10;"

[root@node03 hivedatas]# impala-shell -q "use hivesql;select * from ods_click_pageviews limit 10;"
Starting Impala Shell without Kerberos authentication
Connected to node03.hadoop.com:21000
Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
Query: use hivesql
Query: select * from ods_click_pageviews limit 10
Query submitted at: 2019-08-22 15:36:58 (Coordinator: http://node03.hadoop.com:25000)
Query progress can be monitored at: http://node03.hadoop.com:25000/query_plan?query_id=b443d56565419f60:a149235700000000
+--------------------------------------+-----------------+-------------+---------------------+----------------------------+------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+--------+----------+
| session                              | remote_addr     | remote_user | time_local          | request                    | visit_step | page_staylong | http_referer                                                                                                                                                                                                                                                                                                                    | http_user_agent                                                                                                                                                                                   | body_bytes_sent | status | datestr  |

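1.1.2 Commands inside the impala-shell prompt

These commands are available after you enter the interactive impala-shell prompt.

Start impala-shell:

impala-shell  # can be run from any directory

# Result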
[root@node03 hivedatas]# impala-shell
Starting Impala Shell without Kerberos authentication
Connected to node03.hadoop.com:21000
Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v2.11.0-cdh5.14.0 (d682065) built on Sat Jan  6 13:27:16 PST 2018)

To see more tips, run the TIP command.
***********************************************************************************
[node03.hadoop.com:21000] >

help command

Show the built-in help

[node03.hadoop.com:21000] > help;

Documented commands (type help <topic>):
========================================
compute  describe  explain  profile  rerun   set    show  unset  values   with
connect  exit      history  quit     select  shell  tip   use    version

Undocumented commands:
======================
alter   delete  drop  insert  source  summary  upsert
create  desc    help  load    src     update

connect command

connect hostname  Connect to the impalad on another host and run subsequent statements there

connect node02;

# Result
[node03.hadoop.com:21000] > connect node02;
Connected to node02:21000
Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
[node02:21000] >

refresh command

refresh dbname.tablename performs an incremental refresh: it reloads the metadata of a single table. Use it when the data in an existing Hive table has changed.

refresh movie_info;

# Result
[node03:21000] > refresh movie_info;
Query: refresh movie_info
Query submitted at: 2019-08-22 15:49:24 (Coordinator: http://node03.hadoop.com:25000)
Query progress can be monitored at: http://node03.hadoop.com:25000/query_plan?query_id=f74330d533ff2402:27364f7600000000
Fetched 0 row(s) in 0.27s

invalidate metadata command

invalidate metadata performs a full refresh and is expensive. Use it mainly after a new database or table has been created in Hive.

invalidate metadata;

# Result
[node03:21000] > invalidate metadata;
Query: invalidate metadata
Query submitted at: 2019-08-22 15:48:04 (Coordinator: http://node03.hadoop.com:25000)
Query progress can be monitored at: http://node03.hadoop.com:25000/query_plan?query_id=6a431748d41bc369:7eeb053400000000
Fetched 0 row(s) in 2.87s

explain command

Shows the execution plan of a SQL statement

explain select * from user_table;

# Result
[node03:21000] > explain select * from user_table;
Query: explain select * from user_table
+------------------------------------------------------------------------------------+
| Explain String                                                                     |
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=0B                                       |
| Per-Host Resource Estimates: Memory=32.00MB                                        |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| hivesql.user_table                                                                 |
|                                                                                    |
| PLAN-ROOT SINK                                                                     |
| |                                                                                  |
| 01:EXCHANGE [UNPARTITIONED]                                                        |
| |                                                                                  |
| 00:SCAN HDFS [hivesql.user_table]                                                  |
|    partitions=1/1 files=1 size=327B                                                |
+------------------------------------------------------------------------------------+
Fetched 11 row(s) in 3.99s

explain_level can be set to 0, 1, 2, or 3. Level 3 is the highest and prints the most detailed plan.

set explain_level=3;

# Result
[node03:21000] > set explain_level=3;
EXPLAIN_LEVEL set to 3
[node03:21000] >

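profile command

Run it right after a SQL statement to print a much more detailed breakdown of how that statement was executed.

It is mainly used for examining how a query ran and for tuning the cluster.

select * from user_table;
profile;

# Partial output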
[node03:21000] > profile;
Query Runtime Profile:
Query (id=ff4799938b710fbb:7997836800000000):
  Summary:
    Session ID: a14d3b3894050309:7f300ddf8dcd8584
    Session Type: BEESWAX
    Start Time: 2019-08-22 15:58:22.786612000
    End Time: 2019-08-22 15:58:24.558806000
    Query Type: QUERY
    Query State: FINISHED
    Query Status: OK
    Impala Version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
    User: root
    Connected User: root
    Delegated User: 
    Network Address: ::ffff:192.168.52.120:48318
    Default Db: hivesql
    Sql Statement: select * from user_table
    Coordinator: node03.hadoop.com:22000
    Query Options (set by configuration): EXPLAIN_LEVEL=3
    Query Options (set by configuration and planner): EXPLAIN_LEVEL=3,MT_DOP=0
    Plan:

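Note: data inserted through Hive, and databases or tables created through Hive, cannot be queried in Impala right away; the metadata has to be refreshed first. Data inserted through impala-shell can be queried in Impala immediately, with no refresh. This works because of the catalog service, a component added in Impala 1.2 whose main job is to synchronize metadata across the impalad nodes.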
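
The choice between refresh and invalidate metadata described in this section can be written down as a small decision rule. The Python sketch below only returns the statement to run in impala-shell. The function name metadata_sync_statement and the "data"/"schema" labels are assumptions made for this example, not part of Impala.

```python
def metadata_sync_statement(change, table=None):
    """Return the Impala statement to run after a change made through Hive.

    change: "data"   -> data changed in an existing table
            "schema" -> a database or table was created in Hive
    table:  optional "db.table" name (the values below are just examples)
    """
    if change == "data":
        if table is None:
            raise ValueError("refresh needs a table name")
        # Incremental: reloads the metadata of this one table only
        return f"refresh {table};"
    if change == "schema":
        # With a table name only that table is invalidated; without one,
        # the metadata of every table is invalidated, which is expensive.
        return f"invalidate metadata {table};" if table else "invalidate metadata;"
    raise ValueError(f"unknown change type: {change!r}")


print(metadata_sync_statement("data", "hivesql.movie_info"))
print(metadata_sync_statement("schema"))
```

Passing a table name to invalidate metadata limits the work to that table, which is usually preferable to a full invalidation on a large cluster.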

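1.2 Creating a database

1.2.1 Enter the Impala interactive shell

impala-shell  # enter the Impala interactive shell

1.2.2 List all databases

show databases;

1.2.3 Create and drop a database

Create a database:

CREATE DATABASE IF NOT EXISTS mydb1;

Drop a database:

drop database if exists mydb;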

1.3 Creating tables

Create the student table

CREATE TABLE IF NOT EXISTS mydb1.student (name STRING, age INT, contact INT );

Create the employee table

create table employee (Id INT, name STRING, age INT,address STRING, salary BIGINT);

1.3.1 Inserting data into a table

insert into employee (ID,NAME,AGE,ADDRESS,SALARY) VALUES (1, 'Ramesh', 32, 'Ahmedabad', 20000 );
insert into employee values (2, 'Khilan', 25, 'Delhi', 15000 );
Insert into employee values (3, 'kaushik', 23, 'Kota', 30000 );
Insert into employee values (4, 'Chaitali', 25, 'Mumbai', 35000 );
Insert into employee values (5, 'Hardik', 27, 'Bhopal', 40000 );
Insert into employee values (6, 'Komal', 22, 'MP', 32000 );

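Overwriting data

Insert overwrite employee values (1, 'Ram', 26, 'Vishakhapatnam', 37000 );

After the overwrite, this row is the only one left in the table.

Another way to create a table (create table ... as select)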

create table customer as select * from employee;

1.3.2 Querying data

select * from employee;
select name,age from employee;

1.3.3 Dropping a table

DROP table  mydb1.employee;

1.3.4 Truncating a table

truncate  employee;

1.3.5 Creating a view

CREATE VIEW IF NOT EXISTS employee_view AS select name, age from employee;

1.3.6 Querying a view

select * from employee_view;

1.4 order by clause

Basic syntax

select * from table_name ORDER BY col_name [ASC|DESC] [NULLS FIRST|NULLS LAST]
Select * from employee ORDER BY id asc;

1.5 group by clause

Select name, sum(salary) from employee Group BY name;

1.6 having clause

Basic syntax

select col_name, aggregate_function(col_name2) from table_name GROUP BY col_name HAVING condition

Group the table by age, take the maximum salary of each group, and show only the maximums greater than 20000

select max(salary) from employee group by age having max(salary) > 20000;
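
To see which groups survive the having filter, the query can be tried on the six employee rows inserted in section 1.3.1. The Python sketch below uses an in-memory SQLite database as a stand-in for Impala, which is an assumption of this example: only the group by / having semantics carry over, not Impala's types or storage. The age column is added to the select list so that each maximum can be matched to its group.

```python
import sqlite3

# SQLite (in memory) stands in for Impala; only the SQL semantics are shown.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table employee (id int, name text, age int, address text, salary bigint)"
)
conn.executemany(
    "insert into employee values (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 20000),
        (2, "Khilan", 25, "Delhi", 15000),
        (3, "kaushik", 23, "Kota", 30000),
        (4, "Chaitali", 25, "Mumbai", 35000),
        (5, "Hardik", 27, "Bhopal", 40000),
        (6, "Komal", 22, "MP", 32000),
    ],
)

# having filters whole groups after aggregation (where filters rows before it)
result = conn.execute(
    """
    select age, max(salary) as max_salary
    from employee
    group by age
    having max(salary) > 20000
    order by age
    """
).fetchall()
for age, max_salary in result:
    print(age, max_salary)
conn.close()
```

The age 32 group is dropped because its maximum salary is exactly 20000, which does not satisfy the strict > 20000 condition.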

1.7 limit clause

select * from employee order by id limit 4;
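
limit is normally paired with order by, because without an ordering the rows that come back are not guaranteed to be any particular ones. The Python sketch below shows a top-3 query by salary. It again uses an in-memory SQLite database as a stand-in for Impala (an assumption of this example) and only the id, name, and salary columns of the employee rows.

```python
import sqlite3

# SQLite (in memory) stands in for Impala; only the SQL semantics are shown.
conn = sqlite3.connect(":memory:")
conn.execute("create table employee (id int, name text, salary bigint)")
conn.executemany(
    "insert into employee values (?, ?, ?)",
    [
        (1, "Ramesh", 20000),
        (2, "Khilan", 15000),
        (3, "kaushik", 30000),
        (4, "Chaitali", 35000),
        (5, "Hardik", 40000),
        (6, "Komal", 32000),
    ],
)

# Sort first, then keep only the first three rows of the sorted result
top3 = conn.execute(
    "select name, salary from employee order by salary desc limit 3"
).fetchall()
print(top3)
conn.close()
```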

2. Ways to load data into Impala tables

Method 1: load data from HDFS into Impala

create table user(id int ,name string,age int ) row format delimited fields terminated by "\t";

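Prepare the data file user.txt. The fields must be separated by a tab character to match the table's delimiter:

1	kasha	15
2	fizz	20
3	pheonux	30
4	manzi	50

Upload user.txt to the /user/impala directory on HDFS:

hdfs dfs -put user.txt /user/impala/

Check that the upload succeeded:

hdfs dfs -ls /user/impala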
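
Because the table is declared with fields terminated by "\t", user.txt has to contain real tab characters; rows separated by spaces would not split into three columns. One way to produce such a file is sketched below in Python. The temporary directory is an assumption of this example, and in practice the file would be written wherever hdfs dfs -put is run from.

```python
import csv
import os
import tempfile

rows = [(1, "kasha", 15), (2, "fizz", 20), (3, "pheonux", 30), (4, "manzi", 50)]

# Written to a temporary directory for this example only
path = os.path.join(tempfile.mkdtemp(), "user.txt")
with open(path, "w", newline="") as f:
    # delimiter="\t" matches: row format delimited fields terminated by "\t"
    writer = csv.writer(f, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)

# Read the file back to confirm what was written
with open(path) as f:
    content = f.read()
print(content, end="")
```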

Load the data

load data inpath '/user/impala/' into table user;

Query the loaded data

select  *  from  user;

If the query returns no data, refresh the table

refresh  user;

Method 2: create table ... as select

create  table  user2   as   select * from  user;

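Method 3:

insert into ... values  # not recommended, because it produces a large number of small files

Never use Impala as if it were a regular database that you write to row by row.

Method 4:

insert into ... select  # used fairly often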

posted on 2019-08-27 00:15 by -小鱼-