Apache Phoenix Basic Operations (1)

Posted on 2017-08-02 15:36 by Aaron-Mhs

This post introduces some basic Phoenix operations.

1. How do you print Hello World with Phoenix?

1.1 Using the sqlline terminal

sqlline.py SZB-L0023780:2181:/hbase114

0:jdbc:phoenix:SZB-L0023780:2181:/hbase114> create table test (mykey integer not null primary key, mycolumn varchar);

0:jdbc:phoenix:SZB-L0023780:2181:/hbase114> upsert into test values(1,'Hello');

0:jdbc:phoenix:SZB-L0023780:2181:/hbase114> upsert into test values(2,'World!');

0:jdbc:phoenix:SZB-L0023780:2181:/hbase114> select * from test;

+--------+-----------+
| MYKEY  | MYCOLUMN  |
+--------+-----------+
| 1      | Hello     |
| 2      | World!    |
+--------+-----------+

1.2 Accessing Phoenix from Java

Create a file named test2.java with the following contents:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class test2 {

    public static void main(String[] args) throws SQLException {
        // Connect using the ZooKeeper host, port, and HBase root znode.
        Connection con = DriverManager.getConnection("jdbc:phoenix:SZB-L0023780:2181:/hbase114");

        // Create the table and upsert two rows.
        Statement stmt = con.createStatement();
        stmt.executeUpdate("create table test2 (mykey integer not null primary key, mycolumn varchar)");
        stmt.executeUpdate("upsert into test2 values (1,'Hello')");
        stmt.executeUpdate("upsert into test2 values (2,'World!')");
        // Phoenix connections do not auto-commit by default, so commit explicitly.
        con.commit();
        stmt.close();

        // Read the rows back and print the mycolumn value of each.
        PreparedStatement statement = con.prepareStatement("select * from test2");
        ResultSet rset = statement.executeQuery();
        while (rset.next()) {
            System.out.println(rset.getString("mycolumn"));
        }
        rset.close();
        statement.close();
        con.close();
    }
}

Compile:

javac test2.java

Run the compiled program:

java -cp "../phoenix-4.8.0-HBase-1.1-client.jar:." test2

Output:

Hello

World!
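The sqlline.py argument in 1.1 and the JDBC URL in 1.2 carry the same three pieces: the ZooKeeper host, the ZooKeeper port, and the HBase root znode. A minimal sketch of how the URL is put together follows; the class name PhoenixUrl and the method build are hypothetical names used only for illustration and are not part of Phoenix.

```java
public class PhoenixUrl {

    // Hypothetical helper: assembles jdbc:phoenix:<host>:<port>:<znode>.
    static String build(String zkHost, int zkPort, String znode) {
        return "jdbc:phoenix:" + zkHost + ":" + zkPort + ":" + znode;
    }

    public static void main(String[] args) {
        // Values taken from the examples in this post.
        System.out.println(build("SZB-L0023780", 2181, "/hbase114"));
    }
}
```

The resulting string is what the Java example passes to DriverManager.getConnection.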

2. How to bulk-load data through Phoenix

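Phoenix provides two ways to load CSV data into a Phoenix table: the psql command, which loads in a single thread, and a MapReduce-based bulk loader.

psql is suited to data volumes in the tens of megabytes, while the MapReduce-based loader is suited to larger volumes.

The following walks through loading CSV data into a Phoenix table with each method.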

(1) Sample data: data.csv

12345,John,Doe

67890,Mary,Poppins

(2) SQL to create the table

CREATE TABLE example (
    my_pk bigint not null,
    m.first_name varchar(50),
    m.last_name varchar(50)
    CONSTRAINT pk PRIMARY KEY(my_pk)
);

(3) Loading with psql

bin/psql.py -t EXAMPLE SZB-L0023780:2181:/hbase114 data.csv

Usage examples for psql.py:

Examples:

psql my_ddl.sql

psql localhost my_ddl.sql

psql localhost my_ddl.sql my_table.csv

psql -t MY_TABLE my_cluster:1825 my_table2012-Q3.csv

psql -t MY_TABLE -h COL1,COL2,COL3 my_cluster:1825 my_table2012-Q3.csv

psql -t MY_TABLE -h COL1,COL2,COL3 -d : my_cluster:1825 my_table2012-Q3.csv

The parameters are described below:

*Parameter*	*Description*
-t	Name of the table to load the data into. Defaults to the name of the CSV file. Case sensitive.
-h	Overrides the column names to which the CSV data maps; case sensitive. The special value in-line indicates that the first line of the CSV file determines the columns to which the data maps.
-s	Run in strict mode, throwing an error on CSV parsing errors
-d	Supply a custom delimiter or delimiters for CSV parsing
-q	Supply a custom phrase delimiter, defaults to double quote character
-e	Supply a custom escape character, default is a backslash
-a	Supply an array delimiter
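To make the -t, -h, and -d options more concrete, here is a rough sketch of how a table name, a column list, and a delimiter could turn one CSV line into a parameterized UPSERT. This is not psql.py's actual implementation: the class CsvUpsertSketch and its methods are hypothetical, the unqualified column names are an assumption, and the splitter ignores quoting and escaping (the -q and -e options).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

public class CsvUpsertSketch {

    // Builds "UPSERT INTO <table> (<cols>) VALUES (?, ?, ...)" with one placeholder per column.
    static String upsertSql(String table, List<String> columns) {
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "UPSERT INTO " + table + " (" + String.join(", ", columns) + ") VALUES (" + placeholders + ")";
    }

    // Splits one line on a single-character delimiter; no quote or escape handling.
    static List<String> splitLine(String line, char delimiter) {
        return Arrays.asList(line.split(Pattern.quote(String.valueOf(delimiter)), -1));
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("MY_PK", "FIRST_NAME", "LAST_NAME");
        System.out.println(upsertSql("EXAMPLE", columns));
        System.out.println(splitLine("12345,John,Doe", ','));
    }
}
```

In a real loader the split values would be bound to the placeholders of a PreparedStatement, and the connection would be committed in batches.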

(4) Loading with MapReduce

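For higher-throughput loading on a distributed cluster, the MapReduce loader is recommended. It first writes the data to HFiles; once the HFiles have been created, they are loaded into the HBase table.

The MapReduce loader is launched with the hadoop command together with the Phoenix client JAR, as follows: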

hadoop jar phoenix-<version>-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --table EXAMPLE --input /data/example.csv

Note that the input file must be on HDFS, not on the local file system.

For example, in my environment I ran:

hadoop jar phoenix-4.8.0-HBase-1.1-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --table EXAMPLE --input /okok/data.csv -z SZB-L0023780:2181:/hbase114

Part of the log output:

mapreduce.AbstractBulkLoadTool: Loading HFiles from /tmp/94b60a06-86d8-49d7-a8d1-df5428971a33

mapreduce.AbstractBulkLoadTool: Loading HFiles for EXAMPLE from /tmp/94b60a06-86d8-49d7-a8d1-df5428971a33/EXAMPLE

mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://SZB-L0023776:8020/tmp/94b60a06-86d8-49d7-a8d1-df5428971a33/EXAMPLE/M/b456b2a2a5834b32aa8fb3463d3bfd76 first=\x80\x00\x00\x00\x00\x0009 last=\x80\x00\x00\x00\x00\x01\x092
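The first= and last= values in the last log line are the row keys of the two sample rows. Phoenix serializes a BIGINT as 8 big-endian bytes with the sign bit flipped, and the log prints printable ASCII bytes as characters and everything else as \xNN. The sketch below reproduces that for the keys 12345 and 67890 from data.csv; the class BigintRowKey and its methods are hypothetical names for illustration, and the escaping routine only imitates the log format rather than calling HBase.

```java
import java.nio.ByteBuffer;

public class BigintRowKey {

    // 8 big-endian bytes with the sign bit flipped, so byte order matches numeric order.
    static byte[] encode(long value) {
        return ByteBuffer.allocate(8).putLong(value ^ Long.MIN_VALUE).array();
    }

    // Printable ASCII (except backslash) is kept as-is; other bytes become \xNN.
    static String toEscaped(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int ch = b & 0xFF;
            if (ch >= ' ' && ch <= '~' && ch != '\\') {
                sb.append((char) ch);
            } else {
                sb.append(String.format("\\x%02X", ch));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("first=" + toEscaped(encode(12345L)));
        System.out.println("last=" + toEscaped(encode(67890L)));
    }
}
```

12345 is 0x3039, and the bytes 0x30 and 0x39 are the ASCII characters 0 and 9, which is why the first key ends in "09" rather than in two more \x escapes.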

The commonly used parameters of the MapReduce loader are listed below:

*Parameter*	*Description*
-i,--input	Input CSV path (mandatory)
-t,--table	Phoenix table name (mandatory)
-a,--array-delimiter	Array element delimiter (optional)
-c,--import-columns	Comma-separated list of columns to be imported
-d,--delimiter	Input delimiter, defaults to comma
-g,--ignore-errors	Ignore input errors
-o,--output	Output path for temporary HFiles (optional)
-s,--schema	Phoenix schema name (optional)
-z,--zookeeper	Zookeeper quorum to connect to (optional)
-it,--index-table	Index table name to load (optional)
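When the bulk load is launched from a program rather than typed by hand, the argument list can be assembled in code. The following is only a sketch: BulkLoadCommand and build are hypothetical names, and it covers just the three options used in the example run above (--table, --input, and -z).

```java
import java.util.ArrayList;
import java.util.List;

public class BulkLoadCommand {

    // Assembles the hadoop command line; the ZooKeeper option is added only when given.
    static List<String> build(String clientJar, String table, String input, String zookeeper) {
        List<String> cmd = new ArrayList<>();
        cmd.add("hadoop");
        cmd.add("jar");
        cmd.add(clientJar);
        cmd.add("org.apache.phoenix.mapreduce.CsvBulkLoadTool");
        cmd.add("--table");
        cmd.add(table);
        cmd.add("--input");
        cmd.add(input);
        if (zookeeper != null) {
            cmd.add("-z");
            cmd.add(zookeeper);
        }
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("phoenix-4.8.0-HBase-1.1-client.jar", "EXAMPLE",
                "/okok/data.csv", "SZB-L0023780:2181:/hbase114");
        System.out.println(String.join(" ", cmd));
    }
}
```

The returned list can be handed to java.lang.ProcessBuilder as-is, which avoids shell quoting problems with paths and delimiters.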

Notes:

With psql.py, typical upsert throughput is 20k-50k rows per second (depending on the size of each row).
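As a back-of-the-envelope illustration of that throughput range, the sketch below estimates how long a psql.py load would take at the low and high ends. The one-million-row figure and the class name PsqlLoadEstimate are assumptions made up for the example.

```java
public class PsqlLoadEstimate {

    // Estimated load time in seconds at a given sustained upsert rate.
    static double seconds(long rows, long rowsPerSecond) {
        return (double) rows / rowsPerSecond;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L; // assumed row count, for illustration only
        System.out.println("at 20k rows/s: " + seconds(rows, 20_000L) + " s");
        System.out.println("at 50k rows/s: " + seconds(rows, 50_000L) + " s");
    }
}
```

At these rates one million rows take roughly 20 to 50 seconds, which fits the earlier guidance that psql is meant for smaller files and the MapReduce loader for larger ones.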

Usage:

Create a table with psql:

psql.py [zookeeper] ../examples/web_stat.sql

Bulk upsert CSV data with psql:

psql.py [zookeeper] ../examples/web_stat.csv
