Sqoop安装与应用过程



 

1.  参考说明

参考文档:

 

http://sqoop.apache.org/

 

http://sqoop.apache.org/docs/1.99.7/admin/Installation.html

 

2.  安装环境说明

2.1.  环境说明

 

CentOS7.4+ Hadoop2.7.5的伪分布式环境

 

主机名

NameNode

SecondaryNameNode

DataNodes

centoshadoop.smartmap.com

192.168.1.80

192.168.1.80

192.168.1.80

 

 

 

 

 

Hadoop的安装目录为:/opt/hadoop/hadoop-2.7.5

 

3.  安装

Sqoop由两部分组成:客户端(client)和服务端(server)。需要在集群的其中某个节点上安装server,该节点的服务端可以作为其他Sqoop客户端的入口点。在服务端的节点上必须安装有Hadoop。客户端可以安装在任意数量的机子上。在装有客户端的机子上不需要安装Hadoop

3.1.  Sqoop下载

http://sqoop.apache.org/

 

3.2.  Sqoop解压

 

将下载的sqoop-1.99.7-bin-hadoop200.tar.gz解压到/opt/hadoop/sqoop-1.99.7目录下

 

4.  配置

4.1.  修改Hadoop配置

4.1.1.  修改core-site.xml文件

[root@centoshadoop hadoop]# vi /opt/hadoop/hadoop-2.7.5/etc/hadoop/core-site.xml

 

<property>

       <name>hadoop.proxyuser.sqoop2.hosts</name>

       <value>*</value>

</property>

<property>

       <name>hadoop.proxyuser.sqoop2.groups</name>

       <value>*</value>

</property>

<property>

       <name>hadoop.proxyuser.root.hosts</name>

       <value>*</value>

</property>

<property>

       <name>hadoop.proxyuser.root.groups</name>

       <value>*</value>

</property>

 

4.1.2.  修改container-executor.cfg文件

[root@centoshadoop hadoop]# vi /opt/hadoop/hadoop-2.7.5/etc/hadoop/container-executor.cfg

 

allowed.system.users=sqoop2,hadoop,root

 

4.1.3.  修改yarn-site.xml文件

 

[root@centoshadoop hadoop]# vi /opt/hadoop/hadoop-2.7.5/etc/hadoop/yarn-site.xml

 

4.2.  创建第三方数据库驱动目录

 

[root@centoshadoop sqoop-1.99.7]# mkdir /opt/hadoop/sqoop-1.99.7/jdbcDriver

 

[root@centoshadoop sqoop-1.99.7]# chmod g+rwx,o+rwx /opt/hadoop/sqoop-1.99.7/jdbcDriver

 

4.3.  上传驱动到存放JDBC驱动的目录下

上传软件包mysql-connector-java-5.1.46.jar/opt/hadoop/sqoop-1.99.7/jdbcDriver目录下

 

4.4.  修改profile文件

vi /etc/profile

 

# export JAVA_HOME=/opt/java/jdk1.7.0_45

export JAVA_HOME=/opt/java/jdk1.8.0_171

# export CLASSPATH=.:$JAVA_HOME/lib

export HADOOP_HOME=/opt/hadoop/hadoop-2.7.5

export HADOOP_INSTALL=$HADOOP_HOME

export HADOOP_MAPRED_HOME=$HADOOP_HOME

export HADOOP_COMMON_HOME=$HADOOP_HOME

export HADOOP_HDFS_HOME=$HADOOP_HOME

export YARN_HOME=$HADOOP_HOME

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

 

export PIG_HOME=/opt/hadoop/pig-0.17.0

export HIVE_HOME=/opt/hadoop/hive-2.3.3

 

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

export PATH=$PATH:$PIG_HOME/bin

export PATH=$PATH:$HIVE_HOME/bin

 

export SQOOP_HOME=/opt/hadoop/sqoop-1.99.7

export PATH=$PATH:$SQOOP_HOME/bin

export CLASSPATH=$CLASSPATH:$SQOOP_HOME/server/lib:$SQOOP_HOME/shell/lib:$SQOOP_HOME/tools/lib

export SQOOP_SERVER_EXTRA_LIB=/opt/hadoop/sqoop-1.99.7/jdbcDriver/

 

export CLASSPATH=.:$JAVA_HOME/lib:$HIVE_HOME/lib

export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64/server:/usr/local/lib:$HADOOP_HOME/lib/native

export JAVA_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_LIBRARY_PATH

export HADOOP_USER_NAME=hadoop

 

4.5.  JDK升级为1.8版本

 

JDK切换成1.8的版本,并修改所有与JAVA_HOME相关的变量

 

4.6.  修改Sqoop的配置文件

 

[root@centoshadoop bin]# vi /opt/hadoop/sqoop-1.99.7/conf/sqoop.properties

 

org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/opt/hadoop/hadoop-2.7.5/etc/hadoop/

 

 

[root@centoshadoop bin]# vi /opt/hadoop/sqoop-1.99.7/bin/sqoop.sh

 

export HADOOP_COMMON_HOME=${HADOOP_HOME}/share/hadoop/common

export HADOOP_HDFS_HOME=${HADOOP_HOME}/share/hadoop/hdfs

export HADOOP_MAPRED_HOME=${HADOOP_HOME}/share/hadoop/mapreduce

export HADOOP_YARN_HOME=${HADOOP_HOME}/share/hadoop/yarn

 

5.  启动Hadoop

 

5.1.  启动YARNHDFS

cd /opt/hadoop/hadoop-2.7.5/sbin

 

start-all.sh

 

5.2.  启动historyserver

 

cd /opt/hadoop/hadoop-2.7.5/sbin

 

mr-jobhistory-daemon.sh start historyserver

 

6.  验证Sqoop安装与配置

 

[root@centoshadoop sqoop-1.99.7]# sqoop2-tool verify

Setting conf dir: /opt/hadoop/sqoop-1.99.7/bin/../conf

Sqoop home directory: /opt/hadoop/sqoop-1.99.7

Sqoop tool executor:

        Version: 1.99.7

        Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb

        Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine

Running tool: class org.apache.sqoop.tools.tool.VerifyTool

1   [main] INFO  org.apache.sqoop.core.SqoopServer  - Initializing Sqoop server.

6   [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/opt/hadoop/hive-2.3.3/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

Verification was successful.

Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.

7.  初始化SqoopRepository

[root@centoshadoop sqoop-1.99.7]# sqoop2-tool upgrade

Setting conf dir: /opt/hadoop/sqoop-1.99.7/bin/../conf

Sqoop home directory: /opt/hadoop/sqoop-1.99.7

Sqoop tool executor:

        Version: 1.99.7

        Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb

        Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine

Running tool: class org.apache.sqoop.tools.tool.UpgradeTool

0   [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/opt/hadoop/hive-2.3.3/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

Tool class org.apache.sqoop.tools.tool.UpgradeTool has finished correctly.

[root@centoshadoop sqoop-1.99.7]#

 

 

8.  应用Sqoop工具

8.1.  服务端

8.1.1.  启动与关闭sqoop2-server

[root@centoshadoop bin]# cd /opt/hadoop/sqoop-1.99.7/bin/

[root@centoshadoop bin]# sqoop2-server start

Setting conf dir: /opt/hadoop/sqoop-1.99.7/bin/../conf

Sqoop home directory: /opt/hadoop/sqoop-1.99.7

Starting the Sqoop2 server...

 

[root@centoshadoop bin]# sqoop2-server stop

Setting conf dir: /opt/hadoop/sqoop-1.99.7/bin/../conf

Sqoop home directory: /opt/hadoop/sqoop-1.99.7

Stopping the Sqoop2 server...

Sqoop2 server stopped.

 

[root@centoshadoop bin]# jps

29780 NameNode

30085 SecondaryNameNode

30247 ResourceManager

31767 SqoopJettyServer

29913 DataNode

32107 Jps

30364 NodeManager

29245 JobHistoryServer

[root@centoshadoop bin]#

 

8.2.  客户端

8.2.1.  启动运行Sqoop的交互式Shell环境

 

[root@centoshadoop bin]# cd /opt/hadoop/sqoop-1.99.7/bin/

[root@centoshadoop bin]# sqoop2-shell

sqoop:000> :exit

 

8.2.2.  设置错误显示

sqoop:000> set option --name verbose --value true

Verbose option was changed to true

 

8.2.3.  连接Sqoop服务端

 

sqoop:000> set server -host 192.168.1.80

Server is set successfully

 

# set server --host ubuntu02 --port 12000 --webapp sqoop

 

sqoop:000> show version --all

client version:

 Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb

 Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016

server version:

 Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb

 Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016

API versions:

  [v1]

 

8.2.4.  创建MySQLLink对象

 

sqoop:000> show connector

+------------------------+---------+------------------------------------------------------------+----------------------+

|          Name          | Version |                           Class                            | Supported Directions |

+------------------------+---------+------------------------------------------------------------+----------------------+

| generic-jdbc-connector | 1.99.7  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector       | FROM/TO              |

| kite-connector         | 1.99.7  | org.apache.sqoop.connector.kite.KiteConnector              | FROM/TO              |

| oracle-jdbc-connector  | 1.99.7 | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO              |

| ftp-connector          | 1.99.7  | org.apache.sqoop.connector.ftp.FtpConnector                | TO                   |

| hdfs-connector         | 1.99.7  | org.apache.sqoop.connector.hdfs.HdfsConnector              | FROM/TO              |

| kafka-connector        | 1.99.7  | org.apache.sqoop.connector.kafka.KafkaConnector            | TO                   |

| sftp-connector         | 1.99.7  | org.apache.sqoop.connector.sftp.SftpConnector              | TO                   |

+------------------------+---------+------------------------------------------------------------+----------------------+

sqoop:000>

 

sqoop:000> create link -c generic-jdbc-connector

Creating link for connector with name generic-jdbc-connector

Please fill following values to create new link object

Name: mysql-link

 

Database connection

 

Driver class: com.mysql.jdbc.Driver

Connection String: jdbc:mysql://192.168.1.80:3306/sqooptest?useSSL=false

Username: root

Password: ***

Fetch Size:

Connection Properties:

There are currently 0 values in the map:

entry#

SQL Dialect

 

Identifier enclose:       # 空格

New link was successfully created with validation status OK and name mysql-link

sqoop:000>

 

8.2.5.  创建HDFSLink对象

 

sqoop:000> create link -connector hdfs-connector

Creating link for connector with name hdfs-connector

Please fill following values to create new link object

Name: hdfs-link

 

HDFS cluster

 

URI: hdfs://192.168.1.80:9000

Conf directory:

Additional configs: /opt/hadoop/hadoop-2.7.5/etc/hadoop

There are currently 0 values in the map:

entry#

New link was successfully created with validation status OK and name hdfs-link

sqoop:000>

 

8.2.6.  显示Link对象

 

sqoop:000> show link --all

2 link(s) to show:

link with name mysql-link (Enabled: true, Created by root at 5/23/18 3:48 PM, Updated by root at 5/23/18 3:48 PM)

Using Connector generic-jdbc-connector with name {1}

 Database connection

   Driver class: com.mysql.jdbc.Driver

   Connection String: jdbc:mysql://192.168.1.80:3306/sqooptest?useSSL=false

   Username: root

   Password:

   Fetch Size:

   Connection Properties:

     protocol = tcp

  SQL Dialect

   Identifier enclose: `

link with name hdfs-link (Enabled: true, Created by root at 5/23/18 4:00 PM, Updated by root at 5/23/18 4:00 PM)

Using Connector hdfs-connector with name {1}

  HDFS cluster

   URI: hdfs://192.168.1.80:9000

   Conf directory:

   Additional configs::

sqoop:000>

 

 

8.2.7.  创建MySQL导数据到HDFSJob

 

8.2.7.1. 准备数据

 

create database sqooptest character set utf8;

 

CREATE TABLE topic (

         id int(11) NOT NULL AUTO_INCREMENT,

         name varchar(50) DEFAULT NULL,

         address varchar(100) DEFAULT NULL,

         time datetime DEFAULT NULL,

         content varchar(500) DEFAULT NULL,

         PRIMARY KEY(id)

)ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=UTF8;

 

INSERT INTO topic(name, address, time, content) values('11', '111', '2018-05-01 08:00:00', 'hello world');

 

INSERT INTO topic(name, address, time, content) values('11', '111', '2018-05-01 08:00:00', NULL);

 

8.2.7.2. 创建任务(MySQLàHDFS

 

sqoop:000> create job -f mysql-link -t hdfs-link

Creating job for links with from name mysql-link and to name hdfs-link

Please fill following values to create new job object

Name:  mysql2hdfs

 

Database source

 

Schema name: sqooptest   #必填,数据库名称

Table name: topic  #必填,表名

SQL statement:

Column names:

There are currently 0 values in the list:

element#

Partition column: id

Partition column nullable:

Boundary query:

 

Incremental read

 

Check column:

Last value:

 

Target configuration

 

Override null value:

Null value:

File format:

  0 : TEXT_FILE

  1 : SEQUENCE_FILE

  2 : PARQUET_FILE

Choose: 0

Compression codec:

  0 : NONE

  1 : DEFAULT

  2 : DEFLATE

  3 : GZIP

  4 : BZIP2

  5 : LZO

  6 : LZ4

  7 : SNAPPY

  8 : CUSTOM

Choose: 0

Custom codec:

Output directory:/sqoop/sqooptest/mysqltable/    #必填

Append mode:

 

Throttling resources

 

Extractors:

Loaders:

 

Classpath configuration

 

Extra mapper jars:

There are currently 0 values in the list:

element#

New job was successfully created with validation status OK  and name  mysql2hdfs

sqoop:000>

 

8.2.7.3. 启动Job

 

[root@centoshadoop sbin]# hadoop fs –mkdir -p /sqoop/sqooptest/mysqltable

[root@centoshadoop sbin]# hadoop fs –chmod –R 777 /sqoop

 

sqoop:000> show job

+----+-------------------------+-------------------------------------+----------------------------+---------+

| Id |          Name           |           From Connector            |        To Connector        | Enabled |

+----+-------------------------+-------------------------------------+----------------------------+---------+

| 1  |  mysql2hdfs | mysql-link (generic-jdbc-connector) | hdfs-link (hdfs-connector) | true    |

+----+-------------------------+-------------------------------------+----------------------------+---------+

sqoop:000>

 

sqoop:000> set option --name verbose --value true

Verbose option was changed to true

sqoop:000> start job -name mysql2hdfs

Submission details

Job Name: mysql2hdfs

Server URL: http://localhost:12000/sqoop/

Created by: root

Creation date: 2018-05-24 08:57:14 CST

Lastly updated by: root

External ID: job_1527123411152_0002

        http://centoshadoop:8088/proxy/application_1527123411152_0002/

Source Connector schema: Schema{name=  sqooptest  .  topic  ,columns=[

        FixedPoint{name=id,nullable=true,type=FIXED_POINT,byteSize=4,signed=true},

        Text{name=name,nullable=true,type=TEXT,charSize=null},

        Text{name=address,nullable=true,type=TEXT,charSize=null},

        Date{name=time,nullable=true,type=DATE_TIME,hasFraction=true,hasTimezone=false},

        Text{name=content,nullable=true,type=TEXT,charSize=null}]}

2018-05-24 08:57:14 CST: BOOTING  - Progress is not available

sqoop:000>

 

8.2.7.4. 查看结果

 

[root@centoshadoop sbin]# hadoop fs -ls /sqoop/sqooptest/mysqltable

Found 1 items

-rw-r--r--   1 root supergroup         97 2018-05-24 08:57 /sqoop/sqooptest/mysqltable/363e7159-49c3-40b2-937e-897acf2ad80b.txt

[root@centoshadoop sbin]# hadoop fs -cat /sqoop/sqooptest/mysqltable/363e7159-49c3-40b2-937e-897acf2ad80b.txt

3,'11','111','2018-05-01 08:00:00.000','hello world'

4,'11','111','2018-05-01 08:00:00.000',NULL

[root@centoshadoop sbin]#

 

 

 

 

posted @ 2018-05-24 09:18  ParamousGIS  阅读(354)  评论(0编辑  收藏  举报