[Original] Installing and Verifying Sqoop

 

1. Download and configure the packages

    Download hadoop-0.20.2-CDH3B4.tar.gz and sqoop-1.2.0-CDH3B4.tar.gz and extract both. Copy hadoop-core-0.20.2-CDH3B4.jar from the hadoop-0.20.2-CDH3B4 directory into sqoop-1.2.0-CDH3B4/lib. Then edit sqoop-1.2.0-CDH3B4/bin/configure-sqoop and comment out the check for ZOOKEEPER_HOME.
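The steps above can be sketched in shell. The extraction and copy commands are shown as comments (they need the tarballs in the current directory); the ZOOKEEPER_HOME edit is demonstrated with sed on a stand-in file, so point CONF at the real sqoop-1.2.0-CDH3B4/bin/configure-sqoop when applying it:

```shell
# Setup steps, assuming the tarballs sit in the current directory:
#   tar -xzf hadoop-0.20.2-CDH3B4.tar.gz
#   tar -xzf sqoop-1.2.0-CDH3B4.tar.gz
#   cp hadoop-0.20.2-CDH3B4/hadoop-core-0.20.2-CDH3B4.jar sqoop-1.2.0-CDH3B4/lib/

# Commenting out the ZOOKEEPER_HOME check, demonstrated on a stand-in file;
# the if/fi block mimics the shape of the check in configure-sqoop.
CONF=configure-sqoop.demo
printf '%s\n' \
  'if [ -z "${ZOOKEEPER_HOME}" ]; then' \
  '  echo "Error: ZOOKEEPER_HOME is not set."' \
  '  exit 1' \
  'fi' > "$CONF"
# Comment every line from the ZOOKEEPER_HOME test through the closing "fi".
sed -i '/ZOOKEEPER_HOME/,/^fi/ s/^/#/' "$CONF"
grep '^#' "$CONF"
```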

 

2. Configure environment variables

export SQOOP_HOME=/home/admin/sqoop-1.2.0-CDH3B4

export PATH=$PATH:$SQOOP_HOME/bin

 

3. Test the installation

 

[admin@server1 ~]$ sqoop help

usage: sqoop COMMAND [ARGS]

 

Available commands:

  codegen            Generate code to interact with database records

  create-hive-table  Import a table definition into Hive

  eval               Evaluate a SQL statement and display the results

  export             Export an HDFS directory to a database table

  help               List available commands

  import             Import a table from a database to HDFS

  import-all-tables  Import tables from a database to HDFS

  job                Work with saved jobs

  list-databases     List available databases on a server

  list-tables        List available tables in a database

  merge              Merge results of incremental imports

  metastore          Run a standalone Sqoop metastore

  version            Display version information

 

See 'sqoop help COMMAND' for information on a specific command.

 

4. Import from MySQL into HDFS

 

Using the hive database instance from the earlier Hive experiments, show the contents of the SEQUENCE_TABLE table:

mysql> select * from SEQUENCE_TABLE;

+-----------------------------------------------------------+----------+

| SEQUENCE_NAME                                             | NEXT_VAL |

+-----------------------------------------------------------+----------+

| org.apache.hadoop.hive.metastore.model.MColumnDescriptor  |       16 |

| org.apache.hadoop.hive.metastore.model.MDatabase          |        6 |

| org.apache.hadoop.hive.metastore.model.MSerDeInfo         |       16 |

| org.apache.hadoop.hive.metastore.model.MStorageDescriptor |       16 |

| org.apache.hadoop.hive.metastore.model.MTable             |       16 |

+-----------------------------------------------------------+----------+

5 rows in set (0.00 sec)

 

Copy mysql-connector-java-5.1.18-bin.jar into sqoop-1.2.0-CDH3B4/lib, then use sqoop to import the SEQUENCE_TABLE table into HDFS:

 

[admin@server1 bin]$ sqoop import --connect jdbc:mysql://server1/hive --username hive --password hive  --table SEQUENCE_TABLE -m 3;

12/12/16 01:27:16 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

12/12/16 01:27:16 INFO tool.CodeGenTool: Beginning code generation

12/12/16 01:27:16 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `SEQUENCE_TABLE` AS t LIMIT 1

12/12/16 01:27:16 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `SEQUENCE_TABLE` AS t LIMIT 1

12/12/16 01:27:16 INFO orm.CompilationManager: HADOOP_HOME is /home/admin/hadoop-0.20.2/bin/..

12/12/16 01:27:16 INFO orm.CompilationManager: Found hadoop core jar at: /home/admin/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar

12/12/16 01:27:18 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-admin/compile/415f2a5412b5c2aadd76474859647419/SEQUENCE_TABLE.jar

12/12/16 01:27:18 WARN manager.MySQLManager: It looks like you are importing from mysql.

12/12/16 01:27:18 WARN manager.MySQLManager: This transfer can be faster! Use the --direct

12/12/16 01:27:18 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.

12/12/16 01:27:18 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)

12/12/16 01:27:18 INFO mapreduce.ImportJobBase: Beginning import of SEQUENCE_TABLE

12/12/16 01:27:19 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `SEQUENCE_TABLE` AS t LIMIT 1

12/12/16 01:27:20 WARN db.TextSplitter: Generating splits for a textual index column.

12/12/16 01:27:20 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.

12/12/16 01:27:20 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.

12/12/16 01:27:20 INFO mapred.JobClient: Running job: job_201212152320_0004

12/12/16 01:27:21 INFO mapred.JobClient:  map 0% reduce 0%

12/12/16 01:27:41 INFO mapred.JobClient:  map 25% reduce 0%

12/12/16 01:27:51 INFO mapred.JobClient:  map 50% reduce 0%

12/12/16 01:27:57 INFO mapred.JobClient:  map 100% reduce 0%

12/12/16 01:27:59 INFO mapred.JobClient: Job complete: job_201212152320_0004

12/12/16 01:27:59 INFO mapred.JobClient: Counters: 5

12/12/16 01:27:59 INFO mapred.JobClient:   Job Counters

12/12/16 01:27:59 INFO mapred.JobClient:     Launched map tasks=4

12/12/16 01:27:59 INFO mapred.JobClient:   FileSystemCounters

12/12/16 01:27:59 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=274

12/12/16 01:27:59 INFO mapred.JobClient:   Map-Reduce Framework

12/12/16 01:27:59 INFO mapred.JobClient:     Map input records=5

12/12/16 01:27:59 INFO mapred.JobClient:     Spilled Records=0

12/12/16 01:27:59 INFO mapred.JobClient:     Map output records=5

12/12/16 01:27:59 INFO mapreduce.ImportJobBase: Transferred 274 bytes in 40.4297 seconds (6.7772 bytes/sec)

12/12/16 01:27:59 INFO mapreduce.ImportJobBase: Retrieved 5 records.
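The TextSplitter warnings above appear because Sqoop is splitting the import on the textual SEQUENCE_NAME column. Since NEXT_VAL is an integer column, splitting on it avoids that risk; a sketch with the same connection details, using -P to prompt for the password instead of passing it on the command line:

```shell
sqoop import --connect jdbc:mysql://server1/hive --username hive -P \
  --table SEQUENCE_TABLE --split-by NEXT_VAL -m 3
```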

 

View the imported data in HDFS:

[admin@server1 bin]$ hadoop dfs -cat SEQUENCE_TABLE/part*

org.apache.hadoop.hive.metastore.model.MColumnDescriptor,16

org.apache.hadoop.hive.metastore.model.MDatabase,6

org.apache.hadoop.hive.metastore.model.MSerDeInfo,16

org.apache.hadoop.hive.metastore.model.MStorageDescriptor,16

org.apache.hadoop.hive.metastore.model.MTable,16
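Sqoop writes records as comma-delimited text by default. A minimal Python sketch of parsing these part-file lines back into typed (name, value) pairs, using two lines from the output above as sample data:

```python
# Sample lines as they appear in the imported part files.
lines = [
    "org.apache.hadoop.hive.metastore.model.MDatabase,6",
    "org.apache.hadoop.hive.metastore.model.MTable,16",
]

# Split on the last comma so a delimiter inside the first field
# could not break parsing, and convert NEXT_VAL back to an int.
records = [(name, int(val)) for name, val in (ln.rsplit(",", 1) for ln in lines)]
print(records)
```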

 

5. Export from HDFS to MySQL

 

HDFS contains a file test.txt with the following contents:

[admin@server1 ~]$ hadoop dfs -cat /test.txt

aaaa,111

bbbb,222

cccc,333

dddd,444

 

Create a table test in MySQL:

 

mysql> CREATE TABLE test(str VARCHAR(10), num INT);

Query OK, 0 rows affected (0.01 sec)

 

Run the export:

[admin@server1 ~]$ sqoop export --connect jdbc:mysql://server1/hive --username hive --password hive  --table test --export-dir /test.txt  -m 3;

12/12/16 01:51:50 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

12/12/16 01:51:50 INFO tool.CodeGenTool: Beginning code generation

12/12/16 01:51:50 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1

12/12/16 01:51:50 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1

12/12/16 01:51:50 INFO orm.CompilationManager: HADOOP_HOME is /home/admin/hadoop-0.20.2/bin/..

12/12/16 01:51:50 INFO orm.CompilationManager: Found hadoop core jar at: /home/admin/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar

12/12/16 01:51:52 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-admin/compile/bebfa051f0e18c14ad2c466547b23c92/test.jar

12/12/16 01:51:52 INFO mapreduce.ExportJobBase: Beginning export of test

12/12/16 01:51:52 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1

12/12/16 01:51:53 INFO input.FileInputFormat: Total input paths to process : 1

12/12/16 01:51:53 INFO input.FileInputFormat: Total input paths to process : 1

12/12/16 01:51:53 INFO mapred.JobClient: Running job: job_201212152320_0005

12/12/16 01:51:54 INFO mapred.JobClient:  map 0% reduce 0%

12/12/16 01:52:01 INFO mapred.JobClient:  map 100% reduce 0%

12/12/16 01:52:03 INFO mapred.JobClient: Job complete: job_201212152320_0005

12/12/16 01:52:03 INFO mapred.JobClient: Counters: 6

12/12/16 01:52:03 INFO mapred.JobClient:   Job Counters

12/12/16 01:52:03 INFO mapred.JobClient:     Rack-local map tasks=1

12/12/16 01:52:03 INFO mapred.JobClient:     Launched map tasks=1

12/12/16 01:52:03 INFO mapred.JobClient:   FileSystemCounters

12/12/16 01:52:03 INFO mapred.JobClient:     HDFS_BYTES_READ=42

12/12/16 01:52:03 INFO mapred.JobClient:   Map-Reduce Framework

12/12/16 01:52:03 INFO mapred.JobClient:     Map input records=4

12/12/16 01:52:03 INFO mapred.JobClient:     Spilled Records=0

12/12/16 01:52:03 INFO mapred.JobClient:     Map output records=4

12/12/16 01:52:03 INFO mapreduce.ExportJobBase: Transferred 42 bytes in 10.7834 seconds (3.8949 bytes/sec)

12/12/16 01:52:03 INFO mapreduce.ExportJobBase: Exported 4 records.
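Conceptually, each export map task parses the comma-delimited lines and writes them to MySQL as batched INSERT statements. A loose Python sketch of that line-to-SQL mapping (the helper name and the two-column layout are illustrative assumptions, not Sqoop's actual code):

```python
def lines_to_insert(table, columns, lines):
    # Turn comma-delimited HDFS lines into one parameterized multi-row
    # INSERT, roughly mirroring what an export map task batches up.
    rows = [tuple(ln.split(",")) for ln in lines]
    row_ph = "(" + ", ".join("%s" for _ in columns) + ")"
    sql = (f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
           + ", ".join(row_ph for _ in rows))
    params = [v for row in rows for v in row]
    return sql, params

sql, params = lines_to_insert("test", ["str", "num"], ["aaaa,111", "bbbb,222"])
print(sql)
```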

 

Check the records in the MySQL table test:

 

mysql> select * from test;

+------+------+

| str  | num  |

+------+------+

| aaaa |  111 |

| bbbb |  222 |

| cccc |  333 |

| dddd |  444 |

+------+------+

4 rows in set (0.00 sec)

 

 

posted @ 2014-05-28 08:44  lihui1625