Sqoop

Posted on 2016-08-16 23:12  创意无极限  阅读(714)  评论(0编辑  收藏

1. 安装
tar -zxvf sqoop-1.4.6.tar.gz
参考:http://www.cnblogs.com/edisonchou/p/4440216.html
http://www.cnblogs.com/wgp13x/p/5028220.html

2. 配置
vi conf/sqoop-env.sh
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/home/hadoop-2.5/

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/home/hadoop-2.5

#set the path to where bin/hbase is available
export HBASE_HOME=/home/hbase

#Set the path to where bin/hive is available
export HIVE_HOME=/home/hive

#Set the path for where zookeper config dir is
export ZOOCFGDIR=/home/zk

vi /etc/profile
将ZK_HOME改成ZOOKEEPER_HOME
vi bin/configure-sqoop (将下面的段落注释)
## Moved to be a runtime check in sqoop.
#if [ ! -d "${HCAT_HOME}" ]; then
# echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
# echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi

#if [ ! -d "${ACCUMULO_HOME}" ]; then
# echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
# echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi
报错问题:
[root@H1 bin]# ./sqoop
错误: 找不到或无法加载主类 org.apache.sqoop.Sqoop
解决方法:[root@H1 bin]# cp lib/sqoop-1.4.6.jar /home/hadoop-2.5/share/hadoop/yarn/lib/
3.运行
1).MySQL数据源:mysql中的hive数据库的TBLS表,这里使用学习笔记17《Hive框架学习》里边Hive的数据库表。
2).使用import命令将mysql中的数据导入HDFS:
[root@H1 bin]# ./sqoop import --connect jdbc:mysql://h1:3306/hive --username root --password 123456 --table TBLS --fields-terminated-by '\t'

然后看看如何进行实战:这里将mysql中的TBLS表导入到hdfs中(默认导入目录是/user/<username>)

3).刚刚看到了默认是由多个map来进行处理生成,可以设置指定数量的map任务。又由于sqoop默认不是追加方式写入,还可以设置其为追加方式写入已有文件末尾:
[root@H1 bin]# ./sqoop import --connect jdbc:mysql://h1:3306/hive --username root --password 123456 --table TBLS --fields-terminated-by '\t' --null-string '**' -m 1 --append
4).还可以将MySQL中的数据导入Hive中(你设定的hive在hdfs中的存储位置,我这里是/hive/):
首先得删掉刚刚导入到hdfs中的文件数据:hadoop fs -rmr /user/root/*
然后再通过以下命令导入到hive中:
[root@H1 bin]# ./sqoop import --connect jdbc:mysql://h1:3306/hive --username root --password 123456 --table TBLS --fields-terminated-by '\t' -m 1 --append --hive-import

5).SQL Server 数据导出到HIVE
[root@H1 bin]# ./sqoop import --connect 'jdbc:sqlserver://192.168.0.50:124;username=ent;password=ent_1509_group;database=entBoxOfficeHours' --query "SELECT * from BaseTime where showdate>='2016-06-01' and \$CONDITIONS" --target-dir /user/hive/warehouse/basetime2 --split-by id -m 10
--追加
[root@H1 bin]# ./sqoop import --connect 'jdbc:sqlserver://192.168.0.50:124;username=ent;password=ent_1509_group;database=entBoxOfficeHours' --query "SELECT * from BaseTime where showdate='2016-06-02' and \$CONDITIONS" --target-dir /user/hive/warehouse/basetime2 --split-by id -m 10 --append
[root@H1 bin]# ./sqoop import --connect 'jdbc:sqlserver://192.168.0.50:124;username=ent;password=ent_1509_group;database=entBoxOfficeHours' --table RouterList --target-dir /user/RouterList
1).对于没有主键(Error during import: No primary key could be found for table ExecRecord_InsertCinemaBoxOffice_Hour. Please specify one with --split-by or perform a sequential import with '-m 1'.)
[root@H1 bin]# ./sqoop import --connect 'jdbc:sqlserver://192.168.0.100:9526;username=ent;password=m2uhs4z#Yx@GV*oN;database=ent_dayBoxOffice_Common' --table ExecRecord_InsertCinemaBoxOffice_Hour --target-dir /user/hive/warehouse/ExecRecord_InsertCinemaBoxOffice_Hour --split-by ID -m 1
[root@H1 bin]# ./sqoop import --connect 'jdbc:sqlserver://192.168.0.100:9526;username=ent;password=m2uhs4z#Yx@GV*oN;database=ent_dayBoxOffice_Common' --table BaseMovieInfo2 --target-dir /user/hive/warehouse/BaseMovieInfo2 --split-by ID -m 1
5).Hive 数据导出到 SQL Server
[root@H1 bin]# ./sqoop export --connect 'jdbc:sqlserver://192.168.0.50:124;username=ent;password=ent_1509_group;database=entBoxOfficeHours' --table tmp07_5 --export-dir /user/hive/warehouse/tmp07_5 --input-fields-terminated-by '\001'