Syncing MySQL Table Data to HDFS with Sqoop (Part 2): Setting the Storage Format
System Environment
OS: CentOS 7
Hostname: centos02
IP: 192.168.122.1
Java: 1.8
Hadoop: 2.8.5
Sqoop: 1.4.7
MySQL: 8.0.12
1. Storing in Avro Format
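sqoop import --connect jdbc:mysql://centos02:3306/OfficialCashMid --driver com.mysql.cj.jdbc.Driver --username root --password sa123_ADMIN. --table tadminoperationlog -m 2 --target-dir /jdbcHDFS/TAdminLog_avro --as-avrodatafile
Note: the format option is a single token, --as-avrodatafile, with no space after the two hyphens. The run recorded below typed it as "-- as-avrodatafile". Sqoop hands anything after a bare -- to the underlying tool instead of parsing it as one of its own options, so that run most likely wrote the default text format. The Bytes Written counter (3753700) is identical in the Avro run and the SequenceFile run, which is consistent with this.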
[root@centos02 bin]# sqoop import --connect jdbc:mysql://centos02:3306/OfficialCashMid --driver com.mysql.cj.jdbc.Driver --username root --password sa123_ADMIN. --table tadminoperationlog --m 2 --target-dir /jdbcHDFS/TAdminLog_avro -- as-avrodatafile
Warning: /opt/bigdata/sqoop/sqoop-1.4.7/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /opt/bigdata/sqoop/sqoop-1.4.7/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/bigdata/sqoop/sqoop-1.4.7/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /opt/bigdata/sqoop/sqoop-1.4.7/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/09/04 01:11:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/09/04 01:11:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/09/04 01:11:25 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
19/09/04 01:11:25 INFO manager.SqlManager: Using default fetchSize of 1000
19/09/04 01:11:25 INFO tool.CodeGenTool: Beginning code generation
19/09/04 01:11:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM tadminoperationlog AS t WHERE 1=0
19/09/04 01:11:27 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM tadminoperationlog AS t WHERE 1=0
19/09/04 01:11:27 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/bigdata/hadoop/hadoop-2.8.5
Note: /tmp/sqoop-root/compile/64a3b4f66eb537e9b11f0416dbc6d58d/tadminoperationlog.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/09/04 01:11:30 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/64a3b4f66eb537e9b11f0416dbc6d58d/tadminoperationlog.jar
19/09/04 01:11:30 INFO mapreduce.ImportJobBase: Beginning import of tadminoperationlog
19/09/04 01:11:31 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/09/04 01:11:31 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM tadminoperationlog AS t WHERE 1=0
19/09/04 01:11:32 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/09/04 01:11:32 INFO client.RMProxy: Connecting to ResourceManager at centos02/192.168.122.1:8032
19/09/04 01:11:37 INFO db.DBInputFormat: Using read commited transaction isolation
19/09/04 01:11:37 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(FID), MAX(FID) FROM tadminoperationlog
19/09/04 01:11:37 INFO db.IntegerSplitter: Split size: 16058; Num splits: 2 from: 21 to: 32138
19/09/04 01:11:37 INFO mapreduce.JobSubmitter: number of splits:2
19/09/04 01:11:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1567503661837_0005
19/09/04 01:11:39 INFO impl.YarnClientImpl: Submitted application application_1567503661837_0005
19/09/04 01:11:39 INFO mapreduce.Job: The url to track the job: http://centos02:8088/proxy/application_1567503661837_0005/
19/09/04 01:11:39 INFO mapreduce.Job: Running job: job_1567503661837_0005
19/09/04 01:11:51 INFO mapreduce.Job: Job job_1567503661837_0005 running in uber mode : false
19/09/04 01:11:51 INFO mapreduce.Job: map 0% reduce 0%
19/09/04 01:12:07 INFO mapreduce.Job: map 100% reduce 0%
19/09/04 01:12:09 INFO mapreduce.Job: Job job_1567503661837_0005 completed successfully
19/09/04 01:12:10 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=357730
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=206
        HDFS: Number of bytes written=3753700
        HDFS: Number of read operations=8
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=4
    Job Counters
        Launched map tasks=2
        Other local map tasks=2
        Total time spent by all maps in occupied slots (ms)=25703
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=25703
        Total vcore-milliseconds taken by all map tasks=25703
        Total megabyte-milliseconds taken by all map tasks=26319872
    Map-Reduce Framework
        Map input records=12122
        Map output records=12122
        Input split bytes=206
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=443
        CPU time spent (ms)=11370
        Physical memory (bytes) snapshot=399974400
        Virtual memory (bytes) snapshot=4243742720
        Total committed heap usage (bytes)=198180864
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=3753700
19/09/04 01:12:10 INFO mapreduce.ImportJobBase: Transferred 3.5798 MB in 37.8235 seconds (96.9166 KB/sec)
19/09/04 01:12:10 INFO mapreduce.ImportJobBase: Retrieved 12122 records.
[root@centos02 bin]#
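One way to confirm which format an import actually produced is to look at the first bytes of a part file: Avro object container files start with the bytes Obj followed by 0x01, and Hadoop SequenceFiles start with SEQ. The sketch below is a hypothetical helper (detect_format is not part of Sqoop or Hadoop) that classifies whatever is piped into it. The part-file glob in the usage comment is an assumption about how the output files are named.

```shell
# Hypothetical helper: classify a data file by its leading magic bytes.
# Reads the file content from stdin and prints avro, sequencefile, or other.
detect_format() {
  # Only the first 3 bytes are needed; they are printable ASCII in both formats.
  magic=$(head -c 3)
  case "$magic" in
    Obj) echo "avro" ;;          # Avro object container file header
    SEQ) echo "sequencefile" ;;  # Hadoop SequenceFile header
    *)   echo "other" ;;         # e.g. plain delimited text
  esac
}

# Usage against HDFS (assumes the hdfs client is on PATH and that
# the import wrote part files under the target directory):
#   hdfs dfs -cat '/jdbcHDFS/TAdminLog_avro/part-m-00000*' | detect_format
```

If this prints other for a directory that was meant to hold Avro data, the import probably fell back to the default text format.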


2. Storing in SequenceFile Format
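sqoop import --connect jdbc:mysql://centos02:3306/OfficialCashMid --driver com.mysql.cj.jdbc.Driver --username root --password sa123_ADMIN. --table tadminoperationlog -m 2 --target-dir /jdbcHDFS/TAdminLog_sequence --as-sequencefile
Note: write --as-sequencefile as a single token. The run recorded below typed it as "-- as-sequencefile" (with a space), which Sqoop does not parse as a format option, so that run most likely wrote the default text format.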
[root@centos02 bin]# sqoop import --connect jdbc:mysql://centos02:3306/OfficialCashMid --driver com.mysql.cj.jdbc.Driver --username root --password sa123_ADMIN. --table tadminoperationlog --m 2 --target-dir /jdbcHDFS/TAdminLog_sequence -- as-sequencefile
Warning: /opt/bigdata/sqoop/sqoop-1.4.7/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /opt/bigdata/sqoop/sqoop-1.4.7/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/bigdata/sqoop/sqoop-1.4.7/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /opt/bigdata/sqoop/sqoop-1.4.7/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/09/04 01:17:11 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/09/04 01:17:11 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/09/04 01:17:11 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
19/09/04 01:17:11 INFO manager.SqlManager: Using default fetchSize of 1000
19/09/04 01:17:11 INFO tool.CodeGenTool: Beginning code generation
19/09/04 01:17:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM tadminoperationlog AS t WHERE 1=0
19/09/04 01:17:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM tadminoperationlog AS t WHERE 1=0
19/09/04 01:17:13 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/bigdata/hadoop/hadoop-2.8.5
Note: /tmp/sqoop-root/compile/80fedf2c36bf7f4118ebc9731bc7479f/tadminoperationlog.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/09/04 01:17:16 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/80fedf2c36bf7f4118ebc9731bc7479f/tadminoperationlog.jar
19/09/04 01:17:17 INFO mapreduce.ImportJobBase: Beginning import of tadminoperationlog
19/09/04 01:17:17 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/09/04 01:17:17 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM tadminoperationlog AS t WHERE 1=0
19/09/04 01:17:18 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/09/04 01:17:18 INFO client.RMProxy: Connecting to ResourceManager at centos02/192.168.122.1:8032
19/09/04 01:17:24 INFO db.DBInputFormat: Using read commited transaction isolation
19/09/04 01:17:24 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(FID), MAX(FID) FROM tadminoperationlog
19/09/04 01:17:24 INFO db.IntegerSplitter: Split size: 16058; Num splits: 2 from: 21 to: 32138
19/09/04 01:17:25 INFO mapreduce.JobSubmitter: number of splits:2
19/09/04 01:17:25 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1567503661837_0006
19/09/04 01:17:26 INFO impl.YarnClientImpl: Submitted application application_1567503661837_0006
19/09/04 01:17:26 INFO mapreduce.Job: The url to track the job: http://centos02:8088/proxy/application_1567503661837_0006/
19/09/04 01:17:26 INFO mapreduce.Job: Running job: job_1567503661837_0006
19/09/04 01:17:37 INFO mapreduce.Job: Job job_1567503661837_0006 running in uber mode : false
19/09/04 01:17:37 INFO mapreduce.Job: map 0% reduce 0%
19/09/04 01:17:48 INFO mapreduce.Job: map 50% reduce 0%
19/09/04 01:17:49 INFO mapreduce.Job: map 100% reduce 0%
19/09/04 01:17:50 INFO mapreduce.Job: Job job_1567503661837_0006 completed successfully
19/09/04 01:17:50 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=357738
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=206
        HDFS: Number of bytes written=3753700
        HDFS: Number of read operations=8
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=4
    Job Counters
        Launched map tasks=2
        Other local map tasks=2
        Total time spent by all maps in occupied slots (ms)=16481
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=16481
        Total vcore-milliseconds taken by all map tasks=16481
        Total megabyte-milliseconds taken by all map tasks=16876544
    Map-Reduce Framework
        Map input records=12122
        Map output records=12122
        Input split bytes=206
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=269
        CPU time spent (ms)=7730
        Physical memory (bytes) snapshot=360050688
        Virtual memory (bytes) snapshot=4240666624
        Total committed heap usage (bytes)=185073664
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=3753700
19/09/04 01:17:50 INFO mapreduce.ImportJobBase: Transferred 3.5798 MB in 32.0909 seconds (114.2295 KB/sec)
19/09/04 01:17:50 INFO mapreduce.ImportJobBase: Retrieved 12122 records.
[root@centos02 bin]#
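Since the two imports differ only in the target directory and the format flag, the command can be assembled in one place. The following is a minimal sketch, not a definitive script: build_sqoop_import is a hypothetical helper name, it only prints the command rather than running it, and it swaps --password for -P (prompt for the password), as the WARN line in the log suggests. The connection string, table, and mapper count are copied from the commands above.

```shell
# Hypothetical helper: print the sqoop import command for one storage format.
# Arguments: <format> <target-dir>, where format is textfile, avrodatafile,
# or sequencefile (each maps to Sqoop's --as-<format> option).
build_sqoop_import() {
  format=$1
  target_dir=$2
  case "$format" in
    textfile|avrodatafile|sequencefile) ;;
    *) echo "unsupported format: $format" >&2; return 1 ;;
  esac
  # Reuse the positional parameters as the argument list. The format flag
  # is built as a single token, so no space can slip in after the hyphens.
  set -- sqoop import \
    --connect jdbc:mysql://centos02:3306/OfficialCashMid \
    --driver com.mysql.cj.jdbc.Driver \
    --username root -P \
    --table tadminoperationlog \
    -m 2 \
    --target-dir "$target_dir" \
    "--as-$format"
  printf '%s\n' "$*"
}

# Usage:
#   build_sqoop_import avrodatafile /jdbcHDFS/TAdminLog_avro
#   build_sqoop_import sequencefile /jdbcHDFS/TAdminLog_sequence
```

To execute the import instead of printing it, the final printf line could be replaced with "$@".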


