
Sqoop Data Migration

2017-12-15

In this lesson we take a look at Sqoop. Sqoop is a tool dedicated to migrating data: it can move data from a database into HDFS, and it can also export data from HDFS back into a database.

Here is a typical use case. Suppose a project at your company has been running for a long time and has accumulated a large amount of data, and you now want to upgrade the project and store that data in a different database. You first need to stage the data somewhere else and then insert it into the new database using its own SQL dialect. Before Sqoop this was hard to do; with Sqoop it becomes much simpler. Sqoop runs on top of Hadoop and uses MapReduce under the hood, so the work is executed in parallel across multiple machines and runs much faster, and we do not have to write any of that code ourselves. Sqoop provides a powerful set of commands; all we need is to learn those commands, plus a little SQL, to migrate data with ease.

Now let's get started with Sqoop.

1. Installation

We use the release sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz.

First unpack the archive and rename the directory to sqoop, then set the SQOOP_HOME environment variable in /etc/profile.

Copy the MySQL JDBC driver mysql-connector-java-5.1.10.jar into Sqoop's lib directory.
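A minimal sketch of these three steps, assuming the archive sits in the current directory and /usr/local is used as the install prefix (adjust the paths to your own environment):

tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /usr/local
mv /usr/local/sqoop-1.4.6.bin__hadoop-2.0.4-alpha /usr/local/sqoop

# append to /etc/profile, then run "source /etc/profile"
export SQOOP_HOME=/usr/local/sqoop
export PATH=$PATH:$SQOOP_HOME/bin

# copy the MySQL JDBC driver into Sqoop's lib directory
cp mysql-connector-java-5.1.10.jar $SQOOP_HOME/lib/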

2. Rename the configuration file

In ${SQOOP_HOME}/conf, run:

mv  sqoop-env-template.sh  sqoop-env.sh

In the conf directory there are two files, sqoop-site.xml and sqoop-site-template.xml, whose contents are identical; that is fine, we only need to care about sqoop-site.xml.

3. Edit the configuration file sqoop-env.sh

#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/usr/local/hadoop/

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/usr/local/hadoop

#set the path to where bin/hbase is available
export HBASE_HOME=/usr/local/hbase

#Set the path to where bin/hive is available
export HIVE_HOME=/usr/local/hive

#Set the path for where zookeper config dir is
export ZOOCFGDIR=/usr/local/zk

That's it for configuration; now we can start running Sqoop.
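A quick way to check that the installation and the environment variables took effect is to ask Sqoop for its version or its help text (both commands appear in the help listing later in this post):

sqoop version
sqoop help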

4. Running Sqoop commands: MySQL -> HDFS

1. sqoop list-databases: list all databases

[root@centos1 sqoop-1.4.6]# ./bin/sqoop list-databases --connect jdbc:mysql://192.168.20.224:3306 --username root --password root
Warning: /home/sqoop-1.4.6/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/sqoop-1.4.6/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
17/12/15 13:29:53 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/12/15 13:29:53 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/12/15 13:29:53 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
bak2
bak3
bbc5.1
hive
mysql
online
performance_schema
qf_crm_pms_new
sakila
sentry
test
world


2. sqoop import: import a whole table

To import the whole t_clue table, the command takes the form below (no --target-dir is given, so the result goes to the default path /user/root/t_clue; substitute the database that actually contains t_clue):

./bin/sqoop import --connect jdbc:mysql://192.168.20.224:3306/<database> --username root --password root --table t_clue
.....

[root@centos1 src]# hadoop fs -cat /user/root/t_clue/part-m-00000




After browsing into the t_clue directory in the HDFS web UI (or listing it from the command line, as sketched below), we can see four result files, which means four mappers took part in the import. Notice the "m" in the middle of each file name: "m" marks a file produced by a mapper, while "r" marks one produced by a reducer. Columns are separated by "," by default.
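A sketch of that check from the command line (the default import path /user/root/t_clue comes from the cat command above):

# expect one part-m-NNNNN file per mapper, plus an empty _SUCCESS marker
hadoop fs -ls /user/root/t_clue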

A note on importing with Sqoop: the table imported correctly when its storage engine was InnoDB; with MyISAM the following error came up:

Caused by: java.sql.SQLException: Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '>='

3. --target-dir: which HDFS directory to write to; -m: how many mappers to launch

Above we did the simplest possible import; now we add two more options. The command below uses --target-dir (which HDFS directory to write to) and -m (how many mappers to launch; note that -m takes a single dash while the other options take two, and since no reducer is needed to merge the data, the job produces exactly as many files as there are mappers).

[root@centos1 sqoop-1.4.6]# ./bin/sqoop import --connect jdbc:mysql://192.168.20.224:3306/bbc5.1 --username root --password root --table shop_activity --target-dir /sqoop/td1 -m 2


4. --fields-terminated-by: column separator; --columns: which columns to import

Let's keep adding options: --fields-terminated-by '\t' sets the column separator to a tab, and --columns 'r_id,r_name,r_description,r_alias' restricts the import to just those four columns, as shown below.

[root@centos1 sqoop-1.4.6]# ./bin/sqoop import --connect jdbc:mysql://192.168.20.224:3306/bbc5.1 --username root --password root --table shop_role --target-dir /sqoop/shop_role -m 2 --fields-terminated-by '\t' --columns 'r_id,r_name,r_description,r_alias'
Warning: /home/sqoop-1.4.6/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/sqoop-1.4.6/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
17/12/15 14:59:34 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/12/15 14:59:34 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/12/15 14:59:34 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/12/15 14:59:34 INFO tool.CodeGenTool: Beginning code generation
17/12/15 14:59:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `shop_role` AS t LIMIT 1
17/12/15 14:59:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `shop_role` AS t LIMIT 1
17/12/15 14:59:35 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop
Note: /tmp/sqoop-root/compile/8b37357b383ec68014ebcb4a5d12c4af/shop_role.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/12/15 14:59:36 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/8b37357b383ec68014ebcb4a5d12c4af/shop_role.jar
17/12/15 14:59:36 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/12/15 14:59:36 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/12/15 14:59:36 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/12/15 14:59:36 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/12/15 14:59:36 INFO mapreduce.ImportJobBase: Beginning import of shop_role
17/12/15 14:59:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/12/15 14:59:37 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/12/15 14:59:38 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/12/15 14:59:38 INFO client.RMProxy: Connecting to ResourceManager at centos1/192.168.20.241:8032
17/12/15 14:59:41 INFO db.DBInputFormat: Using read commited transaction isolation
17/12/15 14:59:41 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`r_id`), MAX(`r_id`) FROM `shop_role`
17/12/15 14:59:41 WARN db.TextSplitter: Generating splits for a textual index column.
17/12/15 14:59:41 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
17/12/15 14:59:41 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
17/12/15 14:59:41 INFO mapreduce.JobSubmitter: number of splits:2
17/12/15 14:59:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1513320327311_0003
17/12/15 14:59:41 INFO impl.YarnClientImpl: Submitted application application_1513320327311_0003
17/12/15 14:59:41 INFO mapreduce.Job: The url to track the job: http://centos1:8088/proxy/application_1513320327311_0003/
17/12/15 14:59:41 INFO mapreduce.Job: Running job: job_1513320327311_0003
17/12/15 14:59:51 INFO mapreduce.Job: Job job_1513320327311_0003 running in uber mode : false
17/12/15 14:59:51 INFO mapreduce.Job:  map 0% reduce 0%
17/12/15 15:00:01 INFO mapreduce.Job:  map 100% reduce 0%
17/12/15 15:00:02 INFO mapreduce.Job: Job job_1513320327311_0003 completed successfully
17/12/15 15:00:02 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=265102
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=227
HDFS: Number of bytes written=166
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Job Counters
Launched map tasks=2
Other local map tasks=2
Total time spent by all maps in occupied slots (ms)=15792
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=15792
Total vcore-seconds taken by all map tasks=15792
Total megabyte-seconds taken by all map tasks=16171008
Map-Reduce Framework
Map input records=4
Map output records=4
Input split bytes=227
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=372
CPU time spent (ms)=3190
Physical memory (bytes) snapshot=303677440
Virtual memory (bytes) snapshot=4180291584
Total committed heap usage (bytes)=174587904
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=166
17/12/15 15:00:02 INFO mapreduce.ImportJobBase: Transferred 166 bytes in 24.7169 seconds (6.7161 bytes/sec)
17/12/15 15:00:02 INFO mapreduce.ImportJobBase: Retrieved 4 records.
[root@centos1 sqoop-1.4.6]#

[root@centos1 hadoop]# hadoop fs -ls /sqoop/shop_role
17/12/15 15:02:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
-rw-r--r--   1 root supergroup          0 2017-12-15 15:00 /sqoop/shop_role/_SUCCESS
-rw-r--r--   1 root supergroup        126 2017-12-15 14:59 /sqoop/shop_role/part-m-00000
-rw-r--r--   1 root supergroup         40 2017-12-15 14:59 /sqoop/shop_role/part-m-00001
[root@centos1 hadoop]# hadoop fs -cat /sqoop/shop_role/part-m-00000
17/12/15 15:02:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
11	超级管理员	超级管理员	经理
16	高级管理员	高级管理员	研发
18	普通管理员	普通管理员	经理
[root@centos1 hadoop]#


5. --where: filter rows and import only those that match

Now for something a little more advanced: we use a where condition to filter the rows and import only those that match. The new option is --where 'r_id>10 and r_id < 50', which, as the name suggests, imports only the rows whose r_id falls in that range.

[root@centos1 sqoop-1.4.6]# ./bin/sqoop import --connect jdbc:mysql://192.168.20.224:3306/bbc5.1 --username root --password root --table shop_role --target-dir /sqoop/shop_role -m 2 --fields-terminated-by '\t' --columns 'r_id,r_name,r_description,r_alias' --where 'r_id>10 and r_id < 50'
Warning: /home/sqoop-1.4.6/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/sqoop-1.4.6/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
17/12/15 15:07:09 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/12/15 15:07:09 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/12/15 15:07:09 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/12/15 15:07:09 INFO tool.CodeGenTool: Beginning code generation
17/12/15 15:07:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `shop_role` AS t LIMIT 1
17/12/15 15:07:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `shop_role` AS t LIMIT 1
17/12/15 15:07:10 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop
Note: /tmp/sqoop-root/compile/2e08cd2ec396f0238e634ba17076b048/shop_role.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/12/15 15:07:12 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/2e08cd2ec396f0238e634ba17076b048/shop_role.jar
17/12/15 15:07:12 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/12/15 15:07:12 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/12/15 15:07:12 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/12/15 15:07:12 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/12/15 15:07:12 INFO mapreduce.ImportJobBase: Beginning import of shop_role
17/12/15 15:07:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/12/15 15:07:12 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/12/15 15:07:13 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/12/15 15:07:13 INFO client.RMProxy: Connecting to ResourceManager at centos1/192.168.20.241:8032
17/12/15 15:07:17 INFO db.DBInputFormat: Using read commited transaction isolation
17/12/15 15:07:17 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`r_id`), MAX(`r_id`) FROM `shop_role` WHERE ( r_id>10 and r_id < 50 )
17/12/15 15:07:17 WARN db.TextSplitter: Generating splits for a textual index column.
17/12/15 15:07:17 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
17/12/15 15:07:17 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
17/12/15 15:07:17 INFO mapreduce.JobSubmitter: number of splits:2
17/12/15 15:07:17 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1513320327311_0004
17/12/15 15:07:17 INFO impl.YarnClientImpl: Submitted application application_1513320327311_0004
17/12/15 15:07:17 INFO mapreduce.Job: The url to track the job: http://centos1:8088/proxy/application_1513320327311_0004/
17/12/15 15:07:17 INFO mapreduce.Job: Running job: job_1513320327311_0004
17/12/15 15:07:26 INFO mapreduce.Job: Job job_1513320327311_0004 running in uber mode : false
17/12/15 15:07:26 INFO mapreduce.Job:  map 0% reduce 0%
17/12/15 15:07:35 INFO mapreduce.Job:  map 50% reduce 0%
17/12/15 15:07:36 INFO mapreduce.Job:  map 100% reduce 0%
17/12/15 15:07:36 INFO mapreduce.Job: Job job_1513320327311_0004 completed successfully
17/12/15 15:07:36 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=265438
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=227
HDFS: Number of bytes written=166
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Job Counters
Launched map tasks=2
Other local map tasks=2
Total time spent by all maps in occupied slots (ms)=14464
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=14464
Total vcore-seconds taken by all map tasks=14464
Total megabyte-seconds taken by all map tasks=14811136
Map-Reduce Framework
Map input records=4
Map output records=4
Input split bytes=227
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=202
CPU time spent (ms)=2640
Physical memory (bytes) snapshot=313946112
Virtual memory (bytes) snapshot=4181368832
Total committed heap usage (bytes)=175112192
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=166
17/12/15 15:07:36 INFO mapreduce.ImportJobBase: Transferred 166 bytes in 23.1148 seconds (7.1815 bytes/sec)
17/12/15 15:07:36 INFO mapreduce.ImportJobBase: Retrieved 4 records.
[root@centos1 sqoop-1.4.6]#


6. Using --query to select data

Next, something more advanced still: we use a query to select the data, which means we can even pull data from several tables at once. We will keep it simple here; the command is shown below. Notice that with --query we no longer specify --table, and since the amount of data is tiny we set the number of mappers to 1. Run the command below; as it stands it has a problem.

[root@centos1 sqoop-1.4.6]# ./bin/sqoop import --connect jdbc:mysql://192.168.20.224:3306/bbc5.1 --username root --password root --query 'select * from shop_role where r_id = 11' --target-dir /sqoop/shop_role -m 1 --fields-terminated-by '\t'
Warning: /home/sqoop-1.4.6/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/sqoop-1.4.6/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
17/12/15 15:21:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/12/15 15:21:56 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/12/15 15:21:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/12/15 15:21:56 INFO tool.CodeGenTool: Beginning code generation
17/12/15 15:21:56 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Query [select * from shop_role where r_id = 11] must contain '$CONDITIONS' in WHERE clause.
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:300)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1833)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

Running the command above produces the error shown: "must contain '$CONDITIONS' in WHERE clause." It means the WHERE clause of our query must contain $CONDITIONS, which acts as a placeholder: Sqoop substitutes its own conditions into it at run time so that each mapper selects only the rows it is responsible for.

Add $CONDITIONS to the query as shown below, run the command again, and it succeeds.

[root@centos1 sqoop-1.4.6]# ./bin/sqoop import --connect jdbc:mysql://192.168.20.224:3306/bbc5.1 --username root --password root --query 'select * from shop_role where r_id >10 and r_id < 40 and $CONDITIONS' --target-dir /sqoop/shop_role -m 1 --fields-terminated-by '\t'
......
[root@centos1 hadoop]# hadoop fs -cat /sqoop/shop_role/part-m-00000
11	超级管理员	20150316095648	超级管理员	经理
16	高级管理员	20150825070151	高级管理员	研发
18	普通管理员	20150316095658	普通管理员	经理
20	b2c	1449022961714	仅可操作B2C功能。	manager
[root@centos1 hadoop]#


7. --split-by

If we change -m 1 to -m 2, the import fails; the command and the error are shown below. The error says we have not told Sqoop how the mappers should split the data, i.e. which rows each mapper should read. With a single mapper there is no problem, because that one mapper reads everything; with two mappers, the first must read some portion and the second the rest, and they need a split criterion. By giving Sqoop a column that uniquely identifies rows as the split key, the two mappers can quickly work out the overall range of the data and which share each of them should read.

[root@centos1 sqoop-1.4.6]# ./bin/sqoop import --connect jdbc:mysql://192.168.20.224:3306/bbc5.1 --username root --password root --query 'select * from shop_role where r_id >10 and r_id < 40 and $CONDITIONS' --target-dir /sqoop/shop_role -m 2 --fields-terminated-by '\t'
17/12/15 15:30:53 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/12/15 15:30:53 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
When importing query results in parallel, you must specify --split-by.

[root@centos1 sqoop-1.4.6]# ./bin/sqoop import --connect jdbc:mysql://192.168.20.224:3306/bbc5.1 --username root --password root --query 'select * from shop_role where r_id >10 and r_id < 40 and $CONDITIONS' --target-dir /sqoop/shop_role -m 2 --fields-terminated-by '\t' --split-by shop_role.r_id

Run the command above and the import succeeds; we can verify it with the commands sketched below. Two files are indeed produced, and together their contents are exactly the rows matching the query condition r_id > 10 and r_id < 40. To be a little more precise about what $CONDITIONS does: Sqoop first runs a bounding query (SELECT MIN(r_id), MAX(r_id), as the log above shows) over the --split-by column, divides that range between the mappers, and substitutes each mapper's own range for $CONDITIONS, so every mapper reads only its slice of the result.
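A sketch of that check from the command line (paths as in the command above):

# with -m 2 we expect part-m-00000 and part-m-00001 plus _SUCCESS
hadoop fs -ls /sqoop/shop_role
# the two files together hold exactly the rows with r_id > 10 and r_id < 40
hadoop fs -cat /sqoop/shop_role/part-m-*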

5. sqoop export: export data from HDFS back into the database
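sqoop export requires the target table to exist already, so shop_role_copy has to be created in MySQL first. A minimal sketch (the column names follow the shop_role import above; the column list and types are assumptions and must match the fields actually present under /sqoop/shop_role):

CREATE TABLE `shop_role_copy` (
  `r_id` varchar(64) NOT NULL,
  `r_name` varchar(255) DEFAULT NULL,
  `r_description` varchar(255) DEFAULT NULL,
  `r_alias` varchar(255) DEFAULT NULL
);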

[root@centos1 sqoop-1.4.6]# ./bin/sqoop export --connect jdbc:mysql://192.168.20.224:3306/bbc5.1 --username root --password root --export-dir /sqoop/shop_role -m 1 --table shop_role_copy --fields-terminated-by '\t'




========================================================
========================================================

Summary of Sqoop commands

At heart Sqoop is just a command-line tool; compared with HDFS and MapReduce there is no deep theory behind it.

We can view Sqoop's command options with sqoop help, as follows:

16/11/13 20:10:17 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
usage: sqoop COMMAND [ARGS]
Available commands:
codegen            Generate code to interact with database records
create-hive-table  Import a table definition into Hive
eval               Evaluate a SQL statement and display the results
export             Export an HDFS directory to a database table
help               List available commands
import             Import a table from a database to HDFS
import-all-tables  Import tables from a database to HDFS
import-mainframe   Import datasets from a mainframe server to HDFS
job                Work with saved jobs
list-databases     List available databases on a server
list-tables        List available tables in a database
merge              Merge results of incremental imports
metastore          Run a standalone Sqoop metastore
version            Display version information
See 'sqoop help COMMAND' for information on a specific command.

The most frequently used commands are import and export.

1. codegen

This command maps the records of a relational database table to a Java source file, a Java class, and the jar that packages them; the generated Java file contains a field for each column of the table. The generated class and jar are needed later, for example by the merge tool (see the merge example below). The options for this command can be listed with sqoop help codegen.



Example:

sqoop codegen --connect jdbc:mysql://localhost:3306/test --table order_info -outdir /home/xiaosi/test/ --username root -password root

The example above generates Java code from the order_info table of the test database; -outdir specifies where the Java source is written.

The output is as follows:

16/11/13 21:50:34 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password:
16/11/13 21:50:38 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/11/13 21:50:38 INFO tool.CodeGenTool: Beginning code generation
16/11/13 21:50:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `order_info` AS t LIMIT 1
16/11/13 21:50:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `order_info` AS t LIMIT 1
16/11/13 21:50:38 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/hadoop-2.7.2
Note: /tmp/sqoop-xiaosi/compile/ea41fe40e1f12f6b052ad9fe4a5d9710/order_info.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/11/13 21:50:39 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-xiaosi/compile/ea41fe40e1f12f6b052ad9fe4a5d9710/order_info.jar

We can also use -bindir to specify the output path for the compiled class file and the jar packaged from the generated files:
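A plausible form of that command (treat it as a sketch; the paths are taken from the description below the output):

sqoop codegen --connect jdbc:mysql://localhost:3306/test --table order_info -bindir /home/xiaosi/data/ -outdir /home/xiaosi/test/ --username root -P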

16/11/13 21:53:55 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password:
16/11/13 21:53:58 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/11/13 21:53:58 INFO tool.CodeGenTool: Beginning code generation
16/11/13 21:53:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `order_info` AS t LIMIT 1
16/11/13 21:53:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `order_info` AS t LIMIT 1
16/11/13 21:53:58 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/hadoop-2.7.2
Note: /home/xiaosi/data/order_info.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/11/13 21:53:59 INFO orm.CompilationManager: Writing jar file: /home/xiaosi/data/order_info.jar

The example above writes the compiled class file (order_info.class) and the jar packaged from it (order_info.jar) to /home/xiaosi/data, while the Java source file (order_info.java) goes to /home/xiaosi/test.

2. create-hive-table

This command was already used in the previous post [Sqoop导入与导出]; it creates a Hive table whose schema matches that of the relational database table. The options for this command can be listed with sqoop help create-hive-table.



Example:

sqoop create-hive-table --connect jdbc:mysql://localhost:3306/test --table employee --username root -password root --fields-terminated-by ','


3. eval

The eval command lets Sqoop run SQL statements directly against the relational database, so before running an actual import we can check that the SQL involved is correct and have the result printed on the console.

3.1 Evaluating a SELECT query

With the eval tool we can evaluate any kind of SQL query. Taking the order_info table of the test database as an example:

sqoop eval --connect jdbc:mysql://localhost:3306/test --username root --query "select * from order_info limit 3" -P

The output:

16/11/13 22:25:19 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password:
16/11/13 22:25:22 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
------------------------------------------------------------
| id                   | order_time           | business   |
------------------------------------------------------------
| 358574046793404      | 2016-04-05           | flight     |
| 358574046794733      | 2016-08-03           | hotel      |
| 358574050631177      | 2016-05-08           | vacation   |
------------------------------------------------------------


3.2 Evaluating an INSERT statement

The eval tool handles DML statements as well as queries, which means we can run INSERT statements with it. The following command inserts a new row into the order_info table of the test database:

sqoop eval --connect jdbc:mysql://localhost:3306/test --username root --query "insert into order_info (id, order_time, business) values('358574050631166', '2016-11-13', 'hotel')" -P

The output:

16/11/13 22:29:42 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password:
16/11/13 22:29:44 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/11/13 22:29:44 INFO tool.EvalSqlTool: 1 row(s) updated.

If the statement executes successfully, the number of updated rows is shown on the console. We can also query MySQL for the row we just inserted:

mysql> select * from order_info where id = "358574050631166";
+-----------------+------------+----------+
| id              | order_time | business |
+-----------------+------------+----------+
| 358574050631166 | 2016-11-13 | hotel    |
+-----------------+------------+----------+
1 row in set (0.00 sec)


4. export

Exports data from an HDFS directory into a table of a relational database. The options for this command can be listed with sqoop help export.



Example:

Here is an example of employee data stored in an HDFS file:

hadoop fs -text /user/xiaosi/employee/* | less
yoona,qunar,创新事业部
xiaosi,qunar,创新事业部
jim,ali,淘宝
kom,ali,淘宝
lucy,baidu,搜索
jim,ali,淘宝

Before exporting data from HDFS into a relational database, we must create a table in the database to receive the data, for example:

CREATE TABLE `employee` (
`name` varchar(255) DEFAULT NULL,
`company` varchar(255) DEFAULT NULL,
`depart` varchar(255) DEFAULT NULL
);

Now run the export:

sqoop export --connect jdbc:mysql://localhost:3306/test --table employee --export-dir /user/xiaosi/employee --username root -m 1 --fields-terminated-by ',' -P

The output:

16/11/13 23:40:49 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/11/13 23:40:49 INFO mapreduce.Job: Running job: job_local611430785_0001
16/11/13 23:40:49 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/11/13 23:40:49 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.sqoop.mapreduce.NullOutputCommitter
16/11/13 23:40:49 INFO mapred.LocalJobRunner: Waiting for map tasks
16/11/13 23:40:49 INFO mapred.LocalJobRunner: Starting task: attempt_local611430785_0001_m_000000_0
16/11/13 23:40:49 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
16/11/13 23:40:49 INFO mapred.MapTask: Processing split: Paths:/user/xiaosi/employee/part-m-00000:0+120
16/11/13 23:40:49 INFO Configuration.deprecation: map.input.file is deprecated. Instead, use mapreduce.map.input.file
16/11/13 23:40:49 INFO Configuration.deprecation: map.input.start is deprecated. Instead, use mapreduce.map.input.start
16/11/13 23:40:49 INFO Configuration.deprecation: map.input.length is deprecated. Instead, use mapreduce.map.input.length
16/11/13 23:40:49 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
16/11/13 23:40:49 INFO mapred.LocalJobRunner:
16/11/13 23:40:49 INFO mapred.Task: Task:attempt_local611430785_0001_m_000000_0 is done. And is in the process of committing
16/11/13 23:40:49 INFO mapred.LocalJobRunner: map
16/11/13 23:40:49 INFO mapred.Task: Task 'attempt_local611430785_0001_m_000000_0' done.
16/11/13 23:40:49 INFO mapred.LocalJobRunner: Finishing task: attempt_local611430785_0001_m_000000_0
16/11/13 23:40:49 INFO mapred.LocalJobRunner: map task executor complete.
16/11/13 23:40:50 INFO mapreduce.Job: Job job_local611430785_0001 running in uber mode : false
16/11/13 23:40:50 INFO mapreduce.Job:  map 100% reduce 0%
16/11/13 23:40:50 INFO mapreduce.Job: Job job_local611430785_0001 completed successfully
16/11/13 23:40:50 INFO mapreduce.Job: Counters: 20
File System Counters
FILE: Number of bytes read=22247825
FILE: Number of bytes written=22732498
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=126
HDFS: Number of bytes written=0
HDFS: Number of read operations=12
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Map-Reduce Framework
Map input records=6
Map output records=6
Input split bytes=136
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=0
Total committed heap usage (bytes)=245366784
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
16/11/13 23:40:50 INFO mapreduce.ExportJobBase: Transferred 126 bytes in 2.3492 seconds (53.6344 bytes/sec)
16/11/13 23:40:50 INFO mapreduce.ExportJobBase: Exported 6 records.

Once the export finishes, we can query the employee table in MySQL:

mysql> select name, company from employee;
+--------+---------+
| name   | company |
+--------+---------+
| yoona  | qunar   |
| xiaosi | qunar   |
| jim    | ali     |
| kom    | ali     |
| lucy   | baidu   |
| jim    | ali     |
+--------+---------+
6 rows in set (0.00 sec)


5. import

Imports data from a database table into HDFS or Hive. The options for this command can be listed with sqoop help import.



Example:

sqoop import --connect jdbc:mysql://localhost:3306/test --target-dir /user/xiaosi/data/order_info --query 'select * from order_info where $CONDITIONS' -m 1 --username root -P

The command above imports the result of a query into HDFS, with the destination path given by --target-dir. Note that --query cannot be used together with --table, and the variable $CONDITIONS must appear in the WHERE clause so that Sqoop can substitute its own conditions into it while the job runs.

The output is as follows:

16/11/14 12:08:50 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/11/14 12:08:50 INFO mapreduce.Job: Running job: job_local127577466_0001
16/11/14 12:08:50 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/11/14 12:08:50 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/11/14 12:08:50 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
16/11/14 12:08:50 INFO mapred.LocalJobRunner: Waiting for map tasks
16/11/14 12:08:50 INFO mapred.LocalJobRunner: Starting task: attempt_local127577466_0001_m_000000_0
16/11/14 12:08:50 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/11/14 12:08:50 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
16/11/14 12:08:50 INFO db.DBInputFormat: Using read commited transaction isolation
16/11/14 12:08:50 INFO mapred.MapTask: Processing split: 1=1 AND 1=1
16/11/14 12:08:50 INFO db.DBRecordReader: Working on split: 1=1 AND 1=1
16/11/14 12:08:50 INFO db.DBRecordReader: Executing query: select * from order_info where ( 1=1 ) AND ( 1=1 )
16/11/14 12:08:50 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
16/11/14 12:08:50 INFO mapred.LocalJobRunner:
16/11/14 12:08:51 INFO mapred.Task: Task:attempt_local127577466_0001_m_000000_0 is done. And is in the process of committing
16/11/14 12:08:51 INFO mapred.LocalJobRunner:
16/11/14 12:08:51 INFO mapred.Task: Task attempt_local127577466_0001_m_000000_0 is allowed to commit now
16/11/14 12:08:51 INFO output.FileOutputCommitter: Saved output of task 'attempt_local127577466_0001_m_000000_0' to hdfs://localhost:9000/user/xiaosi/data/order_info/_temporary/0/task_local127577466_0001_m_000000
16/11/14 12:08:51 INFO mapred.LocalJobRunner: map
16/11/14 12:08:51 INFO mapred.Task: Task 'attempt_local127577466_0001_m_000000_0' done.
16/11/14 12:08:51 INFO mapred.LocalJobRunner: Finishing task: attempt_local127577466_0001_m_000000_0
16/11/14 12:08:51 INFO mapred.LocalJobRunner: map task executor complete.
16/11/14 12:08:51 INFO mapreduce.Job: Job job_local127577466_0001 running in uber mode : false
16/11/14 12:08:51 INFO mapreduce.Job:  map 100% reduce 0%
16/11/14 12:08:51 INFO mapreduce.Job: Job job_local127577466_0001 completed successfully
16/11/14 12:08:51 INFO mapreduce.Job: Counters: 20
File System Counters
FILE: Number of bytes read=22247784
FILE: Number of bytes written=22732836
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=0
HDFS: Number of bytes written=3710
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Map-Reduce Framework
Map input records=111
Map output records=111
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=0
Total committed heap usage (bytes)=245366784
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=3710
16/11/14 12:08:51 INFO mapreduce.ImportJobBase: Transferred 3.623 KB in 2.5726 seconds (1.4083 KB/sec)
16/11/14 12:08:51 INFO mapreduce.ImportJobBase: Retrieved 111 records.

We can inspect the imported data under the HDFS path given by --target-dir:

hadoop fs -text /user/xiaosi/data/order_info/* | less
358574046793404,2016-04-05,flight
358574046794733,2016-08-03,hotel
358574050631177,2016-05-08,vacation
358574050634213,2015-04-28,train
358574050634692,2016-04-05,tuan
358574050650524,2015-07-26,hotel
358574050654773,2015-01-23,flight
358574050668658,2015-01-23,hotel
358574050730771,2016-11-06,train
358574050731241,2016-05-08,car
358574050743865,2015-01-23,vacation
358574050767666,2015-04-28,train
358574050767971,2015-07-26,flight
358574050808288,2016-05-08,hotel
358574050816828,2015-01-23,hotel
358574050818220,2015-04-28,car
358574050821877,2013-08-03,flight

Another example:

sqoop import --connect jdbc:mysql://localhost:3306/test --table order_info --columns "business,id,order_time"  -m 1 --username root -P

This creates a new directory order_info under /user/xiaosi/ on HDFS, named after the table, with contents like:

flight,358574046793404,2016-04-05
hotel,358574046794733,2016-08-03
vacation,358574050631177,2016-05-08
train,358574050634213,2015-04-28
tuan,358574050634692,2016-04-05


6. import-all-tables

Imports all tables of a database into HDFS, each table into its own directory. The options for this command can be listed with sqoop help import-all-tables.
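No example is given for this one; a minimal sketch, reusing the test database from the earlier examples (every table needs a primary key, or the mapper count must be forced to 1; --warehouse-dir is the parent directory under which each table gets its own subdirectory):

sqoop import-all-tables --connect jdbc:mysql://localhost:3306/test --username root -P --warehouse-dir /user/xiaosi/data -m 1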



7. list-databases

This command lists all databases on a relational database server:

sqoop list-databases --connect jdbc:mysql://localhost:3306 --username root -P

The output is as follows:

16/11/14 14:30:11 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password:
16/11/14 14:30:14 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
hive_db
mysql
performance_schema
phpmyadmin
test


8. list-tables

This command lists all tables of a given database:

sqoop list-tables --connect jdbc:mysql://localhost:3306/test --username root -P

The output is as follows:

16/11/14 14:32:08 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password:
16/11/14 14:32:10 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
PageView
book
bookID
cc
city_click
country
country2
cup
employee
flightOrder
hotel_book_info
hotel_info
order_info
stu
stu2
stu3
stuInfo
student


9. merge

This command merges two data sets on HDFS, removing duplicates as it merges. The options for this command can be listed with sqoop help merge.





For example, suppose one imported data set sits under /user/xiaosi/old on HDFS:

id name
1 a
2 b
3 c

and another data set sits under /user/xiaosi/new, imported after the first:

id name
1 a2
2 b
3 c

The merged result is then:

id name
1 a2
2 b
3 c

Run the following command:

sqoop merge -new-data /user/xiaosi/new/part-m-00000 -onto /user/xiaosi/old/part-m-00000 -target-dir /user/xiaosi/final -jar-file /home/xiaosi/test/testmerge.jar -class-name testmerge -merge-key id

Note:

Within a single data set, no two rows should have the same primary key, or data will be lost.

10. metastore

Records metadata about Sqoop jobs. If no metastore instance is started, the metadata is stored under ~/.sqoop by default; to change that location, edit the configuration file sqoop-site.xml.

Start a metastore instance:

sqoop metastore

The output is as follows:

16/11/14 14:44:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/11/14 14:44:40 WARN hsqldb.HsqldbMetaStore: The location for metastore data has not been explicitly set. Placing shared metastore files in /home/xiaosi/.sqoop/shared-metastore.db
[Server@52308be6]: [Thread[main,5,main]]: checkRunning(false) entered
[Server@52308be6]: [Thread[main,5,main]]: checkRunning(false) exited
[Server@52308be6]: [Thread[main,5,main]]: setDatabasePath(0,file:/home/xiaosi/.sqoop/shared-metastore.db)
[Server@52308be6]: [Thread[main,5,main]]: checkRunning(false) entered
[Server@52308be6]: [Thread[main,5,main]]: checkRunning(false) exited
[Server@52308be6]: [Thread[main,5,main]]: setDatabaseName(0,sqoop)
[Server@52308be6]: [Thread[main,5,main]]: putPropertiesFromString(): [hsqldb.write_delay=false]
[Server@52308be6]: [Thread[main,5,main]]: checkRunning(false) entered
[Server@52308be6]: [Thread[main,5,main]]: checkRunning(false) exited
[Server@52308be6]: Initiating startup sequence...
[Server@52308be6]: Server socket opened successfully in 3 ms.
[Server@52308be6]: Database [index=0, id=0, db=file:/home/xiaosi/.sqoop/shared-metastore.db, alias=sqoop] opened sucessfully in 153 ms.
[Server@52308be6]: Startup sequence completed in 157 ms.
[Server@52308be6]: 2016-11-14 14:44:40.414 HSQLDB server 1.8.0 is online
[Server@52308be6]: To close normally, connect and execute SHUTDOWN SQL
[Server@52308be6]: From command line, use [Ctrl]+[C] to abort abruptly
16/11/14 14:44:40 INFO hsqldb.HsqldbMetaStore: Server started on port 16000 with protocol HSQL
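Once the metastore is running (port 16000 by default, as the log above shows), jobs can be stored in and run from it by pointing --meta-connect at the HSQLDB server; a sketch, with the host name and job details assumed:

sqoop job --meta-connect jdbc:hsqldb:hsql://metastore-host:16000/sqoop --create sharedImportJob -- import --connect jdbc:mysql://localhost:3306/test --table order_info -m 1 --username root -P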


11. job

This command creates a Sqoop job but does not run it immediately; the job has to be executed by hand later. Its purpose is to make Sqoop commands reusable. The options for this command can be listed with sqoop help job.



Example:

sqoop job -create listTablesJob -- list-tables --connect jdbc:mysql://localhost:3306/test --username root -P

The command above defines a job that lists all tables of the test database.

sqoop job -exec listTablesJob

The command above runs the job we just defined; the output is:

16/11/14 19:51:44 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password:
16/11/14 19:51:47 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
PageView
book
bookID
cc
city_click
country
country2
cup
employee
flightOrder
hotel_book_info
hotel_info
order_info
stu
stu2
stu3
stuInfo
student

Note:

There must be a space between -- and list-tables (the Sqoop command the job runs); the two must not be written together.
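For example (only the spacing differs; the connect options stand in for the ones shown above):

sqoop job -create listTablesJob --list-tables --connect jdbc:mysql://localhost:3306/test --username root -P    # wrong: "--" fused with the tool name
sqoop job -create listTablesJob -- list-tables --connect jdbc:mysql://localhost:3306/test --username root -P   # right: a space after "--"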

Sqoop import and export

#Pull data from MySQL into HDFS. Problems: it produces lots of small files, and the job errors out if the target directory already exists
sqoop import --connect jdbc:mysql://192.168.56.151:3306/test --username root --password 123456 --table test --target-dir /user/sqoop/mysql/input -m 1
#Append data to an HDFS directory that already exists
sqoop import --connect jdbc:mysql://192.168.56.151:3306/test --username root --password 123456 --table test --append --target-dir /user/test/sqoop
#name is a string column; replace NULLs with 'nothing' on import
sqoop import --connect jdbc:mysql://192.168.56.151:3306/test --username root --password 123456 --table test --null-string 'nothing' --append --target-dir /user/test/sqoop
#age is an int column; replace NULLs with -1 on import
sqoop import --connect jdbc:mysql://192.168.56.151:3306/test --username root --password 123456 --table test --null-string 'nothing' --null-non-string -1 --append --target-dir /user/test/sqoop
#Import only the id and name columns
sqoop import --connect jdbc:mysql://192.168.56.151:3306/test --username root --password 123456 --table test --columns id,name --null-string 'nothing' --append --target-dir /user/test/sqoop
#Separate fields with |
sqoop import --connect jdbc:mysql://192.168.56.151:3306/test --username root --password 123456 --table test --columns id,name --fields-terminated-by '|' --null-string 'nothing' --append --target-dir /user/test/sqoop
#Import only the id and name of rows whose name is not null
sqoop import --connect jdbc:mysql://192.168.56.151:3306/test --username root --password 123456 --table test --columns id,name --where "name is not null" --fields-terminated-by '|' --null-string 'nothing' --append --target-dir /user/test/sqoop
#Use --query instead of --table/--columns/--where
sqoop import --connect jdbc:mysql://192.168.56.151:3306/test --username root --password 123456 --query "select id,name from st where id > 10 and \$CONDITIONS" --split-by id --fields-terminated-by '|' --null-string 'nothing' --append --target-dir /user/test/sqoop
#Put all the data into a single file (there is very little of it)
sqoop import --connect jdbc:mysql://192.168.56.151:3306/test --username root --password 123456 --query "select id,name from st where id > 10 and \$CONDITIONS" --split-by id --fields-terminated-by '|' --null-string 'nothing' --append --target-dir /user/test/sqoop -m 1
#List the databases in MySQL
sqoop list-databases --connect jdbc:mysql://192.168.56.151:3306/ --username root --password 123456
#List the tables in the MySQL database named mysql
sqoop list-tables --connect jdbc:mysql://192.168.56.151:3306/mysql --username root --password 123456
#List the databases in Oracle
sqoop list-databases --connect jdbc:oracle:thin:@10.10.244.136:1521:wilson --username system --password 123456
#Import the Oracle table system.ost into HDFS
sqoop import --connect jdbc:oracle:thin:@10.10.244.136:1521:wilson --username system --password 123456 --table SYSTEM.OST --delete-target-dir --target-dir /user/test/sqoop
#Import into a single file only
sqoop import --connect jdbc:oracle:thin:@10.10.244.136:1521:wilson --username system --password 123456 --table SYSTEM.OST --delete-target-dir --target-dir /user/test/sqoop -m 1
#HDFS to MySQL
sqoop export --connect jdbc:mysql://192.168.56.151:3306/test --username root --password 123456 --table test1 --export-dir /user/sqoop/mysql/output --fields-terminated-by ','
#HDFS to Oracle
sqoop export --connect jdbc:oracle:thin:@192.168.56.150:1521/orcl --username yue --password yue --table TEST2 --export-dir /user/sqoop/mysql/output --fields-terminated-by ','


sqoop on hive

#MySQL to Hive
sqoop import --connect jdbc:mysql://192.168.56.151:3306/test --username root --password 123456 --table test --target-dir /user/hive/warehouse/ip140.db/test_sqoop/dt=2016-04-15 --fields-terminated-by '\t'
#Oracle to Hive
sqoop import --connect jdbc:oracle:thin:@192.168.56.150:1521/orcl --username yue --password yue --table TEST1 --target-dir /user/hive/warehouse/ip140.db/test_sqoop/dt=2016-04-14 --fields-terminated-by '\t'

#Create the Hive table
create table test_sqoop(id int,name string)
partitioned by (dt string)
row format delimited fields terminated by '\t' stored as textfile;

#Repair the table partitions
MSCK REPAIR TABLE test_sqoop;

#Exporting works the same as exporting from plain HDFS


sqoop on hbase

#Pull data from MySQL into HBase
sqoop import --connect jdbc:mysql://192.168.56.151:3306/test --username root --password 123456 --table mt --hbase-create-table --hbase-table mt --column-family cf --hbase-row-key year,month,day,sta_id
#Import data from Oracle into HBase
sqoop import --connect jdbc:oracle:thin:@192.168.56.150:1521/orcl --username yue --password yue --table TEST1 --hbase-create-table --hbase-table user:testoracle --column-family cf --hbase-row-key ID
#Import data from Oracle into HBase, using multiple columns as the row key
sqoop import --connect jdbc:oracle:thin:@192.168.56.151:1521/orcl --username yue --password yue --table SYSTEM.OMT --hbase-create-table --hbase-table omt --column-family cf --hbase-row-key YEAR,MONTH,DAY,STA_ID -m 1
#Extract data from HBase back into MySQL (not currently supported)