Sqoop Data Migration
In this lesson we'll learn Sqoop. Sqoop is a dedicated data-transfer tool: it can move data from a relational database into the HDFS file system, and it can also export data from HDFS back into a database.

A typical use case: suppose a project in your company has been running for a long time and has accumulated a lot of data, and you now want to upgrade the project and store that data in a different database. You first need to stage all the data somewhere, then insert it into the new database with that database's own statements. Before Sqoop this was hard to do; with Sqoop it becomes much simpler. Sqoop is a tool that runs on top of Hadoop and uses MapReduce underneath, so the work is executed in parallel across many machines and therefore runs much faster, and we don't have to write any of that code ourselves. It provides a powerful set of commands; knowing how to use them, plus a little SQL, is enough to migrate data with ease.

Now let's get started with Sqoop.
1. Installation
We use sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz. First unpack the tarball, rename the resulting directory to sqoop, and set the SQOOP_HOME environment variable in /etc/profile.

Then copy the MySQL JDBC driver mysql-connector-java-5.1.10.jar into Sqoop's lib directory.
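The steps above, as a minimal shell sketch; the /usr/local install location and the driver jar's starting directory are assumptions, adjust to your layout:

tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /usr/local
mv /usr/local/sqoop-1.4.6.bin__hadoop-2.0.4-alpha /usr/local/sqoop

# register SQOOP_HOME in /etc/profile, then reload it
echo 'export SQOOP_HOME=/usr/local/sqoop' >> /etc/profile
echo 'export PATH=$PATH:$SQOOP_HOME/bin' >> /etc/profile
source /etc/profile

# drop the MySQL JDBC driver into Sqoop's lib directory
cp mysql-connector-java-5.1.10.jar $SQOOP_HOME/lib/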
2. Rename the configuration files

In ${SQOOP_HOME}/conf, run the command: mv sqoop-env-template.sh sqoop-env.sh

Also in the conf directory, sqoop-site.xml and sqoop-site-template.xml have exactly the same contents; that's nothing to worry about, we only care about sqoop-site.xml.
3. Edit the configuration file sqoop-env.sh
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/usr/local/hadoop/
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/usr/local/hadoop
#set the path to where bin/hbase is available
export HBASE_HOME=/usr/local/hbase
#Set the path to where bin/hive is available
export HIVE_HOME=/usr/local/hive
#Set the path for where zookeper config dir is
export ZOOCFGDIR=/usr/local/zk
That's it; now we can run Sqoop.
4. Running Sqoop commands: MySQL -> HDFS

1. sqoop list-databases: list all databases
[root@centos1 sqoop-1.4.6]# ./bin/sqoop list-databases --connect jdbc:mysql://192.168.20.224:3306 --username root --password root
Warning: /home/sqoop-1.4.6/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/sqoop-1.4.6/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
17/12/15 13:29:53 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/12/15 13:29:53 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/12/15 13:29:53 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
bak2
bak3
bbc5.1
hive
mysql
online
performance_schema
qf_crm_pms_new
sakila
sentry
test
world
2. sqoop import: import a whole table

(The original post repeated the list-databases command here by mistake; the import of table t_clue looks like this.)

[root@centos1 sqoop-1.4.6]# ./bin/sqoop import --connect jdbc:mysql://192.168.20.224:3306/bbc5.1 --username root --password root --table t_clue
.....
[root@centos1 src]# hadoop fs -cat /user/root/t_clue/part-m-00000
Browsing into the t_clue directory (the original screenshots are omitted here), we see 4 result files, which means 4 mappers took part in the import. Also note the "m" in the middle of each file name: "m" marks files produced by mappers, while "r" marks files produced by reducers. Columns are separated by "," by default.

A caveat when importing with Sqoop: in the author's tests the table's storage engine had to be InnoDB for the import to work; with MyISAM the following error came up:
Caused by: java.sql.SQLException: Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '>='
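If you hit this, a hedged first step is to check the table's engine in MySQL and convert it; t_clue is just an example table name here:

mysql> SHOW TABLE STATUS WHERE Name = 't_clue';
mysql> ALTER TABLE t_clue ENGINE = InnoDB;

Since the error itself complains about mixed collations, aligning the table's character set with the connection's character set may also be needed.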
3. --target-dir: the HDFS directory to write to; -m: the number of mappers

We imported in the simplest possible way above; now let's add two more options. The command below adds --target-dir (which HDFS directory to write to) and -m (how many mappers to launch; note that m takes a single "-" while the other options take "--"). Since no reducer is needed to merge the data, launching N mappers produces N output files.

[root@centos1 sqoop-1.4.6]# ./bin/sqoop import --connect jdbc:mysql://192.168.20.224:3306/bbc5.1 --username root --password root --table shop_activity --target-dir /sqoop/td1 -m 2
4. --fields-terminated-by: the column separator; --columns: which columns to import
Let's keep adding options. --fields-terminated-by '\t' sets the column separator to a tab, and --columns 'r_id,r_name,r_description,r_alias' restricts the import to just those columns, as shown below.

[root@centos1 sqoop-1.4.6]# ./bin/sqoop import --connect jdbc:mysql://192.168.20.224:3306/bbc5.1 --username root --password root --table shop_role --target-dir /sqoop/shop_role -m 2 --fields-terminated-by '\t' --columns 'r_id,r_name,r_description,r_alias'
Warning: /home/sqoop-1.4.6/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/sqoop-1.4.6/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
17/12/15 14:59:34 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/12/15 14:59:34 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/12/15 14:59:34 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/12/15 14:59:34 INFO tool.CodeGenTool: Beginning code generation
17/12/15 14:59:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `shop_role` AS t LIMIT 1
17/12/15 14:59:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `shop_role` AS t LIMIT 1
17/12/15 14:59:35 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop
Note: /tmp/sqoop-root/compile/8b37357b383ec68014ebcb4a5d12c4af/shop_role.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/12/15 14:59:36 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/8b37357b383ec68014ebcb4a5d12c4af/shop_role.jar
17/12/15 14:59:36 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/12/15 14:59:36 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/12/15 14:59:36 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/12/15 14:59:36 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/12/15 14:59:36 INFO mapreduce.ImportJobBase: Beginning import of shop_role
17/12/15 14:59:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/12/15 14:59:37 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/12/15 14:59:38 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/12/15 14:59:38 INFO client.RMProxy: Connecting to ResourceManager at centos1/192.168.20.241:8032
17/12/15 14:59:41 INFO db.DBInputFormat: Using read commited transaction isolation
17/12/15 14:59:41 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`r_id`), MAX(`r_id`) FROM `shop_role`
17/12/15 14:59:41 WARN db.TextSplitter: Generating splits for a textual index column.
17/12/15 14:59:41 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
17/12/15 14:59:41 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
17/12/15 14:59:41 INFO mapreduce.JobSubmitter: number of splits:2
17/12/15 14:59:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1513320327311_0003
17/12/15 14:59:41 INFO impl.YarnClientImpl: Submitted application application_1513320327311_0003
17/12/15 14:59:41 INFO mapreduce.Job: The url to track the job: http://centos1:8088/proxy/application_1513320327311_0003/
17/12/15 14:59:41 INFO mapreduce.Job: Running job: job_1513320327311_0003
17/12/15 14:59:51 INFO mapreduce.Job: Job job_1513320327311_0003 running in uber mode : false
17/12/15 14:59:51 INFO mapreduce.Job: map 0% reduce 0%
17/12/15 15:00:01 INFO mapreduce.Job: map 100% reduce 0%
17/12/15 15:00:02 INFO mapreduce.Job: Job job_1513320327311_0003 completed successfully
17/12/15 15:00:02 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=265102
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=227
		HDFS: Number of bytes written=166
		HDFS: Number of read operations=8
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=4
	Job Counters
		Launched map tasks=2
		Other local map tasks=2
		Total time spent by all maps in occupied slots (ms)=15792
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=15792
		Total vcore-seconds taken by all map tasks=15792
		Total megabyte-seconds taken by all map tasks=16171008
	Map-Reduce Framework
		Map input records=4
		Map output records=4
		Input split bytes=227
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=372
		CPU time spent (ms)=3190
		Physical memory (bytes) snapshot=303677440
		Virtual memory (bytes) snapshot=4180291584
		Total committed heap usage (bytes)=174587904
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=166
17/12/15 15:00:02 INFO mapreduce.ImportJobBase: Transferred 166 bytes in 24.7169 seconds (6.7161 bytes/sec)
17/12/15 15:00:02 INFO mapreduce.ImportJobBase: Retrieved 4 records.
[root@centos1 sqoop-1.4.6]#
[root@centos1 hadoop]# hadoop fs -ls /sqoop/shop_role
17/12/15 15:02:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
-rw-r--r--   1 root supergroup          0 2017-12-15 15:00 /sqoop/shop_role/_SUCCESS
-rw-r--r--   1 root supergroup        126 2017-12-15 14:59 /sqoop/shop_role/part-m-00000
-rw-r--r--   1 root supergroup         40 2017-12-15 14:59 /sqoop/shop_role/part-m-00001
[root@centos1 hadoop]# hadoop fs -cat /sqoop/shop_role/part-m-00000
17/12/15 15:02:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
11	超级管理员	超级管理员	经理
16	高级管理员	高级管理员	研发
18	普通管理员	普通管理员	经理
[root@centos1 hadoop]#
5. --where: filter rows and import only the matching data
Now for something fancier: filtering rows with a WHERE condition and importing only the matching data. The added option is --where 'r_id>10 and r_id < 50', which, as the name suggests, imports only the rows whose r_id falls in that range.

[root@centos1 sqoop-1.4.6]# ./bin/sqoop import --connect jdbc:mysql://192.168.20.224:3306/bbc5.1 --username root --password root --table shop_role --target-dir /sqoop/shop_role -m 2 --fields-terminated-by '\t' --columns 'r_id,r_name,r_description,r_alias' --where 'r_id>10 and r_id < 50'
Warning: /home/sqoop-1.4.6/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/sqoop-1.4.6/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
17/12/15 15:07:09 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/12/15 15:07:09 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/12/15 15:07:09 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/12/15 15:07:09 INFO tool.CodeGenTool: Beginning code generation
17/12/15 15:07:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `shop_role` AS t LIMIT 1
17/12/15 15:07:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `shop_role` AS t LIMIT 1
17/12/15 15:07:10 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop
Note: /tmp/sqoop-root/compile/2e08cd2ec396f0238e634ba17076b048/shop_role.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/12/15 15:07:12 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/2e08cd2ec396f0238e634ba17076b048/shop_role.jar
17/12/15 15:07:12 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/12/15 15:07:12 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/12/15 15:07:12 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/12/15 15:07:12 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/12/15 15:07:12 INFO mapreduce.ImportJobBase: Beginning import of shop_role
17/12/15 15:07:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/12/15 15:07:12 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/12/15 15:07:13 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/12/15 15:07:13 INFO client.RMProxy: Connecting to ResourceManager at centos1/192.168.20.241:8032
17/12/15 15:07:17 INFO db.DBInputFormat: Using read commited transaction isolation
17/12/15 15:07:17 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`r_id`), MAX(`r_id`) FROM `shop_role` WHERE ( r_id>10 and r_id < 50 )
17/12/15 15:07:17 WARN db.TextSplitter: Generating splits for a textual index column.
17/12/15 15:07:17 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
17/12/15 15:07:17 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
17/12/15 15:07:17 INFO mapreduce.JobSubmitter: number of splits:2
17/12/15 15:07:17 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1513320327311_0004
17/12/15 15:07:17 INFO impl.YarnClientImpl: Submitted application application_1513320327311_0004
17/12/15 15:07:17 INFO mapreduce.Job: The url to track the job: http://centos1:8088/proxy/application_1513320327311_0004/
17/12/15 15:07:17 INFO mapreduce.Job: Running job: job_1513320327311_0004
17/12/15 15:07:26 INFO mapreduce.Job: Job job_1513320327311_0004 running in uber mode : false
17/12/15 15:07:26 INFO mapreduce.Job: map 0% reduce 0%
17/12/15 15:07:35 INFO mapreduce.Job: map 50% reduce 0%
17/12/15 15:07:36 INFO mapreduce.Job: map 100% reduce 0%
17/12/15 15:07:36 INFO mapreduce.Job: Job job_1513320327311_0004 completed successfully
17/12/15 15:07:36 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=265438
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=227
		HDFS: Number of bytes written=166
		HDFS: Number of read operations=8
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=4
	Job Counters
		Launched map tasks=2
		Other local map tasks=2
		Total time spent by all maps in occupied slots (ms)=14464
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=14464
		Total vcore-seconds taken by all map tasks=14464
		Total megabyte-seconds taken by all map tasks=14811136
	Map-Reduce Framework
		Map input records=4
		Map output records=4
		Input split bytes=227
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=202
		CPU time spent (ms)=2640
		Physical memory (bytes) snapshot=313946112
		Virtual memory (bytes) snapshot=4181368832
		Total committed heap usage (bytes)=175112192
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=166
17/12/15 15:07:36 INFO mapreduce.ImportJobBase: Transferred 166 bytes in 23.1148 seconds (7.1815 bytes/sec)
17/12/15 15:07:36 INFO mapreduce.ImportJobBase: Retrieved 4 records.
[root@centos1 sqoop-1.4.6]#
6. Using a query to select the data

Next, something more advanced still: selecting the data with a SQL query, which means we can even import data drawn from several tables. We'll keep the query simple here. Note that with --query we no longer specify --table, and since the data set is small, we set the mapper count to 1. The command below has a problem, which we run on purpose to see the error.

[root@centos1 sqoop-1.4.6]# ./bin/sqoop import --connect jdbc:mysql://192.168.20.224:3306/bbc5.1 --username root --password root --query 'select * from shop_role where r_id = 11' --target-dir /sqoop/shop_role -m 1 --fields-terminated-by '\t'
Warning: /home/sqoop-1.4.6/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/sqoop-1.4.6/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
17/12/15 15:21:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/12/15 15:21:56 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/12/15 15:21:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/12/15 15:21:56 INFO tool.CodeGenTool: Beginning code generation
17/12/15 15:21:56 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Query [select * from shop_role where r_id = 11] must contain '$CONDITIONS' in WHERE clause.
	at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:300)
	at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1833)
	at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
	at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
The command above fails with the error: must contain '$CONDITIONS' in WHERE clause. In other words, the query's WHERE clause must include the token $CONDITIONS. It acts as a placeholder that Sqoop fills in at run time with the condition for each split, so that each mapper selects only the rows belonging to it.

Let's add $CONDITIONS to the command, as shown below, and run it again; this time it succeeds.
[root@centos1 sqoop-1.4.6]# ./bin/sqoop import --connect jdbc:mysql://192.168.20.224:3306/bbc5.1 --username root --password root --query 'select * from shop_role where r_id >10 and r_id < 40 and $CONDITIONS' --target-dir /sqoop/shop_role -m 1 --fields-terminated-by '\t'
......
[root@centos1 hadoop]# hadoop fs -cat /sqoop/shop_role/part-m-00000
11	超级管理员	20150316095648	超级管理员	经理
16	高级管理员	20150825070151	高级管理员	研发
18	普通管理员	20150316095658	普通管理员	经理
20	b2c	1449022961714	仅可操作B2C功能。	manager
[root@centos1 hadoop]#
7. --split-by: the column used to split a parallel query import
If we change -m 1 to -m 2, the import fails; the command and the error are shown below. The error means we haven't told Sqoop by what rule to split the data between mappers. With one mapper there is no problem, since that single mapper simply reads everything; with two mappers, Sqoop needs a split condition, ideally a uniquely-identifying column, so that it can work out the overall range of rows and how much of it each mapper should read.

[root@centos1 sqoop-1.4.6]# ./bin/sqoop import --connect jdbc:mysql://192.168.20.224:3306/bbc5.1 --username root --password root --query 'select * from shop_role where r_id >10 and r_id < 40 and $CONDITIONS' --target-dir /sqoop/shop_role -m 2 --fields-terminated-by '\t'
17/12/15 15:30:53 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/12/15 15:30:53 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
When importing query results in parallel, you must specify --split-by.
[root@centos1 sqoop-1.4.6]# ./bin/sqoop import --connect jdbc:mysql://192.168.20.224:3306/bbc5.1 --username root --password root --query 'select * from shop_role where r_id >10 and r_id < 40 and $CONDITIONS' --target-dir /sqoop/shop_role -m 2 --fields-terminated-by '\t' --split-by shop_role.r_id
With that command the import succeeds (the original screenshots are omitted): two files are generated, and together their contents are exactly the rows matching the query's condition r_id > 10 and r_id < 40. To spell out what $CONDITIONS does: Sqoop first runs a bounding query for the minimum and maximum of the --split-by column (shop_role.r_id here), divides that range into as many intervals as there are mappers, and substitutes each interval's range predicate for $CONDITIONS in that mapper's copy of the query. Each mapper therefore reads its own disjoint slice of the rows, without needing to know what the other mappers read.
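A hedged illustration of the queries Sqoop generates under the hood; the split boundaries below are invented for the example, and the real ones depend on the actual min/max of r_id:

-- bounding query, run once before the mappers start:
SELECT MIN(r_id), MAX(r_id) FROM shop_role WHERE r_id > 10 AND r_id < 40;

-- mapper 1: the user query with $CONDITIONS replaced by its interval
select * from shop_role where r_id > 10 and r_id < 40 and ( r_id >= 11 AND r_id < 25 );

-- mapper 2: the remaining interval
select * from shop_role where r_id > 10 and r_id < 40 and ( r_id >= 25 AND r_id <= 39 );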
5. sqoop export: export data from HDFS back into the database
[root@centos1 sqoop-1.4.6]# ./bin/sqoop export --connect jdbc:mysql://192.168.20.224:3306/bbc5.1 --username root --password root --export-dir /sqoop/shop_role -m 1 --table shop_role_copy --fields-terminated-by '\t'
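sqoop export writes into an existing table, so shop_role_copy must be created in MySQL beforehand. A hedged way to set it up; cloning the source table's structure is just the quickest option:

mysql> CREATE TABLE shop_role_copy LIKE shop_role;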
========================================================
========================================================
Sqoop Command Reference

At heart Sqoop is still a command-line tool; compared with HDFS and MapReduce there is no deep theory behind it. We can list Sqoop's command options with sqoop help:
16/11/13 20:10:17 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
usage: sqoop COMMAND [ARGS]
Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information
See 'sqoop help COMMAND' for information on a specific command.
The most frequently used commands are import and export.
1. codegen
Maps the records of a relational table to a Java file, a Java class, and a jar. The generated Java file contains a field for each column of the table; the generated jar and class files are also used by the Metastore feature. (The original post showed the option table as a screenshot, omitted here.) Example:
sqoop codegen --connect jdbc:mysql://localhost:3306/test --table order_info -outdir /home/xiaosi/test/ --username root -password root
The example above generates Java code from the order_info table of the test database; -outdir sets where the Java code is written.

The run output:
16/11/13 21:50:34 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password:
16/11/13 21:50:38 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/11/13 21:50:38 INFO tool.CodeGenTool: Beginning code generation
16/11/13 21:50:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `order_info` AS t LIMIT 1
16/11/13 21:50:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `order_info` AS t LIMIT 1
16/11/13 21:50:38 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/hadoop-2.7.2
Note: /tmp/sqoop-xiaosi/compile/ea41fe40e1f12f6b052ad9fe4a5d9710/order_info.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/11/13 21:50:39 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-xiaosi/compile/ea41fe40e1f12f6b052ad9fe4a5d9710/order_info.jar
We can also use -bindir to set the output path for the compiled class file and for the jar into which the generated files are packaged; a reconstructed command is sketched below.
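The original command survived only as a screenshot; this is a hedged reconstruction, consistent with the paths shown in the output and description that follow (/home/xiaosi/data for the class and jar, /home/xiaosi/test for the Java source):

sqoop codegen --connect jdbc:mysql://localhost:3306/test --table order_info -outdir /home/xiaosi/test/ -bindir /home/xiaosi/data/ --username root -password root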
16/11/13 21:53:55 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password:
16/11/13 21:53:58 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/11/13 21:53:58 INFO tool.CodeGenTool: Beginning code generation
16/11/13 21:53:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `order_info` AS t LIMIT 1
16/11/13 21:53:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `order_info` AS t LIMIT 1
16/11/13 21:53:58 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/hadoop-2.7.2
Note: /home/xiaosi/data/order_info.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/11/13 21:53:59 INFO orm.CompilationManager: Writing jar file: /home/xiaosi/data/order_info.jar
The example above writes the compiled class file (order_info.class) and the packaged jar (order_info.jar) to /home/xiaosi/data, while the Java source (order_info.java) goes to /home/xiaosi/test.
2. create-hive-table
This command already appeared in the previous post [Sqoop导入与导出]; it creates a Hive table whose structure matches that of the relational table. (Option table screenshot omitted.) Example:
sqoop create-hive-table --connect jdbc:mysql://localhost:3306/test --table employee --username root -password root --fields-terminated-by ','
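As a quick hedged check, not part of the original post, the generated table can then be inspected from the Hive CLI:

hive> DESCRIBE employee;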
3. eval
The eval command lets Sqoop run SQL statements directly against the relational database. Before running an import, you can use it to confirm that the SQL involved is correct, with the results printed to the console.

3.1 Evaluating a SELECT query

With the eval tool we can evaluate any kind of SQL query. Taking the order_info table of the test database as the example:

sqoop eval --connect jdbc:mysql://localhost:3306/test --username root --query "select * from order_info limit 3" -P

The run output:
16/11/13 22:25:19 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password:
16/11/13 22:25:22 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
------------------------------------------------------------
| id              | order_time | business |
------------------------------------------------------------
| 358574046793404 | 2016-04-05 | flight   |
| 358574046794733 | 2016-08-03 | hotel    |
| 358574050631177 | 2016-05-08 | vacation |
------------------------------------------------------------
3.2 Evaluating an INSERT

The eval tool is not limited to queries; it also handles statements that modify data, which means we can run INSERT statements with it. The following command inserts a new row into the order_info table of the test database:

sqoop eval --connect jdbc:mysql://localhost:3306/test --username root --query "insert into order_info (id, order_time, business) values('358574050631166', '2016-11-13', 'hotel')" -P

The run output:
16/11/13 22:29:42 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password:
16/11/13 22:29:44 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/11/13 22:29:44 INFO tool.EvalSqlTool: 1 row(s) updated.
If the command succeeds, the count of updated rows is printed to the console. Alternatively, we can query the row we just inserted from mysql:
mysql> select * from order_info where id = "358574050631166";
+-----------------+------------+----------+
| id              | order_time | business |
+-----------------+------------+----------+
| 358574050631166 | 2016-11-13 | hotel    |
+-----------------+------------+----------+
1 row in set (0.00 sec)
4. export
Exports data from HDFS into a relational database. (Option table screenshot omitted.) Example:

A sample of employee data sitting in an HDFS file:
hadoop fs -text /user/xiaosi/employee/* | less
yoona,qunar,创新事业部
xiaosi,qunar,创新事业部
jim,ali,淘宝
kom,ali,淘宝
lucy,baidu,搜索
jim,ali,淘宝
Before exporting HDFS data into a relational database, a table to receive the data must be created there first:
CREATE TABLE `employee` (
  `name` varchar(255) DEFAULT NULL,
  `company` varchar(255) DEFAULT NULL,
  `depart` varchar(255) DEFAULT NULL
);
Now run the export:
sqoop export --connect jdbc:mysql://localhost:3306/test --table employee --export-dir /user/xiaosi/employee --username root -m 1 --fields-terminated-by ',' -P
The run output:
16/11/13 23:40:49 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/11/13 23:40:49 INFO mapreduce.Job: Running job: job_local611430785_0001
16/11/13 23:40:49 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/11/13 23:40:49 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.sqoop.mapreduce.NullOutputCommitter
16/11/13 23:40:49 INFO mapred.LocalJobRunner: Waiting for map tasks
16/11/13 23:40:49 INFO mapred.LocalJobRunner: Starting task: attempt_local611430785_0001_m_000000_0
16/11/13 23:40:49 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/11/13 23:40:49 INFO mapred.MapTask: Processing split: Paths:/user/xiaosi/employee/part-m-00000:0+120
16/11/13 23:40:49 INFO Configuration.deprecation: map.input.file is deprecated. Instead, use mapreduce.map.input.file
16/11/13 23:40:49 INFO Configuration.deprecation: map.input.start is deprecated. Instead, use mapreduce.map.input.start
16/11/13 23:40:49 INFO Configuration.deprecation: map.input.length is deprecated. Instead, use mapreduce.map.input.length
16/11/13 23:40:49 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
16/11/13 23:40:49 INFO mapred.LocalJobRunner:
16/11/13 23:40:49 INFO mapred.Task: Task:attempt_local611430785_0001_m_000000_0 is done. And is in the process of committing
16/11/13 23:40:49 INFO mapred.LocalJobRunner: map
16/11/13 23:40:49 INFO mapred.Task: Task 'attempt_local611430785_0001_m_000000_0' done.
16/11/13 23:40:49 INFO mapred.LocalJobRunner: Finishing task: attempt_local611430785_0001_m_000000_0
16/11/13 23:40:49 INFO mapred.LocalJobRunner: map task executor complete.
16/11/13 23:40:50 INFO mapreduce.Job: Job job_local611430785_0001 running in uber mode : false
16/11/13 23:40:50 INFO mapreduce.Job: map 100% reduce 0%
16/11/13 23:40:50 INFO mapreduce.Job: Job job_local611430785_0001 completed successfully
16/11/13 23:40:50 INFO mapreduce.Job: Counters: 20
	File System Counters
		FILE: Number of bytes read=22247825
		FILE: Number of bytes written=22732498
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=126
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=12
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=0
	Map-Reduce Framework
		Map input records=6
		Map output records=6
		Input split bytes=136
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=0
		Total committed heap usage (bytes)=245366784
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=0
16/11/13 23:40:50 INFO mapreduce.ExportJobBase: Transferred 126 bytes in 2.3492 seconds (53.6344 bytes/sec)
16/11/13 23:40:50 INFO mapreduce.ExportJobBase: Exported 6 records.
Once the export finishes, we can query the employee table in mysql:
mysql> select name, company from employee;
+--------+---------+
| name   | company |
+--------+---------+
| yoona  | qunar   |
| xiaosi | qunar   |
| jim    | ali     |
| kom    | ali     |
| lucy   | baidu   |
| jim    | ali     |
+--------+---------+
6 rows in set (0.00 sec)
5. import
Imports table data into HDFS or Hive. (Option table screenshot omitted.) Example:
sqoop import --connect jdbc:mysql://localhost:3306/test --target-dir /user/xiaosi/data/order_info --query 'select * from order_info where $CONDITIONS' -m 1 --username root -P
The command above imports the result of a query into HDFS, at the path given by the --target-dir parameter. Note that the --query option cannot be combined with --table, and the variable $CONDITIONS must appear in the WHERE clause so the Sqoop job can substitute its per-split condition while it runs.

The run output:
16/11/14 12:08:50 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/11/14 12:08:50 INFO mapreduce.Job: Running job: job_local127577466_0001
16/11/14 12:08:50 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/11/14 12:08:50 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/11/14 12:08:50 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
16/11/14 12:08:50 INFO mapred.LocalJobRunner: Waiting for map tasks
16/11/14 12:08:50 INFO mapred.LocalJobRunner: Starting task: attempt_local127577466_0001_m_000000_0
16/11/14 12:08:50 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/11/14 12:08:50 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/11/14 12:08:50 INFO db.DBInputFormat: Using read commited transaction isolation
16/11/14 12:08:50 INFO mapred.MapTask: Processing split: 1=1 AND 1=1
16/11/14 12:08:50 INFO db.DBRecordReader: Working on split: 1=1 AND 1=1
16/11/14 12:08:50 INFO db.DBRecordReader: Executing query: select * from order_info where ( 1=1 ) AND ( 1=1 )
16/11/14 12:08:50 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
16/11/14 12:08:50 INFO mapred.LocalJobRunner:
16/11/14 12:08:51 INFO mapred.Task: Task:attempt_local127577466_0001_m_000000_0 is done. And is in the process of committing
16/11/14 12:08:51 INFO mapred.LocalJobRunner:
16/11/14 12:08:51 INFO mapred.Task: Task attempt_local127577466_0001_m_000000_0 is allowed to commit now
16/11/14 12:08:51 INFO output.FileOutputCommitter: Saved output of task 'attempt_local127577466_0001_m_000000_0' to hdfs://localhost:9000/user/xiaosi/data/order_info/_temporary/0/task_local127577466_0001_m_000000
16/11/14 12:08:51 INFO mapred.LocalJobRunner: map
16/11/14 12:08:51 INFO mapred.Task: Task 'attempt_local127577466_0001_m_000000_0' done.
16/11/14 12:08:51 INFO mapred.LocalJobRunner: Finishing task: attempt_local127577466_0001_m_000000_0
16/11/14 12:08:51 INFO mapred.LocalJobRunner: map task executor complete.
16/11/14 12:08:51 INFO mapreduce.Job: Job job_local127577466_0001 running in uber mode : false
16/11/14 12:08:51 INFO mapreduce.Job: map 100% reduce 0%
16/11/14 12:08:51 INFO mapreduce.Job: Job job_local127577466_0001 completed successfully
16/11/14 12:08:51 INFO mapreduce.Job: Counters: 20
	File System Counters
		FILE: Number of bytes read=22247784
		FILE: Number of bytes written=22732836
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=0
		HDFS: Number of bytes written=3710
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
	Map-Reduce Framework
		Map input records=111
		Map output records=111
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=0
		Total committed heap usage (bytes)=245366784
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=3710
16/11/14 12:08:51 INFO mapreduce.ImportJobBase: Transferred 3.623 KB in 2.5726 seconds (1.4083 KB/sec)
16/11/14 12:08:51 INFO mapreduce.ImportJobBase: Retrieved 111 records.
We can inspect the imported data at the HDFS path given by the --target-dir parameter:
hadoop fs -text /user/xiaosi/data/order_info/* | less
358574046793404,2016-04-05,flight
358574046794733,2016-08-03,hotel
358574050631177,2016-05-08,vacation
358574050634213,2015-04-28,train
358574050634692,2016-04-05,tuan
358574050650524,2015-07-26,hotel
358574050654773,2015-01-23,flight
358574050668658,2015-01-23,hotel
358574050730771,2016-11-06,train
358574050731241,2016-05-08,car
358574050743865,2015-01-23,vacation
358574050767666,2015-04-28,train
358574050767971,2015-07-26,flight
358574050808288,2016-05-08,hotel
358574050816828,2015-01-23,hotel
358574050818220,2015-04-28,car
358574050821877,2013-08-03,flight
Another example:
sqoop import --connect jdbc:mysql://localhost:3306/test --table order_info --columns "business,id,order_time" -m 1 --username root -P
A new directory order_info appears under /user/xiaosi/ on HDFS, named after the relational table; its contents:
flight,358574046793404,2016-04-05
hotel,358574046794733,2016-08-03
vacation,358574050631177,2016-05-08
train,358574050634213,2015-04-28
tuan,358574050634692,2016-04-05
6. import-all-tables
Imports every table of a database into HDFS, with each table going to its own directory. (Option table screenshot omitted; a hedged example follows.)
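The original post gave no command for this tool, so this is a minimal hedged sketch; the --warehouse-dir path, under which each table becomes a subdirectory, is an assumption:

sqoop import-all-tables --connect jdbc:mysql://localhost:3306/test --warehouse-dir /user/xiaosi/warehouse -m 1 --username root -P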
7. list-databases

Lists all database names on a relational database server. The command:

sqoop list-databases --connect jdbc:mysql://localhost:3306 --username root -P

The run output:
16/11/14 14:30:11 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password:
16/11/14 14:30:14 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
hive_db
mysql
performance_schema
phpmyadmin
test
8. list-tables
Lists all table names in a particular database of a relational database server. The command:

sqoop list-tables --connect jdbc:mysql://localhost:3306/test --username root -P

The run output:
16/11/14 14:32:08 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password:
16/11/14 14:32:10 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
PageView
book
bookID
cc
city_click
country
country2
cup
employee
flightOrder
hotel_book_info
hotel_info
order_info
stu
stu2
stu3
stuInfo
student
9. merge
Merges two data sets on HDFS, de-duplicating as it merges. (Option table screenshot omitted.) For example, suppose there is one imported data set under the HDFS path /user/xiaosi/old:
id name
1  a
2  b
3  c
and there is another data set under the HDFS path /user/xiaosi/new, imported after the first:
id name
1  a2
2  b
3  c
Then the merged result is:
id name
1  a2
2  b
3  c
Run the following command:
sqoop merge -new-data /user/xiaosi/new/part-m-00000 -onto /user/xiaosi/old/part-m-00000 -target-dir /user/xiaosi/final -jar-file /home/xiaosi/test/testmerge.jar -class-name testmerge -merge-key id
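The -jar-file and -class-name arguments point at a record class describing the data set; such a class is typically produced beforehand with codegen. A hedged sketch of generating it, where the table name testmerge and the paths are assumptions carried over from the merge command above:

sqoop codegen --connect jdbc:mysql://localhost:3306/test --table testmerge -outdir /home/xiaosi/test/ -bindir /home/xiaosi/test/ --username root -P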
Note:

Within a single data set, no two rows should share the same primary key, or data will be lost.
10. metastore
Records the metadata of Sqoop jobs. If no Metastore instance is started, the default metadata directory is ~/.sqoop; the storage directory can be changed in the configuration file sqoop-site.xml (a sketch follows the startup log below). Start a Metastore instance:
sqoop metastore
The run output:
16/11/14 14:44:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/11/14 14:44:40 WARN hsqldb.HsqldbMetaStore: The location for metastore data has not been explicitly set. Placing shared metastore files in /home/xiaosi/.sqoop/shared-metastore.db
[Server@52308be6]: [Thread[main,5,main]]: checkRunning(false) entered
[Server@52308be6]: [Thread[main,5,main]]: checkRunning(false) exited
[Server@52308be6]: [Thread[main,5,main]]: setDatabasePath(0,file:/home/xiaosi/.sqoop/shared-metastore.db)
[Server@52308be6]: [Thread[main,5,main]]: checkRunning(false) entered
[Server@52308be6]: [Thread[main,5,main]]: checkRunning(false) exited
[Server@52308be6]: [Thread[main,5,main]]: setDatabaseName(0,sqoop)
[Server@52308be6]: [Thread[main,5,main]]: putPropertiesFromString(): [hsqldb.write_delay=false]
[Server@52308be6]: [Thread[main,5,main]]: checkRunning(false) entered
[Server@52308be6]: [Thread[main,5,main]]: checkRunning(false) exited
[Server@52308be6]: Initiating startup sequence...
[Server@52308be6]: Server socket opened successfully in 3 ms.
[Server@52308be6]: Database [index=0, id=0, db=file:/home/xiaosi/.sqoop/shared-metastore.db, alias=sqoop] opened sucessfully in 153 ms.
[Server@52308be6]: Startup sequence completed in 157 ms.
[Server@52308be6]: 2016-11-14 14:44:40.414 HSQLDB server 1.8.0 is online
[Server@52308be6]: To close normally, connect and execute SHUTDOWN SQL
[Server@52308be6]: From command line, use [Ctrl]+[C] to abort abruptly
16/11/14 14:44:40 INFO hsqldb.HsqldbMetaStore: Server started on port 16000 with protocol HSQL
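A hedged sketch of overriding the metastore location in conf/sqoop-site.xml; the path is an example, while sqoop.metastore.server.location and sqoop.metastore.server.port are the standard property names from the bundled template:

<property>
  <name>sqoop.metastore.server.location</name>
  <value>/home/xiaosi/metastore/shared.db</value>
</property>
<property>
  <name>sqoop.metastore.server.port</name>
  <value>16000</value>
</property>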
11. job
Creates a Sqoop job, but does not run it right away; it must be executed explicitly. The point of this command is to make Sqoop commands as reusable as possible. (Option table screenshot omitted.) Example:
sqoop job -create listTablesJob -- list-tables --connect jdbc:mysql://localhost:3306/test --username root -P
The line above defines a job that lists all the tables of the test database.
sqoop job -exec listTablesJob
The line above runs the job we just defined; the output:
16/11/14 19:51:44 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password:
16/11/14 19:51:47 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
PageView
book
bookID
cc
city_click
country
country2
cup
employee
flightOrder
hotel_book_info
hotel_info
order_info
stu
stu2
stu3
stuInfo
student
Note:

There must be a space between -- and list-tables (the Sqoop command the job is to run); the two cannot be adjacent.
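A few more job subcommands are handy for managing saved jobs; this is standard Sqoop 1.4.x usage rather than something shown in the original post:

sqoop job --list                  # list the saved jobs
sqoop job --show listTablesJob    # show a job's definition
sqoop job --delete listTablesJob  # delete a saved job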
Sqoop Import and Export
# Pull data from MySQL into HDFS. Problems: too many output files, every one of them tiny; also fails if the target directory already exists
sqoop import --connect jdbc:mysql://192.168.56.151:3306/test --username root --password 123456 --table test --target-dir /user/sqoop/mysql/input -m 1

# Append data to an HDFS directory that already exists
sqoop import --connect jdbc:mysql://192.168.56.151:3306/test --username root --password 123456 --table test --append --target-dir /user/test/sqoop

# name is a string column; replace NULL with 'nothing' on import
sqoop import --connect jdbc:mysql://192.168.56.151:3306/test --username root --password 123456 --table test --null-string 'nothing' --append --target-dir /user/test/sqoop

# age is an int column; replace NULL with -1 on import
sqoop import --connect jdbc:mysql://192.168.56.151:3306/test --username root --password 123456 --table test --null-string 'nothing' --null-non-string -1 --append --target-dir /user/test/sqoop

# Import only the id and name columns
sqoop import --connect jdbc:mysql://192.168.56.151:3306/test --username root --password 123456 --table test --columns id,name --null-string 'nothing' --append --target-dir /user/test/sqoop

# Separate fields with |
sqoop import --connect jdbc:mysql://192.168.56.151:3306/test --username root --password 123456 --table test --columns id,name --fields-terminated-by '|' --null-string 'nothing' --append --target-dir /user/test/sqoop

# Import only id,name for rows whose name is not null
sqoop import --connect jdbc:mysql://192.168.56.151:3306/test --username root --password 123456 --table test --columns id,name --where "name is not null" --fields-terminated-by '|' --null-string 'nothing' --append --target-dir /user/test/sqoop

# Use --query in place of --table/--columns/--where
sqoop import --connect jdbc:mysql://192.168.56.151:3306/test --username root --password 123456 --query "select id,name from st where id > 10 and \$CONDITIONS" --split-by id --fields-terminated-by '|' --null-string 'nothing' --append --target-dir /user/test/sqoop

# Put everything into a single file (the data set is small anyway)
sqoop import --connect jdbc:mysql://192.168.56.151:3306/test --username root --password 123456 --query "select id,name from st where id > 10 and \$CONDITIONS" --split-by id --fields-terminated-by '|' --null-string 'nothing' --append --target-dir /user/test/sqoop -m 1

# List the databases in MySQL
sqoop list-databases --connect jdbc:mysql://192.168.56.151:3306/ --username root --password 123456

# List the tables in MySQL's mysql database
sqoop list-tables --connect jdbc:mysql://192.168.56.151:3306/mysql --username root --password 123456

# List the databases on an Oracle server
sqoop list-databases --connect jdbc:oracle:thin:@10.10.244.136:1521:wilson --username system --password 123456

# Import Oracle's SYSTEM.OST table into HDFS
sqoop import --connect jdbc:oracle:thin:@10.10.244.136:1521:wilson --username system --password 123456 --table SYSTEM.OST --delete-target-dir --target-dir /user/test/sqoop

# Import into a single file only
sqoop import --connect jdbc:oracle:thin:@10.10.244.136:1521:wilson --username system --password 123456 --table SYSTEM.OST --delete-target-dir --target-dir /user/test/sqoop -m 1

# HDFS to MySQL
sqoop export --connect jdbc:mysql://192.168.56.151:3306/test --username root --password 123456 --table test1 --export-dir /user/sqoop/mysql/output --fields-terminated-by ','

# HDFS to Oracle
sqoop export --connect jdbc:oracle:thin:@192.168.56.150:1521/orcl --username yue --password yue --table TEST2 --export-dir /user/sqoop/mysql/output --fields-terminated-by ','
sqoop on hive
# MySQL to Hive
sqoop import --connect jdbc:mysql://192.168.56.151:3306/test --username root --password 123456 --table test --target-dir /user/hive/warehouse/ip140.db/test_sqoop/dt=2016-04-15 --fields-terminated-by '\t'

# Oracle to Hive
sqoop import --connect jdbc:oracle:thin:@192.168.56.150:1521/orcl --username yue --password yue --table TEST1 --target-dir /user/hive/warehouse/ip140.db/test_sqoop/dt=2016-04-14 --fields-terminated-by '\t'

# Create the table in Hive
create table test_sqoop(id int,name string) partitioned by (dt string) row format delimited fields terminated by '\t' stored as textfile;

# Repair the table partitions
MSCK REPAIR TABLE test_sqoop;

# Exporting works the same as for plain HDFS data
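The commands above write straight into the Hive warehouse directory and then repair the partitions by hand; Sqoop can also register the data in Hive directly. A hedged sketch of that built-in route, reusing the database and partition from the example above:

sqoop import --connect jdbc:mysql://192.168.56.151:3306/test --username root --password 123456 --table test --hive-import --hive-table ip140.test_sqoop --hive-partition-key dt --hive-partition-value '2016-04-15' --fields-terminated-by '\t'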
sqoop on hbase
# Pull data from MySQL into HBase
sqoop import --connect jdbc:mysql://192.168.56.151:3306/test --username root --password 123456 --table mt --hbase-create-table --hbase-table mt --column-family cf --hbase-row-key year,month,day,sta_id

# Import data from Oracle into HBase
sqoop import --connect jdbc:oracle:thin:@192.168.56.150:1521/orcl --username yue --password yue --table TEST1 --hbase-create-table --hbase-table user:testoracle --column-family cf --hbase-row-key ID

# Import from Oracle into HBase using several columns as the row key
sqoop import --connect jdbc:oracle:thin:@192.168.56.151:1521/orcl --username yue --password yue --table SYSTEM.OMT --hbase-create-table --hbase-table omt --column-family cf --hbase-row-key YEAR,MONTH,DAY,STA_ID -m 1

# Pulling data from HBase back into MySQL (not currently supported)