Importing and Exporting Data Between MySQL and Hive with Sqoop
Contents:
1. Importing MySQL Table Data into Hive
   1.1 Create the MySQL table
   1.2 Create the Hive table with Sqoop
   1.3 Import the data into Hive with Sqoop
2. Exporting Hive Table Data to MySQL
   2.1 Create the MySQL table
   2.2 Export the data to MySQL with Sqoop
3. Sqoop's eval Operation
4. Sqoop's job Operations
   4.1 Create a job
   4.2 List jobs
   4.3 Show job details
   4.4 Execute a job
   4.5 Delete a job
5. Sqoop's codegen Operation
6. Incremental Import and Merge of MySQL Data with Sqoop
   6.1 Create the MySQL table
   6.2 Import the existing MySQL data into HDFS
   6.3 Insert incremental data into MySQL
   6.4 Import the incremental MySQL data into HDFS
   6.5 Generate a jar with codegen
   6.6 Merge the imports
   6.7 Load the merged data into a Hive table
This article covers:
1. Importing and exporting data between MySQL and Hive with Sqoop
2. Sqoop's eval operation
3. Sqoop's job operations
4. Generating Java code with Sqoop's codegen
5. Importing and merging incremental MySQL data with Sqoop, then loading the merged data into Hive
For installing Sqoop 1.4.6 and its command-line usage, see "Installing and Using the Sqoop 1.4.6 Command Line".
For installing Hive, see "Installing Hive 2.1.1".
Note: the Hadoop cluster must be started first.
1. Importing MySQL Table Data into Hive
1.1 Create the MySQL table
Enter the MySQL command line and, in the mydb database, create table 'a' and insert a couple of rows:
mysql> use mydb;
Database changed
mysql> show tables;
+----------------+
| Tables_in_mydb |
+----------------+
| user           |
| user2          |
+----------------+
2 rows in set (0.00 sec)
mysql> create table a(id int,name varchar(50));
Query OK, 0 rows affected (0.14 sec)
mysql> insert into a values(1,'a1');
Query OK, 1 row affected (0.13 sec)
mysql> insert into a values(2,'a2');
Query OK, 1 row affected (0.03 sec)
mysql> select * from a;
+------+------+
| id   | name |
+------+------+
|    1 | a1   |
|    2 | a2   |
+------+------+
2 rows in set (0.00 sec)
1.2 Create the Hive table with Sqoop
sqoop create-hive-table --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a --hive-table a --fields-terminated-by ',' --hive-overwrite
The command completes successfully. In the Hive CLI you can see that table a has been created in the default database, with no data yet.
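The examples throughout this article pass --password on the command line, which Sqoop itself warns is insecure. A minimal alternative sketch, assuming a password file has been created at the hypothetical HDFS path /user/root/mysql.pwd with restrictive permissions:
# Prompt interactively for the password instead of putting it in the command:
sqoop create-hive-table --connect jdbc:mysql://192.168.1.97:3306/mydb --username root -P --table a --hive-table a --fields-terminated-by ',' --hive-overwrite
# Or read it from a protected file via --password-file:
sqoop create-hive-table --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password-file /user/root/mysql.pwd --table a --hive-table a --fields-terminated-by ',' --hive-overwrite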
1.3 Import the data into Hive with Sqoop
sqoop import --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a --hive-import --fields-terminated-by ',' --hive-overwrite -m 1
The import succeeds. Check the result on HDFS:
hadoop fs -cat /user/hive/warehouse/a/part-m-00000
Then query the imported data in the Hive shell.
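With -m 1 the whole table is read by a single mapper. For larger tables, a hedged sketch of a parallel import, assuming the id column is a reasonable split key:
# Use 4 mappers, splitting the source table on id:
sqoop import --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a --hive-import --fields-terminated-by ',' --hive-overwrite -m 4 --split-by id
# Quick verification from the shell without entering the Hive CLI:
hive -e 'select * from a'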
2. Exporting Hive Table Data to MySQL
2.1 Create the MySQL table
In the mydb database, create table 'b':
mysql> create table b(id int,name varchar(50));
Query OK, 0 rows affected (0.07 sec)
mysql> select * from b;
Empty set (0.00 sec)
2.2 Export the data to MySQL with Sqoop
Export the data under the Hive warehouse directory for table a to MySQL table b:
sqoop export --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table b --export-dir /user/hive/warehouse/a --input-fields-terminated-by ','
Check the result in the MySQL database:
mysql> select * from b;
+------+------+
| id   | name |
+------+------+
|    2 | a2   |
|    1 | a1   |
+------+------+
2 rows in set (0.00 sec)
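Note that sqoop export only issues INSERTs by default, so re-running this export would duplicate every row in b. A hedged sketch of an upsert-style export, assuming id is (or is made) a unique key in b:
# Rows whose id already exists in b are UPDATEd; new ids are INSERTed:
sqoop export --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table b --export-dir /user/hive/warehouse/a --input-fields-terminated-by ',' --update-key id --update-mode allowinsert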
3. Sqoop's eval Operation
sqoop eval connects to MySQL and runs SELECT and DML statements directly:
sqoop eval --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --query 'select * from a'
--query can also be replaced with -e:
sqoop eval --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root -e "insert into a values (3,'a3')"
sqoop eval --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --query "insert into a values (4,'a4')"
[root@cyyun ~]# sqoop eval --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --e 'select * from a'
16/12/23 20:35:50 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/12/23 20:35:50 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/12/23 20:35:50 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
--------------------------------------
| id          | name                 |
--------------------------------------
| 1           | a1                   |
| 2           | a2                   |
| 3           | a3                   |
| 4           | a4                   |
--------------------------------------
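Because eval executes ad hoc SQL against the source database, it is handy for sanity checks around imports and exports. A small hedged example, counting the rows in b after the export above:
sqoop eval --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root -e 'select count(*) from b'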
4. Sqoop's job Operations
4.1 Create a job
Create a job named myjob that imports table a from the MySQL database into the HDFS directory /test/a_old:
sqoop job --create myjob -- import --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a -m 1 --target-dir /test/a_old
4.2 List jobs
sqoop job --list
4.3 Show job details
[root@cyyun ~]# sqoop job --show myjob
16/12/23 20:41:50 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password:
Job: myjob
Tool: import
Options:
----------------------------
verbose = false
db.connect.string = jdbc:mysql://192.168.1.97:3306/mydb
codegen.output.delimiters.escape = 0
codegen.output.delimiters.enclose.required = false
codegen.input.delimiters.field = 0
hbase.create.table = false
db.require.password = true
hdfs.append.dir = false
db.table = a
codegen.input.delimiters.escape = 0
import.fetch.size = null
accumulo.create.table = false
codegen.input.delimiters.enclose.required = false
db.username = root
reset.onemapper = false
codegen.output.delimiters.record = 10
import.max.inline.lob.size = 16777216
hbase.bulk.load.enabled = false
hcatalog.create.table = false
db.clear.staging.table = false
codegen.input.delimiters.record = 0
enable.compression = false
hive.overwrite.table = false
hive.import = false
codegen.input.delimiters.enclose = 0
accumulo.batch.size = 10240000
hive.drop.delims = false
codegen.output.delimiters.enclose = 0
hdfs.delete-target.dir = false
codegen.output.dir = .
codegen.auto.compile.dir = true
relaxed.isolation = false
mapreduce.num.mappers = 1
accumulo.max.latency = 5000
import.direct.split.size = 0
codegen.output.delimiters.field = 44
export.new.update = UpdateOnly
incremental.mode = None
hdfs.file.format = TextFile
codegen.compile.dir = /tmp/sqoop-root/compile/1df5c52bab08ce1b6e9ae7c193b88264
direct.import = false
hdfs.target.dir = /test/a_old
hive.fail.table.exists = false
db.batch = false
4.4 Execute a job
sqoop job --exec myjob
or
sqoop job --exec myjob -- --username root
After the job runs successfully, check the result on HDFS:
hadoop fs -ls /test/a_old
hadoop fs -cat /test/a_old/part-m-00000
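As the "Enter password:" prompt above shows, a saved job does not store the database password by default, so every --show or --exec asks for it again. A hedged sketch for unattended execution, assuming you accept storing the password unencrypted in the Sqoop metastore:
# Add to $SQOOP_HOME/conf/sqoop-site.xml before creating the job:
#   <property>
#     <name>sqoop.metastore.client.record.password</name>
#     <value>true</value>
#   </property>
# Jobs created afterwards keep their password and run without prompting:
sqoop job --exec myjob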
4.5 Delete a job
sqoop job --delete myjob
5. Sqoop's codegen Operation
sqoop codegen --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a
[root@cyyun ~]# ll /tmp/sqoop-root/compile/13bae8fcd124763b61658f7c4ed8937e/
total 28
-rw-r--r--. 1 root root  8715 Dec 23 21:40 a.class
-rw-r--r--. 1 root root  3616 Dec 23 21:40 a.jar
-rw-r--r--. 1 root root 10346 Dec 23 21:40 a.java
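By default the generated sources land in a randomly named directory under /tmp/sqoop-root/compile, which makes them awkward to find again. A hedged sketch that pins the output locations and class name, using the standard codegen arguments (the directory names codegen-src and codegen-bin are just placeholders):
# Write a.java to ./codegen-src and the compiled a.class/a.jar to ./codegen-bin:
sqoop codegen --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a --class-name a --outdir codegen-src --bindir codegen-bin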
6. Incremental Import and Merge of MySQL Data with Sqoop
The Sqoop User Guide describes incremental imports as follows (Table 5. Incremental import arguments):
Argument | Description
---|---
--check-column (col) | Specifies the column to be examined when determining which rows to import. (the column should not be of type CHAR/NCHAR/VARCHAR/VARNCHAR/LONGVARCHAR/LONGNVARCHAR)
--incremental (mode) | Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified.
--last-value (value) | Specifies the maximum value of the check column from the previous import.
Sqoop supports two types of incremental imports: append and lastmodified. You can use the --incremental argument to specify the type of incremental import to perform.

You should specify append mode when importing a table where new rows are continually being added with increasing row id values. You specify the column containing the row's id with --check-column. Sqoop imports rows where the check column has a value greater than the one specified with --last-value.

An alternate table update strategy supported by Sqoop is called lastmodified mode. You should use this when rows of the source table may be updated, and each such update will set the value of a last-modified column to the current timestamp. Rows where the check column holds a timestamp more recent than the timestamp specified with --last-value are imported.

At the end of an incremental import, the value which should be specified as --last-value for a subsequent import is printed to the screen. When running a subsequent import, you should specify --last-value in this way to ensure you import only the new or updated data. This is handled automatically by creating an incremental import as a saved job, which is the preferred mechanism for performing a recurring incremental import. See the section on saved jobs later in this document for more information.
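The walkthrough below uses lastmodified mode. For completeness, a hedged sketch of the append-mode counterpart, assuming new rows keep arriving with increasing id values (the target directory /test/a_append is a placeholder):
# Import only rows with id greater than 3, the high-water mark from the previous run:
sqoop import --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a -m 1 --target-dir /test/a_append --incremental append --check-column id --last-value 3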
6.1 Create the MySQL table
Create the table and load some data:
mysql> drop table a;
Query OK, 0 rows affected (0.81 sec)
mysql> create table a(id int,name varchar(50),crt_date timestamp);
Query OK, 0 rows affected (0.24 sec)
mysql> insert into a values(1,'a1',sysdate());
Query OK, 1 row affected (0.20 sec)
mysql> insert into a values(2,'a2',sysdate());
Query OK, 1 row affected (0.06 sec)
mysql> insert into a values(3,'a3',sysdate());
Query OK, 1 row affected (0.07 sec)
mysql> select * from a;
+------+------+---------------------+
| id   | name | crt_date            |
+------+------+---------------------+
|    1 | a1   | 2016-12-26 17:03:50 |
|    2 | a2   | 2016-12-26 17:04:00 |
|    3 | a3   | 2016-12-26 17:04:19 |
+------+------+---------------------+
3 rows in set (0.02 sec)
6.2 Import the existing MySQL data into HDFS
sqoop import --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a -m 1 --target-dir /test/a_1
[root@cyyun ~]# hadoop fs -cat /test/a_1/*
16/12/24 01:20:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1,a1,2016-12-26 17:03:50.0
2,a2,2016-12-26 17:04:00.0
3,a3,2016-12-26 17:04:19.0
6.3 Insert incremental data into MySQL
mysql> insert into a values(4,'a4',sysdate());
Query OK, 1 row affected (0.04 sec)
mysql> insert into a values(5,'a5',sysdate());
Query OK, 1 row affected (0.09 sec)
mysql> select * from a;
+------+------+---------------------+
| id   | name | crt_date            |
+------+------+---------------------+
|    1 | a1   | 2016-12-26 17:03:50 |
|    2 | a2   | 2016-12-26 17:04:00 |
|    3 | a3   | 2016-12-26 17:04:19 |
|    4 | a4   | 2016-12-26 17:09:23 |
|    5 | a5   | 2016-12-26 17:09:36 |
+------+------+---------------------+
5 rows in set (0.00 sec)
6.4 Import the incremental MySQL data into HDFS
sqoop import --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a -m 1 --target-dir /test/a_2 --incremental lastmodified --check-column crt_date --last-value "2016-12-26 17:04:20"
This behaves like a WHERE clause of crt_date >= "2016-12-26 17:04:20". The --last-value timestamp must be slightly later than the crt_date of row id=3 (17:04:19); otherwise row 3 would be imported again.
[root@cyyun ~]# hadoop fs -cat /test/a_2/*
16/12/24 01:27:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
4,a4,2016-12-26 17:09:23.0
5,a5,2016-12-26 17:09:36.0
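As quoted from the user guide above, each incremental run prints the --last-value to pass to the next run. A hedged sketch of that next run, where "2016-12-26 17:09:37" is a hypothetical stand-in for the value Sqoop actually printed:
# Pick up only rows modified since the previous incremental import:
sqoop import --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a -m 1 --target-dir /test/a_3 --incremental lastmodified --check-column crt_date --last-value "2016-12-26 17:09:37"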
6.5 Generate a jar with codegen
[root@cyyun ~]# sqoop codegen --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a
16/12/24 01:30:11 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/12/24 01:30:11 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/12/24 01:30:11 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/12/24 01:30:11 INFO tool.CodeGenTool: Beginning code generation
16/12/24 01:30:11 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `a` AS t LIMIT 1
16/12/24 01:30:12 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `a` AS t LIMIT 1
16/12/24 01:30:12 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop-2.6.4
Note: /tmp/sqoop-root/compile/57c71c4b6aa32458bb7a2d2636ba673f/a.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/12/24 01:30:15 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/57c71c4b6aa32458bb7a2d2636ba673f/a.jar
Compilation produced /tmp/sqoop-root/compile/57c71c4b6aa32458bb7a2d2636ba673f/a.jar.
6.6 Merge the imports
sqoop merge --new-data /test/a_2 --onto /test/a_1 --target-dir /test/a_merge --jar-file /tmp/sqoop-root/compile/57c71c4b6aa32458bb7a2d2636ba673f/a.jar --class-name a --merge-key id
Note: --class-name a (here, a is the table name).
[root@cyyun ~]# hadoop fs -cat /test/a_merge/*
16/12/24 01:36:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1,a1,2016-12-26 17:03:50.0
2,a2,2016-12-26 17:04:00.0
3,a3,2016-12-26 17:04:19.0
4,a4,2016-12-26 17:09:23.0
5,a5,2016-12-26 17:09:36.0
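Recent 1.4.x releases can also fold this merge step into the incremental import itself. A hedged sketch, assuming your Sqoop build accepts --merge-key on sqoop import:
# With --merge-key, a lastmodified import whose --target-dir already exists
# merges the new rows onto the existing data instead of failing:
sqoop import --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a -m 1 --target-dir /test/a_1 --incremental lastmodified --check-column crt_date --last-value "2016-12-26 17:04:20" --merge-key id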
6.7 Load the merged data into a Hive table
hive> drop table a;
OK
Time taken: 17.45 seconds
hive> create table a(id int,name string,crt_date string) row format delimited fields terminated by ',';
OK
Time taken: 1.586 seconds
hive> load data inpath '/test/a_merge/part*' into table a;
Loading data to table default.a
OK
Time taken: 2.073 seconds
hive> select * from a;
OK
1  a1  2016-12-26 17:03:50.0
2  a2  2016-12-26 17:04:00.0
3  a3  2016-12-26 17:04:19.0
4  a4  2016-12-26 17:09:23.0
5  a5  2016-12-26 17:09:36.0
Time taken: 0.191 seconds, Fetched: 5 row(s)
hive>
Checking HDFS confirms that LOAD DATA INPATH moved the files from /test/a_merge into the Hive warehouse:
[root@cyyun ~]# hadoop fs -ls /test/a_merge
16/12/24 01:44:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   1 root supergroup          0 2016-12-24 01:35 /test/a_merge/_SUCCESS
[root@cyyun ~]# hadoop fs -ls /user/hive/warehouse/a
16/12/24 01:45:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rwxr-xr-x   1 root supergroup        135 2016-12-24 01:35 /user/hive/warehouse/a/part-r-00000
[root@cyyun ~]#
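If this merge-and-load cycle is repeated, each LOAD DATA adds more files to the table. A hedged variant that makes the load idempotent by replacing the table contents on every cycle:
hive> load data inpath '/test/a_merge/part*' overwrite into table a;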
Reference:
http://blog.csdn.net/gdmzlhj1/article/details/50483171