
Importing and Exporting Data Between MySQL and Hive with Sqoop


This article covers:

1. Importing and exporting data between MySQL and Hive with Sqoop
2. Sqoop's eval operation
3. Sqoop's job operation
4. Generating Java code with Sqoop's codegen
5. Importing and merging incremental MySQL data with Sqoop, then loading the merged result into Hive

For installing Sqoop 1.4.6 and its command-line usage, see the earlier post "Sqoop 1.4.6 Installation and Command-Line Usage".

For installing Hive, see "Hive 2.1.1 Installation".

Note: the Hadoop cluster must be running before you start.

1. Importing MySQL Table Data into Hive

1.1 Create the MySQL table

At the mysql command line, create table 'a' in the mydb database and insert some rows:

mysql> use mydb;
Database changed
mysql> show tables;
+----------------+
| Tables_in_mydb |
+----------------+
| user           |
| user2          |
+----------------+
2 rows in set (0.00 sec)

mysql> create table a(id int,name varchar(50));
Query OK, 0 rows affected (0.14 sec)

mysql> insert into a values(1,'a1');
Query OK, 1 row affected (0.13 sec)

mysql> insert into a values(2,'a2');
Query OK, 1 row affected (0.03 sec)

mysql> select * from a;
+------+------+
| id   | name |
+------+------+
|    1 | a1   |
|    2 | a2   |
+------+------+
2 rows in set (0.00 sec)


1.2 Create the Hive table with Sqoop

sqoop create-hive-table --connect jdbc:mysql://192.168.1.97:3306/mydb --username root \
--password root --table a --hive-table a --fields-terminated-by ',' --hive-overwrite


The command runs and reports success (screenshot of the console output not reproduced here).



In the Hive CLI, table a now exists in the default database, still empty.
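The original screenshot is omitted; a quick check from the shell, assuming the hive client is on the PATH, might look like this:

# Verify that the empty table was created in the default database
hive -e 'show tables; describe a;'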



1.3 Import data into Hive with Sqoop

sqoop import --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root \
--table a --hive-import --fields-terminated-by ',' --hive-overwrite -m 1


The import succeeds.
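A note on -m 1: it runs the import as a single map task. Table a has no primary key, so a parallel import would also need --split-by to name a split column; a hedged sketch with two mappers, assuming id values are spread evenly enough to split on:

sqoop import --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root \
--table a --hive-import --fields-terminated-by ',' --hive-overwrite \
-m 2 --split-by id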



Check the imported data on HDFS:

hadoop fs -cat /user/hive/warehouse/a/part-m-00000




View the imported data in the Hive shell (screenshot omitted).
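The equivalent non-interactive check would be:

# Same check without opening an interactive Hive session
hive -e 'select * from a;'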



2. Exporting Hive Table Data to MySQL

2.1 Create the MySQL table

Create table 'b' in the mydb database:

mysql> create table b(id int,name varchar(50));
Query OK, 0 rows affected (0.07 sec)

mysql> select * from b;
Empty set (0.00 sec)


2.2 Export data to MySQL with Sqoop

Export the data files under the Hive warehouse directory for table a (/user/hive/warehouse/a) to MySQL table b:

sqoop export --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root \
--table b --export-dir /user/hive/warehouse/a --input-fields-terminated-by ','


Check the result in the MySQL database:

mysql> select * from b;
+------+------+
| id   | name |
+------+------+
|    2 | a2   |
|    1 | a1   |
+------+------+
2 rows in set (0.00 sec)
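Note that sqoop export issues plain INSERT statements, so re-running the command above would duplicate the rows in b. If b is given a key column, --update-key (optionally with --update-mode allowinsert for upsert behavior) makes repeated exports safe; a sketch, assuming id is usable as the key:

sqoop export --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root \
--table b --export-dir /user/hive/warehouse/a --input-fields-terminated-by ',' \
--update-key id --update-mode allowinsert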


3. Sqoop's eval Operation

sqoop eval connects to MySQL and runs SELECT or DML statements directly, printing the results to the console:

sqoop eval --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root \
--query 'select * from a'


--query can also be written as -e; both forms work, as these DML examples show:



sqoop eval --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root \
-e "insert into a values (3,'a3')"
sqoop eval --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root \
--query "insert into a values (4,'a4')"




[root@cyyun ~]# sqoop eval --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root -e 'select * from a'
16/12/23 20:35:50 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/12/23 20:35:50 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/12/23 20:35:50 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
--------------------------------------
| id          | name                 |
--------------------------------------
| 1           | a1                   |
| 2           | a2                   |
| 3           | a3                   |
| 4           | a4                   |
--------------------------------------
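The WARN line in the output above is worth acting on: instead of --password on the command line, Sqoop accepts -P to prompt interactively, or --password-file to read the password from a protected file. A sketch, with /root/.mysql.pw as an assumed location:

# Store the password in a file only root can read (path is illustrative)
echo -n 'root' > /root/.mysql.pw
chmod 400 /root/.mysql.pw
sqoop eval --connect jdbc:mysql://192.168.1.97:3306/mydb --username root \
--password-file file:///root/.mysql.pw --query 'select * from a'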


4. Sqoop's job Operation

4.1 Create a job

Create a job named myjob that imports table a from MySQL into the HDFS directory /test/a_old. Note the space between -- and import: it separates sqoop job's own options from the tool invocation being saved, and is required.

sqoop job --create myjob -- import --connect jdbc:mysql://192.168.1.97:3306/mydb --username root \
--password root --table a -m 1 --target-dir /test/a_old




4.2 List jobs

sqoop job --list




4.3 Show job details

[root@cyyun ~]# sqoop job --show myjob
16/12/23 20:41:50 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password:
Job: myjob
Tool: import
Options:
----------------------------
verbose = false
db.connect.string = jdbc:mysql://192.168.1.97:3306/mydb
codegen.output.delimiters.escape = 0
codegen.output.delimiters.enclose.required = false
codegen.input.delimiters.field = 0
hbase.create.table = false
db.require.password = true
hdfs.append.dir = false
db.table = a
codegen.input.delimiters.escape = 0
import.fetch.size = null
accumulo.create.table = false
codegen.input.delimiters.enclose.required = false
db.username = root
reset.onemapper = false
codegen.output.delimiters.record = 10
import.max.inline.lob.size = 16777216
hbase.bulk.load.enabled = false
hcatalog.create.table = false
db.clear.staging.table = false
codegen.input.delimiters.record = 0
enable.compression = false
hive.overwrite.table = false
hive.import = false
codegen.input.delimiters.enclose = 0
accumulo.batch.size = 10240000
hive.drop.delims = false
codegen.output.delimiters.enclose = 0
hdfs.delete-target.dir = false
codegen.output.dir = .
codegen.auto.compile.dir = true
relaxed.isolation = false
mapreduce.num.mappers = 1
accumulo.max.latency = 5000
import.direct.split.size = 0
codegen.output.delimiters.field = 44
export.new.update = UpdateOnly
incremental.mode = None
hdfs.file.format = TextFile
codegen.compile.dir = /tmp/sqoop-root/compile/1df5c52bab08ce1b6e9ae7c193b88264
direct.import = false
hdfs.target.dir = /test/a_old
hive.fail.table.exists = false
db.batch = false


4.4 Execute the job

sqoop job --exec myjob


Options given after -- are appended to the saved tool invocation, so stored settings can be overridden at execution time:

sqoop job --exec myjob -- --username root
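Because the saved job does not record the password (db.require.password = true in the details above), every --exec prompts for it. For unattended runs you can either set sqoop.metastore.client.record.password to true in sqoop-site.xml, or create the job with --password-file; a sketch reusing the password file from the eval section:

sqoop job --delete myjob
sqoop job --create myjob -- import --connect jdbc:mysql://192.168.1.97:3306/mydb \
--username root --password-file file:///root/.mysql.pw \
--table a -m 1 --target-dir /test/a_old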




After the job runs successfully, check the result on HDFS:

hadoop fs -ls /test/a_old
hadoop fs -cat /test/a_old/part-m-00000




4.5 Delete the job

sqoop job --delete myjob




5. Sqoop's codegen Operation

sqoop codegen --connect jdbc:mysql://192.168.1.97:3306/mydb --username root \
--password root --table a




[root@cyyun ~]# ll /tmp/sqoop-root/compile/13bae8fcd124763b61658f7c4ed8937e/
total 28
-rw-r--r--. 1 root root  8715 Dec 23 21:40 a.class
-rw-r--r--. 1 root root  3616 Dec 23 21:40 a.jar
-rw-r--r--. 1 root root 10346 Dec 23 21:40 a.java
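By default the generated .java, .class, and .jar files land under a hashed directory in /tmp/sqoop-root/compile/. The --outdir and --bindir options put them somewhere predictable, which is handy when the jar is needed later (as in the merge step of section 6); a sketch with illustrative paths:

sqoop codegen --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root \
--table a --outdir /tmp/sqoop-gen/src --bindir /tmp/sqoop-gen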


6. Importing and Merging Incremental MySQL Data with Sqoop

The Sqoop User Guide describes incremental imports as follows:

Table 5. Incremental import arguments:

--check-column (col)    Specifies the column to be examined when determining which rows to import. (the column should not be of type CHAR/NCHAR/VARCHAR/VARNCHAR/LONGVARCHAR/LONGNVARCHAR)
--incremental (mode)    Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified.
--last-value (value)    Specifies the maximum value of the check column from the previous import.

Sqoop supports two types of incremental imports: append and lastmodified. You can use the --incremental argument to specify the type of incremental import to perform.

You should specify append mode when importing a table where new rows are continually being added with increasing row id values. You specify the column containing the row's id with --check-column. Sqoop imports rows where the check column has a value greater than the one specified with --last-value.

An alternate table update strategy supported by Sqoop is called lastmodified mode. You should use this when rows of the source table may be updated, and each such update will set the value of a last-modified column to the current timestamp. Rows where the check column holds a timestamp more recent than the timestamp specified with --last-value are imported.

At the end of an incremental import, the value which should be specified as --last-value for a subsequent import is printed to the screen. When running a subsequent import, you should specify --last-value in this way to ensure you import only the new or updated data. This is handled automatically by creating an incremental import as a saved job, which is the preferred mechanism for performing a recurring incremental import. See the section on saved jobs later in this document for more information.
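The walkthrough below uses lastmodified mode. For comparison, an append-mode import keyed on the integer id column of the same table might look like this (the target directory is illustrative):

sqoop import --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root \
--table a -m 1 --target-dir /test/a_append \
--incremental append --check-column id --last-value 3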

6.1 Create the MySQL table

Drop and recreate table a with a timestamp column, then insert rows:

mysql> drop table a;
Query OK, 0 rows affected (0.81 sec)

mysql> create table a(id int,name varchar(50),crt_date timestamp);
Query OK, 0 rows affected (0.24 sec)

mysql> insert into a values(1,'a1',sysdate());
Query OK, 1 row affected (0.20 sec)

mysql> insert into a values(2,'a2',sysdate());
Query OK, 1 row affected (0.06 sec)

mysql> insert into a values(3,'a3',sysdate());
Query OK, 1 row affected (0.07 sec)

mysql> select * from a;
+------+------+---------------------+
| id   | name | crt_date            |
+------+------+---------------------+
|    1 | a1   | 2016-12-26 17:03:50 |
|    2 | a2   | 2016-12-26 17:04:00 |
|    3 | a3   | 2016-12-26 17:04:19 |
+------+------+---------------------+
3 rows in set (0.02 sec)


6.2 Import the existing MySQL data into HDFS

sqoop import --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root \
--table a -m 1 --target-dir /test/a_1


[root@cyyun ~]# hadoop fs -cat /test/a_1/*
16/12/24 01:20:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1,a1,2016-12-26 17:03:50.0
2,a2,2016-12-26 17:04:00.0
3,a3,2016-12-26 17:04:19.0


6.3 Insert incremental rows into MySQL

mysql> insert into a values(4,'a4',sysdate());
Query OK, 1 row affected (0.04 sec)

mysql> insert into a values(5,'a5',sysdate());
Query OK, 1 row affected (0.09 sec)

mysql> select * from a;
+------+------+---------------------+
| id   | name | crt_date            |
+------+------+---------------------+
|    1 | a1   | 2016-12-26 17:03:50 |
|    2 | a2   | 2016-12-26 17:04:00 |
|    3 | a3   | 2016-12-26 17:04:19 |
|    4 | a4   | 2016-12-26 17:09:23 |
|    5 | a5   | 2016-12-26 17:09:36 |
+------+------+---------------------+
5 rows in set (0.00 sec)


6.4 Import the incremental data into HDFS

sqoop import --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root \
--table a -m 1 --target-dir /test/a_2 --incremental lastmodified \
--check-column crt_date --last-value "2016-12-26 17:04:20"


The effect is the same as filtering with --where 'crt_date >= "2016-12-26 17:04:20"'. The --last-value needs to be slightly later than row 3's timestamp (17:04:19); otherwise row 3 would be imported again.

[root@cyyun ~]# hadoop fs -cat /test/a_2/*
16/12/24 01:27:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
4,a4,2016-12-26 17:09:23.0
5,a5,2016-12-26 17:09:36.0
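As the User Guide excerpt notes, wrapping an incremental import in a saved job lets Sqoop record the new --last-value after each run. A sketch (the job name and the use of --append, which lets repeated runs write into the existing target directory, are illustrative):

sqoop job --create a_incr -- import --connect jdbc:mysql://192.168.1.97:3306/mydb \
--username root --table a -m 1 --target-dir /test/a_2 --append \
--incremental lastmodified --check-column crt_date --last-value "2016-12-26 17:04:20"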


6.5 Generate the jar with codegen

[root@cyyun ~]# sqoop codegen --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a
16/12/24 01:30:11 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/12/24 01:30:11 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/12/24 01:30:11 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/12/24 01:30:11 INFO tool.CodeGenTool: Beginning code generation
16/12/24 01:30:11 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `a` AS t LIMIT 1
16/12/24 01:30:12 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `a` AS t LIMIT 1
16/12/24 01:30:12 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop-2.6.4
Note: /tmp/sqoop-root/compile/57c71c4b6aa32458bb7a2d2636ba673f/a.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/12/24 01:30:15 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/57c71c4b6aa32458bb7a2d2636ba673f/a.jar


Compilation produced /tmp/sqoop-root/compile/57c71c4b6aa32458bb7a2d2636ba673f/a.jar, which the merge step below uses.

6.6 Merge

sqoop merge --new-data /test/a_2 --onto /test/a_1 \
--target-dir /test/a_merge --jar-file /tmp/sqoop-root/compile/57c71c4b6aa32458bb7a2d2636ba673f/a.jar \
--class-name a --merge-key id


Note: --class-name a is the record class generated by codegen (named after the table here). For rows sharing the same --merge-key value, the copy from --new-data replaces the one from --onto.

[root@cyyun ~]# hadoop fs -cat /test/a_merge/*
16/12/24 01:36:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1,a1,2016-12-26 17:03:50.0
2,a2,2016-12-26 17:04:00.0
3,a3,2016-12-26 17:04:19.0
4,a4,2016-12-26 17:09:23.0
5,a5,2016-12-26 17:09:36.0


6.7 Load the merged data into a Hive table

hive> drop table a;
OK
Time taken: 17.45 seconds
hive>  create table a(id int,name string,crt_date string) row format delimited fields terminated by ',';
OK
Time taken: 1.586 seconds
hive> load data inpath '/test/a_merge/part*' into table a;
Loading data to table default.a
OK
Time taken: 2.073 seconds
hive> select * from a;
OK
1   a1  2016-12-26 17:03:50.0
2   a2  2016-12-26 17:04:00.0
3   a3  2016-12-26 17:04:19.0
4   a4  2016-12-26 17:09:23.0
5   a5  2016-12-26 17:09:36.0
Time taken: 0.191 seconds, Fetched: 5 row(s)
hive>


Check HDFS: LOAD DATA INPATH moves the files, so /test/a_merge now holds only the _SUCCESS marker while the data lives under the Hive warehouse:

[root@cyyun ~]# hadoop fs -ls /test/a_merge
16/12/24 01:44:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   1 root supergroup          0 2016-12-24 01:35 /test/a_merge/_SUCCESS
[root@cyyun ~]# hadoop fs -ls /user/hive/warehouse/a
16/12/24 01:45:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rwxr-xr-x   1 root supergroup        135 2016-12-24 01:35 /user/hive/warehouse/a/part-r-00000
[root@cyyun ~]#


Reference:

http://blog.csdn.net/gdmzlhj1/article/details/50483171