
Sqoop: Importing and Exporting Data Between MySQL and HDFS

2013-06-25 15:24
Versions used in this example: Hadoop 1.2.0, Sqoop 1.4.3, MySQL 5.1.61.
I. From MySQL to HDFS
1. Download the MySQL JDBC driver
http://dev.mysql.com/downloads/connector/j/
2. Set up the driver
Unpack the archive:
[hadoop@node1 ~]$ tar -zxvf mysql-connector-java-5.1.25.tar.gz

Copy the JDBC driver jar into Sqoop's lib directory:
[hadoop@node1 mysql-connector-java-5.1.25]$ ls
build.xml  CHANGES  COPYING  docs  mysql-connector-java-5.1.25-bin.jar  README  README.txt  src
[hadoop@node1 mysql-connector-java-5.1.25]$ cp mysql-connector-java-5.1.25-bin.jar /home/hadoop/sqoop-1.4.3/lib/
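A quick check that the driver jar is now where Sqoop will look for it:

[hadoop@node1 ~]$ ls /home/hadoop/sqoop-1.4.3/lib/ | grep mysql
mysql-connector-java-5.1.25-bin.jar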

3. Create the import script
[hadoop@node1 bin]$ vi mysql2hdfs.sh
# JDBC connection string
CONNECTURL=jdbc:mysql://10.190.105.10/employees
# MySQL user name
MYSQLNAME=employee
# MySQL password
MYSQLPASSWORD=employee
# Table to import
mysqlTableName=dept_emp
# Target directory on HDFS
hdfsPath=/user/hadoop/test/$mysqlTableName
./sqoop import --append --connect $CONNECTURL --username $MYSQLNAME --password $MYSQLPASSWORD --target-dir $hdfsPath  --num-mappers 1 --table $mysqlTableName --fields-terminated-by '|'
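As the Sqoop log below warns, passing the password on the command line is insecure. A minimal alternative sketch, prompting for it interactively with -P instead (later 1.4.x releases also add a --password-file option):

# Prompt for the MySQL password at run time instead of hard-coding it
./sqoop import --append --connect $CONNECTURL --username $MYSQLNAME -P --target-dir $hdfsPath --num-mappers 1 --table $mysqlTableName --fields-terminated-by '|'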

4. Grant privileges to the employee user in MySQL
mysql> grant all on employee.* to 'employee'@'%' identified by 'employee';
Query OK, 0 rows affected (0.00 sec)

mysql>  select user,host from user;
+----------+-------------------+
| user     | host              |
+----------+-------------------+
| employee | %                 |
| employee | 10.190.105.51     |
| root     | 127.0.0.1         |
|          | localhost         |
| employee | localhost         |
| root     | localhost         |
|          | node1.localdomain |
| root     | node1.localdomain |
+----------+-------------------+
8 rows in set (0.00 sec)
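Granting on '%' allows this account to connect from any host. If only the Hadoop node needs access, a tighter (hypothetical) alternative is to restrict the host, using the node's address from the listing above, and limit the privileges to what an import actually needs:

mysql> grant select on employees.* to 'employee'@'10.190.105.51' identified by 'employee';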

5. Run the import script
[hadoop@node1 bin]$ sh mysql2hdfs.sh
Warning: $HADOOP_HOME is deprecated.

13/06/25 21:09:32 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/06/25 21:09:32 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
13/06/25 21:09:32 INFO tool.CodeGenTool: Beginning code generation
13/06/25 21:09:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dept_emp` AS t LIMIT 1
13/06/25 21:09:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dept_emp` AS t LIMIT 1
13/06/25 21:09:33 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hadoop-1.2.0
Note: /tmp/sqoop-hadoop/compile/a3daec0ae4148c40fd3f20a023efd37f/dept_emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/06/25 21:09:34 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/a3daec0ae4148c40fd3f20a023efd37f/dept_emp.jar
13/06/25 21:09:34 WARN manager.MySQLManager: It looks like you are importing from mysql.
13/06/25 21:09:34 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
13/06/25 21:09:34 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
13/06/25 21:09:34 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
13/06/25 21:09:34 WARN manager.CatalogQueryManager: The table dept_emp contains a multi-column primary key. Sqoop will default to the column emp_no only for this job.
13/06/25 21:09:34 WARN manager.CatalogQueryManager: The table dept_emp contains a multi-column primary key. Sqoop will default to the column emp_no only for this job.
13/06/25 21:09:34 INFO mapreduce.ImportJobBase: Beginning import of dept_emp
13/06/25 21:09:36 INFO mapred.JobClient: Running job: job_201306251627_0003
13/06/25 21:09:37 INFO mapred.JobClient:  map 0% reduce 0%
13/06/25 21:09:52 INFO mapred.JobClient:  map 100% reduce 0%
13/06/25 21:09:54 INFO mapred.JobClient: Job complete: job_201306251627_0003
13/06/25 21:09:54 INFO mapred.JobClient: Counters: 18
13/06/25 21:09:54 INFO mapred.JobClient:   Job Counters
13/06/25 21:09:54 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=13834
13/06/25 21:09:54 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/06/25 21:09:54 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/06/25 21:09:54 INFO mapred.JobClient:     Launched map tasks=1
13/06/25 21:09:54 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
13/06/25 21:09:54 INFO mapred.JobClient:   File Output Format Counters
13/06/25 21:09:54 INFO mapred.JobClient:     Bytes Written=11175033
13/06/25 21:09:54 INFO mapred.JobClient:   FileSystemCounters
13/06/25 21:09:54 INFO mapred.JobClient:     HDFS_BYTES_READ=87
13/06/25 21:09:54 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=58770
13/06/25 21:09:54 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=11175033
13/06/25 21:09:54 INFO mapred.JobClient:   File Input Format Counters
13/06/25 21:09:54 INFO mapred.JobClient:     Bytes Read=0
13/06/25 21:09:54 INFO mapred.JobClient:   Map-Reduce Framework
13/06/25 21:09:54 INFO mapred.JobClient:     Map input records=331603
13/06/25 21:09:54 INFO mapred.JobClient:     Physical memory (bytes) snapshot=79998976
13/06/25 21:09:54 INFO mapred.JobClient:     Spilled Records=0
13/06/25 21:09:54 INFO mapred.JobClient:     CPU time spent (ms)=7560
13/06/25 21:09:54 INFO mapred.JobClient:     Total committed heap usage (bytes)=15925248
13/06/25 21:09:54 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=631529472
13/06/25 21:09:54 INFO mapred.JobClient:     Map output records=331603
13/06/25 21:09:54 INFO mapred.JobClient:     SPLIT_RAW_BYTES=87
13/06/25 21:09:54 INFO mapreduce.ImportJobBase: Transferred 10.6573 MB in 18.9752 seconds (575.1249 KB/sec)
13/06/25 21:09:54 INFO mapreduce.ImportJobBase: Retrieved 331603 records.
13/06/25 21:09:54 INFO util.AppendUtils: Creating missing output directory - dept_emp

Import complete: 331603 records in total.
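The warnings in the log note that dept_emp has a multi-column primary key, so Sqoop falls back to emp_no as the split column; with --num-mappers 1 this is harmless. For a parallel import, a sketch would raise the mapper count and name the split column explicitly:

# Hypothetical parallel variant of the import above: 4 map tasks split on emp_no
./sqoop import --append --connect $CONNECTURL --username $MYSQLNAME --password $MYSQLPASSWORD --target-dir $hdfsPath --num-mappers 4 --split-by emp_no --table $mysqlTableName --fields-terminated-by '|'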
6. Inspect the imported data
[hadoop@node1 bin]$ hadoop fs -ls /user/hadoop/test/dept_emp
Warning: $HADOOP_HOME is deprecated.

Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2013-06-25 21:09 /user/hadoop/test/dept_emp/_logs
-rw-r--r--   3 hadoop supergroup   11175033 2013-06-25 21:09 /user/hadoop/test/dept_emp/part-m-00000

[hadoop@node1 bin]$ hadoop fs -cat /user/hadoop/test/dept_emp/part-m-00000 | more
Warning: $HADOOP_HOME is deprecated.

10001|d005|1986-06-26|9999-01-01
10002|d007|1996-08-03|9999-01-01
10003|d004|1995-12-03|9999-01-01
...

7. Check the row count in MySQL
mysql> use employees;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select count(*) from dept_emp;
+----------+
| count(*) |
+----------+
|   331603 |
+----------+
1 row in set (0.13 sec)
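Since each record is written as one line of text, the line count of the part file on HDFS should report the same figure, 331603; a quick cross-check:

[hadoop@node1 bin]$ hadoop fs -cat /user/hadoop/test/dept_emp/part-m-00000 | wc -l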


 
II. From HDFS to MySQL
1. Create the export script
[hadoop@node1 bin]$ vi hdfs2mysql.sh
# JDBC connection string
CONNECTURL=jdbc:mysql://10.190.105.10/employees
# MySQL user name
MYSQLNAME=employee
# MySQL password
MYSQLPASSWORD=employee
# Target table to load; it must be created in advance
mysqlTableName=countries
# Path of the source file on HDFS
hdfsPath=/user/hadoop/test/COUNTRIES/part-m-00000
# Export command
./sqoop export --connect $CONNECTURL --username $MYSQLNAME --password $MYSQLPASSWORD --export-dir $hdfsPath  --num-mappers 1 --table $mysqlTableName --fields-terminated-by '\001'
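The '\001' (Ctrl-A) delimiter must match how the source file on HDFS was written; the COUNTRIES file here was presumably produced with that delimiter by an earlier import. To export the dept_emp data from Part 1 instead, the delimiter would be the '|' used there; a hypothetical sketch (dept_emp_copy is an assumed, pre-created target table):

# Round-trip of the Part 1 data; note the matching '|' delimiter
./sqoop export --connect $CONNECTURL --username $MYSQLNAME --password $MYSQLPASSWORD --export-dir /user/hadoop/test/dept_emp --num-mappers 1 --table dept_emp_copy --fields-terminated-by '|'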

2. Create the target table in MySQL
mysql> create table countries(country_id char(2),country_name varchar(40),region_id int);
Query OK, 0 rows affected (0.01 sec)
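Sqoop export issues plain INSERT statements, so re-running the script against a populated table would duplicate rows. With a primary key on the table, a sketch of an update-style re-run using --update-key (the ALTER below is a hypothetical addition; the CREATE TABLE above defines no key):

mysql> alter table countries add primary key (country_id);

# Update existing rows keyed on country_id instead of inserting new ones
./sqoop export --connect $CONNECTURL --username $MYSQLNAME --password $MYSQLPASSWORD --export-dir $hdfsPath --num-mappers 1 --table $mysqlTableName --update-key country_id --fields-terminated-by '\001'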

3. Run the export script
[hadoop@node1 bin]$ sh hdfs2mysql.sh
Warning: $HADOOP_HOME is deprecated.

13/06/25 22:49:19 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/06/25 22:49:19 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
13/06/25 22:49:19 INFO tool.CodeGenTool: Beginning code generation
13/06/25 22:49:20 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `COUNTRIES` AS t LIMIT 1
13/06/25 22:49:20 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `COUNTRIES` AS t LIMIT 1
13/06/25 22:49:20 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hadoop-1.2.0
Note: /tmp/sqoop-hadoop/compile/e40111465da685560ca38925242c7520/COUNTRIES.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/06/25 22:49:22 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/e40111465da685560ca38925242c7520/COUNTRIES.jar
13/06/25 22:49:22 INFO mapreduce.ExportJobBase: Beginning export of COUNTRIES
13/06/25 22:49:23 INFO input.FileInputFormat: Total input paths to process : 1
13/06/25 22:49:23 INFO input.FileInputFormat: Total input paths to process : 1
13/06/25 22:49:23 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/06/25 22:49:23 WARN snappy.LoadSnappy: Snappy native library not loaded
13/06/25 22:49:24 INFO mapred.JobClient: Running job: job_201306251627_0010
13/06/25 22:49:25 INFO mapred.JobClient:  map 0% reduce 0%
13/06/25 22:49:32 INFO mapred.JobClient:  map 100% reduce 0%
13/06/25 22:49:34 INFO mapred.JobClient: Job complete: job_201306251627_0010
13/06/25 22:49:34 INFO mapred.JobClient: Counters: 18
13/06/25 22:49:34 INFO mapred.JobClient:   Job Counters
13/06/25 22:49:34 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=7127
13/06/25 22:49:34 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/06/25 22:49:34 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/06/25 22:49:34 INFO mapred.JobClient:     Rack-local map tasks=1
13/06/25 22:49:34 INFO mapred.JobClient:     Launched map tasks=1
13/06/25 22:49:34 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
13/06/25 22:49:34 INFO mapred.JobClient:   File Output Format Counters
13/06/25 22:49:34 INFO mapred.JobClient:     Bytes Written=0
13/06/25 22:49:34 INFO mapred.JobClient:   FileSystemCounters
13/06/25 22:49:34 INFO mapred.JobClient:     HDFS_BYTES_READ=472
13/06/25 22:49:34 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=58409
13/06/25 22:49:34 INFO mapred.JobClient:   File Input Format Counters
13/06/25 22:49:34 INFO mapred.JobClient:     Bytes Read=0
13/06/25 22:49:34 INFO mapred.JobClient:   Map-Reduce Framework
13/06/25 22:49:34 INFO mapred.JobClient:     Map input records=25
13/06/25 22:49:34 INFO mapred.JobClient:     Physical memory (bytes) snapshot=84516864
13/06/25 22:49:34 INFO mapred.JobClient:     Spilled Records=0
13/06/25 22:49:34 INFO mapred.JobClient:     CPU time spent (ms)=960
13/06/25 22:49:34 INFO mapred.JobClient:     Total committed heap usage (bytes)=15925248
13/06/25 22:49:34 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=570187776
13/06/25 22:49:34 INFO mapred.JobClient:     Map output records=25
13/06/25 22:49:34 INFO mapred.JobClient:     SPLIT_RAW_BYTES=121
13/06/25 22:49:34 INFO mapreduce.ExportJobBase: Transferred 472 bytes in 11.312 seconds (41.7255 bytes/sec)
13/06/25 22:49:34 INFO mapreduce.ExportJobBase: Exported 25 records.

4. Verify the data in MySQL
mysql> select * from COUNTRIES;
+------------+--------------------------+-----------+
| country_id | country_name             | region_id |
+------------+--------------------------+-----------+
| AR         | Argentina                |         2 |
| AU         | Australia                |         3 |
| BE         | Belgium                  |         1 |
| BR         | Brazil                   |         2 |
| CA         | Canada                   |         2 |
| CH         | Switzerland              |         1 |
| CN         | China                    |         3 |
| DE         | Germany                  |         1 |
| DK         | Denmark                  |         1 |
| EG         | Egypt                    |         4 |
| FR         | France                   |         1 |
| HK         | HongKong                 |         3 |
| IL         | Israel                   |         4 |
| IN         | India                    |         3 |
| IT         | Italy                    |         1 |
| JP         | Japan                    |         3 |
| KW         | Kuwait                   |         4 |
| MX         | Mexico                   |         2 |
| NG         | Nigeria                  |         4 |
| NL         | Netherlands              |         1 |
| SG         | Singapore                |         3 |
| UK         | United Kingdom           |         1 |
| US         | United States of America |         2 |
| ZM         | Zambia                   |         4 |
| ZW         | Zimbabwe                 |         4 |
+------------+--------------------------+-----------+
25 rows in set (0.00 sec)
Tags: sqoop, mysql, hdfs