Sqoop: Transferring Data Between MySQL and HDFS
2013-06-25 15:24
This example uses Hadoop 1.2.0, Sqoop 1.4.3, and MySQL 5.1.61.
I. From MySQL to HDFS
1. Download the MySQL JDBC driver
http://dev.mysql.com/downloads/connector/j/
2. Set up the driver
Unpack the archive:
[hadoop@node1 ~]$ tar -zxvf mysql-connector-java-5.1.25.tar.gz
Copy the JDBC driver into Sqoop's lib directory:
[hadoop@node1 mysql-connector-java-5.1.25]$ ls
build.xml  CHANGES  COPYING  docs  mysql-connector-java-5.1.25-bin.jar  README  README.txt  src
[hadoop@node1 mysql-connector-java-5.1.25]$ cp mysql-connector-java-5.1.25-bin.jar /home/hadoop/sqoop-1.4.3/lib/
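Before running any Sqoop job it can help to confirm that the driver jar actually landed in the lib directory. The following is a minimal sketch, not part of the original setup: the function name find_mysql_driver and the SQOOP_LIB variable are hypothetical, and the default path is simply the one used in this article.

```shell
# Hypothetical helper: print every MySQL Connector/J jar found in the
# directory given as $1. Returns non-zero when none is present.
find_mysql_driver() {
    lib_dir=$1
    found=1
    for jar in "$lib_dir"/mysql-connector-java-*.jar; do
        # If the glob matches nothing, the literal pattern stays in $jar,
        # so the -f test filters it out.
        if [ -f "$jar" ]; then
            echo "$jar"
            found=0
        fi
    done
    return "$found"
}

# SQOOP_LIB is an assumed variable; the default is the path from this article.
SQOOP_LIB=${SQOOP_LIB:-/home/hadoop/sqoop-1.4.3/lib}
if find_mysql_driver "$SQOOP_LIB"; then
    echo "MySQL JDBC driver found"
else
    echo "No MySQL JDBC driver in $SQOOP_LIB" >&2
fi
```

If the check reports no driver, the cp step above most likely targeted a different Sqoop installation directory.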
3. Write the script that exports the table from MySQL
[hadoop@node1 bin]$ vi mysql2hdfs.sh
# JDBC connection string
CONNECTURL=jdbc:mysql://10.190.105.10/employees
# MySQL user name
MYSQLNAME=employee
# MySQL password
MYSQLPASSWORD=employee
# Table to export from MySQL
mysqlTableName=dept_emp
# Destination directory on HDFS
hdfsPath=/user/hadoop/test/$mysqlTableName
./sqoop import --append --connect $CONNECTURL --username $MYSQLNAME --password $MYSQLPASSWORD --target-dir $hdfsPath --num-mappers 1 --table $mysqlTableName --fields-terminated-by '|'
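When this script runs, Sqoop warns that passing the password on the command line is insecure and suggests -P, which prompts for it instead. Below is a sketch of the same import using -P. The DRY_RUN switch and the run helper are hypothetical additions for illustration: by default the command is only printed, so the sketch can be tried without a cluster. The destination path is derived from mysqlTableName.

```shell
# Same settings as mysql2hdfs.sh, minus the stored password.
CONNECTURL=jdbc:mysql://10.190.105.10/employees
MYSQLNAME=employee
mysqlTableName=dept_emp
hdfsPath=/user/hadoop/test/$mysqlTableName

# DRY_RUN is a hypothetical switch: 1 (default) prints the command,
# anything else executes it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# -P makes Sqoop prompt for the password on the console.
run ./sqoop import --append \
    --connect "$CONNECTURL" \
    --username "$MYSQLNAME" -P \
    --target-dir "$hdfsPath" \
    --num-mappers 1 \
    --table "$mysqlTableName" \
    --fields-terminated-by '|'
```

Running it with DRY_RUN=0 from Sqoop's bin directory would start the real import and ask for the password interactively, so this variant suits manual runs rather than unattended jobs.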
4. Grant privileges to the MySQL user employee
Note that the statement as recorded names employee.*, while the connection string points at the employees database; grant on whichever database your connection string uses.
mysql> grant all on employee.* to 'employee'@'%' identified by 'employee';
Query OK, 0 rows affected (0.00 sec)
mysql> select user,host from user;
+----------+-------------------+
| user     | host              |
+----------+-------------------+
| employee | %                 |
| employee | 10.190.105.51     |
| root     | 127.0.0.1         |
|          | localhost         |
| employee | localhost         |
| root     | localhost         |
|          | node1.localdomain |
| root     | node1.localdomain |
+----------+-------------------+
8 rows in set (0.00 sec)
5. Run the migration script
[hadoop@node1 bin]$ sh mysql2hdfs.sh
Warning: $HADOOP_HOME is deprecated.
13/06/25 21:09:32 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/06/25 21:09:32 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
13/06/25 21:09:32 INFO tool.CodeGenTool: Beginning code generation
13/06/25 21:09:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dept_emp` AS t LIMIT 1
13/06/25 21:09:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dept_emp` AS t LIMIT 1
13/06/25 21:09:33 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hadoop-1.2.0
Note: /tmp/sqoop-hadoop/compile/a3daec0ae4148c40fd3f20a023efd37f/dept_emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/06/25 21:09:34 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/a3daec0ae4148c40fd3f20a023efd37f/dept_emp.jar
13/06/25 21:09:34 WARN manager.MySQLManager: It looks like you are importing from mysql.
13/06/25 21:09:34 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
13/06/25 21:09:34 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
13/06/25 21:09:34 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
13/06/25 21:09:34 WARN manager.CatalogQueryManager: The table dept_emp contains a multi-column primary key. Sqoop will default to the column emp_no only for this job.
13/06/25 21:09:34 WARN manager.CatalogQueryManager: The table dept_emp contains a multi-column primary key. Sqoop will default to the column emp_no only for this job.
13/06/25 21:09:34 INFO mapreduce.ImportJobBase: Beginning import of dept_emp
13/06/25 21:09:36 INFO mapred.JobClient: Running job: job_201306251627_0003
13/06/25 21:09:37 INFO mapred.JobClient:  map 0% reduce 0%
13/06/25 21:09:52 INFO mapred.JobClient:  map 100% reduce 0%
13/06/25 21:09:54 INFO mapred.JobClient: Job complete: job_201306251627_0003
13/06/25 21:09:54 INFO mapred.JobClient: Counters: 18
13/06/25 21:09:54 INFO mapred.JobClient:   Job Counters
13/06/25 21:09:54 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=13834
13/06/25 21:09:54 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/06/25 21:09:54 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/06/25 21:09:54 INFO mapred.JobClient:     Launched map tasks=1
13/06/25 21:09:54 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
13/06/25 21:09:54 INFO mapred.JobClient:   File Output Format Counters
13/06/25 21:09:54 INFO mapred.JobClient:     Bytes Written=11175033
13/06/25 21:09:54 INFO mapred.JobClient:   FileSystemCounters
13/06/25 21:09:54 INFO mapred.JobClient:     HDFS_BYTES_READ=87
13/06/25 21:09:54 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=58770
13/06/25 21:09:54 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=11175033
13/06/25 21:09:54 INFO mapred.JobClient:   File Input Format Counters
13/06/25 21:09:54 INFO mapred.JobClient:     Bytes Read=0
13/06/25 21:09:54 INFO mapred.JobClient:   Map-Reduce Framework
13/06/25 21:09:54 INFO mapred.JobClient:     Map input records=331603
13/06/25 21:09:54 INFO mapred.JobClient:     Physical memory (bytes) snapshot=79998976
13/06/25 21:09:54 INFO mapred.JobClient:     Spilled Records=0
13/06/25 21:09:54 INFO mapred.JobClient:     CPU time spent (ms)=7560
13/06/25 21:09:54 INFO mapred.JobClient:     Total committed heap usage (bytes)=15925248
13/06/25 21:09:54 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=631529472
13/06/25 21:09:54 INFO mapred.JobClient:     Map output records=331603
13/06/25 21:09:54 INFO mapred.JobClient:     SPLIT_RAW_BYTES=87
13/06/25 21:09:54 INFO mapreduce.ImportJobBase: Transferred 10.6573 MB in 18.9752 seconds (575.1249 KB/sec)
13/06/25 21:09:54 INFO mapreduce.ImportJobBase: Retrieved 331603 records.
13/06/25 21:09:54 INFO util.AppendUtils: Creating missing output directory - dept_emp
The import is complete: 331603 records were imported in total.
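Rather than reading the count off the log by eye, it can be pulled out with sed. This is a small sketch under the assumption that the job output has been saved or piped in; the function name sqoop_record_count is hypothetical. It matches the "Retrieved N records." line that Sqoop prints for imports and the "Exported N records." line it prints for exports.

```shell
# Hypothetical helper: read a Sqoop log on stdin and print the record
# count from the "Retrieved N records." or "Exported N records." line.
sqoop_record_count() {
    sed -n \
        -e 's/.* Retrieved \([0-9][0-9]*\) records\..*/\1/p' \
        -e 's/.* Exported \([0-9][0-9]*\) records\..*/\1/p'
}

# Example with the summary line from the import log in this article.
log_line='13/06/25 21:09:54 INFO mapreduce.ImportJobBase: Retrieved 331603 records.'
printf '%s\n' "$log_line" | sqoop_record_count
```

In practice the input could come from something like `sh mysql2hdfs.sh 2>&1 | sqoop_record_count`, where 2>&1 makes sure the log lines reach the pipe whichever stream they are written to.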
6. Inspect the imported data
[hadoop@node1 bin]$ hadoop fs -ls /user/hadoop/test/dept_emp
Warning: $HADOOP_HOME is deprecated.
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2013-06-25 21:09 /user/hadoop/test/dept_emp/_logs
-rw-r--r--   3 hadoop supergroup   11175033 2013-06-25 21:09 /user/hadoop/test/dept_emp/part-m-00000
[hadoop@node1 bin]$ hadoop fs -cat /user/hadoop/test/dept_emp/part-m-00000 | more
Warning: $HADOOP_HOME is deprecated.
10001|d005|1986-06-26|9999-01-01
10002|d007|1996-08-03|9999-01-01
10003|d004|1995-12-03|9999-01-01
...
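Beyond eyeballing the first few rows, a quick structural check is to count rows whose number of '|'-separated fields is not the expected four, as seen in the sample rows above. The sketch below does this with awk; the function name count_bad_rows is hypothetical, and the sample rows are the three shown in the listing.

```shell
# Hypothetical helper: count lines on stdin whose number of '|'-separated
# fields differs from the expected count passed as $1.
count_bad_rows() {
    awk -F'|' -v n="$1" 'NF != n { bad++ } END { print bad + 0 }'
}

# The three sample rows from the listing all have four fields.
printf '%s\n' \
    '10001|d005|1986-06-26|9999-01-01' \
    '10002|d007|1996-08-03|9999-01-01' \
    '10003|d004|1995-12-03|9999-01-01' | count_bad_rows 4
```

Against the real file, the input would be `hadoop fs -cat /user/hadoop/test/dept_emp/part-m-00000 | count_bad_rows 4`. A non-zero result would suggest that some value contains the delimiter character itself.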
7. Check the row count in MySQL
mysql> use employees;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select count(*) from dept_emp;
+----------+
| count(*) |
+----------+
|   331603 |
+----------+
1 row in set (0.13 sec)
II. From HDFS to MySQL
1. Write the script
# JDBC connection string
CONNECTURL=jdbc:mysql://10.190.105.10/employees
# MySQL user name
MYSQLNAME=employee
# MySQL password
MYSQLPASSWORD=employee
# Target table; it must be created beforehand
mysqlTableName=countries
# Path of the file on HDFS
hdfsPath=/user/hadoop/test/COUNTRIES/part-m-00000
# Export command
./sqoop export --connect $CONNECTURL --username $MYSQLNAME --password $MYSQLPASSWORD --export-dir $hdfsPath --num-mappers 1 --table $mysqlTableName --fields-terminated-by '\001'
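The '\001' passed to --fields-terminated-by is the non-printing Ctrl-A character (octal 001), so the fields in the HDFS file look run together when printed on a terminal. The sketch below builds one sample record with that delimiter and makes it visible by translating it to '|'. The helper name show_ctrl_a and the sample record are illustrative assumptions, not taken from the actual file.

```shell
# Hypothetical helper: replace every Ctrl-A (octal 001) on stdin with '|'
# so the field boundaries become visible.
show_ctrl_a() {
    tr '\001' '|'
}

# SOH holds the single Ctrl-A character.
SOH=$(printf '\001')

# Illustrative record in the three-column layout of the countries table.
printf '%s\n' "AR${SOH}Argentina${SOH}2" | show_ctrl_a
```

The same filter could be applied to the real data with `hadoop fs -cat /user/hadoop/test/COUNTRIES/part-m-00000 | show_ctrl_a` to confirm the delimiter before starting the export.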
2. Create the target table in MySQL
mysql> create table countries(country_id char(2),country_name varchar(40),region_id int);
Query OK, 0 rows affected (0.01 sec)
3. Run the script
[hadoop@node1 bin]$ sh hdfs2mysql.sh
Warning: $HADOOP_HOME is deprecated.
13/06/25 22:49:19 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/06/25 22:49:19 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
13/06/25 22:49:19 INFO tool.CodeGenTool: Beginning code generation
13/06/25 22:49:20 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `COUNTRIES` AS t LIMIT 1
13/06/25 22:49:20 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `COUNTRIES` AS t LIMIT 1
13/06/25 22:49:20 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hadoop-1.2.0
Note: /tmp/sqoop-hadoop/compile/e40111465da685560ca38925242c7520/COUNTRIES.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/06/25 22:49:22 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/e40111465da685560ca38925242c7520/COUNTRIES.jar
13/06/25 22:49:22 INFO mapreduce.ExportJobBase: Beginning export of COUNTRIES
13/06/25 22:49:23 INFO input.FileInputFormat: Total input paths to process : 1
13/06/25 22:49:23 INFO input.FileInputFormat: Total input paths to process : 1
13/06/25 22:49:23 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/06/25 22:49:23 WARN snappy.LoadSnappy: Snappy native library not loaded
13/06/25 22:49:24 INFO mapred.JobClient: Running job: job_201306251627_0010
13/06/25 22:49:25 INFO mapred.JobClient:  map 0% reduce 0%
13/06/25 22:49:32 INFO mapred.JobClient:  map 100% reduce 0%
13/06/25 22:49:34 INFO mapred.JobClient: Job complete: job_201306251627_0010
13/06/25 22:49:34 INFO mapred.JobClient: Counters: 18
13/06/25 22:49:34 INFO mapred.JobClient:   Job Counters
13/06/25 22:49:34 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=7127
13/06/25 22:49:34 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/06/25 22:49:34 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/06/25 22:49:34 INFO mapred.JobClient:     Rack-local map tasks=1
13/06/25 22:49:34 INFO mapred.JobClient:     Launched map tasks=1
13/06/25 22:49:34 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
13/06/25 22:49:34 INFO mapred.JobClient:   File Output Format Counters
13/06/25 22:49:34 INFO mapred.JobClient:     Bytes Written=0
13/06/25 22:49:34 INFO mapred.JobClient:   FileSystemCounters
13/06/25 22:49:34 INFO mapred.JobClient:     HDFS_BYTES_READ=472
13/06/25 22:49:34 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=58409
13/06/25 22:49:34 INFO mapred.JobClient:   File Input Format Counters
13/06/25 22:49:34 INFO mapred.JobClient:     Bytes Read=0
13/06/25 22:49:34 INFO mapred.JobClient:   Map-Reduce Framework
13/06/25 22:49:34 INFO mapred.JobClient:     Map input records=25
13/06/25 22:49:34 INFO mapred.JobClient:     Physical memory (bytes) snapshot=84516864
13/06/25 22:49:34 INFO mapred.JobClient:     Spilled Records=0
13/06/25 22:49:34 INFO mapred.JobClient:     CPU time spent (ms)=960
13/06/25 22:49:34 INFO mapred.JobClient:     Total committed heap usage (bytes)=15925248
13/06/25 22:49:34 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=570187776
13/06/25 22:49:34 INFO mapred.JobClient:     Map output records=25
13/06/25 22:49:34 INFO mapred.JobClient:     SPLIT_RAW_BYTES=121
13/06/25 22:49:34 INFO mapreduce.ExportJobBase: Transferred 472 bytes in 11.312 seconds (41.7255 bytes/sec)
13/06/25 22:49:34 INFO mapreduce.ExportJobBase: Exported 25 records.
4. Verify the data in MySQL
mysql> select * from COUNTRIES;
+------------+--------------------------+-----------+
| country_id | country_name             | region_id |
+------------+--------------------------+-----------+
| AR         | Argentina                |         2 |
| AU         | Australia                |         3 |
| BE         | Belgium                  |         1 |
| BR         | Brazil                   |         2 |
| CA         | Canada                   |         2 |
| CH         | Switzerland              |         1 |
| CN         | China                    |         3 |
| DE         | Germany                  |         1 |
| DK         | Denmark                  |         1 |
| EG         | Egypt                    |         4 |
| FR         | France                   |         1 |
| HK         | HongKong                 |         3 |
| IL         | Israel                   |         4 |
| IN         | India                    |         3 |
| IT         | Italy                    |         1 |
| JP         | Japan                    |         3 |
| KW         | Kuwait                   |         4 |
| MX         | Mexico                   |         2 |
| NG         | Nigeria                  |         4 |
| NL         | Netherlands              |         1 |
| SG         | Singapore                |         3 |
| UK         | United Kingdom           |         1 |
| US         | United States of America |         2 |
| ZM         | Zambia                   |         4 |
| ZW         | Zimbabwe                 |         4 |
+------------+--------------------------+-----------+
25 rows in set (0.00 sec)