
Sqoop 1.4.5 + Hadoop 2.2.0: Importing MySQL Data into HDFS

2015-03-16 22:32
As recorded in my previous post, with Sqoop 1.99.4 + Hadoop 2.2.0 I simply could not find a way to specify the field delimiter when importing MySQL tables into HDFS, which is what prompted this detour through Sqoop 1.4.5. Architecturally, Sqoop2 does bring real improvements in areas such as security, but it is not yet recommended for production use: too many features are still missing or immature. For a company like ours that uses Hadoop on a small scale, Sqoop 1.4.x is plenty. Cross-department, multi-user scenarios are rare for us, and the command line works just fine.
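For reference, the delimiter that was missing in Sqoop 1.99.4 is a first-class flag on Sqoop 1.4's import command. A minimal sketch, reusing the host, credentials, table, and target directory from the examples below (the tab and newline delimiters are just an illustration):

./sqoop import \
--connect "jdbc:mysql://192.168.0.1:3306/mydb?characterEncoding=UTF-8" \
--username test -P \
--table book \
--target-dir '/user/hive/warehouse/book' \
--fields-terminated-by '\t' \
--lines-terminated-by '\n'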

(1) Installation environment

OS: Linux (CentOS 6.5)

JDK version: 1.7.0_45

Hadoop version: 2.2.0

Sqoop version: sqoop-1.4.5.bin__hadoop-2.0.4-alpha.tar.gz

Hadoop install directory: /home/hadoop/hadoop-2.2.0

Sqoop install directory: /home/hadoop/sqoop-1.4.5

Both Hadoop and Sqoop run under the same user, hadoop, whose home directory is /home/hadoop.

(2) Modify the Sqoop configuration file

cd /home/hadoop/sqoop-1.4.5/conf

cp sqoop-env-template.sh sqoop-env.sh

Append the following environment variable settings to the end of sqoop-env.sh:

#add by zhanzk

export HADOOP_COMMON_HOME=/home/hadoop/hadoop-2.2.0

export HADOOP_MAPRED_HOME=/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce

export HIVE_HOME=/home/hadoop/hive-0.12.0

(3) Modify the hadoop user's environment variables

Edit /home/hadoop/.bash_profile and append the following:

export SQOOP_HOME=/home/hadoop/sqoop-1.4.5

export PATH=$PATH:$SQOOP_HOME/bin

export LOGDIR=$SQOOP_HOME/logs
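After appending these lines, reload the profile so the current shell picks them up, then sanity-check the install (sqoop version is a built-in Sqoop subcommand):

source /home/hadoop/.bash_profile

sqoop version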

(4) Put the MySQL JDBC driver into $SQOOP_HOME/lib

Copy mysql-connector-java-5.1.15.jar to /home/hadoop/sqoop-1.4.5/lib.
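For example, assuming the jar has already been downloaded to the current directory:

cp mysql-connector-java-5.1.15.jar /home/hadoop/sqoop-1.4.5/lib/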

(5) Trying out Sqoop

1. Use Sqoop to list the databases on 192.168.0.1

Go to $SQOOP_HOME/bin and run the following command:

./sqoop list-databases --connect "jdbc:mysql://192.168.0.1:3306/mydb?characterEncoding=UTF-8" --username test --password 'test'
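Since putting the password on the command line like this is insecure (Sqoop itself warns about it in the output further down), here is an equivalent invocation, with the same connection details, that prompts for the password instead:

./sqoop list-databases --connect "jdbc:mysql://192.168.0.1:3306/mydb?characterEncoding=UTF-8" --username test -P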

2. Import the data from table book into HDFS

Go to $SQOOP_HOME/bin and run the following command:

./sqoop import --connect "jdbc:mysql://192.168.0.1:3306/mydb?characterEncoding=UTF-8" --username test --password 'test' --target-dir '/user/hive/warehouse/book' --table book

Note: a problem cropped up here as well:

15/03/15 22:30:33 ERROR manager.SqlManager: Error reading from database: java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@54b0a583 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.

java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@54b0a583 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.

at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:930)

at com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:2694)

at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1868)

at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2109)

at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2642)

at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2571)

at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1464)

at com.mysql.jdbc.ConnectionImpl.getMaxBytesPerChar(ConnectionImpl.java:3030)

at com.mysql.jdbc.Field.getMaxBytesPerCharacter(Field.java:592)

at com.mysql.jdbc.ResultSetMetaData.getPrecision(ResultSetMetaData.java:444)

at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:285)

at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:240)

at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:226)

at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295)

at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1773)

at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1578)

at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)

at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)

at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:601)

at org.apache.sqoop.Sqoop.run(Sqoop.java:143)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)

at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)

at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)

at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)

at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

15/03/15 22:30:33 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: No columns to generate for ClassWriter

at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1584)

at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)

at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)

at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:601)

at org.apache.sqoop.Sqoop.run(Sqoop.java:143)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)

at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)

at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)

at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)

at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

Fortunately, I found this article: http://my.oschina.net/u/1169607/blog/352225. Replacing mysql-connector-java-5.1.15.jar with mysql-connector-java-5.1.32-bin.jar makes the error above go away.
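The swap itself is just two commands (assuming the new jar sits in the current directory):

rm /home/hadoop/sqoop-1.4.5/lib/mysql-connector-java-5.1.15.jar

cp mysql-connector-java-5.1.32-bin.jar /home/hadoop/sqoop-1.4.5/lib/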

That still did not make it work; the following error came next:

[hadoop@host25 bin]$ ./sqoop import --connect jdbc:mysql://192.168.0.1:3306/mydb?characterEncoding=UTF-8 --username test --password 'test' --target-dir '/user/hive/warehouse/book' --table t_book ;

Warning: /home/hadoop/sqoop-1.4.5/../hbase does not exist! HBase imports will fail.

Please set $HBASE_HOME to the root of your HBase installation.

Warning: /home/hadoop/sqoop-1.4.5/../hcatalog does not exist! HCatalog jobs will fail.

Please set $HCAT_HOME to the root of your HCatalog installation.

Warning: /home/hadoop/sqoop-1.4.5/../accumulo does not exist! Accumulo imports will fail.

Please set $ACCUMULO_HOME to the root of your Accumulo installation.

Warning: /home/hadoop/sqoop-1.4.5/../zookeeper does not exist! Accumulo imports will fail.

Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.

15/03/15 23:10:55 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5

15/03/15 23:10:55 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

15/03/15 23:10:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

15/03/15 23:10:56 INFO tool.CodeGenTool: Beginning code generation

15/03/15 23:10:56 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t_book` AS t LIMIT 1

15/03/15 23:10:56 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t_book` AS t LIMIT 1

15/03/15 23:10:56 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce

Note: /tmp/sqoop-hadoop/compile/c798c2a151fc7c3baed090b15aa6e2cb/book.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

15/03/15 23:10:59 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/c798c2a151fc7c3baed090b15aa6e2cb/book.jar

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/InputFormat

at java.lang.ClassLoader.defineClass1(Native Method)

at java.lang.ClassLoader.defineClass(ClassLoader.java:800)

at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)

at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)

at java.net.URLClassLoader.access$100(URLClassLoader.java:71)

at java.net.URLClassLoader$1.run(URLClassLoader.java:361)

at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

at java.lang.ClassLoader.loadClass(ClassLoader.java:425)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)

at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

at java.lang.ClassLoader.defineClass1(Native Method)

at java.lang.ClassLoader.defineClass(ClassLoader.java:800)

at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)

at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)

at java.net.URLClassLoader.access$100(URLClassLoader.java:71)

at java.net.URLClassLoader$1.run(URLClassLoader.java:361)

at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

at java.lang.ClassLoader.loadClass(ClassLoader.java:425)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)

at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

at java.lang.ClassLoader.defineClass1(Native Method)

at java.lang.ClassLoader.defineClass(ClassLoader.java:800)

at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)

at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)

at java.net.URLClassLoader.access$100(URLClassLoader.java:71)

at java.net.URLClassLoader$1.run(URLClassLoader.java:361)

at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

at java.lang.ClassLoader.loadClass(ClassLoader.java:425)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)

at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

at java.lang.ClassLoader.defineClass1(Native Method)

at java.lang.ClassLoader.defineClass(ClassLoader.java:800)

at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)

at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)

at java.net.URLClassLoader.access$100(URLClassLoader.java:71)

at java.net.URLClassLoader$1.run(URLClassLoader.java:361)

at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

at java.lang.ClassLoader.loadClass(ClassLoader.java:425)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)

at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

at org.apache.sqoop.manager.ImportJobContext.<init>(ImportJobContext.java:51)

at com.cloudera.sqoop.manager.ImportJobContext.<init>(ImportJobContext.java:33)

at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:483)

at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:601)

at org.apache.sqoop.Sqoop.run(Sqoop.java:143)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)

at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)

at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)

at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.InputFormat

at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

at java.lang.ClassLoader.loadClass(ClassLoader.java:425)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)

at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

... 58 more

At this point I was thoroughly puzzled: how could the mapreduce classes not be found? After mulling it over I saw the problem right away. My Hadoop environment configures the following variables:

export HADOOP_PREFIX="/home/hadoop/hadoop-2.2.0"

export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}

These conflict with the variables configured in sqoop-env.sh:

#add by zhanzk

export HADOOP_COMMON_HOME=/home/hadoop/hadoop-2.2.0

export HADOOP_MAPRED_HOME=/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce

export HIVE_HOME=/home/hadoop/hive-0.12.0

That conflict is why the mapreduce jars could not be found, so there is a simple workaround: copy the mapreduce-related jars into $SQOOP_HOME/lib and everything is fine:

cp /home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/*.jar /home/hadoop/sqoop-1.4.5/lib
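Copying the jars works, but an arguably cleaner alternative (a sketch I have not tested here) is to remove the conflict itself, i.e. make the shell profile's definition agree with the one in sqoop-env.sh so the jar directory actually resolves:

# in /home/hadoop/.bash_profile, replace the HADOOP_PREFIX-based definition with:
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce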

Only then was the problem truly solved. Importing the MySQL data into HDFS again, I finally found the output files under /user/hive/warehouse/book in HDFS.
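To verify from the command line (standard HDFS shell commands; the part-m-* file names are the MapReduce defaults):

hadoop fs -ls /user/hive/warehouse/book

hadoop fs -cat /user/hive/warehouse/book/part-m-00000 | head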

The import into HDFS succeeded, but the job still logged errors like the following:

15/03/16 13:07:12 INFO mapreduce.Job: Task Id : attempt_1426431271248_0007_m_000003_0, Status : FAILED

Error: java.lang.RuntimeException: java.lang.RuntimeException: java.sql.SQLException: Access denied for user 'test'@'192.168.0.2' (using password: YES)

at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:167)

at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)

at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)

at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)

at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:415)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)

at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)

Caused by: java.lang.RuntimeException: java.sql.SQLException: Access denied for user 'test'@'192.168.0.1' (using password: YES)

at org.apache.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:220)

at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:165)

... 9 more

Caused by: java.sql.SQLException: Access denied for user 'test'@'192.168.0.2' (using password: YES)

at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1094)

at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4208)

at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4140)

at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:925)

at com.mysql.jdbc.MysqlIO.proceedHandshakeWithPluggableAuthentication(MysqlIO.java:1747)

at com.mysql.jdbc.MysqlIO.doHandshake(MysqlIO.java:1287)

at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2494)

at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2527)

at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2309)

at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:834)

at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:46)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)

at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

at java.lang.reflect.Constructor.newInstance(Constructor.java:526)

at com.mysql.jdbc.Util.handleNewInstance(Util.java:408)

at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:419)

at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:344)

at java.sql.DriverManager.getConnection(DriverManager.java:571)

at java.sql.DriverManager.getConnection(DriverManager.java:215)

at org.apache.sqoop.mapreduce.db.DBConfiguration.getConnection(DBConfiguration.java:302)

at org.apache.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:213)

... 10 more

This one was simple: my database mydb had not granted the test user access from the Hadoop cluster's nodes (the hosts named in the traces above). Once those grants were in place, the errors went away.
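For the record, the grant looks something like the following, run on the MySQL server; the 192.168.0.% host pattern and the password are placeholders to adapt to your cluster:

mysql -u root -p -e "GRANT ALL PRIVILEGES ON mydb.* TO 'test'@'192.168.0.%' IDENTIFIED BY 'test'; FLUSH PRIVILEGES;"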