
Installing sqoop-1.4.6 and resolving common problems

2017-06-20 23:39
Environment: Ubuntu 16.04, hadoop-2.7.3, hbase-1.3.0, zookeeper-3.4.10, hive-2.1.1

Download sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz from the Apache Sqoop site.
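For example, the archive can be fetched directly on the server from the Apache archive (the mirror URL below is one possible source; use whichever mirror you prefer):

ubuntuserver@Master:~$ wget http://archive.apache.org/dist/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz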

Then extract it into your home directory:

ubuntuserver@Master:~$tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz

Rename the extracted directory:

ubuntuserver@Master:~$sudo mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha sqoop-1.4.6

Move it to /usr/local:

ubuntuserver@Master:~$sudo mv sqoop-1.4.6 /usr/local/

Edit the sqoop-env.sh file:

ubuntuserver@Master:/usr/local/sqoop-1.4.6/conf$sudo cp sqoop-env-template.sh sqoop-env.sh

ubuntuserver@Master:/usr/local/sqoop-1.4.6/conf$sudo vim sqoop-env.sh

Modify the file as follows:

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# included in all the hadoop scripts with source command
# should not be executable directly
# also should not be passed any arguments, since we need original $*

# Set Hadoop-specific environment variables here.

#Set path to where bin/hadoop is available
#export HADOOP_COMMON_HOME=

export HADOOP_COMMON_HOME=/usr/local/hadoop

#Set path to where hadoop-*-core.jar is available
#export HADOOP_MAPRED_HOME=

export HADOOP_MAPRED_HOME=/usr/local/hadoop

#set the path to where bin/hbase is available
#export HBASE_HOME=

export HBASE_HOME=/usr/local/hbase-1.3.0

#Set the path to where bin/hive is available
#export HIVE_HOME=

export HIVE_HOME=/usr/local/apache-hive-2.1.1-bin

#Set the path for where zookeper config dir is
#export ZOOCFGDIR=

export ZOOCFGDIR=/usr/local/zookeeper-3.4.10


Edit the environment variables:

ubuntuserver@Master:~$sudo vim /etc/profile

Add the following lines:

export SQOOP_HOME=/usr/local/sqoop-1.4.6

export PATH=$SQOOP_HOME/bin:$PATH

Save the file, then run source /etc/profile so the new environment variables take effect.
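As a quick sanity check that the PATH change has taken effect, you can ask Sqoop for its version:

ubuntuserver@Master:~$ sqoop version

It should report Sqoop 1.4.6, possibly preceded by the HCatalog/Accumulo warnings discussed next.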

Run the configure-sqoop command. If it prints warnings like these:

ubuntuserver@Master:~$ configure-sqoop
Warning: /usr/local/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
ubuntuserver@Master:~$


then edit the configure-sqoop file in the $SQOOP_HOME/bin directory and comment out the following lines:

#if [ -z "${HCAT_HOME}" ]; then
#  if [ -d "/usr/lib/hive-hcatalog" ]; then
#    HCAT_HOME=/usr/lib/hive-hcatalog
#  elif [ -d "/usr/lib/hcatalog" ]; then
#    HCAT_HOME=/usr/lib/hcatalog
#  else
#    HCAT_HOME=${SQOOP_HOME}/../hive-hcatalog
#    if [ ! -d ${HCAT_HOME} ]; then
#       HCAT_HOME=${SQOOP_HOME}/../hcatalog
#    fi
#  fi
#fi
#if [ -z "${ACCUMULO_HOME}" ]; then
#  if [ -d "/usr/lib/accumulo" ]; then
#    ACCUMULO_HOME=/usr/lib/accumulo
#  else
#    ACCUMULO_HOME=${SQOOP_HOME}/../accumulo
#  fi
#fi

## Moved to be a runtime check in sqoop.
#if [ ! -d "${HCAT_HOME}" ]; then
#  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
#  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi

#if [ ! -d "${ACCUMULO_HOME}" ]; then
#  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi
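If you prefer not to edit the script, an alternative suggested by the warning text itself is to point the two variables at any existing directory, for example in /etc/profile (the paths below are just a convenient choice, not a real HCatalog or Accumulo installation):

export HCAT_HOME=/usr/local/sqoop-1.4.6
export ACCUMULO_HOME=/usr/local/sqoop-1.4.6

Either approach only silences the warnings; it does not add HCatalog or Accumulo support.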


Next, verify that Sqoop can be used. First run start-dfs.sh and start-yarn.sh to start the Hadoop cluster, then run the hive command to enter the Hive CLI.



You can now see the table to be exported (techbbs_2017_06_18), its structure, and its contents.
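The screenshots are not reproduced here; the check amounts to a few Hive CLI commands along these lines (the table name is the one used in this walkthrough, output omitted):

hive> show tables;
hive> describe techbbs_2017_06_18;
hive> select * from techbbs_2017_06_18;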

Then log in to MySQL and create the target database and table.
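A minimal sketch of that step is below. The database and table names match the export command used later, but the column list is only a placeholder, since the real Hive table schema is not shown here; the MySQL columns must mirror the columns of techbbs_2017_06_18 in type and order:

mysql> create database techbbs;
mysql> use techbbs;
mysql> -- placeholder columns: replace with the actual columns of techbbs_2017_06_18
mysql> create table techbbs_logs_stat (logdate varchar(10), pv int, reguser int, ip int, jumper int);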



Verify that Sqoop can connect to MySQL:

Run the command: sqoop list-databases -connect jdbc:mysql://localhost:3306 -username root -password 123456



Note: in my experience, the localhost in this connection string has to match the Hive metastore configuration. If a local MySQL database is used as the metastore, write localhost; if a remote server's MySQL is used as Hive's metastore, write that server's IP address instead.
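For example, if the metastore lives on a remote MySQL server, the connection string would use that server's address instead (192.168.1.100 below is only a placeholder):

sqoop list-databases --connect jdbc:mysql://192.168.1.100:3306 --username root --password 123456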

In the output, techbbs appears among the databases (the red box in the original screenshot), which shows the connection succeeded.

Now export the data from HDFS into MySQL by running the following command:

sqoop export --connect jdbc:mysql://localhost:3306/techbbs --username root --password 123456 --table techbbs_logs_stat --fields-terminated-by '\001' --export-dir '/user/hive/warehouse/techbbs_2017_06_18'

Note that the command is entered on a single line, with no line breaks. The '\001' passed to --fields-terminated-by is Hive's default field delimiter (Ctrl-A), which is why it is used here.



The output shows that one record was successfully exported to MySQL. Now check in MySQL whether the data matches:
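The verification screenshot is not reproduced here; it amounts to selecting the rows back out of MySQL, for example:

mysql> select * from techbbs.techbbs_logs_stat;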



The data in MySQL is exactly the same as the data queried in HDFS.

If the export fails with an error like the following:

java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.fs.s3a.S3AFileSystem could not be instantiated
at java.util.ServiceLoader.fail(ServiceLoader.java:232)
at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2631)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2650)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:172)
at org.apache.sqoop.mapreduce.ExportJobBase.getInputPath(ExportJobBase.java:223)
at org.apache.sqoop.mapreduce.ExportJobBase.getInputFileType(ExportJobBase.java:505)
at org.apache.sqoop.mapreduce.JdbcExportJob.configureInputFormat(JdbcExportJob.java:68)
at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:424)
at org.apache.sqoop.manager.SqlManager.exportTable(SqlManager.java:912)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.lang.NoClassDefFoundError: com/amazonaws/AmazonServiceException
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
at java.lang.Class.getConstructor0(Class.java:3075)
at java.lang.Class.newInstance(Class.java:412)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
... 23 more
Caused by: java.lang.ClassNotFoundException: com.amazonaws.AmazonServiceException
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 28 more


Solution:

Copy the aws-java-sdk*.jar files from $HADOOP_HOME/share/hadoop/tools/lib into Sqoop's lib directory and the error goes away.
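A sketch of that copy step (adjust the paths if your Hadoop or Sqoop directories differ):

ubuntuserver@Master:~$ sudo cp $HADOOP_HOME/share/hadoop/tools/lib/aws-java-sdk*.jar /usr/local/sqoop-1.4.6/lib/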

Tags: sqoop