sqoop1.4.7的安装及使用(hadoop2.7环境)
2019-03-18 14:47
176 查看
版权声明:本文为博主原创文章,未经博主允许不得转载。 https://blog.csdn.net/qq_28356739/article/details/88638801
一、sqoop简介
Sqoop是一款开源的工具,主要用于在Hadoop(Hive)与传统的数据库(mysql、postgresql…)间进行数据的传递,可以将一个关系型数据库(例如 : MySQL ,Oracle ,Postgres等)中的数据导进到Hadoop的HDFS中,也可以将HDFS的数据导进到关系型数据库中。
说明:本测试hadoop是单机伪分布式环境,如果读者想要学习如何搭建伪分布式hadoop环境请看本人这篇文章([Hadoop+Hive+HBase+Kylin 伪分布式安装指南(https://www.geek-share.com/detail/2763619456.html)),本文章只介绍如何安装部署sqoop,后续通过sqoop将Oracle数据导入到hadoop环境实验在另外一篇文章,待测试完将会把连接放到此文章。
二、环境配置
三、安装Sqoop
1. 下载,解压到指定目录
下载连接:
点此下载
创建安装目录,通过xshell上传安装包
[root@hadoop hadoop]# pwd /hadoop [root@hadoop hadoop]# mkdir sqoop [root@hadoop hadoop]# cd sqoop/ [root@hadoop sqoop]# ls sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz [root@hadoop sqoop]# tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz [root@hadoop sqoop]# ls sqoop-1.4.7.bin__hadoop-2.6.0 sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz [root@hadoop sqoop]# rm -rf *gz [root@hadoop sqoop]# mv sqoop-1.4.7.bin__hadoop-2.6.0/* .
2、修改配置文件sqoop-env.sh
在sqoop/conf目录下有一个文件sqoop-env-template.sh,把它复制为sqoop-env.sh并修改
[root@hadoop sqoop]# cd conf/ [root@hadoop conf]# cp sqoop-env-template.sh sqoop-env.sh #Set path to where bin/hadoop is available [root@hadoop conf]# vim sqoop-env.sh 根据自己情况修改,另外,你还装了Zookeeper的话则最后一句也要配置。 export HADOOP_COMMON_HOME=/hadoop/ #Set path to where hadoop-*-core.jar is available export HADOOP_MAPRED_HOME=/hadoop/ #set the path to where bin/hbase is available export HBASE_HOME=/hadoop/hbase/ #Set the path to where bin/hive is available export HIVE_HOME=/hadoop/hive #Set the path for where zookeper config dir is #export ZOOCFGDIR=
3. 配置环境变量
我测试用户为root用户,直接修改/etc/profile加入下面内容:
export SQOOP_HOME=/hadoop/sqoop export PATH=$PATH:${SQOOP_HOME}/bin export CLASSPATH=$CLASSPATH:${SQOOP_HOME}/lib
然后使环境变量生效
[root@hadoop conf]# source /etc/profile
4. 复制相关依赖包到$SQOOP_HOME/lib
因为我是将Oracle数据导入到hive,所以复制环境数据库所在虚拟机(195.168.1.6)的Oracle的OJDBC包到/hadoop/sqoop/lib下
[oracle@source ~]$ cd $ORACLE_HOME/jdbc/lib [oracle@source lib]$ pwd /u01/app/oracle/product/11.2.0/db_1/jdbc/lib [oracle@source lib]$ ls ojdbc5dms_g.jar ojdbc5_g.jar ojdbc6dms_g.jar ojdbc6_g.jar simplefan.jar ojdbc5dms.jar ojdbc5.jar ojdbc6dms.jar ojdbc6.jar 上面是数据库所在虚拟机Oraclejar包位置及信息。将ojdbc包传到hadoop虚拟机 [oracle@source lib]$ scp ojdbc6.jar root@192.168.1.66:/hadoop/sqoop/lib root@192.168.1.66's password: ojdbc6.jar 100% 2675KB 2.6MB/s 00:00
5、修改$SQOOP_HOME/bin/configure-sqoop
注释掉HCatalog,Accumulo检查(除非你准备使用HCatalog,Accumulo等HADOOP上的组件)
##Moved to be a runtime check in sqoop. #if[ ! -d "${HCAT_HOME}" ]; then # echo "Warning: $HCAT_HOME does notexist! HCatalog jobs will fail." # echo 'Please set $HCAT_HOME to the root ofyour HCatalog installation.' #fi #if[ ! -d "${ACCUMULO_HOME}" ]; then # echo "Warning: $ACCUMULO_HOME does notexist! Accumulo imports will fail." # echo 'Please set $ACCUMULO_HOME to the rootof your Accumulo installation.' #fi #Add HCatalog to dependency list #if[ -e "${HCAT_HOME}/bin/hcat" ]; then # TMP_SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:`${HCAT_HOME}/bin/hcat-classpath` # if [ -z "${HIVE_CONF_DIR}" ]; then # TMP_SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}:${HIVE_CONF_DIR} # fi # SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH} #fi #Add Accumulo to dependency list #if[ -e "$ACCUMULO_HOME/bin/accumulo" ]; then # for jn in `$ACCUMULO_HOME/bin/accumuloclasspath | grep file:.*accumulo.*jar |cut -d':' -f2`; do # SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn # done # for jn in `$ACCUMULO_HOME/bin/accumuloclasspath | grep file:.*zookeeper.*jar |cut -d':' -f2`; do # SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn # done #fi
6、 测试与Oracle的连接
[root@hadoop sqoop]# pwd /hadoop/sqoop [root@hadoop sqoop]# sqoop list-databases --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password tiger Warning: /hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation. 19/03/18 14:25:57 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7 19/03/18 14:25:57 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consi der using -P instead.19/03/18 14:25:57 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled. 19/03/18 14:25:57 INFO manager.SqlManager: Using default fetchSize of 1000 19/03/18 14:25:58 INFO manager.OracleManager: Time zone has been set to GMT SYS SYSTEM SCOTT TEST ADMRG OGG OUTLN MGMT_VIEW FLOWS_FILES MDSYS ORDSYS EXFSYS DBSNMP WMSYS APPQOSSYS APEX_030200 OWBSYS_AUDIT ORDDATA CTXSYS ANONYMOUS SYSMAN XDB ORDPLUGINS OWBSYS SI_INFORMTN_SCHEMA OLAPSYS ORACLE_OCM XS$NULL BI PM MDDATA IX SH DIP OE APEX_PUBLIC_USER HR SPATIAL_CSW_ADMIN_USR SPATIAL_WFS_ADMIN_USR
相关文章推荐
- 在VMWare Workstation8.0上使用ubuntu11安装和配置Hadoop群集环境03_hadoop的安装
- Sqoop1.4.5-安装测试部署(基于Hadoop2.2.0环境)
- Sqoop安装与使用(sqoop-1.4.5 on hadoop 1.0.4)
- Hadoop2.7.3环境下Sqoop1.99.5安装
- 在VMWare Workstation上使用RedHat Linux安装和配置Hadoop群集环境04_Hadoop的安装
- 【hadoop sqoop】sqoop1的安装和使用
- linux环境下Hadoop的安装与使用
- mac环境 安装和使用sqoop
- 在VMWare Workstation上使用RedHat Linux安装和配置Hadoop群集环境01_虚拟机的安装
- hadoop2.7环境的编译安装
- hadoop集群安装以及Hive、sqoop的使用
- 基于hadoop2.7集群的Spark2.0,Sqoop1.4.6,Mahout0.12.2完全分布式安装
- hadoop-2.7 在windows环境下安装
- VMWare Workstation上使用RedHat Linux安装和配置Hadoop群集环境03_配置虚拟机之间SSH无密码登录
- 在VMWare Workstation上使用RedHat Linux安装和配置Hadoop群集环境01_虚拟机的安装
- 在VMWare Workstation上使用RedHat Linux安装和配置Hadoop群集环境06_WordCount示例
- python2.7虚拟环境virtualenv安装及使用
- 在VMWare Workstation8.0上使用ubuntu11安装和配置Hadoop群集环境02_配置虚拟机之间SSH无密码登录
- 【心血之作】linux虚拟机下安装配置Hadoop(完全分布式)生态环境(hadoop2.2.0,HBase0.98,Hive0.13(连接oracle),sqoop1.4.4(连接oracle)
- hadoop(八) - sqoop安装与使用