
Big Data Basics (2): Installing Hadoop, Maven, HBase, Hive, and Sqoop on Ubuntu 14.04.4, and Sqoop Import/Export with HDFS, Hive, and MySQL

Installing Hadoop, Maven, HBase, Hive, and Sqoop on Ubuntu 14.04.4

2016.05.15

Test environment for this article:

Hadoop 2.6.2, Ubuntu 14.04.4 amd64, JDK 1.8

Versions installed:

Maven 3.3.9, HBase 1.1.5, Hive 1.2.1, Sqoop2 (1.99.6), and Sqoop1 (1.4.6)

This article also draws on several other posts; links to the originals are given where they are used.

Prerequisite: a working Hadoop installation.

Reference: http://blog.csdn.net/xanxus46/article/details/45133977

The installation steps here can support basic Hadoop log analysis; for a detailed tutorial on that, see:
http://www.cnblogs.com/edisonchou/p/4449082.html
I. Maven

1. Install the JDK

2. Download:
http://maven.apache.org/download.cgi
wget http://mirrors.cnnic.cn/apache/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
3. Extract:

tar -xzf apache-maven-3.3.9-bin.tar.gz

4. Configure environment variables

vi ~/.bashrc

export MAVEN_HOME=/usr/local/maven/apache-maven-3.3.9

export PATH=$MAVEN_HOME/bin:$PATH

Apply the changes:

source ~/.bashrc

5. Verify

$mvn --version

Result:

root@spark:/usr/local/maven/apache-maven-3.3.9# mvn --version

Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-11T00:41:47+08:00)

Maven home: /usr/local/maven/apache-maven-3.3.9

Java version: 1.8.0_65, vendor: Oracle Corporation

Java home: /usr/lib/java/jdk1.8.0_65/jre

Default locale: en_HK, platform encoding: UTF-8

OS name: "linux", version: "3.19.0-58-generic", arch: "amd64", family: "unix"

root@spark:/usr/local/maven/apache-maven-3.3.9#
http://www.linuxidc.com/Linux/2015-03/114619.htm
II. HBase

1. Download:
http://mirrors.hust.edu.cn/apache/hbase/stable/ http://mirrors.hust.edu.cn/apache/hbase/stable/hbase-1.1.5-bin.tar.gz
2. Extract:

HBase can be installed in three modes: standalone, pseudo-distributed, and fully distributed. Only the fully distributed mode is covered here. It assumes a Hadoop cluster and ZooKeeper are already installed and running correctly.

Step 1: download the package, extract it to a suitable location, and assign ownership to the user that runs Hadoop (for example, root).

The version downloaded here is hbase-1.1.5, while the Hadoop cluster runs 2.6; extract it under /usr/local:

tar -zxvf hbase-1.1.5-bin.tar.gz

mkdir /usr/local/hbase

mv hbase-1.1.5 /usr/local/hbase

cd /usr/local

chmod -R 775 hbase

chown -R root: hbase

3. Environment variables

$vi ~/.bashrc

export HBASE_HOME=/usr/local/hbase/hbase-1.1.5

export PATH=$HBASE_HOME/bin:$PATH

source ~/.bashrc

4. Configuration files

4.1 JDK [a default JDK is picked up, so this can be left unchanged]

vim /usr/local/hbase/hbase-1.1.5/conf/hbase-env.sh

Set JAVA_HOME to your JDK installation directory (in this environment, /usr/lib/java/jdk1.8.0_65).

4.2 hbase-site.xml

/usr/local/hbase/hbase-1.1.5/conf/hbase-site.xml

<configuration>

<property>

<name>hbase.rootdir</name>

<value>hdfs://spark:9000/hbase</value>

</property>

<property>

<name>hbase.cluster.distributed</name>

<value>true</value>

</property>

</configuration>
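Since fully distributed mode relies on ZooKeeper, hbase-site.xml usually also has to point at the quorum. A minimal sketch, where the host names spark, slave1, and slave2 are placeholders for your own nodes:

<property>
<name>hbase.zookeeper.quorum</name>
<value>spark,slave1,slave2</value>
</property>

If ZooKeeper is managed outside of HBase, also set export HBASE_MANAGES_ZK=false in conf/hbase-env.sh.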

5. Verify

Start Hadoop first:

sbin/start-dfs.sh

sbin/start-yarn.sh

$hbase shell

Result:

root@spark:/usr/local/hbase/hbase-1.1.5/bin# hbase shell

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/local/hbase/hbase-1.1.5/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/local/hadoop/hadoop-2.6.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

HBase Shell; enter 'help<RETURN>' for list of supported commands.

Type "exit<RETURN>" to leave the HBase Shell

Version 1.1.5, r239b80456118175b340b2e562a5568b5c744252e, Sun May 8 20:29:26 PDT 2016

hbase(main):001:0>
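Beyond just opening the shell, a quick smoke test can be run from the hbase prompt; the table and column family names below are only placeholders:

hbase(main):001:0> create 'smoke_test', 'cf'
hbase(main):002:0> put 'smoke_test', 'row1', 'cf:msg', 'hello'
hbase(main):003:0> scan 'smoke_test'
hbase(main):004:0> disable 'smoke_test'
hbase(main):005:0> drop 'smoke_test'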
http://blog.csdn.net/xanxus46/article/details/45133977
Cluster installation: http://blog.sina.com.cn/s/blog_6145ed810102vtws.html

III. Hive

1. Download: http://apache.fayea.com/hive/stable/
http://apache.fayea.com/hive/stable/apache-hive-1.2.1-bin.tar.gz
2. Extract:

tar xvzf apache-hive-1.2.1-bin.tar.gz

3. Environment variables

root@spark:/home/alex/xdowns# vi ~/.bashrc

export HIVE_HOME=/usr/local/hive/apache-hive-1.2.1-bin

export PATH=$PATH:$HIVE_HOME/bin

root@spark:/home/alex/xdowns# source ~/.bashrc

4. Edit the configuration files

First, copy hive-env.sh.template and hive-default.xml.template and rename them to hive-env.sh and hive-site.xml, for example as shown below.
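A minimal sketch of that copy step, assuming Hive was extracted to /usr/local/hive/apache-hive-1.2.1-bin as above:

cd /usr/local/hive/apache-hive-1.2.1-bin/conf
cp hive-env.sh.template hive-env.sh
cp hive-default.xml.template hive-site.xml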

Edit /usr/local/hive/apache-hive-1.2.1-bin/conf/hive-env.sh as follows (paths adjusted to this installation):

export HADOOP_HEAPSIZE=1024

# Set HADOOP_HOME to point to a specific hadoop install directory

HADOOP_HOME=/usr/local/hadoop/hadoop-2.6.2

# Hive Configuration Directory can be controlled by:

export HIVE_CONF_DIR=/usr/local/hive/apache-hive-1.2.1-bin/conf

# Folder containing extra libraries required for hive compilation/execution can be controlled by:

export HIVE_AUX_JARS_PATH=/usr/local/hive/apache-hive-1.2.1-bin/lib

Edit /usr/local/hive/apache-hive-1.2.1-bin/conf/hive-site.xml as follows:

<property>

<name>hive.metastore.warehouse.dir</name>

<value>hdfs://spark:9000/hbase</value>

</property>

<property>

<name>hive.querylog.location</name>

<value>/usr/local/hive/log</value>

<description>

Directory where Hive log files are stored

</description>

</property>

5. Connect to MySQL [optional]

5.1 Stop MySQL from binding only to localhost

By default MySQL only allows local logins, so edit the configuration file and comment out the bind-address line:

vi /etc/mysql/my.cnf

#bind-address = 127.0.0.1

5.2 Restart MySQL: service mysql restart

5.3 Log in to MySQL: mysql -uroot -proot

Create the hive database:

create database hive;

show databases;

mysql> show databases;

+--------------------+

| Database |

+--------------------+

| information_schema |

| hive |

| mysql |
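Optionally, a dedicated metastore account can be created instead of letting Hive connect as root; the user name and password below are placeholders (the rest of this article keeps using root):

mysql> CREATE USER 'hive'@'%' IDENTIFIED BY 'hivepassword';
mysql> GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%';
mysql> FLUSH PRIVILEGES;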

5.4 Edit the Hive configuration file hive-site.xml

Modify the following properties:

<configuration>

<property>

<name>javax.jdo.option.ConnectionURL</name>

<value>jdbc:mysql://192.168.10.180:3306/hive?characterEncoding=UTF-8</value>

</property>

<property>

<name>javax.jdo.option.ConnectionDriverName</name>

<value>com.mysql.jdbc.Driver</value>

</property>

<property>

<name>javax.jdo.option.ConnectionUserName</name>

<value>root</value>

</property>

<property>

<name>javax.jdo.option.ConnectionPassword</name>

<value>alextong</value>

</property>

</configuration>

5.5 Copy the MySQL JDBC driver jar into Hive's lib directory

The version downloaded here is mysql-connector-java-5.0.8-bin.jar:
http://dev.mysql.com/downloads/connector/j/5.0.html
tar xvzf mysql-connector-java-5.0.8.tar.gz

mv mysql-connector-java-5.0.8-bin.jar apache-hive-1.2.1-bin/lib
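With the driver in place, the metastore schema can also be initialized up front with Hive's schematool instead of relying on it being created on first use; a sketch, assuming the install path above:

/usr/local/hive/apache-hive-1.2.1-bin/bin/schematool -dbType mysql -initSchema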

6. Verify:

6.1 Start Hadoop first

start-dfs.sh

start-yarn.sh

6.2 hive

$ hive

... (startup messages) ...

hive>

6.3 Create a table in Hive

6.3.1

hive> show databases;

OK

default

Time taken: 1.078 seconds, Fetched: 1 row(s)

hive>

6.3.2 Create a table

hive> create table test(id int, name string);

6.4 Verify in MySQL

Log in to MySQL and check the metastore information:

use hive;

show tables;

mysql> use hive;

Reading table information for completion of table and column names

You can turn off this feature to get a quicker startup with -A

Database changed

mysql> show tables;

+---------------------------+

| Tables_in_hive |

+---------------------------+

| BUCKETING_COLS |

| CDS |

| COLUMNS_V2 |

| DATABASE_PARAMS |

| DBS |

| FUNCS |

| FUNC_RU |

| GLOBAL_PRIVS |

| PARTITIONS |

| PARTITION_KEYS |

| PART_COL_STATS |

| ROLES |

| SDS |

| SD_PARAMS |

| SEQUENCE_TABLE |

| SERDES |

| SERDE_PARAMS |

| SKEWED_COL_NAMES |

| SKEWED_COL_VALUE_LOC_MAP |

| SKEWED_STRING_LIST |

| SKEWED_STRING_LIST_VALUES |

| SKEWED_VALUES |

| SORT_COLS |

| TABLE_PARAMS |

| TAB_COL_STATS |

| TBLS |

| VERSION |

+---------------------------+

select * from TBLS;

Success.

6.5 Detailed verification

6.5.1 Create a test file

root@spark:~# vi add.txt

5

2

:wq

6.5.2 Upload to HDFS

root@spark:~# hadoop fs -put /home/alex/xdowns/add.txt /user

root@spark:~# hadoop fs -ls /user

-rw-r--r-- 1 root supergroup 148 2016-05-15 16:03 /user/add.txt

6.5.3 Create a table in Hive

hive> create table tester(id int);

OK

Time taken: 0.301 seconds

6.5.4 hive load

a. A file on HDFS

hive> load data inpath 'hdfs://spark:9000/user/add.txt' into table tester;

Loading data to table default.tester

Table default.tester stats: [numFiles=1, totalSize=3]

OK

After the load completes, the source file is automatically removed from its original HDFS location (it is moved into the Hive warehouse).

6.5.5 Query the result with select in Hive

hive> select * from tester;

OK

5

2

Time taken: 0.313 seconds, Fetched: 2 row(s)

hive>

6.5.6 Query results in MySQL

mysql> SELECT * FROM TBLS;

+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+

| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |

+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+

| 1 | 1463298658 | 1 | 0 | root | 0 | 1 | test | MANAGED_TABLE | NULL | NULL |

| 2 | 1463299661 | 1 | 0 | root | 0 | 2 | testadd | MANAGED_TABLE | NULL | NULL |

| 6 | 1463300857 | 2 | 0 | root | 0 | 6 | testadd | MANAGED_TABLE | NULL | NULL |

| 11 | 1463301301 | 1 | 0 | root | 0 | 11 | test_add | MANAGED_TABLE | NULL | NULL |

| 12 | 1463301398 | 1 | 0 | root | 0 | 12 | tester | MANAGED_TABLE | NULL | NULL |

+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+

5 rows in set (0.01 sec)

b. If the file is local

hive> load data local inpath 'add.txt' into table testadd;

Exit with quit;

7. Troubleshooting

7.1 java.io.tmpdir

Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D

Fix:
http://blog.csdn.net/zwx19921215/article/details/42776589
Replace every entry in hive-site.xml that contains system:java.io.tmpdir with an absolute path, e.g. /usr/local/hive/log:

<property>
<name>hive.exec.local.scratchdir</name>
<value>/usr/local/hive/log</value>
<description>Local scratch space for Hive jobs</description>
</property>

<property>
<name>hive.downloaded.resources.dir</name>
<value>/usr/local/hive/log</value>
</property>

<property>
<name>hive.querylog.location</name>
<value>/usr/local/hive/log</value>
<description>Location of Hive run time structured log file</description>
</property>

7.2 jline

[ERROR] Terminal initialization failed; falling back to unsupported

java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected

Fix:
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started http://stackoverflow.com/questions/28997441/hive-startup-error-terminal-initialization-failed-falling-back-to-unsupporte
vi ~/.bashrc

export HADOOP_USER_CLASSPATH_FIRST=true

source ~/.bashrc
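If setting HADOOP_USER_CLASSPATH_FIRST is not enough, another widely used workaround is to swap the old jline jar under Hadoop's YARN lib for the newer one shipped with Hive; the exact jar versions below are an assumption and may differ on your system:

# back up the old jline bundled with Hadoop
mv /usr/local/hadoop/hadoop-2.6.2/share/hadoop/yarn/lib/jline-0.9.94.jar /usr/local/hadoop/hadoop-2.6.2/share/hadoop/yarn/lib/jline-0.9.94.jar.bak
# copy the newer jline from Hive's lib
cp /usr/local/hive/apache-hive-1.2.1-bin/lib/jline-2.12.jar /usr/local/hadoop/hadoop-2.6.2/share/hadoop/yarn/lib/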

7.3 Character-set problem

For direct MetaStore DB connections, we don't support retries at the client level.

When creating a table in Hive, the following error is reported:

create table years (year string, event string) row format delimited fields terminated by '\t';

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:For direct MetaStore DB connections, we don't support retries at the client level.)

This is a character-set problem; configure MySQL's character set:

mysql> alter database hive character set latin1;

IV. Installing Sqoop2 (for Sqoop1, see Part V below)

(It is best to have HBase and Hive installed before installing Sqoop.)

Sqoop 1.4.6 also supports Hadoop 2.6.2.

1. Download:
http://mirror.bit.edu.cn/apache/sqoop/1.99.6/ http://mirror.bit.edu.cn/apache/sqoop/1.99.6/sqoop-1.99.6-bin-hadoop200.tar.gz
2. Extract:

tar xvzf sqoop-1.99.6-bin-hadoop200.tar.gz

3. Environment variables

vi ~/.bashrc

export SQOOP_HOME=/usr/local/sqoop/sqoop-1.99.6-bin-hadoop200

export PATH=$SQOOP_HOME/bin:$PATH

export CATALINA_HOME=$SQOOP_HOME/server

export LOGDIR=$SQOOP_HOME/logs

source ~/.bashrc

4. Configuration files

4.1 Configure ${SQOOP_HOME}/server/conf/catalina.properties [including the Hive jars]

Find the common.loader line, remove all of the existing hadoop and hive jar paths from it, and add the jar paths of the local Hadoop 2 installation [keep everything on one line, do not insert line breaks]:

common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/yarn/lib/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/yarn/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/hdfs/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/hdfs/lib/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/mapreduce/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/mapreduce/lib/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/tools/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/tools/lib/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/common/lib/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/common/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/httpfs/tomcat/lib/*.jar,/usr/local/hive/apache-hive-1.2.1-bin/lib/*.jar

[If you also need to import into Hive or HBase, the corresponding jars must be added as well.

Because the jars added above already include log4j, rename Sqoop's own log4j jar to avoid a conflict:

[grid@hadoop6 sqoop-1.99.3]$ mv ./server/webapps/sqoop/WEB-INF/lib/log4j-1.2.16.jar ./server/webapps/sqoop/WEB-INF/lib/log4j-1.2.16.jar.bak]

4.2 Configure ${SQOOP_HOME}/server/conf/sqoop.properties

# Hadoop configuration directory

org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/usr/local/hadoop/hadoop-2.6.2/etc/hadoop/

5. Replace @LOGDIR@ and @BASEDIR@ with actual directories [optional]:

/usr/local/sqoop/sqoop-1.99.6-bin-hadoop200/base

/usr/local/sqoop/sqoop-1.99.6-bin-hadoop200/logs

6. JDBC driver

Then locate your database's JDBC driver and copy it into the sqoop lib directory, creating the directory if it does not exist.

Download the MySQL driver mysql-connector-java-5.1.16-bin.jar and place it under /usr/local/sqoop/sqoop-1.99.6-bin-hadoop200/server/lib.

7. Start

7.1 Start Hadoop first

./start-dfs.sh

./start-yarn.sh

7.2 Start Sqoop

7.2.1 Start the server: [root@db12c sqoop]# ./bin/sqoop.sh server start

Sqoop home directory: /home/likehua/sqoop/sqoop

Setting SQOOP_HTTP_PORT: 12000

Setting SQOOP_ADMIN_PORT: 12001

Using CATALINA_OPTS:

Adding to CATALINA_OPTS: -Dsqoop.http.port=12000 -Dsqoop.admin.port=12001

Using CATALINA_BASE: /home/likehua/sqoop/sqoop/server

Using CATALINA_HOME: /home/likehua/sqoop/sqoop/server

Using CATALINA_TMPDIR: /home/likehua/sqoop/sqoop/server/temp

Using JRE_HOME: /usr/local/jdk1.7.0

Using CLASSPATH: /home/likehua/sqoop/sqoop/server/bin/bootstrap.jar

(The Sqoop server is a service that runs on Tomcat.)

[To stop the Sqoop server: ./bin/sqoop.sh server stop]

7.2.2 Start the Sqoop client:

Note: if sqoop2-shell prints warnings about Hadoop jars, the jar paths from step 4.1 are incomplete or wrong; redo that configuration, and make sure common.loader stays on a single line.

Also, Sqoop2 (1.99.x) no longer has some of the old commands; for example, typing sqoop on its own will not open the shell.

[root@db12c sqoop]# bin/sqoop.sh client

Sqoop home directory: /home/likehua/sqoop/sqoop

Sqoop Shell: Type 'help' or '\h' for help.

sqoop:000> show version --all

(show version --all displays the versions; show connector --all lists the connectors; create connection --cid 1 creates a connection)

client version:

Sqoop 1.99.3 revision 2404393160301df16a94716a3034e31b03e27b0b

Compiled by mengweid on Fri Oct 18 14:15:53 EDT 2013

server version:

Sqoop 1.99.3 revision 2404393160301df16a94716a3034e31b03e27b0b

Compiled by mengweid on Fri Oct 18 14:15:53 EDT 2013

Protocol version:

[1]

sqoop:000>

Main reference: http://www.th7.cn/db/nosql/201510/134172.shtml
http://www.cnblogs.com/likehua/p/3825489.html
Type exit to quit the shell.

8. Exporting data from Hive to MySQL with Sqoop

Start the client:

cd /usr/local/sqoop/sqoop-1.99.6-bin-hadoop200/bin

./sqoop2-shell

Point the client at the server:

sqoop:000> set server --host spark --port 12000 --webapp sqoop

Server is set successfully
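From here, a transfer is defined by creating links and a job and then starting it. The exact shell syntax differs between 1.99.x releases, so the following is only a rough sketch assuming the 1.99.6 conventions; the connector and link ids must come from your own show connector / show link output:

sqoop:000> show connector
sqoop:000> create link -c 1    # e.g. a generic-jdbc-connector link pointing at MySQL
sqoop:000> create link -c 3    # e.g. an hdfs-connector link
sqoop:000> show link
sqoop:000> create job -f 1 -t 2    # -f/-t are the from/to link ids
sqoop:000> start job -j 1
sqoop:000> status job -j 1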

V. Installing Sqoop1

Installing Sqoop 1.4.6 on Hadoop 2.6.2, with Hive, HDFS, and MySQL import/export

Environment:

Ubuntu 14.04.4 amd64, JDK 1.8, Hadoop 2.6.2, Hive, HBase

Note: Sqoop2 (1.99.6) is still somewhat awkward to use and may be worth revisiting once it matures, while Sqoop1 (1.4.6) is simpler to operate. Either one can be chosen.

For installing Sqoop2, see Part IV above.

Reference: http://www.tuicool.com/articles/FZRJbuz

1. Download:
http://www.apache.org/dyn/closer.lua/sqoop/1.4.6 http://apache.fayea.com/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
2. Extract:

tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz

3. Configure

cd /usr/local/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/conf

cp sqoop-env-template.sh sqoop-env.sh

vi sqoop-env.sh

Add the following [if Hive, HBase, and ZooKeeper are installed, set their paths as well, ideally all of them; the values below come from the referenced article, so adjust them to your own installation, e.g. /usr/local/hadoop/hadoop-2.6.2]:

#Set path to where bin/hadoop is available

export HADOOP_COMMON_HOME=/home/hadoop/hadoop

#Set path to where hadoop-*-core.jar is available

export HADOOP_MAPRED_HOME=/home/hadoop/hadoop

#set the path to where bin/hbase is available

export HBASE_HOME=/home/hadoop/hbase

#Set the path to where bin/hive is available

export HIVE_HOME=/home/hadoop/hive

#Set the path for where zookeper config dir is

export ZOOCFGDIR=/home/hadoop/zookeeper

4. Add the MySQL connector jar

cp ~/hive/lib/mysql-connector-java-5.1.30.jar ~/sqoop/lib/

Or download one yourself and place it in the corresponding directory.

5. Add environment variables

vi ~/.bashrc

export SQOOP_HOME=/home/hadoop/sqoop

export PATH=$PATH:$SQOOP_HOME/bin

export CLASSPATH=$CLASSPATH:$SQOOP_HOME/lib

source ~/.bashrc

6. Test the connection to MySQL

sqoop list-databases --connect jdbc:mysql://127.0.0.1:3306/ --username root -P

This error appears:

Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.

Add ZOOKEEPER_HOME:

vi ~/.bashrc

export ZOOKEEPER_HOME=/opt/zookeeper/zookeeper

export PATH=${ZOOKEEPER_HOME}/bin:$PATH

Test again:

sqoop list-databases --connect jdbc:mysql://127.0.0.1:3306/ --username root -P

Warnings about ACCUMULO_HOME and the like still appear; ignore them and enter the MySQL password:

Enter password:

2016-05-18 19:16:15,336 INFO [main] manager.MySQLManager: Preparing to use a MySQL streaming resultset.

information_schema

hive

mysql

performance_schema

test_hdfs

7. Importing a MySQL table into HDFS

Note: start Hadoop first, otherwise you will get connection refused errors.

root@spark:~# /usr/local/hadoop/hadoop-2.6.2/sbin/start-dfs.sh

root@spark:~# /usr/local/hadoop/hadoop-2.6.2/sbin/start-yarn.sh

Then:

root@spark:/usr/local/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib# sqoop import -m 1 --connect jdbc:mysql://127.0.0.1:3306/traincorpus --driver com.mysql.jdbc.Driver --username root -P --table testtable --target-dir /user/test111

Notes:

-m 1 sets the number of map tasks.

--target-dir must not already exist, otherwise Sqoop complains that the directory exists; add --delete-target-dir to have it removed automatically.

If you see an error like 'xxx streaming xxx .close()', add --driver com.mysql.jdbc.Driver.

References: http://stackoverflow.com/questions/26375269/sqoop-error-manager-sqlmanager-error-reading-from-database-java-sql-sqlexcept
http://www.cognoschina.net/home/space.php?uid=173321&do=blog&id=121081
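Once the import finishes, the result can be checked directly on HDFS; the part file name below is the usual map output name and may differ:

root@spark:~# hadoop fs -ls /user/test111
root@spark:~# hadoop fs -cat /user/test111/part-m-00000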
8. Importing a MySQL table into Hive

Just append --hive-import to the command from step 7; note that the imported data ends up under the warehouse path specified in hive-site.xml.

root@spark:/usr/local/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib# sqoop import -m 1 --connect jdbc:mysql://127.0.0.1:3306/traincorpus --driver com.mysql.jdbc.Driver --username root -P --table testtable --target-dir /user/test222 --hive-import

9. Exporting from HDFS to MySQL

Use sqoop export; --export-dir is the HDFS path and the rest is the same as in step 7. Note that --table must be an empty table created in MySQL beforehand (a sketch of creating one follows the command below).

root@spark:/usr/local/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib# sqoop export --connect jdbc:mysql://127.0.0.1:3306/traincorpus --driver com.mysql.jdbc.Driver --username root -P --table testtable --export-dir /user/test111
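For reference, creating such an empty target table looks roughly like this; the table name testtable_export and its single int column are hypothetical and should match the columns of the data being exported (pass the name via --table):

mysql> USE traincorpus;
mysql> CREATE TABLE testtable_export (id INT);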

10. Exporting from Hive to MySQL

Same as exporting from HDFS to MySQL; again, --table must be an empty table created in MySQL beforehand.

root@spark:/usr/local/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib# sqoop export --connect jdbc:mysql://127.0.0.1:3306/traincorpus --driver com.mysql.jdbc.Driver --username root -P --table testtable --export-dir /user/test222

A second example:

root@spark:/usr/local/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib# sqoop export --connect jdbc:mysql://192.168.10.180:3306/traincorpus --driver com.mysql.jdbc.Driver --username root -P --table testtable3 --export-dir /hbase/tester

2016-05-18 20:57:29,927 INFO [main] mapreduce.ExportJobBase: Transferred 125 bytes in 50.8713 seconds (2.4572 bytes/sec)

2016-05-18 20:57:29,968 INFO [main] mapreduce.ExportJobBase: Exported 2 records.

11. Sqoop job

sqoop job --create myjob -- import --connect jdbc:mysql://192.168.10.180:3306/test --username root --password 123456 --table mytabs --fields-terminated-by '\t'

Here myjob is the job name. By default Sqoop still asks for the password when the job is invoked; to store the password in the job so it can be executed without a prompt next time, uncomment sqoop.metastore.client.record.password in conf/sqoop-site.xml.

Other job commands: (1) sqoop job --list shows the list of jobs; (2) sqoop job --delete myjob deletes a job. Two more are shown below.
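A couple more standard Sqoop1 job subcommands that are useful here:

sqoop job --show myjob    # display the saved job definition
sqoop job --exec myjob    # run the saved job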

See also:
http://www.th7.cn/db/mysql/201405/54683.shtml
#########################################

Troubleshooting references:

Hive problems and solutions

1. After hiveserver2 starts, beeline cannot connect:

Cause: permissions.

Solution:

/user/hive/warehouse

/tmp

/history (if a job server is configured, /history needs adjusting as well)

Hive reads and writes these three directories at runtime, so open up their permissions:

hadoop fs -chmod -R 777 /tmp

hadoop fs -chmod -R 777 /user/hive/warehouse

2. beeline connection refused errors

Cause: a known upstream bug.

Solution: adjust the following properties in hive-site.xml:

hive.server2.long.polling.timeout

hive.server2.thrift.bind.host (set the host to your own hostname)

3. Character set, garbled text, and display-length problems

Cause: a character-set mismatch.

Solution: in the MySQL database configured in hive-site.xml, run: alter database hive character set latin1;

(The error looks like the screenshot attached to the original post.)

4.FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:For direct MetaStore DB connections, we don’t support retries at the client level.)

This happened because my MySQL is not local (a local database is assumed by default), so a remote metastore server has to be configured:

Add the following to hive-site.xml:

<property>
<name>hive.metastore.uris</name>
<value>thrift://lza01:9083</value>
<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>

Then start the metastore service on the Hive server: hive --service metastore

5.FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:javax.jdo.JDODataStoreException: An exception was thrown while adding/validating class(es) : Specified key was too long; max key length is 767 bytes

Change MySQL's character set:

alter database hive character set latin1;

Reprinted from Yunfan Big Data Academy (http://www.yfteach.com), "Hive installation problems and solutions".



1 Cannot execute statement: impossible to write to binary log since BINLOG_FORMAT = STATEMENT…

When starting Hive, the following error is reported:

Caused by: javax.jdo.JDOException: Couldnt obtain a new sequence (unique id) : Cannot execute statement: impossible to write to binary log since BINLOG_FORMAT = STATEMENT and at least one table uses a storage engine limited to row-based logging. InnoDB is limited
to row-logging when transaction isolation level is READ COMMITTED or READ UNCOMMITTED.

NestedThrowables:

java.sql.SQLException: Cannot execute statement: impossible to write to binary log since BINLOG_FORMAT = STATEMENT and at least one table uses a storage engine limited to row-based logging. InnoDB is limited to row-logging when transaction isolation level is
READ COMMITTED or READ UNCOMMITTED.


This problem is caused by a misconfiguration of the MySQL database backing the Hive metastore; it can be solved like this:

mysql> set global binlog_format='MIXED';
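set global only lasts until MySQL restarts; to make the change permanent, the same setting can also go into the MySQL configuration file (then restart MySQL), roughly:

# /etc/mysql/my.cnf, under the [mysqld] section
binlog_format = MIXED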


2 For direct MetaStore DB connections, we don't support retries at the client level.

This is the same character-set problem covered in section 7.3 above; fix it with:

mysql> alter database hive character set latin1;

3 HiveConf of name hive.metastore.local does not exist

The following warning appears when running the Hive client:

WARN conf.HiveConf: HiveConf of name hive.metastore.local does not exist


This is because the hive.metastore.local property is no longer used in Hive 0.10, 0.11, and later versions. Simply remove it from hive-site.xml.

4 Permission denied: user=anonymous, access=EXECUTE, inode="/tmp"

The following error is reported when starting Hive:

(Permission denied: user=anonymous, access=EXECUTE, inode="/tmp":hadoop:supergroup:drwx------


This is because Hive lacks permission on the HDFS /tmp directory; grant it:

hadoop dfs -chmod -R 777 /tmp


5 To be continued
http://blog.csdn.net/cjfeii/article/details/49363653
Tags: hadoop hbase maven hive sqoop