
Four Ways to Export Data from a Hive Table, and How to Customize the Export Column Delimiter

2014-12-29 15:00
Questions this post addresses:

1. What are the four ways to export data from a Hive table?

2. What does the LOCAL keyword do in an export command, and what changes when it is omitted?

3. Can an export command use INTO the way an import command does?

4. How do you customize the column delimiter of exported files?

5. What do Hive's -e and -f options do, and how can they be used to export data?

6. What does the source command do in the Hive shell?

We will use the test1 table to demonstrate the export methods. Its contents are as follows:

hive> select * from test1;
OK
Tom     24.0    NanJing Nanjing University
Jack    29.0    NanJing Southeast China University
Mary Kake       21.0    SuZhou  Suzhou University
John Doe        24.0    YangZhou        YangZhou University
Bill King       23.0    XuZhou  Xuzhou Normal University
Time taken: 0.064 seconds, Fetched: 5 row(s)


1. Exporting data to the local file system

(1) Export the data

hive> insert overwrite local directory "/home/hadoopUser/data"
> select name,age,address
> from test1;


/home/hadoopUser/data is a directory; one or more files will be written into it, the exact number depending on how many reducers the query invokes. The LOCAL keyword here means the data is exported to the local file system; without LOCAL, it is exported to the distributed file system instead.

(2) Inspect the exported data

Tom^A24.0^ANanJing
Jack^A29.0^ANanJing
Mary Kake^A21.0^ASuZhou
John Doe^A24.0^AYangZhou
Bill King^A23.0^AXuZhou


A file named 000000_0 was created in the local directory /home/hadoopUser/data, with the contents shown above. Columns are separated by ^A (octal \001), Hive's default field delimiter.
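As a quick sanity check of that delimiter, the ^A-separated output can be split on the command line with awk. This is a minimal sketch that builds a sample line with printf rather than reading the real export file:

```shell
# Build a sample line mimicking Hive's default ^A (octal \001) field delimiter
line="$(printf 'Tom\00124.0\001NanJing')"

# awk processes the \001 escape in -F, so it splits on the ^A byte;
# here we pull out the second field (the age column)
printf '%s\n' "$line" | awk -F'\001' '{print $2}'
# → 24.0
```

The same -F'\001' trick works for filtering or reformatting the real 000000_0 file.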

2. Exporting data to the distributed file system

(1) Export the data

hive> insert overwrite directory "/output"
> select name,age,address
> from test1;

The only difference from the first method is that the LOCAL keyword is absent; the directory /output is a path on the distributed file system. It can also be written in full: hdfs://master-server/output
(2) Inspect the exported data
The result is the same as with the first method.
Note: what happens if we change OVERWRITE to INTO, the way we would when importing data into a table?

hive> insert into local directory "/home/hadoopUser/data"
> select name,age,address
> from test1;
NoViableAltException(85@[184:1: tableName : (db= identifier DOT tab= identifier -> ^( TOK_TABNAME $db $tab) |tab= identifier -> ^( TOK_TABNAME $tab) );])
at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
at org.antlr.runtime.DFA.predict(DFA.java:116)
at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.tableName(HiveParser_FromClauseParser.java:4945)
at org.apache.hadoop.hive.ql.parse.HiveParser.tableName(HiveParser.java:40208)
at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.tableOrPartition(HiveParser_IdentifiersParser.java:10233)
at org.apache.hadoop.hive.ql.parse.HiveParser.tableOrPartition(HiveParser.java:40210)
at org.apache.hadoop.hive.ql.parse.HiveParser.insertClause(HiveParser.java:39685)
at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:37647)
at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:36898)
at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:36774)
at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1338)
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
FAILED: ParseException line 1:12 missing TABLE at 'local' near 'into' in table name


Unlike Hive data imports, the INSERT INTO command cannot be used to export table data.
3. Exporting data from one table into another table

This method is similar to what I covered in another blog post, so I won't repeat it here. That post describes one way of importing data into a table, namely inserting query results into it; link: Inserting data into a table with a query statement

4. Using the hive -e and -f options to export data

(1) Exporting data with hive -e

Sometimes you want to run one or more queries (separated by semicolons) and have the Hive CLI exit as soon as they finish. The -e option lets hive execute one or more statements, enclosed in quotes, directly from the terminal, and the query results can be redirected into a file.

Note: this does not need to be run inside the Hive shell; execute it directly from the terminal.

You must first switch to the target database with "use hive", otherwise the query fails. For example:

[hadoopUser@secondmgt ~]$ hive -e "use hive;select * from test1" > /home/hadoopUser/data/myquery
14/12/29 18:53:18 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/12/29 18:53:18 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/12/29 18:53:18 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/12/29 18:53:18 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/12/29 18:53:18 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/12/29 18:53:18 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/12/29 18:53:18 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/12/29 18:53:18 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed

Logging initialized using configuration in file:/home/hadoopUser/cloud/hive/apache-hive-0.13.1-bin/conf/hive-log4j.properties
OK
Time taken: 0.597 seconds
OK
Time taken: 0.66 seconds, Fetched: 5 row(s)

Inspect the exported file:

Tom     24.0    NanJing Nanjing University
Jack    29.0    NanJing Southeast China University
Mary Kake       21.0    SuZhou  Suzhou University
John Doe        24.0    YangZhou        YangZhou University
Bill King       23.0    XuZhou  Xuzhou Normal University
Note: with this method the columns are separated by \t (tab) by default.
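Because the redirected output is tab-separated, it is easy to post-process with standard tools such as cut. A minimal sketch on a hand-built sample line, not the real myquery file:

```shell
# A sample line mimicking the tab-separated output of `hive -e ... > file`
line="$(printf 'Tom\t24.0\tNanJing')"

# cut splits on tab by default with -f; extract the name and age columns
printf '%s\n' "$line" | cut -f1,2
# prints "Tom" and "24.0" separated by a tab
```

The same cut invocation applied to the exported file would trim it down to the selected columns.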

(2) Executing a query file with hive -f

Sometimes we put multiple query statements into a single file; hive -f executes the one or more statements in that file. By convention, such Hive query files are saved with a .q or .hql extension.

Create query.hql with the following contents:

use hive;
select name,age from test1;
Run the command; the process looks like this:

[hadoopUser@secondmgt ~]$ hive -f data/query.hql
14/12/29 19:07:38 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/12/29 19:07:38 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/12/29 19:07:38 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/12/29 19:07:38 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/12/29 19:07:38 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/12/29 19:07:38 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/12/29 19:07:38 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/12/29 19:07:38 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed

Logging initialized using configuration in file:/home/hadoopUser/cloud/hive/apache-hive-0.13.1-bin/conf/hive-log4j.properties
OK
Time taken: 0.564 seconds
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1419317102229_0038, Tracking URL = http://secondmgt:8088/proxy/application_1419317102229_0038/ Kill Command = /home/hadoopUser/cloud/hadoop/programs/hadoop-2.2.0/bin/hadoop job  -kill job_1419317102229_0038
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-12-29 19:07:55,168 Stage-1 map = 0%,  reduce = 0%
2014-12-29 19:08:06,671 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.93 sec
MapReduce Total cumulative CPU time: 2 seconds 930 msec
Ended Job = job_1419317102229_0038
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 2.93 sec   HDFS Read: 415 HDFS Write: 63 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 930 msec
OK
Tom     24.0
Jack    29.0
Mary Kake       21.0
John Doe        24.0
Bill King       23.0
Time taken: 27.17 seconds, Fetched: 5 row(s)


(3) Executing a script file with the source command in the Hive shell

hive> source /home/hadoopUser/data/query.hql;
OK
Time taken: 0.394 seconds
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1419317102229_0039, Tracking URL = http://secondmgt:8088/proxy/application_1419317102229_0039/ Kill Command = /home/hadoopUser/cloud/hadoop/programs/hadoop-2.2.0/bin/hadoop job  -kill job_1419317102229_0039
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-12-29 19:13:03,718 Stage-1 map = 0%,  reduce = 0%
2014-12-29 19:13:15,459 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.7 sec
MapReduce Total cumulative CPU time: 2 seconds 700 msec
Ended Job = job_1419317102229_0039
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 2.7 sec   HDFS Read: 415 HDFS Write: 63 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 700 msec
OK
Tom     24.0
Jack    29.0
Mary Kake       21.0
John Doe        24.0
Bill King       23.0
Time taken: 27.424 seconds, Fetched: 5 row(s)


Appendix:

Customizing the column delimiter of exported files

By default, files exported from a Hive table use ^A as the column delimiter, which is not very readable. Hive therefore allows you to specify the column delimiter explicitly. Again using the first export method above as an example:

(1) Export the table data

hive> insert overwrite local directory "/home/hadoopUser/data"
> row format delimited
> fields terminated by '\t'
> select * from test1;


Here \t is used as the column delimiter.

(2) Inspect the export result

Tom     24.0    NanJing Nanjing University
Jack    29.0    NanJing Southeast China University
Mary Kake       21.0    SuZhou  Suzhou University
John Doe        24.0    YangZhou        YangZhou University
Bill King       23.0    XuZhou  Xuzhou Normal University

Compared with the first method's default ^A output, this is noticeably easier to read.
