Hive Environment Setup and Getting Started
2013-01-18 10:31
Integrating Hive with Hadoop is fairly straightforward, so here is a brief walkthrough. First, make sure Hadoop is already installed [Hadoop installation was covered in an earlier post]. Then download the Hive release tarball and unpack it to a suitable directory. In Hive's conf directory, edit hive-env.sh and add the following settings:
export JAVA_HOME=/home/jdk1.6.0_13
export HIVE_HOME=/usr/hadoop/hive-0.9.0/
export PATH=$PATH:/usr/hadoop/hive-0.9.0/bin
export HADOOP_HOME=/usr/hadoop/hadoop-1.0.3
Save the file, change into the bin directory, and run hive to enter the Hive CLI; commands such as show tables can then be issued. A worked example follows.
1. First, create the test data file /home/nslab/HiveTestData/sample.txt with the following contents:
1990 23 2
1990 21 2
1990 25 2
1991 32 2
1991 34 2
1991 24 2
1992 42 2
1992 32 2
1992 25 2
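One detail that matters: the fields must be separated by tab characters, because the table created below declares fields terminated by '\t'. As a minimal sketch (the real path /home/nslab/HiveTestData/sample.txt is environment-specific, so a temporary directory stands in for it here), such a file could be generated like this:

```python
import os
import tempfile

# Hypothetical sketch: write a tab-separated sample.txt matching the data above.
# The on-disk path is a stand-in; the real article uses /home/nslab/HiveTestData/.
rows = [
    ("1990", 23, 2), ("1990", 21, 2), ("1990", 25, 2),
    ("1991", 32, 2), ("1991", 34, 2), ("1991", 24, 2),
    ("1992", 42, 2), ("1992", 32, 2), ("1992", 25, 2),
]

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    for year, temperature, quality in rows:
        # Tab separators, matching "fields terminated by '\t'" in the DDL below.
        f.write(f"{year}\t{temperature}\t{quality}\n")

with open(path) as f:
    first = f.readline().rstrip("\n")
print(first.split("\t"))  # ['1990', '23', '2']
```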
2. Start Hive and enter the commands below, which produce the following results:
Statement 1: create the table records, with the fields of each row separated by '\t'.
Statement 2: load the contents of sample.txt into records; overwrite replaces any existing contents of the table.
Statement 3: select the highest temperature for each year.
As the output shows, Hive SQL operations are executed as MapReduce jobs.
[nslab@Namenode hive]$ bin/hive
Logging initialized using configuration in jar:file:/home/nslab/HiveInstall/hive-0.8.1/lib/hive-common-0.8.1.jar!/hive-log4j.properties
Hive history file=/tmp/nslab/hive_job_log_nslab_201207022051_1374061098.txt
hive> create table records(year string,temperature int,quality int)
> row format delimited
> fields terminated by '\t';
OK
Time taken: 3.504 seconds
hive> load data local inpath '/home/nslab/HiveTestData/sample.txt'
> overwrite into table records;
Copying data from file:/home/nslab/HiveTestData/sample.txt
Copying file: file:/home/nslab/HiveTestData/sample.txt
Loading data to table default.records
Deleted hdfs://10.13.87.212:9000/user/hive/warehouse/records
OK
Time taken: 0.683 seconds
hive> select year, max(temperature) from records group by year;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201207012110_0001, Tracking URL = http://Namenode:50030/jobdetails.jsp?jobid=job_201207012110_0001 Kill Command = /home/nslab/HadoopInstall/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=10.13.87.212:9001 -kill job_201207012110_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2012-07-02 20:58:48,870 Stage-1 map = 0%, reduce = 0%
2012-07-02 20:59:04,186 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.2 sec
2012-07-02 20:59:05,201 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.2 sec
2012-07-02 20:59:06,275 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.2 sec
2012-07-02 20:59:07,291 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.2 sec
2012-07-02 20:59:08,301 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.2 sec
2012-07-02 20:59:09,314 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.2 sec
2012-07-02 20:59:10,377 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.2 sec
2012-07-02 20:59:11,535 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.2 sec
2012-07-02 20:59:12,566 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.2 sec
2012-07-02 20:59:13,579 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.2 sec
2012-07-02 20:59:14,591 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.2 sec
2012-07-02 20:59:15,609 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.2 sec
2012-07-02 20:59:16,656 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.32 sec
2012-07-02 20:59:17,667 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.32 sec
2012-07-02 20:59:18,701 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.32 sec
2012-07-02 20:59:19,755 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.32 sec
2012-07-02 20:59:20,772 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.32 sec
2012-07-02 20:59:21,797 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.32 sec
2012-07-02 20:59:22,825 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.32 sec
MapReduce Total cumulative CPU time: 3 seconds 320 msec
Ended Job = job_201207012110_0001
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Accumulative CPU: 3.32 sec HDFS Read: 308 HDFS Write: 24 SUCESS
Total MapReduce CPU Time Spent: 3 seconds 320 msec
OK
1990 25
1991 34
1992 42
Time taken: 73.644 seconds
hive>
3. Viewing the result on HDFS through the web UI:
As can be seen, tables created by Hive are stored on HDFS, in the directory configured in hive-site.xml, and queries are executed as MapReduce jobs. However, statements such as select * from tablename; do not require a MapReduce job.
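The query above can be sketched as the map/reduce pair that Hive generates for it. The following Python sketch (not Hive's actual generated code, just an illustration of the shape of the computation) emits (year, temperature) pairs in the map phase and takes the per-key maximum in the reduce phase:

```python
from collections import defaultdict

# Sample rows mirroring sample.txt (tab-separated, as loaded into `records`).
rows = [
    "1990\t23\t2", "1990\t21\t2", "1990\t25\t2",
    "1991\t32\t2", "1991\t34\t2", "1991\t24\t2",
    "1992\t42\t2", "1992\t32\t2", "1992\t25\t2",
]

def map_phase(lines):
    # Like the map tasks: parse each row and emit a (year, temperature) pair.
    for line in lines:
        year, temperature, _quality = line.split("\t")
        yield year, int(temperature)

def reduce_phase(pairs):
    # Like the reduce task: group values by key and keep the maximum.
    groups = defaultdict(list)
    for year, temperature in pairs:
        groups[year].append(temperature)
    return {year: max(temps) for year, temps in sorted(groups.items())}

result = reduce_phase(map_phase(rows))
print(result)  # {'1990': 25, '1991': 34, '1992': 42}
```

The dictionary matches the three result rows Hive printed (1990 25, 1991 34, 1992 42).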
4. To execute a prepared script of Hive commands, use:
bin/hive -f script.q
where script.q is the name of the script.
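As a hypothetical illustration (the article does not show the script's contents), a script.q covering this example could simply collect the statements from step 2:

```sql
-- Hypothetical script.q reproducing the statements from the example above.
create table if not exists records(year string, temperature int, quality int)
row format delimited
fields terminated by '\t';

load data local inpath '/home/nslab/HiveTestData/sample.txt'
overwrite into table records;

select year, max(temperature) from records group by year;
```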
5. To apply particular Hive settings within a single Hive session, use set.
1. Use set to inspect a setting.
2. Entering set by itself lists all settings.
3. To set a new property, use a statement of the form:
set hive.enforce.bucketing=true;
6. Precedence of Hive settings (highest to lowest):
1. The Hive set command.
2. The -hiveconf command-line option.
3. hive-site.xml
4. hive-default.xml
5. hadoop-site.xml (or core-site.xml, hdfs-site.xml, mapred-site.xml)
6. hadoop-default.xml (or core-default.xml, hdfs-default.xml, mapred-default.xml)
7. Hive's logs are written to /tmp/$USER/hive.log; when errors occur, Hadoop's MapReduce task logs are also worth checking (in this environment, under /tmp/nslab).
The command hive -hiveconf hive.root.logger=DEBUG,console prints debug output to the console instead.
8. To start Hive in server mode, run hive --service hiveserver 50031 &, where & runs it in the background.
[root@bida bin]# jps
30635 Jps
23326 RunJar
The server's process ID (the RunJar entry) can be seen with jps.