Hadoop基础教程-第11章 Hive:SQL on Hadoop(11.3 Hive 快速入门)
2017-08-06 21:15
801 查看
第11章 Hive:SQL on Hadoop
11.3 Hive快速入门
11.3.1 HQL介绍
Hive查询语言(Hive QL,可以简称HQL)的语法和SQL非常相似,实际上HQL基本上实现了SQL-92标准,并做了一些扩展。但是仍然存在一些差异,比如不支持行级操作,不支持事务处理。HQL更接近MySQL的SQL方言,对于熟悉SQL语言的开发者而言,HQL很容易上手。11.3.2 进入Hive Shell
Hive Shell即是Hive的CLI交互模式,输入hive回车即可进入。[root@node3 ~]# hive SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] Logging initialized using configuration in jar:file:/opt/hive-2.1.1/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. hive>
输入“quit;”即可退出HiveShell
hive> quit; [root@node3 ~]#
11.3.3 Hive数据库
(1)显示数据库实例Hive数据库本质上就是表的目录,或者说命名空间。如果创建数据表没有指定数据库,则使用默认数据库default。与MySQL一样,可以通过show databases命令查看已经存在的数据库。
hive> show databases; OK default Time taken: 1.451 seconds, Fetched: 1 row(s) hive>
(2)创建数据库
使用create database语句创建数据库。
hive> create database test; OK Time taken: 0.414 seconds hive> show databases; OK default test Time taken: 0.021 seconds, Fetched: 2 row(s) hive>
(3)查看数据库结构
describe 命令(可以简写desc)来查看数据库定义,包括:数据库名称、数据库在 HDFS 目录、HDFS
hive> desc database test; OK test hdfs://cetc/user/hive/warehouse/test.db root USER Time taken: 0.223 seconds, Fetched: 1 row(s) hive>
Hive会为每个创建的数据库在HDFS上创建一个目录,该数据库的表会以子目录形式存储,表中的数据会以表目录下的文件形式存储。对于default数据库,默认的缺省数据库没有自己的目录,default数据库下的表默认存放在/user/hive/warehouse目录下。
(4)切换当前数据库
hive> use test; OK Time taken: 1.217 seconds hive>
11.3.4 创建表
在默认default数据库下创建表usershive> create table users(id int,name char(20)); OK Time taken: 0.481 seconds hive>
在test数据库下创建表tb1
hive> create table test.tb1( > id int, > name string, > price double > ); OK Time taken: 2.044 seconds hive>
11.3.4 查看表
(1)显示default数据库下的所有表hive> show tables; OK users Time taken: 0.231 seconds, Fetched: 1 row(s) hive>
(2)查看表结构
hive> desc users; OK id int name char(20) Time taken: 0.171 seconds, Fetched: 2 row(s) hive>
(3)查看test数据库下的数据表
hive> use test; OK Time taken: 0.037 seconds hive> show tables; OK tb1 Time taken: 0.036 seconds, Fetched: 1 row(s) hive>
也可以通过下面命令实现上面功能
hive> show tables in test; OK tb1 Time taken: 0.04 seconds, Fetched: 1 row(s) hive>
查看tb1表结构
hive> desc tb1; OK id int name string price double Time taken: 0.138 seconds, Fetched: 3 row(s) hive>
11.3.5 插入数据
(1)显式字段插入hive> insert into users(id,name) values(1,'aa'); WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Query ID = root_20170810094706_d75cb718-9742-4b25-864c-44a63c27d064 Total jobs = 3 Launching Job 1 out of 3 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1502371586068_0001, Tracking URL = http://node1:8088/proxy/application_1502371586068_0001/ Kill Command = /opt/hadoop-2.7.3/bin/hadoop job -kill job_1502371586068_0001 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2017-08-10 09:47:34,602 Stage-1 map = 0%, reduce = 0% 2017-08-10 09:47:50,211 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.36 sec MapReduce Total cumulative CPU time: 2 seconds 360 msec Ended Job = job_1502371586068_0001 Stage-4 is selected by condition resolver. Stage-3 is filtered out by condition resolver. Stage-5 is filtered out by condition resolver. Moving data to directory hdfs://cetc/user/hive/warehouse/users/.hive-staging_hive_2017-08-10_09-47-06_012_378591815767107640-1/-ext-10000 Loading data to table default.users MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 2.36 sec HDFS Read: 4225 HDFS Write: 92 SUCCESS Total MapReduce CPU Time Spent: 2 seconds 360 msec OK Time taken: 46.216 seconds hive>
(2)隐式字段插入
hive> insert into users values(2,'bb'); WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Query ID = root_20170810095011_cf3ab969-c1bb-459b-a1eb-f9b411ec0614 Total jobs = 3 Launching Job 1 out of 3 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1502371586068_0002, Tracking URL = http://node1:8088/proxy/application_1502371586068_0002/ Kill Command = /opt/hadoop-2.7.3/bin/hadoop job -kill job_1502371586068_0002 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2017-08-10 09:50:28,186 Stage-1 map = 0%, reduce = 0% 2017-08-10 09:50:41,103 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.34 sec MapReduce Total cumulative CPU time: 2 seconds 340 msec Ended Job = job_1502371586068_0002 Stage-4 is selected by condition resolver. Stage-3 is filtered out by condition resolver. Stage-5 is filtered out by condition resolver. Moving data to directory hdfs://cetc/user/hive/warehouse/users/.hive-staging_hive_2017-08-10_09-50-11_441_8940366110116441398-1/-ext-10000 Loading data to table default.users MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 2.34 sec HDFS Read: 4228 HDFS Write: 92 SUCCESS Total MapReduce CPU Time Spent: 2 seconds 340 msec OK Time taken: 32.393 seconds hive>
(3)插入多条记录
hive> insert into users values(3,'cc'),(4,'dd'),(5,'ee'); WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Query ID = root_20170810095507_48fc0ce3-7789-4427-b35c-3809586f208f Total jobs = 3 Launching Job 1 out of 3 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1502371586068_0003, Tracking URL = http://node1:8088/proxy/application_1502371586068_0003/ Kill Command = /opt/hadoop-2.7.3/bin/hadoop job -kill job_1502371586068_0003 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2017-08-10 09:55:24,363 Stage-1 map = 0%, reduce = 0% 2017-08-10 09:55:36,294 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.35 sec MapReduce Total cumulative CPU time: 2 seconds 350 msec Ended Job = job_1502371586068_0003 Stage-4 is selected by condition resolver. Stage-3 is filtered out by condition resolver. Stage-5 is filtered out by condition resolver. Moving data to directory hdfs://cetc/user/hive/warehouse/users/.hive-staging_hive_2017-08-10_09-55-07_839_6570192849374755881-1/-ext-10000 Loading data to table default.users MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 2.35 sec HDFS Read: 4238 HDFS Write: 138 SUCCESS Total MapReduce CPU Time Spent: 2 seconds 350 msec OK Time taken: 31.315 seconds hive>
可见,Hive每次向表中插入一条记录,都会转换为MapReduce程序,效率低下。
11.3.6 简单查询
(1)字段查询hive> select id,name from users; OK 1 aa 2 bb 3 cc 4 dd 5 ee Time taken: 0.224 seconds, Fetched: 5 row(s) hive>
(2)条件查看
hive> select * from users where id=1; OK 1 aa Time taken: 0.399 seconds, Fetched: 1 row(s) hive> select name from users where name like '%a'; OK aa Time taken: 0.214 seconds, Fetched: 1 row(s) hive>
(3)统计查询
hive> select count(*) from users; WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Query ID = root_20170810095858_fa356df4-8b18-4a5b-835a-643427dd99f6 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapreduce.job.reduces=<number> Starting Job = job_1502371586068_0004, Tracking URL = http://node1:8088/proxy/application_1502371586068_0004/ Kill Command = /opt/hadoop-2.7.3/bin/hadoop job -kill job_1502371586068_0004 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 2017-08-10 09:59:21,494 Stage-1 map = 0%, reduce = 0% 2017-08-10 09:59:31,484 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.02 sec 2017-08-10 09:59:43,424 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.11 sec MapReduce Total cumulative CPU time: 4 seconds 110 msec Ended Job = job_1502371586068_0004 MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.11 sec HDFS Read: 8068 HDFS Write: 101 SUCCESS Total MapReduce CPU Time Spent: 4 seconds 110 msec OK 5 Time taken: 46.061 seconds, Fetched: 1 row(s) hive>
11.3.7 不支持删除修改记录
hive> delete from users where id=1; FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations. hive> update users set name='xx' where id=1; FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations. hive>
表中数据不支持删除修改操作,但是可以截断表和删除整张表。
hive> truncate table users; OK Time taken: 0.273 seconds hive> select * from users; OK Time taken: 0.243 seconds hive>
hive> drop table users; OK Time taken: 2.025 seconds hive>
相关文章推荐
- Hadoop基础教程-第11章 Hive:SQL on Hadoop(11.4 数据类型和存储格式)(草稿)
- Hadoop基础教程-第11章 Hive:SQL on Hadoop(11.7 HQL:数据查询)(草稿)
- Hadoop基础教程-第11章 Hive:SQL on Hadoop(11.5 HQL:DDL数据定义)(草稿)
- Hadoop基础教程-第11章 Hive:SQL on Hadoop(11.6 HQL:DML数据操纵)(草稿)
- Hadoop基础教程-第11章 Hive:SQL on Hadoop(11.2 Hive安装与配置)(草稿)
- Hadoop基础教程-第11章 Hive:SQL on Hadoop(11.8 HQL:排序)(草稿)
- Hadoop基础教程-第2章 Hadoop快速入门(2.4 Hadoop单机运行)
- Hadoop基础教程-第2章 Hadoop快速入门(2.5 WordContent简单应用)
- Hadoop基础教程-第2章 Hadoop快速入门(2.1 Hadoop简介)
- Hadoop基础教程-第2章 Hadoop快速入门(2.3 Hadoop单机配置)
- Hadoop基础教程-第2章 Hadoop快速入门(2.2 Hadoop下载与安装)
- Hadoop基础教程-第4章 HDFS的Java API(4.4 Windows+Eclipse+HDFS快速入门)
- Python 零基础 快速入门 趣味教程 (咪博士 海龟绘图 turtle) 7. 条件循环
- Hadoop基础教程-第6章 MapReduce入门(6.2 解读WordCount)(草稿)
- Hadoop基础教程-第6章 MapReduce入门(6.4 MapReduce程序框架)(草稿)
- 零基础上手Swift快速入门教程概览
- Hadoop基础教程-第12章 Hive:进阶(12.5 Hive外表)(草稿)
- Python 零基础 快速入门 趣味教程 (咪博士 海龟绘图 turtle) 5. 参数
- ThinkPHP3快速入门教程-:基础
- Hadoop入门基础教程 Hadoop之单机模式搭建