
Learning Hive (2): Common Hive SQL Syntax

2014-05-23 20:16
HQL syntax reference article: http://blog.csdn.net/acmilanvanbasten/article/details/17252673

I. Single-table select

1. Using and, sort by, and limit

hive> select * from weather where city='hangzhou' and weath='fine' and minTemperat=-16 sort by pmvalue DESC limit 5;

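// Query the 5 records with the highest pm value where the city is hangzhou, the weather is fine, and the minimum temperature is -16°. The result is as follows: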

2014-05-23|07:34:58 China hangzhou fine -16 -10 496

2014-05-23|07:34:58 China hangzhou fine -16 -6 496

2014-05-23|07:34:58 China hangzhou fine -16 14 496

2014-05-23|07:34:58 China hangzhou fine -16 0 496

2014-05-23|07:34:58 China hangzhou fine -16 -8 496

Time taken: 29.266 seconds, Fetched: 5 row(s)
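The Hive language manual describes SORT BY as ordering rows within each reducer only, while ORDER BY produces one total ordering. The following is a minimal Python sketch of that difference, not Hive code: the sample values, the round-robin partitioning, and the function names `sort_by_desc` and `order_by_desc` are all assumptions made for illustration.

```python
# Hypothetical pmvalue readings; not taken from the weather table.
pm_values = [496, 12, 498, 301, 77, 455, 498, 150]

def sort_by_desc(values, num_reducers):
    """Mimic SORT BY ... DESC: each reducer sorts only the rows it received."""
    # Round-robin distribution stands in for whatever partitioning Hive picks.
    partitions = [values[i::num_reducers] for i in range(num_reducers)]
    return [sorted(p, reverse=True) for p in partitions]

def order_by_desc(values):
    """Mimic ORDER BY ... DESC: a single total ordering over every row."""
    return sorted(values, reverse=True)

per_reducer = sort_by_desc(pm_values, num_reducers=2)
# Concatenating the reducer outputs is generally not globally sorted.
flattened = [v for part in per_reducer for v in part]

print(per_reducer)
print(order_by_desc(pm_values))
```

With a single reducer the two functions agree, which is one reason a small test query can look globally sorted even when only SORT BY is used.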

2. in

hive> select * from weather where city='hangzhou' and weath='fine' and minTemperat in (-16,-17) sort by pmvalue DESC limit 10;

2014-05-23|07:34:57 China hangzhou fine -17 -12 498

2014-05-23|07:34:57 China hangzhou fine -17 10 498

2014-05-23|07:34:57 China hangzhou fine -17 -14 498

2014-05-23|07:34:58 China hangzhou fine -17 -7 496

2014-05-23|07:34:58 China hangzhou fine -16 -6 496

2014-05-23|07:34:58 China hangzhou fine -16 6 496

2014-05-23|07:34:58 China hangzhou fine -16 -12 496

2014-05-23|07:34:58 China hangzhou fine -16 1 496

2014-05-23|07:34:58 China hangzhou fine -17 2 496

2014-05-23|07:34:58 China hangzhou fine -17 11 496

Time taken: 29.277 seconds, Fetched: 10 row(s)

3. group by

select pmvalue, count(*) from weather where city='hangzhou' and weath='fine'

and minTemperat in (-16,-17) group by pmvalue;
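GROUP BY collapses all rows that share a key into one output row, so each selected column has to be either a grouping key or wrapped in an aggregate. The sketch below imitates a per-pmvalue row count in plain Python. The sample rows and the helper name `count_by_pmvalue` are hypothetical, and only four of the weather columns are modelled.

```python
from collections import defaultdict

# Hypothetical rows: (city, weath, minTemperat, pmvalue).
rows = [
    ("hangzhou", "fine", -16, 496),
    ("hangzhou", "fine", -17, 498),
    ("hangzhou", "fine", -17, 496),
    ("hangzhou", "rain", -16, 496),
    ("beijing", "fine", -16, 498),
]

def count_by_pmvalue(rows):
    """Rough equivalent of: SELECT pmvalue, count(*) ... WHERE ... GROUP BY pmvalue."""
    counts = defaultdict(int)
    for city, weath, min_temperat, pmvalue in rows:
        # WHERE clause: keep only fine days in hangzhou with the listed minimums.
        if city == "hangzhou" and weath == "fine" and min_temperat in (-16, -17):
            counts[pmvalue] += 1  # one output row per distinct pmvalue
    return dict(counts)

result = count_by_pmvalue(rows)
print(result)
```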

4. Writing the select result to a file:

INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/selectResult'

select * from weather where city='hangzhou' and weath='fine' and

minTemperat in (-16,-17) sort by pmvalue DESC limit 100;

II. Multi-table join select

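PS: As preparation, create two more tables: one maps city names to area codes, and the other maps pm values to air-quality levels. The table definitions are as follows: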

create table cityinfo

(name string, number string)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY ' '

STORED AS TEXTFILE;

create table pminfo

(pmvalue string, pmlevel string)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY ' '

STORED AS TEXTFILE;

1. Three-table join, returning all columns, first 5 results

select *

from cityinfo join weather on (cityinfo.name=weather.city)

join pminfo on (pminfo.pmvalue=weather.pmvalue)

where city='hangzhou' and weath='fine' and minTemperat=-16 limit 5

Job 1: Map: 2 Reduce: 1 Cumulative CPU: 3.34 sec HDFS Read: 229134 HDFS Write: 336 SUCCESS

Total MapReduce CPU Time Spent: 9 seconds 300 msec

OK

hangzhou 0571 2014-05-23|07:33:59 China hangzhou fine -16 11 0 0 A

hangzhou 0571 2014-05-23|07:35:44 China hangzhou fine -16 -5 0 0 A

hangzhou 0571 2014-05-23|07:35:44 China hangzhou fine -16 -14 0 0 A

hangzhou 0571 2014-05-23|07:35:44 China hangzhou fine -16 12 0 0 A

hangzhou 0571 2014-05-23|07:35:44 China hangzhou fine -16 18 0 0 A

Time taken: 34.448 seconds, Fetched: 5 row(s)

2. Three-table join using aliases, returning only some columns

select cy.number,wh.*,pm.pmlevel

from cityinfo cy join weather wh on (cy.name=wh.city)

join pminfo pm on (pm.pmvalue=wh.pmvalue)

where wh.city='hangzhou' and wh.weath='fine' and wh.minTemperat=-16 limit 5

Total MapReduce CPU Time Spent: 6 seconds 790 msec

OK

0571 2014-05-23|07:33:59 China hangzhou fine -16 11 0 A

0571 2014-05-23|07:35:44 China hangzhou fine -16 -5 0 A

0571 2014-05-23|07:35:44 China hangzhou fine -16 -14 0 A

0571 2014-05-23|07:35:44 China hangzhou fine -16 12 0 A

0571 2014-05-23|07:35:44 China hangzhou fine -16 18 0 A

3. LEFT, RIGHT, and FULL OUTER joins, aggregation, and other advanced features

// To be continued

Supplementary note: how does MapReduce work during a join?

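During a join, each map/reduce job works as follows:

The reducer buffers the rows of every table in the join sequence except the last one, then streams the last table through and serializes the results to the file system. This implementation helps reduce memory usage on the reduce side. In practice, put the largest table last; otherwise buffering it wastes a lot of memory. For example: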

SELECT a.val, b.val, c.val FROM a

JOIN b ON (a.key = b.key1) JOIN c ON (c.key = b.key1)

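All tables use the same join key, so the join is computed in a single map/reduce job. The reduce side buffers the rows of tables a and b, then computes the join result each time it reads a row of table c. A similar case: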
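The buffer-then-stream behaviour of the single-job case can be sketched in plain Python. This is only an illustration under simplifying assumptions, not Hive's implementation: tables are lists of (key, value) pairs, all keys are held at once rather than one key group at a time, and the function name `reduce_side_join` is hypothetical.

```python
from collections import defaultdict
from itertools import product

def reduce_side_join(buffered_tables, streamed_table):
    """Inner join on a shared key: buffer all leading tables, stream the last."""
    # Buffer phase: keep the rows of every table except the last in memory.
    buffers = []
    for table in buffered_tables:
        by_key = defaultdict(list)
        for key, val in table:
            by_key[key].append(val)
        buffers.append(by_key)

    results = []
    # Stream phase: each row of the last table is joined and then discarded.
    for key, val in streamed_table:
        matches = [buf.get(key, []) for buf in buffers]
        # product() yields nothing if any buffered table lacks this key.
        for combo in product(*matches):
            results.append(combo + (val,))
    return results

a = [(1, "a1"), (2, "a2")]
b = [(1, "b1"), (1, "b2")]
c = [(1, "c1"), (2, "c2"), (1, "c3")]

joined = reduce_side_join([a, b], c)
print(joined)
```

Memory use here grows with the sizes of a and b but not with c, which is the motivation for placing the largest table last.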

SELECT a.val, b.val, c.val FROM a

JOIN b ON (a.key = b.key1) JOIN c ON (c.key = b.key2)

This uses two map/reduce jobs. The first buffers table a and streams table b through; the second buffers the result of the first map/reduce job and streams table c through.
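The two-job case can be sketched as two chained stages, each buffering one input and streaming the other. Again this is a rough Python illustration rather than Hive's actual execution: the helper `join_stage`, the dictionary-based rows, and the sample data are all assumptions.

```python
from collections import defaultdict

def join_stage(buffered, streamed, buffered_key, streamed_key):
    """One simulated map/reduce stage: buffer one input, stream the other."""
    by_key = defaultdict(list)
    for row in buffered:
        by_key[row[buffered_key]].append(row)
    out = []
    for row in streamed:
        # Each streamed row is matched against the buffered rows for its key.
        for match in by_key.get(row[streamed_key], []):
            out.append({**match, **row})
    return out

# Hypothetical rows; column names are prefixed with their table for clarity.
a = [{"a.key": 1, "a.val": "a1"}, {"a.key": 2, "a.val": "a2"}]
b = [
    {"b.key1": 1, "b.key2": 10, "b.val": "b1"},
    {"b.key1": 2, "b.key2": 20, "b.val": "b2"},
]
c = [{"c.key": 10, "c.val": "c1"}, {"c.key": 30, "c.val": "c3"}]

# Job 1 joins on a.key = b.key1: buffer a, stream b.
stage1 = join_stage(a, b, "a.key", "b.key1")
# Job 2 joins on c.key = b.key2: buffer the stage-1 output, stream c.
stage2 = join_stage(stage1, c, "b.key2", "c.key")

final = [(r["a.val"], r["b.val"], r["c.val"]) for r in stage2]
print(final)
```

Because the second join uses a different key (b.key2) than the first (b.key1), the rows have to be regrouped between stages, which is why a second job is needed.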

Hive HQL syntax reference: http://blog.csdn.net/hguisu/article/details/7256833