您的位置:首页 > 运维架构

hive中求top k的两种方式

2015-06-10 19:20 531 查看
一、用rank() over()

select *
from (
select event_id,
event_name,
channel,
pv,
uv,
rank() over (partition by channel order by pv desc,uv desc) as rank
from (
select event_id,
channel,
event_name,
sum(pv) pv,
sum(uv) uv
from tablename
where hp_cal_dt = '2015-06-09'
group by event_id,
channel,
event_name
) a
)a
where rank < 4

二、用row_number()

select event_id,
event_name,
channel,
pv,
uv
from (
select event_id,
event_name,
channel,
pv,
uv
from (
select event_id,
channel,
event_name,
sum(pv) pv,
sum(uv) uv
from tablename
where hp_cal_dt = '2015-06-09'
group by event_id,
channel,
event_name
)a
distribute by channel
sort by channel,pv desc, uv desc
)a
where row_number(channel)< 4
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: