
5. A Summary of Some Hive Tips

2016-12-01 00:00
#1: Row order in an inner join
The same inner join can return its rows in either of the following two orders:
+-------------+----------+-----------+-------------+----------+-----------+--+
| a.username | a.month | a.salary | b.username | b.month | b.salary |
+-------------+----------+-----------+-------------+----------+-----------+--+
| A | 2015-01 | 33 | A | 2015-01 | 33 |
| A | 2015-01 | 33 | A | 2015-02 | 10 |
| A | 2015-02 | 10 | A | 2015-01 | 33 |
| A | 2015-02 | 10 | A | 2015-02 | 10 |
| B | 2015-01 | 30 | B | 2015-01 | 30 |
| B | 2015-01 | 30 | B | 2015-02 | 15 |
| B | 2015-02 | 15 | B | 2015-01 | 30 |
| B | 2015-02 | 15 | B | 2015-02 | 15 |
+-------------+----------+-----------+-------------+----------+-----------+--+
or
+-------------+----------+-----------+-------------+----------+-----------+--+
| a.username | a.month | a.salary | b.username | b.month | b.salary |
+-------------+----------+-----------+-------------+----------+-----------+--+
| A | 2015-01 | 33 | A | 2015-01 | 33 |
| A | 2015-02 | 10 | A | 2015-01 | 33 |
| A | 2015-01 | 33 | A | 2015-02 | 10 |
| A | 2015-02 | 10 | A | 2015-02 | 10 |
| B | 2015-01 | 30 | B | 2015-01 | 30 |
| B | 2015-02 | 15 | B | 2015-01 | 30 |
| B | 2015-01 | 30 | B | 2015-02 | 15 |
| B | 2015-02 | 15 | B | 2015-02 | 15 |
+-------------+----------+-----------+-------------+----------+-----------+--+
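##Analysis: the two results contain the same eight rows. Only the order differs: within each user, the first result is ordered by the left-hand month (a.month) and the second by the right-hand month (b.month).

This is not caused by choosing a left join over a right join, and an inner join is not a right join. Without ORDER BY, Hive does not guarantee the row order of a join result (or of any query), so either order can appear.
Because every username exists on both sides here, inner join, left join and right join all return the same rows. If the order matters, state it explicitly, for example: order by a.username, a.month, b.month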
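A minimal sketch of the point above. It assumes Python's built-in sqlite3 module as a stand-in for Hive and uses a hypothetical table named `monthly` that holds the four monthly totals from the tables. The sketch compares the inner and left self-joins on username as row sets, then fixes the order with ORDER BY. RIGHT JOIN is omitted only because older SQLite builds do not support it; with this data it would return the same rows as well.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for Hive; the table name
# "monthly" is hypothetical and holds the monthly totals from the tables above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly (username TEXT, month TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?, ?)",
    [("A", "2015-01", 33), ("A", "2015-02", 10),
     ("B", "2015-01", 30), ("B", "2015-02", 15)],
)

def self_join(kind, order_by=""):
    """Self-join monthly on username with the given join type (INNER or LEFT)."""
    sql = ("SELECT a.username, a.month, a.salary, b.username, b.month, b.salary "
           f"FROM monthly a {kind} JOIN monthly b ON a.username = b.username "
           f"{order_by}")
    return conn.execute(sql).fetchall()

inner_rows = self_join("INNER")
left_rows = self_join("LEFT")

# Compare as sorted lists: the rows are identical, only their order may differ.
same_rows = sorted(inner_rows) == sorted(left_rows)

# An explicit ORDER BY is what makes the output order deterministic.
ordered = self_join("INNER", "ORDER BY a.username, a.month, b.month")

print(same_rows, len(inner_rows))  # True 8
for row in ordered:
    print(row)
```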

#2: Using group by
+-----------------------+--------------------+---------------------+--+
| usermag_tab.username | usermag_tab.month | usermag_tab.salary |
+-----------------------+--------------------+---------------------+--+
| A | 2015-01 | 5 |
| A | 2015-01 | 15 |
| B | 2015-01 | 5 |
| A | 2015-01 | 8 |
| B | 2015-01 | 25 |
| A | 2015-01 | 5 |
| A | 2015-02 | 4 |
| A | 2015-02 | 6 |
| B | 2015-02 | 10 |
| B | 2015-02 | 5 |
+-----------------------+--------------------+---------------------+--+

We need to aggregate this data per user and per month, that is, a grouped query that sums salary:

select username,month,sum(salary) as salary from usermag_tab group by username,month;
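A minimal, runnable sketch of the query above. It assumes Python's sqlite3 module as a stand-in for Hive and an in-memory copy of usermag_tab loaded with the ten rows from the table. The ORDER BY is an addition so that the printed order is predictable.

```python
import sqlite3

# Assumption: an in-memory SQLite copy of usermag_tab stands in for Hive.
rows = [
    ("A", "2015-01", 5), ("A", "2015-01", 15), ("B", "2015-01", 5),
    ("A", "2015-01", 8), ("B", "2015-01", 25), ("A", "2015-01", 5),
    ("A", "2015-02", 4), ("A", "2015-02", 6), ("B", "2015-02", 10),
    ("B", "2015-02", 5),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usermag_tab (username TEXT, month TEXT, salary INTEGER)")
conn.executemany("INSERT INTO usermag_tab VALUES (?, ?, ?)", rows)

# Same GROUP BY as above, plus ORDER BY so the output order is fixed.
monthly = conn.execute(
    "select username, month, sum(salary) as salary from usermag_tab "
    "group by username, month order by username, month"
).fetchall()
print(monthly)
# [('A', '2015-01', 33), ('A', '2015-02', 10), ('B', '2015-01', 30), ('B', '2015-02', 15)]
```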

To get each user's total without splitting it by month, one might try:

select username,month,sum(salary) as salary from usermag_tab group by username;  -- fails

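This statement fails. After grouping by username, each user has several month values, and the query neither groups by month nor aggregates it, so Hive has no single value to return for that column.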

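Wrapping month in an aggregate fixes the error (max returns each user's latest month):

select username,max(month) as month,sum(salary) as salary from usermag_tab group by username;

If the month is not needed at all, simply drop it from the select list.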
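A short sketch of the two working forms, again assuming sqlite3 in place of Hive and an in-memory copy of usermag_tab. Note that SQLite is more permissive than Hive and accepts the failing query (it silently takes month from some row of each group), so this sketch cannot reproduce Hive's error; it only shows the corrected queries.

```python
import sqlite3

# Assumption: an in-memory SQLite copy of usermag_tab stands in for Hive.
rows = [
    ("A", "2015-01", 5), ("A", "2015-01", 15), ("B", "2015-01", 5),
    ("A", "2015-01", 8), ("B", "2015-01", 25), ("A", "2015-01", 5),
    ("A", "2015-02", 4), ("A", "2015-02", 6), ("B", "2015-02", 10),
    ("B", "2015-02", 5),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usermag_tab (username TEXT, month TEXT, salary INTEGER)")
conn.executemany("INSERT INTO usermag_tab VALUES (?, ?, ?)", rows)

# Fixed query: month is wrapped in max(), so each user yields one value.
per_user = conn.execute(
    "select username, max(month) as month, sum(salary) as salary "
    "from usermag_tab group by username order by username"
).fetchall()

# If the month is not needed, leave it out of the select list entirely.
totals_only = conn.execute(
    "select username, sum(salary) as salary "
    "from usermag_tab group by username order by username"
).fetchall()

print(per_user)     # [('A', '2015-02', 43), ('B', '2015-02', 45)]
print(totals_only)  # [('A', 43), ('B', 45)]
```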

#3: Computing a running total in Hive
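For the table below, compute each user's salary total for each month, together with a running total up to and including that month, in a single SQL statement.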
+-----------------------+--------------------+---------------------+--+
| username | month | salary |
+-----------------------+--------------------+---------------------+--+
| A | 2015-01 | 5 |
| A | 2015-01 | 15 |
| B | 2015-01 | 5 |
| A | 2015-01 | 8 |
| B | 2015-01 | 25 |
| A | 2015-01 | 5 |
| A | 2015-02 | 4 |
| A | 2015-02 | 6 |
| B | 2015-02 | 10 |
| B | 2015-02 | 5 |
+-----------------------+--------------------+---------------------+--+

Approach:
1. Step 1: compute each user's monthly total.
select username,month,sum(salary) as salary from t_access_times group by username,month

+-----------+----------+---------+--+
| username  |  month   | salary  |
+-----------+----------+---------+--+
| A         | 2015-01  | 33      |
| A         | 2015-02  | 10      |
| B         | 2015-01  | 30      |
| B         | 2015-02  | 15      |
+-----------+----------+---------+--+

2. Step 2: join the monthly-total table to itself (a self-join).
select A.*,B.* from
(select username,month,sum(salary) as salary from t_access_times group by username,month) A
inner join
(select username,month,sum(salary) as salary from t_access_times group by username,month) B
on
A.username=B.username;

A left join would return the same rows here, because every row of A matches at least its own month in B. The join pairs each of a user's months with every month of the same user:
+-------------+----------+-----------+-------------+----------+-----------+--+
| a.username  | a.month  | a.salary  | b.username  | b.month  | b.salary  |
+-------------+----------+-----------+-------------+----------+-----------+--+
| A           | 2015-01  | 33        | A           | 2015-01  | 33        |
| A           | 2015-01  | 33        | A           | 2015-02  | 10        |
| A           | 2015-02  | 10        | A           | 2015-01  | 33        |
| A           | 2015-02  | 10        | A           | 2015-02  | 10        |
| B           | 2015-01  | 30        | B           | 2015-01  | 30        |
| B           | 2015-01  | 30        | B           | 2015-02  | 15        |
| B           | 2015-02  | 15        | B           | 2015-01  | 30        |
| B           | 2015-02  | 15        | B           | 2015-02  | 15        |
+-------------+----------+-----------+-------------+----------+-----------+--+
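A small check of the filtering step. It again assumes sqlite3 in place of Hive and a hypothetical table `monthly` holding the step-1 totals. The self-join on username alone yields the eight pairs in the table above. Adding `WHERE b.month <= a.month` keeps only the six pairs whose b.month is not later than a.month, which are exactly the rows step 3 sums. Comparing months as strings works because every value uses the fixed-width yyyy-MM format.

```python
import sqlite3

# Assumption: SQLite stands in for Hive; "monthly" is a hypothetical table
# holding the step-1 monthly totals.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly (username TEXT, month TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?, ?)",
    [("A", "2015-01", 33), ("A", "2015-02", 10),
     ("B", "2015-01", 30), ("B", "2015-02", 15)],
)

join_sql = ("SELECT a.username, a.month, a.salary, b.username, b.month, b.salary "
            "FROM monthly a JOIN monthly b ON a.username = b.username")

# Every month of a user paired with every month of the same user.
all_pairs = conn.execute(join_sql).fetchall()

# Keep only pairs where b's month is the same as or earlier than a's month.
earlier_or_same = conn.execute(join_sql + " WHERE b.month <= a.month").fetchall()

print(len(all_pairs), len(earlier_or_same))  # 8 6
```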

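3. Step 3: group the previous result by a.username and a.month.
To get the running total, sum b.salary over all rows where b.month <= a.month. The comparison goes in WHERE rather than ON because Hive releases before 2.2.0 accept only equality conditions in a join's ON clause, and comparing month as a string is safe because every value uses the fixed-width yyyy-MM format. max(A.salary) simply carries each month's own total through the group by, since A.salary is the same on every row of a group.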
select A.username,A.month,max(A.salary) as salary,sum(B.salary) as accumulate from
(select username,month,sum(salary) as salary from t_access_times group by username,month) A
inner join
(select username,month,sum(salary) as salary from t_access_times group by username,month) B
on
A.username=B.username
where B.month <= A.month
group by A.username,A.month
order by A.username,A.month;
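To sanity-check the final query, here is a minimal sketch that runs it against the raw rows from the #3 table. It assumes Python's sqlite3 module in place of Hive, with t_access_times created locally, and compares the result with a running sum computed in plain Python. With this data, user A gets running totals of 33 and then 43, and user B gets 30 and then 45.

```python
import sqlite3
from itertools import accumulate, groupby

# Assumption: an in-memory SQLite copy of t_access_times stands in for Hive.
rows = [
    ("A", "2015-01", 5), ("A", "2015-01", 15), ("B", "2015-01", 5),
    ("A", "2015-01", 8), ("B", "2015-01", 25), ("A", "2015-01", 5),
    ("A", "2015-02", 4), ("A", "2015-02", 6), ("B", "2015-02", 10),
    ("B", "2015-02", 5),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_access_times (username TEXT, month TEXT, salary INTEGER)")
conn.executemany("INSERT INTO t_access_times VALUES (?, ?, ?)", rows)

# The step-3 query from above.
query = """
select A.username, A.month, max(A.salary) as salary, sum(B.salary) as accumulate from
(select username, month, sum(salary) as salary from t_access_times group by username, month) A
inner join
(select username, month, sum(salary) as salary from t_access_times group by username, month) B
on A.username = B.username
where B.month <= A.month
group by A.username, A.month
order by A.username, A.month
"""
result = conn.execute(query).fetchall()

# Cross-check in plain Python: monthly totals first, then a running sum per user.
monthly = {}
for user, month, salary in rows:
    monthly[(user, month)] = monthly.get((user, month), 0) + salary

expected = []
for user, group in groupby(sorted(monthly.items()), key=lambda kv: kv[0][0]):
    items = list(group)
    running = accumulate(total for _, total in items)
    for ((_, month), total), acc in zip(items, running):
        expected.append((user, month, total, acc))

print(result)
print(result == expected)  # True
```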