5. A summary of some Hive tips
2016-12-01 00:00
#1: Row order with inner join
+-------------+----------+-----------+-------------+----------+-----------+--+
| a.username | a.month | a.salary | b.username | b.month | b.salary |
+-------------+----------+-----------+-------------+----------+-----------+--+
| A | 2015-01 | 33 | A | 2015-01 | 33 |
| A | 2015-01 | 33 | A | 2015-02 | 10 |
| A | 2015-02 | 10 | A | 2015-01 | 33 |
| A | 2015-02 | 10 | A | 2015-02 | 10 |
| B | 2015-01 | 30 | B | 2015-01 | 30 |
| B | 2015-01 | 30 | B | 2015-02 | 15 |
| B | 2015-02 | 15 | B | 2015-01 | 30 |
| B | 2015-02 | 15 | B | 2015-02 | 15 |
+-------------+----------+-----------+-------------+----------+-----------+--+
or
+-------------+----------+-----------+-------------+----------+-----------+--+
| a.username | a.month | a.salary | b.username | b.month | b.salary |
+-------------+----------+-----------+-------------+----------+-----------+--+
| A | 2015-01 | 33 | A | 2015-01 | 33 |
| A | 2015-02 | 10 | A | 2015-01 | 33 |
| A | 2015-01 | 33 | A | 2015-02 | 10 |
| A | 2015-02 | 10 | A | 2015-02 | 10 |
| B | 2015-01 | 30 | B | 2015-01 | 30 |
| B | 2015-02 | 15 | B | 2015-01 | 30 |
| B | 2015-01 | 30 | B | 2015-02 | 15 |
| B | 2015-02 | 15 | B | 2015-02 | 15 |
+-------------+----------+-----------+-------------+----------+-----------+--+
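## Analysis: the two results contain the same eight rows; only the row order differs (the first is sorted by a.month and then b.month, the second by b.month and then a.month).
In the author's runs, the first order is what a left join b returned,
and the second is what a right join b returned; the plain inner join came back in the same order as the right join.
Neither order is guaranteed, though: Hive returns joined rows in no particular order unless the query has an order by. So use a left join b or a right join b as you prefer, and add an order by if the row order matters.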
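As a rough check of the point about row order, the sketch below runs the same self-join on the monthly totals and compares the two layouts. It is only an illustration under stated assumptions: it uses Python's built-in sqlite3 rather than Hive, and the table name `monthly` is made up for the example.

```python
import sqlite3

# Monthly totals taken from the tables above.
# Assumption: SQLite stands in for Hive; `monthly` is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("create table monthly (username text, month text, salary integer)")
conn.executemany(
    "insert into monthly values (?, ?, ?)",
    [("A", "2015-01", 33), ("A", "2015-02", 10),
     ("B", "2015-01", 30), ("B", "2015-02", 15)],
)

join_sql = """
select a.username, a.month, a.salary, b.username, b.month, b.salary
from monthly a
inner join monthly b on a.username = b.username
"""

# Without an order by, the engine is free to return the rows in any order.
unordered = conn.execute(join_sql).fetchall()

# An explicit order by pins the order down: this is the first table's layout ...
by_a_month = conn.execute(
    join_sql + " order by a.username, a.month, b.month").fetchall()
# ... and this is the second table's layout.
by_b_month = conn.execute(
    join_sql + " order by a.username, b.month, a.month").fetchall()

# All three results hold exactly the same rows; only the order can differ.
same_rows = sorted(by_a_month) == sorted(by_b_month) == sorted(unordered)
print(same_rows)
```

The two `order by` clauses are the only difference between the layouts, which is why an explicit sort is a more dependable way to fix the order than the choice of join type.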
#2: Using group by
+-----------------------+--------------------+---------------------+--+
| usermag_tab.username | usermag_tab.month | usermag_tab.salary |
+-----------------------+--------------------+---------------------+--+
| A | 2015-01 | 5 |
| A | 2015-01 | 15 |
| B | 2015-01 | 5 |
| A | 2015-01 | 8 |
| B | 2015-01 | 25 |
| A | 2015-01 | 5 |
| A | 2015-02 | 4 |
| A | 2015-02 | 6 |
| B | 2015-02 | 10 |
| B | 2015-02 | 5 |
+-----------------------+--------------------+---------------------+--+
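The task is to total this data per user and per month (a grouped query that sums salary):
select username,month,sum(salary) as salary from usermag_tab group by username,month;
To total per user across all months instead, one might try:
select username,month,sum(salary) as salary from usermag_tab group by username;
This statement fails. When grouping by username alone, each group contains several month values, and the query neither aggregates them nor picks one of them, so Hive rejects the month column in the select list.
select username,max(month) as month,sum(salary) as salary from usermag_tab group by username;
Wrapping month in an aggregate such as max() resolves the error.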
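The two working queries can be tried out on the sample rows with the sketch below. It assumes Python's built-in sqlite3 as a stand-in for Hive. SQLite is more lenient than Hive about unaggregated columns, so the failing statement is left out because it would not fail the same way here.

```python
import sqlite3

# Sample rows from the usermag_tab table above.
# Assumption: SQLite stands in for Hive for these two queries.
conn = sqlite3.connect(":memory:")
conn.execute("create table usermag_tab (username text, month text, salary integer)")
conn.executemany(
    "insert into usermag_tab values (?, ?, ?)",
    [("A", "2015-01", 5), ("A", "2015-01", 15), ("B", "2015-01", 5),
     ("A", "2015-01", 8), ("B", "2015-01", 25), ("A", "2015-01", 5),
     ("A", "2015-02", 4), ("A", "2015-02", 6), ("B", "2015-02", 10),
     ("B", "2015-02", 5)],
)

# Total per user and per month: every selected column is either a
# grouping column or an aggregate.
per_month = conn.execute("""
    select username, month, sum(salary) as salary
    from usermag_tab
    group by username, month
    order by username, month
""").fetchall()

# Total per user: month is no longer a grouping column, so it is
# wrapped in max() to reduce each group to a single value.
per_user = conn.execute("""
    select username, max(month) as month, sum(salary) as salary
    from usermag_tab
    group by username
    order by username
""").fetchall()

print(per_month)
print(per_user)
```

The `order by` clauses are added only to make the printed output stable; they are not part of the original statements.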
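#3: Computing a running total in Hive
For the table below, compute the salary total per user and per month, then add a running total up to the current month, all in a single SQL statement.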
+-----------------------+--------------------+---------------------+--+
| username | month | salary |
+-----------------------+--------------------+---------------------+--+
| A | 2015-01 | 5 |
| A | 2015-01 | 15 |
| B | 2015-01 | 5 |
| A | 2015-01 | 8 |
| B | 2015-01 | 25 |
| A | 2015-01 | 5 |
| A | 2015-02 | 4 |
| A | 2015-02 | 6 |
| B | 2015-02 | 10 |
| B | 2015-02 | 5 |
+-----------------------+--------------------+---------------------+--+
Approach:
1. Step 1: compute each user's monthly total.
select username,month,sum(salary) as salary from t_access_times group by username,month;
+-----------+----------+---------+--+
| username | month | salary |
+-----------+----------+---------+--+
| A | 2015-01 | 33 |
| A | 2015-02 | 10 |
| B | 2015-01 | 30 |
| B | 2015-02 | 15 |
+-----------+----------+---------+--+
2. Step 2: join the monthly-total table to itself.
select A.*,B.*
from (select username,month,sum(salary) as salary from t_access_times group by username,month) A
inner join
(select username,month,sum(salary) as salary from t_access_times group by username,month) B
on A.username=B.username
where B.month <= A.month;
The author suggests that a left join is the better choice here; because the where condition filters on B.month, both join types return the same rows for this query.
The table below shows the joined rows before the where filter is applied. The filter then drops the rows in which b.month is later than a.month (the second and sixth rows).
+-------------+----------+-----------+-------------+----------+-----------+--+
| a.username | a.month | a.salary | b.username | b.month | b.salary |
+-------------+----------+-----------+-------------+----------+-----------+--+
| A | 2015-01 | 33 | A | 2015-01 | 33 |
| A | 2015-01 | 33 | A | 2015-02 | 10 |
| A | 2015-02 | 10 | A | 2015-01 | 33 |
| A | 2015-02 | 10 | A | 2015-02 | 10 |
| B | 2015-01 | 30 | B | 2015-01 | 30 |
| B | 2015-01 | 30 | B | 2015-02 | 15 |
| B | 2015-02 | 15 | B | 2015-01 | 30 |
| B | 2015-02 | 15 | B | 2015-02 | 15 |
+-------------+----------+-----------+-------------+----------+-----------+--+
3. Step 3: group the result of the previous step by a.username and a.month.
The running total for a month is the sum of every b.salary with b.month <= a.month. Within each group a.salary repeats the same value, so max(A.salary) simply returns that month's own total.
select A.username,A.month,max(A.salary) as salary,sum(B.salary) as accumulate
from (select username,month,sum(salary) as salary from t_access_times group by username,month) A
inner join
(select username,month,sum(salary) as salary from t_access_times group by username,month) B
on A.username=B.username
where B.month <= A.month
group by A.username,A.month
order by A.username,A.month;
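The final statement can be exercised on the sample rows with the sketch below, which again assumes Python's built-in sqlite3 in place of Hive. It also tries a windowing-function variant, sum(...) over (partition by ... order by ...), as an alternative that avoids the self-join. That variant is not from the original post. Hive has supported windowing functions since 0.11, and the sketch only runs it when the bundled SQLite is 3.25 or newer.

```python
import sqlite3

# Sample rows from the table above.
# Assumption: SQLite stands in for Hive for this check.
conn = sqlite3.connect(":memory:")
conn.execute("create table t_access_times (username text, month text, salary integer)")
conn.executemany(
    "insert into t_access_times values (?, ?, ?)",
    [("A", "2015-01", 5), ("A", "2015-01", 15), ("B", "2015-01", 5),
     ("A", "2015-01", 8), ("B", "2015-01", 25), ("A", "2015-01", 5),
     ("A", "2015-02", 4), ("A", "2015-02", 6), ("B", "2015-02", 10),
     ("B", "2015-02", 5)],
)

# Step 3 of the approach: self-join the monthly totals, keep the rows
# with B.month <= A.month, then sum B.salary per (username, month).
self_join_sql = """
select A.username, A.month, max(A.salary) as salary, sum(B.salary) as accumulate
from (select username, month, sum(salary) as salary
      from t_access_times group by username, month) A
inner join
     (select username, month, sum(salary) as salary
      from t_access_times group by username, month) B
on A.username = B.username
where B.month <= A.month
group by A.username, A.month
order by A.username, A.month
"""
running = conn.execute(self_join_sql).fetchall()

# Alternative sketch (not in the original post): a window function
# accumulates the monthly totals per user in month order.
window_sql = """
select username, month, salary,
       sum(salary) over (partition by username order by month) as accumulate
from (select username, month, sum(salary) as salary
      from t_access_times group by username, month) m
order by username, month
"""
if sqlite3.sqlite_version_info >= (3, 25, 0):
    windowed = conn.execute(window_sql).fetchall()
else:
    windowed = running  # old SQLite builds lack window functions; skip the variant

for row in running:
    print(row)
```

Each printed row is (username, month, monthly total, running total). The window variant scans the monthly totals once instead of joining them to themselves, which tends to matter more as the number of months per user grows.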