Hive LEFT OUTER JOIN: a problem with WHERE conditions
2016-07-13 17:21
select count(1) from s_ods_trade where part ='2012-10-31';
22076

select count(1) from s_ods_trade;
104343

select count(1) from s_ods_trade_full where part ='2012-10-31';
11456

select count(1) from s_ods_trade_full;
53049

SELECT count(1) FROM s_ods_trade a left outer JOIN s_ods_trade_full b ON (a.dp_id = b.dp_id AND a.tid = b.tid and a.part='2012-10-31' and b.part='2012-10-31');
104343

SELECT count(1) FROM s_ods_trade a left outer JOIN s_ods_trade_full b ON (a.dp_id = b.dp_id AND a.tid = b.tid and a.part=b.part and a.part='2012-10-31');
104343

SELECT count(1) FROM s_ods_trade a left outer JOIN s_ods_trade_full b ON (a.dp_id = b.dp_id AND a.tid = b.tid ) where a.part='2012-10-31' and b.part='2012-10-31';
11456

SELECT count(1) FROM s_ods_trade a left outer JOIN s_ods_trade_full b ON (a.dp_id = b.dp_id AND a.tid = b.tid and a.part=b.part) where a.part='2012-10-31';
22076
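I found the results of the second-to-last and third-to-last SQL statements quite puzzling.
The following statement has the same effect as the last SQL statement and should be somewhat more efficient: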
SELECT count(1) FROM (select * from s_ods_trade where part ='2012-10-31') a left outer JOIN (select * from s_ods_trade_full where part ='2012-10-31' ) b ON (a.dp_id = b.dp_id AND a.tid = b.tid);
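The equivalence claimed above can be checked on a small scale. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for Hive (an assumption, relying on both following standard LEFT OUTER JOIN semantics for these queries), and the handful of rows is invented for illustration. It compares result counts only and says nothing about performance.

```python
import sqlite3

# SQLite stands in for Hive here; the rows below are hypothetical toy data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE s_ods_trade      (dp_id TEXT, tid INTEGER, part TEXT);
CREATE TABLE s_ods_trade_full (dp_id TEXT, tid INTEGER, part TEXT);
INSERT INTO s_ods_trade VALUES
  ('d1', 1, '2012-10-31'),
  ('d1', 2, '2012-10-31'),
  ('d1', 3, '2012-10-30'),
  ('d2', 4, '2012-10-30');
INSERT INTO s_ods_trade_full VALUES
  ('d1', 1, '2012-10-31'),
  ('d1', 2, '2012-10-30'),
  ('d1', 3, '2012-10-30');
""")

# Form of the last query in the post: partition equality in ON,
# filter on the left table in WHERE.
last_form = conn.execute("""
SELECT count(1)
FROM s_ods_trade a
LEFT OUTER JOIN s_ods_trade_full b
  ON (a.dp_id = b.dp_id AND a.tid = b.tid AND a.part = b.part)
WHERE a.part = '2012-10-31'
""").fetchone()[0]

# Subquery form: each side is restricted to the partition before joining.
subquery_form = conn.execute("""
SELECT count(1)
FROM (SELECT * FROM s_ods_trade WHERE part = '2012-10-31') a
LEFT OUTER JOIN (SELECT * FROM s_ods_trade_full WHERE part = '2012-10-31') b
  ON (a.dp_id = b.dp_id AND a.tid = b.tid)
""").fetchone()[0]

print(last_form, subquery_form)  # 2 2
```

In both forms every left-side row of the partition survives, whether or not it finds a match on the right, which is why the counts agree.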
Explanation from the official documentation (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins#):
Joins occur BEFORE WHERE CLAUSES. So, if you want to restrict the OUTPUT of a join, a requirement should be in the WHERE clause, otherwise it should be in the JOIN clause. A big point of confusion for this issue is partitioned tables:

SELECT a.val, b.val FROM a LEFT OUTER JOIN b ON (a.key=b.key) WHERE a.ds='2009-07-07' AND b.ds='2009-07-07'

will join a on b, producing a list of a.val and b.val. The WHERE clause, however, can also reference other columns of a and b that are in the output of the join, and then filter them out. However, whenever a row from the JOIN has found a key for a and no key for b, all of the columns of b will be NULL, including the ds column. This is to say, you will filter out all rows of join output for which there was no valid b.key, and thus you have outsmarted your LEFT OUTER requirement. In other words, the LEFT OUTER part of the join is irrelevant if you reference any column of b in the WHERE clause. Instead, when OUTER JOINing, use this syntax:

SELECT a.val, b.val FROM a LEFT OUTER JOIN b ON (a.key=b.key AND b.ds='2009-07-07' AND a.ds='2009-07-07')

..the result is that the output of the join is pre-filtered, and you won't get post-filtering trouble for rows that have a valid a.key but no matching b.key. The same logic applies to RIGHT and FULL joins.
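The behaviour described in the quoted passage can be reproduced with a few rows. This is a minimal sketch under assumptions: SQLite (via Python's sqlite3 module) stands in for Hive, and the tables a and b with columns key, val and ds hold made-up rows that mirror the names in the documentation's example. It compares three placements of the partition filter.

```python
import sqlite3

# SQLite is used as a stand-in for Hive; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (key INTEGER, val TEXT, ds TEXT);
CREATE TABLE b (key INTEGER, val TEXT, ds TEXT);
INSERT INTO a VALUES (1, 'a1', '2009-07-07'),
                     (2, 'a2', '2009-07-07'),
                     (3, 'a3', '2009-07-06');
INSERT INTO b VALUES (1, 'b1', '2009-07-07'),
                     (3, 'b3', '2009-07-06');
""")

# 1) Both filters in WHERE: rows without a match have b.ds = NULL and are
#    dropped, so the query behaves like an inner join.
where_form = conn.execute("""
SELECT a.val, b.val FROM a LEFT OUTER JOIN b ON (a.key = b.key)
WHERE a.ds = '2009-07-07' AND b.ds = '2009-07-07'
ORDER BY a.key
""").fetchall()

# 2) Both filters in ON: they only decide which rows count as a match,
#    so every row of a is still returned.
on_form = conn.execute("""
SELECT a.val, b.val FROM a LEFT OUTER JOIN b
  ON (a.key = b.key AND b.ds = '2009-07-07' AND a.ds = '2009-07-07')
ORDER BY a.key
""").fetchall()

# 3) Filter on b in ON, filter on a in WHERE: only a's partition is
#    returned, and unmatched rows are kept with NULLs for b.
mixed_form = conn.execute("""
SELECT a.val, b.val FROM a LEFT OUTER JOIN b
  ON (a.key = b.key AND b.ds = '2009-07-07')
WHERE a.ds = '2009-07-07'
ORDER BY a.key
""").fetchall()

print(where_form)  # [('a1', 'b1')]
print(on_form)     # [('a1', 'b1'), ('a2', None), ('a3', None)]
print(mixed_form)  # [('a1', 'b1'), ('a2', None)]
```

The pattern matches the counts reported in the post: with all filters in ON the count equals the full size of the left table (104343), with a filter on b in WHERE it shrinks to the matched rows only (11456), and with the left-table filter in WHERE it equals the size of the left partition (22076).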