
Hive study notes: map-side joins

2015-10-18 20:44

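In distributed computing frameworks, table joins usually have to move data across nodes, so they tend to be slow. Hive is no exception. For a join between a large table and a small table, Hive has a commonly used optimization, the map-side join: the small table is loaded into memory on every mapper, so the join is completed in the map phase without shuffling data to reducers.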

For example:

select /*+ mapjoin(u)*/ u.user_id,l.time from user u join opera_log l on u.user_id=l.user_id where l.month='2015-10' ;

Earlier versions of Hive requested a map-side join explicitly in this way, through the MAPJOIN hint.

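Newer versions have moved away from this syntax (by default the hint is ignored, controlled by hive.ignore.mapjoin.hint) and rely on automatic conversion instead: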

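-- convert eligible joins to map-side joins automatically
set hive.auto.convert.join=true;

-- treat a table as small, and use a map-side join, when its data files total less than about 25 MB (25,000,000 bytes)
set hive.mapjoin.smalltable.filesize=25000000;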

select u.user_id,l.time from user u join opera_log l on u.user_id=l.user_id where l.month='2015-10' ;
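
To check whether the join was actually converted, one option is to prefix the query with EXPLAIN and read the plan. The sketch below reuses the tables and settings from the example above. The exact plan text varies with the Hive version and execution engine, so no output is shown here.

```sql
-- Enable automatic conversion, as in the settings above.
set hive.auto.convert.join=true;
set hive.mapjoin.smalltable.filesize=25000000;

-- Print the execution plan instead of running the query.
-- If the small table (user) is under the size threshold, the plan
-- should contain a "Map Join Operator" in a map stage rather than
-- a "Join Operator" in a reduce stage.
explain
select u.user_id, l.time
from user u
join opera_log l on u.user_id = l.user_id
where l.month = '2015-10';
```

If the plan still shows a reduce-side join, the small table is probably larger than hive.mapjoin.smalltable.filesize.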
