Hive ERROR: Out of memory due to hash maps used in map-side aggregation
2014-05-09 18:52
When Hive runs aggregation queries over large amounts of data, it often fails with the following OOM error:
Possible error: Out of memory due to hash maps used in map-side aggregation. Solution: Currently hive.map.aggr.hash.percentmemory is set to 0.5. Try setting it to a lower value. i.e 'set hive.map.aggr.hash.percentmemory = 0.25;'
The failed task's error message is:
Error:GC overhead limit exceeded
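This error usually has one of two causes: (1) the Hive SQL is poorly written, so the hash map grows too large at run time; (2) the SQL has no room left for optimization (it is the only way to get the data you need).
For (1), rewrite the SQL to shrink the hash map. For (2), tune the parameters.
Both cases are described below: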
(1) Rewriting the SQL
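select count(distinct v) from tbl;
can be rewritten as
select count(1) from (select v from tbl where v is not null group by v) t;
(The where v is not null filter keeps the result identical to count(distinct v), which does not count NULL.)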
Why it helps: it reduces the number of keys in the hash map.
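The same rewrite carries over to a distinct count computed per group. A sketch, assuming a hypothetical table tbl with a grouping column g and a value column v (g does not appear in the original example):

```sql
-- Hypothetical table tbl(g, v).
-- Before: distinct count of v within each group g
select g, count(distinct v) from tbl group by g;

-- After: first deduplicate (g, v) pairs with a GROUP BY, then count the
-- remaining rows per g. Filtering out NULL v keeps the result equal to
-- count(distinct v), which ignores NULLs.
select g, count(1)
from (select g, v from tbl where v is not null group by g, v) t
group by g;
```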
select collect_set(messageDate)[0],count(*) from incidents_hive group by substr(messageDate,8,2);
can be rewritten as
select hourNum, count(1) from (select substr(messageDate,9,2) as hourNum from incidents_hive ) t group by hourNum;
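Why it helps: the number of hash map keys stays the same, but each value gets smaller: collect_set() keeps every distinct messageDate of a group in memory, while count(1) keeps only a counter.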
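If the query still needs one sample messageDate per group, as collect_set(messageDate)[0] provided, min() is a lighter option because it keeps a single value per key instead of the whole set. A sketch that assumes, like the rewritten query, that substr(messageDate,9,2) extracts the hour; note that min() returns the smallest messageDate of the group rather than an arbitrary one:

```sql
-- One row per hour: the hour, a representative messageDate, and the row count.
-- min() keeps a single value per group in its aggregation buffer,
-- whereas collect_set() keeps every distinct messageDate of the group.
select hourNum, min(messageDate), count(1)
from (select substr(messageDate,9,2) as hourNum, messageDate from incidents_hive) t
group by hourNum;
```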
(2) Tuning parameters
The following statement cannot be rewritten to shrink the hash map: keywords repeat very rarely, so the in-memory map maintained during the map phase becomes huge.
INSERT OVERWRITE TABLE hbase_table_poi_keywords_count
SELECT concat(substr(key,0,8), svccode, keywords), substr(key,0,8), svccode, keywords, count(*)
where substr(key,0,8)=\"$yesterday\" AND length(keywords)>0 AND svccode is not null
GROUP BY substr(key,0,8),svccode,keywords;
Parameters related to map-side aggregation and GROUP BY optimization include:
hive.map.aggr
hive.groupby.mapaggr.checkinterval
hive.map.aggr.hash.min.reduction
hive.map.aggr.hash.percentmemory
hive.groupby.skewindata
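To tune them, consult the descriptions in the configuration file and the Hive documentation. If tuning these parameters still cannot meet the requirement, the last resort is set hive.map.aggr=false: with map-side aggregation turned off, the mappers no longer build this hash map, so this error goes away. The query runs somewhat slower than with map-side aggregation, but when you actually run it on real data you may well find it is not that slow after all.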
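A sketch of how these settings might be applied for a single Hive session. The values are illustrative rather than recommendations, and the defaults mentioned in the comments should be checked against your version's configuration file:

```sql
-- Option 1: keep map-side aggregation, but tune it.
set hive.map.aggr=true;
-- Share of the mapper heap the aggregation hash table may use before it
-- flushes partial results (0.5 in the error message above).
set hive.map.aggr.hash.percentmemory=0.25;
-- Number of input rows processed between checks of the hash table (default 100000).
set hive.groupby.mapaggr.checkinterval=100000;
-- If (hash table entries / input rows) exceeds this ratio at a check,
-- map-side aggregation is turned off for that task (default 0.5).
set hive.map.aggr.hash.min.reduction=0.5;
-- For skewed GROUP BY keys: plan the query as two MapReduce jobs, the first
-- spreading rows randomly across reducers for partial aggregation.
set hive.groupby.skewindata=true;

-- Option 2 (last resort): no map-side aggregation; grouping happens only in reducers.
set hive.map.aggr=false;
```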