Two Hive Problems

2016-05-31 21:56

Problem 1: Too Many Small Partitions

It can be tempting to partition your data into many small partitions to try to increase speed and concurrency.

However, Hive functions best when data is partitioned into larger partitions.

For example, consider partitioning a 100 TB table into 10,000 partitions, each 10 GB in size.

In addition, do not use more than 10,000 partitions per table. Having too many small partitions puts significant strain on the Hive MetaStore and does not improve performance.
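
The sizing arithmetic in the example above can be checked with a short Python sketch. The function names are hypothetical, and the conversion assumes decimal units (1 TB = 1000 GB), which is what makes the round numbers in the text work out.

```python
# Assumption: decimal units (1 TB = 1000 GB), matching the round numbers in the text.
TB_IN_GB = 1000

# Upper bound on partitions per table, as recommended in the text.
MAX_PARTITIONS_PER_TABLE = 10_000


def partition_size_gb(table_size_tb, partition_count):
    """Average partition size in GB for an evenly partitioned table."""
    return table_size_tb * TB_IN_GB / partition_count


def within_partition_limit(partition_count, limit=MAX_PARTITIONS_PER_TABLE):
    """True if the partition count stays at or below the recommended limit."""
    return partition_count <= limit


print(partition_size_gb(100, 10_000))   # 10.0
print(within_partition_limit(10_000))   # True
print(within_partition_limit(50_000))   # False
```

A plan that yields partitions much smaller than this, or more than 10,000 of them, is a candidate for a coarser partition key.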

Problem 2: Hive Queries Fail with "Too many counters" Error

Hive operations use various counters while executing MapReduce jobs.

These per-operator counters are enabled by the configuration setting hive.task.progress.

This setting is disabled by default; when it is enabled, Hive may create a large number of counters (4 counters per operator, plus another 20).

Note:

If dynamic partitioning is enabled, Hive implicitly enables the counters during data load.

By default, CDH restricts the number of MapReduce counters to 120.

Hive queries that require more counters will fail with the "Too many counters" error.
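
To get a rough feel for when a query reaches the limit, the figures above can be combined in a small Python sketch. It reads "4 counters per operator, plus another 20" as a fixed overhead of 20, and it assumes a query fails only when the count strictly exceeds the limit; both are assumptions, and the function names are hypothetical.

```python
# Figures taken from the text; how they combine is an assumption.
COUNTERS_PER_OPERATOR = 4
BASE_COUNTERS = 20          # assumed to be a fixed overhead per query
CDH_DEFAULT_LIMIT = 120     # default MapReduce counter limit stated in the text


def estimated_counters(operator_count):
    """Rough counter estimate for a query with the given number of operators."""
    return COUNTERS_PER_OPERATOR * operator_count + BASE_COUNTERS


def exceeds_limit(operator_count, limit=CDH_DEFAULT_LIMIT):
    """True if the estimate is strictly above the limit (assumed failure condition)."""
    return estimated_counters(operator_count) > limit


print(estimated_counters(25))  # 120
print(exceeds_limit(25))       # False
print(exceeds_limit(26))       # True
```

Under these assumptions, a query with more than about 25 operators would run past the default limit.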

What To Do

If you run into this error, set mapreduce.job.counters.max in mapred-site.xml to a higher value.
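
As an illustration of what the entry looks like, the Python sketch below renders the `<property>` element in the usual Hadoop configuration layout. The helper name and the value 500 are hypothetical; choose a limit that suits your workload. The resulting element goes inside the `<configuration>` element of mapred-site.xml.

```python
import xml.etree.ElementTree as ET


def counter_limit_property(limit):
    """Render a Hadoop-style <property> element for mapreduce.job.counters.max."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "mapreduce.job.counters.max"
    ET.SubElement(prop, "value").text = str(limit)
    return ET.tostring(prop, encoding="unicode")


# 500 is an arbitrary example value, not a recommendation.
print(counter_limit_property(500))
# <property><name>mapreduce.job.counters.max</name><value>500</value></property>
```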
