您的位置:首页 > 其它

Hive中的数据分区

2015-05-24 22:13 706 查看
首先认识什么是分区

Hive 中的分区就是分目录,把一个大的数据集根据业务需要分割成更下的数据集。

1. 如何定义分区,创建分区

hive> create table test(name string,sex int) partitioned by (birth
string, age string);

Time taken: 0.044 seconds

hive> alter table test add partition (birth='1980', age ='30');

Time taken: 0.079 seconds

hive> alter table test add partition (birth='1981', age ='29');

Time taken: 0.052 seconds

hive> alter table test add partition (birth='1982', age ='28');

Time taken: 0.056 seconds

hive> show partitions test;

birth=1980/age =30

birth=1981/age =29

birth=1982/age =28

2. 如何删除分区

hive> alter table test drop partition (birth='1980',age='30');

3. 加载数据到指定分区

load data local inpath '/home/hadoop/data.log' overwrite into table

test partition(birth='1980-01-01',age='30');

创建分区原则: 最少粒度原则

4 向partition_test的分区中插入数据:

hive> insert overwrite table partition_test partition(stat_date='20110728',province='henan') select member_id,name from partition_test_input where stat_date='20110728' and province='henan';

5 还可以同时向多个分区插入数据,0.7版本以后不存在的分区会自动创建,0.6之前的版本官方文档上说必须要预先创建好分区:

hive>

> from partition_test_input

> insert overwrite table partition_test partition (stat_date='20110526',province='liaoning')

> select member_id,name where stat_date='20110526' and province='liaoning'

> insert overwrite table partition_test partition (stat_date='20110728',province='sichuan')

> select member_id,name where stat_date='20110728' and province='sichuan'

> insert overwrite table partition_test partition (stat_date='20110728',province='heilongjiang')

> select member_id,name where stat_date='20110728' and province='heilongjiang';

Total MapReduce jobs = 4
http://biansutao.blog.163.com/blog/static/6702418820115332453560/
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: