
Hive: dynamically loading data into partitions, and other Hive usage tips

2016-10-26 13:23
Changing the field delimiter of a Hive table:

alter table tableName set SERDEPROPERTIES('field.delim'='\t'); 

Creating partitions from the data and dynamically loading data into them

insert into table device_status_log partition (date)
select `vin`, `obd_id`, `function_id`, `message_id`, `message_content`,
       `longitude`, `latitude`, `speed`, `engine_speed`, `gps_stat`, `client_time`,
       `create_time`, `analytical_result`,
       regexp_replace(to_date(create_time), '-', '') as date
from pre_device_status_log;
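If you get the following error:
Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
follow the hint and run this in the Hive CLI: set hive.exec.dynamic.partition.mode=nonstrict
With that set, the insert runs and loads each partition: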

Loading data to table obd_message.device_status_log partition (date=null)

         Time taken for load dynamic partitions : 4073

        Loading partition {date=20161020}

        Loading partition {date=20161017}

        Loading partition {date=20161024}

        Loading partition {date=20161021}

        Loading partition {date=20161023}

        Loading partition {date=20161026}

        Loading partition {date=20161015}

        Loading partition {date=20161018}

        Loading partition {date=20161016}

        Loading partition {date=20161019}

        Loading partition {date=20161025}

        Loading partition {date=20161022}

         Time taken for adding to write entity : 6

Partition obd_message.device_status_log{date=20161015} stats: [numFiles=1, numRows=188, totalSize=79565, rawDataSize=79377]

Partition obd_message.device_status_log{date=20161016} stats: [numFiles=1, numRows=648, totalSize=299298, rawDataSize=298650]

Partition obd_message.device_status_log{date=20161017} stats: [numFiles=1, numRows=912, totalSize=414597, rawDataSize=413685]

Partition obd_message.device_status_log{date=20161018} stats: [numFiles=1, numRows=895, totalSize=410935, rawDataSize=410040]

Partition obd_message.device_status_log{date=20161019} stats: [numFiles=1, numRows=1412, totalSize=613903, rawDataSize=612491]

Partition obd_message.device_status_log{date=20161020} stats: [numFiles=1, numRows=475, totalSize=204375, rawDataSize=203900]

Partition obd_message.device_status_log{date=20161021} stats: [numFiles=1, numRows=346, totalSize=142079, rawDataSize=141733]

Partition obd_message.device_status_log{date=20161022} stats: [numFiles=1, numRows=561, totalSize=220711, rawDataSize=220150]

Partition obd_message.device_status_log{date=20161023} stats: [numFiles=1, numRows=856, totalSize=352452, rawDataSize=351596]

Partition obd_message.device_status_log{date=20161024} stats: [numFiles=1, numRows=1997, totalSize=783701, rawDataSize=781704]

Partition obd_message.device_status_log{date=20161025} stats: [numFiles=1, numRows=1384, totalSize=556970, rawDataSize=555586]

Partition obd_message.device_status_log{date=20161026} stats: [numFiles=1, numRows=326, totalSize=133275, rawDataSize=132949]
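
The load above can also be run non-interactively. The following is a minimal sketch, not the author's original setup: it writes the settings and the insert into a script file and passes it to hive -f. The file name dynamic_load.hql is hypothetical, and the line hive.exec.dynamic.partition=true is an assumption that only matters on installations where dynamic partitioning is not already enabled.

```shell
# Write the session settings and the INSERT into one script file.
# dynamic_load.hql is a hypothetical file name chosen for this sketch.
cat > dynamic_load.hql <<'EOF'
-- Assumption: enable dynamic partitioning in case it is off on this installation.
set hive.exec.dynamic.partition=true;
-- Allow all partition columns to be dynamic (no static partition required).
set hive.exec.dynamic.partition.mode=nonstrict;

use obd_message;

insert into table device_status_log partition (date)
select `vin`, `obd_id`, `function_id`, `message_id`, `message_content`,
       `longitude`, `latitude`, `speed`, `engine_speed`, `gps_stat`, `client_time`,
       `create_time`, `analytical_result`,
       regexp_replace(to_date(create_time), '-', '') as date
from pre_device_status_log;
EOF

# Run the script only on machines where a hive client is installed.
if command -v hive >/dev/null 2>&1; then
    hive -f dynamic_load.hql
fi
```

Keeping the set statements in the same file as the insert means the nonstrict mode applies only to that session and does not have to be typed again.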

Listing the partitions of a table

show partitions  device_status_log ;
Removing a given separator with a regular expression:
create_time values look like 2016-10-10 00:00:00

regexp_replace(to_date(create_time), '-', '') as date
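
To see what this expression produces without starting Hive, the two steps can be imitated in the shell. This is only an illustrative sketch of the same transformation (drop the time part, then delete the dashes); the variable names are made up for the example and sed stands in for regexp_replace.

```shell
# Sample value in the same format as create_time.
create_time='2016-10-10 00:00:00'

# Like to_date(create_time): keep only the part before the first space.
date_part=${create_time%% *}

# Like regexp_replace(..., '-', ''): delete every '-'.
partition_value=$(printf '%s' "$date_part" | sed 's/-//g')

echo "$partition_value"   # prints 20161010
```

The result has the same shape as the partition names shown by show partitions, for example date=20161020.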

Hive time functions: adding hours, minutes or seconds (the example below adds 8 hours, i.e. 8*3600 seconds)

from_unixtime(unix_timestamp(client_time) + 8*3600 ) as client_time 

Hive's built-in date functions include date_add(), but it can only add or subtract whole days.

date        date(       date_add(   date_sub(   datediff(   datetime    
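
The arithmetic behind from_unixtime(unix_timestamp(client_time) + 8*3600) can be checked locally. The sketch below assumes GNU date (the -d option) and treats the sample timestamp as UTC; the sample value and variable names are hypothetical. It is meant to illustrate the convert, add seconds, convert back pattern, not to reproduce Hive's time zone handling.

```shell
# Sample value in the same format as client_time (hypothetical).
client_time='2016-10-10 00:00:00'

# Like unix_timestamp(client_time): seconds since the epoch (read as UTC here).
epoch=$(date -u -d "$client_time" +%s)

# Add the offset in seconds: 8*3600 for 8 hours, n*60 for n minutes, n for n seconds.
shifted=$((epoch + 8 * 3600))

# Like from_unixtime(...): format the result back as a timestamp string.
result=$(date -u -d "@$shifted" '+%Y-%m-%d %H:%M:%S')

echo "$result"   # prints 2016-10-10 08:00:00
```

The same pattern covers minutes and seconds, which date_add() cannot do because it works in whole days.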

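Some tips
Create hiveInit.sh
with the following content (the goal is to run jobs locally whenever possible, which shortens waiting time and makes debugging easier):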
SET mapred.job.tracker=local;
set mapred.reduce.tasks = 1;
set hive.exec.mode.local.auto.input.files.max=1000;
set hive.exec.mode.local.auto.inputbytes.max=50000000;
set hive.exec.mode.local.auto.tasks.max=10;
set hive.exec.mode.local.auto=true;
set hive.cli.print.current.db=true;
set hive.cli.print.header=true;
show databases;
use obd_message;


Then create hiveStart.sh with this content:

hive -i hiveInit.sh

Then make the script executable and run ./hiveStart.sh from the current directory to start the Hive client with the specified configuration.
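
The last two steps might look like the sketch below. The shebang line and the use of exec are additions for this example and are not in the original script; hive -i is the CLI option that runs an initialization file before the prompt appears.

```shell
# Write the launcher script: the same one-line command, plus a shebang.
cat > hiveStart.sh <<'EOF'
#!/bin/sh
# Start the Hive CLI and run hiveInit.sh first (-i names the initialization file).
exec hive -i hiveInit.sh
EOF

# Make the launcher executable.
chmod +x hiveStart.sh

# Start the client from the directory that contains both files:
#   ./hiveStart.sh
```

Because hiveInit.sh is given as a relative path, the launcher has to be started from the directory that holds both files.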