您的位置:首页 > 运维架构 > Nginx

hive导入nginx日志

2014-03-28 18:04 232 查看
将nginx日志导入到hive中的方法

1 在hive中建表

CREATE TABLE apachelog (ipaddress STRING, identd STRING, user STRING,finishtime STRING,requestline string, returncode INT, size INT,referer string,agent string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe'WITH SERDEPROPERTIES ('serialization.format'='org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol','quote.delim'='("|\\[|\\])','field.delim'=' ','serialization.null.format'='-')STORED AS TEXTFILE;


导入后日志格式为

203.208.60.91   -       -       05/May/2011:01:18:47 +0800      GET /robots.txt HTTP/1.1        404     1238 Mozilla/5.0


第二种方法导入

注意:这个方法在建表后,使用查询语句等前要先执行

hive> add jar /home/hjl/hive/lib/hive_contrib.jar;

CREATE TABLE log (host STRING,identity STRING,user STRING,time STRING,request STRING,status STRING,size STRING,referer STRING,agent STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' WITH SERDEPROPERTIES ("input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\".*\") ([^ \"]*|\".*\"))?","output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s")STORED AS TEXTFILE;


导入后的格式
203.208.60.91   -       -       [05/May/2011:01:18:47 +0800]    "GET /robots.txt HTTP/1.1"      404     1238 "-"      "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


如果文件数据是纯文本,可以使用 STORED AS TEXTFILE。如果数据需要压缩,使用 STORED AS SEQUENCE 。

导入日志命令

hive>load data local inpath '/home/log/map.gz' overwrite into table log;

导入日志支持.gz等格式

参考http://www.johnandcailin.com/blog/cailin/exploring-apache-log-files-using-hive-and-hadoop
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: