Using Hive to filter Tomcat log files into a database
2016-06-06 14:58
Tomcat log file directory; extracting fields with a regular expression
1. Create the Hive table apachelog
The statement is as follows:
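CREATE TABLE apachelog (
  host STRING,
  identity STRING,
  t_user STRING,
  time STRING COMMENT 'timestamp without the time zone offset',
  type STRING COMMENT 'HTTP method, e.g. GET',
  http STRING COMMENT 'request path',
  http_type STRING COMMENT 'protocol, e.g. HTTP/1.1',
  status STRING,
  agent STRING COMMENT 'last field of the log line (response size in bytes)'
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) \\[(.*?) .*?\\] \"([^ ]*) ([^ ]*) ([^ ]*)\" ([^ ]*) ([^ ]*)"
)
STORED AS TEXTFILE;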
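RegexSerDe fills the columns from the capture groups by position, so it can be worth checking a pattern against a sample line before loading anything. The following is a minimal sketch, not part of the original setup: it assumes a pattern with nine groups (one per column of apachelog) and uses Python's re module as a stand-in for the Java regex engine that Hive uses. The names PATTERN, COLUMNS and parse_line are made up for illustration.

```python
import re

# Nine-group access-log pattern written as a plain Python raw string
# (no Hive string escaping). Assumption: one group per table column.
PATTERN = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) \[(.*?) .*?\] '
    r'"([^ ]*) ([^ ]*) ([^ ]*)" ([^ ]*) ([^ ]*)'
)

# Column names in the order they are declared in the apachelog table.
COLUMNS = ["host", "identity", "t_user", "time", "type",
           "http", "http_type", "status", "agent"]


def parse_line(line):
    """Return a column -> value dict, or None if the line does not match."""
    # fullmatch requires the whole line to match the pattern.
    match = PATTERN.fullmatch(line.rstrip("\n"))
    if match is None:
        return None
    return dict(zip(COLUMNS, match.groups()))


sample = ('0:0:0:0:0:0:0:1 - - [24/Apr/2016:09:57:56 +0800] '
          '"GET /tomcat.css HTTP/1.1" 200 5926')
for column, value in parse_line(sample).items():
    print(f"{column:10} {value}")
```

A line that does not fit the pattern makes parse_line return None, which is a quick way to spot log formats the table would not parse.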
Finally, load the log file:
LOAD DATA LOCAL INPATH '<absolute path of the log file>'
OVERWRITE INTO TABLE apachelog;
2. Optionally, add a cron job that runs the log collection every hour:
crontab -e
0 * * * * /usr/sbin/sh <shell script>
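The collection script that cron runs is not shown here. As a rough sketch of what it could do, the Python version below builds the LOAD DATA statement for the apachelog table and passes it to the Hive command-line client with hive -e. It assumes the hive executable is on the PATH; the file name load_tomcat_log.py and the function names are hypothetical.

```python
import os
import subprocess
import sys

TABLE = "apachelog"  # table created in step 1


def build_load_statement(log_path, table=TABLE):
    """Build the LOAD DATA statement for one local log file."""
    if not os.path.isabs(log_path):
        raise ValueError("log_path must be an absolute path")
    if "'" in log_path:
        # Keep the sketch simple: refuse paths that would break the quoting.
        raise ValueError("log_path must not contain a single quote")
    return (f"LOAD DATA LOCAL INPATH '{log_path}' "
            f"OVERWRITE INTO TABLE {table}")


def run_load(log_path):
    """Run the statement through the Hive CLI; raises if hive exits non-zero."""
    subprocess.run(["hive", "-e", build_load_statement(log_path)], check=True)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        run_load(sys.argv[1])
    else:
        print("usage: load_tomcat_log.py <absolute path of the log file>",
              file=sys.stderr)
```

Because the statement uses OVERWRITE, each run replaces the contents of the table instead of appending to it.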
The log lines are expected to look like this:
127.0.0.1 - - [24/Apr/2016:09:55:45 +0800] "GET / HTTP/1.1" 200 11418
127.0.0.1 - - [24/Apr/2016:09:55:47 +0800] "GET / HTTP/1.1" 200 11418
127.0.0.1 - - [24/Apr/2016:09:57:52 +0800] "GET / HTTP/1.1" 200 11418
0:0:0:0:0:0:0:1 - - [24/Apr/2016:09:57:56 +0800] "GET / HTTP/1.1" 200 11418
0:0:0:0:0:0:0:1 - - [24/Apr/2016:09:57:56 +0800] "GET /tomcat.css HTTP/1.1" 200 5926
0:0:0:0:0:0:0:1 - - [24/Apr/2016:09:57:56 +0800] "GET /tomcat.png HTTP/1.1" 200 5103
0:0:0:0:0:0:0:1 - - [24/Apr/2016:09:57:56 +0800] "GET /bg-nav.png HTTP/1.1" 200 1401
0:0:0:0:0:0:0:1 - - [24/Apr/2016:09:57:56 +0800] "GET /asf-logo.png HTTP/1.1" 200 17811
0:0:0:0:0:0:0:1 - - [24/Apr/2016:09:57:56 +0800] "GET /bg-middle.png HTTP/1.1" 200 1918
0:0:0:0:0:0:0:1 - - [24/Apr/2016:09:57:56 +0800] "GET /bg-button.png HTTP/1.1" 200 713
0:0:0:0:0:0:0:1 - - [24/Apr/2016:09:57:56 +0800] "GET /bg-upper.png HTTP/1.1" 200 3103