您的位置:首页 > 运维架构 > Shell

shell统计Apache访问日志中指定页面的PV、UV等指标

2012-03-23 11:52 746 查看
最近由于运营需要,写了诸多shell脚本统计Apache日志中指定页面的PV、UV等指标。今天把这些脚本贴出来,留待后用。但凡程序,都是有改进的余地的,只看你是否有心去做。

前提:因为项目和电信移动互联网业务相关,所以apache访问日志配置有特殊字段。记录格式大体如下:

10.196.209.221 - - [20/Mar/2012:00:25:58 +0800] "GET http://domain/music/order.php HTTP/1.0" 200 6626 "-" "Mozilla/5.0 (Linux; U; Android 2.2.2; zh-cn; ZTE-C_N880S/N880SV1.0.0B03; 480*800; CTC/2.0) AppleWebkit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1 App(CloudShow/1.0)" "CDMA" "1809766****" "-" "115.168.31.65" http://www.zte.com.cn/mobile/uaprof/N880.xml[/code] 需求:提取某天的音乐栏目的PV、UV、电信UV、累计UV以及订购数和下载数

方案:以IMSI号作为区分UV、电信UV的标识

脚本如下:

#!/bin/bash

access_log_path=/home/stat/logs
app=("music" "vedio") #存放有统计需求的栏目标识字段
order=("music") #存放有订购统计需求的栏目标识字段
download=("music" "vedio") #存放有下载统计需求的栏目标识字段

startdate=$1 #统计开始日期
days=$2 #统计日期区间

for((i=0;i<$days;i++));
do
if [ $i -eq 0 ];then
dates[$i]=$startdate
else
dates[$i]=`date -d "+$i day $startdate" +"%Y%m%d"`
fi
grep -E '"GET http://domain/music' $access_log_path/access.log.${dates[$i]} >access.vedio.${dates[$i]}.log
grep -E '"GET http://domain/vedio' $access_log_path/access.log.${dates[$i]} >access.vedio.${dates[$i]}.log
done

get_pv(){
clientname=$1
date=$2
awk -v date=$date 'BEGIN{PV=0}{PV++}END{print date,PV}' access.$clientname.$date.log >> stats.pv.$clientname.log
}

get_uv(){
clientname=$1
date=$2
awk 'BEGIN{FS="[\"\"]"}{
if(p1=match($6,"IMSI_")){
imsi=substr($6,p1+5,length($6)-p1);
if(length(imsi)==15){print imsi}
}
}' access.$clientname.$date.log|sort|uniq >> $clientname.imsi.$date.log

#UV
awk -v dt=$date 'BEGIN{count=0}{count++}END{print dt,count}' $clientname.imsi.$date.log >> stats.uv.$clientname.log
#电信UV
awk -v dt=$date 'BEGIN{count=0}{if(match($0,"46003"))count++}END{print dt,count}' $clientname.imsi.$date.log >> stats.uv.telcom.$clientname.log
if [ -f stats.imsi.$clientname.log ];then
cat stats.imsi.$clientname.log $clientname.imsi.$date.log|awk '{print $1}'|sort|uniq >> $clientname.imsi.log.tmp
else
awk '{print $1}' $clientname.imsi.$date.log >> $clientname.imsi.log.tmp
fi

mv $clientname.imsi.log.tmp stats.imsi.$clientname.log
rm $clientname.imsi.$date.log
awk -v dt=$date 'BEGIN{count=0}{count++}END{print dt,count}' stats.imsi.$clientname.log >> stats.uv.total.$clientname.log
}

get_order(){
clientname=$1
date=$2
targetstr=$3
grep "$targetstr" access.$clientname.$date.log | wc -l | awk -v dt=$date '{print dt,$0}' >> stats.order.$clientname.log
}

get_download(){
clientname=$1
date=$2
targetstr=$3
grep "$targetstr" access.$clientname.$date.log | wc -l | awk -v dt=$date '{print dt,$0}' >> stats.download.$clientname.log
}

for client in ${app[*]}
do
for date in ${dates[*]}
do
get_pv $client $date
get_uv $client $date
done
done

for client in ${order[*]}
do
case "$client" in
"music")
targetstr="order"
;;
*)
targetstr=""
;;
esac

if [ -n "$targetstr" ];then
targetstr="GET http://domain/$client/$targetstr" 
for date in ${dates[*]}
do
get_order $client $date "$targetstr"
done
fi

done

for client in ${download[*]}
do
case "$client" in
"music")
targetstr="download"
;;
"vedio")
targetstr="play"
;;
*)
targetstr=""
;;
esac

if [ -n "$targetstr" ];then
targetstr="GET http://domain/$client/$targetstr" 
for date in ${dates[*]}
do
get_download $client $date "$targetstr"
done
fi

done
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: