
Monitoring HAProxy, DRBD Status, and LVS Connections with Zabbix

2015-05-15 15:18
Monitoring HAProxy with Zabbix

http://john88wang.blog.51cto.com
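HAProxy + Keepalived is deployed as the load-balancing and high-availability front end for our game servers, so HAProxy's running state needs to be monitored in real time.
The HAProxy version used in this article is 1.4.24.
See the following section of the official documentation at http://cbonte.github.io/haproxy-dconv/configuration-1.4.html: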

Statistics and monitoring

https://github.com/olindata/tribily-zabbix-templates/tree/master/App_HAProxy

https://github.com/jlyheden/zabbix_scripts/tree/master/haproxy

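1. How the monitoring works

HAProxy exposes its status through an HTTP page and through a Unix stats socket, and both can export the data in CSV format.

The HTTP page can be fetched with a URL such as http://10.10.41.100/status;csv
The Unix socket can be queried with: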
echo "show info;show stat" | sudo socat stdio unix-connect:/tmp/haproxy

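This article uses the second method to obtain HAProxy's status.
Set the stats socket in the haproxy.cfg configuration file:
stats socket /tmp/haproxy level admin

level can be followed by one of user, operator, or admin:
user is the lowest privilege level and can only see non-sensitive information.
operator can see everything but can only modify non-sensitive settings.
admin can see and change everything, so use it with caution.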

$echo "show help" | /usr/bin/sudo /usr/bin/socat stdio unix-connect:/tmp/haproxy
Unknown command. Please enter one of the following commands only :
clear counters : clear max statistics counters (add 'all' for all counters)
help : this message
prompt : toggle interactive mode with prompt
quit : disconnect
show info : report information about the running process
show stat : report counters for each proxy and server
show errors : report last request and response errors for each proxy
show sess [id] : report the list of current sessions or dump this session
get weight : report a server's current weight
set weight : change a server's weight
set timeout : change a timeout setting
disable server : set a server in maintenance mode
enable server : re-enable a server that was previously in maintenance mode
show info reports information about the running HAProxy process:

Name: HAProxy
Version: 1.4.24
Release_date: 2013/06/17
Nbproc: 1
Process_num: 1
Pid: 7020
Uptime: 110d 16h25m55s
Uptime_sec: 9563155
Memmax_MB: 0
Ulimit-n: 131101
Maxsock: 131101
Maxconn: 65536
Maxpipes: 0
CurrConns: 14
PipesUsed: 0
PipesFree: 0
Tasks: 26
Run_queue: 1
node: master_loadbalance1
description: lb1
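
The "Key: value" layout of show info is easy to turn into a lookup table. The following is a minimal Python sketch, not part of the original scripts; the function name parse_show_info and the shortened sample text are illustrative assumptions. It splits each line on the first colon, so a value that contains a space (such as Uptime) is kept whole and an exact key lookup never confuses Uptime with Uptime_sec.

```python
def parse_show_info(text):
    """Turn 'Key: value' lines from `show info` into a dict of strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip blank lines and anything without a colon
            info[key.strip()] = value.strip()
    return info


# Shortened sample taken from the output shown above
sample = """Name: HAProxy
Version: 1.4.24
Uptime: 110d 16h25m55s
Uptime_sec: 9563155
CurrConns: 14
"""

info = parse_show_info(sample)
print(info["Uptime"])      # full value, including the space
print(info["Uptime_sec"])
```

In practice the text would come from the stats socket (for example through the socat command shown earlier) rather than from a literal string.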

show stat reports the counters for each proxy and server:
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,
login_game_pool,FRONTEND,,,24,868,2000,196721023,87244966860,121969199234,0,0,171448,,,,,OPEN,,,,,,,,,1,1,0,,,,0,95,0,628,,,,0,195071390,0,1619236,28338,2034,,93,611,196721000,,,
login_pool,web1_80,0,0,0,38,2000,8333681,2356031055,2827436427,,0,,0,3,2211,11,UP,30,1,0,902,0,9558963,0,,1,2,1,,8329209,,2,1,,199,L7OK,200,1,20,7967292,0,361648,7,0,0,,,,136,0,
login_pool,web2_80,0,0,0,63,2000,8333998,2358035705,2826639220,,0,,1,6,2281,13,UP,30,1,0,861,0,9558963

0. pxname: proxy name
1. svname: service name (FRONTEND for frontend, BACKEND for backend, any name
for server)
2. qcur: current queued requests
3. qmax: max queued requests
4. scur: current sessions
5. smax: max sessions
6. slim: sessions limit
7. stot: total sessions
8. bin: bytes in
9. bout: bytes out
10. dreq: denied requests
11. dresp: denied responses
12. ereq: request errors
13. econ: connection errors
14. eresp: response errors (among which srv_abrt)
15. wretr: retries (warning)
16. wredis: redispatches (warning)
17. status: status (UP/DOWN/NOLB/MAINT/MAINT(via)...)
18. weight: server weight (server), total weight (backend)
19. act: server is active (server), number of active servers (backend)
20. bck: server is backup (server), number of backup servers (backend)
21. chkfail: number of failed checks
22. chkdown: number of UP->DOWN transitions
23. lastchg: last status change (in seconds)
24. downtime: total downtime (in seconds)
25. qlimit: queue limit
26. pid: process id (0 for first instance, 1 for second, ...)
27. iid: unique proxy id
28. sid: service id (unique inside a proxy)
29. throttle: warm up status
30. lbtot: total number of times a server was selected
31. tracked: id of proxy/server if tracking is enabled
32. type (0=frontend, 1=backend, 2=server, 3=socket)
33. rate: number of sessions per second over last elapsed second
34. rate_lim: limit on new sessions per second
35. rate_max: max number of new sessions per second
36. check_status: status of last health check, one of:
UNK     -> unknown
INI     -> initializing
SOCKERR -> socket error
L4OK    -> check passed on layer 4, no upper layers testing enabled
L4TMOUT -> layer 1-4 timeout
L4CON   -> layer 1-4 connection problem, for example
"Connection refused" (tcp rst) or "No route to host" (icmp)
L6OK    -> check passed on layer 6
L6TOUT  -> layer 6 (SSL) timeout
L6RSP   -> layer 6 invalid response - protocol error
L7OK    -> check passed on layer 7
L7OKC   -> check conditionally passed on layer 7, for example 404 with
disable-on-404
L7TOUT  -> layer 7 (HTTP/SMTP) timeout
L7RSP   -> layer 7 invalid response - protocol error
L7STS   -> layer 7 response error, for example HTTP 5xx
37. check_code: layer5-7 code, if available
38. check_duration: time in ms took to finish last health check
39. hrsp_1xx: http responses with 1xx code
40. hrsp_2xx: http responses with 2xx code
41. hrsp_3xx: http responses with 3xx code
42. hrsp_4xx: http responses with 4xx code
43. hrsp_5xx: http responses with 5xx code
44. hrsp_other: http responses with other codes (protocol error)
45. hanafail: failed health checks details
46. req_rate: HTTP requests per second over last elapsed second
47. req_rate_max: max number of HTTP requests per second observed
48. req_tot: total number of HTTP requests received
49. cli_abrt: number of data transfers aborted by the client
50. srv_abrt: number of data transfers aborted by the server (inc. in eresp)
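Note that if HAProxy runs in multi-process mode (nbproc set to a value other than 1), each process answers on the socket with its own status, so successive queries may return the statistics of different processes.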
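
Because the first line of the show stat output names every column, the counters can be looked up by name instead of by position. The sketch below is only an illustration under stated assumptions: parse_show_stat is a hypothetical helper, and the sample keeps just the first eight columns of the output shown above.

```python
import csv
import io


def parse_show_stat(text):
    """Map (pxname, svname) -> {field name: value} using the '# ' header row."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Header looks like "# pxname,svname,...," so drop the "# " and trailing comma
    header = lines[0].lstrip("# ").rstrip(",").split(",")
    rows = {}
    for fields in csv.reader(io.StringIO("\n".join(lines[1:]))):
        row = dict(zip(header, fields))
        rows[(row["pxname"], row["svname"])] = row
    return rows


# Shortened sample: only the first eight columns are kept
sample = """# pxname,svname,qcur,qmax,scur,smax,slim,stot,
login_game_pool,FRONTEND,,,24,868,2000,196721023,
login_pool,web1_80,0,0,0,38,2000,8333681,
"""

stats = parse_show_stat(sample)
print(stats[("login_pool", "web1_80")]["smax"])
print(stats[("login_game_pool", "FRONTEND")]["scur"])
```

Looking fields up by name avoids having to keep the 0-based indexes in the list above in step with 1-based column numbers.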

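2. Writing the monitoring scripts
There are three monitoring scripts:

haproxy_info.sh collects basic information about HAProxy.
haproxy_pool_discovery.py lets Zabbix discover each pool through low-level discovery (LLD), for example login_pool:BACKEND and login_pool:web1_80. With LLD, the backend hosts defined in the configuration file are picked up dynamically and each one is monitored.
haproxy_stat.sh sends the show stat command to the stats socket and collects the value of each counter. The script splits each line on commas and checks the second field, because some counters exist only for FRONTEND or BACKEND rows, while others exist for every row except FRONTEND and BACKEND.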
haproxy_info.sh
#!/bin/bash
# This script is used for getting haproxy info such as version, uptime and number of processes

metric=$1
stats_socket=/tmp/haproxy
info_file=/tmp/haproxy_info.csv
echo "show info" | /usr/bin/sudo /usr/bin/socat unix-connect:$stats_socket stdio > $info_file
# Match the key exactly (so Uptime does not also match Uptime_sec) and print the whole value
awk -F': ' -v m="$metric" '$1==m{print $2}' $info_file


haproxy_pool_discovery.py
socat must be installed, and the Zabbix agent user must be allowed to run socat through sudo.
Edit the sudoers file with visudo as follows:
#
# Disable "ssh hostname sudo <cmd>", because it will show the password in clear.
#         You have to run "ssh -t hostname sudo <cmd>".
#
Defaults    !requiretty

zabbixagent   ALL=(root)      NOPASSWD:/usr/bin/socat
#!/usr/bin/python
# This script is used to discover HAProxy pools for Zabbix low-level discovery
import subprocess
import json
args='''echo "show stat"|sudo socat stdio unix-connect:/tmp/haproxy|egrep -v '^#|^$'|awk -F',' '{print $1":"$2}' '''

# universal_newlines=True returns text rather than bytes on Python 3
t=subprocess.Popen(args,shell=True,stdout=subprocess.PIPE,universal_newlines=True).communicate()[0]
pools=[]
for pool in t.split('\n'):
    if len(pool) != 0:
        pools.append({'{#POOL_NAME}':pool})
print(json.dumps({'data':pools},indent=4,separators=(',',':')))
Output:
{
"data":[
{
"{#POOL_NAME}":"login_game_pool:FRONTEND"
},
{
"{#POOL_NAME}":"login_pool:web1_80"
},
{
"{#POOL_NAME}":"login_pool:web2_80"
},
{
"{#POOL_NAME}":"login_pool:BACKEND"
},

]
}
haproxy_stat.sh
#!/bin/bash
# login_game_pool:FRONTEND
pool_name=$(echo $1|awk -F':' '{print $1}')
server_name=$(echo $1|awk -F':' '{print $2}')
metric=$2
stat_socket=/tmp/haproxy
stat_file=/tmp/haproxy_stat.csv
echo "show stat"|sudo socat stdio unix-connect:$stat_socket > $stat_file

case $metric in
qcur)
#current queued requests
if [ "$server_name" != "FRONTEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $3}' $stat_file
else
echo 0
fi
;;
qmax)
#max queued requests
if [ "$server_name" != "FRONTEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $4}' $stat_file
else
echo 0
fi
;;
scur)
#current sessions
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $5}' $stat_file
;;
smax)
#max sessions
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $6}' $stat_file
;;
slim)
#sessions limit
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $7}' $stat_file
;;
stot)
#total sessions
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $8}' $stat_file
;;
bin)
#bytes in
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $9}' $stat_file
;;
bout)
#bytes out
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $10}' $stat_file
;;
dreq)
#denied requests
#only FRONTEND and BACKEND have this field
if [ "$server_name" == "FRONTEND" -o "$server_name" == "BACKEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $11}' $stat_file
else
echo 0
fi
;;
dresp)
#denied responses
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $12}' $stat_file
;;
ereq)
#request errors
#only FRONTEND has this field
if [ "$server_name" == "FRONTEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $13}' $stat_file
else
echo 0
fi
;;
econ)
#connection errors
#FRONTEND does not have this field
if [ "$server_name" != "FRONTEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $14}' $stat_file
else
echo 0
fi
;;
eresp)
#response errors
#FRONTEND does not have this field
if [ "$server_name" != "FRONTEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $15}' $stat_file
else
echo 0
fi
;;
status)
#status
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $18}' $stat_file
;;
chkfail)
#number of failed checks
#FRONTEND and BACKEND do not have this field
if [ "$server_name" == "FRONTEND" -o "$server_name" == "BACKEND" ];then
echo 0
else
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $22}' $stat_file
fi
;;
chkdown)
#number of UP->DOWN transitions
#FRONTEND does not have this field, so return 0
if [ "$server_name" != "FRONTEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $23}' $stat_file
else
echo 0
fi
;;
lastchg)
#last status change in seconds
#FRONTEND does not have this field, so return 0
if [ "$server_name" != "FRONTEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $24}' $stat_file
else
echo 0
fi
;;
downtime)
#total downtime in seconds
#FRONTEND does not have this field, so return 0
if [ "$server_name" != "FRONTEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $25}' $stat_file
else
echo 0
fi
;;
lbtot)
#total number of times a server was selected
#FRONTEND does not have this field
if [ "$server_name" != "FRONTEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $31}' $stat_file
else
echo 0
fi
;;
rate)
#number of sessions per second over last elapsed second
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $34}' $stat_file
;;
rate_limit)
#limit on new sessions per second
#only FRONTEND has this field
if [ "$server_name" == "FRONTEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $35}' $stat_file
else
echo 0
fi
;;
rate_max)
#max number of new sessions per second
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $36}' $stat_file
;;
check_status)
#status of last health check
if [ "$server_name" == "FRONTEND" -o "$server_name" == "BACKEND" ];then
echo "NULL"
else
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $37}' $stat_file
fi
;;
hrsp_1xx)
#http response with 1xx code
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $40}' $stat_file
;;
hrsp_2xx)
#http response with 2xx code
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $41}' $stat_file
;;
hrsp_3xx)
#http response with 3xx code
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $42}' $stat_file
;;
hrsp_4xx)
#http response with 4xx code
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $43}' $stat_file
;;
hrsp_5xx)
#http response with 5xx code
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $44}' $stat_file
;;
req_rate)
#HTTP requests per second over last elapsed second
#only FRONTEND has this field,others will return 0
if [ "$server_name" == "FRONTEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $47}' $stat_file
else
echo 0
fi
;;
req_rate_max)
#max number of HTTP requests per second observed
#only FRONTEND has this field,others will return 0
if [ "$server_name" == "FRONTEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $48}' $stat_file
else
echo 0
fi
;;
req_tot)
#total number of HTTP requests received
#only FRONTEND has this field,others will return 0
if [ "$server_name" == "FRONTEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $49}' $stat_file
else
echo 0
fi
;;
*)
echo "please input the correct argument"
;;
esac
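
The per-metric branches above all follow one rule: if the counter does not exist for this kind of row, return a default, otherwise read the column. As a rough sketch of that rule in table form (the names APPLIES_TO, DEFAULTS, row_kind and metric_value are hypothetical, and only a few metrics from the script are listed):

```python
FRONTEND, BACKEND, SERVER = "FRONTEND", "BACKEND", "SERVER"

# metric -> kinds of row for which the script above reads the CSV column;
# a metric that is not listed is read for every kind of row
APPLIES_TO = {
    "qcur": {BACKEND, SERVER},
    "dreq": {FRONTEND, BACKEND},
    "ereq": {FRONTEND},
    "chkfail": {SERVER},
    "check_status": {SERVER},
    "req_rate": {FRONTEND},
}
# Fallback values; every metric not listed here falls back to "0"
DEFAULTS = {"check_status": "NULL"}


def row_kind(svname):
    """Classify a row by its svname column."""
    return svname if svname in (FRONTEND, BACKEND) else SERVER


def metric_value(row, metric):
    """Return the column value, or the default when the metric does not apply."""
    kinds = APPLIES_TO.get(metric)
    if kinds is not None and row_kind(row["svname"]) not in kinds:
        return DEFAULTS.get(metric, "0")
    return row[metric]


frontend = {"svname": "FRONTEND", "qcur": "", "scur": "24"}
print(metric_value(frontend, "qcur"))  # not applicable to FRONTEND
print(metric_value(frontend, "scur"))
```

Keeping the applicability rules in one table means a new counter needs one new entry rather than one more case branch.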
3. Changing the Zabbix configuration
Add haproxy_status.conf under /data/zabbix/etc/zabbix_agentd.conf.d/:
### Option: UserParameter
#   User-defined parameter to monitor. There can be several user-defined parameters.
#   Format: UserParameter=<key>,<shell command>
#   See 'zabbix_agentd' directory for examples.
#
# Mandatory: no
# Default:
# UserParameter=
UserParameter=haproxy.info[*],/usr/local/zabbix/bin/haproxy_info.sh $1
UserParameter=haproxy.discovery,/usr/bin/python /usr/local/zabbix/bin/haproxy_pool_discovery.py
UserParameter=haproxy.stat[*],/usr/local/zabbix/bin/haproxy_stat.sh $1 $2
4. Adding the Zabbix template

See the attachment for the full template.
http://john88wang.blog.51cto.com/2165294/1568541

Monitoring DRBD status with Zabbix

MySQL high availability is deployed in production as DRBD + Heartbeat + MySQL, so monitoring DRBD is important as well.

I. How the monitoring works
1. Using drbd-overview
$ drbd-overview
0:??not-found??  Connected Primary/Secondary UpToDate/UpToDate C r----- /database ext4 50G 3.7G 44G 8%
Without root privileges, the resource name is not shown.
$ sudo drbd-overview
0:r0  Connected Primary/Secondary UpToDate/UpToDate C r----- /database ext4 50G 3.7G 44G 8%
2. Reading /proc/drbd
$ cat /proc/drbd
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2013-09-27 16:00:43
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:1200291012 nr:7644 dw:1200298728 dr:1575405 al:19036 bm:13 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
The C in this position means that replication protocol C is in use; it can also be B or A.

I/O state flags: six flags that describe the I/O state of the resource.

r-----
1. I/O suspension. Either r for running or s for suspended.
2. Serial resynchronization. Normally -.
3. Peer-initiated sync suspension. Normally -.
4. Locally initiated sync suspension. Normally -.
5. Locally blocked I/O. Normally -.
6. Activity Log update suspension. Normally -.

cs (connection state) shows the connection state of the resource.
Possible connection states include:
StandAlone
Disconnecting
Unconnected
BrokenPipe
NetworkFailure
Connected (the normal state)
and so on.
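ds (disk states) shows the disk states.
The local disk state is shown first, followed by the peer's disk state. Each can be one of:
Diskless
Attaching
Failed
Negotiating
Inconsistent
Outdated
DUnknown
Consistent
UpToDate (the data is fully in sync; this is the normal state)

ro (resource roles)

Primary: readable and writable
Secondary: neither readable nor writable
Unknown: this role only appears for the peer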

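ns (network send): volume of data sent to the peer over the network, in KBytes
nr (network receive): volume of data received from the peer over the network, in KBytes
dw (disk write): volume of data written to the local disk, in KBytes
dr (disk read): volume of data read from the local disk, in KBytes
al (activity log): number of updates of the activity log area of the DRBD metadata
bm (bitmap): number of updates of the bitmap area of the DRBD metadata
lo (local count): number of open requests to the local I/O subsystem issued by DRBD
pe (pending): number of requests sent to the peer that have not yet been answered
ua (unacknowledged): number of requests received over the network connection that have not yet been answered
ap (application pending): number of block I/O requests forwarded to DRBD, but not yet answered by DRBD
ep (epochs): number of epoch objects. Usually 1. Might increase under I/O load when using either the barrier or the none write ordering method.
wo (write order): currently used write ordering method: b (barrier), f (flush), d (drain) or n (none)
oos (out of sync): amount of storage currently out of sync, in Kibibytes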
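
Since nearly every token in /proc/drbd has the form key:value, the whole status of one resource can be collected in a single pass. The following Python sketch is an illustration only: parse_proc_drbd is a hypothetical helper, it assumes the idle-resource layout shown above (no resync progress lines), and it stops after the first resource.

```python
def parse_proc_drbd(text):
    """Collect key:value tokens for the first resource listed in /proc/drbd."""
    status = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(("version:", "GIT-hash:")):
            continue
        tokens = line.split()
        # A resource line starts with its minor number, e.g. "0:"
        if tokens[0].endswith(":") and tokens[0].rstrip(":").isdigit():
            if "minor" in status:
                break  # only the first resource is parsed
            status["minor"] = tokens[0].rstrip(":")
            bare = [t for t in tokens[1:] if ":" not in t]
            if len(bare) == 2:  # replication protocol and I/O flags
                status["protocol"], status["io_flags"] = bare
            tokens = tokens[1:]
        for token in tokens:
            key, sep, value = token.partition(":")
            if sep and value:
                status[key] = value
    return status


sample = """version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2013-09-27 16:00:43
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:1200291012 nr:7644 dw:1200298728 dr:1575405 al:19036 bm:13 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
"""

status = parse_proc_drbd(sample)
print(status["cs"], status["ro"], status["ds"], status["oos"])
```

Matching tokens by key means a lookup such as status["ap"] cannot accidentally pick up api:88 from the version line.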

3. Using service drbd status
$ sudo service drbd status
drbd driver loaded OK; device status:
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2013-09-27 16:00:43
m:res  cs         ro                 ds                 p  mounted    fstype
0:r0   Connected  Primary/Secondary  UpToDate/UpToDate  C  /database  ext4
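II. Writing the monitoring script
In most cases a production server defines only one resource, which keeps maintenance simple. This section therefore covers monitoring a single DRBD resource; with several resources, Zabbix low-level discovery can be used instead.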

drbd_status.sh
#!/bin/bash
#gather drbd status via /proc/drbd
#$ cat /proc/drbd
#version: 8.3.16 (api:88/proto:86-97)
#GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by phil@Build64R6, 2013-09-27 16:00:43
# 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
#    ns:1202588332 nr:7644 dw:1202596048 dr:1575405 al:19216 bm:13 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
#assume only one resource defined,default is r0

status_file=/proc/drbd
metric=$1

case $metric in
version)
cat $status_file|grep "version"|awk '{print $2}'
;;
name)
cat $status_file|grep "cs"|awk '{print $1}'|tr -d ":"
;;
cs)
cat $status_file|grep "cs"|awk '{print $2}'|awk -F":" '{print $2}'
;;
ro)
cat $status_file|grep -v "version"|grep "ro"|awk '{print $3}'|awk -F":" '{print $2}'
;;
ds)
cat $status_file|grep -v "version"|grep "ds"|awk '{print $4}'|awk -F":" '{print $2}'
;;
protocol)
cat $status_file|grep -v "version"|grep "cs"|awk '{print $5}'
;;
ns)
cat $status_file|grep "ns"|awk '{print $1}'|awk -F":" '{print $2}'
;;
nr)
cat $status_file|grep "nr"|awk '{print $2}'|awk -F":" '{print $2}'
;;
dw)
cat $status_file|grep "dw"|awk '{print $3}'|awk -F":" '{print $2}'
;;
dr)
cat $status_file|grep "dr"|awk '{print $4}'|awk -F":" '{print $2}'
;;
al)
cat $status_file|grep "al"|awk '{print $5}'|awk -F":" '{print $2}'
;;
bm)
cat $status_file|grep "bm"|awk '{print $6}'|awk -F":" '{print $2}'
;;
lo)
cat $status_file|grep "lo"|awk '{print $7}'|awk -F":" '{print $2}'
;;
pe)
cat $status_file|grep "pe"|awk '{print $8}'|awk -F":" '{print $2}'
;;
ua)
cat $status_file|grep "ua"|awk '{print $9}'|awk -F":" '{print $2}'
;;
ap)
cat $status_file|grep -v "version"|grep "ap"|awk '{print $10}'|awk -F":" '{print $2}'
;;
ep)
cat $status_file|grep "ep"|awk '{print $11}'|awk -F":" '{print $2}'
;;
wo)
cat $status_file|grep "wo"|awk '{print $12}'|awk -F":" '{print $2}'
;;
oos)
cat $status_file|grep "oos"|awk '{print $13}'|awk -F":" '{print $2}'
;;
*)
echo "unknown parameters"
esac
Add a Zabbix include file, drbd_status.conf:
UserParameter=drbd.status[*],/usr/local/zabbix/bin/drbd_status.sh $1
III. Adding the monitoring template

Pay attention to the trigger expression:
{Template DRBD:drbd.status[ro].str(Secondary/Primary)}#1 & {Template DRBD:drbd.status[ro].str(Primary/Secondary)}#1
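
In the older Zabbix expression syntax used here, # means "not equal" and & means logical AND, while str() returns 1 when the given string is found in the value. The trigger therefore fires only when the role pair is neither Secondary/Primary nor Primary/Secondary. A small Python sketch of the same logic (the function name role_trigger_fires is a hypothetical label for illustration):

```python
EXPECTED_ROLES = ("Primary/Secondary", "Secondary/Primary")


def role_trigger_fires(ro_value):
    """Fire when neither expected role pair occurs in the collected value."""
    return all(pair not in ro_value for pair in EXPECTED_ROLES)


for value in ("Primary/Secondary", "Secondary/Primary",
              "Secondary/Secondary", "Primary/Unknown"):
    print(value, role_trigger_fires(value))
```

With this rule a normal failover, where the two nodes swap roles, does not raise the trigger, but a pair such as Secondary/Secondary or a peer in the Unknown role does.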

References:
http://blog.pandorafms.org/?p=1944
http://drbd.linbit.com/docs/about/

Monitoring LVS connections with Zabbix

I. Environment

zabbix:2.0.6
ipvsadm:1.24
OS:CentOS 6.4 x86
dip:192.168.100.14
rip:192.168.100.22
rip:192.168.100.24
rip:192.168.100.76
rip:192.168.100.101

II. Creating the script
[root@lvs-master zabbix]# pwd
/data/zabbix/sbin

[root@lvs-master zabbix]# cat lvs-status.sh
#!/bin/bash
# get lvs connection

# Totals count real-server rows only (rows that start with "->" and hold a number)
function AllConn {
sudo /sbin/ipvsadm -L -n | awk '$1=="->" && $5 ~ /^[0-9]+$/ {sum+=$5} END{print sum+0}'
}
# The dot is escaped and the trailing colon anchors the address, so 100.22 cannot match 100.220
function 101Conn {
sudo /sbin/ipvsadm -L -n | grep '100\.101:' | awk '{print $5}'
}
function 22Conn {
sudo /sbin/ipvsadm -L -n | grep '100\.22:' | awk '{print $5}'
}
function 24Conn {
sudo /sbin/ipvsadm -L -n | grep '100\.24:' | awk '{print $5}'
}
function 76Conn {
sudo /sbin/ipvsadm -L -n | grep '100\.76:' | awk '{print $5}'
}

function AllInConn {
sudo /sbin/ipvsadm -L -n | awk '$1=="->" && $6 ~ /^[0-9]+$/ {sum+=$6} END{print sum+0}'
}
function 101InConn {
sudo /sbin/ipvsadm -L -n | grep '100\.101:' | awk '{print $6}'
}
function 22InConn {
sudo /sbin/ipvsadm -L -n | grep '100\.22:' | awk '{print $6}'
}
function 24InConn {
sudo /sbin/ipvsadm -L -n | grep '100\.24:' | awk '{print $6}'
}
function 76InConn {
sudo /sbin/ipvsadm -L -n | grep '100\.76:' | awk '{print $6}'
}

# Run the requested function
$1
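
The script needs one function per real server. As an alternative sketch (not the author's method), the same ipvsadm -L -n output can be parsed once into a per-server table. The helper name parse_ipvsadm is hypothetical, and the counter values in the sample are made up for illustration.

```python
def parse_ipvsadm(text):
    """Return {'ip:port': (ActiveConn, InActConn)} from `ipvsadm -L -n` output."""
    conns = {}
    for line in text.splitlines():
        fields = line.split()
        # Real-server rows: -> ip:port Forward Weight ActiveConn InActConn
        if (len(fields) == 6 and fields[0] == "->"
                and fields[4].isdigit() and fields[5].isdigit()):
            active, inactive = conns.get(fields[1], (0, 0))
            conns[fields[1]] = (active + int(fields[4]),
                                inactive + int(fields[5]))
    return conns


# Illustrative sample; the connection counts are invented
sample = """IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.100.14:80 wrr
  -> 192.168.100.22:80            Route   1      10         20
  -> 192.168.100.101:80           Route   1      5          7
"""

conns = parse_ipvsadm(sample)
total_active = sum(active for active, _ in conns.values())
print(conns["192.168.100.22:80"], total_active)
```

A table keyed by address also lends itself to Zabbix low-level discovery, so adding a real server would not require editing the script.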

III. Changing the configuration file
[root@lvs-master zabbix]# vim zabbix_agentd.conf
### ipvsadm Active #####
UserParameter=lvs.AllConn[*],/etc/zabbix/lvs-status.sh AllConn
UserParameter=lvs.101Conn[*],/etc/zabbix/lvs-status.sh 101Conn
UserParameter=lvs.22Conn,/etc/zabbix/lvs-status.sh 22Conn
UserParameter=lvs.24Conn,/etc/zabbix/lvs-status.sh 24Conn
UserParameter=lvs.76Conn,/etc/zabbix/lvs-status.sh 76Conn
### ipvsadm InActive #####
UserParameter=lvs.AllInConn,/etc/zabbix/lvs-status.sh AllInConn
UserParameter=lvs.101InConn,/etc/zabbix/lvs-status.sh 101InConn
UserParameter=lvs.22InConn,/etc/zabbix/lvs-status.sh 22InConn
UserParameter=lvs.24InConn,/etc/zabbix/lvs-status.sh 24InConn
UserParameter=lvs.76InConn,/etc/zabbix/lvs-status.sh 76InConn
[root@lvs-master zabbix]# chmod +x lvs-status.sh

IV. Troubleshooting

The first version of lvs-status.sh did not call ipvsadm through sudo, so the agent log reported:
[root@lvs-master zabbix]# tail -f /tmp/zabbix_agentd.log
Can't initialize ipvs: Permission denied (you must be root)
Are you sure that IP Virtual Server is built in the kernel or as module?

The fix is to edit sudoers with visudo as follows:
[root@lvs-master ~]# visudo
#Defaults requiretty
Add:
zabbix ALL=(ALL) NOPASSWD:/sbin/ipvsadm
Restart the zabbix_agentd service:
service zabbix_agentd restart

V. Testing from the Zabbix server
[root@jumper ~]# zabbix_get -s 192.168.100.14 -p 10050 -k "lvs.AllConn"
2326


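VI. Creating the LVS template

Add two applications.

Next, create the monitored item keys.

Add graphs.

Finally, define the trigger values.

The template is in the attachment.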

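VII. Terminology

Understanding ActiveConn and InActConn in ipvsadm
The ActiveConn value in LVS is confusing: it is often very large even though the number of active connections on the real servers is not high.
ActiveConn is the number of active connections, that is, TCP connections in the ESTABLISHED state.
InActConn is the number of TCP connections in any state other than ESTABLISHED.
Why is the ActiveConn reported by LVS so much higher than the ESTABLISHED count that netstat shows on the real servers?
LVS has its own default timeouts, which can be viewed with ipvsadm -L --timeout. The defaults are 900 120 300, for TCP, TCPFIN, and UDP respectively. In other words, once a TCP connection has passed through LVS, LVS keeps its record for 15 minutes, whether or not the connection is still alive. So if a large number of concurrent requests reach your servers within 15 minutes, this value climbs steeply.
Most of the time we look at this LVS counter to learn the real number of connections on each machine. How can that be achieved?
Once you know how ActiveConn is produced, it is straightforward. For example, suppose LVS load-balances a website in DR mode and the back-end web servers run nginx. If the application is healthy, a connection is closed within about five seconds of a request arriving. In that case you can run ipvsadm --set 5 10 300, which keeps TCP connection records for only 5 seconds. If ActiveConn is currently high, it will drop quickly until it is close to the current connection count reported by the nginx status page. You can keep raising or lowering the 5 until the status connection count on the real servers matches the ActiveConn in LVS.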
http://www.linuxidc.com/Linux/2015-05/117477.htm