
Setting up a Nagios monitoring platform

2017-06-08 15:58
Environment: CentOS 7, building from the source packages nagios-4.3.1.tar.gz, nagios-plugins-2.2.1 and nrpe2.12.tar.gz.
Install the Nagios core: ./configure --prefix=/usr/local/nagios  --with-nagios-user=nagios  --with-nagios-group=nagios  &&  make all &&  make install  &&  make install-init   &&  make install-commandmode  &&  make install-config

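Install nagios-plugins: ./configure --prefix=/usr/local/nagios

Install the NRPE remote plugin: ./configure --prefix=/usr/local/nagios

Create the nagios group and user (do this before running make install for the Nagios core, because the install step assigns ownership to this user): groupadd nagios && useradd -s /sbin/nologin -g nagios nagios

Set the owner of the Nagios tree: chown -R nagios:nagios /usr/local/nagios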

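Nagios is most useful when viewed through its web interface, although you can also use only its alerting features. This section covers deploying the Nagios web interface.

①: Using Apache as the web server

First set up an Apache + PHP environment. Once you have confirmed that PHP files are parsed correctly, add Listen 800 to Apache's httpd.conf; the Nagios application will be served on port 800.

Check the following settings in httpd.conf: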

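# Add the listening port
Listen 800

# Run Apache as the nagios user
User nagios
Group nagios

# Make sure the PHP module is loaded
LoadModule php5_module  modules/libphp5.so

# Make sure the CGI module is loaded
LoadModule cgid_module modules/mod_cgid.so

# Serve index.php as a directory index
DirectoryIndex index.php

# Add the type mappings
AddType application/x-httpd-php .php
AddType application/x-httpd-php-source .php5

# Keep the Nagios settings in a separate file so that httpd.conf stays tidy.
# The location of conf/nagios.conf depends on Apache's home directory (ServerRoot).
Include conf/nagios.conf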

Then create and edit nagios.conf:

#nagios setting
# Virtual host that handles requests on port 800
<VirtualHost *:800>
ServerName localhost:800
DocumentRoot "/usr/local/nagios/share"

ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin
<Directory "/usr/local/nagios/sbin">
AuthType Basic
Options ExecCGI
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios server"
AuthUserFile /usr/local/nagios/etc/htpasswd
Require valid-user
</Directory>
<Directory "/usr/local/nagios/share">
AuthType Basic
Options None
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios server"
AuthUserFile /usr/local/nagios/etc/htpasswd
Require valid-user
</Directory>
</VirtualHost>

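Running make install-webconf after installing Nagios normally adds a block like this to the Apache configuration automatically, but if the Apache configuration cannot be found nothing is added and the file has to be written by hand.

Next, use Apache's htpasswd tool to create the password file used for authentication: htpasswd -c /usr/local/nagios/etc/htpasswd nagios (the first argument is the path of the password file, the second is the user name)

Then start Apache and open localhost:800 to check whether the Nagios page loads.

If it loads correctly, you can move on to editing the Nagios configuration.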
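
The browser check can also be scripted. The following is only a minimal sketch: it assumes the user nagios created with htpasswd and a placeholder password, and the helper names build_request and page_loads are made up for this illustration.

```python
import base64
import urllib.error
import urllib.request


def build_request(url, user, password):
    """Return a GET request carrying an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


def page_loads(url, user, password, timeout=5):
    """Return True if the server answers the authenticated request with HTTP 200."""
    request = build_request(url, user, password)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts and HTTP errors such as 401.
        return False
```

Calling page_loads("http://localhost:800/", "nagios", "your-password") should return True once Apache is running and the password matches the htpasswd file; a wrong password yields False because the 401 response is treated as a failure.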

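②: Using nginx as the web server

Serving the Nagios web interface through nginx requires a few Perl FastCGI modules, because nginx cannot execute CGI programs by itself.

The required modules are FCGI-0.74, FCGI-ProcManager-0.28, IO-1.25 and IO-All-0.86.

Each of them is installed with perl Makefile.PL && make && make install.

The perl-fcgi.pl script listed further below can be downloaded from the internet. Start it as follows: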

/usr/bin/perl  /usr/local/nginx/conf/perl-fcgi.pl  -l  /usr/local/nginx/logs/perl-fcgi.log  -pid  /usr/local/nginx/logs/perl-fcgi.pid  -S /usr/local/nginx/logs/perl-fcgi.sock  

This runs the script /usr/local/nginx/conf/perl-fcgi.pl with /usr/bin/perl and creates the corresponding log, PID and socket files under /usr/local/nginx/logs/. The full script follows.
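
After starting the wrapper it can be handy to confirm that the PID file points at a live process and that the socket file really is a UNIX socket. A rough sketch under those assumptions; the function names are illustrative and not part of nginx or the Perl script.

```python
import os
import stat


def pid_alive(pid):
    """Return True if a process with this PID exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def fcgi_status(pid_path, sock_path):
    """Report whether the PID file names a live process and the socket file exists."""
    try:
        with open(pid_path) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        pid = None
    try:
        is_socket = stat.S_ISSOCK(os.stat(sock_path).st_mode)
    except OSError:
        is_socket = False
    running = pid is not None and pid_alive(pid)
    return {"pid": pid, "running": running, "socket": is_socket}
```

With the paths from the command above, fcgi_status("/usr/local/nginx/logs/perl-fcgi.pid", "/usr/local/nginx/logs/perl-fcgi.sock") should report running and socket as True. Note that the script refuses to start while a stale PID file exists, so a result with running set to False and a PID present suggests the file should be removed before restarting.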

#!/usr/bin/perl
#
#	author		Daniel Dominik Rudnicki
#	thanks to:	Piotr Romanczuk
#	email		daniel@sardzent.org
#	version		0.4.3
#	webpage	 http://www.nginx.eu/ #
#	BASED @ http://wiki.codemongers.com/NginxSimpleCGI #
#
# use strict;
use FCGI;
use Getopt::Long;
use IO::All;
use Socket;

sub init {
GetOptions(	"h"	=> \$help,
"verbose!"=>\$verbose,
"pid=s"	=> \$filepid,
"l=s" => \$logfile,
"S:s"   => \$unixsocket,
"P:i"   => \$unixport) or usage();
usage() if $help;

print "	Starting Nginx-fcgi\n" if $verbose;
print "	Running with $> UID" if $verbose;
print "	Perl $]" if $verbose;

#	if ( $> == "0" ) {
#		print "\n\tERROR\tRunning as a root!\n";
#		print "\tSuggested not to do so !!!\n\n";
#		exit 1;
#	}

if ( ! $logfile ) {
print "\n\tERROR\t log file must declared\n"
. "\tuse $0 with option -l filename\n\n";
exit 1;
}
print "	Using log file $logfile\n" if $verbose;
"\n\n" >> io($logfile);
addlog($logfile, "Starting Nginx-fcgi");
addlog($logfile, "Running with $> UID");
addlog($logfile, "Perl $]");
addlog($logfile, "Testing socket options");

if ( ($unixsocket && $unixport) || (!($unixsocket) && !($unixport)) ) {
print "\n\tERROR\tOnly one option can be used!\n";
print "\tSuggested (because of speed) is usage UNIX socket -S \n\n";
exit 1;
}

if ($unixsocket) {
print "\tDaemon listening at UNIX socket $unixsocket\n" if $verbose;
addlog($logfile, "Daemon listening at UNIX socket $unixsocket");
} else {
print "	Daemon listening at TCP/IP socket *:$unixport\n" if $verbose;
#
addlog($logfile, "Daemon listening at TCP/IP socket *:$unixport");
}

if ( -e $filepid ) {
print "\n\tERROR\t PID file $filepid already exists\n\n";
addlog($logfile, "Can not use PID file $filepid, already exists.");
exit 1;
}

if ( $unixsocket ) {
print "	Creating UNIX socket\n" if $verbose;
$socket = FCGI::OpenSocket( $unixsocket, 10 );
if ( !$socket) {
print "	Couldn't create socket\n";
addlog($logfile, "Couldn't create socket");
exit 1;
}
print "	Using UNIX socket $unixsocket\n" if $verbose;
} else {
print "	Creating TCP/IP socket\n" if $verbose;
$portnumber = ":".$unixport;
# FCGI::OpenSocket expects a colon followed by the port for TCP sockets
$socket = FCGI::OpenSocket( $portnumber, 10 );
if ( !$socket ) {
print "	Couldn't create socket\n";
addlog($logfile, "Couldn't create socket");
exit 1;
}
print " Using port $unixport\n" if $verbose;
}
addlog($logfile, "Socket created");

if ( ! $filepid ) {
print "\n\tERROR\t PID file must declared\n"
. "\tuse $0 with option -pid filename\n\n";
exit 1;
}
print "	Using PID file $filepid\n" if $verbose;
addlog($logfile, "Using PID file $filepid");

my $pidnumber = $$;
$pidnumber > io($filepid);
print " PID number $$\n" if $verbose;
addlog($logfile, "PID number $pidnumber");

}

sub addzero {
my ($date) = shift;
if ($date < 10) {
return "0$date";
}
return $date;
}

sub logformat {
my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$iddst) = localtime(time);
my $datestring;
$year += 1900;
$mon++;
$mon  = addzero($mon);
$mday = addzero($mday);
$min  = addzero($min);
$datestring = "$year-$mon-$mday $hour:$min";
return($datestring);
}

sub addlog {
my ($log_file, $log_message) = @_;
my $curr_time = logformat();
my $write_message = "[$curr_time]   $log_message";
$write_message >> io($log_file);
"\n" >> io($log_file);
}

sub printerror {
my ($message) = @_;
print "\n	Nginx FastCGI\tERROR\n"
. "\t $message\n\n";
exit 1;
}

sub usage {
print "\n	Nginx FastCGI \n"
. "\n\tusage: $0 [-h] -S string -P int\n"
. "\n\t-h\t\t: this (help) message"
. "\n\t-S path\t\t: path for UNIX socket"
. "\n\t-P port\t\t: port number"
. "\n\t-p file\t\t: path for pid file"
. "\n\t-l file\t\t: path for logfile"
. "\n\n\texample: $0 -S /var/run/nginx-perl_cgi.sock -l /var/log/nginx/nginx-cfgi.log -pid /var/run/nginx-fcgi.pid\n\n";
exit 1;
}

init;
#
END() { } BEGIN() { }
*CORE::GLOBAL::exit = sub { die "fakeexit\nrc=".shift()."\n"; }; eval q{exit};
if ($@) {
exit unless $@ =~ /^fakeexit/;
} ;

# fork part
my $pid = fork();

if( $pid == 0 ) {
&main;
exit 0;
}

print " Forking worker process with PID $pid\n" if $verbose;
addlog($logfile, "Forking worker process with PID $pid");
print " Update PID file $filepid\n" if $verbose;
addlog($logfile, "Update PID file $filepid");
$pid > io($filepid);
print "	Worker process running.\n" if $verbose;
addlog ($logfile, "Parent process $$ is exiting");
exit 0;

sub main {
$request = FCGI::Request( \*STDIN, \*STDOUT, \*STDERR, \%req_params, $socket );
if ($request) { request_loop()};
FCGI::CloseSocket( $socket );
}

sub request_loop {
while( $request->Accept() >= 0 ) {
# processing any STDIN input from WebServer (for CGI-POST actions)
$stdin_passthrough = '';
$req_len = 0 + $req_params{'CONTENT_LENGTH'};
if (($req_params{'REQUEST_METHOD'} eq 'POST') && ($req_len != 0) ){
while ($req_len) {
$stdin_passthrough .= getc(STDIN);
$req_len--;
}
}

# running the cgi app
if ( (-x $req_params{SCRIPT_FILENAME}) &&
(-s $req_params{SCRIPT_FILENAME}) &&
(-r $req_params{SCRIPT_FILENAME})
){
foreach $key ( keys %req_params){
$ENV{$key} = $req_params{$key};
}
if ( $verbose ) {
addlog($logfile, "running $req_params{SCRIPT_FILENAME}");
}
# http://perldoc.perl.org/perlipc.html#Safe-Pipe-Opens #
unless ( open $cgi_app, '-|', $req_params{SCRIPT_FILENAME}, $stdin_passthrough ) {
# the pipe could not be opened: send an error page instead of CGI output
print("Content-type: text/plain\r\n\r\n");
print "Error: CGI app returned no output - Executing $req_params{SCRIPT_FILENAME} failed !\n";
# addlog($logfile, "Error: CGI app returned no output - Executing $req_params{SCRIPT_FILENAME} failed !");
}

if ($cgi_app) {
print <$cgi_app>;
close $cgi_app;
}
} else {
print("Content-type: text/plain\r\n\r\n");
print "Error: No such CGI app - $req_params{SCRIPT_FILENAME} may not exist or is not executable by this process.\n";
addlog($logfile, "Error: No such CGI app - $req_params{SCRIPT_FILENAME} may not exist or is not executable by this process.");
}
}
}
With the script running, edit nginx's nginx.conf to serve Nagios:

# Run nginx as the nagios user; otherwise the CGI programs do not execute properly
user nagios;

server
{
listen        80;
server_name   localhost;
root   /usr/local/nagios/share;
index  index.html index.php;

auth_basic    "Nagios server";
auth_basic_user_file /usr/local/nagios/etc/htpasswd;
# The password file for authentication is created with htpasswd, as described in the Apache section

location ~ .*\.(php|php5)?$
{
fastcgi_pass   127.0.0.1:9000;
fastcgi_index  index.php;
fastcgi_param  SCRIPT_FILENAME  $document_root$fastcgi_script_name;
include  /usr/local/nginx/conf/fastcgi_params;
}

location ~ .*\.(cgi|pl)?$
{
root   /usr/local/nagios/sbin;
rewrite  ^/nagios/cgi-bin/(.*)\.cgi /$1.cgi break;
fastcgi_pass   unix:/usr/local/nginx/logs/perl-fcgi.sock;
fastcgi_index  index.cgi;
fastcgi_param  SCRIPT_FILENAME  /usr/local/nagios/sbin$fastcgi_script_name;
fastcgi_param  REMOTE_USER  $remote_user;
include  /usr/local/nginx/conf/fastcgi_params;
}

location /nagios
{
alias  /usr/local/nagios/share;
}

location /cgi-bin/images
{
alias  /usr/local/nagios/share/images;
}

location /cgi-bin/stylesheets
{
alias  /usr/local/nagios/share/stylesheets;
}

location /cgi-bin
{
alias  /usr/local/nagios/sbin;
}

}
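Then check the syntax with nginx -t; if nothing is reported, nginx can be started.

Editing the Nagios configuration files:

    The Nagios configuration is fairly complex, and the template files reference one another. Only the files that are actually needed are covered below.

nagios.cfg: the main configuration file;

# Paths of the other configuration files to read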
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg

# Directory from which configuration files are read
cfg_dir=/usr/local/nagios/etc/servers


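cgi.cfg: the configuration file that controls access to the CGIs;

# Enable authentication; set to 0 to disable it
use_authentication=1

# The default authorized user is nagiosadmin; append the user nagios created with htpasswd, separated by ","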
authorized_for_system_information=nagiosadmin,nagios
authorized_for_configuration_information=nagiosadmin,nagios
authorized_for_system_commands=nagiosadmin,nagios
authorized_for_all_services=nagiosadmin,nagios
authorized_for_all_hosts=nagiosadmin,nagios
authorized_for_all_service_commands=nagiosadmin,nagios
authorized_for_all_host_commands=nagiosadmin,nagios
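
Forgetting to add the user to one of these directives leads to pages that load but show no data. As an illustration only, the sketch below scans the text of a cgi.cfg and lists the authorized_for_* directives that do not mention a given user; the function names are invented for this example.

```python
def authorized_users(cgi_cfg_text):
    """Map each authorized_for_* directive to the set of users it lists."""
    result = {}
    for raw in cgi_cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith("authorized_for_"):
            result[key] = {user.strip() for user in value.split(",") if user.strip()}
    return result


def missing_grants(cgi_cfg_text, user):
    """Return the authorized_for_* directives that do not list the user."""
    grants = authorized_users(cgi_cfg_text)
    return sorted(key for key, users in grants.items() if user not in users)
```

Reading /usr/local/nagios/etc/cgi.cfg and calling missing_grants(text, "nagios") should return an empty list once every directive has been updated as shown above.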


resource.cfg: the file that defines variables (macros);

objects/commands.cfg: the command definitions;

objects/contacts.cfg: the contact definitions;

define contact{
contact_name         nagios
use                  generic-contact
alias                Nagios Admin
email                admin@youngyuyeah.com.cn    ; address that receives the notifications
}
define contactgroup{
contactgroup_name    admins
alias                Nagios Administrators
members              nagios
}


objects/templates.cfg: the template file;

# Annotated template file
define contact{
name                            generic-contact    ; name of the contact template
service_notification_period     24x7               ; notify about service problems during the 24x7 time period
host_notification_period        24x7               ; notify about host problems during the 24x7 time period
service_notification_options    w,u,c,r            ; states for which notifications may be sent: w = warning, u = unknown,
; c = critical, r = recovery;
; that is, the contact is notified when a service enters the warning, unknown or critical state and when it recovers.
host_notification_options       d,u,r                   ; host states for which notifications are sent: d = down,
; u = unreachable, r = recovery.
service_notification_commands   notify-service-by-email ; send an e-mail when a service fails
host_notification_commands      notify-host-by-email    ; send an e-mail when a host fails
register                        0                    ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
}
define host{
name                            generic-host    ; template name; this is not the hostname of a real machine
; but the name that the host configuration files refer to.
notifications_enabled           1               ; Host notifications are enabled
event_handler_enabled           1               ; Host event handler is enabled
flap_detection_enabled          1               ; Flap detection is enabled
failure_prediction_enabled      1               ; Failure prediction is enabled
process_perf_data               1               ; 0 or 1; controls whether Nagios processes performance data.
; When set to 1, Nagios writes the collected data to a file so that it can be extracted later.
retain_status_information       1               ; Retain status information across program restarts
retain_nonstatus_information    1               ; Retain non-status information across program restarts
notification_period             24x7            ; time period during which notifications may be sent to contacts.
register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
define host{
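name                            linux-server    ; template name
use                             generic-host    ; "use" means inheritance: linux-server inherits every attribute of generic-host;
; inheritance is used in many places in a Nagios configuration.
check_period                    24x7            ; time period during which Nagios checks the host
check_interval                  5               ; interval between regular host checks, here 5 minutes.
retry_interval                  1               ; interval between retry checks, in minutes.
max_check_attempts              10              ; maximum number of check attempts: when a check finds a problem, Nagios does not report it immediately
; but retries a few times, since the cause may be brief network congestion or some other passing issue;
; 10 means the check is attempted at most 10 times.
check_command                   check-host-alive ; command used to check the host; "check-host-alive" is defined in commands.cfg.
notification_period             24x7            ; time period during which host notifications are sent
notification_interval           10              ; how long Nagios waits before notifying again while the problem remains unresolved, in minutes;
; set this to 0 if a single notification per event is enough.
notification_options            d,u,r           ; host states for which notifications are sent: d = down,
; u = unreachable, r = recovery.
contact_groups                  admins          ; contact group to notify; "admins" is defined in contacts.cfg.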
register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
define service{
name                            generic-service         ; name of the service template
active_checks_enabled           1                       ; Active service checks are enabled
passive_checks_enabled          1                       ; Passive service checks are enabled/accepted
parallelize_check               1                       ; Active service checks should be parallelized;
; (disabling this can lead to major performance problems)
obsess_over_service             1                       ; We should obsess over this service (if necessary)
check_freshness                 0                       ; Default is to NOT check service 'freshness'
notifications_enabled           1                       ; Service notifications are enabled
event_handler_enabled           1                       ; Service event handler is enabled
flap_detection_enabled          1                       ; Flap detection is enabled
failure_prediction_enabled      1                       ; Failure prediction is enabled
process_perf_data               1                       ; Process performance data
retain_status_information       1                       ; Retain status information across program restarts
retain_nonstatus_information    1                       ; Retain non-status information across program restarts
is_volatile                     0                       ; The service is not volatile
check_period                    24x7             ; time period during which Nagios checks the service.
max_check_attempts              3                ; maximum number of check attempts for the service.
normal_check_interval           5                ; interval between regular service checks, that is, the time between one check and the next;
; here 5 minutes.
retry_check_interval            2                ; interval between retry checks, in minutes.
contact_groups                  admins           ; contact group to notify
notification_options            w,u,c,r          ; states for which notifications may be sent: w = warning,
; u = unknown, c = critical, r = recovery;
notification_interval           10               ; Re-notify about service problems every 10 minutes
notification_period             24x7             ; time period during which notifications may be sent to contacts.
register                        0                ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}
define service{
name                            local-service
use                             generic-service
max_check_attempts              4
normal_check_interval           5
retry_check_interval            1
register                        0
}
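
The effect of use can be illustrated with a small model of template inheritance. This is a simplified sketch and not how Nagios is implemented: it assumes a single parent per template, and the dictionaries hold only a few values copied from the generic-service and local-service templates.

```python
def resolve(name, templates):
    """Merge a template with everything it inherits through `use`.

    Values set on the template itself win over inherited ones.
    """
    own = templates[name]
    merged = {}
    parent = own.get("use")
    if parent is not None:
        merged.update(resolve(parent, templates))
    merged.update(own)
    merged.pop("use", None)
    return merged


# Abridged values from the two service templates in templates.cfg.
templates = {
    "generic-service": {
        "max_check_attempts": "3",
        "normal_check_interval": "5",
        "retry_check_interval": "2",
        "check_period": "24x7",
    },
    "local-service": {
        "use": "generic-service",
        "max_check_attempts": "4",
        "retry_check_interval": "1",
    },
}

effective = resolve("local-service", templates)
print(effective)
```

In this model local-service ends up with max_check_attempts 4 and retry_check_interval 1 from its own definition, while normal_check_interval and check_period come from generic-service.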


objects/timeperiods.cfg: the time period definitions;

# Monitoring is almost always 24x7, so keeping only this definition is enough
define timeperiod{
timeperiod_name 24x7
alias           24hoursx7day
sunday          00:00-24:00
monday          00:00-24:00
tuesday         00:00-24:00
wednesday       00:00-24:00
thursday        00:00-24:00
friday          00:00-24:00
saturday        00:00-24:00
}


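In practice the number of machines to monitor is large, so it is best to group them. nagios.cfg was changed earlier to read the servers directory; create one file per machine in that directory [part of one such file is shown below].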

define host{
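use                     linux-server        ; inherits the linux-server template from templates.cfg
host_name               server1             ; host name of the machine; using its real hostname is fine
address                 192.168.100.1       ; address of the machine
}

define service{
use                             local-service         ; inherits the local-service template from templates.cfg
host_name                       server1              ; host name of the machine
service_description             PING                  ; item being checked; commands.cfg defines many check commands that check_command refers to by name
check_command           check_ping!100.0,20%!500.0,60%    ; warning and critical thresholds of the check
}

define service{
use                             local-service
host_name                       server1
service_description             Root Partition
check_command           check_nrpe!check_disk!20%!10%!/   ; runs check_disk on the remote machine through check_nrpe; most checks of a remote machine need check_nrpe
}
# Define further service checks in the same way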
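
The "!" characters in check_command separate the command name from its arguments, which Nagios makes available as $ARG1$, $ARG2$ and so on in the command_line defined in commands.cfg. The sketch below mimics that substitution; the command_line shown is an assumption modelled on the sample check_ping entry, so compare it with your own commands.cfg.

```python
def expand_check_command(check_command, command_line, macros):
    """Substitute $ARGn$ and the supplied macros into a command_line template."""
    name, *args = check_command.split("!")
    values = dict(macros)
    for index, arg in enumerate(args, start=1):
        values[f"ARG{index}"] = arg
    expanded = command_line
    for key, value in values.items():
        expanded = expanded.replace(f"${key}$", value)
    return name, expanded


# Assumed command_line for check_ping; verify against commands.cfg.
PING_LINE = "$USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5"

name, line = expand_check_command(
    "check_ping!100.0,20%!500.0,60%",
    PING_LINE,
    {"USER1": "/usr/local/nagios/libexec", "HOSTADDRESS": "192.168.100.1"},
)
print(line)
```

Under these assumptions the first argument becomes the warning threshold (-w 100.0,20%) and the second the critical threshold (-c 500.0,60%).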


The servers directory should then look like this:
├── server1.cfg
├── server2.cfg
├── server3.cfg
└── group.cfg

# group.cfg arranges the hosts from the server files into one group
define  hostgroup{
hostgroup_name  Server
alias           Linux Server
members         server1,server2,server3
}
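
Every name listed in members must match a host_name defined in one of the server files, otherwise the configuration check fails. A rough way to catch this early is sketched below; it is a deliberately simple parser (it assumes no "}" appears inside comments) and the function names are illustrative.

```python
import re

DEFINE_RE = re.compile(r"define\s+(\w+)\s*\{(.*?)\}", re.DOTALL)


def parse_objects(text):
    """Return (object_type, directives) pairs for every define block in the text."""
    objects = []
    for obj_type, body in DEFINE_RE.findall(text):
        directives = {}
        for raw in body.splitlines():
            line = raw.split(";", 1)[0].strip()  # drop trailing ';' comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            if len(parts) == 2:
                directives[parts[0]] = parts[1].strip()
        objects.append((obj_type, directives))
    return objects


def undefined_members(text):
    """List hostgroup members that have no matching host definition."""
    objects = parse_objects(text)
    hosts = {d["host_name"] for kind, d in objects if kind == "host" and "host_name" in d}
    missing = []
    for kind, directives in objects:
        if kind != "hostgroup":
            continue
        for member in directives.get("members", "").split(","):
            member = member.strip()
            if member and member not in hosts:
                missing.append(member)
    return missing
```

Concatenating the contents of all files in the servers directory and passing the result to undefined_members should return an empty list when the group is consistent.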
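Checking a remote machine relies on the small NRPE add-on, so nagios-plugins and NRPE have to be deployed on every remote machine.
NRPE carries the check results back to the monitoring server, while nagios-plugins supplies the commands that perform the checks. The relevant settings in nrpe.cfg on the remote machine are: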

pid_file=/usr/local/nagios/var/nrpe.pid
# Adjust to the actual installation path
server_port=5666
# Port used to transfer the data
allowed_hosts=127.0.0.1,192.168.100.10
# Hosts allowed to connect; 192.168.100.10 is the monitoring server
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10%
command[check_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 300
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
command[check_mem]=/usr/local/nagios/libexec/check_mem -w 90% -c 95%
# Define the checks you need on top of nagios-plugins; -w is the warning threshold and -c the critical threshold
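
Because every command[...] entry points at a plugin binary, a quick consistency check of nrpe.cfg can save some debugging. The following sketch parses the file's text and extracts allowed_hosts and the plugin path of each command; the function names are assumptions made for this example.

```python
import re
import shlex

COMMAND_RE = re.compile(r"^command\[(?P<name>[^\]]+)\]\s*=\s*(?P<line>.+)$")


def parse_nrpe_cfg(text):
    """Return (allowed_hosts, commands) from the text of an nrpe.cfg."""
    allowed, commands = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = COMMAND_RE.match(line)
        if match:
            commands[match.group("name")] = match.group("line").strip()
        elif line.startswith("allowed_hosts="):
            hosts = line.split("=", 1)[1].split(",")
            allowed = [host.strip() for host in hosts if host.strip()]
    return allowed, commands


def plugin_paths(commands):
    """Map each command name to the plugin executable it invokes."""
    return {name: shlex.split(line)[0] for name, line in commands.items()}
```

Checking each returned path with os.path.exists on the remote machine shows which plugins are actually installed. As far as I know check_mem is not part of the standard nagios-plugins set, so it may need to be installed separately.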

Then start the service with /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -f, and enable it at boot with systemctl enable nrpe.service.

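From the monitoring server you can test connectivity to NRPE with /usr/local/nagios/libexec/check_nrpe -H REMOTE_IP, where REMOTE_IP is the IP of the remote machine; if it returns the NRPE version, the connection works.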
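
If check_nrpe fails, it helps to first rule out a plain network or firewall problem. The sketch below only tests whether the TCP port accepts connections; it does not speak the NRPE protocol, and the function name is made up for this illustration.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable or timed out.
        return False
```

Running port_open with the IP of the remote machine and port 5666 from the monitoring server separates two cases: False points to the daemon not running or a firewall in between, while True combined with a failing check_nrpe points to allowed_hosts or an SSL mismatch.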

Finally, run /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg; if it reports no errors or warnings, Nagios can be started.