
Monitoring DELL Server Hardware with Nagios

2013-06-15 09:09
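1. About the monitoring plugin

The Nagios plugin check_openmanage relies on DELL OpenManage Server Administrator (OMSA) to monitor DELL server hardware: physical disks, logical disks, power supplies, fan speeds, temperatures, CPUs, memory, and BIOS and firmware versions. The plugin supports both Windows and Linux.

DELL server models officially tested successfully: 1750, 1800, 1850, 1950, 1955, 2600, 2650, 2800, 2850, 2900, 2950, 6650, 6950, 750, 850, M600, M610, R510, R610, R710, T710, R805, R815, R900, R910.

This article uses an R410 for testing; most mainstream DELL servers are supported.

Installing check_openmanage and its dependencies is not covered here.
For details on the plugin, see
http://folk.uio.no/trondham/software/check_openmanage.html
Monitoring modes: run the plugin locally, or query the host remotely over SNMP.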




2. DELL OMSA must be installed on the monitored host

2.1 Pre-installation setup (configures the DELL yum repository)

wget -q -O - http://linux.dell.com/repo/hardware/OMSA_6.5.3/bootstrap.cgi | bash


2.2 Install OMSA

yum install srvadmin-all -y
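
After the packages are installed, the OMSA services have to be running before omreport (and therefore the plugin) can return any data. The following is a minimal sketch, not part of the original procedure. It assumes the default install prefix /opt/dell/srvadmin and the service script srvadmin-services.sh under sbin; the function name start_and_check_omsa and the OMSA_PREFIX variable are hypothetical and only used for illustration.

```shell
# Assumed default OMSA install prefix; override OMSA_PREFIX if yours differs.
OMSA_PREFIX=${OMSA_PREFIX:-/opt/dell/srvadmin}

# Start the OMSA services and run one omreport query as a smoke test.
# Returns non-zero if the service script is missing or fails to start.
start_and_check_omsa() {
    services="$OMSA_PREFIX/sbin/srvadmin-services.sh"
    if [ ! -x "$services" ]; then
        echo "OMSA service script not found at $services" >&2
        return 1
    fi
    "$services" start || return 1
    # omreport is the CLI that check_openmanage calls in local mode.
    "$OMSA_PREFIX/bin/omreport" chassis
}

# Usage (as root, on the monitored host): start_and_check_omsa
```

If the omreport call prints a chassis health summary, the local check mode should have something to read; the exact paths may vary between OMSA versions.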


2.3 Install firmware-tools, used to manage BIOS and firmware updates

yum install dell_ft_install

yum install $(bootstrap_firmware)


2.4 Update the BIOS and firmware

Compare the installed versions against the available updates:

update_firmware


Install all available updates:

update_firmware --yes


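The server must be rebooted after the update.

During testing, the R410 reported the errors below; the plugin could only return data after the firmware had been updated: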

[root@localhost check_openmanage-3.7.5]# ./check_openmanage
Storage Error! No controllers found
Problem running 'omreport chassis memory': Error: Memory object not found
Problem running 'omreport chassis fans': Error! No fan probes found on this system.
Problem running 'omreport chassis temps': Error! No temperature probes found on this system.
Problem running 'omreport chassis volts': Error! No voltage probes found on this system.
Chassis Service Tag is bogus: 'N/A'


For more on OMSA, see: http://linux.dell.com/repo/hardware/OMSA_6.5.3/

3. Plugin command examples

3.1 Temperature

[root@nagios90-248 libexec]# ./check_openmanage -H 10.0.188.115 -C vistata -b ctrl_fw=ALL\/ctrl_driver=ALL --only temp -d

   System:      PowerEdge R410 II        OMSA version:    6.5.0
   ServiceTag:  3NGGB3X                  Plugin version:  3.7.5
   BIOS/date:   1.9.0 10/21/2011         Checking mode:   SNMPv2c UDP/IPv4
-----------------------------------------------------------------------------
   Chassis Components
=============================================================================
  STATE  |  ID  |  MESSAGE TEXT
---------+------+------------------------------------------------------------
      OK |    0 | Temperature Probe 0 [System Board Ambient Temp] reads 28 C (min=8/3, max=42/47)


The default maximum warning threshold is 42.0 °C and the default maximum critical threshold is 47.0 °C.

[root@nagios90-248 libexec]# ./check_openmanage -H 10.0.188.115 -C vistata -b ctrl_fw=ALL\/ctrl_driver=ALL --only temp

TEMPERATURES OK - 1 temperature probes checked

[root@nagios90-248 libexec]# ./check_openmanage -H 10.0.188.115 -C vistata -b ctrl_fw=ALL\/ctrl_driver=ALL --only temp -p

TEMPERATURES OK - 1 temperature probes checked|T0_System_Board_Ambient=28C;42;47
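
The part after the `|` in the output above is Nagios performance data in the form label=value[unit];warn;crit. As an illustration of how that string breaks down, here is a small POSIX shell sketch that splits one token into its fields. The function name parse_perfdata_token is hypothetical and is not part of the plugin.

```shell
# Split one perfdata token (label=value[unit];warn;crit) into
# "label value warn crit". Illustrative helper only.
parse_perfdata_token() {
    token=$1
    label=${token%%=*}          # text before the first '='
    rest=${token#*=}            # text after the first '='
    value=${rest%%;*}           # first field: value with unit suffix
    rest=${rest#*;}
    warn=${rest%%;*}            # second field: warning threshold
    crit=${rest#*;}
    crit=${crit%%;*}            # third field: critical threshold
    value=${value%%[!0-9.-]*}   # strip the unit suffix (C, rpm, ...)
    printf '%s %s %s %s\n' "$label" "$value" "$warn" "$crit"
}

# Sample token taken from the plugin output above.
parse_perfdata_token 'T0_System_Board_Ambient=28C;42;47'
# prints: T0_System_Board_Ambient 28 42 47
```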


Custom temperature thresholds

[root@nagios90-248 libexec]# ./check_openmanage -H 10.0.188.115 -C vistata -b ctrl_fw=ALL\/ctrl_driver=ALL --only temp -w 0=20 -c 0=30

Temperature Probe 0 [System Board Ambient Temp] reads 28 C (custom max=20)
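
In the command above, `-w 0=20 -c 0=30` sets a warning limit of 20 and a critical limit of 30 for probe 0, and the reading of 28 falls between the two. The sketch below models that comparison for upper limits only. It is a simplified illustration, not the plugin's own logic: the function name classify_reading is hypothetical, it handles whole numbers only, and how the plugin treats a reading exactly equal to a limit has not been checked here.

```shell
# Simplified model of upper-limit thresholds (integers only).
# Prints the Nagios state name and returns the matching exit code:
# 0 = OK, 1 = WARNING, 2 = CRITICAL.
classify_reading() {
    value=$1
    warn=$2
    crit=$3
    if [ "$value" -gt "$crit" ]; then
        echo CRITICAL
        return 2
    elif [ "$value" -gt "$warn" ]; then
        echo WARNING
        return 1
    fi
    echo OK
    return 0
}
```

With the values from this section, `classify_reading 28 20 30` prints WARNING, while the default limits (`classify_reading 28 42 47`) give OK.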


3.2 Voltage (19 probes)

[root@nagios90-248 libexec]# ./check_openmanage -H 10.0.188.115 -C vistata -b ctrl_fw=ALL\/ctrl_driver=ALL --only voltage

VOLTAGE OK - 19 voltage probes checked


3.3 CPU

[root@nagios90-248 libexec]# ./check_openmanage -H 10.0.188.115 -C vistata -b ctrl_fw=ALL\/ctrl_driver=ALL --only cpu

PROCESSORS OK - 2 processors checked

[root@nagios90-248 libexec]# ./check_openmanage -H 10.0.188.115 -C vistata -b ctrl_fw=ALL\/ctrl_driver=ALL --only cpu -d

   System:      PowerEdge R410 II        OMSA version:    6.5.0
   ServiceTag:  3NGGB3X                  Plugin version:  3.7.5
   BIOS/date:   1.9.0 10/21/2011         Checking mode:   SNMPv2c UDP/IPv4
-----------------------------------------------------------------------------
   Chassis Components
=============================================================================
  STATE  |  ID  |  MESSAGE TEXT
---------+------+------------------------------------------------------------
      OK |    0 | Processor 0 [Intel Xeon E5620 2.40GHz] is Present
      OK |    1 | Processor 1 [Intel Xeon E5620 2.40GHz] is Present


3.4 Fan speed

[root@nagios90-248 libexec]# ./check_openmanage -H 10.0.188.115 -C vistata -b ctrl_fw=ALL\/ctrl_driver=ALL --only fans

FANS OK - 8 fan probes checked


Debug output for the fans

[root@nagios90-248 libexec]# ./check_openmanage -H 10.0.188.115 -C vistata -b ctrl_fw=ALL\/ctrl_driver=ALL --only fans -d

   System:      PowerEdge R410 II        OMSA version:    6.5.0
   ServiceTag:  3NGGB3X                  Plugin version:  3.7.5
   BIOS/date:   1.9.0 10/21/2011         Checking mode:   SNMPv2c UDP/IPv4
-----------------------------------------------------------------------------
   Chassis Components
=============================================================================
  STATE  |  ID  |  MESSAGE TEXT
---------+------+------------------------------------------------------------
      OK |    0 | Chassis fan 0 [System Board FAN MOD 1A RPM] reading: 6240 RPM
      OK |    1 | Chassis fan 1 [System Board FAN MOD 1B RPM] reading: 4320 RPM
      OK |    2 | Chassis fan 2 [System Board FAN MOD 2A RPM] reading: 6240 RPM
      OK |    3 | Chassis fan 3 [System Board FAN MOD 2B RPM] reading: 4320 RPM
      OK |    4 | Chassis fan 4 [System Board FAN MOD 3A RPM] reading: 6360 RPM
      OK |    5 | Chassis fan 5 [System Board FAN MOD 3B RPM] reading: 4440 RPM
      OK |    6 | Chassis fan 6 [System Board FAN MOD 4A RPM] reading: 7800 RPM
      OK |    7 | Chassis fan 7 [System Board FAN MOD 4B RPM] reading: 5400 RPM


The -p option outputs performance data, which can be used for graphing with PNP

[root@nagios90-248 libexec]# ./check_openmanage -H 10.0.188.115 -C vistata -b ctrl_fw=ALL\/ctrl_driver=ALL --only fans -p

FANS OK - 8 fan probes checked|F0_System_Board_FAN_MOD_1A=6240rpm;0;0 F1_System_Board_FAN_MOD_1B=4320rpm;0;0 F2_System_Board_FAN_MOD_2A=6240rpm;0;0 F3_System_Board_FAN_MOD_2B=4320rpm;0;0 F4_System_Board_FAN_MOD_3A=6360rpm;0;0 F5_System_Board_FAN_MOD_3B=4440rpm;0;0 F6_System_Board_FAN_MOD_4A=7800rpm;0;0 F7_System_Board_FAN_MOD_4B=5400rpm;0;0


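Custom fan thresholds (note that the performance data below still reports 0;0 for the warning and critical values)

./check_openmanage -H 10.0.188.115 -C vistata -b ctrl_fw=ALL\/ctrl_driver=ALL --only fans -w 2000 -c 10000 -p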

FANS OK - 8 fan probes checked|F0_System_Board_FAN_MOD_1A=6240rpm;0;0 F1_System_Board_FAN_MOD_1B=4320rpm;0;0 F2_System_Board_FAN_MOD_2A=6240rpm;0;0 F3_System_Board_FAN_MOD_2B=4320rpm;0;0 F4_System_Board_FAN_MOD_3A=6360rpm;0;0 F5_System_Board_FAN_MOD_3B=4440rpm;0;0 F6_System_Board_FAN_MOD_4A=7800rpm;0;0 F7_System_Board_FAN_MOD_4B=5400rpm;0;0
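
Because the fan performance data above carries 0;0 as thresholds, any range check on the RPM values has to be done by whatever consumes the data. The following is a rough sketch of such a check in POSIX shell. The function name fans_out_of_range and the 5000 to 7000 RPM range in the usage line are made up for illustration and are not recommended limits.

```shell
# Print "label rpm" for every fan token whose reading is outside [min, max].
# Arguments: min max "perfdata string" (tokens separated by spaces).
fans_out_of_range() {
    min=$1
    max=$2
    perfdata=$3
    for token in $perfdata; do
        label=${token%%=*}
        # Only look at fan entries (labels F0_..., F1_..., as printed by -p).
        case $label in
            F[0-9]*) ;;
            *) continue ;;
        esac
        rpm=${token#*=}
        rpm=${rpm%%rpm*}        # "6240rpm;0;0" -> "6240"
        if [ "$rpm" -lt "$min" ] || [ "$rpm" -gt "$max" ]; then
            printf '%s %s\n' "$label" "$rpm"
        fi
    done
}

# Two sample tokens copied from the output above, hypothetical range.
fans_out_of_range 5000 7000 'F0_System_Board_FAN_MOD_1A=6240rpm;0;0 F1_System_Board_FAN_MOD_1B=4320rpm;0;0'
# prints: F1_System_Board_FAN_MOD_1B 4320
```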


3.5 CMOS battery

[root@nagios90-248 libexec]# ./check_openmanage -H 10.0.188.115 -C vistata -b ctrl_fw=ALL\/ctrl_driver=ALL --only batteries

BATTERIES OK - 1 batteries checked

[root@nagios90-248 libexec]# ./check_openmanage -H 10.0.188.115 -C vistata -b ctrl_fw=ALL\/ctrl_driver=ALL --only batteries -d

   System:      PowerEdge R410 II        OMSA version:    6.5.0
   ServiceTag:  3NGGB3X                  Plugin version:  3.7.5
   BIOS/date:   1.9.0 10/21/2011         Checking mode:   SNMPv2c UDP/IPv4
-----------------------------------------------------------------------------
   Chassis Components
=============================================================================
  STATE  |  ID  |  MESSAGE TEXT
---------+------+------------------------------------------------------------
      OK |    0 | Battery probe 0 [System Board CMOS Battery] is Presence Detected


3.6 Memory

[root@nagios90-248 libexec]# ./check_openmanage -H 10.0.188.115 -C vistata -b ctrl_fw=ALL\/ctrl_driver=ALL --only memory

MEMORY OK - 6 memory modules, 49152 MB total memory


[root@nagios90-248 libexec]# ./check_openmanage -H 10.0.188.115 -C vistata -b ctrl_fw=ALL\/ctrl_driver=ALL --only memory -d

   System:      PowerEdge R410 II        OMSA version:    6.5.0
   ServiceTag:  3NGGB3X                  Plugin version:  3.7.5
   BIOS/date:   1.9.0 10/21/2011         Checking mode:   SNMPv2c UDP/IPv4
-----------------------------------------------------------------------------
   Chassis Components
=============================================================================
  STATE  |  ID  |  MESSAGE TEXT
---------+------+------------------------------------------------------------
      OK |    0 | Memory module 0 [DIMM_A1, 8192 MB] is Ok
      OK |    1 | Memory module 1 [DIMM_A2, 8192 MB] is Ok
      OK |    2 | Memory module 2 [DIMM_A3, 8192 MB] is Ok
      OK |    3 | Memory module 3 [DIMM_B1, 8192 MB] is Ok
      OK |    4 | Memory module 4 [DIMM_B2, 8192 MB] is Ok
      OK |    5 | Memory module 5 [DIMM_B3, 8192 MB] is Ok


3.7 Power monitoring (the R410 test machine does not support power consumption monitoring)

[root@nagios90-248 libexec]# ./check_openmanage -H 10.0.188.115 -C vistata -b ctrl_fw=ALL\/ctrl_driver=ALL --only amperage

OK - no power monitoring probes found


3.8 Monitoring all hardware, with DELL support links

./check_openmanage -H 10.0.188.115 -C vistata -I -b ctrl_fw=ALL\/ctrl_driver=ALL -a -p

OK - System: '<a target="_blank" href="http://support.dell.com/support/edocs/systems/per410/">PowerEdge R410 II</a>', SN: '<a target="_blank" href="http://www.dell.com/support/troubleshooting/Index?t=warranty&servicetag=3NGGB3X">3NGGB3X</a>', 48 GB ram (6 dimms), 1 logical drives, 4 physical drives|T0_System_Board_Ambient=28C;42;47 F0_System_Board_FAN_MOD_1A=6240rpm;0;0 F1_System_Board_FAN_MOD_1B=4320rpm;0;0 F2_System_Board_FAN_MOD_2A=6240rpm;0;0 F3_System_Board_FAN_MOD_2B=4320rpm;0;0 F4_System_Board_FAN_MOD_3A=6360rpm;0;0 F5_System_Board_FAN_MOD_3B=4440rpm;0;0 F6_System_Board_FAN_MOD_4A=7800rpm;0;0 F7_System_Board_FAN_MOD_4B=5400rpm;0;0


The -o option sets how much information is printed when everything is OK

./check_openmanage -H 10.0.188.115 -C vistata -b ctrl_fw=ALL\/ctrl_driver=ALL -o 4

OK - System: 'PowerEdge R410 II', SN: '3NGGB3X', 48 GB ram (6 dimms), 1 logical drives, 4 physical drives

----- BIOS='1.9.0 10/21/2011', iDRAC6='1.80'

----- Ctrl 0 [PERC H700 Adapter]: Fw='12.10.2-0004', Dr='00.00.04.17-RH1'

----- Encl 0:0:0 [Backplane]: Fw='1.07'

----- OpenManage Server Administrator (OMSA) version: '6.5.0'


3.9 Simulating a disk failure
Normal state

[root@localhost check_openmanage-3.7.5]# ./check_openmanage -H 10.0.188.115 -C vistata -b ctrl_driver=0 -a

OK - System: 'PowerEdge R410 II', SN: '3NGGB3X', 48 GB ram (6 dimms), 1 logical drives, 4 physical drives


Pulling out a disk to simulate a failure

[root@localhost check_openmanage-3.7.5]# ./check_openmanage -H 10.0.188.115 -C vistata -b ctrl_driver=0 -a

Physical Disk 0:0:3 [Dell ST3600057SS, 600GB] on ctrl 0 needs attention: Removed

Logical Drive '/dev/sda' [RAID-5, 1675.12 GB] needs attention: Degraded

ESM log content: 1 critical, 0 non-critical, 1 ok


State after the disk has been replaced

[root@localhost check_openmanage-3.7.5]# ./check_openmanage -H 10.0.188.115 -C vistata -b ctrl_driver=0 -a

ESM log content: 1 critical, 0 non-critical, 2 ok

Physical Disk 0:0:3 [Dell ST3600057SS, 600GB] on ctrl 0 is Rebuilding

Logical Drive '/dev/sda' [RAID-5, 1675.12 GB] needs attention: Degraded
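

The command lines in this section can be turned into Nagios checks with a command definition and one service per check. The sketch below prints example object definitions from a shell function so they can be redirected into a configuration file. It is one possible layout and not the configuration used for this article: the names check_openmanage_snmp, dell-r410 and dell_openmanage.cfg are placeholders, and the generic-service template and the $USER1$ plugin path are assumed to exist as in the Nagios sample configuration.

```shell
# Print example Nagios object definitions to stdout.
# check_openmanage_snmp and dell-r410 are placeholder names;
# generic-service is assumed to come from the sample configuration.
print_openmanage_cfg() {
    cat <<'EOF'
define command {
    command_name    check_openmanage_snmp
    command_line    $USER1$/check_openmanage -H $HOSTADDRESS$ -C $ARG1$ -b ctrl_fw=ALL/ctrl_driver=ALL $ARG2$
}

define service {
    use                     generic-service
    host_name               dell-r410
    service_description     DELL hardware
    check_command           check_openmanage_snmp!vistata!-a -p
}

define service {
    use                     generic-service
    host_name               dell-r410
    service_description     DELL temperature
    check_command           check_openmanage_snmp!vistata!--only temp -p
}
EOF
}

# Usage: print_openmanage_cfg > dell_openmanage.cfg
```

The first service mirrors the all-hardware check from 3.8 and the second mirrors the temperature check from 3.1; the SNMP community is passed as the first argument so it is not hard-coded in the command.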


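4. Nagios monitoring results
4.1 Hardware monitoring: individual checks (temperature, CPU, memory) and the all-hardware check (server model, SN, memory, logical and physical disks)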




4.2 Physical disk failure




4.3 Monitoring status after the disk has been replaced



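5. A brief look at managing servers with OMSA

OMSA provides a web-based application for managing DELL servers. Once installed, it automatically listens on TCP port 1311.

Access: https://<server IP>:1311. Log in with a system account and password; the interface is similar to iDRAC.

Features: BIOS configuration, power control, hardware monitoring, storage management and configuration, software information, iDRAC management, system network management, and more. It is fairly capable, although everything depends on the operating system and OMSA being up.

Interface (it is straightforward to use, so it is not described further):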





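Conclusion:
Monitoring services is certainly important, but monitoring the server hardware those services run on is just as essential. The features described here build on IPMI and its key component, the BMC.
See: http://www.ibm.com/developerworks/cn/linux/l-ipmi/index.html

Further reading:
DELL blade chassis monitoring plugin:
http://folk.uio.no/trondham/software/check_dell_bladechassis.html
HP blade chassis monitoring plugin:
http://folk.uio.no/trondham/software/check_hp_bladechassis.html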