Full record of troubleshooting frequent, irregular server reboots caused by a motherboard fault
2014-11-15 11:19
Server: HP DL385 G7
Operating system: SUSE 10 SP3
Database: Oracle 11g R2
Cluster software: VCS
Two-node active/standby setup: the two servers use VCS to provide active/standby failover for the Oracle database.

Symptoms:
1. Both database hosts rebooted frequently at irregular intervals, and none of the reboots left any record in the operating system's messages log;
2. At system startup, hardware-related error messages appeared in the messages log.

messages log:
-------------------------------------------------------------------------------------------------------------
Oct 27 17:51:01 linux10 /usr/sbin/cron[5968]: (CRON) STARTUP (V5.0)
Oct 27 17:51:02 linux10 sshd[6047]: Server listening on 0.0.0.0 port 22.
Oct 27 17:51:02 linux10 rcpowersaved: CPU frequency scaling is not supported by your processor.
Oct 27 17:51:02 linux10 rcpowersaved: enter 'CPUFREQ_ENABLED=no' in /etc/powersave/cpufreq to avoid this warning.
Oct 27 17:51:02 linux10 rcpowersaved: Cannot load cpufreq governors - No cpufreq driver available
Oct 27 17:51:03 linux10 rcpowersaved: s2ram does not know your machine. See 's2ram -i' for details. (127)
Oct 27 17:51:03 linux10 rcpowersaved: Use SUSPEND2RAM_FORCE=yes to override this detection.
Oct 27 17:51:03 linux10 modprobe: FATAL: Error running install command for binfmt_misc
Oct 27 17:51:06 linux10 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Oct 27 17:51:06 linux10 kernel: Floppy drive(s): fd0 is 1.44M
Oct 27 17:51:06 linux10 syslog-ng[5762]: Changing permissions on special file /dev/xconsole
Oct 27 17:51:06 linux10 syslog-ng[5762]: Changing permissions on special file /dev/tty10
Oct 27 17:51:06 linux10 kernel: JBD: barrier-based sync failed on dm-10 - disabling barriers
Oct 27 17:51:06 linux10 kernel: JBD: barrier-based sync failed on dm-11 - disabling barriers
Oct 27 17:51:06 linux10 kernel: JBD: barrier-based sync failed on dm-12 - disabling barriers
Oct 27 17:51:06 linux10 kernel: AppArmor: AppArmor initialized
Oct 27 17:51:06 linux10 kernel: audit(1414403451.182:2): info="AppArmor initialized" pid=4403
Oct 27 17:51:06 linux10 kernel: floppy0: no floppy controllers found
Oct 27 17:51:06 linux10 kernel: ACPI: Power Button (FF) [PWRF]
Oct 27 17:51:06 linux10 kernel: rdac: device handler unregistered
Oct 27 17:51:06 linux10 kernel: No dock devices found.
Oct 27 17:51:06 linux10 kernel: bnx2: eth0: using MSI
Oct 27 17:51:06 linux10 kernel: bnx2: eth1: using MSI
Oct 27 17:51:06 linux10 kernel: Ethernet Channel Bonding Driver: v3.2.5 (March 21, 2008)
Oct 27 17:51:06 linux10 kernel: bonding: Warning: either miimon or arp_interval and arp_ip_target module parameters must be specified, otherwise bonding will not detect link failures! see bonding.txt for details.
Oct 27 17:51:06 linux10 kernel: JBD: barrier-based sync failed on dm-8 - disabling barriers
Oct 27 17:51:06 linux10 kernel: bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex
Oct 27 17:51:06 linux10 kernel: bonding: bond0: setting mode to active-backup (1).
Oct 27 17:51:06 linux10 kernel: bonding: bond0: Setting MII monitoring interval to 100.
Oct 27 17:51:06 linux10 kernel: bonding: bond0: Setting use_carrier to 0.
Oct 27 17:51:06 linux10 kernel: bnx2: eth0: using MSI
Oct 27 17:51:06 linux10 kernel: bonding: bond0: enslaving eth0 as a backup interface with a down link.
Oct 27 17:51:06 linux10 kernel: bnx2: eth1: using MSI
Oct 27 17:51:06 linux10 kernel: bonding: bond0: enslaving eth1 as a backup interface with a down link.
Oct 27 17:51:06 linux10 kernel: audit(1414403461.814:3): audit_pid=5906 old=0 by auid=4294967295
Oct 27 17:51:06 linux10 kernel: llt: module not supported by Novell, setting U taint flag.
Oct 27 17:51:06 linux10 kernel: LLT INFO V-14-1-10009 LLT 5.1.100.000-SP1GA Protocol available
Oct 27 17:51:06 linux10 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Oct 27 17:51:06 linux10 kernel: bonding: bond0: link status definitely up for interface eth1.
Oct 27 17:51:06 linux10 kernel: bonding: bond0: making interface eth1 the new active one.
Oct 27 17:51:06 linux10 kernel: bonding: bond0: first active interface up!
Oct 27 17:51:06 linux10 kernel: bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex
Oct 27 17:51:06 linux10 kernel: powernow-k8: Found 4 AMD Opteron(tm) Processor 6134 processors (16 cpu cores) (version 2.20.00)
Oct 27 17:51:06 linux10 kernel: powernow-k8: MP systems not supported by PSB BIOS structure
…………
Oct 28 18:10:01 linux10 /usr/sbin/cron[17099]: (root) CMD (/usr/sbin/ntpdate 172.29.141.162)
Oct 28 18:11:14 linux10 zmd: ShutdownManager (WARN): Preparing to sleep...
Oct 28 18:11:15 linux10 zmd: ShutdownManager (WARN): Going to sleep, waking up at 10/29/2014 17:41:08
Oct 28 18:11:49 linux10 syslog-ng[5762]: Error connecting to remote host AF_INET(172.29.141.162:5140), reattempting in 60 seconds
…………
-----------------------------------------------------------------------

The log above shows two problems:

1. zmd: ShutdownManager (WARN): Preparing to sleep…

2. The CPU frequency scaling and JBD barrier messages:
Oct 27 17:51:02 linux10 rcpowersaved: CPU frequency scaling is not supported by your processor.
Oct 27 17:51:02 linux10 rcpowersaved: enter 'CPUFREQ_ENABLED=no' in /etc/powersave/cpufreq to avoid this warning.
Oct 27 17:51:02 linux10 rcpowersaved: Cannot load cpufreq governors - No cpufreq driver available
…………
Oct 27 17:51:06 linux10 kernel: JBD: barrier-based sync failed on dm-10 - disabling barriers
Oct 27 17:51:06 linux10 kernel: JBD: barrier-based sync failed on dm-11 - disabling barriers
Oct 27 17:51:06 linux10 kernel: JBD: barrier-based sync failed on dm-12 - disabling barriers

Problem 1 is caused by the ZMD service. Novell describes it as follows: The zmd daemon
performs software management functions on the ZENworks managed device,
including updating, installing, and removing software, and performing basic
queries of the device's package management database. Typically, these
management tasks are initiated through the ZENworks Control Center or the rug,
zen-installer, zen-updater, or zen-remover utilities, which means you should
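not need to interact directly with zmd.

In other words, the ZMD service handles software updates and installation management. It starts automatically at boot. Once running, by default it connects to the network every six hours to check for updates. Updates use port 80, so the service often causes port conflicts with servers such as Tomcat. After software installation or updates are finished, you can stop the service:

# /etc/init.d/novell-zmd status
Checking for ZENworks Management Daemon: running
# /etc/init.d/novell-zmd stop
Shutting down ZENworks Management Daemon done

Note: installing software is more cumbersome while this service is stopped, so start it again when you need it. Also be aware that during updates the service may lock /etc/mtab for a long time.

Solution: after the novell-zmd service was stopped, these log messages no longer appeared. To speed up booting, novell-zmd can also be removed from the startup sequence:

chkconfig --del novell-zmd

Problem 2: on the log alone, this looks only like the CPU not supporting frequency scaling. The operating system logs and the VCS logs showed no other anomalies, so we suspected a hardware fault. In the machine room, the fault LED marked with the power symbol on the server's front panel was lit amber. At that point we could breathe easier, because the hardware was clearly at fault. We collected the hardware logs and contacted HP, who confirmed a motherboard failure. After the motherboard was replaced, the server stopped rebooting.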
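The manual steps for problem 1 can be wrapped in a small shell function: check the status, stop the daemon, and remove it from the boot sequence. This is a sketch, not the author's script. It assumes SUSE 10, root privileges, the init script path used above, and a chkconfig that accepts `--del`. The optional init-script argument is a hypothetical parameter, added so the function can be tried against a stub.

```shell
#!/bin/sh
# Sketch only: stop the ZENworks Management Daemon (zmd) and keep it from
# starting at boot. Assumes root on SUSE 10. The optional first argument
# (init script path) is a hypothetical parameter, mainly for testing.
disable_zmd() {
    init_script=${1:-/etc/init.d/novell-zmd}

    if [ ! -x "$init_script" ]; then
        echo "init script not found or not executable: $init_script" >&2
        return 1
    fi

    # Same manual checks as in the article: report status, then stop the daemon.
    "$init_script" status
    "$init_script" stop || return 1

    # Remove the service from the boot sequence, if chkconfig is installed.
    if command -v chkconfig >/dev/null 2>&1; then
        chkconfig --del "$(basename "$init_script")"
    else
        echo "chkconfig not found; service stopped but still enabled at boot" >&2
    fi
}
```

Run it as root with no arguments on the affected host. Software installation depends on zmd, so start the service again with `/etc/init.d/novell-zmd start` before installing packages.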
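For problem 2, the boot-time warnings alone did not identify the fault. It was confirmed only by the front-panel LED and HP's hardware logs. Still, it can help to check whether these warnings recur on every boot, and roughly how many boots the messages log records. The functions below are a sketch under stated assumptions:

- The default path /var/log/messages is an assumption for SUSE 10.
- The patterns are copied from the log excerpt above.
- Counting the klogd start-up line as "one boot" is an approximation, because klogd also logs it when it is restarted without a reboot.

Reading the hardware logs is not shown here. On ProLiant servers with HP's management agents installed, `hpasmcli -s "show iml"` is one way to view the Integrated Management Log; verify this against your installed tool version.

```shell
#!/bin/sh
# Sketch only: print the boot-time warnings quoted in this article from a
# syslog file. The default path is an assumption; pass another file to override.
scan_boot_warnings() {
    log=${1:-/var/log/messages}
    # Patterns taken from the messages excerpt above (cpufreq, JBD barriers, powernow-k8).
    grep -E 'CPU frequency scaling is not supported|Cannot load cpufreq governors|barrier-based sync failed|powernow-k8: MP systems not supported' "$log"
}

# Rough boot count: klogd logs this line each time it starts, which normally
# happens at boot (it also appears if klogd is restarted without a reboot).
count_klogd_starts() {
    log=${1:-/var/log/messages}
    grep -c 'klogd .*log source = /proc/kmsg started' "$log"
}
```

A klogd start count that rises with no matching shutdown activity around it fits the symptom described above: reboots that leave no trace in the OS log.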
This article is from the “狂奔的蜗牛” (Running Snail) blog; reproduction is prohibited.