
Memory controller error messages [memo]

2015-03-02 10:03
Reference: error messages from the log:

[root@hh-yun-compute-130125 ~]# cat /var/log/messages | grep -i error
Mar  1 04:58:05 hh-yun-compute-130125 kernel: sbridge: HANDLING MCE MEMORY ERROR
Mar  1 04:58:06 hh-yun-compute-130125 kernel: EDAC MC1: CE row 2, channel 0, label "CPU_SrcID#1_Channel#2_DIMM#0": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=6 Err=0008:00c2 (ch=2), addr = 0x16113a9000 => socket=1, Channel=2(mask=4), rank=0
Mar  1 10:27:08 hh-yun-compute-130125 kernel: sbridge: HANDLING MCE MEMORY ERROR
Mar  1 10:27:09 hh-yun-compute-130125 kernel: EDAC MC1: CE row 2, channel 0, label "CPU_SrcID#1_Channel#2_DIMM#0": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=6 Err=0008:00c2 (ch=2), addr = 0x15e1c49000 => socket=1, Channel=2(mask=4), rank=0
Mar  1 13:52:56 hh-yun-compute-130125 kernel: sbridge: HANDLING MCE MEMORY ERROR
Mar  1 13:52:57 hh-yun-compute-130125 kernel: EDAC MC1: CE row 2, channel 0, label "CPU_SrcID#1_Channel#2_DIMM#0": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=6 Err=0008:00c2 (ch=2), addr = 0x160e949000 => socket=1, Channel=2(mask=4), rank=0
Mar  2 04:16:56 hh-yun-compute-130125 kernel: sbridge: HANDLING MCE MEMORY ERROR
Mar  2 04:16:56 hh-yun-compute-130125 kernel: sbridge: HANDLING MCE MEMORY ERROR
Mar  2 04:16:57 hh-yun-compute-130125 kernel: EDAC MC1: CE row 2, channel 0, label "CPU_SrcID#1_Channel#2_DIMM#0": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=6 Err=0008:00c2 (ch=2), addr = 0x1613a61000 => socket=1, Channel=2(mask=4), rank=0
Mar  2 04:16:57 hh-yun-compute-130125 kernel: EDAC MC1: CE row 2, channel 0, label "CPU_SrcID#1_Channel#2_DIMM#0": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=6 Err=0008:00c2 (ch=2), addr = 0x1613a79000 => socket=1, Channel=2(mask=4), rank=0
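
Since every CE line above carries a DIMM label, a quick per-DIMM tally makes it easier to see whether the errors are concentrated on one module. The following is a minimal sketch, assuming the log lines keep the `EDAC MCn: CE ... label "..."` layout shown above; the function name `edac_ce_by_label` is hypothetical.

```shell
# Count EDAC corrected-error (CE) lines per DIMM label.
# Reads syslog-style lines on stdin and prints "<count> <label>" per DIMM.
# Assumes the 'EDAC MCn: CE ... label "..."' layout seen in the log above.
edac_ce_by_label() {
    grep 'EDAC MC[0-9]*: CE ' \
        | sed -n 's/.*label "\([^"]*\)".*/\1/p' \
        | sort | uniq -c \
        | awk '{c = $1; sub(/^ *[0-9]+ /, ""); print c, $0}'
}
```

It could be run as `edac_ce_by_label < /var/log/messages`. For the five CE lines in the excerpt above, all of which name the same DIMM, it should print `5 CPU_SrcID#1_Channel#2_DIMM#0`.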


Reference 2: EDAC error counters in sysfs:

[root@hh-yun-compute-130125 ~]# cat /sys/devices/system/edac/mc/mc?/ce*count
0
0
8
0
[root@hh-yun-compute-130125 ~]# cat /sys/devices/system/edac/mc/mc1/ce_count
8
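
The glob `ce*count` used above probably also matches `ce_noinfo_count`, which would explain four output lines, and the bare numbers do not say which controller they belong to. A small sketch that prints only the controllers with a non-zero `ce_count`, together with their names, is shown below. The function name `edac_nonzero_ce` and its optional base-directory argument are assumptions made for illustration.

```shell
# Print "<mcN> <ce_count>" for every memory controller whose
# corrected-error counter is non-zero.
# $1 (optional): base directory, defaults to the EDAC sysfs tree.
edac_nonzero_ce() {
    base=${1:-/sys/devices/system/edac/mc}
    for f in "$base"/mc*/ce_count; do
        [ -r "$f" ] || continue          # skip if the glob matched nothing
        n=$(cat "$f")
        if [ "$n" -gt 0 ]; then
            echo "$(basename "$(dirname "$f")") $n"
        fi
    done
}
```

Called without arguments on the host above, it would be expected to report `mc1 8`, matching the counter read directly from `mc1/ce_count`.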


Module information

[root@hh-yun-compute-130125 ~]# modinfo sb_edac
filename:       /lib/modules/2.6.32-504.3.3.el6.x86_64/kernel/drivers/edac/sb_edac.ko
description:    MC Driver for Intel Sandy Bridge and Ivy Bridge memory controllers -  Ver: 1.1.0
author:         Red Hat Inc. (http://www.redhat.com)
author:         Mauro Carvalho Chehab <mchehab@redhat.com>
license:        GPL
srcversion:     01CFEEBE911D55B6FE660BE
alias:          pci:v00008086d00002FA0sv*sd*bc*sc*i*
alias:          pci:v00008086d00000EA8sv*sd*bc*sc*i*
alias:          pci:v00008086d00003CA8sv*sd*bc*sc*i*
depends:        edac_core
vermagic:       2.6.32-504.3.3.el6.x86_64 SMP mod_unload modversions
parm:           edac_op_state:EDAC Error Reporting state: 0=Poll,1=NMI (int)

[root@hh-yun-compute-130125 ~]# modinfo edac_core
filename:       /lib/modules/2.6.32-504.3.3.el6.x86_64/kernel/drivers/edac/edac_core.ko
description:    Core library routines for EDAC reporting
author:         Doug Thompson www.softwarebitmaker.com, et al
license:        GPL
srcversion:     C21E296292A2174839A086C
depends:
vermagic:       2.6.32-504.3.3.el6.x86_64 SMP mod_unload modversions
parm:           check_pci_errors:Check for PCI bus parity errors: 0=off 1=on (int)
parm:           edac_pci_panic_on_pe:Panic on PCI Bus Parity error: 0=off 1=on (int)
parm:           edac_mc_panic_on_ue:Panic on uncorrected error: 0=off 1=on (int)
parm:           edac_mc_log_ue:Log uncorrectable error to console: 0=off 1=on (int)
parm:           edac_mc_log_ce:Log correctable error to console: 0=off 1=on (int)
parm:           edac_mc_poll_msec:Polling period in milliseconds
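
`modinfo` only lists which parameters exist. On a loaded module, the current values are normally exposed under `/sys/module/<module>/parameters/` (only for parameters declared with non-zero permissions). The sketch below dumps them as `name=value` pairs; the function name `show_module_params` and the optional base-directory argument are hypothetical.

```shell
# Print "name=value" for each readable parameter of a loaded kernel module.
# $1: module name, e.g. edac_core
# $2 (optional): base directory, defaults to /sys/module
show_module_params() {
    mod=$1
    base=${2:-/sys/module}
    for p in "$base/$mod"/parameters/*; do
        [ -r "$p" ] || continue          # no parameters exposed, or unreadable
        printf '%s=%s\n' "$(basename "$p")" "$(cat "$p")"
    done
}
```

For example, `show_module_params edac_core` would show whether `edac_mc_log_ce` is enabled and what polling period `edac_mc_poll_msec` is currently set to.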


Official explanation:

Total Correctable Errors count attribute file:

'ce_count'

This attribute file displays the total count of correctable
errors that have occurred on this csrow. This
count is very important to examine. CEs provide early
indications that a DIMM is beginning to fail. This count
field should be monitored for non-zero values and report
such information to the system administrator.
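
The advice quoted above (watch the per-csrow counter for non-zero values and report them) can be sketched as a check suitable for cron or a monitoring agent. This is only an illustration: it assumes the per-csrow counters are exposed as `mc*/csrow*/ce_count` under the EDAC sysfs directory, and the function name `check_csrow_ce` is hypothetical.

```shell
# Warn about every csrow with a non-zero corrected-error count.
# Returns 1 if any csrow has recorded CEs, 0 otherwise.
# $1 (optional): base directory, defaults to the EDAC sysfs tree.
check_csrow_ce() {
    base=${1:-/sys/devices/system/edac/mc}
    status=0
    for f in "$base"/mc*/csrow*/ce_count; do
        [ -r "$f" ] || continue          # skip if the glob matched nothing
        n=$(cat "$f")
        if [ "$n" -ne 0 ]; then
            echo "WARNING: $n corrected error(s) in ${f#$base/}"
            status=1
        fi
    done
    return $status
}
```

A caller could use the exit status, for example `check_csrow_ce || logger "EDAC corrected errors detected"`, to pass the warning on to the administrator.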


Enable mcelog

[root@hh-yun-compute-130125 ~]# service  mcelogd restart
Stopping mcelog                                     [OK]
Starting mcelog daemon                              [OK]
[root@hh-yun-compute-130125 ~]# mcelog
mcelog: Family 6 Model 3e CPU: only decoding architectural errors


Check the log

[root@hh-yun-compute-130125 ~]# tail /var/log/mcelog
mcelog: failed to prefill DIMM database from DMI data
mcelog: mcelog server already running


Assessment

This is a harmless warning message. The DIMM database prefill relies on a specific non-standard format of the DIMMs in the DMI BIOS tables. If this format is not used by the BIOS, mcelog will only discover DIMMs as they get their first error (if the CPU reports DIMMs in machine check errors). For the most part, this mcelog message can be ignored.


The final decision was therefore to ignore this message.

