Linux system cannot be pinged and has to be rebooted
2011-05-03 10:27
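A Linux system stopped responding to ping and had to be rebooted. The messages log contained no record of the problem, but checking mcelog later showed that it was a hardware fault. That file is the log where hardware errors are recorded, so I looked up how the MCE logging feature works.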
What are Machine Check Exceptions (or MCE)?
A machine check exception is an error detected by your system's processor. There are two major types of MCE: a notice or warning, and a fatal exception. A warning is logged as a "Machine Check Events logged" notice in your system logs and can be viewed later with Linux utilities. A fatal MCE causes the machine to stop responding, and the details of the MCE are printed to the system's console.
What causes MCE errors?
The most common causes of MCE events are:
Memory errors or Error Correction Code (ECC) problems
Inadequate cooling / processor over-heating
System bus errors
Cache errors in the processor or hardware
How do I find out what the errors mean?
If you see the message "Machine Check Events logged" on your console or in your system logs, you can run the mcelog command to read the message from the kernel. Once mcelog has read the events, running it again will not show the same error, so it is best to redirect the output to a file for later analysis. For example:
root@localhost:/root> /usr/sbin/mcelog > mcelog.out
Some systems do this for you on a regular basis and send the output to the file /var/log/mcelog. So if you see the "Machine Check Events logged" message but mcelog does not return any data, look in /var/log/mcelog.
The output is not always easy to understand. If you have any questions about the decoded error message, please create a support ticket and we will help analyze the problem.
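As a rough illustration of the first step above, the following Python sketch scans log lines for the MCE notice. The function name `find_mce_notices` and the sample log lines are made up for this example, and the match is deliberately loose (case-insensitive, singular or plural) because the exact wording of the notice may differ between systems.

```python
import re

# Loose pattern for the notice quoted in this article.
# Assumption: capitalisation and singular/plural may vary, so match both.
MCE_NOTICE = re.compile(r"machine check events? logged", re.IGNORECASE)


def find_mce_notices(lines):
    """Return the log lines (without trailing newline) that contain the MCE notice."""
    return [line.rstrip("\n") for line in lines if MCE_NOTICE.search(line)]


if __name__ == "__main__":
    # Hypothetical log lines, for illustration only.
    sample = [
        "May  3 10:01:12 host kernel: eth0: link up\n",
        "May  3 10:05:40 host kernel: Machine check events logged\n",
    ]
    for hit in find_mce_notices(sample):
        print(hit)
```

To scan a real log, pass an open file object instead of the sample list. Which log file holds kernel messages depends on the distribution, so the path is left to the reader.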
What if I get a fatal machine check event that causes my machine to stop responding?
These errors are almost always caused by faulty hardware. Please capture the MCE message so that you can run it through the mcelog program once the machine is back up. Here's an example of a message you might see:
CPU 1: Machine Check Exception: 4 Bank 4: f600200137080813 TSC b0ce27165dd3 ADDR 180ee1b40
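Because a fatal MCE message usually has to be copied by hand from the console, it can help to check that the copy is complete before feeding it to mcelog. The sketch below pulls the fields out of a line shaped like the example above. The function name `parse_mce_line` and the field names are assumptions made for this example. The number after "Exception:" is treated as the MCGSTATUS value, which is an interpretation based on the "MCGSTATUS 4" shown in the decoded output later in this article.

```python
import re

# Pattern for a console line shaped like the example in this article.
# TSC and ADDR are treated as optional in case they were not captured.
MCE_LINE = re.compile(
    r"CPU (?P<cpu>\d+): Machine Check Exception:\s*(?P<mcgstatus>[0-9a-fA-F]+)\s+"
    r"Bank (?P<bank>\d+):\s*(?P<status>[0-9a-fA-F]+)"
    r"(?:\s+TSC (?P<tsc>[0-9a-fA-F]+))?"
    r"(?:\s+ADDR (?P<addr>[0-9a-fA-F]+))?"
)


def parse_mce_line(text):
    """Return a dict of the MCE fields, or None if the line does not match."""
    match = MCE_LINE.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "cpu": int(fields["cpu"]),
        "bank": int(fields["bank"]),
        # The remaining values are hexadecimal in the console message.
        "mcgstatus": int(fields["mcgstatus"], 16),
        "status": int(fields["status"], 16),
        "tsc": int(fields["tsc"], 16) if fields["tsc"] else None,
        "addr": int(fields["addr"], 16) if fields["addr"] else None,
    }


if __name__ == "__main__":
    line = ("CPU 1: Machine Check Exception: 4 Bank 4: f600200137080813 "
            "TSC b0ce27165dd3 ADDR 180ee1b40")
    print(parse_mce_line(line))
```

A return value of None suggests the line was mistyped or truncated and should be re-captured.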
Paste or type the error message into a file, and then run it through mcelog, for example:
root@localhost:/root> /usr/sbin/mcelog --k8 --ascii < myerror
Use the --k8 option if you are using an AMD Opteron or Athlon 64 processor, or replace it with --p4 for a Pentium 4 or Xeon. Here is the output for the MCE error shown above:
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 4 northbridge TSC b0ce27165dd3
  Northbridge Chipkill ECC error
  Chipkill ECC syndrome = 3700
       bit32 = err cpu0
       bit45 = uncorrected ecc error
       bit57 = processor context corrupt
       bit61 = error uncorrected
       bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS f600200137080813 MCGSTATUS 4
This output shows that an uncorrected ECC error occurred, which means that one of your memory modules has failed. For further analysis, please submit a support ticket with the complete MCE error message and the output of mcelog.
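The "bitNN" lines in the mcelog output correspond to individual bits of the STATUS value. The Python sketch below tests those bits directly, as a way of seeing where the decoded lines come from. The bit labels are copied from the output above and apply only to this northbridge example. The table is not a complete description of the status register, and `decode_status` is a name invented for this illustration.

```python
# Labels taken from the mcelog output shown in this article.
# Assumption: this partial table is only meaningful for the K8 northbridge
# bank in the example; other banks and CPUs define their bits differently.
STATUS_BITS = {
    32: "err cpu0",
    45: "uncorrected ecc error",
    57: "processor context corrupt",
    61: "error uncorrected",
    62: "error overflow (multiple errors)",
}


def decode_status(status):
    """Return (bit, label) pairs for every known bit that is set in status."""
    return [
        (bit, label)
        for bit, label in sorted(STATUS_BITS.items())
        if (status >> bit) & 1
    ]


if __name__ == "__main__":
    # STATUS value from the example MCE message.
    for bit, label in decode_status(0xF600200137080813):
        print(f"bit{bit} = {label}")
```

Run against the example STATUS value, the sketch reports the same five bits that mcelog lists. It is not a substitute for mcelog, which also decodes the syndrome and the bus error fields.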