How to analyze a crash dump (null pointer)
2014-02-13 17:47
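Take the simple crash method the system provides as an example: echo c > /proc/sysrq-trigger.
Once you have the crash file, what you usually want to see first is the error type, plus the registers and backtrace at the time of the error. You can get these with the command log | tail -200, which prints the last 200 lines of the log: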
crash> log | tail -200
[2207.597488:0] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[2207.605719:0] pgd = ddf30000
[2207.608588:0] [00000000] *pgd=00000000
[2207.612339:0] Internal error: Oops: 805 [#1] PREEMPT SMP ARM
[2207.617975:0] Modules linked in:
[2207.621205:0] CPU: 0 Not tainted (3.4.0-gc37fe8c-dirty #651)
[2207.627196:0] PC is at sysrq_handle_crash+0x38/0x48
[2207.632059:0] LR is at _raw_spin_unlock_irqrestore+0x20/0x40
[2207.637699:0] pc : [<c01ed158>] lr : [<c0569e2c>] psr: 60000093
[2207.637704:0] sp : e7c61ec8 ip : e7c61e98 fp : e7c61ed4
[2207.649487:0] r10: e7c61f70 r9 : e7c60000 r8 : 00000000
[2207.654865:0] r7 : 60000013 r6 : 00000063 r5 : 00000004 r4 : c079fc74
[2207.661539:0] r3 : 00000000 r2 : 00000001 r1 : 20000093 r0 : 00000001
[2207.668215:0] Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
[2207.675582:0] Control: 10c53c7d Table: 9ff3004a DAC: 00000015
[2207.681477:0]
[2208.257865:0] Process sh (pid: 2309, stack limit = 0xe7c602f0)
[2208.351387:0] Backtrace:
[2208.354019:0] [<c01ed120>] (sysrq_handle_crash+0x0/0x48) from [<c01ed838>] (__handle_sysrq+0xac/0x158)
[2208.363293:0] [<c01ed78c>] (__handle_sysrq+0x0/0x158) from [<c01ed914>] (write_sysrq_trigger+0x30/0x38)
[2208.372647:0] r8:b76e780c r7:00000002 r6:e331c5c0 r5:c01ed8e4 r4:00000002
[2208.379373:0] r3:e7c61f70
[2208.382188:0] [<c01ed8e4>] (write_sysrq_trigger+0x0/0x38) from [<c00f7948>] (proc_reg_write+0x88/0x9c)
[2208.391455:0] r4:ed98b5e0 r3:e7c61f70
[2208.395221:0] [<c00f78c0>] (proc_reg_write+0x0/0x9c) from [<c00b3384>] (vfs_write+0xb8/0x144)
[2208.403715:0] [<c00b32cc>] (vfs_write+0x0/0x144) from [<c00b34d4>] (sys_write+0x44/0x70)
[2208.411772:0] r8:00000002 r7:00000000 r6:00000000 r5:b76e780c r4:e331c5c0
[2208.418695:0] [<c00b3490>] (sys_write+0x0/0x70) from [<c000e040>] (ret_fast_syscall+0x0/0x30)
[2208.427185:0] r8:c000e1e8 r7:00000004 r6:00000001 r5:00000002 r4:00000003
[2208.434101:0] Code: 0a000000 e12fff33 e3a03000 e3a02001 (e5c32000)
[2208.440344:0] Enter crash kexec !!
[2208.443747:1] CPU 1 will stop doing anything useful since another CPU has crashed
[2208.451905:0] Loading crashdump kernel...
[2208.455900:0] Software reset on panic!
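The error type is partly encoded in the number after "Oops:". As a rough sketch, and assuming this 32-bit ARM kernel uses the short-descriptor page table format (no LPAE), that number is the fault status register value and can be unpacked as below. The helper name decode_fsr and the small lookup table are illustrative only, not part of crash or the kernel.

```python
# Decode the value printed after "Oops:" on 32-bit ARM.
# Assumption: short-descriptor translation tables (no LPAE), where the
# fault status is bit 10 plus bits [3:0], bits [7:4] hold the domain,
# and bit 11 is set when the faulting access was a write.

FAULT_STATUS = {
    0b00001: "alignment fault",
    0b00101: "translation fault (section, first level)",
    0b00111: "translation fault (page, second level)",
    0b01101: "permission fault (section)",
    0b01111: "permission fault (page)",
}


def decode_fsr(fsr: int) -> dict:
    """Split a fault status value into status, domain and access type."""
    status = (((fsr >> 10) & 1) << 4) | (fsr & 0xF)
    return {
        "status": FAULT_STATUS.get(status, "other (0b{:05b})".format(status)),
        "domain": (fsr >> 4) & 0xF,
        "access": "write" if fsr & (1 << 11) else "read",
    }


info = decode_fsr(0x805)  # "Oops: 805" from the log above
print(info)
```

Under that assumption, 805 reads as a write that hit a first-level translation fault, which agrees with the empty first-level entry reported by the "*pgd=00000000" line.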
Find the process that was running when the problem occurred:
crash> ps | grep ">"
> 1904 1384 1 e3355000 RU 2.9 663340 23740 MediaScannerSer
> 2309 1394 0 d9742c00 RU 0.1 820 480 sh
This is a multi-core system, so which of the two processes is it?
In fact, the log above already tells us:
[2207.621205:0] CPU: 0 Not tainted (3.4.0-gc37fe8c-dirty #651)
[2208.257865:0] Process sh (pid: 2309, stack limit = 0xe7c602f0)
The set command confirms it:
crash> set 2309
PID: 2309
COMMAND: "sh"
TASK: d9742c00 [THREAD_INFO: e7c60000]
CPU: 0
STATE: TASK_RUNNING (PANIC)
crash> set 1904
PID: 1904
COMMAND: "MediaScannerSer"
TASK: e3355000 [THREAD_INFO: e0422000]
CPU: 1
STATE: TASK_RUNNING (ACTIVE)
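There is a third way to cross-check the owner of the fault: the stack pointer in the register dump. The sketch below assumes that on this 32-bit ARM 3.4 kernel thread_info sits at the bottom of an 8 KiB kernel stack, so masking sp gives the THREAD_INFO address that set prints. THREAD_SIZE and the helper name thread_info_base are assumptions made for illustration; check your kernel configuration before relying on them.

```python
# Map the faulting stack pointer back to a task via its thread_info address.
# Assumption: 8 KiB kernel stacks with thread_info at the stack base
# (typical for 32-bit ARM kernels of the 3.4 era).

THREAD_SIZE = 8192


def thread_info_base(sp: int, thread_size: int = THREAD_SIZE) -> int:
    """Round sp down to the start of its kernel stack."""
    return sp & ~(thread_size - 1)


sp_at_fault = 0xE7C61EC8  # "sp :" value from the register dump

# THREAD_INFO addresses reported by "set" for the two candidate PIDs.
tasks = {2309: 0xE7C60000, 1904: 0xE0422000}

base = thread_info_base(sp_at_fault)
owners = [pid for pid, ti in tasks.items() if ti == base]
print(hex(base), owners)
```

With these values the masked stack pointer equals the THREAD_INFO of PID 2309 (sh), the same answer the log and set give. It also equals r9 in the register dump.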
Start the analysis from the exact location of the fault
[2207.621205:0] CPU: 0 Not tainted (3.4.0-gc37fe8c-dirty #651)
[2207.627196:0] PC is at sysrq_handle_crash+0x38/0x48
[2207.632059:0] LR is at _raw_spin_unlock_irqrestore+0x20/0x40
[2207.637699:0] pc : [<c01ed158>] lr : [<c0569e2c>] psr: 60000093
[2207.637704:0] sp : e7c61ec8 ip : e7c61e98 fp : e7c61ed4
[2207.649487:0] r10: e7c61f70 r9 : e7c60000 r8 : 00000000
[2207.654865:0] r7 : 60000013 r6 : 00000063 r5 : 00000004 r4 : c079fc74
[2207.661539:0] r3 : 00000000 r2 : 00000001 r1 : 20000093 r0 : 00000001
The current PC value is c01ed158. Use the command dis -r <address> to see the exact instruction that faulted, together with all the code from the function entry up to that point.
help dis
-r (reverse) displays all instructions from the start of the routine up to and including the designated address.
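The log gives the PC in two forms: the raw address c01ed158 and the symbolic sysrq_handle_crash+0x38/0x48 (offset into the function / function size). A small sketch of how the two relate is shown here; parse_symbol_ref is a hypothetical helper written for this note, not a crash command.

```python
import re


def parse_symbol_ref(text: str):
    """Split 'name+0xOFF/0xSIZE' into (name, offset, size)."""
    m = re.fullmatch(r"(\w+)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)", text.strip())
    if m is None:
        raise ValueError("unexpected symbol reference: {!r}".format(text))
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)


name, offset, size = parse_symbol_ref("sysrq_handle_crash+0x38/0x48")
pc = 0xC01ED158  # "pc :" value from the register dump

start = pc - offset  # address of the first instruction of the function
end = start + size   # first address past the function

print(name, hex(start), hex(end))
```

The computed start address, c01ed120, is the function entry that the backtrace reports for sysrq_handle_crash, and it is where the dis -r listing begins.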
crash> dis -r c01ed158
0xc01ed120 <sysrq_handle_crash>: mov r12, sp
0xc01ed124 <sysrq_handle_crash+4>: push {r11, r12, lr, pc}
0xc01ed128 <sysrq_handle_crash+8>: sub r11, r12, #4
0xc01ed12c <sysrq_handle_crash+12>: ldr r3, [pc, #44] ; 0xc01ed160 <sysrq_handle_crash+64>
0xc01ed130 <sysrq_handle_crash+16>: mov r2, #1
0xc01ed134 <sysrq_handle_crash+20>: str r2, [r3]
0xc01ed138 <sysrq_handle_crash+24>: dsb sy
0xc01ed13c <sysrq_handle_crash+28>: ldr r3, [pc, #32] ; 0xc01ed164 <sysrq_handle_crash+68>
0xc01ed140 <sysrq_handle_crash+32>: ldr r3, [r3, #24]
0xc01ed144 <sysrq_handle_crash+36>: cmp r3, #0
0xc01ed148 <sysrq_handle_crash+40>: beq 0xc01ed150 <sysrq_handle_crash+48>
0xc01ed14c <sysrq_handle_crash+44>: blx r3
0xc01ed150 <sysrq_handle_crash+48>: mov r3, #0
0xc01ed154 <sysrq_handle_crash+52>: mov r2, #1
0xc01ed158 <sysrq_handle_crash+56>: strb r2, [r3]
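The faulting instruction is strb r2, [r3], and at that moment r3 is 00000000. Storing data to address 0 is bound to fault.
Next, find the cause: where did the value in r3 come from? Looking one step up the listing:
0xc01ed150 <sysrq_handle_crash+48>: mov r3, #0
The code assigns it explicitly; it does not come from an input argument.
Locate the error in the source and fix it
Look up the actual source code to see the cause of the problem.
help dis
-l displays source code line number data in addition to the disassembly output.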
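The same conclusion can be reached from the log alone. The word in parentheses on the "Code:" line, e5c32000, is the instruction that faulted. The following is a minimal decoder sketch for just one A32 instruction class (LDR/STR/LDRB/STRB with an immediate offset, no writeback, condition "always"); decode_single_data_transfer is an illustrative name, and anything outside that narrow class is rejected rather than guessed at.

```python
def decode_single_data_transfer(word: int) -> str:
    """Decode an A32 LDR/STR/LDRB/STRB with immediate offset addressing."""
    cond = (word >> 28) & 0xF
    is_sdt = ((word >> 26) & 0b11) == 0b01
    register_offset = (word >> 25) & 1
    pre_indexed = (word >> 24) & 1
    up = (word >> 23) & 1
    byte = (word >> 22) & 1
    writeback = (word >> 21) & 1
    load = (word >> 20) & 1
    rn = (word >> 16) & 0xF
    rd = (word >> 12) & 0xF
    imm = word & 0xFFF

    # Only the simple form is handled in this sketch.
    if cond != 0xE or not is_sdt or register_offset or not pre_indexed or writeback:
        raise ValueError("unsupported encoding: {:08x}".format(word))

    mnemonic = ("ldr" if load else "str") + ("b" if byte else "")
    if imm == 0:
        address = "[r{}]".format(rn)
    else:
        address = "[r{}, #{}{}]".format(rn, "" if up else "-", imm)
    return "{} r{}, {}".format(mnemonic, rd, address)


faulting = decode_single_data_transfer(0xE5C32000)  # from the "Code:" line
print(faulting)
```

The decoded text matches the last line of the dis -r listing: a byte store of r2 to the address held in r3.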
crash> dis -rl c01ed158
/home/wenshuai/code/kernel3.4/linux_kernel/drivers/tty/sysrq.c:129
0xc01ed120 <sysrq_handle_crash>: mov r12, sp
0xc01ed124 <sysrq_handle_crash+4>: push {r11, r12, lr, pc}
0xc01ed128 <sysrq_handle_crash+8>: sub r11, r12, #4
/home/wenshuai/code/kernel3.4/linux_kernel/drivers/tty/sysrq.c:132
0xc01ed12c <sysrq_handle_crash+12>: ldr r3, [pc, #44] ; 0xc01ed160 <sysrq_handle_crash+64>
0xc01ed130 <sysrq_handle_crash+16>: mov r2, #1
0xc01ed134 <sysrq_handle_crash+20>: str r2, [r3]
/home/wenshuai/code/kernel3.4/linux_kernel/drivers/tty/sysrq.c:133
0xc01ed138 <sysrq_handle_crash+24>: dsb sy
/home/wenshuai/code/kernel3.4/linux_kernel/arch/arm/include/asm/outercache.h:114
0xc01ed13c <sysrq_handle_crash+28>: ldr r3, [pc, #32] ; 0xc01ed164 <sysrq_handle_crash+68>
0xc01ed140 <sysrq_handle_crash+32>: ldr r3, [r3, #24]
0xc01ed144 <sysrq_handle_crash+36>: cmp r3, #0
0xc01ed148 <sysrq_handle_crash+40>: beq 0xc01ed150 <sysrq_handle_crash+48>
/home/wenshuai/code/kernel3.4/linux_kernel/arch/arm/include/asm/outercache.h:115
0xc01ed14c <sysrq_handle_crash+44>: blx r3
/home/wenshuai/code/kernel3.4/linux_kernel/drivers/tty/sysrq.c:134
0xc01ed150 <sysrq_handle_crash+48>: mov r3, #0
0xc01ed154 <sysrq_handle_crash+52>: mov r2, #1
0xc01ed158 <sysrq_handle_crash+56>: strb r2, [r3]
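From the above, the fault is in linux_kernel/drivers/tty/sysrq.c:
static void sysrq_handle_crash(int key)
{
	char *killer = NULL;

	panic_on_oops = 1;	/* force panic */
	wmb();
	*killer = 1;	/* deliberate NULL write: the strb r2, [r3] at sysrq.c:134 */
}
This example is of course very simple, and the cause is easy to see. More often, the error is caused by an input argument, for example a member of a passed-in structure that was never assigned.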