
How to analyze a crash dump (NULL pointer dereference)

2014-02-13 17:47
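Take the simple crash trigger that the system itself provides as an example: echo c > /proc/sysrq-trigger.
Once you have the crash dump, what you usually want to see first is the error type, plus the registers and backtrace at the moment of the error. The command log | tail -200 shows them by printing the last 200 lines of the kernel log:

crash> log | tail -200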

[2207.597488:0] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[2207.605719:0] pgd = ddf30000
[2207.608588:0] [00000000] *pgd=00000000
[2207.612339:0] Internal error: Oops: 805 [#1] PREEMPT SMP ARM
[2207.617975:0] Modules linked in:
[2207.621205:0] CPU: 0 Not tainted (3.4.0-gc37fe8c-dirty #651)
[2207.627196:0] PC is at sysrq_handle_crash+0x38/0x48
[2207.632059:0] LR is at _raw_spin_unlock_irqrestore+0x20/0x40
[2207.637699:0] pc : [<c01ed158>] lr : [<c0569e2c>] psr: 60000093
[2207.637704:0] sp : e7c61ec8 ip : e7c61e98 fp : e7c61ed4
[2207.649487:0] r10: e7c61f70 r9 : e7c60000 r8 : 00000000
[2207.654865:0] r7 : 60000013 r6 : 00000063 r5 : 00000004 r4 : c079fc74
[2207.661539:0] r3 : 00000000 r2 : 00000001 r1 : 20000093 r0 : 00000001
[2207.668215:0] Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
[2207.675582:0] Control: 10c53c7d Table: 9ff3004a DAC: 00000015
[2207.681477:0]

[2208.257865:0] Process sh (pid: 2309, stack limit = 0xe7c602f0)

[2208.351387:0] Backtrace:

[2208.354019:0] [<c01ed120>] (sysrq_handle_crash+0x0/0x48) from [<c01ed838>] (__handle_sysrq+0xac/0x158)
[2208.363293:0] [<c01ed78c>] (__handle_sysrq+0x0/0x158) from [<c01ed914>] (write_sysrq_trigger+0x30/0x38)
[2208.372647:0] r8:b76e780c r7:00000002 r6:e331c5c0 r5:c01ed8e4 r4:00000002
[2208.379373:0] r3:e7c61f70
[2208.382188:0] [<c01ed8e4>] (write_sysrq_trigger+0x0/0x38) from [<c00f7948>] (proc_reg_write+0x88/0x9c)
[2208.391455:0] r4:ed98b5e0 r3:e7c61f70
[2208.395221:0] [<c00f78c0>] (proc_reg_write+0x0/0x9c) from [<c00b3384>] (vfs_write+0xb8/0x144)
[2208.403715:0] [<c00b32cc>] (vfs_write+0x0/0x144) from [<c00b34d4>] (sys_write+0x44/0x70)
[2208.411772:0] r8:00000002 r7:00000000 r6:00000000 r5:b76e780c r4:e331c5c0
[2208.418695:0] [<c00b3490>] (sys_write+0x0/0x70) from [<c000e040>] (ret_fast_syscall+0x0/0x30)
[2208.427185:0] r8:c000e1e8 r7:00000004 r6:00000001 r5:00000002 r4:00000003
[2208.434101:0] Code: 0a000000 e12fff33 e3a03000 e3a02001 (e5c32000)

[2208.440344:0] Enter crash kexec !!
[2208.443747:1] CPU 1 will stop doing anything useful since another CPU has crashed
[2208.451905:0] Loading crashdump kernel...
[2208.455900:0] Software reset on panic!
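The 805 in "Internal error: Oops: 805" is the fault status register (FSR) value reported by the ARM fault handler. A minimal sketch of decoding it, assuming the ARMv7 short-descriptor (non-LPAE) format used by this 3.4 kernel. It follows the same bit layout as the kernel's fsr_fs() helper in arch/arm/mm; the macro names mirror the kernel's, while the functions are hypothetical:

```c
#include <stdbool.h>

/* Bit layout of the ARMv7 short-descriptor data fault status register
 * (assumption: non-LPAE kernel, as in this 3.4 ARM build). */
#define FSR_WRITE (1u << 11) /* WnR: set when the faulting access was a write */
#define FSR_FS4   (1u << 10) /* bit 4 of the fault status field */
#define FSR_FS3_0 0xfu       /* bits 3..0 of the fault status field */

/* Reassemble the 5-bit fault status field: FS[4] lives apart, at bit 10. */
static unsigned int fsr_fault_status(unsigned int fsr)
{
    return (fsr & FSR_FS3_0) | ((fsr & FSR_FS4) >> 6);
}

/* True when the faulting access was a store rather than a load. */
static bool fsr_is_write(unsigned int fsr)
{
    return (fsr & FSR_WRITE) != 0;
}

/* Names for a few common fault status values (a subset, not the full table). */
static const char *fsr_fault_name(unsigned int fsr)
{
    switch (fsr_fault_status(fsr)) {
    case 5:  return "section translation fault";
    case 7:  return "page translation fault";
    case 13: return "section permission fault";
    case 15: return "page permission fault";
    default: return "other fault status";
    }
}
```

For this Oops, fsr_fault_status(0x805) is 5 and fsr_is_write(0x805) is true: a section translation fault on a write. That agrees with the log, where the first-level entry for address 00000000 is empty (*pgd=00000000) and the code was storing to that address.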

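Next, find the process that was running when the fault occurred. In the output of crash's ps command, a leading > marks the task that was active on each CPU:

crash> ps | grep ">"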

> 1904 1384 1 e3355000 RU 2.9 663340 23740 MediaScannerSer

> 2309 1394 0 d9742c00 RU 0.1 820 480 sh

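Because the system is multi-core, two tasks are marked active. Which one caused the crash?

In fact, the log above already tells us:

[2207.621205:0] CPU: 0 Not tainted (3.4.0-gc37fe8c-dirty #651)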

[2208.257865:0] Process sh (pid: 2309, stack limit = 0xe7c602f0)

You can also find this out with the set command:

crash> set 2309
PID: 2309
COMMAND: "sh"
TASK: d9742c00 [THREAD_INFO: e7c60000]
CPU: 0
STATE: TASK_RUNNING (PANIC)

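crash> set 1904
PID: 1904
COMMAND: "MediaScannerSer"
TASK: e3355000 [THREAD_INFO: e0422000]
CPU: 1
STATE: TASK_RUNNING (ACTIVE)

The sh process (PID 2309) runs on CPU 0 and its state is marked PANIC, so it is the task that crashed; MediaScannerSer (PID 1904) was simply running on CPU 1 at the time.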


Start the analysis from the exact location of the fault

[2207.621205:0] CPU: 0 Not tainted (3.4.0-gc37fe8c-dirty #651)
[2207.627196:0] PC is at sysrq_handle_crash+0x38/0x48
[2207.632059:0] LR is at _raw_spin_unlock_irqrestore+0x20/0x40
[2207.637699:0] pc : [<c01ed158>] lr : [<c0569e2c>] psr: 60000093
[2207.637704:0] sp : e7c61ec8 ip : e7c61e98 fp : e7c61ed4
[2207.649487:0] r10: e7c61f70 r9 : e7c60000 r8 : 00000000
[2207.654865:0] r7 : 60000013 r6 : 00000063 r5 : 00000004 r4 : c079fc74
[2207.661539:0] r3 : 00000000 r2 : 00000001 r1 : 20000093 r0 : 00000001
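The psr: 60000093 field is the saved CPSR, and the "Flags:" line in the full log above is simply its decoding. A minimal sketch that reproduces that decoding from the standard ARM CPSR bit positions; the struct and function names are hypothetical, and the Jazelle bit is ignored, so the ISA is reported only as ARM or Thumb:

```c
#include <stdbool.h>
#include <stdio.h>

/* Fields of an ARM CPSR that matter when reading an Oops register dump. */
struct cpsr_info {
    bool n, z, c, v;    /* condition flags, bits 31..28 */
    bool irqs_off;      /* I bit (7): IRQs masked */
    bool fiqs_off;      /* F bit (6): FIQs masked */
    bool thumb;         /* T bit (5): Thumb state (Jazelle bit ignored) */
    unsigned int mode;  /* M[4:0]: processor mode */
};

static struct cpsr_info decode_cpsr(unsigned int psr)
{
    struct cpsr_info i = {
        .n = (psr >> 31) & 1u, .z = (psr >> 30) & 1u,
        .c = (psr >> 29) & 1u, .v = (psr >> 28) & 1u,
        .irqs_off = (psr >> 7) & 1u, .fiqs_off = (psr >> 6) & 1u,
        .thumb = (psr >> 5) & 1u, .mode = psr & 0x1fu,
    };
    return i;
}

static const char *cpsr_mode_name(unsigned int mode)
{
    switch (mode) {
    case 0x10: return "USER_32";
    case 0x11: return "FIQ_32";
    case 0x12: return "IRQ_32";
    case 0x13: return "SVC_32";
    case 0x17: return "ABT_32";
    case 0x1b: return "UND_32";
    case 0x1f: return "SYSTEM_32";
    default:   return "other";
    }
}

/* Print in the style of the kernel's "Flags: ..." line:
 * a flag letter is upper-case when set and lower-case when clear. */
static void print_cpsr(unsigned int psr)
{
    struct cpsr_info i = decode_cpsr(psr);

    printf("Flags: %c%c%c%c IRQs %s FIQs %s Mode %s ISA %s\n",
           i.n ? 'N' : 'n', i.z ? 'Z' : 'z',
           i.c ? 'C' : 'c', i.v ? 'V' : 'v',
           i.irqs_off ? "off" : "on", i.fiqs_off ? "off" : "on",
           cpsr_mode_name(i.mode), i.thumb ? "Thumb" : "ARM");
}
```

print_cpsr(0x60000093) prints "Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM", the same as the kernel's own line: the fault happened in kernel (SVC) mode with IRQs disabled.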

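The current PC value is c01ed158. Use the command dis -r <address> to show the faulting instruction together with all the code from the function entry up to it.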

From help dis:

-r (reverse) displays all instructions from the start of the routine up to and including the designated address.
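Where that "start of the routine" lies can already be read from "PC is at sysrq_handle_crash+0x38/0x48": +0x38 is the PC's offset into the function and 0x48 is the function's size. A minimal sketch of that arithmetic; the struct and helper names are hypothetical:

```c
#include <stdio.h>

/* A kernel "symbol+offset/size" location, as printed in "PC is at ...". */
struct sym_loc {
    char name[64];
    unsigned long offset;  /* distance of the address from the symbol start */
    unsigned long size;    /* total size of the symbol (the function) */
};

/* Parse e.g. "sysrq_handle_crash+0x38/0x48". Returns 0 on success, -1 otherwise.
 * %lx accepts an optional 0x prefix, like strtoul() with base 16. */
static int parse_sym_loc(const char *s, struct sym_loc *out)
{
    if (sscanf(s, "%63[^+]+%lx/%lx", out->name, &out->offset, &out->size) != 3)
        return -1;
    return 0;
}

/* First address of the function, i.e. where "dis -r" begins. */
static unsigned long sym_start(unsigned long pc, const struct sym_loc *loc)
{
    return pc - loc->offset;
}

/* One past the last byte of the function. */
static unsigned long sym_end(unsigned long pc, const struct sym_loc *loc)
{
    return sym_start(pc, loc) + loc->size;
}
```

With pc = 0xc01ed158, sym_start() gives 0xc01ed120 and sym_end() gives 0xc01ed168. The literal-pool words that the function's two pc-relative ldr instructions read (0xc01ed160 and 0xc01ed164) fall inside that range, after the last instruction at 0xc01ed158.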

crash> dis -r c01ed158
0xc01ed120 <sysrq_handle_crash>: mov r12, sp
0xc01ed124 <sysrq_handle_crash+4>: push {r11, r12, lr, pc}
0xc01ed128 <sysrq_handle_crash+8>: sub r11, r12, #4
0xc01ed12c <sysrq_handle_crash+12>: ldr r3, [pc, #44] ; 0xc01ed160 <sysrq_handle_crash+64>
0xc01ed130 <sysrq_handle_crash+16>: mov r2, #1
0xc01ed134 <sysrq_handle_crash+20>: str r2, [r3]
0xc01ed138 <sysrq_handle_crash+24>: dsb sy
0xc01ed13c <sysrq_handle_crash+28>: ldr r3, [pc, #32] ; 0xc01ed164 <sysrq_handle_crash+68>
0xc01ed140 <sysrq_handle_crash+32>: ldr r3, [r3, #24]
0xc01ed144 <sysrq_handle_crash+36>: cmp r3, #0
0xc01ed148 <sysrq_handle_crash+40>: beq 0xc01ed150 <sysrq_handle_crash+48>
0xc01ed14c <sysrq_handle_crash+44>: blx r3
0xc01ed150 <sysrq_handle_crash+48>: mov r3, #0
0xc01ed154 <sysrq_handle_crash+52>: mov r2, #1
0xc01ed158 <sysrq_handle_crash+56>: strb r2, [r3]

The faulting instruction is strb r2, [r3], and at this point r3 is 00000000. Storing a byte to address 0 is bound to fault.
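The same instruction also appears in the Oops log itself: in "Code: 0a000000 e12fff33 e3a03000 e3a02001 (e5c32000)" the word in parentheses is the one at the PC. A minimal sketch that decodes only the A32 single data transfer class (LDR/STR/LDRB/STRB), enough to confirm that e5c32000 is strb r2, [r3]; the helper is hypothetical and leaves every other instruction class undecoded:

```c
#include <stdbool.h>

/* Fields of an A32 single data transfer instruction:
 * cond[31:28] 01[27:26] I P U B W L Rn[19:16] Rd[15:12] offset[11:0] */
struct ldst {
    bool reg_offset;     /* I bit (25): register offset instead of immediate */
    bool is_byte;        /* B bit (22): byte access (LDRB/STRB) */
    bool is_load;        /* L bit (20): load rather than store */
    unsigned int rn;     /* base address register */
    unsigned int rd;     /* register stored from / loaded into */
    unsigned int imm12;  /* 12-bit immediate offset (valid when !reg_offset) */
};

/* Returns false when the word is not a (conditional) single data transfer. */
static bool decode_ldst(unsigned int insn, struct ldst *out)
{
    if ((insn >> 28) == 0xfu)              /* unconditional space, e.g. PLD */
        return false;
    if (((insn >> 26) & 0x3u) != 0x1u)     /* bits 27..26 must be 01 */
        return false;
    if (((insn >> 25) & 1u) && ((insn >> 4) & 1u))
        return false;                      /* media/undefined space, not LDR/STR */
    out->reg_offset = (insn >> 25) & 1u;
    out->is_byte    = (insn >> 22) & 1u;
    out->is_load    = (insn >> 20) & 1u;
    out->rn         = (insn >> 16) & 0xfu;
    out->rd         = (insn >> 12) & 0xfu;
    out->imm12      = insn & 0xfffu;
    return true;
}
```

For 0xe5c32000 this yields a byte store (is_load false, is_byte true) from Rd = r2 to the address in Rn = r3 with offset 0, i.e. strb r2, [r3]. The two words before it, e3a03000 and e3a02001, are the mov r3, #0 and mov r2, #1 seen in the disassembly.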

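Next, look for the cause: where does the value in r3 come from? Looking upward in the listing, the nearest instruction that writes r3 is:

0xc01ed150 <sysrq_handle_crash+48>: mov r3, #0

So r3 is set to 0 explicitly by the function's own code; it does not come from an input parameter.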
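Looking upward for the last instruction that writes a register is mechanical enough to sketch. This is a toy model, not a crash feature: the struct, the hand-copied listing and last_writer() are all hypothetical, and the search assumes straight-line code.

```c
/* One disassembled instruction, reduced to what the backward search needs. */
struct insn {
    unsigned long addr;
    const char *text;
    int dest_reg;  /* register explicitly written, or -1 for none */
};

/* Tail of the "dis -r c01ed158" listing for sysrq_handle_crash, copied by hand.
 * A call (blx) may also clobber r0-r3 and r12; this model only records lr. */
static const struct insn listing[] = {
    { 0xc01ed13cUL, "ldr r3, [pc, #32]",  3 },
    { 0xc01ed140UL, "ldr r3, [r3, #24]",  3 },
    { 0xc01ed144UL, "cmp r3, #0",        -1 },
    { 0xc01ed148UL, "beq 0xc01ed150",    -1 },
    { 0xc01ed14cUL, "blx r3",            14 },  /* writes lr */
    { 0xc01ed150UL, "mov r3, #0",         3 },
    { 0xc01ed154UL, "mov r2, #1",         2 },
    { 0xc01ed158UL, "strb r2, [r3]",     -1 },  /* faulting instruction */
};

/* Walk backward from index `from` (exclusive) to the nearest instruction
 * that writes `reg`; returns its index, or -1 if there is none.
 * Branches are not followed, so the answer is only trustworthy while the
 * walk stays inside one basic block. */
static int last_writer(const struct insn *code, int from, int reg)
{
    for (int i = from - 1; i >= 0; i--)
        if (code[i].dest_reg == reg)
            return i;
    return -1;
}
```

last_writer(listing, 7, 3) stops at index 5, the mov r3, #0 at 0xc01ed150. That address is also the beq target, so it starts the basic block that contains the fault, and the answer holds whether or not the branch was taken.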


Locate the faulting source line and fix it

Look up the corresponding source code to find the cause of the problem.

From help dis:

-l displays source code line number data in addition to the disassembly output.

crash> dis -rl c01ed158
/home/wenshuai/code/kernel3.4/linux_kernel/drivers/tty/sysrq.c:129
0xc01ed120 <sysrq_handle_crash>: mov r12, sp
0xc01ed124 <sysrq_handle_crash+4>: push {r11, r12, lr, pc}
0xc01ed128 <sysrq_handle_crash+8>: sub r11, r12, #4
/home/wenshuai/code/kernel3.4/linux_kernel/drivers/tty/sysrq.c:132
0xc01ed12c <sysrq_handle_crash+12>: ldr r3, [pc, #44] ; 0xc01ed160 <sysrq_handle_crash+64>
0xc01ed130 <sysrq_handle_crash+16>: mov r2, #1
0xc01ed134 <sysrq_handle_crash+20>: str r2, [r3]
/home/wenshuai/code/kernel3.4/linux_kernel/drivers/tty/sysrq.c:133
0xc01ed138 <sysrq_handle_crash+24>: dsb sy
/home/wenshuai/code/kernel3.4/linux_kernel/arch/arm/include/asm/outercache.h:114
0xc01ed13c <sysrq_handle_crash+28>: ldr r3, [pc, #32] ; 0xc01ed164 <sysrq_handle_crash+68>
0xc01ed140 <sysrq_handle_crash+32>: ldr r3, [r3, #24]
0xc01ed144 <sysrq_handle_crash+36>: cmp r3, #0
0xc01ed148 <sysrq_handle_crash+40>: beq 0xc01ed150 <sysrq_handle_crash+48>
/home/wenshuai/code/kernel3.4/linux_kernel/arch/arm/include/asm/outercache.h:115
0xc01ed14c <sysrq_handle_crash+44>: blx r3
/home/wenshuai/code/kernel3.4/linux_kernel/drivers/tty/sysrq.c:134
0xc01ed150 <sysrq_handle_crash+48>: mov r3, #0
0xc01ed154 <sysrq_handle_crash+52>: mov r2, #1
0xc01ed158 <sysrq_handle_crash+56>: strb r2, [r3]

From the output above, the faulting instruction maps to line 134 of linux_kernel/drivers/tty/sysrq.c, in this function:

static void sysrq_handle_crash(int key)
{
	char *killer = NULL;

	panic_on_oops = 1;	/* force panic */
	wmb();
	*killer = 1;
}

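This example is, of course, very simple, and the cause is easy to see. In practice the cause is more often an input parameter: for example, a member of a structure passed into the function that was never assigned.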
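A minimal userspace sketch of that more common pattern, using hypothetical names (struct dev_buf, struct dev_ctx, handle_request) that do not come from the kernel. The callee dereferences a member of its argument that the caller never set; the checks turn the would-be NULL dereference into an error return. In a crash dump, tracing the faulting register backward in such a case typically leads to a load through the argument pointer (passed in r0 under the ARM calling convention) rather than to a mov rN, #0.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical types, for illustration only. */
struct dev_buf {
    unsigned char *data;
    size_t len;
};

struct dev_ctx {
    struct dev_buf *buf;  /* the caller is expected to set this */
};

/* Writes one byte into ctx->buf->data. Without the checks, a caller that
 * zero-initialised ctx but never assigned ctx->buf would make this
 * dereference NULL, just like *killer = 1 in sysrq_handle_crash(). */
static int handle_request(struct dev_ctx *ctx, unsigned char value)
{
    if (ctx == NULL || ctx->buf == NULL || ctx->buf->data == NULL ||
        ctx->buf->len == 0)
        return -EINVAL;
    ctx->buf->data[0] = value;
    return 0;
}
```

The real fix is usually on the caller side (assign the member before the call); the check only turns a crash into an error the caller can see.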