
Analysis of Linux kernel crashes

2015-12-06 23:21

From: http://stablebits.blogspot.hk/
Introduction
Tools
Format of a crash report
Analysis

Simple crash to learn the basics
Crash in a binary kernel module
Suspected memory corruption

Extra Details

Introduction

The aim of this post is to illustrate the analysis of Linux kernel crashes by studying a few real-life examples. The examples come from a MIPS platform, but the general approach is applicable to other architectures.

It's implied that readers have knowledge of C programming and of basic operating system concepts, like virtual memory.

We begin by analyzing a simple crash to illustrate the basics. Further, we reconstruct what happened in case of a crash in a kernel module that has no source code available. Finally, we consider a crash caused by "memory corruption".

The information provided here is by no means comprehensive. We take a minimalist approach and don't consider tools such as Kdump and crash.

Tools

Any general purpose disassembler is sufficient. We'll use objdump with the '-d' option here.

If a binary was built with debugging information, objdump -S can display source code intermixed with disassembly [1]. Also, addr2line can be used to match addresses with source code file names and line numbers.

In order to interpret disassembly, we need to have the MIPS Instruction Reference and the Compiler Register Usage information at hand [2], so please keep these pages open while reading further material.

If you are not familiar with how the virtual memory space is divided on MIPS, please refer to 'Virtual Memory Layout' [3] in the last section.

Format of a crash report

Here is an example of a crash report:

[code]CPU 0 Unable to handle kernel paging request at virtual address 00000000, epc == 8023afd0, ra == 8023b024
Oops[#1]:
Cpu 0
$ 0 : 00000000 1000fc00 8555fc54 00000000
$ 4 : 00000000 00000000 0000000b 00000001
$ 8 : 00000008 800445f4 00000000 00000000
$12 : 0000004f 0000004e 00000041 00000001
$16 : 00000000 8555fc54 0000000b 8555fc54
$20 : c01eded0 8555fbd0 80240000 7f7ff0c4
$24 : 00000002 c01d6edc
$28 : 8555e000 8555fab0 7f7ff0a0 8023b024
Hi : 00000000
Lo : 3b9aca00
epc : 8023afd0 strlen+0x0/0x28
    Tainted: PF
ra : 8023b024 strlcpy+0x2c/0x7c
Status: 1000fc04 IEp
Cause : 00000008
BadVA : 00000000
PrId : 0000c401 (Fusiv MIPS1)
Modules linked in: xt_CLASSIFY [ skipped proprietary (aka evil) modules ] ip6_tunnel tunnel6
Process controllerd (pid: 751, threadinfo=8555e000, task=8783add8, tls=00000000)
Stack : 1000fc01 7f7ff0d0 8008f4a4 7f7ff268 8555fc5f 8555fc54 8023aff8 80044500
    c01d6cc8 c01d6ab8 000000a4 7f7ff0c4 00000000 80050000 00000000 c026ca78
    c026ca78 8555fb18 8555fbd0 7f7ff0d0 7f7ff174 7f7ff0d0 7f7ff0c0 86458400
    000000a4 c01d4440 80631224 8026d168 87008838 8026ca84 00000000 00000001
    00000006 00000001 80631224 806312c0 1000fc01 fffffffe 805e5778 805e0000
...
Call Trace:
[<8023afd0>] strlen+0x0/0x28
[<8023b024>] strlcpy+0x2c/0x7c
[<c01d6cc8>] controller_get_info+0x2c4/0x37c [controller_lkm]
[<c01d4440>] controller_init+0x3e0/0xa64 [controller_lkm]

Code: 00000000  03e00008  01031023 <80820000> 0808ebfa  00801821 24630001  80620000  00000000

Let's review the parts one by one.

[code]CPU 0 Unable to handle kernel paging request at virtual address 00000000, epc == 8023afd0, ra == 8023b024
Oops[#1]:

The header indicates the particular reason for this crash.

On CPU #0, a load or store instruction at address epc accessed virtual address 0x00000000. There was no valid virtual-to-physical address translation available, hence the crash. In the middle of the report the same virtual address is shown as:

[code]    BadVA : 00000000

BadVA is a register of the MIPS Coprocessor 0 that holds the memory address at which an address exception occurred.

"Unable to handle kernel paging request" is by far one of the most common reasons for crashes. You may also encounter:

[code]Kernel bug detected

It is triggered by one of the sanity checks in the kernel code, such as BUG() or BUG_ON(condition). This mechanism does what assert() does for user-space applications.

Other reasons can be found by running 'grep -rn die_if_kernel arch/mips/' in a Linux kernel tree.

Further, the content of the registers is displayed.

[code]$ 0   : 00000000 1000fc00 8555fc54 00000000
$ 4   : 00000000 00000000 0000000b 00000001
[ ... ]
$24   : 00000002 c01d6edc
$28   : 8555e000 8555fab0 7f7ff0a0 8023b024
Hi    : 00000000
Lo    : 3b9aca00
[ ... ]
Status: 1000fc04    IEp
Cause : 00000008

Registers $0-31 are general purpose MIPS registers. To simplify reading, each of them has a mnemonic name in assembler code. Now it's time to take a quick look at Compiler Register Usage. For example, a0-3 correspond to $4-7, which are used in the o32 calling convention to pass the first 4 arguments to a function. o32 is the most commonly used calling convention on 32-bit MIPS [2] and our examples here relate to it.

An ideal case is to have a complete dump of the memory used by the kernel. That would allow us to restore the environment: to see the content of local variables, various kernel data structures, etc. Kdump can do this (no MIPS support at the moment). Nevertheless, in many cases the content of the registers alone reveals enough information to understand a problem.

The content of the Status and Cause registers may be very useful in some cases, but we won't consider them here.

[code]epc   : 8023afd0 strlen+0x0/0x28
ra    : 8023b024 strlcpy+0x2c/0x7c

epc shows the address of the instruction that caused the crash.

ra ($31) contains the return address from the last function called prior to the crash. In practice, ra usually points either to the caller of the function that epc belongs to or to the same function as epc. An invalid address in ra can indicate stack corruption (at least for non-leaf functions).

The names of the functions that epc and ra belong to are displayed if the kernel was built with CONFIG_KALLSYMS enabled. In any case, these names can be located with objdump.

The +0x0/0x28 notation stands for +offset/size, where offset is the offset of the instruction within the function it belongs to, and size is the size of this function.
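As a small illustration of this notation, the following C sketch splits a string such as strlcpy+0x2c/0x7c into its three parts. The names parse_symbol_location and struct symbol_location are hypothetical helpers invented for this example; they are not part of the kernel or of any existing tool.

```c
#include <stdio.h>

/* Hypothetical helper type: one "symbol+offset/size" entry from a crash report. */
struct symbol_location {
    char name[64];        /* function name, e.g. "strlen" */
    unsigned long offset; /* offset of the instruction inside the function */
    unsigned long size;   /* total size of the function in bytes */
};

/* Parses text such as "strlcpy+0x2c/0x7c". Returns 1 on success, 0 otherwise. */
int parse_symbol_location(const char *text, struct symbol_location *out)
{
    /* %63[^+] reads the name up to '+'; %lx accepts an optional 0x prefix. */
    return sscanf(text, "%63[^+]+%lx/%lx",
                  out->name, &out->offset, &out->size) == 3;
}
```

For the ra line of the report above, this would yield the name strlcpy, offset 0x2c and size 0x7c; adding the offset to the start address of the function in the disassembly gives the address of the instruction.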

In most cases, ra points to the 2nd instruction that follows an instruction representing a function call (usually jal or jalr). For example,

[code]  801f9bb8 <pci_bus_read_config_byte>:
  [...]
  801f9c14:       02202821        move    a1,s1
  801f9c18:       0040f809        jalr    v0
  801f9c1c:       24070001        li      a3,1
  801f9c20:       00408021        move    s0,v0
  801f9c24:       8fa20018        lw      v0,24(sp)

A function call is at 0x801f9c18. Control gets back to pci_bus_read_config_byte at 0x801f9c20; jalr saves this address into ra before jumping to the address in v0, the start of the called function.

The instruction at 0x801f9c1c is located in the delay slot of jalr and is executed before any instruction in the called function. Delay slots of jalr are often used to initialize one of the function arguments. If the called function above has at least 4 arguments, its 4th argument will be 1.
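The relationship between ra and the call instruction can be written down as a short C sketch. It assumes the standard MIPS32 encodings (jal has opcode 3; jalr has opcode 0 with function code 9) and deliberately ignores other linking instructions such as bal. The function names are made up for this example.

```c
#include <stdint.h>

/* Returns 1 if the 32-bit MIPS instruction word is jal or jalr, 0 otherwise. */
int is_call_instruction(uint32_t insn)
{
    uint32_t opcode = insn >> 26;   /* bits 31..26 */
    uint32_t funct = insn & 0x3f;   /* bits 5..0, meaningful when opcode is 0 */

    if (opcode == 0x03)             /* jal target */
        return 1;
    return opcode == 0x00 && funct == 0x09;   /* jalr rs */
}

/* Address of the call instruction that produced a given return address. */
uint32_t call_site_from_ra(uint32_t ra)
{
    return ra - 8;   /* step back over the delay slot and the call itself */
}
```

With the listing above, call_site_from_ra(0x801f9c20) gives 0x801f9c18, and the word found there, 0x0040f809, is recognized as a call. A value of ra for which this check fails deserves suspicion.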

Branch instructions, like bltz (branch on less than zero), are another example of instructions with delay slots.

We mentioned earlier that both epc and ra may point to the same function. To illustrate this case, let's suppose that a crash occurs at 0x801f9c24 in the above disassembly. Provided that the function call at 0x801f9c18 took place, ra would point to 0x801f9c20 inside the same function as epc.

[code]  Modules linked in: xt_CLASSIFY [ ... ] ip6_tunnel tunnel6

This is a list of loaded kernel modules.

[code]  Process controllerd (pid: 751, threadinfo=8555e000, task=8783add8, tls=00000000)

This is information about a process that was running at the moment of a crash.

In an ideal world where kernels and, especially, kernel modules behave well, user-space actions can never trigger a kernel crash, no matter what these actions are. Kernel-mode tasks have full privileges to cause havoc though.

A user-space task appears here in one of the following cases:

- it has triggered a kernel action (usually via a system call like ioctl()) that, due to a kernel bug or memory corruption, results in a crash;

- a crash has occurred in the interrupt context. Unless your kernel supports threaded interrupt handlers (i.e. interrupts are handled by dedicated threads) or separate interrupt stacks, the kernel-space stack of the current task is used by the interrupt handling code. The displayed task is usually unrelated in this case.

[code]  Stack : 1000fc01 7f7ff0d0 8008f4a4 7f7ff268 8555fc5f 8555fc54 8023aff8 80044500
          [ ... ]
          00000006 00000001 80631224 806312c0 1000fc01 fffffffe 805e5778 805e0000
         ...

This is a partial dump of the kernel-space stack of the current task, which we have discussed in the previous section.

[code]  Call Trace:
  [<8023afd0>] strlen+0x0/0x28
  [<8023b024>] strlcpy+0x2c/0x7c
  [<c01d6cc8>] controller_get_info+0x2c4/0x37c [controller_lkm]
  [<c01d4440>] controller_init+0x3e0/0xa64 [controller_lkm]
  [ ... ]

As the name suggests, this is a call trace.

It does not always represent a genuine call trace though. When epc points to an invalid address on MIPS or raw_show_trace is enabled in the kernel command line, the so-called raw call trace is displayed. It contains all the values from the stack that look like valid return addresses. So there can be 'ghost' traces of previously run and completely unrelated functions. For curious readers, the implementation of both methods is in show_backtrace() and show_raw_backtrace() in arch/mips/kernel/traps.c.

[code]  Code: 00000000  03e00008  01031023 <80820000> 0808ebfa  00801821 24630001  80620000  00000000

Finally, this last section displays a sequence of instructions (binary representation) at and around epc, with the instruction at epc being indicated by <> symbols.
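When comparing a report against a disassembly, it can be handy to pull the marked word out of the Code: line programmatically. The C sketch below does that with standard library calls only; the function name marked_instruction is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Extracts the word marked with <> from a "Code:" line. Returns 1 on success. */
int marked_instruction(const char *code_line, unsigned long *insn)
{
    const char *open = strchr(code_line, '<');
    char *end;

    if (open == NULL)
        return 0;   /* e.g. "Code: (Bad address in epc)" */
    *insn = strtoul(open + 1, &end, 16);
    /* At least one hex digit must be consumed and '>' must close the mark. */
    return end != open + 1 && *end == '>';
}
```

For the line shown above, the extracted word is 0x80820000, which should match the encoding that objdump prints at the address held in epc.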

Analysis

Simple crash to learn the basics

Let's now analyze the crash that was used as an example in the previous section.

[code]  CPU 0 Unable to handle kernel paging request at virtual address 00000000, epc == 8023afd0, ra == 8023b024
  [...]
  epc   : 8023afd0 strlen+0x0/0x28
  ra    : 8023b024 strlcpy+0x2c/0x7c

The kernel was built with CONFIG_KALLSYMS enabled, and that allows us to see the names of the functions that the instructions at epc and ra belong to. If that were not the case, we could still note that both epc and ra belong to kseg0 [3], so we may expect to find them inside the kernel image (vmlinux).

A quick sanity check for ra (recall the property of ra we mentioned above):

[code]  8023aff8 <strlcpy>:
  [...]
  8023b018:       afbf0020        sw      ra,32(sp)
  8023b01c:       0c08ebf4        jal     8023afd0 <strlen>
  8023b020:       00a08021        move    s0,a1
  8023b024:       00408821        move    s1,v0                  <=== 'ra' points to this location
  [...]

ra is indeed one instruction away (the delay slot) from jal. It's also clear that the function being called is strlen(). In cases when the address of a called function is not known at build time (kernel modules), disassembly may look as follows:

[code]  b6d38:        3c020000        lui     v0,0x0
  b6d3c:        24420000        addiu   v0,v0,0
  b6d40:        0040f809        jalr    v0

The first 2 instructions are changed by the loader at run time. In such cases, don't get confused when the disassembly and the Code: sequence of a crash dump display different instructions at the same address. Usually, there is a remote resemblance though. For instance, the instructions above might have been changed as follows:

[code]  b6d38:        3c02804a        lui     v0,0x804a
  b6d3c:        2442346c        addiu   v0,v0,13420

This basically corresponds to v0 = 0x804a346c.
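The arithmetic behind such a lui/addiu pair is simple, but objdump prints the addiu immediate in decimal, which makes it easy to misread. A minimal C sketch, assuming the usual MIPS semantics (lui places its immediate in the upper 16 bits, addiu sign-extends its immediate before adding); the function name is made up for this example:

```c
#include <stdint.h>

/* Value left in a register by "lui reg,hi" followed by "addiu reg,reg,lo". */
uint32_t lui_addiu_value(uint16_t hi, uint16_t lo)
{
    /* addiu sign-extends its 16-bit immediate before adding. */
    int32_t low = (lo & 0x8000) ? (int32_t)lo - 0x10000 : (int32_t)lo;

    return ((uint32_t)hi << 16) + (uint32_t)low;
}
```

For the pair above, hi is 0x804a and lo is 13420 (0x346c), giving 0x804a346c. Note that a low half of 0x8000 or more is negative, so the upper half then appears one larger than the top 16 bits of the final address.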

For 'epc',

[code]  8023afd0 <strlen>:
  8023afd0:       80820000        lb      v0,0(a0)               <=== 'epc' is here
  8023afd4:       0808ebfa        j       8023afe8 <strlen+0x18>

  8023afd8:       00801821        move    v1,a0
  8023afdc:       24630001        addiu   v1,v1,1
  [...]

we may make the following observations:

- it's indeed the first instruction (offset 0x0) of strlen();

- a0 ($4) is indeed 0. The instruction at epc loads a byte from address MEM[a0 + 0] and BadVA is reported to be 0. Hence, a0 should have been 0 too.

[code]$ 4   : 00000000 [...]

- the instructions from the disassembly match the ones shown in the Code: sequence.

[code]Code: [...] <80820000> 0808ebfa  00801821 24630001  80620000  00000000

These checks can also be applied as sanity checks to ensure you have got the right image for disassembly.
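The second observation can be reproduced mechanically: decode the base register and offset from the instruction word and add the offset to the register value from the dump. The sketch below assumes the standard MIPS I-type layout (opcode in bits 31..26, rs in 25..21, rt in 20..16, a signed 16-bit immediate in 15..0); the type and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of a MIPS I-type instruction such as "lb rt,imm(rs)". */
struct itype {
    unsigned opcode;  /* bits 31..26 */
    unsigned rs;      /* bits 25..21, base register for loads and stores */
    unsigned rt;      /* bits 20..16, register being loaded or stored */
    int32_t imm;      /* bits 15..0, sign-extended offset */
};

struct itype decode_itype(uint32_t insn)
{
    struct itype f;
    uint32_t raw = insn & 0xffff;

    f.opcode = insn >> 26;
    f.rs = (insn >> 21) & 0x1f;
    f.rt = (insn >> 16) & 0x1f;
    f.imm = (raw & 0x8000) ? (int32_t)raw - 0x10000 : (int32_t)raw;
    return f;
}

/* Address touched by a load/store: regs[rs] + imm, modulo 2^32. */
uint32_t effective_address(uint32_t insn, const uint32_t regs[32])
{
    struct itype f = decode_itype(insn);

    return regs[f.rs] + (uint32_t)f.imm;
}
```

For 0x80820000 this yields opcode 0x20 (lb), rs = 4 (a0), rt = 2 (v0) and an offset of 0, so with $4 equal to 0 the computed address is 0, the same value as BadVA.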

Now, a0 is supposed to hold the 1st (and only) argument of strlen() [Compiler Register Usage]. Given that epc points to the 1st instruction, a0 has not yet been reused for anything else inside strlen(). This can be verified by analyzing the instruction flow. The current working hypothesis is that strlen(s) has been called with s == NULL.

Let's see if we can figure out where this s == NULL is coming from by examining the caller of strlen(), strlcpy(). But before doing so, we should consider a few more aspects common to all functions.

At the beginning of most functions, there is a sequence of instructions called the "prologue". For example,

[code]  800a28d0 <vfs_write>:
  800a28d0:       27bdffd0        addiu   sp,sp,-48
  800a28d4:       afb30024        sw      s3,36(sp)
  800a28d8:       afb20020        sw      s2,32(sp)
  800a28dc:       afb1001c        sw      s1,28(sp)
  800a28e0:       afb00018        sw      s0,24(sp)
  800a28e4:       afbf0028        sw      ra,40(sp)
  [...]

The first instruction creates a stack frame by reserving space on the stack. As per the o32 calling convention, it's the job of the called function to preserve non-temporary registers (like the $s-registers) if they are to be reused. This is what those sw $reg, off(sp) instructions are concerned with: saving to-be-reused registers to the stack. The same applies to ra for non-leaf functions (those calling other functions). Obviously, local variables also reside in this stack frame.
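To make the prologue pattern concrete, here is a C sketch that scans the first instruction words of a function for addiu sp,sp,-N and sw ra,off(sp). It is only an illustration under the assumption that the prologue uses exactly these two encodings (upper halves 0x27bd and 0xafbf); real prologues can differ, and the names below are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* What a simple prologue scan can recover about a stack frame. */
struct frame_info {
    int frame_size;  /* bytes reserved by "addiu sp,sp,-N", or 0 if not seen */
    int ra_offset;   /* offset used by "sw ra,off(sp)", or -1 if ra is not saved */
};

static int32_t sign_extend16(uint32_t v)
{
    v &= 0xffff;
    return (v & 0x8000) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Scans the first `count` instruction words of a function. */
struct frame_info scan_prologue(const uint32_t *code, size_t count)
{
    struct frame_info info = { 0, -1 };
    size_t i;

    for (i = 0; i < count; i++) {
        uint32_t hi = code[i] >> 16;          /* opcode, rs and rt together */
        int32_t imm = sign_extend16(code[i]);

        if (hi == 0x27bd && imm < 0 && info.frame_size == 0)
            info.frame_size = -imm;           /* addiu sp,sp,-N */
        else if (hi == 0xafbf)
            info.ra_offset = imm;             /* sw ra,off(sp) */
    }
    return info;
}
```

Fed with the six words of the vfs_write prologue above, the scan reports a 48-byte frame with ra saved at offset 40. Knowing this offset tells us which word of the stack dump holds the caller's return address.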

An "epilogue" sequence does the opposite actions.

[code]  800a2908:       8fbf0028        lw      ra,40(sp)
  800a290c:       8fb30024        lw      s3,36(sp)
  800a2910:       8fb20020        lw      s2,32(sp)
  800a2914:       8fb1001c        lw      s1,28(sp)
  800a2918:       8fb00018        lw      s0,24(sp)
  800a291c:       03e00008        jr      ra
  800a2920:       27bd0030        addiu   sp,sp,48

The content of reused registers is restored. The stack frame is deleted, usually by the last instruction in the delay slot of jr. Finally, control is given back to the caller by jr ra.

Stack corruption may overwrite the saved value of ra, resulting in control being given to unexpected places (unless this is the result of a deliberate security attack). This is likely to result in "Unable to handle kernel paging request", "Unaligned access", or "Invalid instruction". Quite often in such cases epc is equal to ra (or close to it) or to both ra and BadVA.

"Epilogue" is not necessarily placed at the very end of a function. Moreover, a function may have more than one "epilogue".

One more thing before we get back to the analysis. Function calls look as follows:

[code]  800a2aa0:       00c03821        move    a3,a2
  800a2aa4:       00002021        move    a0,zero
  800a2aa8:       02202821        move    a1,s1
  800a2aac:       0c028848        jal     800a2120 <rw_verify_area>
  800a2ab0:       02603021        move    a2,s3

This sequence corresponds to rw_verify_area(a0, a1, a2, a3) with a0 being 0 (zero is register $0). The instructions initializing a0-a3 do not have to be placed immediately next to the jal instruction, but usually they are in some proximity.

The return value of a function, if any, is stored in v0 ($2).

Let's get back to our analysis.

[code]  8023aff8 <strlcpy>:
  8023aff8:       27bdffd8        addiu   sp,sp,-40
  8023affc:       afb3001c        sw      s3,28(sp)
  8023b000:       00809821        move    s3,a0
  8023b004:       00a02021        move    a0,a1                    <=== the argument for strlen()
  8023b008:       afb20018        sw      s2,24(sp)
  8023b00c:       afb10014        sw      s1,20(sp)
  8023b010:       afb00010        sw      s0,16(sp)
  8023b014:       00c09021        move    s2,a2
  8023b018:       afbf0020        sw      ra,32(sp)
  8023b01c:       0c08ebf4        jal     8023afd0 <strlen>        <=== the call is here
  8023b020:       00a08021        move    s0,a1
  8023b024:       00408821        move    s1,v0                    <=== 'ra' points here
  [...]

strlen() accepts a single argument that is passed via a0. A few instructions above the actual call (see the remarks above), at 0x8023b004, we can see that a0 is being loaded with the content of a1. After examining the remaining instructions it becomes clear that a1 still holds its initial value, which corresponds to the 2nd argument of strlcpy(dst, src, len).

Now we can update our working hypothesis. It looks like strlcpy() has been called with its 2nd argument, src, being NULL.

Real-life shortcut: we may simply examine the source code of strlcpy() and notice that there is a single call to strlen(). This is indeed in accordance with our hypothesis.

[code]  size_t strlcpy(char *dest, const char *src, size_t size)
  {
          size_t ret = strlen(src);
  [...]

What's next? We can do the same analysis for controller_get_info(), which has supposedly called strlcpy().

[code]  Call Trace:
  [<8023afd0>] strlen+0x0/0x28
  [<8023b024>] strlcpy+0x2c/0x7c
  [<c01d6cc8>] controller_get_info+0x2c4/0x37c [controller_lkm]
  [...]

Recall the remarks above regarding the validity of call traces. Basic sanity checks won't take much time. At the very least, check that 0xc01d6cc8 could have been a valid ra (one instruction away from jalr/jal). If it is the case, verify the code of (in this example) controller_get_info() to confirm that it does call strlcpy(). Having disassembly intermixed with source code is helpful here [1].

In any case, there is obviously a limit as to how far "in the past" we would be able to look by analyzing a crash report even if we had a complete memory dump. Nevertheless, the results of this analysis, if not sufficient to reveal a root cause, are usually very helpful in further debugging. As to this particular example, we would still need to analyze controller_get_info() to understand why strlcpy() might have been called with src == NULL.

Crash in a binary kernel module

This crash occurred in a kernel module for which no source code is available.

[code]  CPU 1 Unable to handle kernel paging request at virtual address 00000000, epc == c1c52470, ra == c1c63d64
  [...]
  $ 0   : 00000000 10008d00 c1c523f0 00000000
  $ 4   : 00000000 c1f18f5c 0000008c ffff00fe
  $ 8   : 80008fe1 15941794 8e038b00 fefe7dfd
  $12   : faf9fdfe 7dfffe7f fb7eff7d 7b7e7e7c
  $16   : 00000000 00000800 c1e2d178 c1e2cf98
  $20   : 842ffe08 c1e2d1b4 00000050 c1c51fc0
  $24   : 00000000 00000000
  $28   : 842fc000 842ffdf0 00000000 c1c63d64
  [...]
  epc   : c1c52470 fast_memcpy+0x80/0x1cc [binary_blob_module]
      Tainted: P
  ra    : c1c63d64 net_egress+0x80/0x2b78 [binary_blob_module]
  [...]

The load address of a kernel module is not known at build time, so we see relative addresses in the disassembly of binary_blob_module. We can use '+0x80/0x1cc' to locate the instruction at epc:

[code]  a53f0 <fast_memcpy>: 
  [...]
  a5468:       98ab000f        lwr     t3,15(a1)
  a546c:       98af001f        lwr     t7,31(a1)
  a5470:       a8880000        swl     t0,0(a0)      <== 'epc' points here at offset 0x80

The instruction at epc accesses MEM[a0 + 0], so a0 should have been 0 to result in a memory access at virtual address 0x00000000. We can confirm this by verifying the content of the a0 ($4) register:

[code]  $ 4   : 00000000 [...]

The next step is to examine the flow of instructions to trace the source of the value in a0. A full listing is not provided here, but what it revealed is that a0 is used read-only. At epc it still holds the 1st input argument of the function. Moreover, there are no explicit validity checks prior to its use. Thus, the 1st argument is expected to be a valid address.

The name of the function, fast_memcpy(), suggests its memcpy-like nature, so the 1st argument is likely to be dst (of course, this can be verified by a careful analysis of the disassembly).

Let's examine the caller, net_egress():

[code]  b6ce4 <net_egress>:
  [...]
  b6d38:       3c020000        lui     v0,0x0           (1)
  b6d3c:       24420000        addiu   v0,v0,0          (2)
  b6d40:       0040f809        jalr    v0               (3)
  b6d44:       97a4001a        lhu     a0,26(sp)        (4)
  b6d48:       97a6001a        lhu     a2,26(sp)        (5)
  b6d4c:       00408021        move    s0,v0            (6)
  b6d50:       00402021        move    a0,v0            (7)
  b6d54:       3c020000        lui     v0,0x0           (8)
  b6d58:       24420000        addiu   v0,v0,0          (9)
  b6d5c:       0040f809        jalr    v0               (10)
  b6d60:       8fa5001c        lw      a1,28(sp)        (11)
  b6d64:       02202021        move    a0,s1             <== 'ra' points here at offset 0x80
  [...]

There are 2 function calls here. The first one, at line (3), seems to have a single argument which gets initialized at line (4). Let's refer to this called function as unknown_function(). The second one, at line (10), takes 3 arguments: a0, a1, and a2 are initialized at lines (7), (11), and (5) respectively. Supposedly, this is the call of fast_memcpy() where the crash occurred.

Can we say something specific about those arguments?

- the 1st argument, a0, of fast_memcpy() gets initialized with the return value, v0, of unknown_function() at line (7);

- this return value is passed as is, i.e. there are no validity checks;

- the 1st argument of unknown_function() and the 3rd one of fast_memcpy() get initialized with the same value loaded from 26(sp) at lines (4) and (5).

These observations suggest that the source code may look as follows:

[code]  dst = unknown_function(len);
  fast_memcpy(dst, src, len);

In this particular case, unknown_function() returned NULL, hence the crash.
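For comparison, a guarded version of the reconstructed sequence might look like the C sketch below. Everything here is an assumption made for illustration: the real signature of unknown_function() is not known, memcpy() stands in for fast_memcpy(), and the names alloc_fn, copy_to_new_buffer and always_fails are invented.

```c
#include <stddef.h>
#include <string.h>

/* Assumed shape of the unidentified function: takes a length, returns a buffer. */
typedef void *(*alloc_fn)(size_t len);

/* Copies len bytes from src into a buffer obtained from alloc.
 * Returns the buffer, or NULL if no buffer could be obtained. */
void *copy_to_new_buffer(alloc_fn alloc, const void *src, size_t len)
{
    void *dst = alloc(len);

    if (dst == NULL)          /* the check that appears to be missing */
        return NULL;
    memcpy(dst, src, len);    /* stand-in for fast_memcpy(dst, src, len) */
    return dst;
}

/* Stand-in that always fails, mimicking the situation seen in the crash. */
void *always_fails(size_t len)
{
    (void)len;
    return NULL;
}
```

With always_fails as the allocator, copy_to_new_buffer() returns NULL instead of writing through a null pointer. Whether the binary module can sensibly recover from such a failure is a question only its supplier can answer.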

Further, a question regarding the nature of that unknown_function(), accompanied by the analysis of the crash, can be sent to the supplier of binary_blob_module. By submitting a more detailed report and asking concrete questions for hard-to-reproduce problems, we can somewhat decrease the chances of getting a (sometimes) default reply such as "please try reproducing it on our reference software and/or hardware".

Suspected memory corruption

Finally, let's consider a case where memory corruption is suspected.

[code]  CPU 0 Unable to handle kernel paging request at virtual address 0004349c, epc == 0004349c, ra == 80012224
  Oops[#1]:
  Cpu 0
  $ 0   : 00000000 7f99bcc0 00000069 2abc7ea0
  $ 4   : 00000000 7f99bd20 7f99be60 00000001
  $ 8   : 00000000 80000008 0004349c fffffff4
  $12   : 7f99bd08 00000001 00000000 00000000
  $16   : 2ab01000 2ab01000 7f99bdc8 00000000
  $20   : 2aafc6a8 00410000 7f99bde0 7f99be60
  $24   : 00000000 2abc7e80
  $28   : 85248000 85249f30 0040484c 80012224
  Hi    : 0000ba1a
  Lo    : ff98c506
  epc   : 0004349c 0x4349c
      Tainted: PF       W
  ra    : 80012224 stack_done+0x20/0x3c
  Status: 1100ff03    KERNEL EXL IE
  Cause : 10800008
  BadVA : 0004349c
  PrId  : 00019554 (MIPS 34Kc)
  [...]
  Process screen_plugin (pid: 799, threadinfo=85248000, task=87140038, tls=00000000)
  Stack : 2ab85040 00000000 00000001 00000000 00000000 00000000 00000000 7f99bcc0
          [...]
          00000001 00000000 2ac2d530 7f99bce8 7f99bd18 2aae83d4 0100ff13 00028675
  Call Trace:
  Code: (Bad address in epc)

Note that epc == BadVA. The CPU has tried to fetch an instruction at 0x0004349c, but this is not a valid kernel-space address.
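This first triage step can be stated as a one-line rule. The C sketch below is a heuristic only, with invented names: it treats epc == BadVA as a failed instruction fetch and anything else as a failed data access by the instruction at epc.

```c
#include <stdint.h>

enum fault_kind { FAULT_INSTRUCTION_FETCH, FAULT_DATA_ACCESS };

/* epc == BadVA suggests that the faulting access was the instruction fetch itself. */
enum fault_kind classify_fault(uint32_t epc, uint32_t badva)
{
    return epc == badva ? FAULT_INSTRUCTION_FETCH : FAULT_DATA_ACCESS;
}
```

The two earlier crashes (epc 0x8023afd0 and 0xc1c52470, both with BadVA 0) fall into the data access category, while this one is an instruction fetch from a bogus address, so the question becomes who jumped there.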

Let's examine the code at ra:

[code]  ra    : 80012224 stack_done+0x20/0x3c

  80012140 <handle_sys>:
  [...]
  800121e0:       000240c0        sll     t0,v0,0x3
  800121e4:       3c098001        lui     t1,0x8001
  800121e8:       25292460        addiu   t1,t1,9312
  800121ec:       01284821        addu    t1,t1,t0
  800121f0:       8d2a0000        lw      t2,0(t1)
  800121f4:       1140005e        beqz    t2,80012370 <illegal_syscall>
  800121f8:       8d2b0004        lw      t3,4(t1)
  800121fc:       05610040        bgez    t3,80012300 <stackargs>
  80012200:       afa70080        sw      a3,128(sp)
  80012204 <stack_done>:
  80012204:       8f880008        lw      t0,8(gp)
  80012208:       3c098000        lui     t1,0x8000
  8001220c:       35290008        ori     t1,t1,0x8
  80012210:       01094024        and     t0,t0,t1
  80012214:       15000016        bnez    t0,80012270 <syscall_trace_entry>

  80012218:       00000000        nop
  8001221c:       0140f809        jalr    t2
  80012220:       00000000        nop
  80012224:       2408fb92        li      t0,-1134             <=== 'ra' points here

ra is a valid return address for the function call at 0x8001221c. The address of that function is taken from t2 ($10), which indeed contains 0x0004349c:

[code]  $ 8   : 00000000 80000008 0004349c fffffff4

So what we have is a call through a function pointer that contains a bogus (corrupted?) value.

Let's try to figure out where t2 is coming from:

[code]  800121e0:       000240c0        sll     t0,v0,0x3
  800121e4:       3c098001        lui     t1,0x8001
  800121e8:       25292460        addiu   t1,t1,9312
  800121ec:       01284821        addu    t1,t1,t0
  800121f0:       8d2a0000        lw      t2,0(t1)
  800121f4:       1140005e        beqz    t2,80012370 <illegal_syscall>

The instruction at 0x800121f0 corresponds to 't2 = *t1' and is followed by an instruction that compares t2 to 0. In case of t2 being 0, control is given to illegal_syscall.

Well, some knowledge of the kernel internals would be helpful here. In any case, the appearance of names such as illegal_syscall, handle_sys, and syscall_trace_entry suggests that the code in question has something to do with the handling of system calls. The relevant code can indeed be found in arch/mips/kernel/scall32-o32.S.

Can we guess what syscall it was?

[code]  800121e4:       3c098001        lui     t1,0x8001
  800121e8:       25292460        addiu   t1,t1,9312

t1 = 0x80012460;

Here 0x8001 is loaded into the upper half of t1 and 9312 (0x2460) is then added to it. nm shows that this value corresponds to sys_call_table, which is an array that contains the addresses of all the system calls.

[code]  800121ec:       01284821        addu    t1,t1,t0

t1 = t1 + t0;

t0 is the offset in the table, which is calculated as follows:

[code]  800121e0:       000240c0        sll     t0,v0,0x3

t0 = v0 * 8;

and the value of v0 ($2) is 0x69 (105 decimal):

[code]  $ 0   : 00000000 7f99bcc0 00000069 2abc7ea0

The analysis of the source code of handle_sys (it's written in assembler) reveals that v0 represents a syscall number.
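Putting the three instructions together, the address from which t2 is loaded can be computed as in the C sketch below. The 8-byte slot size follows from the shift by 3 in the disassembly; the table base 0x80012460 is what the lui/addiu pair produces (0x80010000 plus 9312). The function and macro names are made up for this example.

```c
#include <stdint.h>

/* Each sys_call_table slot is 8 bytes here, as implied by "sll t0,v0,0x3". */
#define SYSCALL_ENTRY_SIZE 8u

/* Address of the table slot that handle_sys reads for a given index in v0. */
uint32_t syscall_slot_address(uint32_t table_base, uint32_t index)
{
    return table_base + index * SYSCALL_ENTRY_SIZE;
}
```

For v0 = 0x69 the slot is at 0x80012460 + 0x348 = 0x800127a8. This is the memory word whose content ended up in t2, so it is the first place to suspect when t2 holds a bogus value.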

The syscall numbers are defined in arch/mips/include/asm/unistd.h. The one we are interested in corresponds to sys_getitimer():

[code]  #define __NR_getitimer                  (__NR_Linux + 105)

sys_call_table is not modified at run time. Also, it was not the first and only call to sys_getitimer(): the system had been functioning properly for days prior to this crash. So where do we go from here?

Memory corruption often results in seemingly unrelated crashes, both in kernel and user-space. What they have in common though is that all these crashes may look "weird". That is, a careful analysis does not reveal any obvious problems with the code and, moreover, suggests possible external influence, be it stack/memory corruption or hardware issues. Having multiple crashes in different parts of the core kernel code is usually a good indicator too.

Of course, it's always possible to overlook something. So the larger a set of crashes from which a conclusion is drawn, the better.

The strategy then is to look for common patterns.

In this particular case, there was another crash in the same location (among a dozen crashes in yet other areas) where the syscall number and epc were 0x68 and 0x000430d8 respectively.

[code]  syscall 0x69 (sys_getitimer) and epc: 0x0004349c
  syscall 0x68 (sys_setitimer) and epc: 0x000430d8

These 2 slots are neighbors in sys_call_table. Maybe it's just a coincidence, but it is worth taking into account. Now, what about the content of epc? Do we actually know what the correct values would look like? We can look them up with nm or from the disassembly:

[code]  8004349c <sys_getitimer>:
  800430d8 <sys_setitimer>:

Is there another pattern? Yes, the only difference between good and bad values is in the high bit 0x80000000. ...
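Spotting such a pattern by eye works for two values, but with a dozen crashes it helps to compute the difference. A minimal C sketch with invented function names: XOR exposes the differing bits, and the classic x & (x - 1) test tells whether exactly one bit differs.

```c
#include <stdint.h>

/* Bits that differ between the expected and the observed value. */
uint32_t differing_bits(uint32_t expected, uint32_t observed)
{
    return expected ^ observed;
}

/* Returns 1 if exactly one bit differs, 0 otherwise. */
int is_single_bit_flip(uint32_t expected, uint32_t observed)
{
    uint32_t diff = expected ^ observed;

    return diff != 0 && (diff & (diff - 1)) == 0;
}
```

Both pairs from the two crashes (0x8004349c vs 0x0004349c and 0x800430d8 vs 0x000430d8) give a difference of 0x80000000 and pass the single-bit test, which is what one would expect from a hardware-level fault rather than from a stray software write.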

p.s. This missing-high-bit theory had explanatory power when applied to some of the other "weird" crashes for which it was possible to infer the good value. How could this bit be cleared? In the end, it was found that the DDR timing settings were not properly set in the bootloader. However, at the time of this writing, it's not yet clear whether the problem has been completely resolved.

Perhaps, we can dedicate another post specifically to the analysis of "weird" crashes.

Many thanks to Yuri Leikind, Bero Brekalo, and Alina Krynina for review and useful suggestions.

Extra Details

[1] objdump

'objdump -d' alone may be sufficient in many cases (not to mention all the fun of matching disassembly and source code on your own). Alternatively, you can rebuild the original binary (if possible) with debugging information enabled and then use '-dS'. Be careful though to double-check that the addresses you are interested in correspond to the same instructions in both the original and the new disassembly files. If that's not the case, code shifts/changes should be taken into account.

[2] Calling conventions

Be sure to verify the options used by your toolchain, if in doubt. For gcc, the '-mabi=type' options are used. For example, '-mabi=32' corresponds to o32.

[3] Virtual Memory Layout on MIPS

Please refer to MIPS Address Space for a general review.

Regarding the use in Linux:

1) kuseg range [0x00000000, 0x80000000) is user-space addresses.

A private address space of user-space processes resides in this range. From kernel-space this area can be safely accessed only by means of special-purpose functions, like copy_to_user() and copy_from_user(). Direct accesses are always a bug, even though, given the nature of MIPS's MMU, such accesses may appear to be working properly under certain circumstances.

2) kseg0 range [0x80000000, 0xa0000000) is kernel-space addresses used by the kernel code and data (vmlinux).

Dynamic allocations via general purpose allocators, such as kmalloc() and __get_free_pages() (but not from the ZONE_HIGHMEM zone) return addresses in this range. [ to-be-continued ]

3) kseg2 range [0xc0000000, 0xffffffff] is kernel-space addresses used by the code and data of kernel modules.

vmalloc() and vmap() allocations return addresses in this range.

For kuseg and kseg2, the translation of virtual addresses into physical ones is done via the MMU. Conversely, kseg0 addresses don't require MMU translations; the translation is done simply by stripping off the top bit. For example, 0x80100000 corresponds to 0x00100000 (1 MB) in RAM.

kseg0 ranges are both virtually and physically contiguous, while kuseg and kseg2 are only virtually contiguous.
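The layout above translates directly into a small classifier, which is handy when skimming register dumps for suspicious values. The C sketch below uses invented names and also includes kseg1 ([0xa0000000, 0xc0000000), the unmapped and uncached window of 32-bit MIPS), which is not discussed in this post.

```c
#include <stdint.h>

enum mips_segment { KUSEG, KSEG0, KSEG1, KSEG2 };

/* Maps a 32-bit virtual address to the segment it belongs to. */
enum mips_segment classify_address(uint32_t vaddr)
{
    if (vaddr < 0x80000000u)
        return KUSEG;
    if (vaddr < 0xa0000000u)
        return KSEG0;
    if (vaddr < 0xc0000000u)
        return KSEG1;   /* unmapped, uncached; not covered in the text above */
    return KSEG2;
}

/* Physical address behind a kseg0 address; only meaningful for KSEG0 input. */
uint32_t kseg0_to_physical(uint32_t vaddr)
{
    return vaddr & 0x7fffffffu;   /* strip the top bit */
}
```

Applied to the examples in this post, 0x8023afd0 (strlen) is in kseg0, 0xc01d6cc8 (controller_lkm) is in kseg2, and the bogus epc 0x0004349c lands in kuseg, which is why it could not have been a valid kernel function address.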
