Analysis of Linux kernel crashes
2015-12-06 23:21
From: http://stablebits.blogspot.hk/
Introduction
Tools
Format of a crash report
Analysis
Simple case to learn the basics
Crash in a binary kernel module
Suspected memory corruption
Extra Details
Introduction
The aim of this post is to illustrate the analysis of Linux kernel crashes by studying a few real-life examples. The examples come from a MIPS platform, but the general approach is applicable to other architectures.
It's assumed that readers know C programming and basic operating system concepts, such as virtual memory.
We begin by analyzing a simple crash to illustrate the basics. Next, we reconstruct what happened in a crash inside a kernel module for which no source code is available. Finally, we consider a crash caused by "memory corruption".
The information provided here is by no means comprehensive. We take a minimalist approach and don't consider tools such as Kdump and crash.
Tools
Any general-purpose disassembler is sufficient. We'll use objdump with the '-d' option here.
If a binary was built with debugging information, objdump -S can display source code intermixed with disassembly [1]. Also, addr2line can be used to match addresses with source code file names and line numbers.
In order to interpret disassembly, we need to have the MIPS Instruction Reference and the Compiler Register Usage information at hand [2], so please keep these pages open while reading further.
If you are not familiar with how the virtual memory space is divided on MIPS, please refer to 'Virtual Memory Layout' [3] in the last section.
Format of a crash report
Here is an example of a crash report:
[code]
CPU 0 Unable to handle kernel paging request at virtual address 00000000, epc == 8023afd0, ra == 8023b024
Oops[#1]:
Cpu 0
$ 0 : 00000000 1000fc00 8555fc54 00000000
$ 4 : 00000000 00000000 0000000b 00000001
$ 8 : 00000008 800445f4 00000000 00000000
$12 : 0000004f 0000004e 00000041 00000001
$16 : 00000000 8555fc54 0000000b 8555fc54
$20 : c01eded0 8555fbd0 80240000 7f7ff0c4
$24 : 00000002 c01d6edc
$28 : 8555e000 8555fab0 7f7ff0a0 8023b024
Hi : 00000000
Lo : 3b9aca00
epc : 8023afd0 strlen+0x0/0x28 Tainted: PF
ra : 8023b024 strlcpy+0x2c/0x7c
Status: 1000fc04 IEp
Cause : 00000008
BadVA : 00000000
PrId : 0000c401 (Fusiv MIPS1)
Modules linked in: xt_CLASSIFY [ skipped proprietary (aka evil) modules ] ip6_tunnel tunnel6
Process controllerd (pid: 751, threadinfo=8555e000, task=8783add8, tls=00000000)
Stack : 1000fc01 7f7ff0d0 8008f4a4 7f7ff268 8555fc5f 8555fc54 8023aff8 80044500
        c01d6cc8 c01d6ab8 000000a4 7f7ff0c4 00000000 80050000 00000000 c026ca78
        c026ca78 8555fb18 8555fbd0 7f7ff0d0 7f7ff174 7f7ff0d0 7f7ff0c0 86458400
        000000a4 c01d4440 80631224 8026d168 87008838 8026ca84 00000000 00000001
        00000006 00000001 80631224 806312c0 1000fc01 fffffffe 805e5778 805e0000
        ...
Call Trace:
[<8023afd0>] strlen+0x0/0x28
[<8023b024>] strlcpy+0x2c/0x7c
[<c01d6cc8>] controller_get_info+0x2c4/0x37c [controller_lkm]
[<c01d4440>] controller_init+0x3e0/0xa64 [controller_lkm]
Code: 00000000 03e00008 01031023 <80820000> 0808ebfa 00801821 24630001 80620000 00000000
[/code]
Let's review the parts one by one.
[code]
CPU 0 Unable to handle kernel paging request at virtual address 00000000, epc == 8023afd0, ra == 8023b024
Oops[#1]:
[/code]
The header indicates the particular reason for this crash. On CPU #0, a load or store instruction at address epc accessed virtual address 0x00000000. There was no valid virtual-to-physical address translation available, hence the crash. In the middle of the report the same virtual address is shown as:
[code]
BadVA : 00000000
[/code]
BadVA is a register of the MIPS Coprocessor 0 that holds the memory address at which an address exception occurred.
"Unable to handle kernel paging request" is by far one of the most common reasons for crashes. You may also encounter:
[code]
Kernel bug detected
[/code]
It is triggered by one of the sanity checks in the kernel code, such as BUG() or BUG_ON(condition). This mechanism does what assert() does for user-space applications.
Other reasons can be found by running 'grep -rn die_if_kernel arch/mips/' in a Linux kernel tree.
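When several reports have to be compared, it can help to pull the key values out of the header mechanically. The Python sketch below is only an illustration: the regular expression is written against the header line shown above, so other crash reasons or kernel versions may need a different pattern, and parse_header is a hypothetical helper name.

```python
import re

# Pattern for the "paging request" header line quoted above (an assumption:
# other kernels or other crash reasons may format this line differently).
HEADER_RE = re.compile(
    r"CPU (?P<cpu>\d+) Unable to handle kernel paging request "
    r"at virtual address (?P<badva>[0-9a-f]+), "
    r"epc == (?P<epc>[0-9a-f]+), ra == (?P<ra>[0-9a-f]+)"
)


def parse_header(line):
    """Return cpu, badva, epc and ra as integers, or None if no match."""
    m = HEADER_RE.search(line)
    if m is None:
        return None
    return {
        "cpu": int(m.group("cpu")),
        "badva": int(m.group("badva"), 16),
        "epc": int(m.group("epc"), 16),
        "ra": int(m.group("ra"), 16),
    }


header = ("CPU 0 Unable to handle kernel paging request at virtual address "
          "00000000, epc == 8023afd0, ra == 8023b024")
info = parse_header(header)
print({name: hex(value) for name, value in info.items()})
```

A line such as "Kernel bug detected" does not match the pattern, so parse_header returns None for it.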
Further, the content of the registers is displayed.
[code]
$ 0 : 00000000 1000fc00 8555fc54 00000000
$ 4 : 00000000 00000000 0000000b 00000001
[ ... ]
$24 : 00000002 c01d6edc
$28 : 8555e000 8555fab0 7f7ff0a0 8023b024
Hi : 00000000
Lo : 3b9aca00
[ ... ]
Status: 1000fc04 IEp
Cause : 00000008
[/code]
Registers $0-31 are the general-purpose MIPS registers. To simplify reading, each of them has a mnemonic name in assembler code. Now it's time to take a quick look at Compiler Register Usage. For example, a0-3 correspond to $4-7, which are used in the o32 calling convention to pass the first 4 arguments to a function. o32 is the most commonly used calling convention on 32-bit MIPS [2], and our examples here relate to it.
An ideal case is to have a complete dump of the memory used by the kernel. That would allow us to restore the environment: to see the content of local variables, various kernel data structures, etc. Kdump can do this (no MIPS support at the moment). Nevertheless, in many cases the content of the registers alone reveals enough information to understand a problem.
The content of the Status and Cause registers may be very useful in some cases, but we won't consider them here.
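To avoid counting register numbers by hand, the dump can be translated into mnemonic names. The following Python sketch assumes the o32 names and the "$NN : word word ..." line layout of the report above; parse_registers is a hypothetical helper, not part of any existing tool.

```python
# o32 mnemonic names for the general-purpose registers $0..$31.
# $30 is shown as s8 by objdump and is also known as fp.
O32_NAMES = [
    "zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
    "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7",
    "s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7",
    "t8", "t9", "k0", "k1", "gp", "sp", "s8", "ra",
]


def parse_registers(lines):
    """Map mnemonic register names to the values printed in the dump."""
    regs = {}
    for line in lines:
        label, _, values = line.partition(":")
        label = label.replace("$", "").strip()
        if not label.isdigit():
            continue  # skip Hi, Lo, Status and other non-numbered lines
        base = int(label)
        for i, word in enumerate(values.split()):
            regs[O32_NAMES[base + i]] = int(word, 16)
    return regs


dump = [
    "$ 0 : 00000000 1000fc00 8555fc54 00000000",
    "$ 4 : 00000000 00000000 0000000b 00000001",
    "$ 8 : 00000008 800445f4 00000000 00000000",
    "$12 : 0000004f 0000004e 00000041 00000001",
    "$16 : 00000000 8555fc54 0000000b 8555fc54",
    "$20 : c01eded0 8555fbd0 80240000 7f7ff0c4",
    "$24 : 00000002 c01d6edc",
    "$28 : 8555e000 8555fab0 7f7ff0a0 8023b024",
    "Hi : 00000000",
]
regs = parse_registers(dump)
for name in ("a0", "a1", "a2", "a3", "sp", "ra"):
    print(name, format(regs[name], "08x"))
```

Note that the $24 line carries only two values, so k0 and k1 ($26 and $27) are simply absent from the result.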
[code]
epc : 8023afd0 strlen+0x0/0x28
ra : 8023b024 strlcpy+0x2c/0x7c
[/code]
epc shows the address of the instruction that caused the crash. ra ($31) contains the return address of the last function call made prior to the crash. In practice, ra usually points either to the caller of the function that epc belongs to or to the same function as epc. An invalid address in ra can indicate stack corruption (at least for non-leaf functions).
The names of the functions that epc and ra belong to are displayed if the kernel was built with CONFIG_KALLSYMS enabled. In any case, these names can be located with objdump.
The +0x0/0x28 notation stands for +offset/size, where offset is the offset of the instruction within the function it belongs to, and size is the size of this function.
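The arithmetic behind this notation is simple enough to script. The sketch below (function_bounds is a hypothetical helper) derives the start and end address of a function from an address and its name+offset/size annotation; it expects only the symbol token, without a trailing module name in brackets.

```python
import re

SYM_RE = re.compile(
    r"(?P<name>\w+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def function_bounds(address, symbol):
    """Return (name, start, end) for an address annotated as name+off/size."""
    m = SYM_RE.fullmatch(symbol)
    if m is None:
        raise ValueError("not in name+0xoffset/0xsize form: %r" % symbol)
    start = address - int(m.group("off"), 16)
    end = start + int(m.group("size"), 16)  # first address past the function
    return m.group("name"), start, end


for addr, sym in ((0x8023afd0, "strlen+0x0/0x28"),
                  (0x8023b024, "strlcpy+0x2c/0x7c")):
    name, start, end = function_bounds(addr, sym)
    print(name, hex(start), hex(end))
```

For the report above this places strlen at 0x8023afd0-0x8023aff8 and strlcpy right after it, starting at 0x8023aff8.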
In most cases, ra points to the 2nd instruction that follows the instruction representing a function call (usually jal or jalr). For example,
[code]
801f9bb8 <pci_bus_read_config_byte>:
[...]
801f9c14: 02202821 move a1,s1
801f9c18: 0040f809 jalr v0
801f9c1c: 24070001 li a3,1
801f9c20: 00408021 move s0,v0
801f9c24: 8fa20018 lw v0,24(sp)
[/code]
The function call is at 0x801f9c18. Control gets back to pci_bus_read_config_byte at 0x801f9c20. jalr saves this address into ra before jumping to the address held in v0, the start of the called function.
The instruction at 0x801f9c1c is located in the delay slot of jalr and is executed before any instruction in the called function. Delay slots of jalr are often used to initialize one of the function arguments. If the called function above has at least 4 arguments, its 4th argument will be 1.
Branch instructions, like bltz (branch on less than zero), are another example of instructions with delay slots.
We mentioned earlier that both epc and ra may point to the same function. To illustrate this case, let's suppose that a crash occurs at 0x801f9c24 in the above disassembly. Provided that the function call at 0x801f9c18 took place, ra would point to 0x801f9c20 inside the same function as epc.
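The "call instruction sits two instructions before ra" property can be checked mechanically. The following Python sketch is a simplified illustration: is_call and plausible_ra are hypothetical helpers, only jal and jalr are recognized (branch-and-link variants are ignored), and the listing dictionary is filled by hand from the disassembly above.

```python
def is_call(word):
    """True if a 32-bit MIPS instruction word is jal or jalr."""
    opcode = word >> 26
    if opcode == 0x03:                            # jal target
        return True
    if opcode == 0x00 and (word & 0x3f) == 0x09:  # SPECIAL opcode, jalr
        return True
    return False


def plausible_ra(ra, text):
    """Check that the instruction 8 bytes before ra is a call.

    text maps instruction addresses to instruction words.
    """
    word = text.get(ra - 8)
    return word is not None and is_call(word)


listing = {
    0x801f9c14: 0x02202821,  # move a1,s1
    0x801f9c18: 0x0040f809,  # jalr v0
    0x801f9c1c: 0x24070001,  # li a3,1 (delay slot)
    0x801f9c20: 0x00408021,  # move s0,v0
    0x801f9c24: 0x8fa20018,  # lw v0,24(sp)
}
print(plausible_ra(0x801f9c20, listing))
print(plausible_ra(0x801f9c24, listing))
```

Only 0x801f9c20 passes the check here, since the word 8 bytes before 0x801f9c24 is the li in the delay slot rather than a call.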
[code]
Modules linked in: xt_CLASSIFY [ ... ] ip6_tunnel tunnel6
[/code]
This is a list of loaded kernel modules.
[code]
Process controllerd (pid: 751, threadinfo=8555e000, task=8783add8, tls=00000000)
[/code]
This is information about the process that was running at the moment of the crash.
In an ideal world where kernels and, especially, kernel modules behave well, user-space actions can never trigger a kernel crash, no matter what these actions are. Kernel-mode tasks have full privileges to cause havoc though.
A user-space task appears here in one of the following cases:
- it has triggered a kernel action (usually via a system call like ioctl()) that, due to a kernel bug or memory corruption, results in a crash;
- a crash has occurred in the interrupt context. Unless your kernel supports threaded interrupt handlers (i.e. interrupts are handled by dedicated threads) or separate interrupt stacks, the kernel-space stack of the current task is used by the interrupt handling code. The displayed task is usually unrelated in this case.
[code]
Stack : 1000fc01 7f7ff0d0 8008f4a4 7f7ff268 8555fc5f 8555fc54 8023aff8 80044500
        [ ... ]
        00000006 00000001 80631224 806312c0 1000fc01 fffffffe 805e5778 805e0000
        ...
[/code]
This is a partial dump of the kernel-space stack of the current task, which was mentioned above.
[code]
Call Trace:
[<8023afd0>] strlen+0x0/0x28
[<8023b024>] strlcpy+0x2c/0x7c
[<c01d6cc8>] controller_get_info+0x2c4/0x37c [controller_lkm]
[<c01d4440>] controller_init+0x3e0/0xa64 [controller_lkm]
[ ... ]
[/code]
As the name suggests, this is a call trace. It does not always represent a genuine call trace though. When epc points to an invalid address on MIPS or raw_show_trace is enabled in the kernel command line, the so-called raw call trace is displayed. It contains all the values from the stack that look like valid return addresses, so there can be 'ghost' traces of previously run and completely unrelated functions. For curious readers, the implementation of both methods is in show_backtrace() and show_raw_backtrace() in arch/mips/kernel/traps.c.
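To see why a raw trace may contain 'ghost' entries, here is a rough Python model of the idea, not the kernel's actual code: every stack word that falls into a known text range is reported. The text ranges below are made up purely for illustration (the real boundaries of the kernel and module text are not given in the report), and raw_backtrace is a hypothetical name.

```python
def raw_backtrace(stack_words, text_ranges):
    """Return every stack word that lies inside one of the text ranges."""
    return [
        word for word in stack_words
        if any(start <= word < end for start, end in text_ranges)
    ]


# First two rows of the Stack dump from the report above.
stack = [
    0x1000fc01, 0x7f7ff0d0, 0x8008f4a4, 0x7f7ff268,
    0x8555fc5f, 0x8555fc54, 0x8023aff8, 0x80044500,
    0xc01d6cc8, 0xc01d6ab8, 0x000000a4, 0x7f7ff0c4,
    0x00000000, 0x80050000, 0x00000000, 0xc026ca78,
]

# Hypothetical text ranges, assumed for this example only.
text_ranges = [
    (0x80000000, 0x80400000),  # assumed kernel text
    (0xc01d4000, 0xc01d8000),  # assumed text of one module
]

hits = raw_backtrace(stack, text_ranges)
print([hex(word) for word in hits])
```

With these assumed ranges, six words qualify, although only 0xc01d6cc8 appears in the call trace of the report. The rest would have to be checked by hand before being trusted.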
[code]
Code: 00000000 03e00008 01031023 <80820000> 0808ebfa 00801821 24630001 80620000 00000000
[/code]
Finally, this last section displays a sequence of instructions (binary representation) at and around epc, with the instruction at epc being indicated by <> symbols.
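Since the marked word sits at epc, every other word in this line can be given an address as well, which makes the comparison with a disassembly listing easier. A minimal Python sketch, assuming 32-bit instruction words (no MIPS16 or microMIPS code); parse_code_line is a hypothetical helper.

```python
def parse_code_line(line, epc):
    """Return a list of (address, word) pairs for a 'Code:' line."""
    tokens = line.split()[1:]  # drop the leading "Code:"
    marked = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    pairs = []
    for i, tok in enumerate(tokens):
        address = epc + 4 * (i - marked)  # 4 bytes per instruction
        pairs.append((address, int(tok.strip("<>"), 16)))
    return pairs


code = ("Code: 00000000 03e00008 01031023 <80820000> "
        "0808ebfa 00801821 24630001 80620000 00000000")
words = parse_code_line(code, 0x8023afd0)
for address, word in words:
    print(format(address, "08x"), format(word, "08x"))
```

The three words before the marked one therefore belong to addresses 0x8023afc4 to 0x8023afcc, i.e. to whatever precedes the function that contains epc.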
Analysis
Simple crash to learn the basics
Let's now analyze the crash that was used as an example in the previous section.
[code]
CPU 0 Unable to handle kernel paging request at virtual address 00000000, epc == 8023afd0, ra == 8023b024
[...]
epc : 8023afd0 strlen+0x0/0x28
ra : 8023b024 strlcpy+0x2c/0x7c
[/code]
The kernel was built with CONFIG_KALLSYMS enabled, and that allows us to see the names of the functions that the instructions at epc and ra belong to. If this were not the case, we could note that both epc and ra belong to kseg0 [3], so we may expect to find them inside the kernel image (vmlinux).
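Classifying an address by segment is a matter of comparing it against fixed boundaries. The Python sketch below uses the standard 32-bit MIPS segment boundaries; the segment helper and the combined "kseg2/3" label are choices made for this example.

```python
# Standard 32-bit MIPS virtual address segments (inclusive bounds).
SEGMENTS = [
    (0x00000000, 0x7fffffff, "kuseg"),    # user space, translated via the MMU
    (0x80000000, 0x9fffffff, "kseg0"),    # unmapped, cached
    (0xa0000000, 0xbfffffff, "kseg1"),    # unmapped, uncached
    (0xc0000000, 0xffffffff, "kseg2/3"),  # kernel space, translated via the MMU
]


def segment(address):
    """Return the name of the segment a 32-bit virtual address belongs to."""
    for low, high, name in SEGMENTS:
        if low <= address <= high:
            return name
    raise ValueError("not a 32-bit address: %#x" % address)


for address in (0x8023afd0, 0x8023b024, 0xc01d6cc8, 0x7f7ff0c4):
    print(format(address, "08x"), segment(address))
```

Here epc and ra land in kseg0, while 0xc01d6cc8 from the call trace lands in the mapped kernel range, which is consistent with it being annotated with a module name.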
A quick sanity check for ra (recall the property of ra that we mentioned above):
[code]
8023aff8 <strlcpy>:
[...]
8023b018: afbf0020 sw ra,32(sp)
8023b01c: 0c08ebf4 jal 8023afd0 <strlen>
8023b020: 00a08021 move s0,a1
8023b024: 00408821 move s1,v0 <=== 'ra' points to this location
[...]
[/code]
ra is indeed one instruction away (the delay slot) from jal.
It's also clear that the function being called is strlen(). In cases when the address of a called function is not known at build time (kernel modules), disassembly may look as follows:
[code]
b6d38: 3c020000 lui v0,0x0
b6d3c: 24420000 addiu v0,v0,0
b6d40: 0040f809 jalr v0
[/code]
The first 2 instructions are changed by the loader at run time. In such cases, don't get confused when the disassembly and the Code: sequence of a crash dump display different instructions at the same address. Usually, there is a remote resemblance though. For instance, the instructions above might have been changed as follows:
[code]
b6d38: 3c02804a lui v0,0x804a
b6d3c: 2442346c addiu v0,v0,13420
[/code]
This basically corresponds to v0 = 0x804a346c.
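The value is obtained by placing the lui immediate into the upper 16 bits and adding the sign-extended addiu immediate. A small Python sketch of this computation follows; lui_addiu_target is a hypothetical helper, and it does not check that both instructions operate on the same register.

```python
def sign_extend16(value):
    """Interpret a 16-bit field as a signed integer."""
    return value - 0x10000 if value & 0x8000 else value


def lui_addiu_target(lui_word, addiu_word):
    """Compute the 32-bit value built by a lui/addiu instruction pair."""
    if lui_word >> 26 != 0x0f:
        raise ValueError("first word is not lui")
    if addiu_word >> 26 != 0x09:
        raise ValueError("second word is not addiu")
    upper = (lui_word & 0xffff) << 16
    lower = sign_extend16(addiu_word & 0xffff)
    return (upper + lower) & 0xffffffff


print(hex(lui_addiu_target(0x3c02804a, 0x2442346c)))
```

Because addiu sign-extends its immediate, a lower half of 0x8000 or above is subtracted rather than added, so the lui immediate is then one higher than the upper half of the final address.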
For 'epc',
[code]
8023afd0 <strlen>:
8023afd0: 80820000 lb v0,0(a0) <=== 'epc' is here
8023afd4: 0808ebfa j 8023afe8 <strlen+0x18>
8023afd8: 00801821 move v1,a0
8023afdc: 24630001 addiu v1,v1,1
[...]
[/code]
we may make the following observations:
- it's indeed the first instruction (offset 0x0) of strlen();
- a0 ($4) is indeed 0. The instruction at epc loads a byte from MEM[a0 + 0] and BadVA is reported to be 0. Hence, a0 should have been 0 too.
[code]
$ 4 : 00000000 [...]
[/code]
- the instructions from the disassembly match the ones shown in the Code: sequence.
[code]
Code: [...] <80820000> 0808ebfa 00801821 24630001 80620000 00000000
[/code]
These checks can also be applied as sanity checks to ensure you have got the right image for disassembly.
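The consistency between the faulting instruction, the register dump, and BadVA can also be verified in a few lines. The Python sketch below decodes only a handful of common load/store opcodes; decode_mem_access and effective_address are hypothetical helpers, and registers are passed as a dictionary keyed by register number.

```python
# A few MIPS load/store opcodes (bits 31..26 of the instruction word).
LOAD_STORE = {
    0x20: "lb", 0x21: "lh", 0x23: "lw", 0x24: "lbu", 0x25: "lhu",
    0x28: "sb", 0x29: "sh", 0x2a: "swl", 0x2b: "sw",
}


def decode_mem_access(word):
    """Return (mnemonic, rt, base, offset); mnemonic is None if unknown."""
    opcode = word >> 26
    base = (word >> 21) & 0x1f
    rt = (word >> 16) & 0x1f
    offset = word & 0xffff
    if offset & 0x8000:
        offset -= 0x10000  # the 16-bit offset is signed
    return LOAD_STORE.get(opcode), rt, base, offset


def effective_address(word, regs):
    """Compute base register + offset for a load/store instruction."""
    _, _, base, offset = decode_mem_access(word)
    return (regs[base] + offset) & 0xffffffff


regs = {4: 0x00000000}  # a0 ($4) from the register dump
print(decode_mem_access(0x80820000))
print(hex(effective_address(0x80820000, regs)))
```

For the word at epc this yields lb with rt = 2 (v0), base = 4 (a0) and offset 0, so the accessed address equals a0, which agrees with BadVA being 0.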
Now, a0 is supposed to hold the 1st (and only) argument of strlen() [Compiler Register Usage]. Given that epc points to the 1st instruction, a0 has not yet been reused for anything else inside strlen(). This can be verified by analyzing the instruction flow. The current working hypothesis is that strlen(s) has been called with s == NULL.
Let's see if we can figure out where this s == NULL is coming from by examining the caller of strlen(), which is strlcpy().
But before doing so, we should consider a few more aspects common to all functions.
At the beginning of most functions, there is a sequence of instructions called the "prologue". For example,
[code]
800a28d0 <vfs_write>:
800a28d0: 27bdffd0 addiu sp,sp,-48
800a28d4: afb30024 sw s3,36(sp)
800a28d8: afb20020 sw s2,32(sp)
800a28dc: afb1001c sw s1,28(sp)
800a28e0: afb00018 sw s0,24(sp)
800a28e4: afbf0028 sw ra,40(sp)
[...]
[/code]
The first instruction creates a stack frame by reserving space on the stack. As per the o32 calling convention, it's the job of a called function to preserve non-temporary registers (like the $s-registers) if they are to be reused. This is what those sw $reg, off(sp) instructions are concerned with: saving the to-be-reused registers to the stack. The same applies to ra for non-leaf functions (those calling other functions). Obviously, local variables also reside in this stack frame.
An "epilogue" sequence performs the opposite actions.
[code]
800a2908: 8fbf0028 lw ra,40(sp)
800a290c: 8fb30024 lw s3,36(sp)
800a2910: 8fb20020 lw s2,32(sp)
800a2914: 8fb1001c lw s1,28(sp)
800a2918: 8fb00018 lw s0,24(sp)
800a291c: 03e00008 jr ra
800a2920: 27bd0030 addiu sp,sp,48
[/code]
The content of the reused registers is restored. The stack frame is deleted, usually by the last instruction, which sits in the delay slot of jr. Finally, control is given back to the caller by jr ra.
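Knowing the frame size and the slot where ra is saved is what allows one to walk the stack dump by hand. As a rough illustration, the Python sketch below scans prologue instruction words for "addiu sp,sp,-N" and "sw ra,off(sp)"; frame_info is a hypothetical helper and it ignores less common prologue shapes.

```python
SP, RA = 29, 31  # register numbers of sp and ra


def frame_info(words):
    """Return (frame_size, ra_offset) found in prologue instruction words.

    Either value is None if the corresponding instruction is not present.
    """
    frame_size = None
    ra_offset = None
    for word in words:
        opcode = word >> 26
        rs = (word >> 21) & 0x1f
        rt = (word >> 16) & 0x1f
        imm = word & 0xffff
        if imm & 0x8000:
            imm -= 0x10000  # immediates are signed 16-bit values
        if opcode == 0x09 and rs == SP and rt == SP and imm < 0:
            frame_size = -imm          # addiu sp,sp,-N
        elif opcode == 0x2b and rs == SP and rt == RA:
            ra_offset = imm            # sw ra,off(sp)
    return frame_size, ra_offset


vfs_write_prologue = [
    0x27bdffd0, 0xafb30024, 0xafb20020,
    0xafb1001c, 0xafb00018, 0xafbf0028,
]
print(frame_info(vfs_write_prologue))
```

For the vfs_write prologue shown earlier this reports a 48-byte frame with ra saved at sp + 40, matching the lw ra,40(sp) and addiu sp,sp,48 of the epilogue.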
Stack corruption may overwrite the saved value of ra, resulting in control being given to unexpected places (unless this is a result of a deliberate security attack). This is likely to result in "Unable to handle kernel paging request", "Unaligned access", or "Invalid instruction". Quite often in such cases epc is equal to ra (or close to it) or to both ra and BadVA.
An "epilogue" is not necessarily placed at the very end of a function. Moreover, a function may have more than one "epilogue".
One more thing before we get back to the analysis. Function calls look as follows:
[code]
800a2aa0: 00c03821 move a3,a2
800a2aa4: 00002021 move a0,zero
800a2aa8: 02202821 move a1,s1
800a2aac: 0c028848 jal 800a2120 <rw_verify_area>
800a2ab0: 02603021 move a2,s3
[/code]
This sequence corresponds to rw_verify_area(a0, a1, a2, a3) with a0 being 0 (zero is register $0). The instructions initializing a0-a3 do not have to be placed immediately next to the jal instruction, but usually they are in some proximity.
The return value of a function, if any, is stored in v0 ($2).
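When only raw instruction words are available (for example, from the Code: line), the destination of a jal can be recovered from its encoding: the low 26 bits hold the target divided by 4, and the top 4 bits come from the address of the delay slot. A short Python sketch, with jal_target being a hypothetical helper name:

```python
def jal_target(pc, word):
    """Return the destination of a jal instruction located at address pc."""
    if word >> 26 != 0x03:
        raise ValueError("not a jal instruction")
    region = (pc + 4) & 0xf0000000       # top bits of the delay slot address
    return region | ((word & 0x03ffffff) << 2)


print(hex(jal_target(0x800a2aac, 0x0c028848)))
```

Applied to the call above, the result is the address that objdump labels as rw_verify_area.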
Let's get back to our analysis.
[code]
8023aff8 <strlcpy>:
8023aff8: 27bdffd8 addiu sp,sp,-40
8023affc: afb3001c sw s3,28(sp)
8023b000: 00809821 move s3,a0
8023b004: 00a02021 move a0,a1 <=== the argument for strlen()
8023b008: afb20018 sw s2,24(sp)
8023b00c: afb10014 sw s1,20(sp)
8023b010: afb00010 sw s0,16(sp)
8023b014: 00c09021 move s2,a2
8023b018: afbf0020 sw ra,32(sp)
8023b01c: 0c08ebf4 jal 8023afd0 <strlen> <=== the call is here
8023b020: 00a08021 move s0,a1
8023b024: 00408821 move s1,v0 <=== 'ra' points here
[...]
[/code]
strlen() accepts a single argument that is passed via a0. A few instructions above the actual call (see the remarks above), at 0x8023b004, we can see that a0 is being loaded with the content of a1. After examining the remaining instructions it becomes clear that a1 still holds its initial value, which corresponds to the 2nd argument of strlcpy(dst, src, len).
Now we can update our working hypothesis. It looks like strlcpy() has been called with its 2nd argument, src, being NULL.
Real-life shortcut: we may simply examine the source code of strlcpy() and notice that there is a single call to strlen(). This is indeed in accordance with our hypothesis.
[code]
size_t strlcpy(char *dest, const char *src, size_t size)
{
        size_t ret = strlen(src);
        [...]
[/code]
What's next? We can do the same analysis for controller_get_info(), which has supposedly called strlcpy().
[code]
Call Trace:
[<8023afd0>] strlen+0x0/0x28
[<8023b024>] strlcpy+0x2c/0x7c
[<c01d6cc8>] controller_get_info+0x2c4/0x37c [controller_lkm]
[...]
[/code]
Recall the remarks above regarding the validity of call traces. Basic sanity checks won't take much time. At the very least, check that 0xc01d6cc8 could have been a valid ra (one instruction away from jalr/jal). If that is the case, verify the code of (in this example) controller_get_info() to confirm that it does call strlcpy(). Having disassembly intermixed with source code is helpful here [1].
In any case, there is obviously a limit as to how far "into the past" we are able to look by analyzing a crash report, even if we had a complete memory dump. Nevertheless, the results of this analysis, if not sufficient to reveal a root cause, are usually very helpful in further debugging. As to this particular example, we would still need to analyze controller_get_info() to understand why strlcpy() might have been called with src == NULL.
Crash in a binary kernel module
This crash occurred in a kernel module for which no source code is available.
[code]
CPU 1 Unable to handle kernel paging request at virtual address 00000000, epc == c1c52470, ra == c1c63d64
[...]
$ 0 : 00000000 10008d00 c1c523f0 00000000
$ 4 : 00000000 c1f18f5c 0000008c ffff00fe
$ 8 : 80008fe1 15941794 8e038b00 fefe7dfd
$12 : faf9fdfe 7dfffe7f fb7eff7d 7b7e7e7c
$16 : 00000000 00000800 c1e2d178 c1e2cf98
$20 : 842ffe08 c1e2d1b4 00000050 c1c51fc0
$24 : 00000000 00000000
$28 : 842fc000 842ffdf0 00000000 c1c63d64
[...]
epc : c1c52470 fast_memcpy+0x80/0x1cc [binary_blob_module] Tainted: P
ra : c1c63d64 net_egress+0x80/0x2b78 [binary_blob_module]
[...]
[/code]
The load address of a kernel module is not known at build time, so we see relative addresses in the disassembly of binary_blob_module. We can use '+0x80/0x1cc' to locate the instruction at epc:
[code]
a53f0 <fast_memcpy>:
[...]
a5468: 98ab000f lwr t3,15(a1)
a546c: 98af001f lwr t7,31(a1)
a5470: a8880000 swl t0,0(a0) <== 'epc' points here at offset 0x80
[/code]
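The mapping between run-time addresses and addresses in the module's disassembly boils down to one constant difference. The Python sketch below derives it from epc and then applies it to ra; it assumes that both functions live in the same section of the module, so that a single delta is valid for both, and the variable names are chosen for this example only.

```python
def file_address(function_start, offset):
    """Address of an instruction in the disassembly of the module."""
    return function_start + offset


# Values taken from the crash report and the disassembly above.
epc = 0xc1c52470
ra = 0xc1c63d64
fast_memcpy_start = 0xa53f0   # from objdump
epc_offset = 0x80             # from fast_memcpy+0x80/0x1cc

epc_in_file = file_address(fast_memcpy_start, epc_offset)
delta = epc - epc_in_file     # run-time address minus disassembly address
ra_in_file = ra - delta       # valid only if ra is in the same section

print(hex(epc_in_file), hex(delta), hex(ra_in_file))
```

The same delta can be used in the other direction, for instance to turn an address from the disassembly into the run-time address expected in a stack dump.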
The instruction at epc accesses MEM[a0 + 0], so a0 should have been 0 to result in a memory access at virtual address 0x00000000. We can confirm this by verifying the content of the a0 ($4) register:
[code]
$ 4 : 00000000 [...]
[/code]
The next step is to examine the flow of instructions to trace the source of the value in a0. A full listing is not provided here, but what it revealed is that a0 is used read-only. At epc it still holds the 1st input argument of the function. Moreover, there are no explicit validity checks prior to its use. Thus, the 1st argument is expected to be a valid address.
The name of the function, fast_memcpy(), suggests its memcpy-like nature, so the 1st argument is likely to be dst (of course, this can be verified by a careful analysis of the disassembly).
Let's examine the caller,
net_egress().
[code]
    b6ce4 <net_egress>:
    [...]
    b6d38: 3c020000 lui v0,0x0         (1)
    b6d3c: 24420000 addiu v0,v0,0      (2)
    b6d40: 0040f809 jalr v0            (3)
    b6d44: 97a4001a lhu a0,26(sp)      (4)
    b6d48: 97a6001a lhu a2,26(sp)      (5)
    b6d4c: 00408021 move s0,v0         (6)
    b6d50: 00402021 move a0,v0         (7)
    b6d54: 3c020000 lui v0,0x0         (8)
    b6d58: 24420000 addiu v0,v0,0      (9)
    b6d5c: 0040f809 jalr v0            (10)
    b6d60: 8fa5001c lw a1,28(sp)       (11)
    b6d64: 02202021 move a0,s1         <== 'ra' points here at offset 0x80
    [...]
There are 2 function calls here. The first one, at line (3), seems to have a single argument, which gets initialized at line (4). Note that (4) sits in the branch delay slot of the jalr, so it is executed before the callee starts. Let's refer to this called function as unknown_function().
The second one, at line (10), takes 3 arguments that are initialized at lines (7), (11), and (5) respectively, with (11) again being in the delay slot. Presumably, this is the call of fast_memcpy() where the crash occurred.
Can we say something specific about those arguments?
- the 1st argument, a0, of fast_memcpy() gets initialized with the return value, v0, of unknown_function() at line (7);
- this return value is passed as is, i.e. there are no validity checks;
- the 1st argument of unknown_function() and the 3rd one of fast_memcpy() get initialized with the same value, loaded from 26(sp) at lines (4) and (5).
These observations suggest that the source code may look as follows:
[code]
    dst = unknown_function(len);
    fast_memcpy(dst, src, len);
In this particular case, unknown_function() returned NULL, hence the crash.
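The bookkeeping behind this reconstruction can be sketched as a tiny symbolic tracker. It is a simplified model written for this example only, not a real disassembler: each instruction is reduced to a (mnemonic, destination, source) tuple, the instruction after a jalr is treated as its delay slot, and the argument registers are considered clobbered once a call returns. The names args_at_calls() and ARG_REGS are assumptions of this sketch.

```python
ARG_REGS = ("a0", "a1", "a2", "a3")

def args_at_calls(listing):
    """Return, for every jalr, the argument registers known at call time.

    The instruction that follows a jalr sits in its branch delay slot and
    executes before the callee starts, so it is applied before the
    snapshot of the argument registers is taken.
    """
    regs, calls, i = {}, [], 0
    while i < len(listing):
        op, dest, src = listing[i]
        if op == "jalr":
            if i + 1 < len(listing):
                _, d_dest, d_src = listing[i + 1]
                regs[d_dest] = regs.get(d_src, d_src)
            calls.append({r: regs[r] for r in ARG_REGS if r in regs})
            for r in ARG_REGS:          # caller-saved: unknown after the call
                regs.pop(r, None)
            regs["v0"] = "ret%d" % len(calls)
            i += 2                      # skip the delay slot, already applied
        else:
            regs[dest] = regs.get(src, src)
            i += 1
    return calls

listing = [
    ("lui", "v0", "callee"), ("addiu", "v0", "v0"),   # (1), (2)
    ("jalr", None, "v0"), ("lhu", "a0", "26(sp)"),    # (3), (4)
    ("lhu", "a2", "26(sp)"), ("move", "s0", "v0"),    # (5), (6)
    ("move", "a0", "v0"),                             # (7)
    ("lui", "v0", "callee"), ("addiu", "v0", "v0"),   # (8), (9)
    ("jalr", None, "v0"), ("lw", "a1", "28(sp)"),     # (10), (11)
]

for number, args in enumerate(args_at_calls(listing), 1):
    print(number, args)
# 1 {'a0': '26(sp)'}
# 2 {'a0': 'ret1', 'a1': '28(sp)', 'a2': '26(sp)'}
```

The second call receives the return value of the first one as its 1st argument, which matches the source code guessed above. The model assumes that the value at 26(sp) does not change between lines (4) and (5).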
Further, a question regarding the nature of unknown_function(), accompanied by the analysis of the crash, can be sent to the supplier of binary_blob_module. By submitting a more detailed report and asking concrete questions about hard-to-reproduce problems, we can somewhat decrease the chances of getting a (sometimes) default reply such as "please try reproducing it on our reference software and/or hardware".
Suspected memory corruption
Finally, let's consider a case where memory corruption is suspected.
[code]
    CPU 0 Unable to handle kernel paging request at virtual address 0004349c, epc == 0004349c, ra == 80012224
    Oops[#1]:
    Cpu 0
    $ 0 : 00000000 7f99bcc0 00000069 2abc7ea0
    $ 4 : 00000000 7f99bd20 7f99be60 00000001
    $ 8 : 00000000 80000008 0004349c fffffff4
    $12 : 7f99bd08 00000001 00000000 00000000
    $16 : 2ab01000 2ab01000 7f99bdc8 00000000
    $20 : 2aafc6a8 00410000 7f99bde0 7f99be60
    $24 : 00000000 2abc7e80
    $28 : 85248000 85249f30 0040484c 80012224
    Hi : 0000ba1a
    Lo : ff98c506
    epc : 0004349c 0x4349c Tainted: PF W
    ra : 80012224 stack_done+0x20/0x3c
    Status: 1100ff03 KERNEL EXL IE
    Cause : 10800008
    BadVA : 0004349c
    PrId : 00019554 (MIPS 34Kc)
    [...]
    Process screen_plugin (pid: 799, threadinfo=85248000, task=87140038, tls=00000000)
    Stack : 2ab85040 00000000 00000001 00000000 00000000 00000000 00000000 7f99bcc0
    [...]
    00000001 00000000 2ac2d530 7f99bce8 7f99bd18 2aae83d4 0100ff13 00028675
    Call Trace:
    Code: (Bad address in epc)
Note that epc == BadVA. The CPU has tried to fetch an instruction at 0x0004349c, but this is not a valid kernel-space address.
Let's examine the code at ra:
[code]
    ra : 80012224 stack_done+0x20/0x3c

    80012140 <handle_sys>:
    [...]
    800121e0: 000240c0 sll t0,v0,0x3
    800121e4: 3c098001 lui t1,0x8001
    800121e8: 25292460 addiu t1,t1,9312
    800121ec: 01284821 addu t1,t1,t0
    800121f0: 8d2a0000 lw t2,0(t1)
    800121f4: 1140005e beqz t2,80012370 <illegal_syscall>
    800121f8: 8d2b0004 lw t3,4(t1)
    800121fc: 05610040 bgez t3,80012300 <stackargs>
    80012200: afa70080 sw a3,128(sp)

    80012204 <stack_done>:
    80012204: 8f880008 lw t0,8(gp)
    80012208: 3c098000 lui t1,0x8000
    8001220c: 35290008 ori t1,t1,0x8
    80012210: 01094024 and t0,t0,t1
    80012214: 15000016 bnez t0,80012270 <syscall_trace_entry>
    80012218: 00000000 nop
    8001221c: 0140f809 jalr t2
    80012220: 00000000 nop
    80012224: 2408fb92 li t0,-1134     <=== 'ra' points here
ra is the valid return address for a function call at 0x8001221c. The address of that function is taken from t2 ($10), which indeed contains 0x0004349c:
[code] $ 8 : 00000000 80000008 0004349c fffffff4
So what we have is a call through a function pointer that contains a bogus (corrupted?) value.
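The relation between ra and the call site used here is worth spelling out: with the classic 32-bit MIPS encoding, a jal/jalr stores the address of the instruction that follows its delay slot, i.e. two 4-byte instructions past the call itself. A minimal sketch (the helper name call_site() is made up for this example):

```python
def call_site(ra):
    """Address of the jal/jalr that produced the return address `ra`.

    Assumes 4-byte instructions: the call itself plus one delay slot.
    """
    return ra - 8

print(hex(call_site(0x80012224)))   # 0x8001221c, the 'jalr t2' above
print(hex(call_site(0xb6d64)))      # 0xb6d5c, line (10) in net_egress()
```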
Let's try to figure out where t2 is coming from:
[code]
    800121e0: 000240c0 sll t0,v0,0x3
    800121e4: 3c098001 lui t1,0x8001
    800121e8: 25292460 addiu t1,t1,9312
    800121ec: 01284821 addu t1,t1,t0
    800121f0: 8d2a0000 lw t2,0(t1)
    800121f4: 1140005e beqz t2,80012370 <illegal_syscall>
The instruction at 0x800121f0 corresponds to 't2 = *t1' and is followed by an instruction that compares t2 to 0. If t2 is 0, control is given to illegal_syscall.
Well, some knowledge of the kernel internals would be helpful here. In any case, the appearance of names such as illegal_syscall, handle_sys, and syscall_trace_entry suggests that the code in question has something to do with the handling of system calls. The relevant code can indeed be found in arch/mips/kernel/scall32-o32.S.
Can we guess what syscall it was?
[code]
    800121e4: 3c098001 lui t1,0x8001
    800121e8: 25292460 addiu t1,t1,9312
t1 = 0x80012460 (the immediate 9312 is decimal, i.e. 0x2460);
nm shows that this value corresponds to sys_call_table, which is an array that contains the addresses of all the system calls.
[code]
    800121ec: 01284821 addu t1,t1,t0
t1 = t1 + t0;
t0 is the offset in the table, which is calculated as follows:
[code]
    800121e0: 000240c0 sll t0,v0,0x3
t0 = v0 * 8;
and the value of v0 ($2) is 0x69 (105 decimal):
[code]
    $ 0 : 00000000 7f99bcc0 00000069 2abc7ea0
The analysis of the source code of handle_sys (it's written in assembler) reveals that v0 represents a syscall number.
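The address arithmetic of the last few steps can be replayed as a short sketch. The immediate of addiu is sign-extended and shown by objdump in decimal (9312 is 0x2460, as the encoding 25292460 confirms), so the table base comes out as 0x80012460. The helper names and ENTRY_SIZE are assumptions of this example; the 8-byte slot size follows from 'sll t0,v0,0x3'.

```python
def sign_extend16(value):
    """Interpret a 16-bit field as a signed integer."""
    return value - 0x10000 if value & 0x8000 else value

def lui_addiu(upper, lower):
    """Value built by 'lui reg,upper' followed by 'addiu reg,reg,lower'."""
    return ((upper << 16) + sign_extend16(lower & 0xffff)) & 0xffffffff

ENTRY_SIZE = 8                       # 'sll t0,v0,0x3' multiplies by 8

table = lui_addiu(0x8001, 9312)      # base of sys_call_table
v0 = 0x69                            # syscall index from register $2
slot = table + v0 * ENTRY_SIZE       # address read by 'lw t2,0(t1)'

print(hex(table), hex(slot))         # 0x80012460 0x800127a8
```

So the bogus function pointer was loaded from 0x800127a8, and the word at offset 4 of the same slot is what 'lw t3,4(t1)' reads.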
The syscall numbers are defined in arch/mips/include/asm/unistd.h. The one we are interested in corresponds to sys_getitimer():
[code] #define __NR_getitimer (__NR_Linux + 105)
sys_call_table is not modified at run time. Also, this was not the first and only call to sys_getitimer(): the system had been functioning properly for days prior to this crash. So where do we go from here?
Memory corruptions often result in seemingly unrelated crashes, both in kernel and in user space. What they have in common, though, is that all these crashes may look "weird". That is, a careful analysis does not reveal any obvious problems with the code and, moreover, suggests possible external influence, be it stack/memory corruption or hardware issues. Having multiple crashes in different parts of the core kernel code is usually a good indicator too.
Of course, it's always possible to overlook something. So the larger the set of crashes from which a conclusion is drawn, the better.
The strategy then is to look for common patterns.
In this particular case, there was another crash in the same location (among a dozen crashes in yet other areas) where the syscall number and epc were 0x68 and 0x000430d8 respectively.
[code]
    syscall 0x69 (sys_getitimer) and epc: 0x0004349c
    syscall 0x68 (sys_setitimer) and epc: 0x000430d8
These 2 slots are neighboring in sys_call_table. Maybe it's just a coincidence, but it is worth taking into account.
Now, what about the content of epc? Do we actually know what the correct values would look like? We can look them up with nm or in the disassembly:
[code]
    8004349c <sys_getitimer>:
    800430d8 <sys_setitimer>:
Is there another pattern? Yes, the only difference between good and bad values is in the high bit 0x80000000. ...
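A quick way to spot such a pattern is to XOR the expected value with the observed one and list the bits that differ. The sketch below does this for the two crashes; the helper name differing_bits() is made up for this example.

```python
def differing_bits(good, bad):
    """Return the positions (0 = least significant) of the bits that differ."""
    diff = good ^ bad
    return [bit for bit in range(32) if diff & (1 << bit)]

# (expected address from nm/disassembly, value observed in epc)
pairs = {
    "sys_getitimer": (0x8004349c, 0x0004349c),
    "sys_setitimer": (0x800430d8, 0x000430d8),
}

for name, (good, bad) in pairs.items():
    print(name, hex(good ^ bad), differing_bits(good, bad))
# sys_getitimer 0x80000000 [31]
# sys_setitimer 0x80000000 [31]
```

In both cases exactly one bit, bit 31, has been cleared.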
p.s. This missing-high-bit theory had explanatory power when applied to some of the other "weird" crashes for which it was possible to infer the good value. How could this bit be cleared? In the end, it was found that the DDR timing settings were not properly set in the bootloader. However, at the time of this writing, it's not yet clear whether the problem has been completely resolved.
Perhaps, we can dedicate another post specifically to the analysis of "weird" crashes.
Many thanks to Yuri Leikind, Bero Brekalo, and Alina Krynina for review and useful suggestions.
Extra Details
[1] objdump
'objdump -d' alone may be sufficient in many cases (not to mention all the fun of matching disassembly and source code on your own). Alternatively, you can rebuild the original binary (if possible) with debugging information enabled and then use '-dS'.
Be careful though to double-check that the addresses you are interested in correspond to the same instructions in both the original and the new disassembly files. If that's not the case, code shifts/changes should be taken into account.
[2] Calling conventions
Be sure to verify the options used by your toolchain, if in doubt. For gcc, '-mabi=type' options are used. For example, '-mabi=32' corresponds to o32.
[3] Virtual Memory Layout on MIPS
Please refer to MIPS Address Space for a general review. Regarding the use in Linux:
1) kuseg range [0x00000000, 0x80000000) is user-space addresses. A private address space of user-space processes resides in this range. From kernel-space this area can be safely accessed only by means of special-purpose functions, like copy_to_user() and copy_from_user(). Direct accesses are always a bug, even though, given the nature of MIPS's MMU, such accesses may appear to be working properly under certain circumstances.
2) kseg0 range [0x80000000, 0xa0000000) is kernel-space addresses used by the kernel code and data (vmlinux). Dynamic allocations via general purpose allocators, such as kmalloc() and __get_free_pages() (but not from the ZONE_HIGHMEM zone), return addresses in this range. [ to-be-continued ]
3) kseg2 range [0xc0000000, 0xffffffff) is kernel-space addresses used by the code and data of kernel modules. vmalloc() and vmap() allocations return addresses in this range.
For kuseg and kseg2, the translation of virtual addresses into physical ones is done via MMU. Conversely, kseg0 addresses don't require MMU translations; the translation is done simply by stripping off the top bit. For example, 0x80100000 corresponds to 0x00100000 (1 MB) in RAM.
kseg0 ranges are both virtually and physically contiguous, while kuseg and kseg2 are only virtually contiguous.
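The layout can be summarized as a small lookup, which is handy for classifying the addresses that show up in a crash report. This is a sketch under the ranges given above; kseg1 (the unmapped, uncached window at [0xa0000000, 0xc0000000)) is added only to close the gap and is not discussed in this post. The function names are assumptions of this example.

```python
SEGMENTS = [
    ("kuseg", 0x00000000, 0x80000000),
    ("kseg0", 0x80000000, 0xa0000000),
    ("kseg1", 0xa0000000, 0xc0000000),   # not covered in the text above
    ("kseg2", 0xc0000000, 0x100000000),  # up to the end of the 32-bit space
]

def segment(vaddr):
    """Name of the segment that contains the 32-bit virtual address."""
    for name, start, end in SEGMENTS:
        if start <= vaddr < end:
            return name
    raise ValueError("not a 32-bit address: %#x" % vaddr)

def kseg0_to_phys(vaddr):
    """Translate a kseg0 address by stripping off the top bit."""
    if segment(vaddr) != "kseg0":
        raise ValueError("%#x is not a kseg0 address" % vaddr)
    return vaddr - 0x80000000

print(segment(0x0004349c))           # kuseg: the bogus epc
print(segment(0x80012224))           # kseg0: ra inside the kernel image
print(segment(0xc1c52470))           # kseg2: epc inside a kernel module
print(hex(kseg0_to_phys(0x80100000)))  # 0x100000
```

This is why an epc of 0x0004349c in kernel mode is immediately suspicious: it falls into kuseg, while core kernel code lives in kseg0 and module code in kseg2.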