Kprobes: insight into the Linux kernel (replacing a kernel function with a module)

2016-02-01 00:00
http://stackoverflow.com/questions/1196944/can-i-replace-a-linux-kernel-function-with-a-module

http://www.redhat.com/magazine/005mar05/features/kprobes/


Gaining insight into the Linux® kernel with Kprobes


by William Cohen

Introduction

Kprobes

Examples

The future

Further reading

About the author


Introduction

Many times kernel developers have resorted to using the "diagnostic print
statements" approach to understand what is occurring in the Linux
kernel. This technique can be painful because a new kernel must be built
and installed on the machine. The machine must then be rebooted with the
new kernel. Each new experiment requires another reboot of the machine,
which could take minutes on some machines.

Developers have found the ability to inspect the operation of unmodified
executables to be very useful. In the case of userspace applications,
developers can use debuggers to set breakpoints at specific locations in
the unmodified executable. When the processor encounters a breakpoint, the
developer uses the debugger to inspect program state to gain insight into
how the program is operating (or failing). This method of examining
program operation has several advantages over the traditional technique
of compiling "diagnostic print statements" into the program:

The developer does not change the source code of the original
program.

The developer avoids unintended changes caused by rebuilding the
executable.

The developer can avoid the expense of recompiling the program and
restarting the program each time something else is examined. In some
cases it may not be feasible for the developer to rebuild the
application.

Due to interrupt handling it is not feasible to completely stop the Linux
kernel and wait for the developer to type in commands. However, it is
possible to place snippets of instrumentation code in the kernel to
collect information at specific locations, for example to determine
whether a specific function is being executed or to record the state of
variables. The recent 2.6 Linux kernels, including the x86 kernel in the
upcoming Fedora Core 4, have support to allow developers to gather
information about the Linux kernel's operation without compiling or
booting a new kernel. This is implemented with Kprobes, a dynamic
instrumentation system. This article describes how Kprobes operate and
provides kernel instrumentation examples.


Kprobes

Kprobes is a dynamic instrumentation system in the mainline 2.6 Linux
kernel and will be enabled in the soon to be released x86 Fedora Core 4
kernels. Kprobes allows one to gather additional information about kernel
operation without recompiling or rebooting a kernel. Kprobes enables
locations in the kernel to be instrumented with code, and the
instrumentation code runs when the processor encounters that probe
point. Once the instrumentation code completes execution, the kernel
continues normal execution.

The Kprobes instrumentation is built as a kernel module. Thus, rather than
having to recompile and reboot the system with an instrumented kernel, a
kprobe instrumentation module can be written, compiled, and loaded on the
system. There is no need to reboot the system. Once the instrumentation
module has served its purpose, it can be unloaded, and the kernel returned
to its normal operation.

There are two types of kernel probes available: kprobes and jprobes. A
kprobe inserts a probe at a specific instruction. The instrumentation
provided by a kprobe could be inserted anywhere in a function, thus the
kprobe code cannot make assumptions about local variables or arguments
passed into the function being probed. A jprobe instruments the entry of
a function and allows the probe to examine the arguments passed into the
probed function.

The kprobe support in the kernel provides simple data structures and a set
of functions to allow the insertion and removal of kernel probes. A data
structure is filled out and registered with a call to either the
register_kprobe or register_jprobe function. The data structure passed to
the register function must remain allocated until the kernel probe is
unregistered with the matching unregister_kprobe or unregister_jprobe.
Table 1, "Kernel probes management functions," summarizes the functions
used to register and unregister the probes. The register functions return
zero if the operation was successful and a negative value if the
operation was unsuccessful.

int register_kprobe(struct kprobe *p);
int register_jprobe(struct jprobe *p);
void unregister_kprobe(struct kprobe *p);
void unregister_jprobe(struct jprobe *p);
Table 1. Kernel probes management functions

Listing 1, kprobe data structure shows the fields of struct kprobe. The
addr field is the linear address of the instruction being probed. The
developer needs to determine the appropriate address for addr. In the
examples in this article the address is that of an exported function, so
it can be written directly in the code. In other cases you may have to
examine the System.map file or the disassembled kernel code to find the
appropriate value for the address. The pre_handler field is a pointer to
the function run before the execution of the probed instruction. The
post_handler field is a pointer to the function executed following the
execution of the instruction. The fault_handler field is a pointer to the
function to run if there is a fault during the execution of the probe
code.

struct kprobe {
        /* elided fields for internal state information */

        kprobe_opcode_t *addr;
        kprobe_pre_handler_t pre_handler;
        kprobe_post_handler_t post_handler;
        kprobe_fault_handler_t fault_handler;

        /* elided fields for internal state information */
};


Listing 1. kprobe data structure
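
When the probed function is not exported, its address has to be looked up
by hand. System.map lists one symbol per line in the form "address type
name", so the lookup can be sketched as below. This is a userspace
illustration only, not part of the article's listings: the function name
find_symbol_address is hypothetical, and the sample map text in the usage
note is made up apart from the address c024f900, which is borrowed from
Listing 7.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Search System.map-style text ("<hex address> <type> <name>" per line)
 * for the symbol `name`. On success store its address in *addr and
 * return 0; return -1 if the symbol is not present.
 * find_symbol_address is a hypothetical helper, not a kernel API.
 */
int find_symbol_address(const char *map_text, const char *name,
                        unsigned long *addr)
{
    const char *line = map_text;

    assert(map_text != NULL && name != NULL && addr != NULL);

    while (*line != '\0') {
        unsigned long value;
        char type;
        char symbol[128];

        /* %127s stops at whitespace, so it captures only the name. */
        if (sscanf(line, "%lx %c %127s", &value, &type, symbol) == 3 &&
            strcmp(symbol, name) == 0) {
            *addr = value;
            return 0;
        }

        /* Advance to the start of the next line, if there is one. */
        line = strchr(line, '\n');
        if (line == NULL)
            break;
        line++;
    }
    return -1;
}
```

Given map text containing the line "c024f900 T generic_make_request", a
call with the name generic_make_request stores 0xc024f900 in *addr. That
value would then be cast to kprobe_opcode_t * and assigned to the addr
field.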

The jprobe is built on top of the basic kprobe. The jprobes simplify the
instrumentation of function entries and allow one to inspect the arguments
passed to the function. The struct jprobe contains a struct kprobe for the
kprobe information related to the jprobe. Two pieces of information need
to be entered into the struct jprobe: the entry field, which points to an
instrumentation function that has the same argument list as the
instrumented function, and the addr field in kp. The other fields in the
struct kprobe are filled out when the jprobe is registered.

struct jprobe {
        struct kprobe kp;
        kprobe_opcode_t *entry; /* probe handling code to jump to */
};


Listing 2. jprobe data structure

The execution of a kprobe has similarities to the execution of a
breakpoint set by a debugger. The instruction at the kernel probe location
is saved in a buffer, and the instruction at that location is replaced by
a breakpoint instruction. When the processor encounters the breakpoint,
the trap handler is invoked. A check is made to determine whether there is
a kprobe registered at this location. If there is no probe registered for
that location, the breakpoint is passed on to the normal handler. If a
probe is found, the pre_handler function is executed, the probed
instruction is executed, and then the post_handler function is executed.
Execution resumes at the instruction following the probed instruction.
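
The sequence just described (save the instruction, plant a breakpoint,
look up the probe in the trap handler, run pre_handler, the original
instruction, then post_handler) can be modelled in a few lines of
userspace C. This is a toy model under stated assumptions, not kernel
code: every model_* name is hypothetical, the "kernel text" is a small
byte array, and where the kernel would execute the saved instruction the
model merely records its opcode.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_BREAKPOINT 0xCC  /* x86 int3 opcode, used as the probe marker */
#define MODEL_TEXT_SIZE  16
#define MODEL_MAX_PROBES 4

struct model_probe {
    size_t addr;                        /* index into model_text */
    unsigned char saved;                /* original "instruction" */
    void (*pre_handler)(size_t addr);   /* may be NULL */
    void (*post_handler)(size_t addr);  /* may be NULL */
};

unsigned char model_text[MODEL_TEXT_SIZE];        /* stand-in for kernel code */
struct model_probe *model_probes[MODEL_MAX_PROBES];
unsigned char model_last_executed;                /* last opcode "run" */
int model_unclaimed_breakpoints;                  /* left to the normal handler */
int model_pre_hits, model_post_hits;

/* Sample handlers that only count how often they run. */
void model_count_pre(size_t addr)  { (void)addr; model_pre_hits++; }
void model_count_post(size_t addr) { (void)addr; model_post_hits++; }

struct model_probe *model_find_probe(size_t addr)
{
    int i;

    for (i = 0; i < MODEL_MAX_PROBES; i++)
        if (model_probes[i] != NULL && model_probes[i]->addr == addr)
            return model_probes[i];
    return NULL;
}

/* Save the original opcode and plant a breakpoint. 0 on success, -1 on error.
 * Simplification: the model allows only one probe per address. */
int model_register_probe(struct model_probe *p)
{
    int i;

    if (p == NULL || p->addr >= MODEL_TEXT_SIZE || model_find_probe(p->addr))
        return -1;
    for (i = 0; i < MODEL_MAX_PROBES; i++) {
        if (model_probes[i] == NULL) {
            p->saved = model_text[p->addr];
            model_text[p->addr] = MODEL_BREAKPOINT;
            model_probes[i] = p;
            return 0;
        }
    }
    return -1;  /* probe table full */
}

/* Restore the original opcode and forget the probe. */
void model_unregister_probe(struct model_probe *p)
{
    int i;

    for (i = 0; i < MODEL_MAX_PROBES; i++) {
        if (model_probes[i] == p) {
            model_text[p->addr] = p->saved;
            model_probes[i] = NULL;
        }
    }
}

/* "Execute" the instruction at pc and return the next pc. */
size_t model_step(size_t pc)
{
    unsigned char opcode;

    assert(pc < MODEL_TEXT_SIZE);
    opcode = model_text[pc];

    if (opcode == MODEL_BREAKPOINT) {
        struct model_probe *p = model_find_probe(pc);

        if (p == NULL) {
            model_unclaimed_breakpoints++;  /* not ours: normal handler */
            return pc + 1;
        }
        if (p->pre_handler)
            p->pre_handler(pc);
        model_last_executed = p->saved;     /* run the saved instruction */
        if (p->post_handler)
            p->post_handler(pc);
        return pc + 1;
    }
    model_last_executed = opcode;
    return pc + 1;
}
```

In this model, registering a probe at index 3 and stepping over it runs
both counting handlers once and still "executes" the original opcode.
After unregistering, the same step runs no handlers, which mirrors how
the kernel returns to normal operation once a probe is removed.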


Examples

This article contains two examples: one using a kprobe and the other
using a jprobe. Almost all block device I/O goes through the function
generic_make_request, so it is useful to instrument
generic_make_request to observe its operation. Both examples instrument
the generic_make_request function.

You need to have the kernel-devel RPM matching the
running kernel installed to build these examples. Listing 3, Makefile shows the simple makefile used to build the
instrumentation modules after the kernel-devel RPM
has been installed. There are two source files in the directory: kprobebio.c and jprobebio.c. In
conjunction with the makefile supplied by kernel-devel, this makefile creates kprobebio.ko and jprobebio.ko,
the kernel modules.

Assuming that the kernel-devel RPM matching the
running kernel is installed, you can create the modules with the following
command:

make -C /lib/modules/`uname -r`/build M=`pwd` modules


Kprobe example

The kprobe example kprobebio.c in Listing 4, kprobebio.c demonstrates how
to count the number of times the generic_make_request function is
called. Since the kprobe is a module, the instrumentation is inserted
when the module is loaded. When the instrumentation is removed, the
results of the instrumentation are written to /var/log/messages by a
printk in this example. Other means of extracting the data are possible.

The include for linux/kprobes.h contains the needed
data structures for kprobes and jprobes. The include for linux/blkdev.h declares the function generic_make_request, which is needed
to put the probe in the correct location.

The inst_generic_make_request function is the instrumentation function
that is called each time the generic_make_request function is
called. Normally, as in this case, the instrumentation function returns
a value of 0 to indicate that the instrumented instruction should be
handled normally.

The function init_module sets up the kprobe data structure and starts the
instrumentation. Only one instrumentation function is needed, and it runs
before the probed instruction: pre_handler. Thus, post_handler and
fault_handler are set to NULL. The address of the instrumented function
is set in kp.addr. The data structure is registered via
register_kprobe. After register_kprobe returns, the instrumentation is
operating and counting the number of times that generic_make_request is
called. The cleanup_module function unregisters the probe and then writes
the data to /var/log/messages via a printk.

The instrumentation is started as root with the following command:

/sbin/insmod kprobebio.ko


The instrumentation is shut down as root with the following command:

/sbin/rmmod kprobebio


When the module is unloaded, the data is written to /var/log/messages.
Listing 5, Output of kprobebio module in /var/log/messages shows the
output from this particular example.

Feb 23 12:09:20 slingshot kernel: kprobe registered
Feb 23 12:09:31 slingshot kernel: kprobe unregistered
Feb 23 12:09:31 slingshot kernel: generic_make_request() called 52 times.


Listing 5. Output of kprobebio module in /var/log/messages


Jprobe example

Another useful mechanism provided by Kprobes support is Jprobes. Jprobes
allow instrumentation of the function entry and access to the arguments
passed into the instrumented function. Listing 6, jprobebio.c shows the
code to generate instrumentation that counts the number of times that
generic_make_request is called. The example in Listing 6, jprobebio.c
also accumulates the number of sectors moved in the requests and keeps
track of the calls and sectors on a per-device basis.

The linux/bio.h header is included to describe the data structure used by
generic_make_request. This is required because the instrumentation
function inst_generic_make_request now has the same arguments as the
original generic_make_request function. These arguments can be accessed
inside the instrumentation function. For this example the bio pointer is
examined to determine the device for which the request is being made and
the number of sectors being transferred. A simple hash table is
implemented to separate the data for the different devices.
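
The per-device bookkeeping can be illustrated with a small fixed-size hash
table keyed on the device's major and minor numbers. The sketch below is
an independent userspace illustration, not the table from jprobebio.c: the
names dev_stats, dev_lookup, and dev_record, the bucket count, and the
hash function are all assumptions. A statically sized table is used so
that recording a request never has to allocate memory.

```c
#include <assert.h>
#include <stddef.h>

#define DEV_HASH_BUCKETS 16  /* assumed size; must be a power of two */

struct dev_stats {
    unsigned int major, minor;  /* key: device number */
    int in_use;                 /* 0 while the slot is empty */
    unsigned long calls;        /* number of requests seen */
    unsigned long sectors;      /* total sectors in those requests */
};

struct dev_stats dev_table[DEV_HASH_BUCKETS];

static unsigned int dev_hash(unsigned int major, unsigned int minor)
{
    /* The mask below only works for a power-of-two table size. */
    assert((DEV_HASH_BUCKETS & (DEV_HASH_BUCKETS - 1)) == 0);
    return (major * 31u + minor) & (DEV_HASH_BUCKETS - 1);
}

/*
 * Find the slot for (major, minor) using open addressing with linear
 * probing. If the device is unknown, claim an empty slot when `create`
 * is nonzero. Returns NULL if the device is absent (create == 0) or the
 * table is full.
 */
struct dev_stats *dev_lookup(unsigned int major, unsigned int minor,
                             int create)
{
    unsigned int start = dev_hash(major, minor);
    unsigned int i;

    for (i = 0; i < DEV_HASH_BUCKETS; i++) {
        struct dev_stats *s =
            &dev_table[(start + i) & (DEV_HASH_BUCKETS - 1)];

        if (s->in_use && s->major == major && s->minor == minor)
            return s;
        if (!s->in_use) {
            /* Slots are never freed, so an empty slot ends the search. */
            if (!create)
                return NULL;
            s->in_use = 1;
            s->major = major;
            s->minor = minor;
            s->calls = 0;
            s->sectors = 0;
            return s;
        }
    }
    return NULL;
}

/* Account for one request of nr_sectors on the given device. */
int dev_record(unsigned int major, unsigned int minor,
               unsigned long nr_sectors)
{
    struct dev_stats *s = dev_lookup(major, minor, 1);

    if (s == NULL)
        return -1;  /* table full: the request is not counted */
    s->calls++;
    s->sectors += nr_sectors;
    return 0;
}
```

Calling dev_record(3, 5, 8) 26 times and dev_record(3, 2, 8) 93 times
yields 208 and 744 sectors respectively, the same per-device totals that
appear in Listing 7. The eight-sector request size is inferred from those
totals and is an assumption, not something the article states.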

Another significant difference between kprobes and jprobes is how the
instrumentation function is exited. In a jprobe there needs to be an
explicit jprobe_return rather than a
kprobe function's return 0;.

A jprobe uses a struct jprobe to describe the instrumentation point. In
this example the entry field is made to point to the
inst_generic_make_request function. In init_module the addr field of the
embedded struct kprobe is initialized to point at the function being
instrumented, generic_make_request. The other fields in the kprobe are
set up appropriately for the jprobe when the register_jprobe function is
called.

When the module is removed from the kernel, cleanup_module is
executed. This unregisters the probe and prints out the recorded data in
much the same way as the earlier kprobe example. Like the kprobebio
example, the jprobebio module's instrumentation is started when it is
loaded into the kernel with an insmod command, and it writes out the data
when the module is removed with an rmmod command. Listing 7, Output of
the jprobebio module in /var/log shows the output of the module in
/var/log/messages.

Feb 23 13:55:01 slingshot kernel: plant jprobe at c024f900, handler addr e09e4000
Feb 23 13:55:02 slingshot crond(pam_unix)[5969]: session closed for user root
Feb 23 13:55:21 slingshot kernel: jprobe unregistered
Feb 23 13:55:21 slingshot kernel: generic_make_request() called 119 times for 952 sectors.
Feb 23 13:55:21 slingshot kernel: bdev 0xcb199da8 (3,5) 26 208 sectors.
Feb 23 13:55:21 slingshot kernel: bdev 0xdf00eda8 (3,2) 93 744 sectors.


Listing 7. Output of the jprobebio module in /var/log


The future

The examples in this article show how to write simple instrumentation
using the Kprobes support in the Fedora Core 4 kernels. However, one might
notice that the instrumentation is written in raw C code, and it is quite
possible to crash the machine if the instrumentation code has a flaw in
it. The Kprobes mechanism is also a very low-level interface that simply
places individual probes where directed. There is no predefined library
that selects groups of probe points to measure things that a regular user
might be interested in. Thus, Kprobes currently requires a good
understanding of the kernel, both to know which locations to instrument
to get data and to analyze the collected data to produce a meaningful
result.

An effort has started to address these deficiencies in the current kprobe
instrumentation: SystemTap. SystemTap will provide a safer language for
writing the instrumentation and a library of useful instrumentation.


Further reading

The Linux Kernel Archives

Kernel debugging with Kprobes

SystemTap


About the author

William Cohen is a performance tools engineer at Red Hat, Inc. Will
received his BS in electrical engineering from the University of
Kansas. He earned an MSEE and a PhD from Purdue University. In his spare
time he bicycles and takes pictures with his digital cameras.