[MEMO] Linux Kernel Debugging Training: miscellaneous notes (work in progress)

2016-02-28 19:35
1. Websites:
https://www.kernel.org/
http://lxr.oss.org.cn/ (search the kernel source code)

2. yum install kernel-devel-`uname -r` installs the kernel development package (headers and build files) matching the running kernel version

3. /boot
List this directory and note three files:
ls -lh /boot/
total 83M
-rw-r--r--. 1 root root 157K Oct  5 23:58 config-4.2.3-300.fc23.x86_64
-rw-rw-r--. 1 root root  19M Feb 26 15:28 initramfs-4.2.3-300.fc23.x86_64.img
-rwxr-xr-x. 1 root root 5.8M Oct  5 23:59 vmlinuz-4.2.3-300.fc23.x86_64

The kernel is in the form of vmlinuz-xxxx
The initial ram disk file system - initramfs
The kernel configuration settings - config-xxx
file vmlinuz-4.2.3-300.fc23.x86_64
vmlinuz-4.2.3-300.fc23.x86_64: Linux kernel x86 boot executable bzImage, version 4.2.3-300.fc23.x86_64 (mockbuild@bkernel02.phx2.fedoraproject.o, RO-rootFS, swap_dev 0x5, Normal VGA

3.1 boot loader
Used for arguments or for multiple kernel choices (different OSes)
                                                               User mode processes
                                                                        |
BIOS/EFI POST -> Boot device -> Sector 0 -> Kernel (standard boot) -> PID 1 (init/systemd)
                                   |              |
                              Boot Loader -> arguments, UI


4. Memory addresses

4.1 user mode & kernel mode

user mode -> kernel mode :

1) by system_call

2) by do_IRQ: interrupt.

$$ expands to the PID of the current shell; any other process ID can be used instead.
cat /proc/$$/maps
55850f08f000-55850f187000 r-xp 00000000 fd:00 131057                     /usr/bin/bash
55850f386000-55850f38a000 r--p 000f7000 fd:00 131057                     /usr/bin/bash
55850f38a000-55850f393000 rw-p 000fb000 fd:00 131057                     /usr/bin/bash
55850f393000-55850f398000 rw-p 00000000 00:00 0
558510032000-55851015a000 rw-p 00000000 00:00 0                          [heap]
7fe625710000-7fe62571b000 r-xp 00000000 fd:00 139665                     /usr/lib64/libnss_files-2.22.so
7fe62571b000-7fe62591a000 ---p 0000b000 fd:00 139665                     /usr/lib64/libnss_files-2.22.so
7fe62591a000-7fe62591b000 r--p 0000a000 fd:00 139665                     /usr/lib64/libnss_files-2.22.so
7fe62591b000-7fe62591c000 rw-p 0000b000 fd:00 139665                     /usr/lib64/libnss_files-2.22.so
7fe62591c000-7fe625922000 rw-p 00000000 00:00 0
7fe625922000-7fe62c275000 r--p 00000000 fd:00 135794                     /usr/lib/locale/locale-archive
7fe62c275000-7fe62c42c000 r-xp 00000000 fd:00 139184                     /usr/lib64/libc-2.22.so
7fe62c42c000-7fe62c62c000 ---p 001b7000 fd:00 139184                     /usr/lib64/libc-2.22.so
7fe62c62c000-7fe62c630000 r--p 001b7000 fd:00 139184                     /usr/lib64/libc-2.22.so
...
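
Each line of the maps output above has the form "start-end perms offset dev inode [path]". As a minimal sketch (the `Mapping` type and `parse_maps_line` helper are names made up for this note, not part of any library), one line can be split into fields like this:

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int    # first virtual address of the mapping
    end: int      # first address past the mapping
    perms: str    # e.g. "r-xp"
    offset: int   # offset into the backing file
    path: str     # backing file, "[heap]", or "" for anonymous memory


def parse_maps_line(line: str) -> Mapping:
    """Parse one line in /proc/<pid>/maps format."""
    fields = line.split(None, 5)   # the 6th field (path) is optional
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(start, end, fields[1], int(fields[2], 16), path)


sample = "55850f08f000-55850f187000 r-xp 00000000 fd:00 131057                     /usr/bin/bash"
anon = "55850f393000-55850f398000 rw-p 00000000 00:00 0"

m = parse_maps_line(sample)
a = parse_maps_line(anon)
print(m.path, m.perms, (m.end - m.start) // 1024, "KiB")   # /usr/bin/bash r-xp 992 KiB
```

On a live system the same function could be applied to each line of open("/proc/self/maps").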

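The user-mode process address space is 0x0 - 0x7FFFFFFFFFFF: 47 bits, not 64, even though the system is 64-bit.
Why 47 bits rather than 64?
A pointer (void *) is 64 bits wide, but on this machine only 48 bits of a virtual address are actually usable (see "address sizes" in /proc/cpuinfo), and user space gets 47 of them (128 TB of virtual memory).
48 bits = 256 TB. The bottom 128 TB => user programs. The top 128 TB => kernel.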
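
The sizes quoted above can be checked with a little arithmetic. This is a minimal sketch, assuming 48 usable virtual-address bits split evenly between user and kernel space; the helper name `tib` is only for illustration:

```python
TIB = 1 << 40  # one tebibyte (2^40 bytes), the "TB" used in these notes


def tib(n_bytes: int) -> int:
    """Convert a byte count to whole tebibytes."""
    return n_bytes // TIB


USER_BITS = 47      # address bits available to a user-mode process
VIRTUAL_BITS = 48   # usable virtual-address bits on this machine

user_top = (1 << USER_BITS) - 1        # highest user-mode address
user_size = tib(1 << USER_BITS)        # size of the user half
total_size = tib(1 << VIRTUAL_BITS)    # size of the whole usable space

print(hex(user_top))   # 0x7fffffffffff
print(user_size)       # 128
print(total_size)      # 256
```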

Kernel is a shared memory region with plenty of low level code to do the functions we have described
This is what we call the MONOLITHIC architecture (all kernel code resides in one shared address space)
User-mode processes get addresses from 0x0 to 0x7FFFFFFFFFFF -> 47 bits! (NOT 64!)
+-------------------------------------+ 0xFFFFFFFFFFFFFFFF
|           KERNEL SPACE              |   - text: kernel + drivers (modules)
|                                     |   - data of the kernel
+-------------------------------------+ 0xFFFFFFFF8xxxxxxx
|      RES       |   |      RES       |
|                |   |                |
+----------------+   +----------------+ 0x7FFFFFFFFFFF
|                |   |                |
|       P1       |   |       P2       |
|                |   |                |
|                |   |                |
|                |   |                |
|                |   |                |
+----------------+   +----------------+ 0x000000000000


0xffffffff8xxxxxxx -> in kernel code
0xffffffffaxxxxxxx -> in module code
0xffff88xxxxxxxxxx -> in kernel data
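
The prefixes listed above can be turned into a rough address classifier. This is only a sketch: the `classify` function is made up for this note, it matches on the hex prefixes exactly as written here, and the real layout depends on kernel version and configuration.

```python
USER_TOP = 0x00007FFFFFFFFFFF  # highest user-mode address (47 bits)


def classify(addr: int) -> str:
    """Guess which region a 64-bit virtual address falls in."""
    if addr <= USER_TOP:
        return "user"
    text = f"{addr:016x}"             # 16 hex digits, lower case
    if text.startswith("ffffffff8"):
        return "kernel code"
    if text.startswith("ffffffffa"):
        return "module code"
    if text.startswith("ffff88"):
        return "kernel data"
    return "other"


print(classify(0x55850F08F000))        # user
print(classify(0xFFFFFFFF8121D3E0))    # kernel code
print(classify(0xFFFFFFFFA0001000))    # module code
print(classify(0xFFFF880012345678))    # kernel data
```

The last two addresses are invented examples; the first two come from the maps and kallsyms listings in these notes.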

5. Looking up the memory addresses of kernel functions
To see the real virtual addresses, first lift the kernel pointer restriction (kptr_restrict):
echo 0 >  /proc/sys/kernel/kptr_restrict
more /proc/kallsyms | grep sys_open
ffffffff8121d1b0 T do_sys_open
ffffffff8121d3e0 T sys_open
ffffffff8121d400 T sys_openat
ffffffff81270a50 T compat_sys_open
ffffffff81270a70 T compat_sys_openat
ffffffff81271380 T compat_sys_open_by_handle_at
ffffffff8127c610 T sys_open_by_handle_at
ffffffff81292830 t proc_sys_open
ffffffff81cd1600 d sys_open_test


/proc/kallsyms is a very important and useful file: it lists the virtual address of every kernel symbol (functions and variables).

The symbol type in the middle column has the following meanings:

A: Global absolute symbol.
a: Local absolute symbol.
U: Undefined symbol.
B: Global "bss" (that is, uninitialized data space) symbol.
b: Local bss symbol.
D: Global data symbol.
d: Local data symbol.
T: Global text symbol.
t: Local text symbol.
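
The "address type name" layout described above is easy to process in a script. A minimal sketch, using a few lines copied from the listing in these notes as sample input (`parse_kallsyms` is a hypothetical helper name):

```python
SAMPLE = """\
ffffffff8121d1b0 T do_sys_open
ffffffff8121d3e0 T sys_open
ffffffff81292830 t proc_sys_open
ffffffff81cd1600 d sys_open_test
"""


def parse_kallsyms(text):
    """Yield (address, type, name) for each line in /proc/kallsyms format."""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        # A fourth field such as "[module_name]" may follow; it is ignored here.
        yield int(parts[0], 16), parts[1], parts[2]


symbols = list(parse_kallsyms(SAMPLE))
global_text = [name for _, sym_type, name in symbols if sym_type == "T"]
print(global_text)   # ['do_sys_open', 'sys_open']
```

On a live system the input could come from open("/proc/kallsyms").read(); the addresses may read as zeros for unprivileged users while kptr_restrict is in effect.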

cat /proc/iomem | grep Kernel shows the physical addresses the kernel occupies in RAM (the kernel's physical footprint).
cat /proc/iomem |grep Kernel
01000000-0177daf6 : Kernel code
0177daf7-01d3c1ff : Kernel data
01ebf000-02040fff : Kernel bss
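
To see how large each region is, subtract the two ends of every range. A minimal sketch over the three lines above (`region_sizes` is a name invented for this note; the ranges are treated as inclusive):

```python
SAMPLE = """\
01000000-0177daf6 : Kernel code
0177daf7-01d3c1ff : Kernel data
01ebf000-02040fff : Kernel bss
"""


def region_sizes(text):
    """Map region name -> size in bytes for lines in /proc/iomem format."""
    sizes = {}
    for line in text.splitlines():
        span, sep, name = line.partition(" : ")
        if not sep:
            continue
        start, end = (int(x, 16) for x in span.strip().split("-"))
        sizes[name.strip()] = end - start + 1   # both ends are included
    return sizes


sizes = region_sizes(SAMPLE)
for name, size in sizes.items():
    print(f"{name}: {size / 2**20:.1f} MiB")
```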


6. Kernel modules
ls /usr/lib/modules/`uname -r`/kernel/
arch  crypto  drivers  fs  kernel  lib  mm  net  security  sound
List the loaded modules:
lsmod
Module                  Size  Used by
bnep                   20480  2
bluetooth             483328  5 bnep
rfkill                 24576  3 bluetooth
fuse                   94208  2
xt_CHECKSUM            16384  1
ipt_MASQUERADE         16384  3
nf_nat_masquerade_ipv4    16384  1 ipt_MASQUERADE
...

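The number in the "Used by" column is the module's use count; the names after it are the modules that depend on it.
modprobe modulex or insmod modulex loads a module. The difference: insmod loads only the one module file you give it, so you may have to load the modules it depends on yourself first, whereas modprobe loads the dependencies automatically.
The corresponding command to remove a module is rmmod.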
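
The lsmod columns can be read programmatically to see which modules hold a given module in use. A minimal sketch over a few rows from the listing above (`parse_lsmod` is a made-up helper name):

```python
SAMPLE = """\
Module                  Size  Used by
bnep                   20480  2
bluetooth             483328  5 bnep
rfkill                 24576  3 bluetooth
nf_nat_masquerade_ipv4    16384  1 ipt_MASQUERADE
"""


def parse_lsmod(text):
    """Map module name -> (size, use count, [modules that use it])."""
    modules = {}
    for line in text.splitlines()[1:]:       # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue
        name, size, used = fields[0], int(fields[1]), int(fields[2])
        users = fields[3].split(",") if len(fields) > 3 else []
        modules[name] = (size, used, users)
    return modules


mods = parse_lsmod(SAMPLE)
print(mods["bluetooth"])   # (483328, 5, ['bnep'])
```

Here bnep appears as a user of bluetooth, which is the kind of dependency modprobe resolves automatically and insmod does not.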

7. Kernel scheduling

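7.1 Inspecting the CPUs
less /proc/cpuinfo shows CPU information. Each "processor" entry is a logical CPU, not necessarily a physical one; it may be a hyperthread or a core.
- Physical CPU
- Core on a physical CPU
- Hyperthread on a physical CPU / inside a core.
To Linux they all look the same.
The siblings and cpu cores fields tell you what a processor entry really is (core, hyperthread, or physical CPU): siblings is the number of logical CPUs in the physical package and cpu cores the number of cores in it, so siblings > cpu cores means hyperthreading is active.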
less /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 45
model name      : Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
stepping        : 7
microcode       : 0x70d
cpu MHz         : 2000.000
cache size      : 15360 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf eagerfpu pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx hypervisor lahf_lm ida arat pln pts dtherm xsaveopt
bugs            :
bogomips        : 4000.00
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 45
model name      : Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
stepping        : 7
microcode       : 0x70d
...
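
The siblings / cpu cores comparison can be scripted. A minimal sketch: the sample keeps only the relevant fields, and the second entry's values (in particular core id 1) are an assumption for illustration, since the listing above is truncated there. The helper names are invented for this note.

```python
SAMPLE = """\
processor       : 0
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2

processor       : 1
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
"""


def parse_cpuinfo(text):
    """Return one dict per "processor" block of /proc/cpuinfo-style text."""
    cpus = []
    for block in text.strip().split("\n\n"):
        entry = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                entry[key.strip()] = value.strip()
        cpus.append(entry)
    return cpus


def hyperthreading(entry):
    """More logical CPUs than cores in a package means hyperthreads."""
    return int(entry["siblings"]) > int(entry["cpu cores"])


cpus = parse_cpuinfo(SAMPLE)
packages = {c["physical id"] for c in cpus}
print(len(cpus), "logical CPUs in", len(packages), "package(s); HT:", hyperthreading(cpus[0]))
```

With siblings equal to cpu cores, as in the output above, each logical CPU is a full core rather than a hyperthread.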


less /proc/interrupts shows the interrupts.
less /proc/$$/sched shows a thread's scheduler statistics, such as its vruntime.
less /proc/$$/sched
bash (3476, #threads: 1)
-------------------------------------------------------------------
se.exec_start                                :      16471626.504582
se.vruntime                                  :           777.258893
se.sum_exec_runtime                          :           188.600606
se.statistics.sum_sleep_runtime              :      15290929.354044
se.statistics.wait_start                     :             0.000000
se.statistics.sleep_start                    :      16471626.504582
se.statistics.block_start                    :             0.000000
se.statistics.sleep_max                      :       6655769.432574
se.statistics.block_max                      :            16.151335
se.statistics.exec_max                       :             1.004242
se.statistics.slice_max                      :             0.464383
se.statistics.wait_max                       :             2.005330
se.statistics.wait_sum                       :            20.006769
se.statistics.wait_count                     :                 1022
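
The "name : value" lines of this file are simple to pull into a script, for example to watch vruntime change over time. A minimal sketch over a few of the lines above (`parse_sched` is a hypothetical helper; lines that do not end in a number are skipped):

```python
SAMPLE = """\
bash (3476, #threads: 1)
-------------------------------------------------------------------
se.exec_start                                :      16471626.504582
se.vruntime                                  :           777.258893
se.sum_exec_runtime                          :           188.600606
se.statistics.wait_count                     :                 1022
"""


def parse_sched(text):
    """Map field name -> numeric value for /proc/<pid>/sched-style text."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue                 # separator line, no field here
        try:
            stats[key.strip()] = float(value)
        except ValueError:
            pass                     # header line such as "bash (3476, ...)"
    return stats


stats = parse_sched(SAMPLE)
print(stats["se.vruntime"])   # 777.258893
```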


7.2 Task scheduling policy

1. PREEMPTIVE multitasking
- The system automatically kicks a thread off the CPU when its time slice (QUANTUM) expires.
  -> The system keeps time, and when the slice is up the thread is stopped, no matter where it is in its code.
Problem: if a thread is executing on the CPU, how can we kick it out?
Solution: use interrupts.
- Whenever an interrupt occurs, whether in user or kernel context, the CPU immediately jumps to the interrupt handler in kernel mode.
Problem: interrupts are unpredictable.
Solution: a TIMER.
2. Timer
A predictable interrupt that fires HZ times per second (#define HZ: 100, 250, 300 or 1000).
HZ = 100                    HZ = 1000
--------                    ---------
fewer interrupts            more interrupts
longer gaps (10 ms)         shorter gaps (1 ms)
fewer context switches      more context switches
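
The gap between timer interrupts follows directly from HZ. A minimal sketch (`tick_period_ms` is a name made up for this note):

```python
def tick_period_ms(hz: int) -> float:
    """Time between two timer interrupts, in milliseconds."""
    return 1000.0 / hz


for hz in (100, 250, 300, 1000):
    print(f"HZ={hz:4d}: one tick every {tick_period_ms(hz):.2f} ms, {hz} timer interrupts per second")
```

HZ=100 gives the 10 ms gap and HZ=1000 the 1 ms gap mentioned above.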
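Since 2.6.21 the timer can be shut down when it is not required. This technique is called dyntick: when the system is idle, the periodic timer interrupt is stopped completely so the CPU is not woken up for nothing, which saves power.
Q: How does the system know when the timer is not needed?
A: Check the process states with ps -aux. When all processes are sleeping, the timer (meaning only the CPU scheduler's timer) is switched off.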
ps -aux
Warning: bad ps syntax, perhaps a bogus '-'? See http://procps.sf.net/faq.html
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1  31652  3160 ?        Ss    2015   0:02 /sbin/init
root         2  0.0  0.0      0     0 ?        S     2015   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S     2015   0:48 [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S     2015   0:00 [kworker/u:0]
root         6  0.0  0.0      0     0 ?        S     2015   0:00 [migration/0]
root         7  0.0  0.0      0     0 ?        S     2015   0:22 [watchdog/0]


Best of both worlds: when we need interrupts, we start the timer;
when we don't need interrupts, we stop the timer altogether, so the CPU can sleep.

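3. CFS (Completely Fair Scheduler); see a description of the CFS algorithm for details
CFS is an O(log n) scheduling algorithm. It keeps the task structures in a red-black tree ordered by vruntime, and always picks the task with the smallest vruntime to run next.

vruntime = f ( priority ↓, sleep time ↓, run time ↑)
That is, higher priority and more time spent sleeping keep vruntime low, while time spent running raises it.

So under CFS:

1. High-priority tasks still get larger slices.

2. CPU-starved threads are scheduled first.

3. Low-priority threads get smaller slices but still get to execute.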
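
The "always run the smallest vruntime" rule can be illustrated with a toy simulation. This is a sketch of the idea only, not the kernel's implementation: a heapq min-heap stands in for the red-black tree, the task weights (2048 and 1024) are invented for illustration, and sleep time is not modeled.

```python
import heapq

NICE_0_WEIGHT = 1024  # reference weight; a heavier task plays the role of higher priority


def simulate(weights, ticks, tick_ms=1.0):
    """Run `ticks` scheduling decisions; return CPU time (ms) per task."""
    # Heap entries are (vruntime, name); the smallest vruntime is popped first.
    queue = [(0.0, name) for name in sorted(weights)]
    heapq.heapify(queue)
    runtime = {name: 0.0 for name in weights}
    for _ in range(ticks):
        vruntime, name = heapq.heappop(queue)    # task with the least vruntime
        runtime[name] += tick_ms                 # it runs for one tick
        # Run time raises vruntime; a heavier task's vruntime grows more slowly.
        vruntime += tick_ms * NICE_0_WEIGHT / weights[name]
        heapq.heappush(queue, (vruntime, name))  # reinsert in O(log n)
    return runtime


result = simulate({"high": 2048, "low": 1024}, ticks=300)
print(result)   # {'high': 200.0, 'low': 100.0}
```

The heavier task ends up with twice the CPU time, yet the lighter one is never starved, which matches points 1 and 3 above.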

8. IO Scheduler

9. Writing your first kernel module

10. General debugging techniques

/proc/kallsyms, dmesg, /proc/kcore

11. Building the kernel

12. kprobe

13. ftrace