
Some notes on Linux kernel_thread

2011-03-31 23:23
 

On the x86_64 platform, the entry points of many kernel functions are defined in entry_64.S.

kernel_thread is defined as follows:

 

/*
* Create a kernel thread.
*
* C extern interface:
*	extern long kernel_thread(int (*fn)(void *), void * arg, unsigned long flags)
*
* asm input arguments:
*	rdi: fn, rsi: arg, rdx: flags
*/
ENTRY(kernel_thread)
CFI_STARTPROC
FAKE_STACK_FRAME $child_rip
SAVE_ALL
# rdi: flags, rsi: usp, rdx: will be &pt_regs
movq %rdx,%rdi
orq  kernel_thread_flags(%rip),%rdi
movq $-1, %rsi
movq %rsp, %rdx
xorl %r8d,%r8d
xorl %r9d,%r9d
# clone now
call do_fork
movq %rax,RAX(%rsp)
xorl %edi,%edi
/*
* It isn't worth to check for reschedule here,
* so internally to the x86_64 port you can rely on kernel_thread()
* not to reschedule the child before returning, this avoids the need
* of hacks for example to fork off the per-CPU idle tasks.
* [Hopefully no generic code relies on the reschedule -AK]
*/
RESTORE_ALL
UNFAKE_STACK_FRAME
ret
CFI_ENDPROC
END(kernel_thread)


 

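The FAKE_STACK_FRAME macro builds a fake interrupt stack frame whose saved instruction pointer is child_rip. In other words, the first instruction executed after an interrupt return from this frame is the one at child_rip.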
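To make the shape of that fake frame concrete, here is a small user-space model in C. It is only a sketch: the struct and helper names are hypothetical, the selector and flag values are assumed to match x86_64 kernels of this era, and the zero in the saved rsp slot is a placeholder of the model. The field order is the one iretq pops (rip, cs, eflags, rsp, ss, from the lowest address up).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, used here only for illustration. */
#define MODEL_KERNEL_CS 0x10u  /* assumed value of __KERNEL_CS */
#define MODEL_KERNEL_DS 0x18u  /* assumed value of __KERNEL_DS */
#define MODEL_EFLAGS_IF 0x200u /* interrupt-enable bit in EFLAGS */

/* Hardware part of an interrupt frame, lowest address first. */
struct iret_frame {
    uint64_t rip;
    uint64_t cs;
    uint64_t eflags;
    uint64_t rsp;
    uint64_t ss;
};

/* Build the frame that a FAKE_STACK_FRAME-style macro leaves on the stack. */
static struct iret_frame make_fake_frame(uint64_t child_entry)
{
    struct iret_frame f;

    f.ss = MODEL_KERNEL_DS;
    f.rsp = 0;                  /* placeholder in this model */
    f.eflags = MODEL_EFLAGS_IF; /* the child starts with interrupts enabled */
    f.cs = MODEL_KERNEL_CS;     /* ring 0 code segment */
    f.rip = child_entry;        /* where the interrupt return lands */
    return f;
}

/* Privilege level encoded in the low two bits of the saved CS selector. */
static unsigned frame_rpl(const struct iret_frame *f)
{
    assert(f != 0);
    return (unsigned)(f->cs & 3u);
}
```

Because the saved cs is a ring 0 selector, frame_rpl() yields 0 for such a frame, which is exactly what the `testl $3, CS-ARGOFFSET(%rsp)` checks later in the return path look at.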

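Next, the argument registers are set up and do_fork is called, exactly as when creating an ordinary process or thread. When the parent returns from do_fork, it pops the fake interrupt frame it built and carries on with normal execution.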

 

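The child, on the other hand, returns to child_rip. Its execution path differs from the parent's, and it does not unwind the fake frame the way the parent does. The difference between the two paths is controlled mainly by the following pieces of code.

A fragment from copy_thread: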

p->thread.sp = (unsigned long) childregs;
p->thread.sp0 = (unsigned long) (childregs+1);
p->thread.usersp = me->thread.usersp;
set_tsk_thread_flag(p, TIF_FORK);
p->thread.fs = me->thread.fs;
p->thread.gs = me->thread.gs;
savesegment(gs, p->thread.gsindex);
savesegment(fs, p->thread.fsindex);
savesegment(es, p->thread.es);
savesegment(ds, p->thread.ds);
 

And the switch_to macro:

#define switch_to(prev, next, last) \
asm volatile(SAVE_CONTEXT					  \
"movq %%rsp,%P[threadrsp](%[prev])\n\t" /* save RSP */	  \
"movq %P[threadrsp](%[next]),%%rsp\n\t" /* restore RSP */	  \
"call __switch_to\n\t"					  \
".globl thread_return\n"					  \
"thread_return:\n\t"					  \
"movq "__percpu_arg([current_task])",%%rsi\n\t"		  \
__switch_canary						  \
"movq %P[thread_info](%%rsi),%%r8\n\t"			  \
"movq %%rax,%%rdi\n\t" 					  \
"testl  %[_tif_fork],%P[ti_flags](%%r8)\n\t"	  \
"jnz   ret_from_fork\n\t"					  \
RESTORE_CONTEXT						  \
: "=a" (last)					  	  \
__switch_canary_oparam					  \
: [next] "S" (next), [prev] "D" (prev),			  \
[threadrsp] "i" (offsetof(struct task_struct, thread.sp)), \
[ti_flags] "i" (offsetof(struct thread_info, flags)),	  \
[_tif_fork] "i" (_TIF_FORK),			  	  \
[thread_info] "i" (offsetof(struct task_struct, stack)),   \
[current_task] "m" (per_cpu_var(current_task))		  \
__switch_canary_iparam					  \
: "memory", "cc" __EXTRA_CLOBBER)
 

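When the child is created, copy_thread sets the TIF_FORK flag in its thread_info. The child does not run immediately after it is created; it runs only once the scheduler, running in the context of the parent or some other task, switches to it. At that point the switch_to macro tests TIF_FORK to see whether the task it has just switched to is newly created, and if so it jumps to ret_from_fork.

The code is as follows: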
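The flag handshake described above can be modelled in a few lines of user-space C. This is a sketch under stated assumptions, not kernel code: the struct, the bit position of the flag and the function names are all hypothetical, and the model folds the `btr $TIF_FORK` that ret_from_fork performs into the same function as the test.

```c
#include <assert.h>

#define MODEL_TIF_FORK (1u << 0) /* hypothetical bit position */

/* Stands in for the flags word in thread_info. */
struct model_task {
    unsigned flags;
};

enum resume_point { RESUME_THREAD_RETURN, RESUME_RET_FROM_FORK };

/* The part of copy_thread that matters here: mark the child as freshly forked. */
static void model_copy_thread(struct model_task *child)
{
    assert(child != 0);
    child->flags |= MODEL_TIF_FORK;
}

/* The test at the end of switch_to, plus the flag clear done in ret_from_fork. */
static enum resume_point model_switch_to(struct model_task *next)
{
    assert(next != 0);
    if (next->flags & MODEL_TIF_FORK) {
        next->flags &= ~MODEL_TIF_FORK; /* models btr $TIF_FORK */
        return RESUME_RET_FROM_FORK;
    }
    return RESUME_THREAD_RETURN;
}
```

In this model only the first switch into a new task takes the ret_from_fork path; every later switch resumes at thread_return, because the flag has been cleared by then.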

/*
* A newly forked process directly context switches into this address.
*
* rdi: prev task we switched from
*/
ENTRY(ret_from_fork)
DEFAULT_FRAME
LOCK ; btr $TIF_FORK,TI_flags(%r8)
push kernel_eflags(%rip)
CFI_ADJUST_CFA_OFFSET 8
popf					# reset kernel eflags
CFI_ADJUST_CFA_OFFSET -8
call schedule_tail			# rdi: 'prev' task parameter
GET_THREAD_INFO(%rcx)
RESTORE_REST
testl $3, CS-ARGOFFSET(%rsp)		# from kernel_thread?
je   int_ret_from_sys_call
testl $_TIF_IA32, TI_flags(%rcx)	# 32-bit compat task needs IRET
jnz  int_ret_from_sys_call
RESTORE_TOP_OF_STACK %rdi, -ARGOFFSET
jmp ret_from_sys_call			# go to the SYSRET fastpath
CFI_ENDPROC
END(ret_from_fork)
 

For a kernel thread, execution now jumps to int_ret_from_sys_call, because the CS value saved on the stack has privilege level 0.

/*
* Syscall return path ending with IRET.
* Has correct top of stack, but partial stack frame.
*/
GLOBAL(int_ret_from_sys_call)
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
testl $3,CS-ARGOFFSET(%rsp)
je retint_restore_args
retint_restore_args:	/* return to kernel space */
DISABLE_INTERRUPTS(CLBR_ANY)
/*
* The iretq could re-enable interrupts:
*/
TRACE_IRQS_IRETQ
restore_args:
RESTORE_ARGS 0,8,0
irq_return:
INTERRUPT_RETURN
 

 

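The child therefore ends up executing INTERRUPT_RETURN (iretq). The hardware interrupt return pops the fake interrupt frame that the parent built, and execution continues at child_rip: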

ENTRY(child_rip)
pushq $0		# fake return address
CFI_STARTPROC
/*
* Here we are in the child and the registers are set as they were
* at kernel_thread() invocation in the parent.
*/
movq %rdi, %rax
movq %rsi, %rdi
call *%rax
# exit
mov %eax, %edi
call do_exit
ud2			# padding for call trace
CFI_ENDPROC
END(child_rip)
 

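At this point rdi holds the function pointer int (*fn)(void *) that was passed to kernel_thread, and rsi holds arg. The call instruction enters the function the kernel thread is meant to run, and when that function returns, its return value is passed to do_exit.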
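What child_rip does can be written as ordinary C. The sketch below is a user-space model with hypothetical names; unlike the kernel, it returns the exit code to the caller instead of calling do_exit, which never returns.

```c
#include <assert.h>

typedef int (*thread_fn)(void *);

/* Model of child_rip: call fn(arg) and hand the result to the exit path. */
static int model_child_rip(thread_fn fn, void *arg)
{
    assert(fn != 0);
    /* movq %rdi,%rax; movq %rsi,%rdi; call *%rax */
    int code = fn(arg);
    /* The kernel passes this value to do_exit(); the model just returns it. */
    return code;
}

/* Hypothetical thread body, used only for illustration. */
static int add_one(void *arg)
{
    int *counter = arg;

    *counter += 1;
    return 7;
}
```

Calling model_child_rip(add_one, &counter) runs the body once and yields the body's return value, which corresponds to the value moved from eax into edi before `call do_exit`.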

 

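That is the whole process. The key is to keep track of what the stack looks like at each step. The work done by do_fork here is the same as when a fork system call from user space traps into do_fork. If the task came from user space, the CS comparison in ret_from_fork finds privilege level 3, and a 64-bit task returns to user space through ret_from_sys_call, while a 32-bit compat task still goes through int_ret_from_sys_call. All in all it is fairly tedious to follow.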
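The branch structure of ret_from_fork shown earlier can be summarised as a small decision function. This is a user-space sketch with hypothetical names; the selector values mentioned afterwards are assumptions about this kernel generation and are not taken from the listing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum ret_path { PATH_INT_RET_FROM_SYS_CALL, PATH_RET_FROM_SYS_CALL };

/* Which exit path ret_from_fork picks for a given saved CS and compat flag. */
static enum ret_path model_ret_from_fork(uint64_t saved_cs, bool is_ia32)
{
    assert((saved_cs >> 16) == 0);  /* a segment selector is 16 bits wide */

    if ((saved_cs & 3u) == 0)       /* testl $3, CS-ARGOFFSET(%rsp); je */
        return PATH_INT_RET_FROM_SYS_CALL;  /* came from kernel_thread */
    if (is_ia32)                    /* testl $_TIF_IA32, TI_flags(%rcx); jnz */
        return PATH_INT_RET_FROM_SYS_CALL;  /* compat task needs IRET */
    return PATH_RET_FROM_SYS_CALL;  /* SYSRET fast path */
}
```

With an assumed kernel selector of 0x10 the function picks the IRET path, and with an assumed 64-bit user selector of 0x33 it picks the SYSRET fast path unless the compat flag is set.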