
x64 Tail Call Elimination

2016-05-20 14:59
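Today I compiled an x64 release build and noticed that the disassembly for one function call used a jmp instruction instead of call. I have read "x64 Deep Dive", so this was not unfamiliar: it is a compiler optimization. I could no longer remember exactly when it applies, though, so I went back to x64 Deep Dive, which says: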

X64 compiler can optimize the last call made from a function by replacing it with a jump to the callee. This avoids the overhead of setting up the stack frame for the callee. The caller and the callee share the same stack frame and the callee returns directly to the caller's caller. This is especially beneficial when the caller and the callee have the same parameters, since, if the relevant parameters are already in the required registers and those registers haven't changed, they don't have to be reloaded. Figure 2 shows tail call elimination in Function1 when calling Function4. Function1 jumps to Function4 and when Function4 finishes execution, it returns directly to the caller of Function1.


Figure 2 : Tail Call Elimination
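As far as my clumsy English lets me read it: replacing call with jmp saves the cost of setting up a stack frame; the caller and the callee share the same stack frame, and the callee returns directly to the caller's caller. The optimization is especially effective when the caller and the callee take the same parameters, because parameters that are already in the right registers and have not changed do not need to be reloaded.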

I do not disagree with any of this; I just think it never spells out the conditions under which the optimization is applied.

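Since the callee returns directly to the caller's caller, the caller must at least have no stack space of its own still allocated when it jumps to the callee. Otherwise the stack is unbalanced when the callee returns, and the return address it pops is wrong.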
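
To make "the last call made from a function" concrete, here is a minimal sketch in C. The names callee, tail_caller and non_tail_caller are hypothetical and do not come from the driver discussed here. Whether a given compiler actually emits jmp for tail_caller depends on the compiler and the optimization level, and it may simply inline callee instead.

```c
/* Hypothetical callee: stands in for any function that is called last. */
int callee(int x, int y)
{
    return x * y + 1;
}

/*
 * Tail position: the call is the last thing the function does and its
 * result is returned unchanged. If this function holds no stack space
 * of its own, an optimizing compiler may emit `jmp callee` here.
 */
int tail_caller(int x, int y)
{
    return callee(x, y);
}

/*
 * Not a tail call: the addition runs after callee returns, so control
 * has to come back to this function and a real `call` is required.
 */
int non_tail_caller(int x, int y)
{
    return callee(x, y) + 1;
}
```

Compiling both callers with optimization enabled and comparing their disassembly is one way to see the difference.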

Let's analyze some code I wrote.

VOID CalcRoutine (
    __in struct _KDPC *Dpc,
    __in_opt PVOID DeferredContext,
    __in_opt PVOID SystemArgument1,
    __in_opt PVOID SystemArgument2
    )
{
    int a = 10;
    int b = 30;

    int c = Calc(a,b);
    /* Format string plus three values: all four arguments fit in registers. */
    DbgPrint("Calc(%d , %d) = %d \n", a, b, c);
}


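When DbgPrint is given only three values after the format string (four arguments in total), every argument is passed in a register, and the compiler has also optimized away the local variables, so nothing needs to be stored on the stack. CalcRoutine therefore allocates no stack space, and the call to its last function is emitted as a jmp: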

kd> uf GoonSys!CalcRoutine
GoonSys!CalcRoutine [e:\driverproj\topdesk\topdesk\main.c @ 34]:
34 fffff880`02df706c ba0a000000      mov     edx,0Ah
39 fffff880`02df7071 488d0d98000000  lea     rcx,[GoonSys! ?? ::FNODOBFM::`string' (fffff880`02df7110)]
39 fffff880`02df7078 41b9e0fcffff    mov     r9d,0FFFFFCE0h
39 fffff880`02df707e 448d4214        lea     r8d,[rdx+14h]
40 fffff880`02df7082 e97f000000      jmp     GoonSys!DbgPrint (fffff880`02df7106)
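
Two operands in the listing above are easier to read once decoded. The following is a small sketch in C with hypothetical helper names, not part of the driver. It reads 0FFFFFCE0h as a signed 32-bit value and evaluates lea r8d,[rdx+14h] for edx = 0Ah. The results, -800 and 30, suggest that r8d carries b and that r9d carries c as a value computed at compile time (the body of Calc is not shown in this post, so that last point is an inference).

```c
#include <stdint.h>

/* Reinterpret a raw 32-bit immediate as a signed two's-complement value. */
int32_t as_signed32(uint32_t raw)
{
    /* With the top bit set, the value represented is raw - 2^32. */
    if (raw & 0x80000000u)
        return (int32_t)((int64_t)raw - 4294967296LL);
    return (int32_t)raw;
}

/* Value that `lea r8d,[rdx+14h]` leaves in r8d for a given edx. */
int32_t lea_rdx_plus_14h(int32_t edx)
{
    return edx + 0x14;
}
```

For example, as_signed32(0xFFFFFCE0u) yields -800 and lea_rdx_plus_14h(0x0A) yields 30.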


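When DbgPrint is given six values after the format string, the caller has to allocate stack space to pass the arguments that do not fit in registers. That space must stay allocated until DbgPrint returns, so CalcRoutine cannot release its frame before transferring control, and the compiler no longer applies the optimization.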

VOID CalcRoutine (
    __in struct _KDPC *Dpc,
    __in_opt PVOID DeferredContext,
    __in_opt PVOID SystemArgument1,
    __in_opt PVOID SystemArgument2
    )
{
    int a = 10;
    int b = 30;

    int c = Calc(a,b);
    DbgPrint("Calc(%d , %d) = %d ,Calc(%d , %d) = %d \n", a, b, c, a, b, c);
}

kd> uf GoonSys!CalcRoutine
GoonSys!CalcRoutine [e:\driverproj\topdesk\topdesk\main.c @ 34]:
34 fffff880`02df106c 4883ec48        sub     rsp,48h
39 fffff880`02df1070 41b81e000000    mov     r8d,1Eh
39 fffff880`02df1076 41b9e0fcffff    mov     r9d,0FFFFFCE0h
39 fffff880`02df107c 488d0dad000000  lea     rcx,[GoonSys! ?? ::FNODOBFM::`string' (fffff880`02df1130)]
39 fffff880`02df1083 44894c2430      mov     dword ptr [rsp+30h],r9d
39 fffff880`02df1088 418d50ec        lea     edx,[r8-14h]
39 fffff880`02df108c 4489442428      mov     dword ptr [rsp+28h],r8d
39 fffff880`02df1091 89542420        mov     dword ptr [rsp+20h],edx
39 fffff880`02df1095 e88c000000      call    GoonSys!DbgPrint (fffff880`02df1126)
40 fffff880`02df109a 4883c448        add     rsp,48h
40 fffff880`02df109e c3              ret
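
The stores to [rsp+20h], [rsp+28h] and [rsp+30h] in the listing above match the Windows x64 calling convention: the first four integer or pointer arguments travel in RCX, RDX, R8 and R9, and the remaining ones go on the stack just above the 32 bytes of home space the caller reserves. The sketch below, in C with hypothetical helper names, only models that layout for integer arguments. It does not try to reproduce the full 48h frame size the compiler chose.

```c
/* Constants of the Windows x64 convention for integer/pointer arguments. */
enum {
    REGISTER_ARGS    = 4,    /* passed in RCX, RDX, R8, R9 */
    HOME_SPACE_BYTES = 0x20, /* 32 bytes reserved by the caller */
    SLOT_BYTES       = 8     /* every stack argument takes one 8-byte slot */
};

/*
 * Offset from rsp (at the call instruction) of the argument with the
 * given zero-based index, or -1 if that argument is passed in a register.
 */
int stack_arg_offset(int arg_index)
{
    if (arg_index < REGISTER_ARGS)
        return -1;
    return HOME_SPACE_BYTES + (arg_index - REGISTER_ARGS) * SLOT_BYTES;
}

/* Number of arguments that spill to the stack for a call with total_args. */
int stack_arg_count(int total_args)
{
    return total_args > REGISTER_ARGS ? total_args - REGISTER_ARGS : 0;
}
```

With seven arguments (the format string plus six values), stack_arg_count returns 3 and the spilled arguments land at offsets 20h, 28h and 30h. With four arguments nothing spills, which is the case where the jmp was emitted.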