LLVM Study Notes (7)
2017-04-07 11:44
2.2.6. Scheduling Information
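The Itinerary field at line 430 and the SchedRW field at line 433 of the Instruction definition describe how an instruction is scheduled. Itinerary describes an instruction in terms of its execution steps. A target defines its itinerary classes by deriving from InstrItinClass. For a processor like X86, whose instructions are complex and come in many variants, a large number of InstrItinClass definitions is needed. They all live in X86Schedule.td, with roughly one InstrItinClass definition per instruction (class). Note that these definitions are really intended for in-order pipelined machines such as Atom, so instructions those machines do not support (for example, the AVX instruction set) need no InstrItinClass. For example, the InstrItinClass definitions for division are: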
// div
def IIC_DIV8_MEM : InstrItinClass;
def IIC_DIV8_REG : InstrItinClass;
def IIC_DIV16    : InstrItinClass;
def IIC_DIV32    : InstrItinClass;
def IIC_DIV64    : InstrItinClass;
A single pipeline step (stage) of instruction execution is described by InstrStage:
class InstrStage<int cycles, list<FuncUnit> units,
                 int timeinc = -1,
                 ReservationKind kind = Required> {
  int Cycles           = cycles;      // length of stage in machine cycles
  list<FuncUnit> Units = units;       // choice of functional units
  int TimeInc          = timeinc;     // cycles till start of next stage
  int Kind             = kind.Value;  // kind of FU reservation
}
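Cycles is the number of cycles needed to complete this step (stage). Units is the choice of functional units that can complete the stage, for example IntUnit1 or IntUnit2. TimeInc is the number of cycles from the start of this stage to the start of the next stage in the execution sequence. A stage can therefore be specified in one of two ways:
InstrStage<1, [FU_x, FU_y]>     - TimeInc defaults to Cycles
InstrStage<1, [FU_x, FU_y], 0>  - TimeInc specified explicitly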
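To make the interplay of Cycles and TimeInc concrete, here is a minimal Python sketch that computes the cycle window each stage occupies, counted from issue. The Stage class and stage_windows function are hypothetical helpers, not LLVM APIs; we assume TimeInc = -1 means "use Cycles", as the default above suggests, and we ignore Units and Kind when computing times.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stage:
    """Toy mirror of TableGen's InstrStage (only what timing needs)."""
    cycles: int        # Cycles: length of the stage in machine cycles
    units: List[str]   # Units: any one of these functional units may be used
    timeinc: int = -1  # TimeInc: -1 is assumed to mean "same as cycles"


def stage_windows(stages: List[Stage]) -> List[Tuple[int, int]]:
    """Return a (start, end) cycle window per stage, relative to issue."""
    windows = []
    start = 0
    for s in stages:
        windows.append((start, start + s.cycles))
        # The next stage begins TimeInc cycles after this one begins.
        start += s.cycles if s.timeinc == -1 else s.timeinc
    return windows
```

With TimeInc = 0 the following stage starts in the same cycle as the current one, so the two stages (and their functional units) overlap in time.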
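How an instruction (class) proceeds through an (in-order) pipeline is described by definitions derived from InstrItinData, which bind an InstrItinClass to a list of stages: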
class InstrItinData<InstrItinClass Class, list<InstrStage> stages,
                    list<int> operandcycles = [],
                    list<Bypass> bypasses = [], int uops = 1> {
  InstrItinClass TheClass = Class;
  int NumMicroOps = uops;
  list<InstrStage> Stages = stages;
  list<int> OperandCycles = operandcycles;
  list<Bypass> Bypasses = bypasses;
}
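NumMicroOps is the number of micro-operations that an instruction of this class decodes into. A value of 0 means the instruction decodes into a variable number of micro-ops that must be determined dynamically. This value relates directly to the itinerary's global IssueWidth property, which limits how many micro-ops can be issued per cycle.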
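As a rough illustration of how NumMicroOps interacts with IssueWidth, the sketch below counts how many cycles an in-order front end would need to issue a straight-line sequence. issue_cycles is a hypothetical helper, and its model (micro-ops of consecutive instructions may share a cycle, no other stalls) is our simplification, not LLVM's hazard recognizer.

```python
from typing import Callable, List


def issue_cycles(uops_per_inst: List[int], issue_width: int,
                 dynamic_uops: Callable[[int], int] = lambda idx: 1) -> int:
    """Count cycles needed to issue a sequence under an IssueWidth limit.

    uops_per_inst[i] plays the role of NumMicroOps; a value of 0 means the
    count is only known dynamically, so dynamic_uops(i) is asked instead.
    """
    total = 0
    for i, n in enumerate(uops_per_inst):
        total += n if n > 0 else dynamic_uops(i)
    # Integer ceil(total / issue_width).
    return -(-total // issue_width)
```

For example, four single-uop instructions on a 2-wide machine need 2 cycles, while a sequence whose last instruction turns out to expand into 3 micro-ops needs one cycle more.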
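OperandCycles are optional "cycle counts". Each one gives the cycle, counted from the issue of the instruction, in which the corresponding operand is written (for a def) or read (for a use).
Bypasses are optional "pipeline forwarding paths" (the processor hands the result of an instruction that writes a value directly to a later instruction that reads it, skipping the round trip through the register file). If the value produced by one instruction is available on a particular bypass, and another instruction can read that value from the same bypass, the latency of that operand is reduced by 1 cycle.
Consider this example: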
InstrItinData<IIC_iLoad_i, [InstrStage<1, [A9_Pipe1]>,
                            InstrStage<1, [A9_AGU]>],
              [3, 1], [A9_LdBypass]>,
InstrItinData<IIC_iMVNr, [InstrStage<1, [A9_Pipe0, A9_Pipe1]>],
              [1, 1], [NoBypass, A9_LdBypass]>,
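Instructions of class IIC_iLoad_i read their input in cycle 1 after issue, and the result of the load is available in cycle 3. The result can also be obtained through the forwarding path A9_LdBypass. If an instruction of class IIC_iMVNr uses it as its first source operand, the operand latency is reduced by 1. (Operands are listed with the defs first, so the A9_LdBypass in the second slot of IIC_iMVNr's bypass list belongs to its first source operand.)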
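The latency arithmetic in this example fits in a few lines. The sketch below follows, as far as we know, the computation in LLVM's InstrItineraryData::getOperandLatency (def cycle minus use cycle plus one, then one cycle less when both operands name the same bypass). The ITINS table and the helper functions are hypothetical, and IIC_noBypass is an invented class added only for comparison.

```python
from typing import Dict, List, Optional

# Hypothetical mirror of the itinerary entries above: per-operand cycles and
# per-operand bypass names (None plays the role of NoBypass).
ITINS: Dict[str, Dict[str, List]] = {
    "IIC_iLoad_i":  {"cycles": [3, 1], "bypasses": ["A9_LdBypass"]},
    "IIC_iMVNr":    {"cycles": [1, 1], "bypasses": [None, "A9_LdBypass"]},
    "IIC_noBypass": {"cycles": [1, 1], "bypasses": []},  # invented class
}


def bypass(itin: str, idx: int) -> Optional[str]:
    """Bypass attached to operand idx, or None if the list is shorter."""
    lst = ITINS[itin]["bypasses"]
    return lst[idx] if idx < len(lst) else None


def operand_latency(def_class: str, def_idx: int,
                    use_class: str, use_idx: int) -> int:
    def_cycle = ITINS[def_class]["cycles"][def_idx]
    use_cycle = ITINS[use_class]["cycles"][use_idx]
    latency = def_cycle - use_cycle + 1
    d, u = bypass(def_class, def_idx), bypass(use_class, use_idx)
    if latency > 0 and d is not None and d == u:
        latency -= 1  # one cycle saved per matching forwarding path
    return latency
```

For the load feeding MVN this gives 3 - 1 + 1 = 3 cycles without forwarding and 2 cycles through A9_LdBypass.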
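For out-of-order processors such as Sandy Bridge and later microarchitectures, which can also issue several instructions per cycle, such a description can hardly exploit the instruction-level parallelism the processor offers. This is where SchedRW at line 433 of the Instruction definition comes in. It is a list corresponding to the instruction's input and output operands, describing how the instruction occupies processor resources while handling these operands. When resource usage does not conflict and there are no dependences, multiple instructions can then execute concurrently. We will discuss the details in later chapters.
Note that many instructions use both Itinerary and SchedRW to describe their scheduling. Which one actually takes effect depends on the target processor. As we will see below, a series of InstrItinData definitions is given for the Atom processor, so on Atom the Itinerary is used for scheduling, whereas for Sandy Bridge a series of SchedReadWrite definitions is given and instructions are scheduled through SchedRW.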
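The idea behind SchedRW, that instructions with no dependences and no conflicting resource use can run in the same cycle, can be illustrated with a toy port model. In this Python sketch the write names borrow LLVM's X86 SchedWrite names, but the port sets and latencies are made up, and the greedy schedule function is our own simplification rather than how LLVM's machine scheduler works.

```python
# Hypothetical per-SchedWrite resources: (ports it may use, latency in cycles).
# Names mimic LLVM's X86 SchedWrites; ports and latencies are invented.
WRITE_RES = {
    "WriteALU":  ({"P0", "P1", "P5"}, 1),
    "WriteIMul": ({"P1"}, 3),
}


def schedule(instrs):
    """Greedy toy scheduler: each instruction takes one free port for one
    cycle as soon as all of its source registers are ready."""
    ready = {}    # register -> first cycle its value can be read
    busy = set()  # (port, cycle) slots already taken
    starts = []
    for name, write, srcs, dst in instrs:
        ports, latency = WRITE_RES[write]
        cycle = max((ready.get(r, 0) for r in srcs), default=0)
        while True:
            free = sorted(p for p in ports if (p, cycle) not in busy)
            if free:
                busy.add((free[0], cycle))
                break
            cycle += 1  # every allowed port is taken: try the next cycle
        ready[dst] = cycle + latency
        starts.append((name, cycle))
    return starts


program = [
    ("add1", "WriteALU",  [],         "a"),
    ("add2", "WriteALU",  [],         "b"),
    ("mul",  "WriteIMul", ["a", "b"], "c"),
    ("add3", "WriteALU",  ["c"],      "d"),
]
```

Here the two additions share cycle 0 on different ports, the multiply waits for both results, and the last addition waits out the multiply's 3-cycle latency.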
2.2.7. Definition of X86 Instructions
The Instruction definition is target-independent, so targets almost always derive the instruction definitions they need from Instruction. Taking X86 as an example: X86 has a large and complex chip family, so it defines its own base class X86Inst, which extends Instruction considerably (X86InstrFormats.td):
220 class X86Inst<bits<8> opcod, Format f, ImmType i, dag outs, dag ins,
221               string AsmStr,
222               InstrItinClass itin,
223               Domain d = GenericDomain>
224   : Instruction {
225   let Namespace = "X86";
226
227   bits<8> Opcode = opcod;
228   Format Form = f;
229   bits<7> FormBits = Form.Value;
230   ImmType ImmT = i;
231
232   dag OutOperandList = outs;
233   dag InOperandList = ins;
234   string AsmString = AsmStr;
235
236   // If this is a pseudo instruction, mark it isCodeGenOnly.
237   let isCodeGenOnly = !eq(!cast<string>(f), "Pseudo");
238
239   let Itinerary = itin;
240
241   //
242   // Attributes specific to X86 instructions...
243   //
244   bit ForceDisassemble = 0;     // Force instruction to disassemble even though it's
245                                 // isCodeGenonly. Needed to hide an ambiguous
246                                 // AsmString from the parser, but still disassemble.
247
248   OperandSize OpSize = OpSizeFixed; // Does this instruction's encoding change
249                                     // based on operand size of the mode?
250   bits<2> OpSizeBits = OpSize.Value;
251   AddressSize AdSize = AdSizeX; // Does this instruction's encoding change
252                                 // based on address size of the mode?
253   bits<2> AdSizeBits = AdSize.Value;
254
255   Prefix OpPrefix = NoPrfx;     // Which prefix byte does this inst have?
256   bits<3> OpPrefixBits = OpPrefix.Value;
257   Map OpMap = OB;               // Which opcode map does this inst have?
258   bits<3> OpMapBits = OpMap.Value;
259   bit hasREX_WPrefix = 0;       // Does this inst require the REX.W prefix?
260   FPFormat FPForm = NotFP;      // What flavor of FP instruction is this?
261   bit hasLockPrefix = 0;        // Does this inst have a 0xF0 prefix?
262   Domain ExeDomain = d;
263   bit hasREPPrefix = 0;         // Does this inst have a REP prefix?
264   Encoding OpEnc = EncNormal;   // Encoding used by this instruction
265   bits<2> OpEncBits = OpEnc.Value;
266   bit hasVEX_WPrefix = 0;       // Does this inst set the VEX_W field?
267   bit hasVEX_4V = 0;            // Does this inst require the VEX.VVVV field?
268   bit hasVEX_4VOp3 = 0;         // Does this inst require the VEX.VVVV field to
269                                 // encode the third operand?
270   bit hasVEX_i8ImmReg = 0;      // Does this inst require the last source register
271                                 // to be encoded in a immediate field?
272   bit hasVEX_L = 0;             // Does this inst use large (256-bit) registers?
273   bit ignoresVEX_L = 0;         // Does this instruction ignore the L-bit
274   bit hasEVEX_K = 0;            // Does this inst require masking?
275   bit hasEVEX_Z = 0;            // Does this inst set the EVEX_Z field?
276   bit hasEVEX_L2 = 0;           // Does this inst set the EVEX_L2 field?
277   bit hasEVEX_B = 0;            // Does this inst set the EVEX_B field?
278   bits<3> CD8_Form = 0;         // Compressed disp8 form - vector-width.
279   // Declare it int rather than bits<4> so that all bits are defined when
280   // assigning to bits<7>.
281   int CD8_EltSize = 0;          // Compressed disp8 form - element-size in bytes.
282   bit has3DNow0F0FOpcode = 0;   // Wacky 3dNow! encoding?
283   bit hasMemOp4Prefix = 0;      // Same bit as VEX_W, but used for swapping operands
284   bit hasEVEX_RC = 0;           // Explicitly specified rounding control in FP instruction.
285
286   bits<2> EVEX_LL;
287   let EVEX_LL{0} = hasVEX_L;
288   let EVEX_LL{1} = hasEVEX_L2;
289   // Vector size in bytes.
290   bits<7> VectSize = !shl(16, EVEX_LL);
291
292   // The scaling factor for AVX512's compressed displacement is either
293   //   - the size of a power-of-two number of elements or
294   //   - the size of a single element for broadcasts or
295   //   - the total vector size divided by a power-of-two number.
296   // Possible values are: 0 (non-AVX512 inst), 1, 2, 4, 8, 16, 32 and 64.
297   bits<7> CD8_Scale = !if (!eq (OpEnc.Value, EncEVEX.Value),
298                            !if (CD8_Form{2},
299                                 !shl(CD8_EltSize, CD8_Form{1-0}),
300                                 !if (hasEVEX_B,
301                                      CD8_EltSize,
302                                      !srl(VectSize, CD8_Form{1-0}))), 0);
303
304   // TSFlags layout should be kept in sync with X86BaseInfo.h.
305   let TSFlags{6-0}   = FormBits;
306   let TSFlags{8-7}   = OpSizeBits;
307   let TSFlags{10-9}  = AdSizeBits;
308   let TSFlags{13-11} = OpPrefixBits;
309   let TSFlags{16-14} = OpMapBits;
310   let TSFlags{17}    = hasREX_WPrefix;
311   let TSFlags{21-18} = ImmT.Value;
312   let TSFlags{24-22} = FPForm.Value;
313   let TSFlags{25}    = hasLockPrefix;
314   let TSFlags{26}    = hasREPPrefix;
315   let TSFlags{28-27} = ExeDomain.Value;
316   let TSFlags{30-29} = OpEncBits;
317   let TSFlags{38-31} = Opcode;
318   let TSFlags{39}    = hasVEX_WPrefix;
319   let TSFlags{40}    = hasVEX_4V;
320   let TSFlags{41}    = hasVEX_4VOp3;
321   let TSFlags{42}    = hasVEX_i8ImmReg;
322   let TSFlags{43}    = hasVEX_L;
323   let TSFlags{44}    = ignoresVEX_L;
324   let TSFlags{45}    = hasEVEX_K;
325   let TSFlags{46}    = hasEVEX_Z;
326   let TSFlags{47}    = hasEVEX_L2;
327   let TSFlags{48}    = hasEVEX_B;
328   // If we run out of TSFlags bits, it's possible to encode this in 3 bits.
329   let TSFlags{55-49} = CD8_Scale;
330   let TSFlags{56}    = has3DNow0F0FOpcode;
331   let TSFlags{57}    = hasMemOp4Prefix;
332   let TSFlags{58}    = hasEVEX_RC;
333 }
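The VectSize and CD8_Scale expressions at lines 286-302 are the one piece of real arithmetic in this class. The following Python transcription is our own, assuming the same bit layout (CD8_Form{2} selects the element-based form and CD8_Form{1-0} is the shift amount); it may make the nested !if easier to read.

```python
def vect_size(has_vex_l: int, has_evex_l2: int) -> int:
    """bits<7> VectSize = !shl(16, EVEX_LL)."""
    evex_ll = has_vex_l | (has_evex_l2 << 1)  # EVEX_LL{0}, EVEX_LL{1}
    return 16 << evex_ll


def cd8_scale(is_evex: bool, cd8_form: int, cd8_elt_size: int,
              has_evex_b: int, vsize: int) -> int:
    """Scaling factor for AVX-512 compressed disp8 (0 for non-EVEX)."""
    if not is_evex:
        return 0
    low = cd8_form & 0b11              # CD8_Form{1-0}
    if cd8_form & 0b100:               # CD8_Form{2}: power-of-two elements
        return cd8_elt_size << low
    if has_evex_b:                     # broadcast: a single element
        return cd8_elt_size
    return vsize >> low                # vector size / power of two
```

For a non-EVEX instruction such as IMUL16rri this yields a VectSize of 16 and a CD8_Scale of 0.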
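The Form at line 228 specifies the instruction format; the related definitions are in X86InstrFormats.td. They describe things such as the X86 "mod-reg-r/m" byte. Everything from line 248 onward is likewise about instruction encoding. Encoding and format do not affect instruction selection or register allocation; they matter for emitting machine code and for generating the disassembler. So we skip them here.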
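From X86Inst, LLVM derives several further classes. They differ mainly in ImmT (line 230 of the X86Inst definition), that is, whether the instruction carries an immediate and how large it is. These classes are the basis for defining the actual instructions. Here are two examples (X86InstrFormats.td):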
class I<bits<8> o, Format f, dag outs, dag ins, string asm,
        list<dag> pattern, InstrItinClass itin = NoItinerary,
        Domain d = GenericDomain>
  : X86Inst<o, f, NoImm, outs, ins, asm, itin, d> {
  let Pattern = pattern;
  let CodeSize = 3;
}
class Ii8 <bits<8> o, Format f, dag outs, dag ins, string asm,
           list<dag> pattern, InstrItinClass itin = NoItinerary,
           Domain d = GenericDomain>
  : X86Inst<o, f, Imm8, outs, ins, asm, itin, d> {
  let Pattern = pattern;
  let CodeSize = 3;
}
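The way I and Ii8 specialize X86Inst (fixing ImmT, supplying defaults for itin and d, and adding two lets) resembles ordinary function composition. The Python sketch below models the classes as record-building functions; the dict-based records are a hypothetical stand-in for TableGen records, and only a handful of fields are modeled.

```python
from typing import Any, Dict, List


def X86Inst(opcod: int, f: str, i: str, outs: List[str], ins: List[str],
            asm_str: str, itin: str, d: str = "GenericDomain") -> Dict[str, Any]:
    """Toy record for `class X86Inst<...>` (a few fields only)."""
    return {
        "Namespace": "X86",
        "Opcode": opcod,
        "Form": f,
        "ImmT": i,
        "OutOperandList": outs,
        "InOperandList": ins,
        "AsmString": asm_str,
        "Itinerary": itin,  # let Itinerary = itin;
        "ExeDomain": d,
        # let isCodeGenOnly = !eq(!cast<string>(f), "Pseudo");
        "isCodeGenOnly": int(f == "Pseudo"),
    }


def I(o, f, outs, ins, asm, pattern, itin="NoItinerary", d="GenericDomain"):
    """class I: no immediate (ImmT = NoImm)."""
    rec = X86Inst(o, f, "NoImm", outs, ins, asm, itin, d)
    rec.update(Pattern=pattern, CodeSize=3)  # the two `let`s in the body
    return rec


def Ii8(o, f, outs, ins, asm, pattern, itin="NoItinerary", d="GenericDomain"):
    """class Ii8: one 8-bit immediate (ImmT = Imm8)."""
    rec = X86Inst(o, f, "Imm8", outs, ins, asm, itin, d)
    rec.update(Pattern=pattern, CodeSize=3)
    return rec


# Roughly what `def DIV8r : I<0xF6, MRM6r, ...>` would produce in this model.
DIV8r = I(0xF6, "MRM6r", [], ["GR8:$src"], "div{b}\t$src", [], "IIC_DIV8_REG")
```

A def that omits itin simply falls back to NoItinerary, just as the TableGen default argument does.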
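The definition "I" takes no immediate, while "Ii8" carries an immediate of type i8. The parameter itin is related to instruction scheduling and describes the instruction's itinerary (its route through the CPU's execution units); in these base classes it defaults to NoItinerary, i.e. no itinerary. Examples of defs derived from these two classes are (X86InstrArithmetic.td):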
let hasSideEffects = 1 in { // so that we don't speculatively execute
let SchedRW = [WriteIDiv] in {
let Defs = [AL, AH, EFLAGS], Uses = [AX] in
def DIV8r  : I<0xF6, MRM6r, (outs), (ins GR8:$src),    // AX/r8 = AL,AH
               "div{b}\t$src", [], IIC_DIV8_REG>;
let Defs = [AX, DX, EFLAGS], Uses = [AX, DX] in
def DIV16r : I<0xF7, MRM6r, (outs), (ins GR16:$src),   // DX:AX/r16 = AX,DX
               "div{w}\t$src", [], IIC_DIV16>, OpSize16;
let Defs = [EAX, EDX, EFLAGS], Uses = [EAX, EDX] in
def DIV32r : I<0xF7, MRM6r, (outs), (ins GR32:$src),   // EDX:EAX/r32 = EAX,EDX
               "div{l}\t$src", [], IIC_DIV32>, OpSize32;
// RDX:RAX/r64 = RAX,RDX
let Defs = [RAX, RDX, EFLAGS], Uses = [RAX, RDX] in
def DIV64r : RI<0xF7, MRM6r, (outs), (ins GR64:$src),
                "div{q}\t$src", [], IIC_DIV64>;
} // SchedRW
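The comments explain what these instructions do. The IIC_XXX records are specific instruction itinerary classes, which we will study when we get to instruction scheduling. MRM6r is the instruction format, which the disassembler needs. Defs and Uses describe the registers used besides the explicit operands (implicit registers): Defs is the set of registers whose contents are modified, and Uses is the set of registers whose contents are read.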
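Because Defs and Uses are plain register lists, ordering constraints can be derived from them just as from explicit operands. Here is a minimal sketch that classifies the dependences between two instructions from their implicit register sets; the dict encoding and implicit_deps are hypothetical, and register aliasing (the overlap of AL/AX/EAX/RAX) is deliberately ignored.

```python
from typing import List, Tuple

# Implicit register sets copied from the DIV32r definition above.
DIV32R = {"defs": {"EAX", "EDX", "EFLAGS"}, "uses": {"EAX", "EDX"}}


def implicit_deps(first, second) -> List[Tuple[str, str]]:
    """Dependences of `second` on `first` (program order) via implicit regs."""
    deps = []
    for r in sorted(first["defs"] & second["uses"]):
        deps.append(("RAW", r))  # second reads what first wrote
    for r in sorted(first["uses"] & second["defs"]):
        deps.append(("WAR", r))  # second overwrites what first read
    for r in sorted(first["defs"] & second["defs"]):
        deps.append(("WAW", r))  # both write the same register
    return deps
```

A later instruction that reads EDX (the remainder) gets a read-after-write dependence on the division, while one that merely sets EFLAGS only has an output dependence on it.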
let Defs = [EFLAGS] in {
let SchedRW = [WriteIMul] in {
// Register-Integer Signed Integer Multiply
def IMUL16rri  : Ii16<0x69, MRMSrcReg,                     // GR16 = GR16*I16
                      (outs GR16:$dst), (ins GR16:$src1, i16imm:$src2),
                      "imul{w}\t{$src2, $src1, $dst|$dst, $src1, $src2}",
                      [(set GR16:$dst, EFLAGS,
                            (X86smul_flag GR16:$src1, imm:$src2))],
                      IIC_IMUL16_RRI>, OpSize16;
def IMUL16rri8 : Ii8<0x6B, MRMSrcReg,                      // GR16 = GR16*I8
                     (outs GR16:$dst), (ins GR16:$src1, i16i8imm:$src2),
                     "imul{w}\t{$src2, $src1, $dst|$dst, $src1, $src2}",
                     [(set GR16:$dst, EFLAGS,
                           (X86smul_flag GR16:$src1, i16immSExt8:$src2))],
                     IIC_IMUL16_RRI>, OpSize16;
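Each of these two defs has two base classes: besides Ii8 or Ii16, there is OpSize16. X86InstrFormats.td contains two things named OpSize16, a class and a def; the class version is used here. A def cannot serve as a base class (much like a class declared final). OpSize16 indicates that the instruction uses 16-bit operands, so in 32-bit and 64-bit modes, where the default operand size is 32 bits, it needs the 0x66 prefix (the operand-size override prefix). Clearly this is again encoding information, used by the encoder and the disassembler.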
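Note that none of the DIVXr definitions specifies a Pattern, which means DIVXr need no pattern matching. How does that work? The secret lies in X86DAGToDAGISel::Select. This method handles the node kinds for which the X86 target does not use generic (pattern-driven) instruction selection. For example, it matches ISD::UDIVREM to DIVXr (ISD::SDIVREM is matched to the signed IDIV family in the same way). ISD::UDIVREM is obtained during legalization from SDNodes of type ISD::UDIV, and those ISD::UDIV nodes are in turn generated from LLVM IR by SelectionDAGBuilder (visitUDiv; the signed counterpart is visitSDiv). This is a long journey, and we will spend many pages on it.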
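Conceptually, the custom part of X86DAGToDAGISel::Select is a switch on the node opcode that picks an instruction by value type before falling back to the TableGen-generated matcher. The Python sketch below shows only that dispatch shape. The table and select_node are hypothetical; the real code also builds the copies into and out of AX/DX (or their wider forms), and here UDIVREM maps to DIV while SDIVREM maps to IDIV, since DIV is the unsigned divide on x86.

```python
from typing import Optional

# Hypothetical table: (generic opcode, value type) -> X86 instruction.
DIVREM_TABLE = {
    ("UDIVREM", "i8"):  "DIV8r",  ("SDIVREM", "i8"):  "IDIV8r",
    ("UDIVREM", "i16"): "DIV16r", ("SDIVREM", "i16"): "IDIV16r",
    ("UDIVREM", "i32"): "DIV32r", ("SDIVREM", "i32"): "IDIV32r",
    ("UDIVREM", "i64"): "DIV64r", ("SDIVREM", "i64"): "IDIV64r",
}


def select_node(opcode: str, vt: str) -> Optional[str]:
    """Return a hand-picked instruction, or None to defer to the
    TableGen-generated matcher (SelectCode in LLVM)."""
    if opcode in ("UDIVREM", "SDIVREM"):
        return DIVREM_TABLE[(opcode, vt)]
    return None
```

Nodes such as a plain add are not handled here and fall through to the generated matcher.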
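The matching pattern in the IMUL16rri definition reads as follows: because the operator of the pattern is set, the innermost dag value is interpreted as the source pattern (the one to be matched), while (IMUL16rri GR16:$src1, imm:$src2) is called the target pattern (the target-machine DAG to be produced once the match succeeds; TableGen generates it when processing this definition).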
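In addition, X86smul_flag is specific to the X86 target, so this pattern does not match SDNodes generated directly from LLVM IR. Therefore X86InstrCompiler.td also contains an anonymous Pat definition like this:
def : Pat<(mul GR16:$src1, imm:$src2), (IMUL16rri GR16:$src1, imm:$src2)>;
This definition matches the generic DAG form generated from LLVM IR directly to the target pattern of the IMULX definition.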
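The source-pattern/target-pattern idea behind this anonymous Pat can be shown with a tiny tree matcher. In this Python sketch DAG nodes are tuples, pattern leaves such as "GR16:$src1" bind variables, and apply_pat rewrites a matched node into the target pattern. All names are hypothetical, and LLVM's generated matcher is a table-driven state machine rather than a recursive function like this.

```python
# DAG nodes are tuples: (operator, child, ...); leaves are ("GR16", reg) or
# ("imm", value). Pattern leaves are strings such as "GR16:$src1".


def match(pat, node, env) -> bool:
    """Match node against pat, recording variable bindings in env."""
    if isinstance(pat, str):  # leaf pattern, e.g. "imm:$src2"
        kind, var = pat.split(":")
        if isinstance(node, tuple) and len(node) == 2 and node[0] == kind:
            env[var] = node
            return True
        return False
    op, *pat_kids = pat
    if not isinstance(node, tuple) or node[0] != op or len(node) != len(pat):
        return False
    return all(match(p, n, env) for p, n in zip(pat_kids, node[1:]))


def rewrite(tmpl, env):
    """Instantiate the target pattern with the bound variables."""
    if isinstance(tmpl, str):
        return env[tmpl.split(":")[1]]
    op, *kids = tmpl
    return (op, *(rewrite(k, env) for k in kids))


def apply_pat(src, dst, node):
    env = {}
    return rewrite(dst, env) if match(src, node, env) else None


SRC = ("mul", "GR16:$src1", "imm:$src2")        # (mul GR16:$src1, imm:$src2)
DST = ("IMUL16rri", "GR16:$src1", "imm:$src2")  # (IMUL16rri GR16:$src1, imm:$src2)
```

A generic mul of a 16-bit register by an immediate is rewritten into an IMUL16rri node, while a mul of two registers does not match this particular Pat.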
2.2.8. An Example of Instruction Expansion
After expansion, IMUL16rri looks like this:
def IMUL16rri {  // Instruction X86Inst Ii16 OpSize16
Domain X86Inst:d = GenericDomain;
string Namespace = "X86";
dag OutOperandList = (outs GR16:$dst);
dag InOperandList = (ins GR16:$src1, i16imm:$src2);
string AsmString = "imul{w} {$src2, $src1, $dst|$dst, $src1,$src2}";
list<dag> Pattern = [(set GR16:$dst,EFLAGS, (X86smul_flag GR16:$src1, imm:$src2))];
list<Register> Uses = [];
list<Register> Defs = [EFLAGS];
list<Predicate> Predicates = [];
int Size = 0;
string DecoderNamespace = "";
int CodeSize = 3;
int AddedComplexity = 0;
bit isReturn = 0;
bit isBranch = 0;
bit isIndirectBranch = 0;
bit isCompare = 0;
bit isMoveImm = 0;
bit isBitcast = 0;
bit isSelect = 0;
bit isBarrier = 0;
bit isCall = 0;
bit canFoldAsLoad = 0;
bit mayLoad = ?;
bit mayStore = ?;
bit isConvertibleToThreeAddress = 0;
bit isCommutable = 0;
bit isTerminator = 0;
bit isReMaterializable = 0;
bit isPredicable = 0;
bit hasDelaySlot = 0;
bit usesCustomInserter = 0;
bit hasPostISelHook = 0;
bit hasCtrlDep = 0;
bit isNotDuplicable = 0;
bit isConvergent = 0;
bit isAsCheapAsAMove = 0;
bit hasExtraSrcRegAllocReq = 0;
bit hasExtraDefRegAllocReq = 0;
bit isRegSequence = 0;
bit isPseudo = 0;
bit isExtractSubreg = 0;
bit isInsertSubreg = 0;
bit hasSideEffects = ?;
bit isCodeGenOnly = 0;
bit isAsmParserOnly = 0;
InstrItinClass Itinerary = IIC_IMUL16_RRI;
list<SchedReadWrite> SchedRW = [WriteIMul];
string Constraints = "";
string DisableEncoding = "";
string PostEncoderMethod = "";
string DecoderMethod = "";
bits<64> TSFlags = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1 };
string AsmMatchConverter = "";
string TwoOperandAliasConstraint = "";
bit UseNamedOperandTable = 0;
bits<8> Opcode = { 0, 1, 1, 0, 1, 0, 0, 1 };
Format Form = MRMSrcReg;
bits<7> FormBits = { 0, 0, 0, 0, 1, 0, 1 };
ImmType ImmT = Imm16;
bit ForceDisassemble = 0;
OperandSize OpSize = OpSize16;
bits<2> OpSizeBits = { 0, 1 };
AddressSize AdSize = AdSizeX;
bits<2> AdSizeBits = { 0, 0 };
Prefix OpPrefix = NoPrfx;
bits<3> OpPrefixBits = { 0, 0, 0 };
Map OpMap = OB;
bits<3> OpMapBits = { 0, 0, 0 };
bit hasREX_WPrefix = 0;
FPFormat FPForm = NotFP;
bit hasLockPrefix = 0;
Domain ExeDomain = GenericDomain;
bit hasREPPrefix = 0;
Encoding OpEnc = EncNormal;
bits<2> OpEncBits = { 0, 0 };
bit hasVEX_WPrefix = 0;
bit hasVEX_4V = 0;
bit hasVEX_4VOp3 = 0;
bit hasVEX_i8ImmReg = 0;
bit hasVEX_L = 0;
bit ignoresVEX_L = 0;
bit hasEVEX_K = 0;
bit hasEVEX_Z = 0;
bit hasEVEX_L2 = 0;
bit hasEVEX_B = 0;
bits<3> CD8_Form = { 0, 0, 0 };
int CD8_EltSize = 0;
bit has3DNow0F0FOpcode = 0;
bit hasMemOp4Prefix = 0;
bit hasEVEX_RC = 0;
bits<2> EVEX_LL = { 0, 0 };
bits<7> VectSize = { 0, 0, 1, 0, 0, 0, 0 };
bits<7> CD8_Scale = { 0, 0, 0, 0, 0, 0, 0 };
string NAME = ?;
}
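The TSFlags value in this dump can be cross-checked against the layout at lines 305-332 of X86Inst. The Python sketch below (helper names are ours) turns the printed bits<64> literal, which TableGen prints most-significant bit first, back into an integer and extracts a few fields.

```python
# TSFlags of IMUL16rri as printed above, most significant bit (bit 63) first.
TSFLAGS_BITS = [
    0, 0, 0, 0, 0, 0, 0, 0,  # bits 63-56
    0, 0, 0, 0, 0, 0, 0, 0,  # bits 55-48
    0, 0, 0, 0, 0, 0, 0, 0,  # bits 47-40
    0, 0, 1, 1, 0, 1, 0, 0,  # bits 39-32
    1, 0, 0, 0, 0, 0, 0, 0,  # bits 31-24
    0, 0, 0, 0, 1, 1, 0, 0,  # bits 23-16
    0, 0, 0, 0, 0, 0, 0, 0,  # bits 15-8
    1, 0, 0, 0, 0, 1, 0, 1,  # bits 7-0
]


def to_int(bits_msb_first):
    """Fold an MSB-first bit list into an integer."""
    value = 0
    for b in bits_msb_first:
        value = (value << 1) | b
    return value


def field(value, hi, lo):
    """Extract TSFlags{hi-lo}."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


ts = to_int(TSFLAGS_BITS)
decoded = {
    "FormBits":   field(ts, 6, 0),    # TSFlags{6-0}
    "OpSizeBits": field(ts, 8, 7),    # TSFlags{8-7}
    "ImmT":       field(ts, 21, 18),  # TSFlags{21-18}
    "Opcode":     field(ts, 38, 31),  # TSFlags{38-31}
    "CD8_Scale":  field(ts, 55, 49),  # TSFlags{55-49}
}
```

The decoded Opcode (0x69), FormBits (5) and OpSizeBits (1) agree with the Opcode, FormBits and OpSizeBits fields printed in the same record; the ImmT field decodes to 3, the value TableGen assigned to Imm16.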
Many related definitions have been folded into this record, including the inlining of PatFrags, which is described below.