
LLVM Study Notes (7)

2017-04-07 11:44

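2.2.6. Scheduling information

The Itinerary field (line 430 of the Instruction definition) and the SchedRW field (line 433) describe how an instruction is scheduled.

Itinerary describes an instruction in terms of its execution steps. A target derives one definition from InstrItinClass for each class of instruction. For a processor like X86, whose instructions are complex and which exists in many variants, a great many InstrItinClass definitions are needed. They all live in X86Schedule.td, where almost every instruction (or group of instructions) has its own InstrItinClass definition. Note that these definitions are really meant for in-order pipelines such as Atom, so instructions that such machines do not support (the AVX instruction set, for example) need no InstrItinClass. As an example, the InstrItinClass definitions for division look like this: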

// div
def IIC_DIV8_MEM   : InstrItinClass;
def IIC_DIV8_REG   : InstrItinClass;
def IIC_DIV16      : InstrItinClass;
def IIC_DIV32      : InstrItinClass;
def IIC_DIV64      : InstrItinClass;

A single pipeline step in the execution of an instruction is described by InstrStage:

class InstrStage<int cycles, list<FuncUnit> units,
                 int timeinc = -1,
                 ReservationKind kind = Required> {
  int Cycles           = cycles;       // length of stage in machine cycles
  list<FuncUnit> Units = units;        // choice of functional units
  int TimeInc          = timeinc;      // cycles till start of next stage
  int Kind             = kind.Value;   // kind of FU reservation
}

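Cycles is the number of cycles needed to complete this step (stage). Units lists the functional units that may be chosen to carry out the stage, for example IntUnit1 or IntUnit2. TimeInc is the number of cycles from the start of this stage to the start of the next stage in the itinerary. A stage can be specified in either of two ways: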

InstrStage<1, [FU_x, FU_y]>     - TimeInc defaults to Cycles
InstrStage<1, [FU_x, FU_y], 0>  - TimeInc given explicitly
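The effect of TimeInc is easiest to see by computing when each stage of an itinerary starts. The following is a minimal Python sketch of that rule, not LLVM code: the Stage type and the stage_start_cycles helper are hypothetical names introduced here, and functional units are left out because they do not affect the timing.

```python
from typing import List, NamedTuple


class Stage(NamedTuple):
    cycles: int          # length of the stage in machine cycles
    time_inc: int = -1   # cycles until the next stage starts; -1 means "use cycles"


def stage_start_cycles(stages: List[Stage]) -> List[int]:
    """Return the cycle, counted from issue, at which each stage begins."""
    starts = []
    current = 0
    for stage in stages:
        starts.append(current)
        # A negative TimeInc falls back to the length of the stage itself.
        current += stage.time_inc if stage.time_inc >= 0 else stage.cycles
    return starts


# Default TimeInc: every stage starts when the previous one ends.
back_to_back = stage_start_cycles([Stage(1), Stage(2), Stage(1)])
# An explicit TimeInc of 0 makes the second stage start together with the first.
overlapped = stage_start_cycles([Stage(1, 0), Stage(2), Stage(1)])
```

With the default, the three stages start on cycles 0, 1 and 3; with an explicit TimeInc of 0 on the first stage, the first two stages overlap and start on cycle 0.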

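How an instruction (or class of instructions) flows through an in-order pipeline is then specified by an InstrItinData definition, which binds an InstrItinClass to its stages.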

class InstrItinData<InstrItinClass Class, list<InstrStage> stages,
                    list<int> operandcycles = [],
                    list<Bypass> bypasses = [], int uops = 1> {
  InstrItinClass TheClass = Class;
  int NumMicroOps = uops;
  list<InstrStage> Stages = stages;
  list<int> OperandCycles = operandcycles;
  list<Bypass> Bypasses = bypasses;
}

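NumMicroOps is the number of micro-operations that each instruction in the class decodes to. A value of 0 means the instruction decodes into a variable number of micro-operations that must be determined dynamically. This relates directly to the itinerary-wide IssueWidth property, which limits the number of micro-operations that can be issued per cycle.

OperandCycles is an optional list of cycle counts. Each entry gives the cycle, counted from instruction issue, at which the corresponding operand is written (defined) or read.

Bypasses is an optional list of pipeline forwarding paths, through which the processor hands the result of a writing instruction directly to a later reading instruction without the round trip through the register file. If the value defined by one instruction is available on a particular bypass and the using instruction can read from that same bypass, the operand's use latency is reduced by 1 cycle.

So in this example: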

InstrItinData<IIC_iLoad_i, [InstrStage<1, [A9_Pipe1]>,
                            InstrStage<1, [A9_AGU]>],
                           [3, 1], [A9_LdBypass]>,
InstrItinData<IIC_iMVNr,   [InstrStage<1, [A9_Pipe0, A9_Pipe1]>],
                           [1, 1], [NoBypass, A9_LdBypass]>,

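an instruction of class IIC_iLoad_i reads its input on cycle 1 after issue, and the loaded result is available on cycle 3. The result can be obtained through the forwarding path A9_LdBypass. If it is used by the first source operand of an IIC_iMVNr instruction, the operand latency is reduced by 1.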
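The latency arithmetic behind this example can be written down as a small Python sketch. The operand cycles and bypasses are copied from the two InstrItinData entries above; the baseline formula (definition cycle minus use cycle plus one) is an assumption modelled on how LLVM's itinerary data computes operand latency, and the table layout and function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

NO_BYPASS = None  # stands in for NoBypass

# itinerary class -> (operand cycles, bypasses), copied from the example above
ITINERARIES: Dict[str, Tuple[List[int], List[Optional[str]]]] = {
    "IIC_iLoad_i": ([3, 1], ["A9_LdBypass"]),
    "IIC_iMVNr": ([1, 1], [NO_BYPASS, "A9_LdBypass"]),
}


def bypass(itin_class: str, operand_index: int) -> Optional[str]:
    """Forwarding path listed for an operand, or NO_BYPASS if none is listed."""
    bypasses = ITINERARIES[itin_class][1]
    return bypasses[operand_index] if operand_index < len(bypasses) else NO_BYPASS


def operand_latency(def_class: str, def_index: int,
                    use_class: str, use_index: int) -> int:
    """Cycles between issuing the defining and the using instruction."""
    def_cycle = ITINERARIES[def_class][0][def_index]
    use_cycle = ITINERARIES[use_class][0][use_index]
    latency = def_cycle - use_cycle + 1  # assumed baseline formula
    def_bypass = bypass(def_class, def_index)
    # A forwarding path shared by producer and consumer saves one cycle.
    if latency > 0 and def_bypass is not NO_BYPASS \
            and def_bypass == bypass(use_class, use_index):
        latency -= 1
    return latency


# Load result (operand 0) feeding the first source (operand 1) of an MVN.
with_bypass = operand_latency("IIC_iLoad_i", 0, "IIC_iMVNr", 1)
# Load result feeding the address operand of another load: no bypass listed.
without_bypass = operand_latency("IIC_iLoad_i", 0, "IIC_iLoad_i", 1)
```

Under these assumptions the load-to-MVN latency is 2 cycles, against 3 cycles for a consumer that cannot read from A9_LdBypass.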

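For processors that execute out of order and issue several instructions per cycle, such as Sandy Bridge and its successors, a description of this kind makes it hard to exploit the instruction-level parallelism the processor offers. Here SchedRW (line 433 of the Instruction definition) is used instead. It is a list whose entries correspond to the instruction's output and input operands and describe how the instruction occupies processor resources while handling them. Multiple instructions can then execute concurrently as long as their resource usage does not conflict and they do not depend on each other. We discuss the details in a later chapter.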

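Note that many instructions describe their scheduling with both Itinerary and SchedRW. Which one actually takes effect depends on the target processor. As we will see below, a set of InstrItinData definitions exists for Atom, so on Atom scheduling uses Itinerary. For Sandy Bridge a set of SchedReadWrite definitions is provided instead, and instructions are scheduled through SchedRW.

2.2.7. Definition of X86 instructions

The definition of Instruction is target independent, so a target almost always derives the instruction definitions it needs from Instruction. Take X86: it is a large and complex processor family, so it defines its own base class X86Inst (in X86InstrFormats.td), which extends Instruction considerably: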

220  class X86Inst<bits<8> opcod, Format f, ImmType i, dag outs, dag ins,
221                string AsmStr,
222                InstrItinClass itin,
223                Domain d = GenericDomain>
224    : Instruction {
225    let Namespace = "X86";
226
227    bits<8> Opcode = opcod;
228    Format Form = f;
229    bits<7> FormBits = Form.Value;
230    ImmType ImmT = i;
231
232    dag OutOperandList = outs;
233    dag InOperandList = ins;
234    string AsmString = AsmStr;
235
236    // If this is a pseudo instruction, mark it isCodeGenOnly.
237    let isCodeGenOnly = !eq(!cast<string>(f), "Pseudo");
238
239    let Itinerary = itin;
240
241    //
242    // Attributes specific to X86 instructions...
243    //
244    bit ForceDisassemble = 0; // Force instruction to disassemble even though it's
245                              // isCodeGenonly. Needed to hide an ambiguous
246                              // AsmString from the parser, but still disassemble.
247
248    OperandSize OpSize = OpSizeFixed; // Does this instruction's encoding change
249                                      // based on operand size of the mode?
250    bits<2> OpSizeBits = OpSize.Value;
251    AddressSize AdSize = AdSizeX; // Does this instruction's encoding change
252                                  // based on address size of the mode?
253    bits<2> AdSizeBits = AdSize.Value;
254
255    Prefix OpPrefix = NoPrfx; // Which prefix byte does this inst have?
256    bits<3> OpPrefixBits = OpPrefix.Value;
257    Map OpMap = OB;           // Which opcode map does this inst have?
258    bits<3> OpMapBits = OpMap.Value;
259    bit hasREX_WPrefix = 0;   // Does this inst require the REX.W prefix?
260    FPFormat FPForm = NotFP;  // What flavor of FP instruction is this?
261    bit hasLockPrefix = 0;    // Does this inst have a 0xF0 prefix?
262    Domain ExeDomain = d;
263    bit hasREPPrefix = 0;     // Does this inst have a REP prefix?
264    Encoding OpEnc = EncNormal; // Encoding used by this instruction
265    bits<2> OpEncBits = OpEnc.Value;
266    bit hasVEX_WPrefix = 0;   // Does this inst set the VEX_W field?
267    bit hasVEX_4V = 0;        // Does this inst require the VEX.VVVV field?
268    bit hasVEX_4VOp3 = 0;     // Does this inst require the VEX.VVVV field to
269                              // encode the third operand?
270    bit hasVEX_i8ImmReg = 0;  // Does this inst require the last source register
271                              // to be encoded in a immediate field?
272    bit hasVEX_L = 0;         // Does this inst use large (256-bit) registers?
273    bit ignoresVEX_L = 0;     // Does this instruction ignore the L-bit
274    bit hasEVEX_K = 0;        // Does this inst require masking?
275    bit hasEVEX_Z = 0;        // Does this inst set the EVEX_Z field?
276    bit hasEVEX_L2 = 0;       // Does this inst set the EVEX_L2 field?
277    bit hasEVEX_B = 0;        // Does this inst set the EVEX_B field?
278    bits<3> CD8_Form = 0;     // Compressed disp8 form - vector-width.
279    // Declare it int rather than bits<4> so that all bits are defined when
280    // assigning to bits<7>.
281    int CD8_EltSize = 0;      // Compressed disp8 form - element-size in bytes.
282    bit has3DNow0F0FOpcode = 0; // Wacky 3dNow! encoding?
283    bit hasMemOp4Prefix = 0;  // Same bit as VEX_W, but used for swapping operands
284    bit hasEVEX_RC = 0;       // Explicitly specified rounding control in FP instruction.
285
286    bits<2> EVEX_LL;
287    let EVEX_LL{0} = hasVEX_L;
288    let EVEX_LL{1} = hasEVEX_L2;
289    // Vector size in bytes.
290    bits<7> VectSize = !shl(16, EVEX_LL);
291
292    // The scaling factor for AVX512's compressed displacement is either
293    //   - the size of a power-of-two number of elements or
294    //   - the size of a single element for broadcasts or
295    //   - the total vector size divided by a power-of-two number.
296    // Possible values are: 0 (non-AVX512 inst), 1, 2, 4, 8, 16, 32 and 64.
297    bits<7> CD8_Scale = !if (!eq (OpEnc.Value, EncEVEX.Value),
298                             !if (CD8_Form{2},
299                                  !shl(CD8_EltSize, CD8_Form{1-0}),
300                                  !if (hasEVEX_B,
301                                       CD8_EltSize,
302                                       !srl(VectSize, CD8_Form{1-0}))), 0);
303
304    // TSFlags layout should be kept in sync with X86BaseInfo.h.
305    let TSFlags{6-0}   = FormBits;
306    let TSFlags{8-7}   = OpSizeBits;
307    let TSFlags{10-9}  = AdSizeBits;
308    let TSFlags{13-11} = OpPrefixBits;
309    let TSFlags{16-14} = OpMapBits;
310    let TSFlags{17}    = hasREX_WPrefix;
311    let TSFlags{21-18} = ImmT.Value;
312    let TSFlags{24-22} = FPForm.Value;
313    let TSFlags{25}    = hasLockPrefix;
314    let TSFlags{26}    = hasREPPrefix;
315    let TSFlags{28-27} = ExeDomain.Value;
316    let TSFlags{30-29} = OpEncBits;
317    let TSFlags{38-31} = Opcode;
318    let TSFlags{39}    = hasVEX_WPrefix;
319    let TSFlags{40}    = hasVEX_4V;
320    let TSFlags{41}    = hasVEX_4VOp3;
321    let TSFlags{42}    = hasVEX_i8ImmReg;
322    let TSFlags{43}    = hasVEX_L;
323    let TSFlags{44}    = ignoresVEX_L;
324    let TSFlags{45}    = hasEVEX_K;
325    let TSFlags{46}    = hasEVEX_Z;
326    let TSFlags{47}    = hasEVEX_L2;
327    let TSFlags{48}    = hasEVEX_B;
328    // If we run out of TSFlags bits, it's possible to encode this in 3 bits.
329    let TSFlags{55-49} = CD8_Scale;
330    let TSFlags{56}    = has3DNow0F0FOpcode;
331    let TSFlags{57}    = hasMemOp4Prefix;
332    let TSFlags{58}    = hasEVEX_RC;
333  }

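Form on line 228 specifies the format of the instruction; the related definitions are in X86InstrFormats.td. They describe, among other things, the contents of the "mod-reg-r/m" byte of an X86 instruction. Everything from line 248 onward also concerns instruction encoding. Encoding and format affect neither instruction selection nor allocation; they matter only when instructions are encoded and when the disassembler is generated. We therefore skip them here.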
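Although the encoding fields are skipped here, the way the `let TSFlags{hi-lo} = ...` lines compose a single 64-bit word is simple enough to sketch. The Python below is an illustration only: pack_fields is a hypothetical helper, the bit positions come from the TSFlags assignments in the listing above, and the field values are the ones that appear for IMUL16rri in section 2.2.8.

```python
from typing import Iterable, Tuple


def pack_fields(fields: Iterable[Tuple[int, int, int]]) -> int:
    """Pack (low_bit, width, value) triples into one integer word."""
    word = 0
    for low_bit, width, value in fields:
        if value < 0 or value >> width:
            raise ValueError(f"value {value} does not fit in {width} bits")
        word |= value << low_bit
    return word


# Non-zero fields of IMUL16rri; every other TSFlags field is 0 for it.
imul16rri_tsflags = pack_fields([
    (0, 7, 0b0000101),  # TSFlags{6-0}   = FormBits (MRMSrcReg)
    (7, 2, 0b01),       # TSFlags{8-7}   = OpSizeBits (OpSize16)
    (18, 4, 0b0011),    # TSFlags{21-18} = ImmT.Value (Imm16)
    (31, 8, 0x69),      # TSFlags{38-31} = Opcode
])
```

The resulting word is 0x34800C0085, which is the same bit pattern that TableGen prints for the TSFlags field of the expanded IMUL16rri record.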

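LLVM derives a number of further classes from X86Inst. They differ mainly in ImmT (line 230 of the X86Inst definition), that is, in whether the instruction carries an immediate and how large it is. These classes are the basis on which instructions are then defined. Here are two examples (X86InstrFormats.td):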

class I<bits<8> o, Format f, dag outs, dag ins, string asm,
        list<dag> pattern, InstrItinClass itin = NoItinerary,
        Domain d = GenericDomain>
  : X86Inst<o, f, NoImm, outs, ins, asm, itin, d> {
  let Pattern = pattern;
  let CodeSize = 3;
}
class Ii8 <bits<8> o, Format f, dag outs, dag ins, string asm,
           list<dag> pattern, InstrItinClass itin = NoItinerary,
           Domain d = GenericDomain>
  : X86Inst<o, f, Imm8, outs, ins, asm, itin, d> {
  let Pattern = pattern;
  let CodeSize = 3;
}

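Class I carries no immediate, while Ii8 carries an immediate of type i8. The itin parameter relates to instruction scheduling and describes the itinerary an instruction follows through the CPU; in these base classes it defaults to NoItinerary, meaning no itinerary. Examples of defs derived from classes of this kind (X86InstrArithmetic.td):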

let hasSideEffects = 1 in { // so that we don't speculatively execute
let SchedRW = [WriteIDiv] in {
let Defs = [AL, AH, EFLAGS], Uses = [AX] in
def DIV8r  : I<0xF6, MRM6r, (outs), (ins GR8:$src),    // AX/r8 = AL,AH
               "div{b}\t$src", [], IIC_DIV8_REG>;
let Defs = [AX, DX, EFLAGS], Uses = [AX, DX] in
def DIV16r : I<0xF7, MRM6r, (outs), (ins GR16:$src),   // DX:AX/r16 = AX,DX
               "div{w}\t$src", [], IIC_DIV16>, OpSize16;
let Defs = [EAX, EDX, EFLAGS], Uses = [EAX, EDX] in
def DIV32r : I<0xF7, MRM6r, (outs), (ins GR32:$src),   // EDX:EAX/r32 = EAX,EDX
               "div{l}\t$src", [], IIC_DIV32>, OpSize32;
// RDX:RAX/r64 = RAX,RDX
let Defs = [RAX, RDX, EFLAGS], Uses = [RAX, RDX] in
def DIV64r : RI<0xF7, MRM6r, (outs), (ins GR64:$src),
                "div{q}\t$src", [], IIC_DIV64>;
} // SchedRW

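The comments explain what these instructions do. The IIC_XXX names are specific instruction itineraries, which we leave until we study instruction scheduling. MRM6r is the instruction format, which is needed for encoding and by the disassembler. Defs and Uses describe registers used beyond the explicit operands: Defs lists the registers whose contents the instruction modifies, and Uses lists the registers whose contents it reads.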
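To illustrate why Defs and Uses matter to later passes, here is a small Python sketch that consults the implicit register lists of the DIV definitions above. The table contents are copied from those definitions; the two helper functions are hypothetical, and the sketch deliberately ignores register aliasing (for example, that AX overlaps AL and AH), which a real register model has to handle.

```python
from typing import Dict, Iterable, List, Set

# Implicit Defs and Uses copied from the DIV definitions above.
IMPLICIT_REGS: Dict[str, Dict[str, Set[str]]] = {
    "DIV8r":  {"defs": {"AL", "AH", "EFLAGS"},   "uses": {"AX"}},
    "DIV16r": {"defs": {"AX", "DX", "EFLAGS"},   "uses": {"AX", "DX"}},
    "DIV32r": {"defs": {"EAX", "EDX", "EFLAGS"}, "uses": {"EAX", "EDX"}},
    "DIV64r": {"defs": {"RAX", "RDX", "EFLAGS"}, "uses": {"RAX", "RDX"}},
}


def clobbered_live_registers(opcode: str, live_across: Iterable[str]) -> List[str]:
    """Registers holding a value that must survive the instruction but get overwritten."""
    return sorted(IMPLICIT_REGS[opcode]["defs"] & set(live_across))


def missing_inputs(opcode: str, defined_before: Iterable[str]) -> List[str]:
    """Implicitly read registers that nothing has set up before the instruction."""
    return sorted(IMPLICIT_REGS[opcode]["uses"] - set(defined_before))


# A value kept in DX across a 16-bit divide would be destroyed.
clobbered = clobbered_live_registers("DIV16r", {"DX", "CX"})
# A 32-bit divide with only EAX prepared still needs EDX.
missing = missing_inputs("DIV32r", {"EAX"})
```

In this toy model, keeping a value in DX across DIV16r is reported as a clobber, and DIV32r with only EAX initialized is reported as missing EDX.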

let Defs = [EFLAGS] in {
let SchedRW = [WriteIMul] in {
// Register-Integer Signed Integer Multiply
def IMUL16rri  : Ii16<0x69, MRMSrcReg,                      // GR16 = GR16*I16
                      (outs GR16:$dst), (ins GR16:$src1, i16imm:$src2),
                      "imul{w}\t{$src2, $src1, $dst|$dst, $src1, $src2}",
                      [(set GR16:$dst, EFLAGS,
                            (X86smul_flag GR16:$src1, imm:$src2))],
                            IIC_IMUL16_RRI>, OpSize16;
def IMUL16rri8 : Ii8<0x6B, MRMSrcReg,                       // GR16 = GR16*I8
                     (outs GR16:$dst), (ins GR16:$src1, i16i8imm:$src2),
                     "imul{w}\t{$src2, $src1, $dst|$dst, $src1, $src2}",
                     [(set GR16:$dst, EFLAGS,
                           (X86smul_flag GR16:$src1, i16immSExt8:$src2))],
                           IIC_IMUL16_RRI>, OpSize16;

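Each of these two defs has two base classes: besides Ii16 or Ii8 there is also OpSize16. X86InstrFormats.td contains two things named OpSize16, one a class and the other a def; the class is the one used here. A def cannot serve as a base class (much like a class marked final). OpSize16 marks an instruction with 16-bit operands, which needs the 0x66 operand-size override prefix whenever the default operand size is 32 bits; this too is encoding information.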
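The prefix rule can be stated as a short Python sketch. It assumes the numeric values OpSizeFixed = 0, OpSize16 = 1 and OpSize32 = 2 (the expanded IMUL16rri record in section 2.2.8 shows OpSizeBits = { 0, 1 } for OpSize16), and it assumes that 64-bit mode, like 32-bit mode, defaults to 32-bit operands. The function name is hypothetical.

```python
OP_SIZE_FIXED = 0  # encoding does not depend on the operand size of the mode
OP_SIZE_16 = 1     # instruction operates on 16-bit operands
OP_SIZE_32 = 2     # instruction operates on 32-bit operands


def needs_operand_size_prefix(op_size: int, mode_bits: int) -> bool:
    """Return True if the 0x66 operand-size override prefix must be emitted.

    mode_bits is the processor mode: 16, 32 or 64.
    """
    if mode_bits == 16:
        # 16-bit mode defaults to 16-bit operands, so 32-bit forms need the prefix.
        return op_size == OP_SIZE_32
    # 32-bit and 64-bit modes default to 32-bit operands.
    return op_size == OP_SIZE_16


imul16_in_32bit_mode = needs_operand_size_prefix(OP_SIZE_16, 32)
imul16_in_16bit_mode = needs_operand_size_prefix(OP_SIZE_16, 16)
```

So an OpSize16 instruction such as IMUL16rri gets the 0x66 prefix in 32-bit and 64-bit code, but not in 16-bit code.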

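Note that none of the DIVXr definitions specify a Pattern, which means DIVXr is not selected through pattern matching. How is it selected, then? The secret lies in X86DAGToDAGISel::Select. This method handles particular node types for which generic instruction selection is not suitable on X86. For example, it selects ISD::UDIVREM as DIVXr (the signed ISD::SDIVREM becomes IDIVXr). ISD::UDIVREM is in turn obtained during legalization from an SDNode of type ISD::UDIV, which itself is generated from LLVM IR by the visitUDiv method. It is a long road, and we will spend a good deal of space on it.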

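The matching pattern in the IMUL16rri definition reads as follows. Because the operator of the pattern is set, the innermost dag value is interpreted as the source pattern (what is to be matched), while (IMUL16rri GR16:$src1, imm:$src2) is called the target pattern (the target-machine DAG to produce after a match; TableGen generates it when processing the definition).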

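Moreover, X86smul_flag is specific to the X86 target, so this pattern does not fit the SDNode objects generated directly from LLVM IR. X86InstrCompiler.td therefore contains an anonymous Pat definition like this: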

def : Pat<(mul GR16:$src1, imm:$src2), (IMUL16rri GR16:$src1, imm:$src2)>;

This definition matches the generic DAG form generated from LLVM IR directly to the target pattern of the IMULX definition.
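The idea of a source pattern being rewritten into a target pattern can be shown with a toy Python matcher. This is only a sketch of the concept: LLVM's real instruction selector runs matcher tables generated by TableGen, and the tuple encoding of DAG nodes, the rule format and the function names below are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

Node = Tuple  # (operator, operand, ...) where each operand is (kind, value)

# (source pattern, target pattern), mirroring the anonymous Pat above.
MUL16_RULE = (("mul", "GR16:$src1", "imm:$src2"),
              ("IMUL16rri", "$src1", "$src2"))


def match(pattern: Tuple, node: Node) -> Optional[Dict[str, Tuple]]:
    """Bind the pattern's named operands to the node's operands, or return None."""
    if pattern[0] != node[0] or len(pattern) != len(node):
        return None
    bindings = {}
    for slot, operand in zip(pattern[1:], node[1:]):
        kind, name = slot.split(":")
        if operand[0] != kind:  # wrong register class or not an immediate
            return None
        bindings[name] = operand
    return bindings


def select(rule: Tuple, node: Node) -> Optional[Node]:
    """Rewrite a node into the rule's target pattern if the source pattern matches."""
    source, target = rule
    bindings = match(source, node)
    if bindings is None:
        return None
    return (target[0],) + tuple(bindings[name] for name in target[1:])


selected = select(MUL16_RULE, ("mul", ("GR16", "%vreg0"), ("imm", 10)))
rejected = select(MUL16_RULE, ("mul", ("GR32", "%vreg1"), ("imm", 10)))
```

A 16-bit multiply by an immediate is rewritten into an IMUL16rri node carrying the same operands, while the 32-bit multiply is left for some other rule.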

2.2.8. An example of an expanded instruction

After expansion, IMUL16rri looks like this:

def IMUL16rri {         // Instruction X86Inst Ii16 OpSize16
  Domain X86Inst:d = GenericDomain;
  string Namespace = "X86";
  dag OutOperandList = (outs GR16:$dst);
  dag InOperandList = (ins GR16:$src1, i16imm:$src2);
  string AsmString = "imul{w}\t{$src2, $src1, $dst|$dst, $src1, $src2}";
  list<dag> Pattern = [(set GR16:$dst, EFLAGS, (X86smul_flag GR16:$src1, imm:$src2))];
  list<Register> Uses = [];
  list<Register> Defs = [EFLAGS];
  list<Predicate> Predicates = [];
  int Size = 0;
  string DecoderNamespace = "";
  int CodeSize = 3;
  int AddedComplexity = 0;
  bit isReturn = 0;
  bit isBranch = 0;
  bit isIndirectBranch = 0;
  bit isCompare = 0;
  bit isMoveImm = 0;
  bit isBitcast = 0;
  bit isSelect = 0;
  bit isBarrier = 0;
  bit isCall = 0;
  bit canFoldAsLoad = 0;
  bit mayLoad = ?;
  bit mayStore = ?;
  bit isConvertibleToThreeAddress = 0;
  bit isCommutable = 0;
  bit isTerminator = 0;
  bit isReMaterializable = 0;
  bit isPredicable = 0;
  bit hasDelaySlot = 0;
  bit usesCustomInserter = 0;
  bit hasPostISelHook = 0;
  bit hasCtrlDep = 0;
  bit isNotDuplicable = 0;
  bit isConvergent = 0;
  bit isAsCheapAsAMove = 0;
  bit hasExtraSrcRegAllocReq = 0;
  bit hasExtraDefRegAllocReq = 0;
  bit isRegSequence = 0;
  bit isPseudo = 0;
  bit isExtractSubreg = 0;
  bit isInsertSubreg = 0;
  bit hasSideEffects = ?;
  bit isCodeGenOnly = 0;
  bit isAsmParserOnly = 0;
  InstrItinClass Itinerary = IIC_IMUL16_RRI;
  list<SchedReadWrite> SchedRW = [WriteIMul];
  string Constraints = "";
  string DisableEncoding = "";
  string PostEncoderMethod = "";
  string DecoderMethod = "";
  bits<64> TSFlags = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1 };
  string AsmMatchConverter = "";
  string TwoOperandAliasConstraint = "";
  bit UseNamedOperandTable = 0;
  bits<8> Opcode = { 0, 1, 1, 0, 1, 0, 0, 1 };
  Format Form = MRMSrcReg;
  bits<7> FormBits = { 0, 0, 0, 0, 1, 0, 1 };
  ImmType ImmT = Imm16;
  bit ForceDisassemble = 0;
  OperandSize OpSize = OpSize16;
  bits<2> OpSizeBits = { 0, 1 };
  AddressSize AdSize = AdSizeX;
  bits<2> AdSizeBits = { 0, 0 };
  Prefix OpPrefix = NoPrfx;
  bits<3> OpPrefixBits = { 0, 0, 0 };
  Map OpMap = OB;
  bits<3> OpMapBits = { 0, 0, 0 };
  bit hasREX_WPrefix = 0;
  FPFormat FPForm = NotFP;
  bit hasLockPrefix = 0;
  Domain ExeDomain = GenericDomain;
  bit hasREPPrefix = 0;
  Encoding OpEnc = EncNormal;
  bits<2> OpEncBits = { 0, 0 };
  bit hasVEX_WPrefix = 0;
  bit hasVEX_4V = 0;
  bit hasVEX_4VOp3 = 0;
  bit hasVEX_i8ImmReg = 0;
  bit hasVEX_L = 0;
  bit ignoresVEX_L = 0;
  bit hasEVEX_K = 0;
  bit hasEVEX_Z = 0;
  bit hasEVEX_L2 = 0;
  bit hasEVEX_B = 0;
  bits<3> CD8_Form = { 0, 0, 0 };
  int CD8_EltSize = 0;
  bit has3DNow0F0FOpcode = 0;
  bit hasMemOp4Prefix = 0;
  bit hasEVEX_RC = 0;
  bits<2> EVEX_LL = { 0, 0 };
  bits<7> VectSize = { 0, 0, 1, 0, 0, 0, 0 };
  bits<7> CD8_Scale = { 0, 0, 0, 0, 0, 0, 0 };
  string NAME = ?;
}

Many related definitions have been folded into this record, including inlined PatFrags, which are described below.
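As a cross-check on the record, the printed TSFlags bit list can be decoded back into a few of its fields. The Python sketch below assumes that TableGen prints bits<N> values with the most significant bit first and uses the field positions from the TSFlags assignments in X86Inst; the helper names are hypothetical.

```python
from typing import List

# The TSFlags value printed in the record above, most significant bit first,
# written as runs so the 64 entries are easy to count.
TSFLAGS_BITS: List[int] = (
    [0] * 26 + [1, 1, 0, 1, 0, 0, 1] + [0] * 11 + [1, 1]
    + [0] * 10 + [1] + [0, 0, 0, 0, 1, 0, 1]
)


def bits_to_int(bits: List[int]) -> int:
    """Convert an MSB-first list of bits into an integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def field(word: int, high: int, low: int) -> int:
    """Extract bits high..low (inclusive), like TSFlags{high-low}."""
    return (word >> low) & ((1 << (high - low + 1)) - 1)


tsflags = bits_to_int(TSFLAGS_BITS)
decoded = {
    "FormBits": field(tsflags, 6, 0),
    "OpSizeBits": field(tsflags, 8, 7),
    "ImmT": field(tsflags, 21, 18),
    "Opcode": field(tsflags, 38, 31),
}
```

The decoded values agree with the separate fields of the record: FormBits is 5, OpSizeBits is 1, the immediate type is 3 and the opcode is 0x69.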
