13.07.2015 Views

CS202 Compiler Construction

CS202 Compiler Construction

CS202 Compiler Construction

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

If node is spilled to memory:Insert store instruction to store value inmemoryInsert load instruction to retrieve value atevery use pointCS 202-35 register allocation4Assume t100 to be spilled:mov %i0,t100…add t100,t101,t101Rewrite instructions as:store %i0,x(fp)…load x(fp),txxxadd txxx,t101,t101CS 202-35 spill example5NB: still want to assign txxx to registerMust re-run register allocation on new code,but…Lifetime of txxx is very shortWill interfere with fewer other registersGraph coloring more likelyAlternate (similar) approach: split a lifetimeCreate two (or more) lifetimesAdd a move instruction at joinsCS 202-35 splitting62


ed1: mov 1,a2: add a,a,c3: loop:a b 4: and a,c,b5: xor c,b,d6: sub b,d,efc7: or d,e,f8: add e,a,g9: xor f,g,cg10: add g,c,a11: and a,c,bg interferes with b 12: cmp g,ball degrees > 2 13: bne loopcan’t color with simp/sel 14: nopCS 202-35 split example7ed1: mov 1,a2: add a,a,c3: loop:a b5: xor c,b,d4: and a,c,b6: sub b,d,ef7: or d,e,fc8: add e,a,g19: xor f,g1,c9a: mov g1,g2g1 g210: add g,c,a11: and a,c,bsplit g into g1,g2 12: cmp g,binsert mov g1,g2 13: loop bneeach has degree = 2 14: nopCS 202-35 split example8ed1: mov 1,a2: add a,a,c3: loop:a b5: xor c,b,d4: and a,c,b6: sub b,d,ef7: or d,e,fc8: add e,a,g19: xor f,g1,c9a: mov g1,g2g1 g210: add g,c,a11: and a,c,b12: cmp g,bcan now be colored 13: loop bnecan coalesce g1,g2 14: nopCS 202-35 split example93


Split graph may not be colorableMay need to spillSplit and spill interact wellSplit intelligently…Create 3 lifetimes2 with usageMiddle with no usageMiddle ideal for spillCS 202-35 split and spill10Want to spill temps with limited usageMinimal cost for spillWant to spill long lived tempsFurther simplifies graphDefine cost function to order tempsMore usage adds to costDivide cost by lifetimeSpill least costly temp with degree > kCS 202-35 choosing spills11Usage not simple countNot every instruction executed onceWithin loops count moreWithin if counts lessConsider CFGUsage in loopCount as more (often 10)Multiply by depth of loopsConditional usage counts as halfCS 202-35 computing usage124


Spill has inherent costMemory access slower than reg accessMakes easy cost functionSplit has only potential costadded move instruction(s)Move is cheaperMay be coalescedNeed different heuristicCS 202-35 split cost13Never split inside loop bodyMultiplies cost of added moveBiggest win in splitting long lifetimesFind longest stretch without usageSplit before, after stretchCS 202-35 choosing splits14loop1:loop2:exit:imov 1,i +add i,1,i + lifetime of icmp i,10 + has long stretchble loop1 * of instructionsmov 8,j * not using iadd j,3,k *cmp k,l *bne exit *sub i,1,i +cmp i,0 +bgt loop2 *CS 202-35 split example155


loop1:loop2:exit:imov 1,i +add i,1,i + never put splitcmp i,10 + within loopble loop1 *+mov 8,j * ignore instructionsadd j,3,k * inside loopscmp k,l *bne exit *sub i,1,i +cmp i,0 +bgt loop2 *+CS 202-35 split example16loop1:loop2:exit:mov 1,i1add i1,1,i1 split i into 3 varscmp i1,10ble loop1mov i1,i2insert movesmov 8,jat split pointsadd j,3,kcmp k,lmov i2,i3insert movesbne exitbefore jumpssub i3,1,i3cmp i3,0bgt loop2CS 202-35 split example17loop1:loop2:exit:mov 1,i1add i1,1,i1cmp i1,10ble loop1mov i1,i2i2 has long lifetimemov 8,jminimal usageadd j,3,k all outside loopcmp k,lmov i2,i3choose i2 for spillbne exitsub i3,1,i3cmp i3,0bgt loop2CS 202-35 spill/split186


loop1:loop2:exit:mov 1,i1add i1,1,i1cmp i1,10ble loop1st i1,x(fp) change movesmov 8,jinto load/storeadd j,3,kcmp k,lld x(fp),i3bne exitsub i3,1,i3cmp i3,0bgt loop2CS 202-35 spill/split19Introduction to optimizationOptimization: change the generated code toreduce the cpu time required to execute the code,reduce the memory used in executing the codeor bothOptimization cannot change any behaviordirectly visible to the program anddefined by the language semanticsCS 202-35 optimization20If code is correct(uses only defined constructs)optimization will generate same resultafter optimizationWhat constructs are undefined?order of evaluation in many languagesa[i++] = b[i++]is a[0] == b[1] or a[1] == b[0]?usage of uninitialized variablesint i;a[i] == ?CS 202-35 optimization limits217


Within these constraints the compiler isfree to do anything it wantsThree basic approaches to making codefaster:1. Do less workDon’t do things that don’t change the answer2. Do work less oftenDon’t redo work that was already done3. Do easier workUse a cheaper way to do workCS 202-35 optimization kinds221. Do less work.Instructions that don’t have an effect onthe result can be removedCoalesced moves are an easy exampleDead variable removal is more importantIf every def of an instruction is unused, theinstruction has no effectVariable assigned to is deadInstruction can be removedRemoving instruction may kill earliervariablesCS 202-35 do less work23Dead variable removalSimplest data-flow based optimizationAlso based on liveness graphVariable assignment is dead if value ofassignment never usedCS 202-35 dead variables248


Recall meaning of data flow setsdef : location is modified by instrout : value in location is exported (used atsome later point)For any assembler instructionIf (singleton) def set not empty andintersection of def and out is emptyInstruction is dead assignmentCan be deleted if no side-effectCS 202-35 dead variables251: a = b * 2 def: {a}, out: {a,b}2: c = a + b def: {c}, out: {b}3: d = b/2 def: {d }, out:{d}4: %o0 = d def : {%o0},out {%o0}Assume no more usage of a,b,c or dInstruction 2Defines value for c, exports only bIntersection is emptyc is dead variable at assignmentDelete instruction 2CS 202-35 dead assignment example261: a = b * 2 def: {a}, out: {a,b}2: c = a + b def: {c}, out: {b}3: d = b/2 def: {d }, out:{d}4: %o0 = d def : {%o0},out {%o0}Deleting instruction 2No longer any usage of aout for 1 is now just { b }Intersection with def is emptyDelete instruction 1 alsoCS 202-35 dead assignment example279


2. Do work less oftenSome work needs to be doneJust not many timesBest example: loop invariant instructionsInstr within a loop is executed many times(in general)if it computes same result every iterationWhy execute every time?<strong>Compiler</strong> can be move it outside loopExecuted only once thenCS 202-35 do work less often28Another examplecommon subexpression eliminationconsider codea = i*j + 3;b = i*j -4;Why compute i*j twice?Compute i*j once and use the answer twiceCS 202-35 do work less often293. Do cheaper workLook for bargainsSimplest way is to use cheaper opsi = j*2 is the same as i = j+ji = j*4 is the same as i = j


May be cheaper to recompute than reloadfrom memoryIs this legal?What if another process/thread wrote tomemoryIn C and C++,<strong>Compiler</strong> can assume no other writers,unless variable declared “volatile”In Java<strong>Compiler</strong> must assume other writersCS 202-35 do cheaper work31Optimizations scoperange of instructions consideredPeephole optimizersLook at one or a few instructionse.g. reduce operator strength, remove jumps tojumpsBasic block optimizationCan do some interesting optimizationsMuch easier when focusing on isolated blocksLoop optimizationBiggest bang for the buckFunction-wide optimizationGlobal optimizationCS 202-35 optimization scope32When does optimization occur?Three choices, depending onoptimization:1. Before code generation (on IR)Machine independent optimizations2. Before register allocation (on assem)Many peephole3. After register allocationRemaining peepholeCS 202-35 optimization time3311


Cover seven peephole optimizations:Redundant movesConstant foldingConstant conditional branchesJump to jumpJump over jumpReduction in strengthSpecial instructionsCS 202-35 peephole optimizations34Common generate code that looks likemove loc2 loc1move loc1 loc2Second instruction unnecessary, just needmove loc2 loc1More common (and more costly)where loc1 or loc2 is memoryThis optimization can be performed at anystageCS 202-35 redundant moves35If one or both operands are constants, can docomputation at compile timeFor example:add const1 const2 loc2Can be replaced withmove const1+const2,loc2Especially important when combined withconstant propagationCan be done on IRCan use tile patterns in code selectionCS 202-35 constant folding3612


If comparison for branch is constantCan be turned into an unconditional branch orremoved altogetherFor example:cmp 0,1beq L1nopL2:can be replaced withL2:CS 202-35 constant conditional branches37Alternatively, the codecmp 0,1bne L1nopL2:Can be replaced withbr L1nopL2:Can be performed at any stageCS 202-35 constant conditional branches38Many jumps to jumps generatedfor examplebr L1...L1: br L2Can be replaced withbr L2...L1: br L2CS 202-35 jump to jump3913


Conditional branchessecond opportunity for removing extra branchesFor example:beq L1nopbr L2nopL1:can be replaced withbne L2nopL1:CS 202-35 jump over jump40Not all operations are created equalWant to use cheapest instruction availableMany are generic to all machinesoperationreplacementi*2 i+ii*2 n i


Many machines offer special faster instructionsMay be easier to recognize in peepholethan during code generationOther peephole optims may enable instructionAlways machine specificGenerally done after register allocationCS 202-35 special instructions43consider x86 codemov word[0(eax)],word[0(ebx)]mov word[2(eax)],word[2(ebx)]Can be replaced withmov dword[0(eax)],dword[0(ebx)]Combining four memory accesses into twoCS 202-35 special instruction44x86 (or mips) codebeq L1mov r1,r2L1:can be replaced by cmov instructioncmov eq,r1,r2L1:Loses branchVery big win for pipelineCS 202-35 special instructions4515


Peephole optimizer given 4 args:Code fragmentTarget basic blockNext basic blockBranch target block (if any)Uses several functions:Remove basic block (on fragment)Remove instruction (on block)Replace instruction (on block)Insert instruction before i (on block)CS 202-35 implementing peephole46Code loops over list of instructionsCould be IR or assemExtractCurrent instr and next 1-4 instrsFirst real instr of next blockFirst real instr of branch target blkLook for patterns over these instructionsMuch like tile matching in maximal munchEdit instructions as neededRestart loop after changeCS 202-35 implementing peephole4716

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!