12.07.2015 Views

Lecture 6: Tomasulo Algorithm – register renaming and tag-based ...

Lecture 6: Tomasulo Algorithm – register renaming and tag-based ...

Lecture 6: Tomasulo Algorithm – register renaming and tag-based ...

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

<strong>Lecture</strong> 6: <strong>Tomasulo</strong> <strong>Algorithm</strong> –<strong>register</strong> <strong>renaming</strong> <strong>and</strong> <strong>tag</strong>-<strong>based</strong>dependence check<strong>Tomasulo</strong> design, big example1


Scoreboarding MeritEnable out-of-order execution <strong>and</strong> reduces stallcycles caused by RAW dependencesUse <strong>register</strong> result status table to detect WAW <strong>and</strong> RAWdependence between a newly fetched instruction <strong>and</strong> pendinginstructionsRecord dependence into the FU status of scoreboard (in formsof Qj, Qk <strong>and</strong> Rj, Rk)Keep an instruction waiting if it has dependences with pendinginstructionsWhen an inst writes result, wake up dependent instructions bymatching the inst’s FU index with Qj <strong>and</strong> Qk of every FU statusentry2


Good Dynamic Scheduling NeedsMoreBetter h<strong>and</strong>ling of WAR <strong>and</strong> WAW dependences Scoreboarding: (1) Stalls issuing when WAW is detected; (2)Delay writing results when WAR is detectedIs it necessary to enforce WAR <strong>and</strong> WAW dependences?Better h<strong>and</strong>ling of structure hazardsWhy stall pipeline when two instructions go to the same FU?(Particular a problem for memory/integer instructions)Better pipeline efficiencyTwo extra cycles between the EXs of two dependentinstructions – Need data forwardingMore ILP beyond a basic blockNeed speculative execution, branch predictions, <strong>and</strong> dynamicmemory disambiguation3


<strong>Tomasulo</strong> <strong>Algorithm</strong>History:1966: scoreboarding in CDC6600, implementinglimited dynamic schedulingThree years later: <strong>Tomasulo</strong> in IBM 360/91,introducing <strong>register</strong> <strong>renaming</strong> <strong>and</strong> reservation stationNow appearing in todays Dec Alpha, SGI MIPS, SUNUltraSparc, Intel Pentium, IBM PowerPC <strong>and</strong> others indifferent formsapted from UCB CS252 S98, Copyright 1998 USB4


What <strong>Tomasulo</strong> ProvidesBetter h<strong>and</strong>ling of WAR <strong>and</strong> WAW dependences Use <strong>register</strong> <strong>renaming</strong> to remove WAR <strong>and</strong> WAW dependences –No stalls or delays anymoreBetter h<strong>and</strong>ling of structure hazardsMultiple reservation stations per FU – instruction is assigned toa reservation stationBetter pipeline efficiencyOne extra (instead of two) between EXs of two dependentinstructionsDynamic memory disambiguationEnforce dependence between stores <strong>and</strong> loadsDistributed scheduling logicRegister <strong>and</strong> RS status no longer centralized5


Three S<strong>tag</strong>es of <strong>Tomasulo</strong> <strong>Algorithm</strong>1. Issue—get instruction from FP Op Queue Condition: a free RS at the required FU Actions: (1) decode the instruction; (2) allocate a RS; (3)do source <strong>register</strong> <strong>renaming</strong>; (4) do dest <strong>register</strong><strong>renaming</strong>; (5) read <strong>register</strong> file; (6) dispatch the decoded<strong>and</strong> renamed instruction to the RS entry2. Execution—operate on oper<strong>and</strong>s (EX) Condition: At a given FU, At lease one instruction is ready Action: select a ready instruction <strong>and</strong> send it to the FU3. Write result—finish execution (WB) Condition: At a given FU, some instruction finishes FUexecution Actions: (1) the FU writes the result onto the CommonData (store inst writes to memory instead); (2) the <strong>tag</strong>(RS index) is broadcast to all other RS to wake uppotential waiting instructions; (3) update <strong>register</strong> status;(4) de-allocate the RSdapted from UCB CS252 S986


Reservation Station ComponentsOp—Operation to perform in the unit (e.g., + or –)Vj, Vk—Value of Source oper<strong>and</strong>s Store buffers has V field, result to be storedQj, Qk—Reservation stations producing source<strong>register</strong>s Note: No ready flags as in Scoreboard; Qj,Qk=0=> ready Store buffers only have Qi for RS producing resultA – to hold memory address for load <strong>and</strong>store instructoinsBusy—Indicates reservation station or FU is busyHow are dependences represented in reservationstation?dapted from UCB CS252 S98B8


Register Renaming in <strong>Tomasulo</strong>Register Result Status Table: Indicates whichRS will write each <strong>register</strong>, if one exists. Blankwhen no pending instructions that will writethat <strong>register</strong>.Is very similar to the one in scoreboardingPerforms <strong>register</strong> <strong>renaming</strong> to remove name(WAR <strong>and</strong> WAW) dependencesModern term: <strong>register</strong> alias table (sometimes<strong>register</strong> <strong>renaming</strong> table)9


Register Renaming in <strong>Tomasulo</strong>Source Renaming: rename source <strong>register</strong> tomapped RS index (if renamed)Destination <strong>renaming</strong>: record the <strong>renaming</strong>from the destination <strong>register</strong> to allocated RSindex; used in future <strong>renaming</strong>Example:LD F2,45(R3) => LD (LD1),45(R3)MULT F0, F2, F4 => MULT (MULT1), (LD1), F4Key difference from that in scoreboarding: use RSindex as <strong>tag</strong> of the oper<strong>and</strong>s10


Renaming Implementation0- - - …F0 F2 F4LD F2,45(R3) => LD (LD1), 45(R3)PdRd Rs RtRenaming1- LD1 - …F0 F2 F4MULT F0, F2, F4 => MULT (MULT1),(LD1), F42 Mult1 LD1 -F0 F2 F4…11PsPtFor every inst:1. Translate source<strong>register</strong>s ifrenamed2. Record the<strong>renaming</strong> for dest


Selection of Instruction for ExecutionOnly “ready” instructions can join thecompetitionThere is a select logic to selectinstructions for FU execution Some policy may be used, e.g. age <strong>based</strong>Non-ready instructions can be “wakenup” during writeback of its parent inst12


Writeback <strong>and</strong> Common DataBusNormal data bus: data + destination (“go to”bus)Common data bus: data + source (“come from”bus) 64 bits of data + 4 bits of source index (<strong>tag</strong>) Does the broadcast to every instruction in theflyChild instructions do <strong>tag</strong> matching <strong>and</strong>update their ready bits <strong>and</strong> value fields (ifthe <strong>tag</strong> matches theirs)Register update involves <strong>tag</strong> matchingapted from UCB CS252 S98, Copyright 1998 USB13


Code ExampleLD F6,34(R2)LD F2,45(R3)MULTI F0,F2,F4SUBD F8,F6,F2DIVD F10,F0,F6ADD F6,F8,F2LD1SUBDADDLD2MULTIDIVDOperation latencies: load/store 2 cycles,Add/sub 2 cycles, Mult 10 cycles, divide 40 cycle14


What to Observe1. Whether some instructions can be issued=> pay attention to (1) RS (or load/store buffer)allocation (decode, RS allocation, source <strong>register</strong><strong>renaming</strong>, dispatch); (2) change to <strong>register</strong>status (dest <strong>renaming</strong>)2. Whether some instruction can be selected forexecution (for every FU)=> Pay attention to instruction status change;the inst will finish in a given number of cycle3. Whether some instruction is finishing execution=> Pay attention to instruction status change;the inst may write its result the next cycle4. Whether some instruction is writing result=> Pay attention to (1) wakeup of thedependent instructions; (2) <strong>register</strong> statuschange; (3) RS de-allocation15


<strong>Tomasulo</strong> Example Cycle 0Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 Load1 NoLD F2 45+ R3 Load2 NoMULTDF0 F2 F4 Load3 NoSUBD F8 F6 F2DIVD F10 F0 F6ADDDF6 F8 F2Reservation Stations S1 S2 RS for j RS for kTime Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 No0 Add3 No0 Mult1 No0 Mult2 NoRegister result statusClock F0 F2 F4 F6 F8 F10 F12 ... F300 FUapted from UCB CS252 S98, Copyright 1998 USB16


<strong>Tomasulo</strong> Example Cycle 1Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 Load1 No Yes 34+R2LD F2 45+ R3 Load2 NoMULTDF0 F2 F4 Load3 NoSUBD F8 F6 F2DIVD F10 F0 F6ADDDF6 F8 F2Reservation Stations S1 S2 RS for j RS for kTime Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 NoAdd3 No0 Mult1 No0 Mult2 NoRegister result statusClock F0 F2 F4 F6 F8 F10 F12 ... F301 FU Load1apted from UCB CS252 S98, Copyright 1998 USB17


<strong>Tomasulo</strong> Example Cycle 2Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 Load1 Yes 34+R2LD F2 45+ R3 2 Load2 Yes 45+R3MULTDF0 F2 F4 Load3 NoSUBD F8 F6 F2DIVD F10 F0 F6ADDDF6 F8 F2Reservation Stations S1 S2 RS for j RS for kTime Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 NoAdd3No0 Mult1 No0 Mult2 NoRegister result statusClock F0 F2 F4 F6 F8 F10 F12 ... F302 FU Load2 Load1apted from UCB CS252 S98, Copyright 1998 USB18


<strong>Tomasulo</strong> Example Cycle 3Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 Load1 Yes 34+R2LD F2 45+ R3 2 Load2 Yes 45+R3MULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2DIVD F10 F0 F6ADDDF6 F8 F2Reservation Stations S1 S2 RS for j RS for kTime Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 NoAdd3No0 Mult1 Yes MULTD R(F4) Load20 Mult2 NoRegister result statusClock F0 F2 F4 F6 F8 F10 F12 ... F303 FU Mult1 Load2 Load1• Note: <strong>register</strong>s names are removed (“renamed”) in ReservationStations• Load1 completing; what is waiting for Load1?apted from UCB CS252 S98, Copyright 1998 USB19


<strong>Tomasulo</strong> Example Cycle 4Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 Load2 Yes 45+R3MULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4DIVD F10 F0 F6ADDDF6 F8 F2Reservation Stations S1 S2 RS for j RS for kTime Name Busy Op Vj Vk Qj Qk0 Add1 Yes SUBD M(34+R2) Load20 Add2 NoAdd3No0 Mult1 Yes MULTD R(F4) Load20 Mult2 NoRegister result statusClock F0 F2 F4 F6 F8 F10 F12 ... F304 FU Mult1 Load2 M(34+R2) Add1• Load2 completing; what is waiting for it?apted from UCB CS252 S98, Copyright 1998 USB20


<strong>Tomasulo</strong> Example Cycle 5Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4DIVD F10 F0 F6 5ADDDF6 F8 F2Reservation Stations S1 S2 RS for j RS for kTime Name Busy Op Vj Vk Qj Qk2 Add1 Yes SUBD M(34+R2) M(45+R3)0 Add2 NoAdd3No10 Mult1 Yes MULTDM(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1Register result statusClock F0 F2 F4 F6 F8 F10 F12 ... F305 FU Mult1 M(45+R3) M(34+R2) Add1 Mult2apted from UCB CS252 S98, Copyright 1998 USB21


<strong>Tomasulo</strong> Example Cycle 6Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4DIVD F10 F0 F6 5ADDDF6 F8 F2 6Reservation Stations S1 S2 RS for j RS for kTime Name Busy Op Vj Vk Qj Qk1 Add1 Yes SUBD M(34+R2) M(45+R3)0 Add2 Yes ADDD M(45+R3) Add1Add3No9 Mult1 Yes MULTDM(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1Register result statusClock F0 F2 F4 F6 F8 F10 F12 ... F306 FU Mult1 M(45+R3) Add2 Add1 Mult2apted from UCB CS252 S98, Copyright 1998 USB22


<strong>Tomasulo</strong> Example Cycle 7Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7DIVD F10 F0 F6 5ADDDF6 F8 F2 6Reservation Stations S1 S2 RS for j RS for kTime Name Busy Op Vj Vk Qj Qk0 Add1 Yes SUBD M(34+R2) M(45+R3)0 Add2 Yes ADDD M(45+R3) Add1Add3No8 Mult1 Yes MULTDM(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1Register result statusClock F0 F2 F4 F6 F8 F10 F12 ... F307 FU Mult1 M(45+R3) Add2 Add1 Mult2• Add1 completing; what is waiting for it?apted from UCB CS252 S98, Copyright 1998 USB23


<strong>Tomasulo</strong> Example Cycle 8Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6Reservation Stations S1 S2 RS for j RS for kTime Name Busy Op Vj Vk Qj Qk0 Add1 No2 Add2 Yes ADDD M()-M() M(45+R3)0 Add3 No7 Mult1 Yes MULTD M(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1Register result statusClock F0 F2 F4 F6 F8 F10 F12 ... F308 FU Mult1 M(45+R3) Add2 M()-M() Mult2apted from UCB CS252 S98, Copyright 1998 USB24


<strong>Tomasulo</strong> Example Cycle 9Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDDF6 F8 F2 6Reservation Stations S1 S2 RS for j RS for kTime Name Busy Op Vj Vk Qj Qk0 Add1 No1 Add2 Yes ADDD M()–M() M(45+R3)0 Add3 No6 Mult1 Yes MULTDM(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1Register result statusClock F0 F2 F4 F6 F8 F10 F12 ... F309 FU Mult1 M(45+R3) Add2 M()–M() Mult2apted from UCB CS252 S98, Copyright 1998 USB25


<strong>Tomasulo</strong> Example Cycle 10Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDDF6 F8 F2 6 10Reservation Stations S1 S2 RS for j RS for kTime Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 Yes ADDD M()–M() M(45+R3)0 Add3 No5 Mult1 Yes MULTDM(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1Register result statusClock F0 F2 F4 F6 F8 F10 F12 ... F3010 FU Mult1 M(45+R3) Add2 M()–M() Mult2• Add2 completing; what is waiting for it?apted from UCB CS252 S98, Copyright 1998 USB26


<strong>Tomasulo</strong> Example Cycle 11Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11Reservation Stations S1 S2 RS for j RS for kTime Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 No0 Add3 No4 Mult1 Yes MULTD M(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1Register result statusClock F0 F2 F4 F6 F8 F10 F12 ... F3011 FU Mult1 M(45+R3) (M-M)+M() M()ŠM() Mult2• Write result of ADDD here vs. scoreboard?apted from UCB CS252 S98, Copyright 1998 USB27


<strong>Tomasulo</strong> Example Cycle 12Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBDF8 F6 F2 4 7 8DIVDF10 F0 F6 5ADDDF6 F8 F2 6 10 11Reservation Stations S1 S2 RS for j RS for kTime NameBusyOp Vj Vk Qj Qk0 Add1 No0 Add2 No0 Add3 No3 Mult1 Yes MULTDM(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1Register result statusClock F0 F2 F4 F6 F8 F10 F12 ... F3012 FU Mult1 M(45+R3) (M-M)+M()M()–M()Mult2apted from UCB CS252 S98, Copyright 1998 USB28


<strong>Tomasulo</strong> Example Cycle 13Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDDF6 F8 F2 6 10 11Reservation Stations S1 S2 RS for j RS for kTime Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 NoAdd3No2 Mult1 Yes MULTDM(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1Register result statusClock F0 F2 F4 F6 F8 F10 F12 ... F3013 FU Mult1 M(45+R3) (M–M)+M() M()–M() Mult2apted from UCB CS252 S98, Copyright 1998 USB29


<strong>Tomasulo</strong> Example Cycle 14Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDDF6 F8 F2 6 10 11Reservation Stations S1 S2 RS for j RS for kTime Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 No0 Add3 No1 Mult1 Yes MULTDM(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1Register result statusClock F0 F2 F4 F6 F8 F10 F12 ... F3014 FU Mult1 M(45+R3) (M–M)+M() M()–M() Mult2apted from UCB CS252 S98, Copyright 1998 USB30


<strong>Tomasulo</strong> Example Cycle 15Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTDF0 F2 F4 3 15 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDDF6 F8 F2 6 10 11Reservation Stations S1 S2 RS for j RS for kTime Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 NoAdd3No0 Mult1 Yes MULTDM(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1Register result statusClock F0 F2 F4 F6 F8 F10 F12 ... F3015 FU Mult1 M(45+R3) (M–M)+M() M()–M() Mult2• Mult1 completing; what is waiting for it?apted from UCB CS252 S98, Copyright 1998 USB31


<strong>Tomasulo</strong> Example Cycle 16Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTDF0 F2 F4 3 15 16 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDDF6 F8 F2 6 10 11Reservation Stations S1 S2 RS for j RS for kTime Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 NoAdd3No0 Mult1 No40 Mult2 Yes DIVD M*F4 M(34+R2)Register result statusClock F0 F2 F4 F6 F8 F10 F12 ... F3016 FU M*F4 M(45+R3) (M–M)+M() M()–M() Mult2• Note: Just waiting for divideapted from UCB CS252 S98, Copyright 1998 USB32


<strong>Tomasulo</strong> Example Cycle 55Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTDF0 F2 F4 3 15 16 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDDF6 F8 F2 6 10 11Reservation Stations S1 S2 RS for j RS for kTime Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 NoAdd3No0 Mult1 No1 Mult2 Yes DIVD M*F4 M(34+R2)Register result statusClock F0 F2 F4 F6 F8 F10 F12 ... F3055 FU M*F4 M(45+R3) (M–M)+M() M()–M() Mult2apted from UCB CS252 S98, Copyright 1998 USB33


<strong>Tomasulo</strong> Example Cycle 56Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTDF0 F2 F4 3 15 16 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5 56ADDDF6 F8 F2 6 10 11Reservation Stations S1 S2 RS for j RS for kTime Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 NoAdd3No0 Mult1 No0 Mult2 Yes DIVD M*F4 M(34+R2)Register result statusClock F0 F2 F4 F6 F8 F10 F12 ... F3056 FU M*F4 M(45+R3) (M–M)+M() M()–M() Mult2• Mult 2 completing; what is waiting for it?apted from UCB CS252 S98, Copyright 1998 USB34


<strong>Tomasulo</strong> Example Cycle 57Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTDF0 F2 F4 3 15 16 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5 56 57ADDDF6 F8 F2 6 10 11Reservation Stations S1 S2 RS for j RS for kTime Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 NoAdd3No0 Mult1 No0 Mult2 NoRegister result statusClock F0 F2 F4 F6 F8 F10 F12 ... F3057 FU M*F4 M(45+R3) (M–M)+M() M()–M() M*F4/M• Again, in-oder issue,out-of-order execution, completionapted from UCB CS252 S98, Copyright 1998 USB35


The Use of TagIn <strong>Tomasulo</strong>, RS index is used as <strong>tag</strong> (<strong>tag</strong> isa modern term).Tag is a unique identifier for a pending<strong>register</strong> resultTag decouples the <strong>register</strong> result from thearchitectural <strong>register</strong> specifierTag removes WAR <strong>and</strong> WAWdependences without changing RAWdependencesIt is not necessary to use RS index as <strong>tag</strong>!36


Machine CorrectnessE(D,P) = E(S,P) ifE(D,P) <strong>and</strong> E(S,P) execute the same set ofinstructionsFor any inst i, i receives the outputs in E(D,P)of its parents in E(S,P)In E(D,P) any <strong>register</strong> or memory wordreceives the output of inst j, where j is thelast instruction writes to the <strong>register</strong> ormemory word in E(S,P)RAW <strong>and</strong> WAW dependences are notnecessary for a correct execution!37


<strong>Tomasulo</strong> SummaryReservations stations: Increases effective <strong>register</strong> number Distributes scheduling logicRegister <strong>renaming</strong>: Avoids WAR <strong>and</strong> WAWdependenceTag + Data broadcasting for waking up childinstructionsPros: can be effectively combined withspeculative executionCons: CDB broadcasting adds one-cycle delay(addressed in modern instruction scheduling)apted from UCB CS252 S98, Copyright 1998 USB38

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!