4 Instruction tables - Agner Fog


AMD K10

List of instruction timings and macro-operation breakdown

Explanation of column headings:

Instruction:

Instruction name. cc means any condition code. For example, Jcc can be JB, JNE, etc.

Operands:

i = immediate constant, r = any register, r32 = 32-bit register, etc., mm = 64-bit mmx register, xmm = 128-bit xmm register, sr = segment register, m = any memory operand including indirect operands, m64 means 64-bit memory operand, etc.

Ops:

Number of macro-operations issued from instruction decoder to schedulers. Instructions with more than 2 macro-operations use microcode.

Latency:

This is the delay that the instruction generates in a dependency chain. The numbers are minimum values. Cache misses, misalignment, and exceptions may increase the clock counts considerably. Floating point operands are presumed to be normal numbers. Denormal numbers, NaNs, infinity and exceptions increase the delays. The latency listed does not include the memory operand where the operand is listed as register or memory (r/m).

Reciprocal throughput:

This is also called issue latency. This value indicates the average number of clock cycles from when the execution of an instruction begins until a subsequent independent instruction of the same kind can begin to execute. A value of 1/3 indicates that the execution units can handle 3 instructions per clock cycle in one thread. However, the throughput may be limited by other bottlenecks in the pipeline.

Execution unit:

Indicates which execution unit is used for the macro-operations. ALU means any of the three integer ALUs. ALU0_1 means that ALU0 and ALU1 are both used. AGU means any of the three integer address generation units. FADD means floating point adder unit. FMUL means floating point multiplier unit. FMISC means floating point store and miscellaneous unit. FA/M means FADD or FMUL is used. FANY means any of the three floating point units can be used. Two macro-operations can execute simultaneously if they go to different execution units.
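
The difference between latency and reciprocal throughput can be illustrated with a rough Python sketch. It assumes a simplified model in which only these two numbers matter; the function names chain_cycles and independent_cycles are hypothetical, and real code may run slower because of the other pipeline bottlenecks mentioned above. The example values are those listed for MOV r,r in this document's table (latency 1, reciprocal throughput 1/3).

```python
from fractions import Fraction


def chain_cycles(n, latency):
    """Lower bound in clocks for n instructions that each depend on the previous one."""
    return n * latency


def independent_cycles(n, reciprocal_throughput):
    """Lower bound in clocks for n independent instructions of the same kind."""
    # Fraction accepts strings such as "1/3", matching the notation of the tables.
    return n * Fraction(reciprocal_throughput)


# Assumed example: 300 MOV r,r instructions (latency 1, reciprocal throughput 1/3).
n = 300
print(chain_cycles(n, 1))            # 300: every MOV waits for the previous result
print(independent_cycles(n, "1/3"))  # 100: three independent MOVs can start per clock
```

In this model a dependency chain is bounded by the latency column, while a stream of independent instructions is bounded by the reciprocal throughput column.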

Integer instructions

Move instructions

Instruction  Operands     Ops  Latency  Reciprocal throughput  Execution unit  Notes
MOV          r,r          1    1        1/3                    ALU
MOV          r,i          1    1        1/3                    ALU
MOV          r8,m8        1    4        1/2                    ALU, AGU        Any addressing mode. Add 1 clock if code segment base ≠ 0
MOV          r16,m16      1    4        1/2                    ALU, AGU
MOV          r32,m32      1    3        1/2                    AGU
MOV          r64,m64      1    3        1/2                    AGU
MOV          m8,r8H       1    8        1/2                    AGU             AH, BH, CH, DH
MOV          m8,r8L       1    3        1/2                    AGU             Any other 8-bit register
MOV          m16/32/64,r  1    3        1/2                    AGU             Any addressing mode
MOV          m,i          1    3        1/2                    AGU
MOV          m64,i32      1    3        1/2                    AGU
MOV          r,sr         1    3-4      1/2
MOV          sr,r/m       6    8-26     8                                      from AMD manual
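
For readers who want to work with these rows programmatically, the following Python sketch shows one possible way to encode a few of them. The Row class, its field names, and the helper methods are hypothetical and not part of the original tables. The microcode check applies the rule from the column explanation (more than 2 macro-operations), and latency is kept as text because some entries are ranges such as 3-4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    instruction: str
    operands: str
    ops: int
    latency: str            # text, because some entries are ranges like "8-26"
    recip_throughput: str
    unit: str = ""          # empty where the table lists no execution unit

    def uses_microcode(self):
        """Instructions with more than 2 macro-operations use microcode."""
        return self.ops > 2

    def latency_range(self):
        """Return (min, max) clocks; a single value yields an equal pair."""
        lo, _, hi = self.latency.partition("-")
        return int(lo), int(hi or lo)


# A subset of the move instructions listed above.
ROWS = [
    Row("MOV", "r,r", 1, "1", "1/3", "ALU"),
    Row("MOV", "m8,r8H", 1, "8", "1/2", "AGU"),
    Row("MOV", "r,sr", 1, "3-4", "1/2"),
    Row("MOV", "sr,r/m", 6, "8-26", "8"),
]

print([r.operands for r in ROWS if r.uses_microcode()])  # ['sr,r/m']
print(ROWS[2].latency_range())                           # (3, 4)
```

Keeping latency as a string and parsing it on demand avoids losing the range information that a single integer field would discard.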
