03.03.2013 Views

4 Instruction tables - Agner Fog


AMD K8

List of instruction timings and macro-operation breakdown

Explanation of column headings:

Instruction: Instruction name. cc means any condition code. For example, Jcc can be JB, JNE, etc.

Operands: i = immediate constant, r = any register, r32 = 32-bit register, etc., mm = 64-bit mmx register, xmm = 128-bit xmm register, sr = segment register, m = any memory operand including indirect operands, m64 = 64-bit memory operand, etc.

Ops: Number of macro-operations issued from the instruction decoder to the schedulers. Instructions with more than 2 macro-operations use microcode.

Latency: This is the delay that the instruction generates in a dependency chain. The numbers are minimum values. Cache misses, misalignment, and exceptions may increase the clock counts considerably. Floating point operands are presumed to be normal numbers. Denormal numbers, NaNs, infinity and exceptions increase the delays. The latency listed does not include the memory operand where the operand is listed as register or memory (r/m).

Reciprocal throughput: This is also called issue latency. This value indicates the average number of clock cycles from when the execution of an instruction begins to when a subsequent independent instruction of the same kind can begin to execute. A value of 1/3 indicates that the execution units can handle 3 instructions per clock cycle in one thread. However, the throughput may be limited by other bottlenecks in the pipeline.

Execution unit: Indicates which execution unit is used for the macro-operations. ALU means any of the three integer ALUs. ALU0_1 means that ALU0 and ALU1 are both used. AGU means any of the three integer address generation units. FADD means floating point adder unit. FMUL means floating point multiplier unit. FMISC means floating point store and miscellaneous unit. FA/M means FADD or FMUL is used. FANY means any of the three floating point units can be used. Two macro-operations can execute simultaneously if they go to different execution units.
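The difference between latency and reciprocal throughput can be illustrated with a rough back-of-the-envelope estimate. The sketch below is not part of the original tables: the function names are hypothetical, and the model deliberately ignores decoding, cache, and other pipeline bottlenecks that the definitions above warn about. It only shows which of the two numbers dominates in the two extreme cases.

```python
from fractions import Fraction


def chain_cycles(n, latency):
    """Minimum cycles for n instructions where each one needs the previous result.

    In a dependency chain the latency is paid once per instruction.
    """
    return n * latency


def independent_cycles(n, recip_throughput):
    """Minimum cycles for n mutually independent instructions of the same kind.

    With no dependencies, the reciprocal throughput is the limiting figure.
    """
    return n * recip_throughput


# Values for MOV r,r on the K8: latency 1, reciprocal throughput 1/3.
mov_latency = 1
mov_recip_throughput = Fraction(1, 3)

dependent = chain_cycles(300, mov_latency)
independent = independent_cycles(300, mov_recip_throughput)
print(f"300 dependent MOVs:   at least {dependent} cycles")
print(f"300 independent MOVs: at least {independent} cycles")
```

Real code usually falls somewhere between the two bounds, so both figures are best read as lower limits under the stated assumptions.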

Integer instructions

Move instructions

Instruction  Operands     Ops  Latency  Reciprocal throughput  Execution unit  Notes
MOV          r,r          1    1        1/3                    ALU
MOV          r,i          1    1        1/3                    ALU
MOV          r8,m8        1    4        1/2                    ALU, AGU        Any addressing mode. Add 1 clock if code segment base ≠ 0
MOV          r16,m16      1    4        1/2                    ALU, AGU
MOV          r32,m32      1    3        1/2                    AGU
MOV          r64,m64      1    3        1/2                    AGU
MOV          m8,r8H       1    8        1/2                    AGU             AH, BH, CH, DH
MOV          m8,r8L       1    3        1/2                    AGU             Any other 8-bit register
MOV          m16/32/64,r  1    3        1/2                    AGU             Any addressing mode
MOV          m,i          1    3        1/2                    AGU
MOV          m64,i32      1    3        1/2                    AGU
MOV          r,sr         1    2        1/2-1
MOV          sr,r/m       6    9-13     8
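For readers who want to work with these figures programmatically, the MOV rows can be written down as plain data. This is only an illustrative sketch: the `Timing` type and the helper functions are hypothetical names, not anything provided with the tables. Latency and reciprocal throughput are kept as strings because some entries are ranges such as 9-13 or 1/2-1.

```python
from typing import NamedTuple


class Timing(NamedTuple):
    operands: str
    ops: int               # macro-operations
    latency: str           # clock cycles, may be a range such as "9-13"
    recip_throughput: str  # may be a fraction or a range such as "1/2-1"
    unit: str              # execution unit, empty if not listed


# MOV rows of the K8 integer table, transcribed as data.
K8_MOV = [
    Timing("r,r", 1, "1", "1/3", "ALU"),
    Timing("r,i", 1, "1", "1/3", "ALU"),
    Timing("r8,m8", 1, "4", "1/2", "ALU, AGU"),
    Timing("r16,m16", 1, "4", "1/2", "ALU, AGU"),
    Timing("r32,m32", 1, "3", "1/2", "AGU"),
    Timing("r64,m64", 1, "3", "1/2", "AGU"),
    Timing("m8,r8H", 1, "8", "1/2", "AGU"),
    Timing("m8,r8L", 1, "3", "1/2", "AGU"),
    Timing("m16/32/64,r", 1, "3", "1/2", "AGU"),
    Timing("m,i", 1, "3", "1/2", "AGU"),
    Timing("m64,i32", 1, "3", "1/2", "AGU"),
    Timing("r,sr", 1, "2", "1/2-1", ""),
    Timing("sr,r/m", 6, "9-13", "8", ""),
]


def uses_microcode(t):
    """Per the column explanation: more than 2 macro-operations means microcode."""
    return t.ops > 2


def latency_range(t):
    """Return (min, max) latency in clock cycles; a single value gives min == max."""
    lo, _, hi = t.latency.partition("-")
    return int(lo), int(hi or lo)


for t in K8_MOV:
    if uses_microcode(t):
        print("microcoded: MOV", t.operands, latency_range(t))
```

Keeping the rows as data makes comparisons easy to script, for example the listed latency of storing a high-byte register (m8,r8H) against storing any other 8-bit register (m8,r8L).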
