4 Instruction tables - Agner Fog
Atom<br />
Intel Atom<br />
List of instruction timings and μop breakdown<br />
Explanation of column headings:<br />
Instruction:<br />
Instruction name. cc means any condition code. For example, Jcc can be JB,<br />
JNE, etc.<br />
Operands:<br />
i = immediate data, r = register, mm = 64-bit MMX register, xmm = 128-bit XMM<br />
register, (x)mm = MMX or XMM register, sr = segment register, m = memory,<br />
m32 = 32-bit memory operand, etc.<br />
μops: The number of μops from the decoder or ROM.<br />
Unit:<br />
Tells which execution unit is used. Instructions that use the same unit cannot<br />
execute simultaneously.<br />
ALU0 and ALU1 mean integer unit 0 and 1, respectively.<br />
ALU0/1 means that either unit can be used. ALU0+1 means that both units<br />
are used.<br />
Mem means memory in/out unit.<br />
FP0 means floating point unit 0 (includes multiply, divide and other SIMD instructions).<br />
FP1 means floating point unit 1 (adder).<br />
MUL means multiplier, shared between FP and integer units.<br />
DIV means divider, shared between FP and integer units.<br />
np means not pairable: the instruction cannot execute simultaneously with any other instruction.<br />
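The Unit notation above can be read as a simple conflict rule: two instructions can execute in the same clock cycle only if they can be assigned to non-overlapping units. The following Python sketch illustrates that reading. The function names `unit_options` and `can_pair` are hypothetical, and the check covers only the unit rule stated here; it does not model any other pairing restrictions of the processor, and it does not handle rows where the Unit column is blank.

```python
import re

def unit_options(unit_field):
    """Expand a Unit column entry into the alternative sets of units it may occupy.

    Returns None for "np" (not pairable). Blank Unit fields are not handled.
    """
    field = unit_field.replace(" ", "")
    if field == "np":
        return None
    options = [frozenset()]
    for part in field.split(","):
        m = re.fullmatch(r"([A-Za-z]+)(\d)([/+])(\d)", part)
        if m is None:
            choices = [{part}]                       # single unit, e.g. "Mem" or "FP0"
        else:
            name, a, sep, b = m.groups()
            if sep == "/":
                choices = [{name + a}, {name + b}]   # "ALU0/1": either unit
            else:
                choices = [{name + a, name + b}]     # "ALU0+1": both units
        options = [o | c for o in options for c in choices]
    return options

def can_pair(unit_a, unit_b):
    """True if some choice of units lets both instructions run in the same cycle."""
    opts_a, opts_b = unit_options(unit_a), unit_options(unit_b)
    if opts_a is None or opts_b is None:
        return False
    return any(a.isdisjoint(b) for a in opts_a for b in opts_b)
```

For example, `can_pair("ALU0/1", "ALU0/1")` is true because one instruction can take ALU0 and the other ALU1, while `can_pair("ALU0, Mem", "ALU0, Mem")` is false because both need ALU0 and the memory unit.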
Latency:<br />
This is the delay that the instruction generates in a dependency chain. The<br />
numbers are minimum values. Cache misses, misalignment, and exceptions<br />
may increase the clock counts considerably. Floating point operands are presumed<br />
to be normal numbers. Denormal numbers, NaNs and infinities greatly increase<br />
the delays, except in XMM move, shuffle and Boolean instructions.<br />
Floating point overflow, underflow, denormal or NaN results cause a similar<br />
delay.<br />
Reciprocal throughput:<br />
The average number of clock cycles per instruction for a series of independent<br />
instructions of the same kind in the same thread.<br />
Integer instructions<br />
Instruction Operands μops Unit Latency Reciprocal throughput Remarks<br />
Move instructions<br />
MOV r,r 1 ALU0/1 1 1/2<br />
MOV r,i 1 ALU0/1 1 1/2<br />
MOV r,m 1 ALU0, Mem 1-3 1 All addr. modes<br />
MOV m,r 1 ALU0, Mem 1 1 All addr. modes<br />
MOV m,i 1 ALU0, Mem 1<br />
MOV r,sr 1 1 1<br />
MOV m,sr 2 5<br />
MOV sr,r 7 21<br />
MOV sr,m 8 26<br />
MOVNTI m,r 1 ALU0, Mem 2.5<br />
MOVSX MOVZX MOVSXD r,r/m 1 ALU0 1 1<br />
CMOVcc r,r 1 ALU0+1 2 2<br />
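As a rough illustration of how the Latency and Reciprocal throughput columns are used, the Python sketch below estimates clock counts for two extreme cases: a chain where each instruction depends on the previous one (limited by latency) and a series of independent instructions of the same kind (limited by reciprocal throughput). The `TIMINGS` dictionary copies a few rows from the table above; the dictionary and function names are hypothetical. The results are minimum estimates only and ignore cache misses, misalignment, decoding limits and interactions with other instructions.

```python
from fractions import Fraction

# (latency, reciprocal throughput) in clock cycles, copied from the table above.
TIMINGS = {
    "MOV r,r":    (1, Fraction(1, 2)),
    "MOV r,i":    (1, Fraction(1, 2)),
    "CMOVcc r,r": (2, Fraction(2)),
}

def dependent_chain_cycles(instruction, count):
    """Minimum cycles when each instruction waits for the result of the previous one."""
    latency, _ = TIMINGS[instruction]
    return latency * count

def independent_cycles(instruction, count):
    """Minimum cycles for independent instructions of the same kind in one thread."""
    _, reciprocal_throughput = TIMINGS[instruction]
    return reciprocal_throughput * count

if __name__ == "__main__":
    for name in TIMINGS:
        print(name,
              "chain:", dependent_chain_cycles(name, 100),
              "independent:", independent_cycles(name, 100))
```

With these values, 100 dependent `MOV r,r` instructions need at least 100 cycles, while 100 independent ones need at least 50, since two can execute per cycle. For `CMOVcc r,r` both estimates are 200 cycles, because it occupies both integer units.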