4 Instruction tables - Agner Fog
VIA Nano 3000 series
List of instruction timings and μop breakdown
Explanation of column headings:
Operands: i = immediate data, r = register, mm = 64 bit mmx register, xmm = 128 bit xmm register, (x)mm = mmx or xmm register, sr = segment register, m = memory, m32 = 32-bit memory operand, etc.
μops: The number of micro-operations from the decoder or ROM. Note that the VIA Nano 3000 processor has no reliable performance monitor counter for μops. Therefore the number of μops cannot be determined except in simple cases.
Port: Tells which execution port or unit is used. Instructions that use the same port cannot execute simultaneously.
I1: Integer add, Boolean, shift, etc.
I2: Integer add, Boolean, move, jump.
I12: Can use either I1 or I2, whichever is vacant first.
MA: Multiply, divide and square root on all operand types.
MB: Various Integer and floating point SIMD operations.
MBfadd: Floating point addition subunit under MB.
SA: Memory store address.
ST: Memory store.
LD: Memory load.
Latency: This is the delay that the instruction generates in a dependency chain. The numbers are minimum values. Cache misses, misalignment, and exceptions may increase the clock counts considerably. Floating point operands are presumed to be normal numbers. Denormal numbers, NAN's and infinity increase the delays very much, except in XMM move, shuffle and Boolean instructions. Floating point overflow, underflow, denormal or NAN results give a similar delay.
Note: There is an additional latency for moving data from one unit or subunit to another. A table of these latencies is given in manual 3: "The microarchitecture of Intel, AMD and VIA CPUs". These additional latencies are not included in the listings below where the source and destination operands are of the same type.
Reciprocal throughput: The average number of clock cycles per instruction for a series of independent<br />
instructions of the same kind in the same thread.<br />
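The difference between latency and reciprocal throughput can be illustrated with a rough estimate. The following is a minimal sketch in Python, not part of the original tables. It assumes that a chain of dependent instructions is limited only by latency and that a series of independent instructions is limited only by reciprocal throughput. The function names `chain_cycles` and `independent_cycles` are hypothetical, and real code may hit other bottlenecks such as decoding or cache misses.

```python
from fractions import Fraction


def chain_cycles(n, latency):
    """Approximate clock cycles for n instructions of the same kind
    where each one depends on the result of the previous one."""
    return n * Fraction(latency)


def independent_cycles(n, reciprocal_throughput):
    """Approximate clock cycles for n independent instructions of the
    same kind. The value may be given as a string such as "1/2" or "1.5",
    matching the notation used in the tables."""
    return n * Fraction(reciprocal_throughput)
```

With a latency of 1 and a reciprocal throughput of 1/2, `chain_cycles(100, 1)` gives 100 clock cycles while `independent_cycles(100, "1/2")` gives 50, because two independent instructions can execute per clock cycle.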
Integer instructions<br />
Instruction  Operands  μops  Port    Latency  Reciprocal throughput  Remarks
Move instructions
MOV          r,r       1     I2      1        1
MOV          r,i       1     I12     1        1/2
MOV          r,m       1     LD      2        1                      Latency 4 on pointer register
MOV          m,r       1     SA, ST  2        1.5
MOV          m,i       1     SA, ST           1.5
MOV          r,sr            I12              1/2
MOV          m,sr                             1.5
MOV          sr,r                    20       20
MOV          sr,m                    20       20
MOVNTI       m,r             SA, ST  2        1.5
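The Port column in the listing above can be combined with the rule that instructions using the same port cannot execute simultaneously. The following is a minimal sketch in Python of a lower bound on execution time for a block of independent instructions. It assumes that each port accepts at most one instruction per clock cycle, which is a simplification and not a statement from the tables. The function name `port_bound` is hypothetical. An instruction that uses two ports, such as SA and ST, would be listed once under each.

```python
from collections import Counter
from math import ceil


def port_bound(ports):
    """Lower bound on clock cycles for independent instructions, given
    the port name each one uses ("I1", "I2", "I12", "MA", "LD", ...).

    Assumption: each port accepts at most one instruction per cycle.
    """
    count = Counter(ports)
    only_i1, only_i2, either = count["I1"], count["I2"], count["I12"]
    # I12 instructions go to whichever of I1 or I2 is vacant, so the two
    # integer ports must share the combined work between them.
    integer = max(only_i1, only_i2, ceil((only_i1 + only_i2 + either) / 2))
    # Every other port is limited by its own instruction count.
    others = [n for port, n in count.items() if port not in ("I1", "I2", "I12")]
    return max([integer] + others)
```

For two MOV r,r instructions (port I2) and one MOV r,i (port I12), `port_bound(["I2", "I2", "I12"])` gives 2: the two register moves compete for I2, while the immediate move can go to I1.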