03.03.2013 Views

4 Instruction tables - Agner Fog


Nano 3000

VIA Nano 3000 series

List of instruction timings and μop breakdown

Explanation of column headings:

Operands: i = immediate data, r = register, mm = 64-bit mmx register, xmm = 128-bit xmm register, (x)mm = mmx or xmm register, sr = segment register, m = memory, m32 = 32-bit memory operand, etc.

μops: The number of micro-operations from the decoder or ROM. Note that the VIA Nano 3000 processor has no reliable performance monitor counter for μops. Therefore the number of μops cannot be determined except in simple cases.

Port: Tells which execution port or unit is used. Instructions that use the same port cannot execute simultaneously.
I1: Integer add, Boolean, shift, etc.
I2: Integer add, Boolean, move, jump.
I12: Can use either I1 or I2, whichever is vacant first.
MA: Multiply, divide and square root on all operand types.
MB: Various integer and floating point SIMD operations.
MBfadd: Floating point addition subunit under MB.
SA: Memory store address.
ST: Memory store.
LD: Memory load.

Latency: This is the delay that the instruction generates in a dependency chain. The numbers are minimum values. Cache misses, misalignment, and exceptions may increase the clock counts considerably. Floating point operands are presumed to be normal numbers. Denormal numbers, NANs and infinity increase the delays very much, except in XMM move, shuffle and Boolean instructions. Floating point overflow, underflow, denormal or NAN results give a similar delay.

Note: There is an additional latency for moving data from one unit or subunit to another. A table of these latencies is given in manual 3: "The microarchitecture of Intel, AMD and VIA CPUs". These additional latencies are not included in the listings below where the source and destination operands are of the same type.

Reciprocal throughput: The average number of clock cycles per instruction for a series of independent instructions of the same kind in the same thread.
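The port rules above (one instruction per port at a time, with I12 instructions taking whichever of I1 or I2 is free) suggest a simple way to estimate a lower bound for a block of independent integer instructions. The following Python sketch is only an illustration: the function name `integer_port_bound` is hypothetical, and it assumes that every instruction occupies its port for exactly one clock cycle and that nothing else (decoding, memory, dependencies) limits execution.

```python
from fractions import Fraction


def integer_port_bound(n_i1: int, n_i2: int, n_i12: int) -> Fraction:
    """Lower bound, in clock cycles, for a block of independent integer μops.

    n_i1  -- μops that can only use port I1
    n_i2  -- μops that can only use port I2
    n_i12 -- μops that can use either I1 or I2

    Assumption (not from the tables): each μop occupies its port for one cycle.
    """
    total = n_i1 + n_i2 + n_i12
    # A port accepts at most one μop per cycle, so the busier fixed port is a
    # bound; the two ports together can handle at most two μops per cycle.
    return max(Fraction(n_i1), Fraction(n_i2), Fraction(total, 2))


if __name__ == "__main__":
    # Four I2-only μops plus four I12 μops: the I12 μops can all go to I1.
    print(integer_port_bound(0, 4, 4))
    # Six I2-only μops plus two I12 μops: port I2 is the bottleneck.
    print(integer_port_bound(0, 6, 2))
```

Under these assumptions a mix dominated by I12 instructions approaches two instructions per cycle, while a mix dominated by instructions tied to a single port approaches one per cycle.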

Integer instructions<br />

Instruction Operands μops Port Latency Reciprocal throughput Remarks

Move instructions<br />

MOV r,r 1 I2 1 1<br />

MOV r,i 1 I12 1 1/2<br />

MOV r,m 1 LD 2 1 Latency 4 on pointer register

MOV m,r 1 SA, ST 2 1.5<br />

MOV m,i 1 SA, ST 1.5<br />

MOV r,sr I12 1/2<br />

MOV m,sr 1.5<br />

MOV sr,r 20 20<br />

MOV sr,m 20 20<br />

MOVNTI m,r SA, ST 2 1.5<br />
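As a rough illustration of how the latency and reciprocal throughput columns are used, the Python sketch below takes the four MOV rows above that list both values and estimates clock cycles for two extreme cases: a chain where each instruction depends on the previous one, and a series of fully independent instructions. The names `MOVE_TIMINGS`, `dependent_chain_cycles` and `independent_cycles` are hypothetical, and the model is a first-order estimate only. It ignores cache misses, the extra latency between units, and the pointer-register remark on MOV r,m.

```python
from fractions import Fraction

# (latency, reciprocal throughput) in clock cycles, copied from the rows above.
MOVE_TIMINGS = {
    "MOV r,r": (1, Fraction(1)),
    "MOV r,i": (1, Fraction(1, 2)),
    "MOV r,m": (2, Fraction(1)),
    "MOV m,r": (2, Fraction(3, 2)),
}


def dependent_chain_cycles(instr: str, count: int) -> Fraction:
    """Each instruction waits for the previous result: count * latency."""
    latency, _ = MOVE_TIMINGS[instr]
    return Fraction(count * latency)


def independent_cycles(instr: str, count: int) -> Fraction:
    """No dependencies between instructions: count * reciprocal throughput."""
    _, reciprocal_throughput = MOVE_TIMINGS[instr]
    return count * reciprocal_throughput


if __name__ == "__main__":
    for name in MOVE_TIMINGS:
        print(name, dependent_chain_cycles(name, 100), independent_cycles(name, 100))
```

Since the listed latencies are minimum values, both estimates should be read as lower bounds rather than predictions.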

