03.03.2013 Views

4 Instruction tables - Agner Fog



Intel Pentium

May be up to 3 clocks more when the output is needed for FST, FCHS, or FABS.

MMX instructions (Pentium MMX)

A list of MMX instruction timings is not needed because they all take one clock cycle, except the MMX multiply instructions, which take 3. MMX multiply instructions can be pipelined to yield a throughput of one multiplication per clock cycle.
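
The difference between latency and throughput for the MMX multiply instructions can be illustrated with a small cycle estimate. This is only a sketch: the function names are hypothetical, and it assumes the multiplies are either fully independent (pipelined case) or form a single dependency chain (serialized case), ignoring pairing and all other stalls.

```python
def pipelined_cycles(n, latency=3, interval=1):
    """Clocks to finish n independent operations on a pipelined unit.

    latency:  clocks from issue to result (3 for an MMX multiply)
    interval: clocks between successive issues (1 = one per clock)
    """
    if n <= 0:
        return 0
    # The first result arrives after `latency` clocks; each further
    # independent operation completes `interval` clocks later.
    return latency + (n - 1) * interval


def serialized_cycles(n, latency=3):
    """Clocks when each operation must wait for the previous result."""
    return max(n, 0) * latency


if __name__ == "__main__":
    print(pipelined_cycles(4))   # 6
    print(serialized_cycles(4))  # 12
```

Under these assumptions, four independent multiplies finish in 6 clocks, while four multiplies that each depend on the previous result need 12.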

The EMMS instruction takes only one clock cycle, but the first floating point instruction after an EMMS takes approximately 58 clocks extra, and the first MMX instruction after a floating point instruction takes approximately 38 clocks extra. There is no penalty for an MMX instruction after EMMS on the PMMX.
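
The transition penalties described above can be tallied for an instruction sequence with a simple model. This is a rough sketch, not a simulator: the function name and the "mmx" / "fp" / "emms" / "other" labels are hypothetical, and it assumes that instructions of other kinds (for example integer instructions) between a switch do not change the penalty.

```python
FP_AFTER_EMMS_PENALTY = 58  # approximate extra clocks, from the text
MMX_AFTER_FP_PENALTY = 38   # approximate extra clocks, from the text


def transition_penalty(kinds):
    """Sum the approximate extra clocks caused by MMX/FP switches.

    kinds: iterable of "mmx", "fp", "emms", or "other" labels,
           one per instruction in program order.
    """
    extra = 0
    last = None  # last MMX, FP, or EMMS instruction seen so far
    for kind in kinds:
        if kind == "fp" and last == "emms":
            extra += FP_AFTER_EMMS_PENALTY
        elif kind == "mmx" and last == "fp":
            extra += MMX_AFTER_FP_PENALTY
        # An MMX instruction directly after EMMS adds nothing.
        if kind in ("mmx", "fp", "emms"):
            last = kind
    return extra


if __name__ == "__main__":
    # MMX block, EMMS, FP block, then back to MMX: 58 + 38 extra clocks.
    print(transition_penalty(["mmx", "mmx", "emms", "fp", "fp", "mmx"]))  # 96
```

The estimate suggests grouping MMX and floating point code into long separate blocks, so that the roughly 96 clocks for a round trip are paid as rarely as possible.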

There is no penalty for using a memory operand in an MMX instruction because the MMX arithmetic unit is one step later in the pipeline than the load unit. A penalty does occur, however, when you store data from an MMX register to memory or to a 32-bit register: the data have to be ready one clock cycle in advance. This is analogous to the floating point store instructions.

All MMX instructions except EMMS are pairable in either pipe. Pairing rules for MMX instructions are described in manual 3: "The microarchitecture of Intel, AMD and VIA CPUs".

