4 Instruction tables - Agner Fog
Intel Pentium
May take up to 3 clocks more when the output is needed for FST, FCHS,
or FABS.
MMX instructions (Pentium MMX)
A list of MMX instruction timings is not needed because they all take one clock cycle, except the
MMX multiply instructions, which take 3 clock cycles. MMX multiply instructions can be pipelined
to yield a throughput of one multiplication per clock cycle.
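The difference between the 3-clock latency and the one-per-clock throughput can be illustrated with a small cost model. This is only a sketch: the function names are hypothetical, and it assumes that independent multiplies are issued back to back with no other stalls, while a dependent chain must wait for each result in turn.

```python
# Rough cost model for MMX multiplies on the Pentium MMX.
# The numbers come from the text above; the function names are illustrative.

MULTIPLY_LATENCY = 3   # clocks from issue to result
ISSUE_INTERVAL = 1     # a new multiply can start every clock


def pipelined_multiply_clocks(n):
    """Clocks to finish n independent multiplies issued back to back."""
    if n <= 0:
        return 0
    # The first result arrives after the full latency; each further
    # multiply overlaps with the previous ones and adds one clock.
    return MULTIPLY_LATENCY + (n - 1) * ISSUE_INTERVAL


def dependent_multiply_clocks(n):
    """Clocks to finish n multiplies where each needs the previous result."""
    if n <= 0:
        return 0
    # No overlap is possible, so every multiply pays the full latency.
    return n * MULTIPLY_LATENCY
```

Under these assumptions four independent multiplies finish in 6 clocks, whereas four multiplies in a dependency chain take 12.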
The EMMS instruction takes only one clock cycle, but the first floating point instruction after an
EMMS takes approximately 58 extra clocks, and the first MMX instruction after a floating point
instruction takes approximately 38 extra clocks. There is no penalty for an MMX instruction after
EMMS on the PMMX.
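The switching penalties can be estimated for a given instruction sequence with a short sketch such as the following. The classification of instructions into "mmx", "emms", "fp" and "other" and the function name are assumptions made for illustration, and the penalty values are the approximate figures quoted above, not exact measurements.

```python
# Approximate extra clocks when switching between MMX and floating point
# code on the Pentium MMX. Values are the approximate figures from the text.

FP_AFTER_EMMS = 58   # first floating point instruction after EMMS
MMX_AFTER_FP = 38    # first MMX instruction after a floating point instruction


def switch_penalty(kinds):
    """Sum the estimated switching penalties for a sequence of instruction kinds.

    kinds is an iterable of strings: "mmx", "emms", "fp" or "other".
    Instructions of kind "other" (e.g. integer code) are assumed not to
    change the MMX/floating point state.
    """
    extra = 0
    previous = None  # last "mmx", "emms" or "fp" instruction seen
    for kind in kinds:
        if kind == "fp" and previous == "emms":
            extra += FP_AFTER_EMMS
        elif kind == "mmx" and previous == "fp":
            extra += MMX_AFTER_FP
        # "mmx" after "emms" costs nothing extra, as stated in the text.
        if kind in ("mmx", "emms", "fp"):
            previous = kind
    return extra
```

For example, `switch_penalty(["mmx", "emms", "fp", "mmx"])` gives about 96 extra clocks under this model, which suggests keeping MMX and floating point code in long separate blocks rather than alternating between them.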
There is no penalty for using a memory operand in an MMX instruction because the MMX arithmetic
unit is one step later in the pipeline than the load unit. There is a penalty, however, when you
store data from an MMX register to memory or to a 32-bit register: the data must be ready one
clock cycle in advance. This is analogous to the floating point store instructions.
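The store rule can be expressed as a simple stall estimate. This is a sketch under one interpretation of the rule: the hypothetical parameter `ready_clock` is the first clock in which an ordinary MMX instruction could use the result, and a store is assumed to need it one clock earlier than that.

```python
def mmx_store_stall(ready_clock, store_clock):
    """Estimated stall clocks for storing an MMX register.

    ready_clock: first clock in which an ordinary MMX instruction could
                 consume the result (hypothetical scheduling input).
    store_clock: clock in which the store would otherwise execute.
    """
    # The store needs the data one clock in advance, so the earliest
    # stall-free clock for it is ready_clock + 1.
    earliest_store = ready_clock + 1
    return max(0, earliest_store - store_clock)
```

Under this interpretation, a store placed immediately after the instruction that produces its data stalls for one clock, while inserting one unrelated instruction between them hides the delay.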
All MMX instructions except EMMS are pairable in either pipe. Pairing rules for MMX instructions
are described in manual 3: "The microarchitecture of Intel, AMD and VIA CPUs".