4 Instruction tables - Agner Fog

Intel Pentium

Instruction             Operands              Clock cycles   Pairability
TEST                    m, i                  2              np
INC DEC                 r                     1              uv
INC DEC                 m                     3              uv
NEG NOT                 r/m                   1/3            np
MUL IMUL                r8/r16/m8/m16         11             np
MUL IMUL                all other versions    9 d)           np
DIV                     r8/m8                 17             np
DIV                     r16/m16               25             np
DIV                     r32/m32               41             np
IDIV                    r8/m8                 22             np
IDIV                    r16/m16               30             np
IDIV                    r32/m32               46             np
CBW CWDE                                      3              np
CWD CDQ                                       2              np
SHR SHL SAR SAL         r, i                  1              u
SHR SHL SAR SAL         m, i                  3              u
SHR SHL SAR SAL         r/m, CL               4/5            np
ROR ROL RCR RCL         r/m, 1                1/3            u
ROR ROL                 r/m, i(>= 3 e) np
conditional jump        short/near            1/4/5/6 e)     v
CALL JMP                r/m                   2/5 e)         np
RETN                                          2/5 e)         np
RETN                    i                     3/6 e)         np
RETF                                          4/7 e)         np
RETF                    i                     5/8 e)         np
J(E)CXZ                 short                 4-11 e)        np
LOOP                    short                 5-10 e)        np
BOUND                   r, m                  8              np
CLC STC CMC CLD STD                           2              np
CLI STI                                       6-9            np
LODS                                          2              np
REP LODS                                      7+3*n g)       np
STOS                                          3              np
REP STOS                                      10+n g)        np
MOVS                                          4              np
REP MOVS                                      12+n g)        np
SCAS                                          4              np
REP(N)E SCAS                                  9+4*n g)       np
CMPS                                          5              np
REP(N)E CMPS                                  8+4*n g)       np
BSWAP                   r                     1 a)           np
CPUID                                         13-16 a)       np
RDTSC                                         6-13 a) j)     np

Notes:
a)  This instruction has a 0FH prefix which takes one clock cycle extra to decode on a P1 unless preceded by a multi-cycle instruction.
b)  Versions with FS and GS have a 0FH prefix, see note a.
c)  Versions with SS, FS, and GS have a 0FH prefix, see note a.
d)  Versions with two operands and no immediate have a 0FH prefix, see note a.
e)  High values are for mispredicted jumps/branches.
f)  Only pairable if register is AL, AX or EAX.
g)  Add one clock cycle for decoding the repeat prefix unless preceded by a multi-cycle instruction (such as CLD).
h)  Pairs as if it were writing to the accumulator.
i)  9 if SP divisible by 4 (imperfect pairing).
j)  On P1: 6 in privileged or real mode; 11 in non-privileged; error in virtual mode. On PMMX: 8 and 13 clocks respectively.

Floating point instructions (Pentium and Pentium MMX)

Explanation of column headings

Operands       r = register, m = memory, m32 = 32-bit memory operand, etc.
Clock cycles   The numbers are minimum values. Cache misses, misalignment, denormal operands, and exceptions may increase the clock counts considerably.
Pairability    + = pairable with FXCH, np = not pairable with FXCH.
i-ov           Overlap with integer instructions. i-ov = 4 means that the last four clock cycles can overlap with subsequent integer instructions.
fp-ov          Overlap with floating point instructions. fp-ov = 2 means that the last two clock cycles can overlap with subsequent floating point instructions. (WAIT is considered a floating point instruction here.)

Instruction             Operand               Clock cycles   Pairability   i-ov   fp-ov
FLD                     r/m32/m64             1              0             0      0
FLD                     m80                   3              np            0      0
FBLD                    m80                   48-58          np            0      0
FST(P)                  r                     1              np            0      0
FST(P)                  m32/m64               2 m)           np            0      0
FST(P)                  m80                   3 m)           np            0      0
FBSTP                   m80                   148-154        np            0      0
FILD                    m                     3              np            2      2
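
Two of the conventions above can be made concrete with a little arithmetic: the repeat-prefix formulas together with note g, and the overlap columns of the floating point table. The following is a minimal sketch in Python, not part of the original tables. The helper names rep_clocks and exposed_clocks are hypothetical, n is assumed to be the repeat count (this excerpt does not define it), and the overlap calculation assumes that the following instructions are independent and long enough to hide the whole overlap. All results are minimum values, as in the tables.

```python
# Minimum clock counts taken from the Pentium integer table:
# name -> (fixed part, clocks per repetition).
# ASSUMPTION: n is the repeat count; the excerpt does not define it.
REP_CLOCKS = {
    "REP LODS": (7, 3),
    "REP STOS": (10, 1),
    "REP MOVS": (12, 1),
    "REP(N)E SCAS": (9, 4),
    "REP(N)E CMPS": (8, 4),
}


def rep_clocks(name, n, after_multicycle=False):
    """Minimum clocks for a repeated string instruction (hypothetical helper)."""
    fixed, per_rep = REP_CLOCKS[name]
    # Note g: one extra clock to decode the repeat prefix, unless the
    # preceding instruction is multi-cycle (such as CLD).
    prefix_penalty = 0 if after_multicycle else 1
    return fixed + per_rep * n + prefix_penalty


def exposed_clocks(clock_cycles, overlap):
    """Clocks that remain visible when the last `overlap` clocks are hidden.

    ASSUMPTION: the following instructions are independent and take at
    least `overlap` clocks, so the full overlap is achieved.
    """
    return max(clock_cycles - overlap, 0)


print(rep_clocks("REP MOVS", 100))                         # prefix decoded separately
print(rep_clocks("REP MOVS", 100, after_multicycle=True))  # e.g. preceded by CLD
print(exposed_clocks(3, 2))                                # FILD m: 3 clocks, i-ov = 2
```

Under these assumptions, REP MOVS with n = 100 comes to 113 clocks, or 112 when it follows a multi-cycle instruction, and FILD m leaves one clock that cannot be hidden behind subsequent integer instructions.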
