Sable CPU Module Specification

Sable CPU Module Specification Sable CPU Module Specification

people.freebsd.org
from people.freebsd.org More from this publisher
20.02.2013 Views

3.6 Cache Organization Copyright © 1993 Digital Equipment Corporation. The 21064 includes two on-chip caches, a data cache (Dcache) and an instruction cache (Icache). All memory cells in both Dcache and Icache are fully static six transistor CMOS structures. 3.6.1 Data Cache (Dcache) The DECchip DECchip 21064 Dcache contains an 8 KB (16 KB for the DECchip 21064-A275 ) data cache. It is a write-through, direct mapped, read allocate physical cache and has 32-byte blocks. System components can keep the Dcache coherent with memory by using the invalidate bus described in the pin bus section of this specification. 3.6.2 Instruction Cache (Icache) The DECchip 21064 Icache is an 8-KB (16-KB for the DECchip 21064-A275), physical direct-mapped cache. Icache blocks, or lines, contain 32-bytes of instruction stream data with associated tag, as well as a six-bit ASN field, a one-bit ASM field, and an eight-bit branch history field per block. It does not contain hardware for maintaining coherency with memory and is unaffected by the invalidate bus. The chip also contains a single-entry Icache stream buffer that, together with its supporting logic, reduces the performance penalty due to Icache misses incurred during in-line instruction processing. Stream buffer prefetch requests never cross physical page boundaries, but instead wrap around to the first block of the current page. 3.7 Pipeline Organization The 21064 has a seven-stage pipeline for integer operate and memory reference instructions. Floating point operate instructions progress through a ten stage pipeline. The Ibox maintains state for all pipeline stages to track outstanding register writes, and determine Icache hit/miss. The following figures show the integer operate, memory reference, and the floating point operate pipelines for the Ibox, Ebox, Abox, and Fbox. The first four cycles are executed in the Ibox and the last stages are box specific. There are bypassers in all of the boxes that allow the results of one instruction to be used as operands of a following instruction without having to be written to the register file. Section 3.8 describes the pipeline scheduling rules. Functions Located on the DECchip 21064 59

Copyright © 1993 Digital Equipment Corporation. Figure 31 shows the Integer Operate Pipeline. Figure 31: Integer Operate Pipeline IF SW I0 I1 A1 A2 WR [0] [1] [2] [3] [4] [5] [6] • IF: Instruction Fetch • SW: Swap Dual Issue Instruction /Branch Prediction • I0: Decode • I1: Register file(s) access / Issue check • A1: Computation cycle 1 / Ibox computes new PC • A2: Computation cycle 2 / ITB look-up • WR: Integer register file write / Icache Hit/Miss Figure 32 shows the Memory Reference Pipeline. Figure 32: Memory Reference Pipeline IF SWAP I0 I1 AC TB HM [0] [1] [2] [3] [4] [5] [6] LJ-01880-TI0 • AC: Abox calculates the effective D-stream address • TB: DTB look-up • HM: Dcache Hit/Miss and load data register file write 60 Functions Located on the DECchip 21064 LJ-01875-TI0

3.6 Cache Organization<br />

Copyright © 1993 Digital Equipment Corporation.<br />

The 21064 includes two on-chip caches, a data cache (Dcache) and an instruction<br />

cache (Icache). All memory cells in both Dcache and Icache are fully static six transistor<br />

CMOS structures.<br />

3.6.1 Data Cache (Dcache)<br />

The DECchip DECchip 21064 Dcache contains an 8 KB (16 KB for the DECchip<br />

21064-A275 ) data cache. It is a write-through, direct mapped, read allocate physical<br />

cache and has 32-byte blocks. System components can keep the Dcache coherent<br />

with memory by using the invalidate bus described in the pin bus section of this<br />

specification.<br />

3.6.2 Instruction Cache (Icache)<br />

The DECchip 21064 Icache is an 8-KB (16-KB for the DECchip 21064-A275), physical<br />

direct-mapped cache. Icache blocks, or lines, contain 32-bytes of instruction stream<br />

data with associated tag, as well as a six-bit ASN field, a one-bit ASM field, and an<br />

eight-bit branch history field per block. It does not contain hardware for maintaining<br />

coherency with memory and is unaffected by the invalidate bus.<br />

The chip also contains a single-entry Icache stream buffer that, together with its<br />

supporting logic, reduces the performance penalty due to Icache misses incurred<br />

during in-line instruction processing. Stream buffer prefetch requests never cross<br />

physical page boundaries, but instead wrap around to the first block of the current<br />

page.<br />

3.7 Pipeline Organization<br />

The 21064 has a seven-stage pipeline for integer operate and memory reference instructions.<br />

Floating point operate instructions progress through a ten stage pipeline.<br />

The Ibox maintains state for all pipeline stages to track outstanding register writes,<br />

and determine Icache hit/miss.<br />

The following figures show the integer operate, memory reference, and the floating<br />

point operate pipelines for the Ibox, Ebox, Abox, and Fbox. The first four cycles are<br />

executed in the Ibox and the last stages are box specific. There are bypassers in<br />

all of the boxes that allow the results of one instruction to be used as operands of<br />

a following instruction without having to be written to the register file. Section 3.8<br />

describes the pipeline scheduling rules.<br />

Functions Located on the DECchip 21064 59

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!