Sable CPU Module Specification


people.freebsd.org

Copyright © 1993 Digital Equipment Corporation.

7.2.1.3 Hardware 0 - Hardware Error

Interrupts generating a Hardware Interrupt 0 are caused by the detection of hardware errors on the CPU, I/O, or Memory modules and/or on the Cobra-bus. These errors consist of RAM array correctable and uncorrectable errors as well as bus transport and protocol errors. Correctable error interrupts are individually maskable at each module's detection point. Before the Hardware Error Interrupt is serviced, it should be cleared in the local System Interrupt Clear register.

B-Cache Tag Parity or Uncorrectable EDC Error

A given CPU module must scrub its own B-Cache Tag Parity and Uncorrectable EDC errors. If the error causing this interrupt is the result of this module initiating a transaction, it also causes a machine check exception, and the processing of the error is left to the machine check handler. If, however, this node was not the transaction initiator (Cobra-bus probe), the interrupt should initiate the scrubbing/logging process.

Parity errors in the Tag or Tag Control Stores of a B-Cache can only be scrubbed using the FORCE EDC/CONTROL and SDV bits of the BCC CSR. When dirty entries are scrubbed from the B-Cache using the Allocate Invalid Address Space, if a Tag or Tag Control parity error is detected, the expected location will not be scrubbed. If an Uncorrectable data error is encountered, the data will be written to the Cobra-bus coincident with the assertion of the CUCERR L signal. This causes the location to be written into Main Memory with a bad EDC code. The Tag Control Store of the B-Cache location in question will not be updated.

B-Cache Single Bit EDC Error

A given CPU module must scrub its own B-Cache single-bit EDC errors. If the error causing this interrupt is the result of this module initiating a transaction, and EDC correction is disabled in the B-Cache Control Register, it also causes a machine check exception, and the processing of the error is left to the machine check handler. If, however, this node was not the transaction initiator (Cobra-bus probe) and EDC correction was disabled, the interrupt should initiate the scrubbing/logging process.

When dirty entries are scrubbed from the B-Cache using the Allocate Invalid Address Space, if a Tag or Tag Control parity error is detected, the expected location will not be scrubbed. If an Uncorrectable data error is encountered, the data will be written to the Cobra-bus coincident with the assertion of the CUCERR L signal. This causes the location to be written into Main Memory with a bad EDC code. The Tag Control Store of the B-Cache location in question will not be updated.

When EDC correction is enabled, no machine checks occur for this error, so the interrupt handler is responsible for scrubbing/logging errors that occur in its own cache. Compared with hardware error correction, this approach is vulnerable to single-bit errors that may occur during I-stream reads of the PAL code machine check handler, to single-bit errors that occur in multiple quadwords of a cache fill block, and to single-bit errors that occur as a result of multiple silo’ed load misses.

Exceptions and Interrupts 175
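The ownership rules for B-Cache errors can be summarized as a small decision function. The following C sketch is illustrative only: the enum and function names are hypothetical, and it assumes the caller has already determined whether this module initiated the failing transaction and whether EDC correction is enabled in the B-Cache Control Register (this section gives no bit positions for either).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classes of errors detected in this CPU module's own B-Cache. */
typedef enum {
    BC_TAG_PARITY,        /* Tag or Tag Control Store parity error */
    BC_UNCORRECTABLE_EDC, /* uncorrectable data EDC error */
    BC_SINGLE_BIT_EDC     /* single-bit (correctable) data EDC error */
} bcache_error_t;

/* Which software path is responsible for scrubbing/logging the error. */
typedef enum {
    OWNER_MACHINE_CHECK, /* machine check handler processes the error */
    OWNER_INTERRUPT      /* Hardware Error Interrupt handler scrubs/logs */
} error_owner_t;

/*
 * this_cpu_initiated:     true if this module initiated the failing
 *                         transaction, false if found on a Cobra-bus probe.
 * edc_correction_enabled: EDC correction state from the B-Cache Control
 *                         Register (assumed to be read by the caller).
 */
static error_owner_t bcache_error_owner(bcache_error_t err,
                                        bool this_cpu_initiated,
                                        bool edc_correction_enabled)
{
    switch (err) {
    case BC_SINGLE_BIT_EDC:
        /* With correction enabled no machine check occurs, so the
           interrupt handler owns the error even if this CPU initiated it. */
        if (edc_correction_enabled)
            return OWNER_INTERRUPT;
        /* Correction disabled: same rule as the other error classes. */
        /* fall through */
    case BC_TAG_PARITY:
    case BC_UNCORRECTABLE_EDC:
        return this_cpu_initiated ? OWNER_MACHINE_CHECK : OWNER_INTERRUPT;
    }
    assert(!"unknown B-Cache error class");
    return OWNER_INTERRUPT;
}
```

In a handler built this way, Hardware Interrupt 0 would first be cleared in the local System Interrupt Clear register, and the interrupt handler would scrub/log only when the function returns OWNER_INTERRUPT.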

Cobra-bus Parity Error

Cobra-bus Parity Errors indicate that a node on the Cobra-bus has a bad driver or receiver, or that a problem exists in the physical interconnect. Because this is unlikely to be a correctable error, the handler should attempt to discover which node is bad and, if possible, disable it so that it does not interfere with normal bus operation.

If no Uncorrectable error can be detected, retrying the operation that caused the error may help to isolate it.

Generally, when an uncorrectable error of this type is detected, it is system fatal.

A C/A Parity Error may result in the C/A not ack’ed bit being set in one of the commander CSRs, because responders will not ack a C/A that has bad parity. Refer to Section 7.1.1.3.10 for further details.
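The guidance for Cobra-bus Parity Errors can be read as a three-way choice. The sketch below encodes one reading of it; the names and the -1 encoding for "no suspect node identified" are hypothetical, and because the text says such errors are only generally fatal, a real handler may apply a more nuanced policy.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical responses to a Cobra-bus parity error. */
typedef enum {
    PARITY_SYSTEM_FATAL,  /* uncorrectable parity error: generally fatal */
    PARITY_DISABLE_NODE,  /* faulty node found and can be taken off the bus */
    PARITY_RETRY_ISOLATE  /* no node identified yet: retry to isolate it */
} parity_action_t;

/*
 * suspect_node: index of the node believed to have a bad driver/receiver,
 *               or -1 if none has been identified (hypothetical encoding).
 */
static parity_action_t cobra_parity_action(bool uncorrectable_detected,
                                           int suspect_node,
                                           bool node_can_be_disabled)
{
    assert(suspect_node >= -1);
    if (uncorrectable_detected)
        return PARITY_SYSTEM_FATAL;
    if (suspect_node >= 0 && node_can_be_disabled)
        return PARITY_DISABLE_NODE;
    return PARITY_RETRY_ISOLATE;
}
```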

Invalid Cobra-bus Address Bystander

When an Invalid Cobra-bus Address is broadcast by a CPU node and is not ack’ed, the Cobra-bus C_ERR L signal is asserted. The CPU that initiated the transaction will machine check and receive a Hardware Error Interrupt; the bystander CPU will only receive the interrupt. The transaction initiator should handle the logging/error recovery for this error, which results in an access violation. Refer to Section 7.1.1.3.11 for further details.
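Because only the initiating CPU takes a machine check, each CPU can decide its role from whether it also took one. A minimal sketch, assuming a hypothetical interface in which the handler is told whether a machine check accompanied the Hardware Error Interrupt:

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    INVADDR_LOG_AND_RECOVER, /* initiator: log, recover, report the access violation */
    INVADDR_CLEAR_ONLY       /* bystander: clear the interrupt; initiator does the logging */
} invaddr_action_t;

/*
 * hw_error_interrupt:  this CPU received the Hardware Error Interrupt
 *                      (every CPU does when C_ERR L is asserted).
 * machine_check_taken: this CPU also took a machine check, which only
 *                      happens on the transaction initiator.
 */
static invaddr_action_t cpu_invalid_address_action(bool hw_error_interrupt,
                                                   bool machine_check_taken)
{
    assert(hw_error_interrupt); /* C_ERR L reaches every CPU */
    return machine_check_taken ? INVADDR_LOG_AND_RECOVER : INVADDR_CLEAR_ONLY;
}
```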

Invalid Cobra-bus Address I/O Commander

When an Invalid Cobra-bus Address is broadcast by the I/O node and is not ack’ed, the Cobra-bus C_ERR L signal is asserted. Both CPUs will receive a Hardware Error Interrupt, and one should be designated to handle the logging/error recovery. Information in the IO error register indicates whether this error occurred as a result of mailbox operations or DMA accesses, so logging and recovery can be handled differently for each.
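A sketch of the I/O-commander case follows. The designated CPU number and the IOERR_MAILBOX_OP bit are assumptions: this section does not give the IO error register layout, so treating a clear bit as a DMA access is purely illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit in the IO error register; the real layout is not given here. */
#define IOERR_MAILBOX_OP (UINT32_C(1) << 0)

typedef enum {
    IOADDR_NOT_MINE,     /* not the designated CPU: the other CPU handles it */
    IOADDR_LOG_MAILBOX,  /* error arose from a mailbox operation */
    IOADDR_LOG_DMA       /* error arose from a DMA access (assumed: bit clear) */
} io_invaddr_action_t;

static io_invaddr_action_t io_invalid_address_action(int this_cpu,
                                                     int designated_cpu,
                                                     uint32_t io_error_reg)
{
    assert(this_cpu >= 0 && designated_cpu >= 0);
    if (this_cpu != designated_cpu)
        return IOADDR_NOT_MINE;
    return (io_error_reg & IOERR_MAILBOX_OP) ? IOADDR_LOG_MAILBOX : IOADDR_LOG_DMA;
}
```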

Memory Uncorrectable EDC Error I/O Commander

When an Uncorrectable EDC Error is detected while the I/O module is the Cobra-bus Commander, the I/O module asserts the C_ERR L signal, which causes the CPU modules to receive a Hardware Error Interrupt. When this occurs, one CPU should be designated to handle the logging/error recovery. This error may or may not be SYSTEM FATAL.
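This section says one CPU should be designated but does not say how. One option, shown here as a swapped-in technique rather than the specification's method, is for each CPU to race to claim the error with a C11 compare-and-exchange; a fixed designation (for example, always the same CPU) is simpler. The shared error_owner word and both function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared claim word: -1 means no CPU owns the current error. */
static atomic_int error_owner = -1;

/* Returns true if this_cpu is (or becomes) the CPU that logs/recovers. */
static bool claim_error(int this_cpu)
{
    assert(this_cpu >= 0);
    int expected = -1;
    if (atomic_compare_exchange_strong(&error_owner, &expected, this_cpu))
        return true;              /* first claimant becomes the designated CPU */
    return expected == this_cpu;  /* on failure, expected holds the current owner */
}

/* Called by the owning CPU once logging/recovery is complete. */
static void release_error(int this_cpu)
{
    int expected = this_cpu;
    bool ok = atomic_compare_exchange_strong(&error_owner, &expected, -1);
    assert(ok && "release by a CPU that does not own the error");
    (void)ok;
}
```

This sketch does not stop a slower CPU from claiming an error that the faster CPU has already handled and released; a real implementation would also need a per-error sequence number or similar.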

Memory Correctable EDC Error

When a Correctable EDC Error is detected, the Memory module will assert the C_ERR L signal if the Enable CRD reporting bit is set in Memory module CSR6. This in turn causes the CPU module(s) to receive a Hardware Error Interrupt. When this occurs, one CPU should be designated to handle the scrubbing/logging activity.
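The CRD reporting gate can be sketched as below. CSR6_ENABLE_CRD_REPORTING is a hypothetical bit position (the real one is in the Memory module CSR6 description), and designated_cpu is assumed to be chosen elsewhere. The last helper reflects the earlier statement that correctable error interrupts are maskable at each module's detection point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical position of the Enable CRD reporting bit in Memory module CSR6. */
#define CSR6_ENABLE_CRD_REPORTING (UINT32_C(1) << 0)

/* A correctable error asserts C_ERR L (and so a Hardware Error Interrupt)
   only when CRD reporting is enabled. */
static bool crd_raises_interrupt(uint32_t mem_csr6)
{
    return (mem_csr6 & CSR6_ENABLE_CRD_REPORTING) != 0;
}

/* Only the designated CPU scrubs/logs; the others just clear their interrupt. */
static bool should_scrub_crd(uint32_t mem_csr6, int this_cpu, int designated_cpu)
{
    assert(this_cpu >= 0 && designated_cpu >= 0);
    return crd_raises_interrupt(mem_csr6) && this_cpu == designated_cpu;
}

/* Masking CRD interrupts at the memory module: clear the enable bit. */
static uint32_t csr6_disable_crd_reporting(uint32_t mem_csr6)
{
    return mem_csr6 & ~CSR6_ENABLE_CRD_REPORTING;
}
```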

Duplicate Tag Store Parity Error

Duplicate tag store parity errors are treated in hardware as uncorrectable errors; however, the system will always recover from a parity error of this sort without any loss of data and/or memory coherence.
