26.12.2014 Views

Fabric Manager Users Guide, Version 6.1, Revision A - QLogic


2–Advanced Fabric Manager Capabilities

Packet and Switch Timers

• Packet LifeTime
• HeadOfQueueLife (HoqLife)
• SwitchLifeTime
• VLStallCount

HoqLife, SwitchLifeTime, and VLStallCount can be used to relieve fabric congestion and avoid fabric deadlocks by discarding packets. Discards help prevent back pressure from propagating deep into the core of the fabric; however, such discards cause end nodes to time out and retransmit.

If a packet stays at the head of a switch egress port for longer than HoqLife, it is discarded. Similarly, a packet queued in a switch for longer than SwitchLifetime is discarded. SwitchLifetime and HoqLife can also be set to infinite, in which case no discards occur.
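As a rough illustration, the two timer rules above can be expressed as one predicate. This is a minimal sketch, not switch firmware: it assumes times are tracked in microseconds, uses `None` to stand for an infinite setting, and the function and parameter names are hypothetical.

```python
from typing import Optional


def is_discarded(
    head_wait_us: float,
    queued_us: float,
    hoq_life_us: Optional[float],
    switch_lifetime_us: Optional[float],
) -> bool:
    """Return True if a switch would drop this packet under the timer rules.

    head_wait_us: time the packet has spent at the head of the egress port.
    queued_us: total time the packet has spent queued in the switch.
    A limit of None models the "infinite" setting: that rule never fires.
    """
    # Rule 1: stuck at the head of the egress port for longer than HoqLife.
    if hoq_life_us is not None and head_wait_us > hoq_life_us:
        return True
    # Rule 2: queued in the switch for longer than SwitchLifetime.
    if switch_lifetime_us is not None and queued_us > switch_lifetime_us:
        return True
    return False


# A packet that has waited 300 us at the head of a port whose HoqLife is 250 us.
print(is_discarded(300.0, 300.0, 250.0, 1000.0))  # True
print(is_discarded(300.0, 300.0, None, None))     # False: both timers infinite
```

Keeping the two checks independent mirrors the text: either timer alone is enough to discard a packet, and setting both to infinite disables discards entirely.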

VLStallCount controls a second-tier, more aggressive discard. If VLStallCount packets in a row on a given VL of an egress port are discarded due to HoqLife, that VL enters the VL Stalled State and discards all of its egress packets for 8 × HoqLife.
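The VLStallCount escalation can be modeled as a small per-VL state machine. This is a hypothetical sketch of the behavior described above, not QLogic's implementation: the class name, the microsecond time units, and the choice to reset the drop counter when a stall begins are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class EgressVL:
    """Hypothetical model of one VL on one switch egress port."""
    vl_stall_count: int            # consecutive HoqLife discards that trigger a stall
    hoq_life_us: float             # HoqLife for this port, in microseconds
    consecutive_drops: int = 0
    stalled_until_us: float = 0.0  # end of the current VL Stalled State (0 = never stalled)

    def is_stalled(self, now_us: float) -> bool:
        return now_us < self.stalled_until_us

    def on_packet(self, now_us: float, head_wait_us: float) -> str:
        """Classify one head-of-queue packet as 'stall-discard', 'hoq-discard', or 'sent'."""
        # While stalled, every packet on this VL is discarded.
        if self.is_stalled(now_us):
            return "stall-discard"
        if head_wait_us > self.hoq_life_us:
            self.consecutive_drops += 1
            if self.consecutive_drops >= self.vl_stall_count:
                # Second-tier discard: stall the VL for 8 * HoqLife (counter reset is an assumption).
                self.stalled_until_us = now_us + 8 * self.hoq_life_us
                self.consecutive_drops = 0
            return "hoq-discard"
        # A packet that leaves in time breaks the run of HoqLife discards.
        self.consecutive_drops = 0
        return "sent"


vl = EgressVL(vl_stall_count=3, hoq_life_us=100.0)
for t in (0.0, 10.0, 20.0):
    vl.on_packet(t, head_wait_us=150.0)          # three HoqLife discards in a row
print(vl.on_packet(500.0, head_wait_us=0.0))     # stall-discard (stalled until 20 + 8*100 = 820 us)
print(vl.on_packet(900.0, head_wait_us=0.0))     # sent
```

The key point the model captures is that the stall is triggered by discards "in a row": a single on-time packet resets the count, so only a VL that is persistently blocked escalates to discarding everything.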

Packets discarded for any of these reasons are included in the port's TxDiscards counter, which can be queried using FastFabric. Such discards are also included in the congestion information monitored by the PM and available through FastFabric tools such as iba_top, iba_rfm, and iba_paquery. Congestion severe enough to cause packet discards is given a heavy weight so that it does not go unnoticed.

Within a Host Channel Adapter/Target Channel Adapter, every Reliable Queue Pair (QP) has a timeout configured. If no acknowledgment (ACK) arrives for a transmitted QP packet within the timeout, the QP retries the send. Retries are limited (up to 7), after which the QP fails with a Retry Timeout Exceeded error.
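The retry behavior can be sketched as a bounded loop. This is an illustrative model only: it assumes a hypothetical `transmit` callable that reports whether an ACK arrived within the timeout, and the function and exception names are invented for the example. Real QP retries are performed by the channel adapter itself, not by application code.

```python
from typing import Callable


class RetryTimeoutExceeded(Exception):
    """Raised when a QP exhausts its retry budget without receiving an ACK."""


def send_with_retries(transmit: Callable[[], bool], retry_cnt: int = 7) -> int:
    """Send one packet on a hypothetical reliable QP.

    transmit: returns True if an ACK arrived within the timeout.
    retry_cnt: retries allowed after the first attempt (at most 7).
    Returns the number of attempts used.
    """
    if not 0 <= retry_cnt <= 7:
        raise ValueError("retry_cnt must be between 0 and 7")
    for attempt in range(1, retry_cnt + 2):  # first send plus retry_cnt retries
        if transmit():
            return attempt
    raise RetryTimeoutExceeded(f"no ACK after {retry_cnt + 1} attempts")


acks = iter([False, False, True])  # two lost ACKs, then success
print(send_with_retries(lambda: next(acks)))  # 3
```

With the maximum of 7 retries, a path that never delivers an ACK is abandoned after 8 attempts in total.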

The InfiniBand standard specifies that the timeout for a QP should be computed from the Packet LifeTime reported by the SA in a PathRecord. The LifeTime represents the one-way transit time through the fabric. Because a retry is triggered only when no ACK comes back, the timeout must cover a round trip: the packet's transit to the remote end plus the ACK's transit back. Therefore, the actual QP timeout will be at least twice the Packet LifeTime, plus some overhead to allow for processing delays in the Host Channel Adapter/Target Channel Adapter at each end of the fabric.
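The round-trip reasoning translates into simple arithmetic. The sketch below assumes the standard InfiniBand encoding of the Local ACK Timeout (4.096 µs × 2^n, where n = 0 disables the timer), which is what the `timeout` attribute of libibverbs' `ibv_modify_qp` carries; the per-adapter overhead is a hypothetical input and the function names are illustrative.

```python
ACK_TIMEOUT_UNIT_US = 4.096  # Local ACK Timeout = 4.096 us * 2**n (assumed IB encoding)


def min_qp_timeout_us(packet_lifetime_us: float, ca_overhead_us: float) -> float:
    """Lower bound on the QP timeout: one Packet LifeTime out, one back for the
    ACK, plus a (hypothetical) allowance for adapter processing at both ends."""
    return 2 * packet_lifetime_us + ca_overhead_us


def ack_timeout_exponent(timeout_us: float) -> int:
    """Smallest encoded exponent n with 4.096 us * 2**n >= timeout_us.

    Starts at 1 because an encoded value of 0 means the timer is disabled.
    The field is 5 bits wide, so n cannot exceed 31.
    """
    n = 1
    while ACK_TIMEOUT_UNIT_US * 2 ** n < timeout_us:
        n += 1
        if n > 31:
            raise ValueError("timeout exceeds the 5-bit Local ACK Timeout range")
    return n


lifetime = ACK_TIMEOUT_UNIT_US * 2 ** 14  # about 67 ms one way
print(ack_timeout_exponent(min_qp_timeout_us(lifetime, 0.0)))    # 15
print(ack_timeout_exponent(min_qp_timeout_us(lifetime, 100.0)))  # 16
```

With no overhead, doubling a power-of-two lifetime lands exactly on the next exponent; any nonzero overhead pushes the result one step further, doubling the configured wait. Because the encoding is exponential, small allowances can have a coarse effect on the final timeout.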

Careful selection of Packet LifeTime (and QP timeouts) is important. If timeouts are set too large, the impact of a lost packet can be significant, because the sender waits a full timeout before it retries. Conversely, if timeouts are set too low, minor fabric delays can cause unnecessary retries, possibly even Retry Timeout Exceeded errors, and the resulting disruption of applications.
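A back-of-the-envelope model makes this trade-off concrete. It assumes each attempt waits exactly one full timeout, ignores transmission and processing time, and uses the 4.096 µs × 2^n timeout encoding; the function names are hypothetical.

```python
def recovery_delay_us(timeout_us: float) -> float:
    """Extra latency from one lost packet: the sender idles for a full
    timeout before the retry goes out (simplified model)."""
    return timeout_us


def failure_detection_us(timeout_us: float, retry_cnt: int = 7) -> float:
    """Time spent waiting before Retry Timeout Exceeded is reported when no
    ACK ever arrives: one timeout per attempt (first send plus retries)."""
    return (retry_cnt + 1) * timeout_us


for n in (8, 14, 20):
    t = 4.096 * 2 ** n  # encoded timeout exponent n, converted to microseconds
    print(
        f"n={n:2d}: one lost packet costs {recovery_delay_us(t) / 1000:.1f} ms, "
        f"a dead path is reported after {failure_detection_us(t) / 1000:.1f} ms"
    )
```

Moving from n = 14 to n = 20 multiplies both the cost of a single lost packet and the time to report a dead path by 64. Moving down to n = 8 keeps both small but leaves little headroom: a fabric delay of a few milliseconds would already exceed the roughly 1 ms timeout and trigger retries.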

The SM allows for two approaches to configuring Packet LifeTime:

2-20 IB0054608-01 B
