26.12.2014 Views

Fabric Manager Users Guide, Version 6.1, Revision A - QLogic


2–Advanced Fabric Manager Capabilities

Fabric Change Detection

During the analysis and reprogramming process, the fabric may still be changing, and errors may occur as a result. When more than SweepErrorsThreshold non-recoverable errors (such as a device going offline mid-sweep) occur during a single sweep, the SM abandons the sweep and starts it over to obtain a complete and accurate view of the fabric. The SM will not abandon the sweep more than SweepAbandonThreshold consecutive times; after that, it does the best it can with the current sweep. This helps the SM handle a fabric that is constantly changing, such as a fabric with an unstable link.
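
The interplay of the two thresholds can be illustrated with a minimal Python sketch. It is not the SM's implementation: `run_sweep` and the `attempt_sweep` callback are hypothetical names, and the sketch assumes that a pass is abandoned only when its error count is strictly greater than SweepErrorsThreshold.

```python
from typing import Callable, List


def run_sweep(
    attempt_sweep: Callable[[], int],
    sweep_errors_threshold: int,
    sweep_abandon_threshold: int,
) -> List[str]:
    """Return a log of the decisions taken during one sweep cycle.

    attempt_sweep is a hypothetical callback that performs one sweep pass
    and returns the number of non-recoverable errors it encountered.
    """
    log: List[str] = []
    abandons = 0
    while True:
        errors = attempt_sweep()
        if errors <= sweep_errors_threshold:
            # Few enough errors: the view of the fabric is accepted.
            log.append(f"completed ({errors} errors)")
            return log
        if abandons >= sweep_abandon_threshold:
            # Abandon budget used up: accept this pass as a best effort.
            log.append(f"best effort ({errors} errors)")
            return log
        # Too many errors: discard this pass and start over.
        abandons += 1
        log.append(f"abandoned ({errors} errors)")
```

With a callback that always reports too many errors, the loop abandons exactly `sweep_abandon_threshold` passes and then accepts the next one, so a persistently unstable fabric cannot stall the sweep indefinitely.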

Similarly, if a port issues more than TrapThreshold changes per minute, the SM considers the link unstable and disables the port. This removes the port from the fabric and prevents any traffic from being routed over it.
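
As a rough illustration of the per-minute check, the sketch below counts port changes in a sliding 60-second window. The class `PortTrapMonitor` is hypothetical, and the sliding window is an assumption: the guide does not say how the SM measures "per minute".

```python
from collections import deque
from typing import Deque


class PortTrapMonitor:
    """Hypothetical per-port counter of state changes in a sliding window."""

    def __init__(self, trap_threshold: int, window_seconds: float = 60.0) -> None:
        self.trap_threshold = trap_threshold
        self.window_seconds = window_seconds
        self.timestamps: Deque[float] = deque()
        self.disabled = False

    def record_trap(self, now: float) -> bool:
        """Record one change at time `now` (seconds); return True if the port is disabled."""
        if self.disabled:
            return True
        self.timestamps.append(now)
        # Drop changes that have fallen out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        # More than the threshold within the window: treat the link as unstable.
        if len(self.timestamps) > self.trap_threshold:
            self.disabled = True
        return self.disabled
```

With `trap_threshold=3`, three changes within a minute leave the port enabled, and a fourth change in the same minute disables it.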

Additionally, the Suppress1x configuration parameter allows the SM to disable 1x links, removing low-speed and low-quality links from the fabric. All hardware available from QLogic is capable of 4x or more, so a 1x link typically indicates a partially bad cable or faulty hardware.

Tolerance of Slow Nodes

In rare cases, nodes under a heavy load, such as when running high-stress MPI applications, are slow to respond to SM sweeps. To avoid disrupting application runs, the SM can be configured with NonRespTimeout and NonRespMaxCount to be more tolerant of such devices and to assume that their capabilities have not changed since the last successful sweep.

The trade-off in increasing the tolerance is that the loss of a node is detected much more slowly. Typically this capability is relevant only to Host Channel Adapters. The risk mainly applies to nodes that hang but keep their link up, so that the neighbor switch does not report a port state change.
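
The trade-off can be made concrete with a small sketch. The exact semantics of NonRespTimeout and NonRespMaxCount are not given in this section, so the sketch assumes that a node is dropped only after it has missed more than NonRespMaxCount consecutive sweeps and has been silent for longer than NonRespTimeout seconds. `NodeTracker` and its return values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeTracker:
    non_resp_timeout: float   # seconds a node may stay silent (assumed meaning)
    non_resp_max_count: int   # consecutive missed sweeps tolerated (assumed meaning)
    missed_sweeps: int = 0
    first_miss_time: Optional[float] = None

    def on_sweep(self, responded: bool, now: float) -> str:
        """Classify the node after one sweep at time `now` (seconds)."""
        if responded:
            # A response clears the history and refreshes the node's data.
            self.missed_sweeps = 0
            self.first_miss_time = None
            return "refresh"
        if self.first_miss_time is None:
            self.first_miss_time = now
        self.missed_sweeps += 1
        silent_for = now - self.first_miss_time
        if (self.missed_sweeps > self.non_resp_max_count
                and silent_for > self.non_resp_timeout):
            return "drop"
        # Tolerated: assume capabilities are unchanged since the last good sweep.
        return "keep-last-known"
```

Under these assumptions, raising either value lengthens the period during which a hung node with its link still up remains in the fabric.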

NOTE:

When using OFED, it is recommended to set RENICE_IB_MAD=yes. This ensures rapid responses by the SMA and actually reduces overhead by avoiding the cost of retries. This option is enabled by default when using QLogic OFED+.

Multicast Denial of Service

In rare cases, a node sends excessive multicast creates/deletes, several times a second for the same group, causing continuous SM sweeps. To stop the continuous sweeps, Multicast (MC) Denial of Service (DoS) protection can be set up in the configuration file. The settings define the MC DoS threshold to monitor, the monitoring interval, and whether the SM bounces or disables the offending port.
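
The three settings (threshold, monitoring interval, and action) might interact as in the sketch below. It is an illustration under assumptions, not the SM's logic: `McDosMonitor` is a hypothetical name, counts are kept per port and group, and the interval is modeled as a fixed window that resets once it elapses.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


class McDosMonitor:
    """Hypothetical counter of multicast creates/deletes per (port, group)."""

    def __init__(self, threshold: int, interval_seconds: float, action: str) -> None:
        if action not in ("bounce", "disable"):
            raise ValueError("action must be 'bounce' or 'disable'")
        self.threshold = threshold
        self.interval_seconds = interval_seconds
        self.action = action
        self.window_start = 0.0
        self.counts: Dict[Tuple[str, str], int] = defaultdict(int)

    def record(self, port: str, group: str, now: float) -> Optional[str]:
        """Count one create/delete; return the action to take, or None."""
        if now - self.window_start >= self.interval_seconds:
            # New monitoring interval: start counting afresh.
            self.window_start = now
            self.counts.clear()
        self.counts[(port, group)] += 1
        if self.counts[(port, group)] > self.threshold:
            return self.action
        return None
```

Counting per group means that a node joining many different groups once each is not penalized. Only repeated churn on the same group within one interval triggers the configured action.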

2-2 IB0054608-01 B
