Software Engineering for Students: A Programming Approach


web.firat.edu.tr
21.08.2013

Chapter 17 ■ Software robustness

17.6 Recovery blocks

Recovery blocks will, however, also cope with hardware faults. For example, suppose that a fault develops in the region of main memory containing the primary sort method. The recovery block mechanism can then recover by switching over to an alternative method. There are stories that the developers of the recovery block mechanism at Newcastle University, England, used to invite visitors to remove memory boards from a live computer and observe that the computer continued apparently unaffected.

We now examine some of the other aspects of recovery blocks.

The acceptance test

You might think that acceptance tests would be cumbersome methods, incurring high overheads, but this need not be so. Consider for example a method to calculate a square root. A method to check the outcome, simply by multiplying the answer by itself and comparing the product with the original value, is short and fast.

Often, however, an acceptance test cannot be completely foolproof, because a complete check would incur too high a performance overhead. Take the example of the sort method. The acceptance test could check that the information had been sorted, that is, is in sequence. However, this does not guarantee that items have not been lost or created. An acceptance test, therefore, does not normally attempt to ensure the correctness of the software, but instead carries out a check to see whether the results are acceptably good.

Note that if a fault like division by zero, a protection violation, or an array subscript out of range occurs while one of the sort methods is being executed, then these also constitute the result of checks on the behavior of the software. (These are checks carried out by the hardware or the run-time system.) Thus either software acceptance tests or hardware checks can trigger fault tolerance.

The alternatives

The software components provided as backups must accomplish the same end as the primary module. But they should achieve this by means of a different algorithm so that the same problem doesn't arise.
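The sort example can be sketched in a few lines of Python. This is only an illustration, not the notation of the recovery block mechanism itself: the names `recovery_block`, `is_acceptable`, `faulty_primary_sort` and `insertion_sort` are hypothetical, the primary is given a deliberate fault, and the acceptance test is assumed to afford the extra cost of checking that no items were lost or created.

```python
from collections import Counter


def is_in_sequence(result):
    """Cheap check: every item is no larger than its successor."""
    return all(a <= b for a, b in zip(result, result[1:]))


def is_acceptable(original, result):
    """Acceptance test: in sequence, and no items lost or created."""
    return is_in_sequence(result) and Counter(original) == Counter(result)


def faulty_primary_sort(items):
    """Hypothetical primary with a deliberate fault: it drops the last item."""
    return sorted(items)[:-1]


def insertion_sort(items):
    """Simpler alternative that uses a different algorithm."""
    result = []
    for item in items:
        position = len(result)
        while position > 0 and result[position - 1] > item:
            position -= 1
        result.insert(position, item)
    return result


def recovery_block(items, alternatives, acceptance_test):
    """Try each alternative in turn until one passes the acceptance test."""
    for attempt in alternatives:
        working_copy = list(items)  # each attempt starts from the saved input
        try:
            result = attempt(working_copy)
        except Exception:
            continue  # a run-time check (e.g. IndexError) reported a fault
        if acceptance_test(items, result):
            return result
    raise RuntimeError("every alternative failed")


print(recovery_block([3, 1, 2], [faulty_primary_sort, insertion_sort], is_acceptable))
```

Here the primary returns a result that is in sequence but has lost an item, so the ordering check alone would have accepted it. The item count comparison rejects it and the block falls back to the insertion sort.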
Ideally the alternatives should be developed by different programmers, so that they are not unwittingly sharing assumptions. The alternatives should also be less complex than the primary, so that they will be less likely to fail. For this reason they will probably be poorer in their performance (speed).

Another approach is to create alternatives that provide an increasingly degraded service. This allows the system to exhibit what is termed graceful degradation. As an example of graceful degradation, consider a steel rolling mill in which a computer controls a machine that chops off the required lengths of steel. Normally the computer employs a sophisticated algorithm to make optimum use of the steel, while satisfying customers’ orders. Should this algorithm fail, a simpler algorithm can be used that processes the orders strictly sequentially. This means that the system will keep going, albeit less efficiently.

Implementation

The language constructs of the recovery block mechanism hide the preservation of variables. The programmer does not need to explicitly declare which variables should be stored and when. The system must save values before any of the alternatives is executed, and restore them should any of the alternatives fail. Although this may seem a formidable task, only the values of variables that are changed need to be preserved, and the notation highlights which ones these are. Variables local to the alternatives need not be stored, nor need parameters passed by value. Only global variables that are changed need to be preserved.

Nonetheless, storing data in this manner probably incurs too high an overhead if it is carried out solely by software. Studies indicate that, suitably implemented with hardware assistance, the speed overhead might be no more than about 15%.

No programming language has yet incorporated the recovery block notation. Even so, the idea provides a framework which can be used, in conjunction with any programming language, to structure fault tolerant software.

17.7 ● n-version programming

This form of programming means developing n versions of the same software component. For example, suppose a fly-by-wire airplane has a software component that decides how much the rudder should be moved in response to information about speed, pitch, throttle setting, etc. Three or more versions of the component are implemented and run concurrently. The outputs are compared by a voting module; the majority vote wins and is used to control the rudder (see Figure 17.4).

It is important that the different versions of the component are developed by different teams, using different methods and (preferably) at different locations, so that a minimum of assumptions are shared by the developers. By this means, the modules will use different algorithms, contain different mistakes and produce incorrect outputs (if they do at all) under different circumstances. Thus the chances are that when one of the components fails and produces an incorrect result, the others will perform correctly and the faulty component will be outvoted by the majority.
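A voting module of the kind just described might be sketched as follows in Python. The sketch rests on several assumptions: the three `rudder_v*` functions are hypothetical stand-ins with placeholder formulas, the versions run one after another rather than concurrently, and outputs are integers compared for exact equality (a real controller would probably compare within a tolerance).

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of the versions."""
    if not outputs:
        raise ValueError("no outputs to vote on")
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority among the versions")
    return value


def run_versions(versions, reading):
    """Run every version on the same input and vote on the results."""
    return majority_vote([version(reading) for version in versions])


# Hypothetical versions of the rudder component (formulas are placeholders).
def rudder_v1(reading):
    return reading["pitch"] * 2


def rudder_v2(reading):
    return reading["pitch"] + reading["pitch"]


def rudder_v3(reading):
    return reading["pitch"] * 3  # deliberate fault, outvoted by the other two


print(run_versions([rudder_v1, rudder_v2, rudder_v3], {"pitch": 5}))
```

Requiring a strict majority means that a three-way disagreement raises an error and no output is chosen arbitrarily. If two of the three versions shared the same fault, however, the voter would return their wrong answer.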
Clearly the success of an n-version programming scheme depends on the degree of independence of the different components. If the majority embody a similar design fault, they will fail together and the wrong decision will be the outcome. Independence is a bold assumption, and some studies have shown a tendency for different developers to commit the same mistakes, probably because of shared misunderstandings of the (same) specification.

The expense of n-version programming is in the effort to develop n versions, plus the processing overhead of running the multiple versions. If hardware reliability is also an issue,

Figure 17.4 Triple modular redundancy (input data is fed to Version 1, Version 2 and Version 3; their outputs go to a voting module, which produces the output data)

