Software Engineering for Students A Programming Approach
17.6 Recovery blocks

Recovery blocks will, however, also cope with hardware faults. For example, suppose that a fault develops in the region of main memory containing the primary sort method. The recovery block mechanism can then recover by switching over to an alternative method. There are stories that the developers of the recovery block mechanism at Newcastle University, England, used to invite visitors to remove memory boards from a live computer and observe that the computer continued apparently unaffected.

We now examine some of the other aspects of recovery blocks.

The acceptance test

You might think that acceptance tests would be cumbersome methods, incurring high overheads, but this need not be so. Consider, for example, a method to calculate a square root. A method to check the outcome, simply by multiplying the answer by itself and comparing the product with the original number, is short and fast. Often, however, an acceptance test cannot be completely foolproof, because a complete check would incur too high a performance overhead. Take the example of the sort method. The acceptance test could check that the information has been sorted, that is, that it is in sequence. However, this does not guarantee that items have not been lost or created. An acceptance test, therefore, does not normally attempt to ensure the correctness of the software, but instead carries out a check to see whether the results are acceptably good.

Note that if a fault such as division by zero, a protection violation or an array subscript out of range occurs while one of the sort methods is being executed, then this also constitutes the result of a check on the behavior of the software. (These are checks carried out by the hardware or the run-time system.) Thus either software acceptance tests or hardware checks can trigger fault tolerance.

The alternatives

The software components provided as backups must accomplish the same end as the primary module. But they should achieve this by means of a different algorithm so that the same problem doesn’t arise.
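
The interplay of primary, alternative and acceptance test described so far can be sketched in Python. This is only an illustration under stated assumptions, not the notation of the recovery block mechanism itself: the names recovery_block, primary_sort and alternative_sort are hypothetical, the primary is deliberately faulty, and copying the input list stands in for the saving of values that a real mechanism would carry out behind the scenes. The acceptance test here also compares item counts, which is one possible (and more costly) way of detecting lost or created items.

```python
from collections import Counter
from typing import Callable, List, Sequence


def acceptance_test(original: Sequence[int], result: Sequence[int]) -> bool:
    """Check that the result is in sequence and holds the same items as the input."""
    in_sequence = all(a <= b for a, b in zip(result, result[1:]))
    same_items = Counter(original) == Counter(result)
    return in_sequence and same_items


def primary_sort(items: List[int]) -> List[int]:
    """Hypothetical primary: deliberately faulty, it loses duplicate items."""
    return sorted(set(items))


def alternative_sort(items: List[int]) -> List[int]:
    """Hypothetical alternative: a simple insertion sort using a different algorithm."""
    result: List[int] = []
    for item in items:
        position = len(result)
        while position > 0 and result[position - 1] > item:
            position -= 1
        result.insert(position, item)
    return result


def recovery_block(
    items: List[int],
    methods: Sequence[Callable[[List[int]], List[int]]],
) -> List[int]:
    """Try each method in turn until one passes the acceptance test."""
    for method in methods:
        try:
            # Each attempt works on a copy, so a failed attempt leaves the input intact.
            result = method(list(items))
        except Exception:
            # A run-time check (for example IndexError) also counts as a failure.
            continue
        if acceptance_test(items, result):
            return result
    raise RuntimeError("all alternatives failed the acceptance test")
```

Calling recovery_block([3, 1, 2, 1], [primary_sort, alternative_sort]) rejects the output of the faulty primary, because an item has been lost, and falls back on the insertion sort.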
Ideally the alternatives should be developed by different programmers, so that they are not unwittingly sharing assumptions. The alternatives should also be less complex than the primary, so that they will be less likely to fail. For this reason they will probably be poorer in their performance (speed).

Another approach is to create alternatives that provide an increasingly degraded service. This allows the system to exhibit what is termed graceful degradation. As an example of graceful degradation, consider a steel rolling mill in which a computer controls a machine that chops off the required lengths of steel. Normally the computer employs a sophisticated algorithm to make optimum use of the steel, while satisfying customers’ orders. Should this algorithm fail, a simpler algorithm can be used that processes the orders strictly sequentially. This means that the system will keep going, albeit less efficiently.

Implementation

The language constructs of the recovery block mechanism hide the preservation of variables. The programmer does not need to explicitly declare which variables should be stored and when. The system must save values before any of the alternatives is executed,
and restore them should any of the alternatives fail. Although this may seem a formidable task, only the values of variables that are changed need to be preserved, and the notation highlights which ones these are. Variables local to the alternatives need not be stored, nor need parameters passed by value. Only global variables that are changed need to be preserved.

Nonetheless, storing data in this manner probably incurs too high an overhead if it is carried out solely by software. Studies indicate that, suitably implemented with hardware assistance, the speed overhead might be no more than about 15%.

No programming language has yet incorporated the recovery block notation. Even so, the idea provides a framework which can be used, in conjunction with any programming language, to structure fault tolerant software.

17.7 n-version programming

This form of programming means developing n versions of the same software component. For example, suppose a fly-by-wire airplane has a software component that decides how much the rudder should be moved in response to information about speed, pitch, throttle setting, etc. Three or more versions of the component are implemented and run concurrently. The outputs are compared by a voting module; the majority vote wins and is used to control the rudder (see Figure 17.4).

It is important that the different versions of the component are developed by different teams, using different methods and (preferably) at different locations, so that a minimum of assumptions are shared by the developers. By this means, the modules will use different algorithms, contain different mistakes and, if they produce wrong outputs at all, do so under different circumstances. Thus the chances are that when one of the components fails and produces an incorrect result, the others will perform correctly and the faulty component will be outvoted by the majority.
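
The voting arrangement just described can be sketched in Python. This is a minimal illustration, not a real flight-control design: the names majority_vote, version_1, version_2, version_3 and rudder_setting are hypothetical, the formulas are invented and have no aeronautical meaning, version_3 contains a deliberate fault, and the versions are called one after another rather than run concurrently.

```python
from collections import Counter
from typing import Sequence


def majority_vote(outputs: Sequence[int]) -> int:
    """Return the value produced by a strict majority of the versions."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority among the versions")
    return value


# Three hypothetical versions of the rudder calculation (whole degrees).
def version_1(speed: int, pitch: int) -> int:
    return pitch * 2 - speed // 100


def version_2(speed: int, pitch: int) -> int:
    return 2 * pitch - speed // 100


def version_3(speed: int, pitch: int) -> int:
    return pitch * 2 + speed // 100  # deliberate fault: wrong sign


def rudder_setting(speed: int, pitch: int) -> int:
    """Run every version on the same input and let the voting module decide."""
    versions = (version_1, version_2, version_3)
    outputs = [version(speed, pitch) for version in versions]
    return majority_vote(outputs)
```

With these toy formulas, rudder_setting(300, 5) collects the outputs 7, 7 and 13, so the faulty third version is outvoted. If all three outputs differ, majority_vote raises an error instead of guessing.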
Clearly the success of an n-version programming scheme depends on the degree of independence of the different components. If the majority embody a similar design fault, they will fail together and the wrong decision will be the outcome. Independence is a bold assumption, and some studies have shown a tendency for different developers to commit the same mistakes, probably because of shared misunderstandings of the (same) specification.

The expense of n-version programming is in the effort to develop n versions, plus the processing overhead of running the multiple versions. If hardware reliability is also an issue,

[Figure 17.4 Triple modular redundancy: the input data is passed to Version 1, Version 2 and Version 3, whose results go to a voting module that produces the output data.]