27.12.2013 Views

“Bagatelle in C arranged for VDM SoLo” - Universidade do Minho

“Bagatelle in C arranged for VDM SoLo” - Universidade do Minho

“Bagatelle in C arranged for VDM SoLo” - Universidade do Minho

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

<strong>“Bagatelle</strong> <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> <strong>SoLo”</strong><br />

J.N. Oliveira<br />

Ref. [Ol01] — 2001<br />

J.N. Oliveira. <strong>“Bagatelle</strong> <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> <strong>SoLo”</strong>. JUCS, 7(8):754–781, 2001.


Journal of Universal Computer Science, vol. 7, no. 8 (2001), 754-781<br />

submitted: 18/5/01, accepted: 21/8/01, appeared: 28/8/01Spr<strong>in</strong>ger Pub. Co.<br />

<strong>“Bagatelle</strong> <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> <strong>SoLo”</strong><br />

José N. Oliveira<br />

(<strong>Universidade</strong> <strong>do</strong> M<strong>in</strong>ho, Braga, Portugal<br />

Senior R&D Partner, SIDEREUS S.A., Porto, Portugal<br />

jno@di.um<strong>in</strong>ho.pt)<br />

Abstract: This paper sketches a reverse eng<strong>in</strong>eer<strong>in</strong>g discipl<strong>in</strong>e which comb<strong>in</strong>es <strong>for</strong>mal and<br />

semi-<strong>for</strong>mal methods. Central to the <strong>for</strong>mer is denotational semantics, expressed <strong>in</strong> the ISO/IEC<br />

13817-1 standard specification language (<strong>VDM</strong>-SL). This is strengthened with algebra of programm<strong>in</strong>g,<br />

which is applied <strong>in</strong> “reverse order” so as to reconstruct <strong>for</strong>mal specifications from<br />

legacy code. The latter <strong>in</strong>clude code slic<strong>in</strong>g, a “shortcut” which trims <strong>do</strong>wn the complexity of<br />

handl<strong>in</strong>g the <strong>for</strong>mal semantics of all program variables at the same time.<br />

A key po<strong>in</strong>t of the approach is its constructive style. Reverse calculations go as far as absorb<strong>in</strong>g<br />

auxiliary variables, <strong>in</strong>troduc<strong>in</strong>g mutual recursion (if applicable) and revers<strong>in</strong>g semantic denotations<br />

<strong>in</strong>to standard generic programm<strong>in</strong>g schemata such as cata/paramorphisms.<br />

The approach is illustrated <strong>for</strong> a small piece of code already studied <strong>in</strong> the code-slic<strong>in</strong>g literature:<br />

Kernighan and Richtie’s word count C programm<strong>in</strong>g “bagatelle”.<br />

Key Words: Reverse eng<strong>in</strong>eer<strong>in</strong>g; denotational semantics; slic<strong>in</strong>g; algebra of programm<strong>in</strong>g.<br />

Category: F.3.1 (Specify<strong>in</strong>g and Verify<strong>in</strong>g and Reason<strong>in</strong>g About Programs); D.2.1 (Requirements/Specifications)<br />

1 Introduction<br />

The development of computer software has always been a contradictory activity. Four<br />

decades ago, a “software crisis” was identified which drew attention to the poor technical<br />

sophistication of the (then emerg<strong>in</strong>g) software technology. Many years passed,<br />

the advent of home comput<strong>in</strong>g and world-wide network facilities has brought computer<br />

software unprecedented relevance <strong>in</strong> everybody’s life. However, the theoretical<br />

advances <strong>in</strong> programm<strong>in</strong>g science — which has become a stable discipl<strong>in</strong>e meanwhile<br />

— are still by and large ignored by the ever grow<strong>in</strong>g community of programmers.<br />

The situation is no better <strong>in</strong> education: software design rema<strong>in</strong>s among the very few<br />

topics which many eng<strong>in</strong>eer<strong>in</strong>g departments still accept to address with little (if any)<br />

scientific basis.<br />

Time-to-market is among the ma<strong>in</strong> factors en<strong>for</strong>c<strong>in</strong>g ad hoc programm<strong>in</strong>g practice.<br />

However, a quick market entails <strong>in</strong>direct costs such as permanent risk, poor quality and<br />

low reliability of the produced code, and hard ma<strong>in</strong>tenance.<br />

In a situation <strong>in</strong> which the only quality certificate of the runn<strong>in</strong>g software artifact<br />

still is life-cycle endurance, customers and software producers are little prepared to<br />

change, modify or improve runn<strong>in</strong>g code. So there is little opportunity <strong>for</strong> brand new,<br />

technically sound code to replace the old one. (One is never sure about the implications<br />

of a software update.) However, faced with so risky a dependence on legacy software,


Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />

755<br />

managers are more and more prepared to spend resources to <strong>in</strong>crease confidence on —<br />

ie. the level of understand<strong>in</strong>g of — their (untouchable) code.<br />

2 Program understand<strong>in</strong>g<br />

Program understand<strong>in</strong>g affiliates to reverse eng<strong>in</strong>eer<strong>in</strong>g, understood as the analysis of<br />

a system <strong>in</strong> order to identify components and <strong>in</strong>tended behaviour <strong>in</strong> order to create<br />

higher level abstractions of the system [CI90]. Reverse eng<strong>in</strong>eer<strong>in</strong>g <strong>in</strong>cludes reverse<br />

specification — the analytical process of <strong>in</strong>ferr<strong>in</strong>g the orig<strong>in</strong>al specification (which may<br />

have never been written) of some runn<strong>in</strong>g piece of software.<br />

If (on statistical grounds) <strong>for</strong>ward software eng<strong>in</strong>eer<strong>in</strong>g can today be regarded as<br />

a lost opportunity <strong>for</strong> <strong>for</strong>mal methods 1 , its converse still looks a promis<strong>in</strong>g area <strong>for</strong><br />

their application. This is due to the complexity of reverse eng<strong>in</strong>eer<strong>in</strong>g problems. So it<br />

is likely that the <strong>for</strong>mal techniques developed by academics <strong>for</strong> the production of fresh,<br />

high quality software will eventually see the light of (<strong>in</strong>dustrial) success <strong>in</strong> their reverse<br />

application to pre-exist<strong>in</strong>g code. Even if, <strong>for</strong> this purpose, they have to merge with<br />

other, <strong>in</strong><strong>for</strong>mal program ma<strong>in</strong>tenance and debugg<strong>in</strong>g techniques.<br />

3 About this paper<br />

This paper is a modest contribution <strong>in</strong> the spirit of the last paragraph above. It reports<br />

work <strong>in</strong> progress on the development of a discipl<strong>in</strong>e <strong>for</strong> reverse eng<strong>in</strong>eer<strong>in</strong>g which comb<strong>in</strong>es<br />

<strong>for</strong>mal and semi-<strong>for</strong>mal methods. The <strong>for</strong>mal basis of the approach enriches standard<br />

denotational semantics techniques with the algebra of programm<strong>in</strong>g [BdM97],<br />

which is applied <strong>in</strong> “reverse order” so as to reconstruct the <strong>for</strong>mal specifications of<br />

legacy code. Because of the complexity <strong>in</strong>herent <strong>in</strong> handl<strong>in</strong>g <strong>for</strong>mal semantic descriptions<br />

of algorithmic code, reverse algebraic calculation is preceded by code slic<strong>in</strong>g<br />

[Wei81].<br />

This work is the follow up of a project which has addressed data reverse eng<strong>in</strong>eer<strong>in</strong>g<br />

(DRE) <strong>in</strong> a similar way [NSO99]: data are reversed by calculation accord<strong>in</strong>g to<br />

an algebra of data-representation laws which <strong>in</strong>clude the trans<strong>for</strong>mation of relational<br />

database meta-data <strong>in</strong>to abstract ISO/IEC 13817-1 standard (<strong>VDM</strong>-SL) <strong>for</strong>mal descriptions<br />

[FL98].<br />

Code reversal is, <strong>in</strong> general, harder than DRE and this paper will not present a<br />

general solution to the problems <strong>in</strong>volved. Instead, a small example is worked out which<br />

triggers some <strong>in</strong>tuitions about the strategy to follow <strong>in</strong> real situations. This example is<br />

the “word count” programm<strong>in</strong>g “bagatelle” 2 presented by Kernighan and Richtie’s <strong>in</strong><br />

their well-known <strong>in</strong>troductory book to the C programm<strong>in</strong>g language (Fig. 1). This can<br />

With notable exceptions <strong>in</strong> areas such as safety-critical and dependable comput<strong>in</strong>g.<br />

In the musical term<strong>in</strong>ology, a bagatelle (“musical trifle”) is a short light or whimsical piece,<br />

usually written <strong>for</strong> piano.


756 Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />

1 #def<strong>in</strong>e YES 1<br />

2 #def<strong>in</strong>e NO 0<br />

3 ma<strong>in</strong>()<br />

4 {<br />

5 <strong>in</strong>t c, nl, nw, nc, <strong>in</strong>word ;<br />

6 <strong>in</strong>word = NO ;<br />

7 nl = 0;<br />

8 nw = 0;<br />

9 nc = 0;<br />

10 c = getchar();<br />

11 while ( c != EOF ) {<br />

12 nc = nc + 1;<br />

13 if ( c == ’\n’)<br />

14 nl = nl + 1;<br />

15 if ( c == ’ ’ || c == ’\n’ || c == ’\t’)<br />

16 <strong>in</strong>word = NO;<br />

17 else if (<strong>in</strong>word == NO) {<br />

18 <strong>in</strong>word = YES ;<br />

19 nw = nw + 1;<br />

20 }<br />

21 c = getchar();<br />

22 }<br />

23 pr<strong>in</strong>tf("%d",nl);<br />

24 pr<strong>in</strong>tf("%d",nw);<br />

25 pr<strong>in</strong>tf("%d",nc);<br />

26 }<br />

Figure 1: The word count program [KR78].<br />

be recognized as the core of the Unix wc command, which pr<strong>in</strong>ts the number of bytes,<br />

words, and l<strong>in</strong>es <strong>in</strong> files.<br />

However simple, this example is expressive enough to illustrate our ma<strong>in</strong> po<strong>in</strong>t:<br />

that, <strong>in</strong> practice, denotational techniques alone are <strong>in</strong>sufficient <strong>for</strong> code reversal, and<br />

that they benefit from comb<strong>in</strong><strong>in</strong>g with other — either <strong>for</strong>mal or <strong>in</strong><strong>for</strong>mal — methods.<br />

The rema<strong>in</strong>der of this paper is structured as follows: <strong>in</strong> the follow<strong>in</strong>g section we<br />

<strong>in</strong>troduce our (light-weight) approach to denotational semantics, which is illustrated <strong>in</strong><br />

section 5 <strong>for</strong> the word count example, us<strong>in</strong>g <strong>VDM</strong>-SL notation. In sections 6 and 7 we<br />

motivate the application of slic<strong>in</strong>g to <strong>for</strong>mal semantics. This is illustrated <strong>in</strong> section 8.<br />

Section 9 motivates the need <strong>for</strong> a posteriori algebraic calculations, to be applied to<br />

each particular slice. The algebra of programm<strong>in</strong>g is briefly presented <strong>in</strong> sections 10<br />

and 11, and f<strong>in</strong>ally applied to one of the slices of the word-count example (section 12).<br />

The paper ends with some conclusions and review of related work.


Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />

757<br />

4 Denotational semantics <strong>for</strong> program synthesis / analysis<br />

Let be a piece of algorithmic code and denote its denotational semantics, ie, the<br />

<strong>in</strong>put/output relation which captures the behaviour of . In <strong>for</strong>ward eng<strong>in</strong>eer<strong>in</strong>g one<br />

starts from a specification and rewrites it over and over aga<strong>in</strong>,<br />

until the semantics of some piece of code is found. Structured programm<strong>in</strong>g<br />

makes it possible to take advantage of the compositional properties of the available<br />

program comb<strong>in</strong>ators, <strong>for</strong> <strong>in</strong>stance sequential composition<br />

(1)<br />

where “ ” denotes relational composition. So, if specification is written <strong>do</strong>wn<br />

and one f<strong>in</strong>ds and such that and hold, then is a<br />

valid ref<strong>in</strong>ement step <strong>in</strong> program synthesis.<br />

By contrast, it seems natural that program analysis should go the other way round:<br />

try and identify chunks of code such that their semantics can be <strong>in</strong>ferred and abstracted<br />

upon. There is a fact to reta<strong>in</strong>, though: <strong>in</strong> go<strong>in</strong>g backwards along the direction <strong>in</strong><br />

(1) one can add arbitrary nondeterm<strong>in</strong>ism and end up <strong>in</strong> a specification which is too<br />

vague.<br />

For a determ<strong>in</strong>istic program, relation specializes to a function which maps<br />

the state of (ie. the set of all variables which has access to) be<strong>for</strong>e execution takes<br />

place, to the state after such an execution. This retraction to a functional semantics, albeit<br />

restrictive <strong>in</strong> general, is acceptable <strong>in</strong> face of simple programs (“bagatelles”) such<br />

as word count (Fig. 1). The theory gets simpler and more <strong>in</strong>tuitive and this helps the<br />

(<strong>in</strong><strong>for</strong>mal) reader to appreciate the power of <strong>for</strong>mal reason<strong>in</strong>g. This expla<strong>in</strong>s our decision<br />

<strong>in</strong> this paper to reverse specify ma<strong>in</strong> <strong>in</strong> the word-count example only <strong>in</strong> terms<br />

of its functional specification 3 . So we shall be deal<strong>in</strong>g with functions and functional<br />

equality, rather than relations and the subset order<strong>in</strong>g.<br />

5 Start<strong>in</strong>g denotational semantics <strong>for</strong> word count<br />

The <strong>VDM</strong>-SL specification language [FL98] will be a<strong>do</strong>pted to express the <strong>for</strong>mal semantics<br />

of (determ<strong>in</strong>istic) code such as word count. One starts by <strong>in</strong>ferr<strong>in</strong>g the program<br />

state from the program variables:<br />

Later on we will address shortcom<strong>in</strong>gs of this decision, acceptable only <strong>in</strong> face of determ<strong>in</strong>istic,<br />

term<strong>in</strong>at<strong>in</strong>g programs. The trade-off between the relational and functional foundations<br />

<strong>for</strong> program calculation is apparent <strong>in</strong> [BdM97]. The power of the relational approach can be<br />

appreciated <strong>in</strong> [BH93, Bac00], among a vast literature on this subject.


758 Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />

:: char<br />

For notation convenience, each selector <strong>in</strong> is named after the correspond<strong>in</strong>g program<br />

variable. Note the char and abstractions of <strong>in</strong> variables and , respectively.<br />

Next, the <strong>in</strong>put stream is made explicit <strong>in</strong> a way such that EOF is modelled by<br />

nil ,<br />

:: char<br />

char<br />

cf. the follow<strong>in</strong>g semantics <strong>for</strong> :<br />

Let<br />

be the function which captures the semantics of the whole program, that is<br />

ma<strong>in</strong>()<br />

In <strong>VDM</strong>-SL notation one has:<br />

char<br />

let mk- nil false <strong>in</strong><br />

mkif<br />

then mk- nil<br />

else mk- tl hd<br />

mk-<br />

The while-loop <strong>in</strong> ma<strong>in</strong>() is modelled by auxiliary function<br />

state, of type . The loop <strong>in</strong>itialization is captured by record<br />

, which updates the<br />

passed as parameter to :<br />

mk- nil false


Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />

759<br />

let<br />

if<br />

then<br />

else let<br />

nil<br />

<strong>in</strong><br />

<strong>in</strong><br />

State model can be made simpler by filter<strong>in</strong>g the redundancy of entries and<br />

. This means to data-abstract to<br />

:: char<br />

under abstraction (retrieve [Jon86]) function<br />

if<br />

then<br />

else<br />

if<br />

then<br />

else if<br />

then<br />

else<br />

if<br />

then false<br />

else if<br />

then true<br />

else<br />

mk-<br />

mkif<br />

nil<br />

then mkelse<br />

mk-<br />

Then reduces to hd tl -manipulation, lead<strong>in</strong>g to a simpler version of :


760 Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />

mkif<br />

then mkelse<br />

let if hd<br />

then<br />

else if<br />

then<br />

else<br />

if hd<br />

then false<br />

else if<br />

then true<br />

else <strong>in</strong><br />

mk- tl<br />

if hd<br />

then<br />

else<br />

where an auxiliary function was <strong>in</strong>troduced<br />

char<br />

abstract<strong>in</strong>g the test <strong>for</strong> a separator character. F<strong>in</strong>ally,<br />

and renamed to :<br />

is redef<strong>in</strong>ed accord<strong>in</strong>gly<br />

char<br />

let mk- false <strong>in</strong><br />

mk-<br />

6 Loop <strong>in</strong>ter-comb<strong>in</strong>ation and code fusion<br />

When writ<strong>in</strong>g code such as word count, programmers tend to comb<strong>in</strong>e <strong>in</strong>to a s<strong>in</strong>gle<br />

programm<strong>in</strong>g construct (eg. a loop) two or more logically <strong>in</strong>dependent computations.<br />

This saves auxiliary (eg. <strong>in</strong>termediate) data-structur<strong>in</strong>g and the programm<strong>in</strong>g “tricks” to


Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />

761<br />

achieve efficiency <strong>in</strong> this way can be framed <strong>in</strong>to two broad categories, one “sequential”<br />

and one “parallel”. The <strong>for</strong>mer can be expressed <strong>in</strong><strong>for</strong>mally as follows,<br />

as illustrated <strong>in</strong> a diagram:<br />

preferred to (2)<br />

In other situations the idea <strong>in</strong> the programmer’s m<strong>in</strong>d is to per<strong>for</strong>m the computation<br />

of several (possibly <strong>in</strong>dependent) output variables <strong>in</strong> the same loop, all shar<strong>in</strong>g the same<br />

visit to the <strong>in</strong>put data structure. This is depicted (<strong>for</strong> two such variables) <strong>in</strong> the follow<strong>in</strong>g<br />

draw<strong>in</strong>g,<br />

that is,<br />

preferred to (3)<br />

where the angle brackets <strong>in</strong> denote the “parallel” execution of computations<br />

and on the same <strong>in</strong>put.<br />

The word count program of Fig. 1 is an <strong>in</strong>stance of situation (3): the computations<br />

of nc, nl and nw (resp. the number of characters, number of l<strong>in</strong>es and number of words<br />

<strong>in</strong> the <strong>in</strong>put stream) proceed <strong>in</strong> parallel, ie. <strong>in</strong> the same while-loop access to the <strong>in</strong>put<br />

stream.


762 Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />

7 Code slic<strong>in</strong>g<br />

As happens with word count, the loop-<strong>in</strong>tercomb<strong>in</strong>ation strategy suggested <strong>in</strong> (3) may<br />

<strong>in</strong>volve computations on the right which are <strong>in</strong>dependent of each other. The loop-body<br />

on the left can become really <strong>in</strong>scrutable if the programmer <strong>do</strong>esn’t bother to <strong>in</strong>terleave<br />

the statements of with the statements of <strong>in</strong> an arbitrary, not obvious way.<br />

This is where code slic<strong>in</strong>g proves useful <strong>in</strong> debugg<strong>in</strong>g practice. Program slic<strong>in</strong>g was<br />

<strong>in</strong>troduced by Weiser [Wei81] as a means to help programmers <strong>in</strong> understand<strong>in</strong>g <strong>for</strong>eign<br />

code and <strong>in</strong> debugg<strong>in</strong>g. It is a technique <strong>for</strong> restrict<strong>in</strong>g the behaviour of a program<br />

to some fragment of <strong>in</strong>terest. The (decomposition) slice<br />

on variable<br />

yields that portion of the program which contributes to the computation of . Slices<br />

are executable (sub)programs which can be extracted automatically from the source<br />

code by data flow and control flow analysis. Gallagher and Lyle [GL91] show how such<br />

slices, ordered by set <strong>in</strong>clusion, <strong>for</strong>m a semi-lattice where meet corresponds to code<br />

shar<strong>in</strong>g.<br />

The lesson learned from the code slic<strong>in</strong>g community po<strong>in</strong>ts to a clear direction <strong>in</strong><br />

program understand<strong>in</strong>g: <strong>in</strong>stead of work<strong>in</strong>g with a monolithic state vector <strong>in</strong>volv<strong>in</strong>g<br />

state variables , <strong>for</strong> and each of type , and express<strong>in</strong>g<br />

semantics <strong>in</strong> terms of state-vector trans<strong>for</strong>mations , one “splits”<br />

the effect on the state vector <strong>in</strong> terms of <strong>in</strong>dependent computations .<br />

Each <strong>in</strong>dividual is self-sufficient and smaller than the orig<strong>in</strong>al code, there<strong>for</strong>e easier<br />

to understand (or reverse calculate).<br />

The idea of us<strong>in</strong>g slic<strong>in</strong>g is that of “short-cutt<strong>in</strong>g” the work of understand<strong>in</strong>g a large<br />

program via the syntactic separation of the program <strong>in</strong> its constituent slices. However,<br />

how sound is this strategy? On a denotational semantics basis, how <strong>do</strong> we guarantee<br />

that, altogether, the slices’ semantics “re-constitute” the whole program semantics? We<br />

will rely on the follow<strong>in</strong>g conjecture:<br />

Let be a program exhibit<strong>in</strong>g output variables, and let be the<br />

correspond<strong>in</strong>g slices. Then the semantics of can be recovered by comb<strong>in</strong><strong>in</strong>g<br />

these and only these slices, ie.<br />

In other words, slic<strong>in</strong>g is a semantically sound code-decomposition technique.<br />

The angle-bracket comb<strong>in</strong>ator of (4) will be <strong>for</strong>malized <strong>in</strong> section 11. Although the<br />

equality <strong>do</strong>es not hold <strong>for</strong> all programs (see section 15), it <strong>do</strong>es hold <strong>for</strong> a broad class<br />

of determ<strong>in</strong>istic programs which <strong>in</strong>cludes wc.<br />

(4)<br />

8 The slices of wc and their <strong>for</strong>mal semantics<br />

In general, the semantics of a program block <strong>in</strong>volv<strong>in</strong>g while, <strong>for</strong> or <strong>do</strong> statements<br />

and/or recursion will have to be <strong>in</strong>ductively def<strong>in</strong>ed over some <strong>in</strong>put type , eg. a f<strong>in</strong>ite


Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />

763<br />

list, an array, a tree. So this type should be s<strong>in</strong>gled out from the space vector:<br />

Slic<strong>in</strong>g will supply a collection of slices<br />

such that<br />

(5)<br />

which we may want to abstract <strong>in</strong>to <strong>in</strong>ductive functions with shape<br />

(6)<br />

The trans<strong>for</strong>mation from (6) to (5) is a well-known technique <strong>for</strong> improv<strong>in</strong>g the efficiency<br />

of functional programs called accumulation parameter <strong>in</strong>troduction [BdM97].<br />

Below we shall be <strong>in</strong>terested <strong>in</strong> the reverse application of this rule, that is, we want to<br />

remove accumulations.<br />

Let us <strong>in</strong>stantiate this process <strong>for</strong> word count (Fig. 1). For this piece of code, Gallagher<br />

and Lyle [GL91] identify the follow<strong>in</strong>g slice decomposition semi-lattice:<br />

<br />

<br />

<br />

<br />

Clearly, there are 3 maximal slices to be extracted<br />

number of characters<br />

number of l<strong>in</strong>es<br />

number of words<br />

which match with the triple output of the program. Let us analyse each of these. The<br />

character count slice is<br />

3 ma<strong>in</strong>()<br />

4 {<br />

5 <strong>in</strong>t c, nc ;<br />

9 nc = 0;<br />

10 c = getchar();<br />

11 while ( c != EOF ) {<br />

12 ++nc;<br />

20 }<br />

21 c = getchar();<br />

22 }<br />

25 pr<strong>in</strong>tf("%d",nc);<br />

26 }<br />

with semantics captured by function<br />

char<br />

mk-


764 Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />

which calls<br />

mkif<br />

then mkelse<br />

mk- tl<br />

def<strong>in</strong>ed over the -projection of :<br />

:: char<br />

Entry of is clearly an accumulation parameter <strong>in</strong> function , whose removal<br />

is the <strong>in</strong>verse of a standard program trans<strong>for</strong>mation (exercise 3.45 <strong>in</strong> [BdM97]).<br />

We thus “get rid of the state” and obta<strong>in</strong><br />

char<br />

if<br />

then<br />

else<br />

tl<br />

easily recognizable as <strong>VDM</strong>-SL’s primitive “length” function. So we are <strong>do</strong>ne,<br />

and can proceed to the l<strong>in</strong>e-count slice :<br />

3 ma<strong>in</strong>()<br />

4 {<br />

5 <strong>in</strong>t c, nl ;<br />

7 nl = 0;<br />

10 c = getchar();<br />

11 while ( c != EOF ) {<br />

13 if ( c == ’\n’)<br />

14 ++nl;<br />

21 c = getchar();<br />

22 }<br />

23 pr<strong>in</strong>tf("%d",nl);<br />

26 }<br />

This corresponds to sliced state<br />

len<br />

:: char


Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />

765<br />

and exhibits semantics<br />

char<br />

mk-<br />

mk-<br />

if<br />

then mkelse<br />

mk- tl hd<br />

<strong>for</strong> auxiliary function<br />

char<br />

if<br />

then<br />

else<br />

Aga<strong>in</strong> the same standard trans<strong>for</strong>mation will lead to a state-free denotation:<br />

char<br />

if<br />

then<br />

else if hd<br />

then<br />

else<br />

tl<br />

tl<br />

F<strong>in</strong>ally, we have to cope with word count<strong>in</strong>g slice :<br />

1 #def<strong>in</strong>e YES 1<br />

2 #def<strong>in</strong>e NO 0<br />

3 ma<strong>in</strong>()<br />

4 {<br />

5 <strong>in</strong>t c, nw, <strong>in</strong>word ;<br />

6 <strong>in</strong>word = NO ;<br />

8 nw = 0;<br />

10 c = getchar();<br />

11 while ( c != EOF ) {<br />

15 if ( c == ’ ’ || c == ’\n’ || c == ’\t’)<br />

16 <strong>in</strong>word = NO;<br />

17 else if (<strong>in</strong>word == NO) {<br />

18 <strong>in</strong>word = YES ;


766 Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />

19 ++nw;<br />

20 }<br />

21 c = getchar();<br />

22 }<br />

24 pr<strong>in</strong>tf("%d",nw);<br />

26 }<br />

Fact is apparent from the projection of<br />

:: char<br />

which <strong>in</strong>cludes , and from , which is function<br />

char<br />

mk-<br />

false<br />

mkif<br />

then mkelse<br />

let if hd<br />

then<br />

else if<br />

then<br />

else<br />

if hd<br />

then false<br />

else if<br />

then true<br />

else <strong>in</strong><br />

mk- tl<br />

Clearly, is less obvious than what we have been seen above <strong>for</strong> the other<br />

slices. Firstly, because there are two state variables ( and ) <strong>in</strong>stead of one. So<br />

our first step is to package them together,<br />

:: char<br />

::


Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />

767<br />

and redef<strong>in</strong>e<br />

accord<strong>in</strong>gly<br />

mkif<br />

then mkelse<br />

mk- tl hd<br />

where<br />

is <strong>in</strong>troduced to capture the state accumulation process:<br />

char<br />

mkif<br />

then mkelse<br />

if<br />

then mkelse<br />

mk-<br />

false<br />

true<br />

(7)<br />

Secondly, because the removal of the accumulator will not br<strong>in</strong>g — unlike the<br />

preced<strong>in</strong>g examples — a denotation as clean as we would like:<br />

char<br />

char<br />

(8)<br />

if<br />

then mk-<br />

false<br />

else hd tl<br />

The fact that an auxiliary function ( ) is still needed br<strong>in</strong>gs about some questions:<br />

shouldn’t Boolean entry <strong>in</strong> have already disappeared? how <strong>do</strong> we<br />

proceed? is there a limit to the program trans<strong>for</strong>mations we have been apply<strong>in</strong>g so far?<br />

9 Need <strong>for</strong> a notation trans<strong>for</strong>mation<br />

Certa<strong>in</strong>ly we could go further <strong>in</strong> conventional (ie. lambda-calculus based) trans<strong>for</strong>mations.<br />

However, such “bagatelle-sized” reason<strong>in</strong>g is unlikely to scale up to realistic situations.<br />

Why? Legacy code normally <strong>in</strong>volves <strong>in</strong>tricate collections of global variables<br />

whose mean<strong>in</strong>g is obfuscated by loop-<strong>in</strong>tercomb<strong>in</strong>ation, mak<strong>in</strong>g it harder and harder to


768 Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />

understand by others. It is no wonder that the <strong>for</strong>mal semantics of code made efficient<br />

<strong>in</strong> this way leads to volum<strong>in</strong>ous mathematical expressions <strong>in</strong>volv<strong>in</strong>g large state vectors<br />

which resist to symbolic manipulation.<br />

However, the programm<strong>in</strong>g <strong>in</strong>tuitions implicit <strong>in</strong> (2) and (3) make sense and actually<br />

validated by algebraic laws [BdM97] which programmers (as a rule) ignore but feel<br />

obvious about the semantics of the underly<strong>in</strong>g programm<strong>in</strong>g language 4 .<br />

Notation seems to be a factor aga<strong>in</strong>st the explicit use of such laws <strong>in</strong> denotational<br />

semantics. Po<strong>in</strong>twise notation <strong>in</strong>volv<strong>in</strong>g operators as well as variable symbols, logical<br />

connectives, quantifiers, etc. is not abstract enough. It also entails a loss <strong>in</strong> genericity<br />

which conventional eng<strong>in</strong>eer<strong>in</strong>g mathematics has learned to solve elsewhere by chang<strong>in</strong>g<br />

the “mathematical space”, <strong>for</strong> <strong>in</strong>stance by mov<strong>in</strong>g (temporarily) from the<br />

to the <strong>in</strong> the Laplace trans<strong>for</strong>mation [Kre88]:<br />

-space<br />

-space<br />

Given problem<br />

Subsidiary equation<br />

Solution of given problem<br />

Solution of subsidiary equation<br />

Quot<strong>in</strong>g [Kre88], p.242:<br />

The Laplace trans<strong>for</strong>mation is a method <strong>for</strong> solv<strong>in</strong>g differential equations (...)<br />

The process of solution consists of three ma<strong>in</strong> steps:<br />

1st step. The given “hard” problem is trans<strong>for</strong>med <strong>in</strong>to a “simple”<br />

equation (subsidiary equation).<br />

2nd step. The subsidiary equation is solved by purely algebraic manipulations.<br />

Side comment: what becomes far less obvious (to others) is their use <strong>in</strong> an <strong>in</strong>-discrim<strong>in</strong>ated<br />

and un<strong>do</strong>cumented way!


Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />

769<br />

3rd step. The solution of the subsidiary equation is trans<strong>for</strong>med back<br />

to obta<strong>in</strong> the solution of the given problem.<br />

In this way the Laplace trans<strong>for</strong>mation reduces the problem of solv<strong>in</strong>g a differential<br />

equation to an algebraic problem.<br />

This is precisely what we would like to <strong>do</strong> <strong>in</strong> reverse denotational semantics: obta<strong>in</strong><br />

the po<strong>in</strong>twise denotation of a program, trans<strong>for</strong>m it <strong>in</strong>to a subsidiary po<strong>in</strong>tfree denotation,<br />

obta<strong>in</strong> the solution us<strong>in</strong>g the po<strong>in</strong>tfree algebra of programs, and return back to the<br />

po<strong>in</strong>twise level where <strong>for</strong>mal method practitioners are used to express their thoughts.<br />

10 Algebra of programm<strong>in</strong>g<br />

In a more respected than loved paper, John Backus [Bac78] was among the first to<br />

alert computer programmers that computer languages alone are <strong>in</strong>sufficient, and that<br />

only languages which exhibit an algebra <strong>for</strong> reason<strong>in</strong>g about the objects they purport<br />

to describe will be useful <strong>in</strong> the long run. This l<strong>in</strong>e of thought has witnessed significant<br />

advances over the last decade, based on the so-called functorial approach to datatypes<br />

which, orig<strong>in</strong>at<strong>in</strong>g from [MA86], has reached the status of a thorough program calculus<br />

<strong>in</strong> [BdM97]. Because this style of calculation has become known as the Bird-Meertens<br />

<strong>for</strong>malism (BMF) [Bac88], we will refer to the trans<strong>for</strong>mation from the “ -space to the<br />

-space” (where stands <strong>for</strong> variable and <strong>for</strong> po<strong>in</strong>tfree) as the “BMF trans<strong>for</strong>mation”.<br />

11 Introduc<strong>in</strong>g the “BMF-trans<strong>for</strong>mation”<br />

Programm<strong>in</strong>g “trick” (2) is an <strong>in</strong>stance of a class of <strong>for</strong>mal program trans<strong>for</strong>mations<br />

known as fusion laws: two sequential computations of the same k<strong>in</strong>d — <strong>in</strong> this case,<br />

and — merge together (ie. “fused”) <strong>in</strong> a s<strong>in</strong>gle computation of the same<br />

k<strong>in</strong>d — <strong>in</strong> this case.<br />

Recall that we have decided to restrict ourselves to functional code blocks, ie.,<br />

pieces of code whose semantics can be expressed by functions , etc. So “ ” <strong>in</strong><br />

denotes function composition<br />

under typ<strong>in</strong>g rule<br />

(9)<br />

On the other hand, programm<strong>in</strong>g trick (3) has to <strong>do</strong> with “fus<strong>in</strong>g” two “parallel”<br />

computations <strong>in</strong>to a s<strong>in</strong>gle one and affiliates to another group of laws hav<strong>in</strong>g to <strong>do</strong> with


770 Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />

mutual recursion. These <strong>in</strong>volve the “angle bracket” comb<strong>in</strong>ator of (3), which we will<br />

refer to as the “split” comb<strong>in</strong>ator:<br />

cf. diagram<br />

(10)<br />

<br />

<br />

<br />

<br />

<strong>in</strong>volv<strong>in</strong>g projections and which are such that and .<br />

This is Backus [Bac78] construction operator, which can be specified as the follow<strong>in</strong>g<br />

polymorphic operator <strong>in</strong> <strong>VDM</strong>-SL notation:<br />

mk-<br />

(11)<br />

These comb<strong>in</strong>ators are rich <strong>in</strong> algebraic properties. For <strong>in</strong>stance, the<br />

laws<br />

are implicit <strong>in</strong> the diagram above, and the<br />

-cancellation<br />

, (12)<br />

-fusion law<br />

expresses “left distribution” of composition of over split, a law <strong>in</strong> which two “parallel”<br />

consumer functions and fuse with another, producer function .<br />

Already known s<strong>in</strong>ce Backus’ algebra of programs, law (13) is an example of an<br />

equation “<strong>in</strong> the -space”. The compression of notation and the power of such laws is<br />

particularly apparent <strong>in</strong> deal<strong>in</strong>g with recursion, that is, with <strong>in</strong>ductive datatypes. For<br />

conciseness, we will focus on the datatype of f<strong>in</strong>ite lists (which is the one present <strong>in</strong><br />

word count) and mention only the laws which are relevant <strong>for</strong> our calculations. (For a<br />

comprehensive account, see eg. [BdM97].)<br />

Consider the follow<strong>in</strong>g <strong>in</strong>ductive def<strong>in</strong>ition (<strong>in</strong> <strong>VDM</strong>-SL) of the function which computes<br />

the sum of a list of <strong>in</strong>tegers:<br />

(13)<br />

if<br />

then<br />

else hd<br />

tl<br />

(14)


Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />

771<br />

In general, any list process<strong>in</strong>g function with signature can be written<br />

accord<strong>in</strong>g to the follow<strong>in</strong>g recursive scheme, <strong>in</strong> <strong>VDM</strong>-SL:<br />

if<br />

then<br />

else hd tl<br />

(15)<br />

<strong>for</strong> some and . For <strong>in</strong>stance, and <strong>in</strong><br />

.<br />

A standard result <strong>in</strong> <strong>in</strong>ductive datatype theory tells us that each <strong>in</strong>stance of is<br />

uniquely determ<strong>in</strong>ed by the pair . Pairs of this k<strong>in</strong>d are called algebras (= collections<br />

of functions and constants) which will be described <strong>in</strong> a compact way by resort<strong>in</strong>g<br />

to a comb<strong>in</strong>ator which dualizes split (10):<br />

(16)<br />

cf. diagram<br />

<br />

<br />

(17)<br />

<br />

where denotes the disjo<strong>in</strong>t union of and .<br />

Split and its dual are related to each other by the follow<strong>in</strong>g exchange law, which<br />

makes it possible to express every function of type <strong>in</strong> two alternative<br />

ways<br />

(18)<br />

<strong>for</strong> , , and .<br />

Thanks to the comb<strong>in</strong>ator, one can record the whole <strong>in</strong><strong>for</strong>mation about algebra<br />

above <strong>in</strong>to a s<strong>in</strong>gle arrow<br />

(19)<br />

where<br />

denotes an arbitrary (but fixed) s<strong>in</strong>gleton type — eg.


772 Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />

<strong>in</strong> <strong>VDM</strong>-SL — and denotes the “everywhere ” constant function such that<br />

Go<strong>in</strong>g further <strong>in</strong> the same direction, we can let arrow (19) participate <strong>in</strong> a larger diagram<br />

which records the whole “algorithmic” <strong>in</strong><strong>for</strong>mation about (15):<br />

(20)<br />

(21)<br />

In this diagram: is the <strong>VDM</strong>-SL type of f<strong>in</strong>ite lists (whose elements belong to );<br />

is algebra which builds -lists 5 ; is the identity function such that<br />

<strong>for</strong> every ; the “recursive call”<br />

<strong>in</strong>volves the “product”<br />

comb<strong>in</strong>ator,<br />

(22)<br />

and its dual, the “sum” comb<strong>in</strong>ator:<br />

(23)<br />

Diagram (21) expresses an equation about<br />

which we re-write <strong>in</strong>to<br />

by <strong>in</strong>troduc<strong>in</strong>g and .<br />

Equation (24) is all we need to know to def<strong>in</strong>e — provided we <strong>in</strong>stantiate , ie.<br />

and . To express this uniqueness of , dependent on , we write — read “ -<br />

catamorphism” — <strong>in</strong>stead of :<br />

For <strong>in</strong>stance, catamorphism is the “BMF-trans<strong>for</strong>mation” of the equivalent,<br />

four l<strong>in</strong>e <strong>VDM</strong>-SL (po<strong>in</strong>twise) def<strong>in</strong>ition of (14). As an exercise, the reader can<br />

Operator<br />

is def<strong>in</strong>ed <strong>in</strong> <strong>VDM</strong>-SL by<br />

(24)<br />

(25)


Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />

773<br />

recover this def<strong>in</strong>ition of — or <strong>in</strong> general that of (15) — from (25) by apply<strong>in</strong>g<br />

standard laws of the algebra of programm<strong>in</strong>g known as -fusion,<br />

and<br />

-absorption<br />

among others.<br />

Catamorphisms extend to any polynomial and possess a number of remarkable<br />

properties, among which we select the mutual-recursion law, also called “Fokk<strong>in</strong>ga<br />

law”:<br />

cf. diagrams<br />

(26)<br />

(27)<br />

(28)<br />

and<br />

This law is a <strong>for</strong>mal basis <strong>for</strong> “parallel loop” <strong>in</strong>ter-comb<strong>in</strong>ation (or unravell<strong>in</strong>g). It will<br />

play the major rôle <strong>in</strong> the calculations which rema<strong>in</strong> to be <strong>do</strong>ne about .<br />

12 Back to — <strong>in</strong>troduction of mutual recursion<br />

Recall functions and <strong>in</strong> (8). We will now focus our attention on function<br />

which, <strong>in</strong> the -space,<br />

is list catamorphism<br />

By deliver<strong>in</strong>g a pair of outputs,<br />

false<br />

char<br />

(7) will split <strong>in</strong>to<br />

where


774 Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />

mkif<br />

then<br />

else<br />

char<br />

char<br />

that is (<strong>in</strong> the<br />

-space):<br />

where , <strong>do</strong>uble projection abbreviates , and resorts<br />

to McCarthy’s conditional comb<strong>in</strong>ator def<strong>in</strong>ed by<br />

where<br />

(29)<br />

(30)<br />

Function<br />

can be reshaped<br />

false<br />

exchange law (18)<br />

false<br />

<strong>in</strong> a way so that it can be handled by the mutual-recursion law (28), <strong>for</strong><br />

and false . From this we <strong>in</strong>fer , as follows:<br />

– Unknown :<br />

false<br />

<strong>in</strong>stantiations<br />

false<br />

-absorption (27)<br />

false<br />

-natural law (31) below<br />

false


Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />

775<br />

One of the -natural laws<br />

was used <strong>in</strong> this calculation. Back to the<br />

false<br />

-space, one obta<strong>in</strong>s<br />

(31)<br />

(32)<br />

– Unknown : its diagram is<br />

char<br />

char<br />

char<br />

char<br />

Calculation:<br />

<strong>in</strong>stantiation of<br />

-absorption (27)<br />

McCarthy conditional and -cancellation (12)<br />

Back to the<br />

-space:<br />

if<br />

then<br />

else<br />

F<strong>in</strong>ally,<br />

recall<br />

mutual recursion <strong>in</strong>troduced above<br />

-cancellation (12)


776 Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />

So we a<strong>do</strong>pt as <strong>in</strong> our f<strong>in</strong>al <strong>VDM</strong>-SL, which goes back <strong>in</strong>to the -space. For a<br />

more suggestive read<strong>in</strong>g, we negate condition and abstract <strong>in</strong>to a<br />

“look ahead <strong>for</strong> word separator” predicate 6 :<br />

char<br />

if<br />

then<br />

else if hd tl<br />

then tl<br />

else tl<br />

char<br />

hd<br />

This specification adds to our understand<strong>in</strong>g of word count<strong>in</strong>g mechanism:<br />

counts the number of transitions from a non-separator to a separator character, or the<br />

end of the <strong>in</strong>put stream. As anticipated earlier on, variable has disappeared<br />

throughout this calculation. In fact, it can be regarded as a state “flag” implement<strong>in</strong>g<br />

the “separator/non-separator” state automaton which is implicit <strong>in</strong> the orig<strong>in</strong>al code:<br />

<br />

<br />

<br />

<br />

<br />

<br />

<br />

Function is now a proper <strong>in</strong>ductive function: it belongs to the class of so-called<br />

paramorphisms [Mee92], which are very common <strong>in</strong> <strong>for</strong>mal specification (<strong>for</strong> <strong>in</strong>stance,<br />

the usual def<strong>in</strong>ition of the factorial function is a paramorphism). Depicted <strong>in</strong> a diagram,<br />

will look as follows<br />

char<br />

char<br />

char<br />

char<br />

char<br />

where<br />

The<br />

version of the if-then-else relies on <strong>VDM</strong>-SL’s logic of partial functions [FL98].


Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />

777<br />

char<br />

mkif<br />

then<br />

else<br />

char<br />

All <strong>in</strong> all, we can write<br />

ma<strong>in</strong>()<br />

where<br />

len . In <strong>VDM</strong>-SL notation:<br />

char<br />

mk-<br />

len<br />

13 Summary<br />

As pictured <strong>in</strong> Fig. 2, we have comb<strong>in</strong>ed a <strong>for</strong>mal method — algebra of programm<strong>in</strong>g<br />

[BdM97] — with a semi-<strong>for</strong>mal one — code slic<strong>in</strong>g [Wei81] — <strong>in</strong> order to per<strong>for</strong>m<br />

the reverse specification of a little program already studied <strong>in</strong> the code slic<strong>in</strong>g literature<br />

[GL91]: the word count program of [KR78]. We claim that we have gone deeper than<br />

[GL91] <strong>in</strong> understand<strong>in</strong>g this piece of code.<br />

A key po<strong>in</strong>t of the approach is its constructive style, based on a change of notation<br />

which leads to powerful algebraic laws of programm<strong>in</strong>g. This also helps <strong>in</strong> overcom<strong>in</strong>g<br />

shortcom<strong>in</strong>gs of the specification language. For <strong>in</strong>stance, the coproduct construct<br />

is not polymorphic <strong>in</strong> <strong>VDM</strong>-SL. For each particular and , a specific disjo<strong>in</strong>t union<br />

datatype has to be def<strong>in</strong>ed, eg. <strong>in</strong>:<br />

::<br />

::<br />

So <strong>in</strong>jections and <strong>in</strong>stantiate to mk- and mk- , respectively, and to:<br />

if isthen<br />

else


778 Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />

Source code slic<strong>in</strong>g<br />

- <strong>for</strong>mal<br />

Denotational semantics<br />

<strong>for</strong>mal<br />

Algebra of program<strong>in</strong>g<br />

+ <strong>for</strong>mal<br />

Figure 2: Formal/<strong>in</strong><strong>for</strong>mal method comb<strong>in</strong>ation.<br />

Thanks to the change of notation, all our reason<strong>in</strong>g <strong>in</strong>volv<strong>in</strong>g coproducts was handsomely<br />

carried out <strong>in</strong> the -space (po<strong>in</strong>tfree notation). Po<strong>in</strong>tfree reason<strong>in</strong>g went as far<br />

as absorb<strong>in</strong>g auxiliary variables and <strong>in</strong>troduc<strong>in</strong>g mutual recursion, a specification mechanism<br />

which programmers never make explicit because they fear lack<strong>in</strong>g efficiency.<br />

Last but not least, the approach adds to program understand<strong>in</strong>g <strong>in</strong> “catalogu<strong>in</strong>g” semantic<br />

denotations <strong>in</strong>to well-known classes of <strong>in</strong>ductive schemata (cata/paramorphisms)<br />

which are rich <strong>in</strong> algebraic properties and are amenable to further reason<strong>in</strong>g (eg. reimplementation).<br />

Slic<strong>in</strong>g can be regarded as a denotational semantics “shortcut” — it trims <strong>do</strong>wn<br />

the complexity of handl<strong>in</strong>g all program variables at the same time. In fact, the mutualrecursion<br />

law (28) could have been applied to the whole program — rather than to its<br />

slices — at the sacrifice of a lot more reason<strong>in</strong>g show<strong>in</strong>g eventually that the three slices<br />

are <strong>in</strong>dependent of each other: a result known as the banana-split law,<br />

which is a special case of (28).<br />

(33)<br />

14 Related work<br />

Venkatesh [Ven91] was among the first to address program slic<strong>in</strong>g from a <strong>for</strong>mal semantics<br />

po<strong>in</strong>t of view. References [CLM95, CLM96] present approaches to recover<br />

(functional) specifications from imperative source code us<strong>in</strong>g “symbolic execution”.<br />

Specifications are expressed by pre/post-conditions. References [GC96, GC99] also<br />

base their reverse eng<strong>in</strong>eer<strong>in</strong>g strategies on pre-/post-conditions. Similarly, [CLM98]<br />

uses a first order logic <strong>for</strong>mula to map a subset of the <strong>in</strong>put program variables onto a<br />

set of <strong>in</strong>itial states. This allows one to specify any <strong>in</strong>itial state of the program. So, the


Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />

779<br />

slice calculated by add<strong>in</strong>g such a condition on the <strong>in</strong>put variable to the slic<strong>in</strong>g criterion<br />

is called conditioned slice.<br />

Reference [HDS95] proposes the use of program trans<strong>for</strong>mation techniques to construct<br />

a syntactically unrestricted slice, and so to improve the simplification power of<br />

slic<strong>in</strong>g. In the same direction, [HD97] <strong>in</strong>troduces the concept of amorphous slic<strong>in</strong>g,<br />

which relaxes syntactic constra<strong>in</strong>ts traditional <strong>in</strong> slic<strong>in</strong>g, thus generat<strong>in</strong>g smaller slices.<br />

In this way, the applicability of slic<strong>in</strong>g to program comprehension is improved.<br />

The repertoire of <strong>for</strong>mal techniques <strong>for</strong> reverse eng<strong>in</strong>eer<strong>in</strong>g further <strong>in</strong>cludes “type<br />

<strong>in</strong>ference” [vDM99] and concept/cluster analysis [vDK99]. These are applicable ma<strong>in</strong>ly<br />

to the detection of objects and can also be comb<strong>in</strong>ed with techniques proposed <strong>in</strong> this<br />

paper. Likewise, semi-<strong>for</strong>mal techniques [Vil01a] <strong>for</strong> the detection of recurrent algorithmic<br />

structures can also contribute to the identification — and later to the <strong>for</strong>malization<br />

— of program patterns.<br />

15 Future work<br />

The technique <strong>for</strong> code reversal reported <strong>in</strong> this paper is work <strong>in</strong> progress and some of<br />

its problems are still open. At the heart of these we place the conjecture of section 7. As<br />

anticipated <strong>in</strong> section 7 and discussed <strong>in</strong> [VO01], it needs further attention <strong>in</strong> presence<br />

of non-term<strong>in</strong>ation, <strong>for</strong>c<strong>in</strong>g equality to give place to <strong>in</strong> (4) and thus a move towards<br />

the more general relational algebra of programm<strong>in</strong>g [Bac00, BdM97].<br />

A <strong>for</strong>thcom<strong>in</strong>g master thesis [Vil01b] is expected to present several exercises such<br />

as word count which will let us to know more about which laws of programm<strong>in</strong>g are<br />

relevant <strong>in</strong> this context and to improve the <strong>in</strong>terplay between the code slic<strong>in</strong>g and the<br />

algebra of programm<strong>in</strong>g techniques. Even with<strong>in</strong> the functional <strong>in</strong>terpretation, some<br />

program structures will require laws more powerful than those which we have been<br />

th<strong>in</strong>k<strong>in</strong>g of — <strong>for</strong> <strong>in</strong>stance, comonadic calculations [Par00].<br />

Acknowledgements<br />

The research described <strong>in</strong> this paper was carried out at the ALGORITMI R&D Center<br />

and was funded by the Portuguese Science and Technology Foundation (grant P060-<br />

P31B-09/97). The author wishes to thank Gustavo Villavicencio <strong>for</strong> his contribution to<br />

this paper, and Peter G. Larsen and Andreas Kerschbaumer <strong>for</strong> their comments. The use<br />

of <strong>VDM</strong>Tools R under a Free Academic Site License provided by IFAD is gratefully<br />

acknowledged.<br />

References<br />

[Bac78]<br />

J. Backus. Can programm<strong>in</strong>g be liberated from the von Neumann style? a functional<br />

style and its algebra of programs. CACM, 21(8):613–639, August 1978.


780 Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />

[Bac88] R. Backhouse. An exploration of the Bird-Meertens Formalism, July 1988. Department<br />

of Comput<strong>in</strong>g Science, Gron<strong>in</strong>gen University, The Netherlands.<br />

[Bac00] R. C. Backhouse. Fixed po<strong>in</strong>t calculus, 2000. Summer School and Workshop on Algebraic<br />

and Coalgebraic Methods <strong>in</strong> the Mathematics of Program Construction, L<strong>in</strong>coln<br />

College, Ox<strong>for</strong>d, UK 10th to 14th April 2000.<br />

[BdM97] R. Bird and O. de Moor. Algebra of Programm<strong>in</strong>g. Series <strong>in</strong> Computer Science.<br />

Prentice-Hall International, 1997. C. A. R. Hoare, series editor.<br />

[BH93] R. C. Backhouse and P. F. Hoogendijk. Elements of a relational theory of datatypes.<br />

In S.A. Schuman, B. Möller, and H.A. Partsch, editors, Formal Program Development,<br />

number 755 <strong>in</strong> Lecture Notes <strong>in</strong> Computer Science, pages 7–42. Spr<strong>in</strong>ger, 1993.<br />

[CI90] E.J. Chikofsky and J. H. Cross II. Reverse eng<strong>in</strong>eer<strong>in</strong>g and design recovery: a taxonomy.<br />

IEEE Software, 7(1):13–17, 1990.<br />

[CLM95] Aniello Cimitile, Andrea De Lucia, and Malcolm C. Munro. Qualify<strong>in</strong>g reusable<br />

functions us<strong>in</strong>g symbolic execution. In Work<strong>in</strong>g Conference on Reverse Eng<strong>in</strong>eer<strong>in</strong>g,<br />

Toronto, Canada, pages 178–187, 1995.<br />

[CLM96] Aniello Cimitile, Andrea De Lucia, and Malcolm C. Munro. A specification driven<br />

slic<strong>in</strong>g process <strong>for</strong> identify<strong>in</strong>g reusable functions. Journal of Software Ma<strong>in</strong>tenance:<br />

Research and Practice, 8(3):145–178, 1996.<br />

[CLM98] Aniello Cimitile, Andrea De Lucia, and Malcolm C. Munro. Conditioned program<br />

slic<strong>in</strong>g. In<strong>for</strong>mation and Software Technology (special issue on program slic<strong>in</strong>g),<br />

40(11/12):595–607, 1998.<br />

[FL98] J. Fitzgerald and P.G. Larsen. Modell<strong>in</strong>g Systems: Practical Tools and Techniques <strong>for</strong><br />

Software Development . Cambridge University Press, 1st edition, 1998.<br />

[GC96] G.C. Gannod and B.H.C. Cheng. Us<strong>in</strong>g <strong>in</strong><strong>for</strong>mal and <strong>for</strong>mal techniques <strong>for</strong> the reverse<br />

eng<strong>in</strong>eer<strong>in</strong>g of C programs. Technical report, DCS, Michigan State University, 1996.<br />

[GC99] G.C. Gannod and B.H.C. Cheng. A <strong>for</strong>mal apprach <strong>for</strong> reverse eng<strong>in</strong>eer<strong>in</strong>g: a case<br />

study. In WCRE’99, 1999.<br />

[GL91] K. B. Gallagher and J. R. Lyle. Us<strong>in</strong>g program slic<strong>in</strong>g <strong>in</strong> software ma<strong>in</strong>tenance. IEEE<br />

Transactions on Software Eng<strong>in</strong>eer<strong>in</strong>g, 17(8):751–761, 1991.<br />

[HD97] M. Harman and S. Danicic. Amorphous program slic<strong>in</strong>g. In Proceed<strong>in</strong>gs of the 5th<br />

IEEE Internation Workshop on Program Comprehesion (IWPC’97) (Dearborn, Michigan,<br />

USA, May 1997), pages 70–79. IEEE Computer Society Press, Los Alamitos, Cali<strong>for</strong>nia,<br />

USA, 1997.<br />

[HDS95] Mark Harman, Sebastian Danicic, and Yoga Sivagurunathan. Program comprehension<br />

assisted by slic<strong>in</strong>g and trans<strong>for</strong>mation, July 1995. In Malcolm Munro, editor, 1st<br />

Durham Workshop on Program Comprehension, Durham University, UK.<br />

[Jon86] C. B. Jones. Systematic Software Development Us<strong>in</strong>g <strong>VDM</strong>. Series <strong>in</strong> Computer Science.<br />

Prentice-Hall International, 1986. C. A. R. Hoare.<br />

[KR78] B.W. Kernighan and D.M. Richtie. The C Programm<strong>in</strong>g Language. Prentice Hall,<br />

Englewood Cliffs, N.J., 1978.<br />

[Kre88] E. Kreyszig. Advanced Eng<strong>in</strong>eer<strong>in</strong>g Mathematics. John Wiley & Sons, Inc., 6th edition,<br />

1988.<br />

[MA86] E. G. Manes and M. A. Arbib. Algebraic Approaches to Program Semantics. Texts<br />

and Monographs <strong>in</strong> Computer Science. Spr<strong>in</strong>ger-Verlag, 1986. D. Gries, series editor.<br />

[Mee92] L. Meertens. Paramorphisms. Formal Aspects of Comput<strong>in</strong>g, 4:413–424, 1992.<br />

[NSO99] F. L. Neves, J. C. Silva, and J. N. Oliveira. Convert<strong>in</strong>g In<strong>for</strong>mal Meta-data to <strong>VDM</strong>-<br />

SL: A Reverse Calculation Approach . In <strong>VDM</strong> <strong>in</strong> Practice! A Workshop co-located<br />

with FM’99: The World Congress on Formal Methods, Toulouse, France, 20-21 September,<br />

September 1999.<br />

[Par00] A. Par<strong>do</strong>. Towards merg<strong>in</strong>g recursion and comonads. In WGP’2000, Ponte de Lima,<br />

July 2000.<br />

[vDK99] Arie van Deursen and Tobias Kuipers. Identify<strong>in</strong>g objects us<strong>in</strong>g cluster and concept<br />

analysis. ICSE’99, ACM, 1999.<br />

[vDM99] Arie van Deursen and Leon Moonen. Understand<strong>in</strong>g cobol systems us<strong>in</strong>g <strong>in</strong>ferred<br />

types. 7th IWPC’99, IEEE Computer Society, 1999.


Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />

781<br />

[Ven91] G. A. Venkatesh. The semantic approach to program slic<strong>in</strong>g. In Proceed<strong>in</strong>gs of the<br />

ACM SIGPLAN ’91 Conference on Programm<strong>in</strong>g Language Design and Implementation<br />

(PLDI), volume 26, pages 107–119, 1991.<br />

[Vil01a] G. Villacicencio. Program analysis <strong>for</strong> the automatic detection of programm<strong>in</strong>g plans<br />

apply<strong>in</strong>g slic<strong>in</strong>g. In 5th European Conference on Software Ma<strong>in</strong>tenance and Reeng<strong>in</strong>eer<strong>in</strong>g.<br />

IEEE Computer Society, March 2001.<br />

[Vil01b] G. Villavicencio. Formalización de una estrategia <strong>in</strong>tegral de <strong>in</strong>geniería reversa, 2001.<br />

[VO01]<br />

Master’s thesis <strong>in</strong> preparation (University of San Luis, Argent<strong>in</strong>a).<br />

G. Villavicencio and J.N. Oliveira. Formal reverse calculation supported by code slic<strong>in</strong>g,<br />

2001. To appear <strong>in</strong> WCRE 2001: 8th Work<strong>in</strong>g Conference on Reverse Eng<strong>in</strong>eer<strong>in</strong>g<br />

, 2-5 October 2001, Stuttgart, Germany.<br />

[Wei81] Mark Weiser. Program slic<strong>in</strong>g. In Fifth International Conference on Software Eng<strong>in</strong>eer<strong>in</strong>g,<br />

San Diego, Cali<strong>for</strong>nia, March 1981.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!