“Bagatelle in C arranged for VDM SoLo” - Universidade do Minho
“Bagatelle in C arranged for VDM SoLo” - Universidade do Minho
“Bagatelle in C arranged for VDM SoLo” - Universidade do Minho
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
<strong>“Bagatelle</strong> <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> <strong>SoLo”</strong><br />
J.N. Oliveira<br />
Ref. [Ol01] — 2001<br />
J.N. Oliveira. <strong>“Bagatelle</strong> <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> <strong>SoLo”</strong>. JUCS, 7(8):754–781, 2001.
Journal of Universal Computer Science, vol. 7, no. 8 (2001), 754-781<br />
submitted: 18/5/01, accepted: 21/8/01, appeared: 28/8/01Spr<strong>in</strong>ger Pub. Co.<br />
<strong>“Bagatelle</strong> <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> <strong>SoLo”</strong><br />
José N. Oliveira<br />
(<strong>Universidade</strong> <strong>do</strong> M<strong>in</strong>ho, Braga, Portugal<br />
Senior R&D Partner, SIDEREUS S.A., Porto, Portugal<br />
jno@di.um<strong>in</strong>ho.pt)<br />
Abstract: This paper sketches a reverse eng<strong>in</strong>eer<strong>in</strong>g discipl<strong>in</strong>e which comb<strong>in</strong>es <strong>for</strong>mal and<br />
semi-<strong>for</strong>mal methods. Central to the <strong>for</strong>mer is denotational semantics, expressed <strong>in</strong> the ISO/IEC<br />
13817-1 standard specification language (<strong>VDM</strong>-SL). This is strengthened with algebra of programm<strong>in</strong>g,<br />
which is applied <strong>in</strong> “reverse order” so as to reconstruct <strong>for</strong>mal specifications from<br />
legacy code. The latter <strong>in</strong>clude code slic<strong>in</strong>g, a “shortcut” which trims <strong>do</strong>wn the complexity of<br />
handl<strong>in</strong>g the <strong>for</strong>mal semantics of all program variables at the same time.<br />
A key po<strong>in</strong>t of the approach is its constructive style. Reverse calculations go as far as absorb<strong>in</strong>g<br />
auxiliary variables, <strong>in</strong>troduc<strong>in</strong>g mutual recursion (if applicable) and revers<strong>in</strong>g semantic denotations<br />
<strong>in</strong>to standard generic programm<strong>in</strong>g schemata such as cata/paramorphisms.<br />
The approach is illustrated <strong>for</strong> a small piece of code already studied <strong>in</strong> the code-slic<strong>in</strong>g literature:<br />
Kernighan and Richtie’s word count C programm<strong>in</strong>g “bagatelle”.<br />
Key Words: Reverse eng<strong>in</strong>eer<strong>in</strong>g; denotational semantics; slic<strong>in</strong>g; algebra of programm<strong>in</strong>g.<br />
Category: F.3.1 (Specify<strong>in</strong>g and Verify<strong>in</strong>g and Reason<strong>in</strong>g About Programs); D.2.1 (Requirements/Specifications)<br />
1 Introduction<br />
The development of computer software has always been a contradictory activity. Four<br />
decades ago, a “software crisis” was identified which drew attention to the poor technical<br />
sophistication of the (then emerg<strong>in</strong>g) software technology. Many years passed,<br />
the advent of home comput<strong>in</strong>g and world-wide network facilities has brought computer<br />
software unprecedented relevance <strong>in</strong> everybody’s life. However, the theoretical<br />
advances <strong>in</strong> programm<strong>in</strong>g science — which has become a stable discipl<strong>in</strong>e meanwhile<br />
— are still by and large ignored by the ever grow<strong>in</strong>g community of programmers.<br />
The situation is no better <strong>in</strong> education: software design rema<strong>in</strong>s among the very few<br />
topics which many eng<strong>in</strong>eer<strong>in</strong>g departments still accept to address with little (if any)<br />
scientific basis.<br />
Time-to-market is among the ma<strong>in</strong> factors en<strong>for</strong>c<strong>in</strong>g ad hoc programm<strong>in</strong>g practice.<br />
However, a quick market entails <strong>in</strong>direct costs such as permanent risk, poor quality and<br />
low reliability of the produced code, and hard ma<strong>in</strong>tenance.<br />
In a situation <strong>in</strong> which the only quality certificate of the runn<strong>in</strong>g software artifact<br />
still is life-cycle endurance, customers and software producers are little prepared to<br />
change, modify or improve runn<strong>in</strong>g code. So there is little opportunity <strong>for</strong> brand new,<br />
technically sound code to replace the old one. (One is never sure about the implications<br />
of a software update.) However, faced with so risky a dependence on legacy software,
Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />
755<br />
managers are more and more prepared to spend resources to <strong>in</strong>crease confidence on —<br />
ie. the level of understand<strong>in</strong>g of — their (untouchable) code.<br />
2 Program understand<strong>in</strong>g<br />
Program understand<strong>in</strong>g affiliates to reverse eng<strong>in</strong>eer<strong>in</strong>g, understood as the analysis of<br />
a system <strong>in</strong> order to identify components and <strong>in</strong>tended behaviour <strong>in</strong> order to create<br />
higher level abstractions of the system [CI90]. Reverse eng<strong>in</strong>eer<strong>in</strong>g <strong>in</strong>cludes reverse<br />
specification — the analytical process of <strong>in</strong>ferr<strong>in</strong>g the orig<strong>in</strong>al specification (which may<br />
have never been written) of some runn<strong>in</strong>g piece of software.<br />
If (on statistical grounds) <strong>for</strong>ward software eng<strong>in</strong>eer<strong>in</strong>g can today be regarded as<br />
a lost opportunity <strong>for</strong> <strong>for</strong>mal methods 1 , its converse still looks a promis<strong>in</strong>g area <strong>for</strong><br />
their application. This is due to the complexity of reverse eng<strong>in</strong>eer<strong>in</strong>g problems. So it<br />
is likely that the <strong>for</strong>mal techniques developed by academics <strong>for</strong> the production of fresh,<br />
high quality software will eventually see the light of (<strong>in</strong>dustrial) success <strong>in</strong> their reverse<br />
application to pre-exist<strong>in</strong>g code. Even if, <strong>for</strong> this purpose, they have to merge with<br />
other, <strong>in</strong><strong>for</strong>mal program ma<strong>in</strong>tenance and debugg<strong>in</strong>g techniques.<br />
3 About this paper<br />
This paper is a modest contribution <strong>in</strong> the spirit of the last paragraph above. It reports<br />
work <strong>in</strong> progress on the development of a discipl<strong>in</strong>e <strong>for</strong> reverse eng<strong>in</strong>eer<strong>in</strong>g which comb<strong>in</strong>es<br />
<strong>for</strong>mal and semi-<strong>for</strong>mal methods. The <strong>for</strong>mal basis of the approach enriches standard<br />
denotational semantics techniques with the algebra of programm<strong>in</strong>g [BdM97],<br />
which is applied <strong>in</strong> “reverse order” so as to reconstruct the <strong>for</strong>mal specifications of<br />
legacy code. Because of the complexity <strong>in</strong>herent <strong>in</strong> handl<strong>in</strong>g <strong>for</strong>mal semantic descriptions<br />
of algorithmic code, reverse algebraic calculation is preceded by code slic<strong>in</strong>g<br />
[Wei81].<br />
This work is the follow up of a project which has addressed data reverse eng<strong>in</strong>eer<strong>in</strong>g<br />
(DRE) <strong>in</strong> a similar way [NSO99]: data are reversed by calculation accord<strong>in</strong>g to<br />
an algebra of data-representation laws which <strong>in</strong>clude the trans<strong>for</strong>mation of relational<br />
database meta-data <strong>in</strong>to abstract ISO/IEC 13817-1 standard (<strong>VDM</strong>-SL) <strong>for</strong>mal descriptions<br />
[FL98].<br />
Code reversal is, <strong>in</strong> general, harder than DRE and this paper will not present a<br />
general solution to the problems <strong>in</strong>volved. Instead, a small example is worked out which<br />
triggers some <strong>in</strong>tuitions about the strategy to follow <strong>in</strong> real situations. This example is<br />
the “word count” programm<strong>in</strong>g “bagatelle” 2 presented by Kernighan and Richtie’s <strong>in</strong><br />
their well-known <strong>in</strong>troductory book to the C programm<strong>in</strong>g language (Fig. 1). This can<br />
With notable exceptions <strong>in</strong> areas such as safety-critical and dependable comput<strong>in</strong>g.<br />
In the musical term<strong>in</strong>ology, a bagatelle (“musical trifle”) is a short light or whimsical piece,<br />
usually written <strong>for</strong> piano.
756 Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />
1 #def<strong>in</strong>e YES 1<br />
2 #def<strong>in</strong>e NO 0<br />
3 ma<strong>in</strong>()<br />
4 {<br />
5 <strong>in</strong>t c, nl, nw, nc, <strong>in</strong>word ;<br />
6 <strong>in</strong>word = NO ;<br />
7 nl = 0;<br />
8 nw = 0;<br />
9 nc = 0;<br />
10 c = getchar();<br />
11 while ( c != EOF ) {<br />
12 nc = nc + 1;<br />
13 if ( c == ’\n’)<br />
14 nl = nl + 1;<br />
15 if ( c == ’ ’ || c == ’\n’ || c == ’\t’)<br />
16 <strong>in</strong>word = NO;<br />
17 else if (<strong>in</strong>word == NO) {<br />
18 <strong>in</strong>word = YES ;<br />
19 nw = nw + 1;<br />
20 }<br />
21 c = getchar();<br />
22 }<br />
23 pr<strong>in</strong>tf("%d",nl);<br />
24 pr<strong>in</strong>tf("%d",nw);<br />
25 pr<strong>in</strong>tf("%d",nc);<br />
26 }<br />
Figure 1: The word count program [KR78].<br />
be recognized as the core of the Unix wc command, which pr<strong>in</strong>ts the number of bytes,<br />
words, and l<strong>in</strong>es <strong>in</strong> files.<br />
However simple, this example is expressive enough to illustrate our ma<strong>in</strong> po<strong>in</strong>t:<br />
that, <strong>in</strong> practice, denotational techniques alone are <strong>in</strong>sufficient <strong>for</strong> code reversal, and<br />
that they benefit from comb<strong>in</strong><strong>in</strong>g with other — either <strong>for</strong>mal or <strong>in</strong><strong>for</strong>mal — methods.<br />
The rema<strong>in</strong>der of this paper is structured as follows: <strong>in</strong> the follow<strong>in</strong>g section we<br />
<strong>in</strong>troduce our (light-weight) approach to denotational semantics, which is illustrated <strong>in</strong><br />
section 5 <strong>for</strong> the word count example, us<strong>in</strong>g <strong>VDM</strong>-SL notation. In sections 6 and 7 we<br />
motivate the application of slic<strong>in</strong>g to <strong>for</strong>mal semantics. This is illustrated <strong>in</strong> section 8.<br />
Section 9 motivates the need <strong>for</strong> a posteriori algebraic calculations, to be applied to<br />
each particular slice. The algebra of programm<strong>in</strong>g is briefly presented <strong>in</strong> sections 10<br />
and 11, and f<strong>in</strong>ally applied to one of the slices of the word-count example (section 12).<br />
The paper ends with some conclusions and review of related work.
Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />
757<br />
4 Denotational semantics <strong>for</strong> program synthesis / analysis<br />
Let be a piece of algorithmic code and denote its denotational semantics, ie, the<br />
<strong>in</strong>put/output relation which captures the behaviour of . In <strong>for</strong>ward eng<strong>in</strong>eer<strong>in</strong>g one<br />
starts from a specification and rewrites it over and over aga<strong>in</strong>,<br />
until the semantics of some piece of code is found. Structured programm<strong>in</strong>g<br />
makes it possible to take advantage of the compositional properties of the available<br />
program comb<strong>in</strong>ators, <strong>for</strong> <strong>in</strong>stance sequential composition<br />
(1)<br />
where “ ” denotes relational composition. So, if specification is written <strong>do</strong>wn<br />
and one f<strong>in</strong>ds and such that and hold, then is a<br />
valid ref<strong>in</strong>ement step <strong>in</strong> program synthesis.<br />
By contrast, it seems natural that program analysis should go the other way round:<br />
try and identify chunks of code such that their semantics can be <strong>in</strong>ferred and abstracted<br />
upon. There is a fact to reta<strong>in</strong>, though: <strong>in</strong> go<strong>in</strong>g backwards along the direction <strong>in</strong><br />
(1) one can add arbitrary nondeterm<strong>in</strong>ism and end up <strong>in</strong> a specification which is too<br />
vague.<br />
For a determ<strong>in</strong>istic program, relation specializes to a function which maps<br />
the state of (ie. the set of all variables which has access to) be<strong>for</strong>e execution takes<br />
place, to the state after such an execution. This retraction to a functional semantics, albeit<br />
restrictive <strong>in</strong> general, is acceptable <strong>in</strong> face of simple programs (“bagatelles”) such<br />
as word count (Fig. 1). The theory gets simpler and more <strong>in</strong>tuitive and this helps the<br />
(<strong>in</strong><strong>for</strong>mal) reader to appreciate the power of <strong>for</strong>mal reason<strong>in</strong>g. This expla<strong>in</strong>s our decision<br />
<strong>in</strong> this paper to reverse specify ma<strong>in</strong> <strong>in</strong> the word-count example only <strong>in</strong> terms<br />
of its functional specification 3 . So we shall be deal<strong>in</strong>g with functions and functional<br />
equality, rather than relations and the subset order<strong>in</strong>g.<br />
5 Start<strong>in</strong>g denotational semantics <strong>for</strong> word count<br />
The <strong>VDM</strong>-SL specification language [FL98] will be a<strong>do</strong>pted to express the <strong>for</strong>mal semantics<br />
of (determ<strong>in</strong>istic) code such as word count. One starts by <strong>in</strong>ferr<strong>in</strong>g the program<br />
state from the program variables:<br />
Later on we will address shortcom<strong>in</strong>gs of this decision, acceptable only <strong>in</strong> face of determ<strong>in</strong>istic,<br />
term<strong>in</strong>at<strong>in</strong>g programs. The trade-off between the relational and functional foundations<br />
<strong>for</strong> program calculation is apparent <strong>in</strong> [BdM97]. The power of the relational approach can be<br />
appreciated <strong>in</strong> [BH93, Bac00], among a vast literature on this subject.
758 Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />
:: char<br />
For notation convenience, each selector <strong>in</strong> is named after the correspond<strong>in</strong>g program<br />
variable. Note the char and abstractions of <strong>in</strong> variables and , respectively.<br />
Next, the <strong>in</strong>put stream is made explicit <strong>in</strong> a way such that EOF is modelled by<br />
nil ,<br />
:: char<br />
char<br />
cf. the follow<strong>in</strong>g semantics <strong>for</strong> :<br />
Let<br />
be the function which captures the semantics of the whole program, that is<br />
ma<strong>in</strong>()<br />
In <strong>VDM</strong>-SL notation one has:<br />
char<br />
let mk- nil false <strong>in</strong><br />
mkif<br />
then mk- nil<br />
else mk- tl hd<br />
mk-<br />
The while-loop <strong>in</strong> ma<strong>in</strong>() is modelled by auxiliary function<br />
state, of type . The loop <strong>in</strong>itialization is captured by record<br />
, which updates the<br />
passed as parameter to :<br />
mk- nil false
Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />
759<br />
let<br />
if<br />
then<br />
else let<br />
nil<br />
<strong>in</strong><br />
<strong>in</strong><br />
State model can be made simpler by filter<strong>in</strong>g the redundancy of entries and<br />
. This means to data-abstract to<br />
:: char<br />
under abstraction (retrieve [Jon86]) function<br />
if<br />
then<br />
else<br />
if<br />
then<br />
else if<br />
then<br />
else<br />
if<br />
then false<br />
else if<br />
then true<br />
else<br />
mk-<br />
mkif<br />
nil<br />
then mkelse<br />
mk-<br />
Then reduces to hd tl -manipulation, lead<strong>in</strong>g to a simpler version of :
760 Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />
mkif<br />
then mkelse<br />
let if hd<br />
then<br />
else if<br />
then<br />
else<br />
if hd<br />
then false<br />
else if<br />
then true<br />
else <strong>in</strong><br />
mk- tl<br />
if hd<br />
then<br />
else<br />
where an auxiliary function was <strong>in</strong>troduced<br />
char<br />
abstract<strong>in</strong>g the test <strong>for</strong> a separator character. F<strong>in</strong>ally,<br />
and renamed to :<br />
is redef<strong>in</strong>ed accord<strong>in</strong>gly<br />
char<br />
let mk- false <strong>in</strong><br />
mk-<br />
6 Loop <strong>in</strong>ter-comb<strong>in</strong>ation and code fusion<br />
When writ<strong>in</strong>g code such as word count, programmers tend to comb<strong>in</strong>e <strong>in</strong>to a s<strong>in</strong>gle<br />
programm<strong>in</strong>g construct (eg. a loop) two or more logically <strong>in</strong>dependent computations.<br />
This saves auxiliary (eg. <strong>in</strong>termediate) data-structur<strong>in</strong>g and the programm<strong>in</strong>g “tricks” to
Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />
761<br />
achieve efficiency <strong>in</strong> this way can be framed <strong>in</strong>to two broad categories, one “sequential”<br />
and one “parallel”. The <strong>for</strong>mer can be expressed <strong>in</strong><strong>for</strong>mally as follows,<br />
as illustrated <strong>in</strong> a diagram:<br />
preferred to (2)<br />
In other situations the idea <strong>in</strong> the programmer’s m<strong>in</strong>d is to per<strong>for</strong>m the computation<br />
of several (possibly <strong>in</strong>dependent) output variables <strong>in</strong> the same loop, all shar<strong>in</strong>g the same<br />
visit to the <strong>in</strong>put data structure. This is depicted (<strong>for</strong> two such variables) <strong>in</strong> the follow<strong>in</strong>g<br />
draw<strong>in</strong>g,<br />
that is,<br />
preferred to (3)<br />
where the angle brackets <strong>in</strong> denote the “parallel” execution of computations<br />
and on the same <strong>in</strong>put.<br />
The word count program of Fig. 1 is an <strong>in</strong>stance of situation (3): the computations<br />
of nc, nl and nw (resp. the number of characters, number of l<strong>in</strong>es and number of words<br />
<strong>in</strong> the <strong>in</strong>put stream) proceed <strong>in</strong> parallel, ie. <strong>in</strong> the same while-loop access to the <strong>in</strong>put<br />
stream.
762 Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />
7 Code slic<strong>in</strong>g<br />
As happens with word count, the loop-<strong>in</strong>tercomb<strong>in</strong>ation strategy suggested <strong>in</strong> (3) may<br />
<strong>in</strong>volve computations on the right which are <strong>in</strong>dependent of each other. The loop-body<br />
on the left can become really <strong>in</strong>scrutable if the programmer <strong>do</strong>esn’t bother to <strong>in</strong>terleave<br />
the statements of with the statements of <strong>in</strong> an arbitrary, not obvious way.<br />
This is where code slic<strong>in</strong>g proves useful <strong>in</strong> debugg<strong>in</strong>g practice. Program slic<strong>in</strong>g was<br />
<strong>in</strong>troduced by Weiser [Wei81] as a means to help programmers <strong>in</strong> understand<strong>in</strong>g <strong>for</strong>eign<br />
code and <strong>in</strong> debugg<strong>in</strong>g. It is a technique <strong>for</strong> restrict<strong>in</strong>g the behaviour of a program<br />
to some fragment of <strong>in</strong>terest. The (decomposition) slice<br />
on variable<br />
yields that portion of the program which contributes to the computation of . Slices<br />
are executable (sub)programs which can be extracted automatically from the source<br />
code by data flow and control flow analysis. Gallagher and Lyle [GL91] show how such<br />
slices, ordered by set <strong>in</strong>clusion, <strong>for</strong>m a semi-lattice where meet corresponds to code<br />
shar<strong>in</strong>g.<br />
The lesson learned from the code slic<strong>in</strong>g community po<strong>in</strong>ts to a clear direction <strong>in</strong><br />
program understand<strong>in</strong>g: <strong>in</strong>stead of work<strong>in</strong>g with a monolithic state vector <strong>in</strong>volv<strong>in</strong>g<br />
state variables , <strong>for</strong> and each of type , and express<strong>in</strong>g<br />
semantics <strong>in</strong> terms of state-vector trans<strong>for</strong>mations , one “splits”<br />
the effect on the state vector <strong>in</strong> terms of <strong>in</strong>dependent computations .<br />
Each <strong>in</strong>dividual is self-sufficient and smaller than the orig<strong>in</strong>al code, there<strong>for</strong>e easier<br />
to understand (or reverse calculate).<br />
The idea of us<strong>in</strong>g slic<strong>in</strong>g is that of “short-cutt<strong>in</strong>g” the work of understand<strong>in</strong>g a large<br />
program via the syntactic separation of the program <strong>in</strong> its constituent slices. However,<br />
how sound is this strategy? On a denotational semantics basis, how <strong>do</strong> we guarantee<br />
that, altogether, the slices’ semantics “re-constitute” the whole program semantics? We<br />
will rely on the follow<strong>in</strong>g conjecture:<br />
Let be a program exhibit<strong>in</strong>g output variables, and let be the<br />
correspond<strong>in</strong>g slices. Then the semantics of can be recovered by comb<strong>in</strong><strong>in</strong>g<br />
these and only these slices, ie.<br />
In other words, slic<strong>in</strong>g is a semantically sound code-decomposition technique.<br />
The angle-bracket comb<strong>in</strong>ator of (4) will be <strong>for</strong>malized <strong>in</strong> section 11. Although the<br />
equality <strong>do</strong>es not hold <strong>for</strong> all programs (see section 15), it <strong>do</strong>es hold <strong>for</strong> a broad class<br />
of determ<strong>in</strong>istic programs which <strong>in</strong>cludes wc.<br />
(4)<br />
8 The slices of wc and their <strong>for</strong>mal semantics<br />
In general, the semantics of a program block <strong>in</strong>volv<strong>in</strong>g while, <strong>for</strong> or <strong>do</strong> statements<br />
and/or recursion will have to be <strong>in</strong>ductively def<strong>in</strong>ed over some <strong>in</strong>put type , eg. a f<strong>in</strong>ite
Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />
763<br />
list, an array, a tree. So this type should be s<strong>in</strong>gled out from the space vector:<br />
Slic<strong>in</strong>g will supply a collection of slices<br />
such that<br />
(5)<br />
which we may want to abstract <strong>in</strong>to <strong>in</strong>ductive functions with shape<br />
(6)<br />
The trans<strong>for</strong>mation from (6) to (5) is a well-known technique <strong>for</strong> improv<strong>in</strong>g the efficiency<br />
of functional programs called accumulation parameter <strong>in</strong>troduction [BdM97].<br />
Below we shall be <strong>in</strong>terested <strong>in</strong> the reverse application of this rule, that is, we want to<br />
remove accumulations.<br />
Let us <strong>in</strong>stantiate this process <strong>for</strong> word count (Fig. 1). For this piece of code, Gallagher<br />
and Lyle [GL91] identify the follow<strong>in</strong>g slice decomposition semi-lattice:<br />
<br />
<br />
<br />
<br />
Clearly, there are 3 maximal slices to be extracted<br />
number of characters<br />
number of l<strong>in</strong>es<br />
number of words<br />
which match with the triple output of the program. Let us analyse each of these. The<br />
character count slice is<br />
3 ma<strong>in</strong>()<br />
4 {<br />
5 <strong>in</strong>t c, nc ;<br />
9 nc = 0;<br />
10 c = getchar();<br />
11 while ( c != EOF ) {<br />
12 ++nc;<br />
20 }<br />
21 c = getchar();<br />
22 }<br />
25 pr<strong>in</strong>tf("%d",nc);<br />
26 }<br />
with semantics captured by function<br />
char<br />
mk-
764 Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />
which calls<br />
mkif<br />
then mkelse<br />
mk- tl<br />
def<strong>in</strong>ed over the -projection of :<br />
:: char<br />
Entry of is clearly an accumulation parameter <strong>in</strong> function , whose removal<br />
is the <strong>in</strong>verse of a standard program trans<strong>for</strong>mation (exercise 3.45 <strong>in</strong> [BdM97]).<br />
We thus “get rid of the state” and obta<strong>in</strong><br />
char<br />
if<br />
then<br />
else<br />
tl<br />
easily recognizable as <strong>VDM</strong>-SL’s primitive “length” function. So we are <strong>do</strong>ne,<br />
and can proceed to the l<strong>in</strong>e-count slice :<br />
3 ma<strong>in</strong>()<br />
4 {<br />
5 <strong>in</strong>t c, nl ;<br />
7 nl = 0;<br />
10 c = getchar();<br />
11 while ( c != EOF ) {<br />
13 if ( c == ’\n’)<br />
14 ++nl;<br />
21 c = getchar();<br />
22 }<br />
23 pr<strong>in</strong>tf("%d",nl);<br />
26 }<br />
This corresponds to sliced state<br />
len<br />
:: char
Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />
765<br />
and exhibits semantics<br />
char<br />
mk-<br />
mk-<br />
if<br />
then mkelse<br />
mk- tl hd<br />
<strong>for</strong> auxiliary function<br />
char<br />
if<br />
then<br />
else<br />
Aga<strong>in</strong> the same standard trans<strong>for</strong>mation will lead to a state-free denotation:<br />
char<br />
if<br />
then<br />
else if hd<br />
then<br />
else<br />
tl<br />
tl<br />
F<strong>in</strong>ally, we have to cope with word count<strong>in</strong>g slice :<br />
1 #def<strong>in</strong>e YES 1<br />
2 #def<strong>in</strong>e NO 0<br />
3 ma<strong>in</strong>()<br />
4 {<br />
5 <strong>in</strong>t c, nw, <strong>in</strong>word ;<br />
6 <strong>in</strong>word = NO ;<br />
8 nw = 0;<br />
10 c = getchar();<br />
11 while ( c != EOF ) {<br />
15 if ( c == ’ ’ || c == ’\n’ || c == ’\t’)<br />
16 <strong>in</strong>word = NO;<br />
17 else if (<strong>in</strong>word == NO) {<br />
18 <strong>in</strong>word = YES ;
766 Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />
19 ++nw;<br />
20 }<br />
21 c = getchar();<br />
22 }<br />
24 pr<strong>in</strong>tf("%d",nw);<br />
26 }<br />
Fact is apparent from the projection of<br />
:: char<br />
which <strong>in</strong>cludes , and from , which is function<br />
char<br />
mk-<br />
false<br />
mkif<br />
then mkelse<br />
let if hd<br />
then<br />
else if<br />
then<br />
else<br />
if hd<br />
then false<br />
else if<br />
then true<br />
else <strong>in</strong><br />
mk- tl<br />
Clearly, is less obvious than what we have been seen above <strong>for</strong> the other<br />
slices. Firstly, because there are two state variables ( and ) <strong>in</strong>stead of one. So<br />
our first step is to package them together,<br />
:: char<br />
::
Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />
767<br />
and redef<strong>in</strong>e<br />
accord<strong>in</strong>gly<br />
mkif<br />
then mkelse<br />
mk- tl hd<br />
where<br />
is <strong>in</strong>troduced to capture the state accumulation process:<br />
char<br />
mkif<br />
then mkelse<br />
if<br />
then mkelse<br />
mk-<br />
false<br />
true<br />
(7)<br />
Secondly, because the removal of the accumulator will not br<strong>in</strong>g — unlike the<br />
preced<strong>in</strong>g examples — a denotation as clean as we would like:<br />
char<br />
char<br />
(8)<br />
if<br />
then mk-<br />
false<br />
else hd tl<br />
The fact that an auxiliary function ( ) is still needed br<strong>in</strong>gs about some questions:<br />
shouldn’t Boolean entry <strong>in</strong> have already disappeared? how <strong>do</strong> we<br />
proceed? is there a limit to the program trans<strong>for</strong>mations we have been apply<strong>in</strong>g so far?<br />
9 Need <strong>for</strong> a notation trans<strong>for</strong>mation<br />
Certa<strong>in</strong>ly we could go further <strong>in</strong> conventional (ie. lambda-calculus based) trans<strong>for</strong>mations.<br />
However, such “bagatelle-sized” reason<strong>in</strong>g is unlikely to scale up to realistic situations.<br />
Why? Legacy code normally <strong>in</strong>volves <strong>in</strong>tricate collections of global variables<br />
whose mean<strong>in</strong>g is obfuscated by loop-<strong>in</strong>tercomb<strong>in</strong>ation, mak<strong>in</strong>g it harder and harder to
768 Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />
understand by others. It is no wonder that the <strong>for</strong>mal semantics of code made efficient<br />
<strong>in</strong> this way leads to volum<strong>in</strong>ous mathematical expressions <strong>in</strong>volv<strong>in</strong>g large state vectors<br />
which resist to symbolic manipulation.<br />
However, the programm<strong>in</strong>g <strong>in</strong>tuitions implicit <strong>in</strong> (2) and (3) make sense and actually<br />
validated by algebraic laws [BdM97] which programmers (as a rule) ignore but feel<br />
obvious about the semantics of the underly<strong>in</strong>g programm<strong>in</strong>g language 4 .<br />
Notation seems to be a factor aga<strong>in</strong>st the explicit use of such laws <strong>in</strong> denotational<br />
semantics. Po<strong>in</strong>twise notation <strong>in</strong>volv<strong>in</strong>g operators as well as variable symbols, logical<br />
connectives, quantifiers, etc. is not abstract enough. It also entails a loss <strong>in</strong> genericity<br />
which conventional eng<strong>in</strong>eer<strong>in</strong>g mathematics has learned to solve elsewhere by chang<strong>in</strong>g<br />
the “mathematical space”, <strong>for</strong> <strong>in</strong>stance by mov<strong>in</strong>g (temporarily) from the<br />
to the <strong>in</strong> the Laplace trans<strong>for</strong>mation [Kre88]:<br />
-space<br />
-space<br />
Given problem<br />
Subsidiary equation<br />
Solution of given problem<br />
Solution of subsidiary equation<br />
Quot<strong>in</strong>g [Kre88], p.242:<br />
The Laplace trans<strong>for</strong>mation is a method <strong>for</strong> solv<strong>in</strong>g differential equations (...)<br />
The process of solution consists of three ma<strong>in</strong> steps:<br />
1st step. The given “hard” problem is trans<strong>for</strong>med <strong>in</strong>to a “simple”<br />
equation (subsidiary equation).<br />
2nd step. The subsidiary equation is solved by purely algebraic manipulations.<br />
Side comment: what becomes far less obvious (to others) is their use <strong>in</strong> an <strong>in</strong>-discrim<strong>in</strong>ated<br />
and un<strong>do</strong>cumented way!
Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />
769<br />
3rd step. The solution of the subsidiary equation is trans<strong>for</strong>med back<br />
to obta<strong>in</strong> the solution of the given problem.<br />
In this way the Laplace trans<strong>for</strong>mation reduces the problem of solv<strong>in</strong>g a differential<br />
equation to an algebraic problem.<br />
This is precisely what we would like to <strong>do</strong> <strong>in</strong> reverse denotational semantics: obta<strong>in</strong><br />
the po<strong>in</strong>twise denotation of a program, trans<strong>for</strong>m it <strong>in</strong>to a subsidiary po<strong>in</strong>tfree denotation,<br />
obta<strong>in</strong> the solution us<strong>in</strong>g the po<strong>in</strong>tfree algebra of programs, and return back to the<br />
po<strong>in</strong>twise level where <strong>for</strong>mal method practitioners are used to express their thoughts.<br />
10 Algebra of programm<strong>in</strong>g<br />
In a more respected than loved paper, John Backus [Bac78] was among the first to<br />
alert computer programmers that computer languages alone are <strong>in</strong>sufficient, and that<br />
only languages which exhibit an algebra <strong>for</strong> reason<strong>in</strong>g about the objects they purport<br />
to describe will be useful <strong>in</strong> the long run. This l<strong>in</strong>e of thought has witnessed significant<br />
advances over the last decade, based on the so-called functorial approach to datatypes<br />
which, orig<strong>in</strong>at<strong>in</strong>g from [MA86], has reached the status of a thorough program calculus<br />
<strong>in</strong> [BdM97]. Because this style of calculation has become known as the Bird-Meertens<br />
<strong>for</strong>malism (BMF) [Bac88], we will refer to the trans<strong>for</strong>mation from the “ -space to the<br />
-space” (where stands <strong>for</strong> variable and <strong>for</strong> po<strong>in</strong>tfree) as the “BMF trans<strong>for</strong>mation”.<br />
11 Introduc<strong>in</strong>g the “BMF-trans<strong>for</strong>mation”<br />
Programm<strong>in</strong>g “trick” (2) is an <strong>in</strong>stance of a class of <strong>for</strong>mal program trans<strong>for</strong>mations<br />
known as fusion laws: two sequential computations of the same k<strong>in</strong>d — <strong>in</strong> this case,<br />
and — merge together (ie. “fused”) <strong>in</strong> a s<strong>in</strong>gle computation of the same<br />
k<strong>in</strong>d — <strong>in</strong> this case.<br />
Recall that we have decided to restrict ourselves to functional code blocks, ie.,<br />
pieces of code whose semantics can be expressed by functions , etc. So “ ” <strong>in</strong><br />
denotes function composition<br />
under typ<strong>in</strong>g rule<br />
(9)<br />
On the other hand, programm<strong>in</strong>g trick (3) has to <strong>do</strong> with “fus<strong>in</strong>g” two “parallel”<br />
computations <strong>in</strong>to a s<strong>in</strong>gle one and affiliates to another group of laws hav<strong>in</strong>g to <strong>do</strong> with
770 Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />
mutual recursion. These <strong>in</strong>volve the “angle bracket” comb<strong>in</strong>ator of (3), which we will<br />
refer to as the “split” comb<strong>in</strong>ator:<br />
cf. diagram<br />
(10)<br />
<br />
<br />
<br />
<br />
<strong>in</strong>volv<strong>in</strong>g projections and which are such that and .<br />
This is Backus [Bac78] construction operator, which can be specified as the follow<strong>in</strong>g<br />
polymorphic operator <strong>in</strong> <strong>VDM</strong>-SL notation:<br />
mk-<br />
(11)<br />
These comb<strong>in</strong>ators are rich <strong>in</strong> algebraic properties. For <strong>in</strong>stance, the<br />
laws<br />
are implicit <strong>in</strong> the diagram above, and the<br />
-cancellation<br />
, (12)<br />
-fusion law<br />
expresses “left distribution” of composition of over split, a law <strong>in</strong> which two “parallel”<br />
consumer functions and fuse with another, producer function .<br />
Already known s<strong>in</strong>ce Backus’ algebra of programs, law (13) is an example of an<br />
equation “<strong>in</strong> the -space”. The compression of notation and the power of such laws is<br />
particularly apparent <strong>in</strong> deal<strong>in</strong>g with recursion, that is, with <strong>in</strong>ductive datatypes. For<br />
conciseness, we will focus on the datatype of f<strong>in</strong>ite lists (which is the one present <strong>in</strong><br />
word count) and mention only the laws which are relevant <strong>for</strong> our calculations. (For a<br />
comprehensive account, see eg. [BdM97].)<br />
Consider the follow<strong>in</strong>g <strong>in</strong>ductive def<strong>in</strong>ition (<strong>in</strong> <strong>VDM</strong>-SL) of the function which computes<br />
the sum of a list of <strong>in</strong>tegers:<br />
(13)<br />
if<br />
then<br />
else hd<br />
tl<br />
(14)
Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />
771<br />
In general, any list process<strong>in</strong>g function with signature can be written<br />
accord<strong>in</strong>g to the follow<strong>in</strong>g recursive scheme, <strong>in</strong> <strong>VDM</strong>-SL:<br />
if<br />
then<br />
else hd tl<br />
(15)<br />
<strong>for</strong> some and . For <strong>in</strong>stance, and <strong>in</strong><br />
.<br />
A standard result <strong>in</strong> <strong>in</strong>ductive datatype theory tells us that each <strong>in</strong>stance of is<br />
uniquely determ<strong>in</strong>ed by the pair . Pairs of this k<strong>in</strong>d are called algebras (= collections<br />
of functions and constants) which will be described <strong>in</strong> a compact way by resort<strong>in</strong>g<br />
to a comb<strong>in</strong>ator which dualizes split (10):<br />
(16)<br />
cf. diagram<br />
<br />
<br />
(17)<br />
<br />
where denotes the disjo<strong>in</strong>t union of and .<br />
Split and its dual are related to each other by the follow<strong>in</strong>g exchange law, which<br />
makes it possible to express every function of type <strong>in</strong> two alternative<br />
ways<br />
(18)<br />
<strong>for</strong> , , and .<br />
Thanks to the comb<strong>in</strong>ator, one can record the whole <strong>in</strong><strong>for</strong>mation about algebra<br />
above <strong>in</strong>to a s<strong>in</strong>gle arrow<br />
(19)<br />
where<br />
denotes an arbitrary (but fixed) s<strong>in</strong>gleton type — eg.
772 Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />
<strong>in</strong> <strong>VDM</strong>-SL — and denotes the “everywhere ” constant function such that<br />
Go<strong>in</strong>g further <strong>in</strong> the same direction, we can let arrow (19) participate <strong>in</strong> a larger diagram<br />
which records the whole “algorithmic” <strong>in</strong><strong>for</strong>mation about (15):<br />
(20)<br />
(21)<br />
In this diagram: is the <strong>VDM</strong>-SL type of f<strong>in</strong>ite lists (whose elements belong to );<br />
is algebra which builds -lists 5 ; is the identity function such that<br />
<strong>for</strong> every ; the “recursive call”<br />
<strong>in</strong>volves the “product”<br />
comb<strong>in</strong>ator,<br />
(22)<br />
and its dual, the “sum” comb<strong>in</strong>ator:<br />
(23)<br />
Diagram (21) expresses an equation about<br />
which we re-write <strong>in</strong>to<br />
by <strong>in</strong>troduc<strong>in</strong>g and .<br />
Equation (24) is all we need to know to def<strong>in</strong>e — provided we <strong>in</strong>stantiate , ie.<br />
and . To express this uniqueness of , dependent on , we write — read “ -<br />
catamorphism” — <strong>in</strong>stead of :<br />
For <strong>in</strong>stance, catamorphism is the “BMF-trans<strong>for</strong>mation” of the equivalent,<br />
four l<strong>in</strong>e <strong>VDM</strong>-SL (po<strong>in</strong>twise) def<strong>in</strong>ition of (14). As an exercise, the reader can<br />
Operator<br />
is def<strong>in</strong>ed <strong>in</strong> <strong>VDM</strong>-SL by<br />
(24)<br />
(25)
Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />
773<br />
recover this def<strong>in</strong>ition of — or <strong>in</strong> general that of (15) — from (25) by apply<strong>in</strong>g<br />
standard laws of the algebra of programm<strong>in</strong>g known as -fusion,<br />
and<br />
-absorption<br />
among others.<br />
Catamorphisms extend to any polynomial and possess a number of remarkable<br />
properties, among which we select the mutual-recursion law, also called “Fokk<strong>in</strong>ga<br />
law”:<br />
cf. diagrams<br />
(26)<br />
(27)<br />
(28)<br />
and<br />
This law is a <strong>for</strong>mal basis <strong>for</strong> “parallel loop” <strong>in</strong>ter-comb<strong>in</strong>ation (or unravell<strong>in</strong>g). It will<br />
play the major rôle <strong>in</strong> the calculations which rema<strong>in</strong> to be <strong>do</strong>ne about .<br />
12 Back to — <strong>in</strong>troduction of mutual recursion<br />
Recall functions and <strong>in</strong> (8). We will now focus our attention on function<br />
which, <strong>in</strong> the -space,<br />
is list catamorphism<br />
By deliver<strong>in</strong>g a pair of outputs,<br />
false<br />
char<br />
(7) will split <strong>in</strong>to<br />
where
774 Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />
mkif<br />
then<br />
else<br />
char<br />
char<br />
that is (<strong>in</strong> the<br />
-space):<br />
where , <strong>do</strong>uble projection abbreviates , and resorts<br />
to McCarthy’s conditional comb<strong>in</strong>ator def<strong>in</strong>ed by<br />
where<br />
(29)<br />
(30)<br />
Function<br />
can be reshaped<br />
false<br />
exchange law (18)<br />
false<br />
<strong>in</strong> a way so that it can be handled by the mutual-recursion law (28), <strong>for</strong><br />
and false . From this we <strong>in</strong>fer , as follows:<br />
– Unknown :<br />
false<br />
<strong>in</strong>stantiations<br />
false<br />
-absorption (27)<br />
false<br />
-natural law (31) below<br />
false
Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />
775<br />
One of the -natural laws<br />
was used <strong>in</strong> this calculation. Back to the<br />
false<br />
-space, one obta<strong>in</strong>s<br />
(31)<br />
(32)<br />
– Unknown : its diagram is<br />
char<br />
char<br />
char<br />
char<br />
Calculation:<br />
<strong>in</strong>stantiation of<br />
-absorption (27)<br />
McCarthy conditional and -cancellation (12)<br />
Back to the<br />
-space:<br />
if<br />
then<br />
else<br />
F<strong>in</strong>ally,<br />
recall<br />
mutual recursion <strong>in</strong>troduced above<br />
-cancellation (12)
776 Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />
So we a<strong>do</strong>pt as <strong>in</strong> our f<strong>in</strong>al <strong>VDM</strong>-SL, which goes back <strong>in</strong>to the -space. For a<br />
more suggestive read<strong>in</strong>g, we negate condition and abstract <strong>in</strong>to a<br />
“look ahead <strong>for</strong> word separator” predicate 6 :<br />
char<br />
if<br />
then<br />
else if hd tl<br />
then tl<br />
else tl<br />
char<br />
hd<br />
This specification adds to our understand<strong>in</strong>g of word count<strong>in</strong>g mechanism:<br />
counts the number of transitions from a non-separator to a separator character, or the<br />
end of the <strong>in</strong>put stream. As anticipated earlier on, variable has disappeared<br />
throughout this calculation. In fact, it can be regarded as a state “flag” implement<strong>in</strong>g<br />
the “separator/non-separator” state automaton which is implicit <strong>in</strong> the orig<strong>in</strong>al code:<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
Function is now a proper <strong>in</strong>ductive function: it belongs to the class of so-called<br />
paramorphisms [Mee92], which are very common <strong>in</strong> <strong>for</strong>mal specification (<strong>for</strong> <strong>in</strong>stance,<br />
the usual def<strong>in</strong>ition of the factorial function is a paramorphism). Depicted <strong>in</strong> a diagram,<br />
will look as follows<br />
char<br />
char<br />
char<br />
char<br />
char<br />
where<br />
The<br />
version of the if-then-else relies on <strong>VDM</strong>-SL’s logic of partial functions [FL98].
Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />
777<br />
char<br />
mkif<br />
then<br />
else<br />
char<br />
All <strong>in</strong> all, we can write<br />
ma<strong>in</strong>()<br />
where<br />
len . In <strong>VDM</strong>-SL notation:<br />
char<br />
mk-<br />
len<br />
13 Summary<br />
As pictured <strong>in</strong> Fig. 2, we have comb<strong>in</strong>ed a <strong>for</strong>mal method — algebra of programm<strong>in</strong>g<br />
[BdM97] — with a semi-<strong>for</strong>mal one — code slic<strong>in</strong>g [Wei81] — <strong>in</strong> order to per<strong>for</strong>m<br />
the reverse specification of a little program already studied <strong>in</strong> the code slic<strong>in</strong>g literature<br />
[GL91]: the word count program of [KR78]. We claim that we have gone deeper than<br />
[GL91] <strong>in</strong> understand<strong>in</strong>g this piece of code.<br />
A key po<strong>in</strong>t of the approach is its constructive style, based on a change of notation<br />
which leads to powerful algebraic laws of programm<strong>in</strong>g. This also helps <strong>in</strong> overcom<strong>in</strong>g<br />
shortcom<strong>in</strong>gs of the specification language. For <strong>in</strong>stance, the coproduct construct<br />
is not polymorphic <strong>in</strong> <strong>VDM</strong>-SL. For each particular and , a specific disjo<strong>in</strong>t union<br />
datatype has to be def<strong>in</strong>ed, eg. <strong>in</strong>:<br />
::<br />
::<br />
So <strong>in</strong>jections and <strong>in</strong>stantiate to mk- and mk- , respectively, and to:<br />
if isthen<br />
else
778 Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />
Source code slic<strong>in</strong>g<br />
- <strong>for</strong>mal<br />
Denotational semantics<br />
<strong>for</strong>mal<br />
Algebra of program<strong>in</strong>g<br />
+ <strong>for</strong>mal<br />
Figure 2: Formal/<strong>in</strong><strong>for</strong>mal method comb<strong>in</strong>ation.<br />
Thanks to the change of notation, all our reason<strong>in</strong>g <strong>in</strong>volv<strong>in</strong>g coproducts was handsomely<br />
carried out <strong>in</strong> the -space (po<strong>in</strong>tfree notation). Po<strong>in</strong>tfree reason<strong>in</strong>g went as far<br />
as absorb<strong>in</strong>g auxiliary variables and <strong>in</strong>troduc<strong>in</strong>g mutual recursion, a specification mechanism<br />
which programmers never make explicit because they fear lack<strong>in</strong>g efficiency.<br />
Last but not least, the approach adds to program understand<strong>in</strong>g <strong>in</strong> “catalogu<strong>in</strong>g” semantic<br />
denotations <strong>in</strong>to well-known classes of <strong>in</strong>ductive schemata (cata/paramorphisms)<br />
which are rich <strong>in</strong> algebraic properties and are amenable to further reason<strong>in</strong>g (eg. reimplementation).<br />
Slic<strong>in</strong>g can be regarded as a denotational semantics “shortcut” — it trims <strong>do</strong>wn<br />
the complexity of handl<strong>in</strong>g all program variables at the same time. In fact, the mutualrecursion<br />
law (28) could have been applied to the whole program — rather than to its<br />
slices — at the sacrifice of a lot more reason<strong>in</strong>g show<strong>in</strong>g eventually that the three slices<br />
are <strong>in</strong>dependent of each other: a result known as the banana-split law,<br />
which is a special case of (28).<br />
(33)<br />
14 Related work<br />
Venkatesh [Ven91] was among the first to address program slic<strong>in</strong>g from a <strong>for</strong>mal semantics<br />
po<strong>in</strong>t of view. References [CLM95, CLM96] present approaches to recover<br />
(functional) specifications from imperative source code us<strong>in</strong>g “symbolic execution”.<br />
Specifications are expressed by pre/post-conditions. References [GC96, GC99] also<br />
base their reverse eng<strong>in</strong>eer<strong>in</strong>g strategies on pre-/post-conditions. Similarly, [CLM98]<br />
uses a first order logic <strong>for</strong>mula to map a subset of the <strong>in</strong>put program variables onto a<br />
set of <strong>in</strong>itial states. This allows one to specify any <strong>in</strong>itial state of the program. So, the
Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />
779<br />
slice calculated by add<strong>in</strong>g such a condition on the <strong>in</strong>put variable to the slic<strong>in</strong>g criterion<br />
is called conditioned slice.<br />
Reference [HDS95] proposes the use of program trans<strong>for</strong>mation techniques to construct<br />
a syntactically unrestricted slice, and so to improve the simplification power of<br />
slic<strong>in</strong>g. In the same direction, [HD97] <strong>in</strong>troduces the concept of amorphous slic<strong>in</strong>g,<br />
which relaxes syntactic constra<strong>in</strong>ts traditional <strong>in</strong> slic<strong>in</strong>g, thus generat<strong>in</strong>g smaller slices.<br />
In this way, the applicability of slic<strong>in</strong>g to program comprehension is improved.<br />
The repertoire of <strong>for</strong>mal techniques <strong>for</strong> reverse eng<strong>in</strong>eer<strong>in</strong>g further <strong>in</strong>cludes “type<br />
<strong>in</strong>ference” [vDM99] and concept/cluster analysis [vDK99]. These are applicable ma<strong>in</strong>ly<br />
to the detection of objects and can also be comb<strong>in</strong>ed with techniques proposed <strong>in</strong> this<br />
paper. Likewise, semi-<strong>for</strong>mal techniques [Vil01a] <strong>for</strong> the detection of recurrent algorithmic<br />
structures can also contribute to the identification — and later to the <strong>for</strong>malization<br />
— of program patterns.<br />
15 Future work<br />
The technique <strong>for</strong> code reversal reported <strong>in</strong> this paper is work <strong>in</strong> progress and some of<br />
its problems are still open. At the heart of these we place the conjecture of section 7. As<br />
anticipated <strong>in</strong> section 7 and discussed <strong>in</strong> [VO01], it needs further attention <strong>in</strong> presence<br />
of non-term<strong>in</strong>ation, <strong>for</strong>c<strong>in</strong>g equality to give place to <strong>in</strong> (4) and thus a move towards<br />
the more general relational algebra of programm<strong>in</strong>g [Bac00, BdM97].<br />
A <strong>for</strong>thcom<strong>in</strong>g master thesis [Vil01b] is expected to present several exercises such<br />
as word count which will let us to know more about which laws of programm<strong>in</strong>g are<br />
relevant <strong>in</strong> this context and to improve the <strong>in</strong>terplay between the code slic<strong>in</strong>g and the<br />
algebra of programm<strong>in</strong>g techniques. Even with<strong>in</strong> the functional <strong>in</strong>terpretation, some<br />
program structures will require laws more powerful than those which we have been<br />
th<strong>in</strong>k<strong>in</strong>g of — <strong>for</strong> <strong>in</strong>stance, comonadic calculations [Par00].<br />
Acknowledgements<br />
The research described <strong>in</strong> this paper was carried out at the ALGORITMI R&D Center<br />
and was funded by the Portuguese Science and Technology Foundation (grant P060-<br />
P31B-09/97). The author wishes to thank Gustavo Villavicencio <strong>for</strong> his contribution to<br />
this paper, and Peter G. Larsen and Andreas Kerschbaumer <strong>for</strong> their comments. The use<br />
of <strong>VDM</strong>Tools R under a Free Academic Site License provided by IFAD is gratefully<br />
acknowledged.<br />
References<br />
[Bac78]<br />
J. Backus. Can programm<strong>in</strong>g be liberated from the von Neumann style? a functional<br />
style and its algebra of programs. CACM, 21(8):613–639, August 1978.
780 Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />
[Bac88] R. Backhouse. An exploration of the Bird-Meertens Formalism, July 1988. Department<br />
of Comput<strong>in</strong>g Science, Gron<strong>in</strong>gen University, The Netherlands.<br />
[Bac00] R. C. Backhouse. Fixed po<strong>in</strong>t calculus, 2000. Summer School and Workshop on Algebraic<br />
and Coalgebraic Methods <strong>in</strong> the Mathematics of Program Construction, L<strong>in</strong>coln<br />
College, Ox<strong>for</strong>d, UK 10th to 14th April 2000.<br />
[BdM97] R. Bird and O. de Moor. Algebra of Programm<strong>in</strong>g. Series <strong>in</strong> Computer Science.<br />
Prentice-Hall International, 1997. C. A. R. Hoare, series editor.<br />
[BH93] R. C. Backhouse and P. F. Hoogendijk. Elements of a relational theory of datatypes.<br />
In S.A. Schuman, B. Möller, and H.A. Partsch, editors, Formal Program Development,<br />
number 755 <strong>in</strong> Lecture Notes <strong>in</strong> Computer Science, pages 7–42. Spr<strong>in</strong>ger, 1993.<br />
[CI90] E.J. Chikofsky and J. H. Cross II. Reverse eng<strong>in</strong>eer<strong>in</strong>g and design recovery: a taxonomy.<br />
IEEE Software, 7(1):13–17, 1990.<br />
[CLM95] Aniello Cimitile, Andrea De Lucia, and Malcolm C. Munro. Qualify<strong>in</strong>g reusable<br />
functions us<strong>in</strong>g symbolic execution. In Work<strong>in</strong>g Conference on Reverse Eng<strong>in</strong>eer<strong>in</strong>g,<br />
Toronto, Canada, pages 178–187, 1995.<br />
[CLM96] Aniello Cimitile, Andrea De Lucia, and Malcolm C. Munro. A specification driven<br />
slic<strong>in</strong>g process <strong>for</strong> identify<strong>in</strong>g reusable functions. Journal of Software Ma<strong>in</strong>tenance:<br />
Research and Practice, 8(3):145–178, 1996.<br />
[CLM98] Aniello Cimitile, Andrea De Lucia, and Malcolm C. Munro. Conditioned program<br />
slic<strong>in</strong>g. In<strong>for</strong>mation and Software Technology (special issue on program slic<strong>in</strong>g),<br />
40(11/12):595–607, 1998.<br />
[FL98] J. Fitzgerald and P.G. Larsen. Modell<strong>in</strong>g Systems: Practical Tools and Techniques <strong>for</strong><br />
Software Development . Cambridge University Press, 1st edition, 1998.<br />
[GC96] G.C. Gannod and B.H.C. Cheng. Us<strong>in</strong>g <strong>in</strong><strong>for</strong>mal and <strong>for</strong>mal techniques <strong>for</strong> the reverse<br />
eng<strong>in</strong>eer<strong>in</strong>g of C programs. Technical report, DCS, Michigan State University, 1996.<br />
[GC99] G.C. Gannod and B.H.C. Cheng. A <strong>for</strong>mal apprach <strong>for</strong> reverse eng<strong>in</strong>eer<strong>in</strong>g: a case<br />
study. In WCRE’99, 1999.<br />
[GL91] K. B. Gallagher and J. R. Lyle. Us<strong>in</strong>g program slic<strong>in</strong>g <strong>in</strong> software ma<strong>in</strong>tenance. IEEE<br />
Transactions on Software Eng<strong>in</strong>eer<strong>in</strong>g, 17(8):751–761, 1991.<br />
[HD97] M. Harman and S. Danicic. Amorphous program slic<strong>in</strong>g. In Proceed<strong>in</strong>gs of the 5th<br />
IEEE Internation Workshop on Program Comprehesion (IWPC’97) (Dearborn, Michigan,<br />
USA, May 1997), pages 70–79. IEEE Computer Society Press, Los Alamitos, Cali<strong>for</strong>nia,<br />
USA, 1997.<br />
[HDS95] Mark Harman, Sebastian Danicic, and Yoga Sivagurunathan. Program comprehension<br />
assisted by slic<strong>in</strong>g and trans<strong>for</strong>mation, July 1995. In Malcolm Munro, editor, 1st<br />
Durham Workshop on Program Comprehension, Durham University, UK.<br />
[Jon86] C. B. Jones. Systematic Software Development Us<strong>in</strong>g <strong>VDM</strong>. Series <strong>in</strong> Computer Science.<br />
Prentice-Hall International, 1986. C. A. R. Hoare.<br />
[KR78] B.W. Kernighan and D.M. Richtie. The C Programm<strong>in</strong>g Language. Prentice Hall,<br />
Englewood Cliffs, N.J., 1978.<br />
[Kre88] E. Kreyszig. Advanced Eng<strong>in</strong>eer<strong>in</strong>g Mathematics. John Wiley & Sons, Inc., 6th edition,<br />
1988.<br />
[MA86] E. G. Manes and M. A. Arbib. Algebraic Approaches to Program Semantics. Texts<br />
and Monographs <strong>in</strong> Computer Science. Spr<strong>in</strong>ger-Verlag, 1986. D. Gries, series editor.<br />
[Mee92] L. Meertens. Paramorphisms. Formal Aspects of Comput<strong>in</strong>g, 4:413–424, 1992.<br />
[NSO99] F. L. Neves, J. C. Silva, and J. N. Oliveira. Convert<strong>in</strong>g In<strong>for</strong>mal Meta-data to <strong>VDM</strong>-<br />
SL: A Reverse Calculation Approach . In <strong>VDM</strong> <strong>in</strong> Practice! A Workshop co-located<br />
with FM’99: The World Congress on Formal Methods, Toulouse, France, 20-21 September,<br />
September 1999.<br />
[Par00] A. Par<strong>do</strong>. Towards merg<strong>in</strong>g recursion and comonads. In WGP’2000, Ponte de Lima,<br />
July 2000.<br />
[vDK99] Arie van Deursen and Tobias Kuipers. Identify<strong>in</strong>g objects us<strong>in</strong>g cluster and concept<br />
analysis. ICSE’99, ACM, 1999.<br />
[vDM99] Arie van Deursen and Leon Moonen. Understand<strong>in</strong>g cobol systems us<strong>in</strong>g <strong>in</strong>ferred<br />
types. 7th IWPC’99, IEEE Computer Society, 1999.
Oliveira J.N.: "Bagatelle <strong>in</strong> C <strong>arranged</strong> <strong>for</strong> <strong>VDM</strong> SoLo"<br />
781<br />
[Ven91] G. A. Venkatesh. The semantic approach to program slic<strong>in</strong>g. In Proceed<strong>in</strong>gs of the<br />
ACM SIGPLAN ’91 Conference on Programm<strong>in</strong>g Language Design and Implementation<br />
(PLDI), volume 26, pages 107–119, 1991.<br />
[Vil01a] G. Villacicencio. Program analysis <strong>for</strong> the automatic detection of programm<strong>in</strong>g plans<br />
apply<strong>in</strong>g slic<strong>in</strong>g. In 5th European Conference on Software Ma<strong>in</strong>tenance and Reeng<strong>in</strong>eer<strong>in</strong>g.<br />
IEEE Computer Society, March 2001.<br />
[Vil01b] G. Villavicencio. Formalización de una estrategia <strong>in</strong>tegral de <strong>in</strong>geniería reversa, 2001.<br />
[VO01]<br />
Master’s thesis <strong>in</strong> preparation (University of San Luis, Argent<strong>in</strong>a).<br />
G. Villavicencio and J.N. Oliveira. Formal reverse calculation supported by code slic<strong>in</strong>g,<br />
2001. To appear <strong>in</strong> WCRE 2001: 8th Work<strong>in</strong>g Conference on Reverse Eng<strong>in</strong>eer<strong>in</strong>g<br />
, 2-5 October 2001, Stuttgart, Germany.<br />
[Wei81] Mark Weiser. Program slic<strong>in</strong>g. In Fifth International Conference on Software Eng<strong>in</strong>eer<strong>in</strong>g,<br />
San Diego, Cali<strong>for</strong>nia, March 1981.