II. Notes on Data Structuring * - Cornell University


cs.cornell.edu

C. A. R. HOARE

An alternative method of representation of multidimensional arrays is sometimes known as a codeword or descriptor method, but we shall give it the title of "tree representation". The essence of the method is to allocate a single-dimensional base array with one element corresponding to each row of the array, and to place in it the address of a block of consecutive storage locations which holds the values of that row. These rows do not have to be contiguous. Now the process of accessing or updating each element does not have to be done by computing a minimal representation of the subscript. All that is necessary is to add the row-number to the address of the first element of the base of the tree, and thus access the address of the first element of the required row, to which the value of the next subscript is added to give the address of the required element.

FIG. 4. Representation of two-dimensional arrays: (a) standard; (b) tree.

The choice between unpacked and packed representations of arrays is made on grounds similar to the choice in the case of a Cartesian product. The unpacked representation is used when fast access and updating is required; it is also the obviously appropriate choice when the range type naturally fits within computer word boundaries, for example if the elements are floating point numbers. The packed representation is recommended if the size of the elements is considerably shorter than a single word, and if storage is short, or if copying and comparison of the arrays is frequent compared with subscripting and selective updating.
A particularly common case of packed arrays is the representation of identifiers in a programming language, where it is acceptable in the interests of efficiency to truncate identifiers which are too long to fit into the standard array, and pad out those that are too short with blanks.
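
The identifier case can be sketched roughly as follows. The sketch assumes, purely for illustration, that an identifier occupies exactly eight characters of eight bits each, packed into one Python integer standing in for a machine word; `IDENT_LENGTH`, `CHAR_BITS`, `normalize`, `pack`, `char_at` and `unpack` are hypothetical names, not anything from the notes. Copying or comparing a packed identifier handles a single word, whereas subscripting one character costs a shift and a mask.

```python
from __future__ import annotations

# Assumed parameters (hypothetical): an identifier is held as exactly
# IDENT_LENGTH characters of CHAR_BITS bits each, packed into one integer "word".
IDENT_LENGTH = 8
CHAR_BITS = 8
CHAR_MASK = (1 << CHAR_BITS) - 1


def normalize(name: str) -> str:
    """Truncate identifiers that are too long; pad those that are too short with blanks."""
    return name[:IDENT_LENGTH].ljust(IDENT_LENGTH, " ")


def pack(name: str) -> int:
    """Packed representation: the first character goes in the most significant field."""
    word = 0
    for ch in normalize(name):
        code = ord(ch)
        if code > CHAR_MASK:
            raise ValueError(f"character {ch!r} does not fit in {CHAR_BITS} bits")
        word = (word << CHAR_BITS) | code
    return word


def char_at(word: int, k: int) -> str:
    """Subscripting a packed identifier: one shift and one mask per access."""
    shift = CHAR_BITS * (IDENT_LENGTH - 1 - k)
    return chr((word >> shift) & CHAR_MASK)


def unpack(word: int) -> str:
    """Recover the normalized (truncated or padded) identifier one character at a time."""
    return "".join(char_at(word, k) for k in range(IDENT_LENGTH))


# Comparing two packed identifiers is a single integer comparison.
same = pack("counter") == pack("counter ")
```

Truncation has a visible cost in this sketch: `pack("identifier_one")` and `pack("identifier_two")` are equal, because both are cut down to `"identifi"`.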

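The two methods of accessing a multidimensional array can be contrasted in a small sketch. It assumes, as an illustration only, that main store is modelled by a Python list of words, with list indices serving as addresses; `Store`, `make_standard`, `standard_address`, `make_tree` and `tree_address` are hypothetical names. In the standard representation, each access multiplies by the row length. In the tree representation, the multiplication is replaced by one extra fetch of the row address from the base array, which costs one additional word of storage per row.

```python
from __future__ import annotations


class Store:
    """Toy word-addressed main store (assumption: a list; addresses are indices)."""

    def __init__(self) -> None:
        self.words: list[int] = []

    def alloc(self, n: int) -> int:
        """Reserve n consecutive words and return the address of the first one."""
        start = len(self.words)
        self.words.extend([0] * n)
        return start


def make_standard(store: Store, rows: int, cols: int) -> int:
    """Standard representation: all rows * cols elements in one consecutive block, row by row."""
    return store.alloc(rows * cols)


def standard_address(base: int, cols: int, i: int, j: int) -> int:
    """Address of A[i, j]: every access needs a multiplication by the row length."""
    return base + i * cols + j


def make_tree(store: Store, row_lengths: list[int]) -> int:
    """Tree representation: a base array holding the address of each separately allocated row."""
    base = store.alloc(len(row_lengths))
    for i, length in enumerate(row_lengths):
        row = store.alloc(length)    # each row is its own block; rows need not be adjacent
        store.words[base + i] = row  # the base array records where row i starts
    return base


def tree_address(store: Store, base: int, i: int, j: int) -> int:
    """Address of A[i, j]: fetch row i's address from the base array, then add j."""
    return store.words[base + i] + j


# Build a 4 x 3 array both ways and set A[i, j] = 10 * i + j in each.
store = Store()
std = make_standard(store, 4, 3)
tree = make_tree(store, [3, 3, 3, 3])
for i in range(4):
    for j in range(3):
        store.words[standard_address(std, 3, i, j)] = 10 * i + j
        store.words[tree_address(store, tree, i, j)] = 10 * i + j

# Rows of different lengths need no change to tree_address.
ragged = make_tree(store, [1, 4, 2])
store.words[tree_address(store, ragged, 1, 3)] = 99
```

In the sketch the base array of `tree` adds four words to the twelve words of data. If each row held only two words, the same one-address-per-row cost would be a fifty per cent overhead.
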
The choice between representations of multidimensional arrays is made on quite different grounds. The standard representation is more economical of storage, and gives good efficiency on sequencing through elements of the array by rows, columns, or both. Furthermore, it is more convenient when the arrays must be transferred as a whole between main and backing store. However, on a machine with slow multiplication, it will be faster to use the tree representation, and accept the extra storage required to hold the array of addresses, which is small provided that the rows are not too short. If each row contains only two words, there would be a fifty per cent overhead on data storage.

There are several other possible reasons for choosing the tree representation:

(1) In some computing environments, where dynamic storage allocation is standard, it may be difficult to obtain large consecutive areas, in which case a large two-dimensional array can be split up into a number of smaller rows which can be accommodated without trouble.

(2) It is possible to set up a scheme whereby some rows of the array are held on backing store while other rows are being processed, and then the backing store address of a row replaces the main store address in the base array when that row is absent from store. Thus it is hoped to be able to process arrays which are too large to be wholly accommodated in main store together with the program that processes them. However, the economics of this operation need to be carefully examined to ensure that the number of backing store transfers involved is acceptable.

(3) In some applications, it is known that several matrices share the same rows. In the tree representation it is possible to set up a single copy of such a shared row, and merely take copies of its address rather than its full value. But in such a case, the shared row must not be selectively updated.

(4) The tree representation is recommended even in the case of single-dimensional arrays if the size of the individual elements is highly variable, and for multidimensional arrays if the length of the rows is highly variable.

Exercise

The character set of an input device includes only thirty characters, defined by enumeration; they include the characters space, newline, newpage. Characters may be read in one at a time from an input device to a buffer, using a procedure call read next character. They should be assembled line by line into an array page'display page,
