II. Notes on Data Structuring * - Cornell University
C. A. R. HOARE
An alternative method of representing multidimensional arrays is sometimes known as a codeword or descriptor method, but we shall give it the title of "tree representation". The essence of the method is to allocate a single-dimensional base array with one element corresponding to each row of the array, and to place in it the address of a block of consecutive storage locations which holds the values of that row. These rows do not have to be contiguous with one another. Accessing or updating an element then no longer requires computing a minimal representation of the subscript (for a two-dimensional array stored by rows, a single offset obtained by multiplying the row number by the row length and adding the column number). All that is necessary is to add the row number to the address of the first element of the base of the tree, which gives access to the address of the first element of the required row; the value of the next subscript is then added to this to give the address of the required element.
FIG. 4. Representation of two-dimensional arrays: (a) standard, (b) tree, each shown for an array with elements A[0,0] to A[3,2].
The choice between unpacked and packed representations of arrays is made on grounds similar to the choice in the case of a Cartesian product. The unpacked representation is used when fast access and updating are required; it is also the obviously appropriate choice when the range type naturally fits within computer word boundaries, for example if the elements are floating-point numbers. The packed representation is recommended if the elements are considerably shorter than a single word and either storage is short or copying and comparison of whole arrays are frequent compared with subscripting and selective updating.
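The standard and tree representations of Fig. 4 can be contrasted in a short sketch. It is written in Python, with a flat list standing in for main store and list indices playing the role of machine addresses. The function names, the 3 by 3 array and the particular row addresses are illustrative assumptions, not part of the original notes. The standard address needs a multiplication, while the tree address needs only additions and one extra fetch from the base array.

```python
from typing import List


def standard_address(base: int, cols: int, i: int, j: int) -> int:
    """Standard representation: rows stored one after another from `base`.

    Computing the address of A[i, j] needs a multiplication (i * cols).
    """
    return base + i * cols + j


def tree_address(store: List[int], base: int, i: int, j: int) -> int:
    """Tree representation: store[base + i] holds the address of row i.

    Only additions and one extra fetch are needed; no multiplication.
    """
    row_start = store[base + i]  # fetch the address of the required row
    return row_start + j         # add the next subscript


def build_tree(store: List[int], base: int, row_starts: List[int]) -> None:
    """Write the (possibly non-contiguous) row addresses into the base array."""
    for i, start in enumerate(row_starts):
        store[base + i] = start


# Simulated main store (an assumption): list indices act as addresses.
store = [0] * 40
ROWS, COLS = 3, 3

# Standard layout: one block of ROWS * COLS words starting at address 0.
STD_BASE = 0
for i in range(ROWS):
    for j in range(COLS):
        store[standard_address(STD_BASE, COLS, i, j)] = 10 * i + j

# Tree layout: base array at address 10; rows scattered at 30, 14 and 22.
TREE_BASE = 10
build_tree(store, TREE_BASE, [30, 14, 22])
for i in range(ROWS):
    for j in range(COLS):
        store[tree_address(store, TREE_BASE, i, j)] = 10 * i + j
```

Reading A[2, 1] through either layout gives 21 here. The base array costs one extra word per row, which is the storage traded for avoiding the multiplication.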
A particularly common case of packed arrays is the representation of identifiers in a programming language, where it is acceptable in the interests of efficiency to truncate identifiers which are too long to fit into the standard array, and pad out those that are too short with blanks.
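A minimal sketch of such a packed identifier, in Python. For illustration only, it assumes a standard length of eight characters and eight bits per character, so that one identifier fits into a single 64-bit word (modelled here by a Python integer). The constants and function names are hypothetical. Once packed, two identifiers can be compared with a single integer comparison instead of a character-by-character loop.

```python
IDENT_LENGTH = 8   # assumed fixed length of the identifier array
BITS_PER_CHAR = 8  # assumed character size: each character occupies one byte


def standardise(name: str) -> str:
    """Truncate identifiers that are too long; pad short ones with blanks."""
    return name[:IDENT_LENGTH].ljust(IDENT_LENGTH)


def pack(name: str) -> int:
    """Pack the standardised identifier into a single integer 'word'.

    The first character lands in the most significant position, so that
    comparing packed words orders identifiers character by character.
    """
    word = 0
    for ch in standardise(name):
        code = ord(ch)
        if code >= 1 << BITS_PER_CHAR:
            raise ValueError(f"character {ch!r} does not fit in {BITS_PER_CHAR} bits")
        word = (word << BITS_PER_CHAR) | code
    return word


def unpack(word: int) -> str:
    """Recover the standardised identifier from its packed form."""
    mask = (1 << BITS_PER_CHAR) - 1
    chars = []
    for _ in range(IDENT_LENGTH):
        chars.append(chr(word & mask))  # lowest byte holds the last character
        word >>= BITS_PER_CHAR
    return "".join(reversed(chars))
```

Because of truncation, identifier1 and identifier2 both become identifi and pack to the same word. This loss of distinction is the price the text above accepts in the interests of efficiency.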
The choice between representations of multidimensional arrays is made on quite different grounds. The standard representation is more economical of storage, and gives good efficiency when sequencing through the elements of the array by rows, by columns, or both. Furthermore, it is more convenient when arrays must be transferred as a whole between main and backing store. However, on a machine with slow multiplication it will be faster to use the tree representation and accept the extra storage required to hold the array of addresses; this overhead is small provided that the rows are not too short. If each row contains only two words, there is one address for every two words of data, a fifty per cent overhead on data storage.
There are several other possible reasons for choosing the tree representation:
(1) In some computing environments where dynamic storage allocation is standard, it may be difficult to obtain large consecutive areas; in that case a large two-dimensional array can be split up into a number of smaller rows which can be accommodated without trouble.
(2) It is possible to set up a scheme whereby some rows of the array are held on backing store while other rows are being processed; when a row is absent from main store, its backing store address replaces its main store address in the base array. The hope is to be able to process arrays which are too large to be wholly accommodated in main store together with the program that processes them. However, the economics of this operation need to be examined carefully to ensure that the number of backing store transfers involved is acceptable.
(3) In some applications, several matrices are known to share the same rows. In the tree representation it is possible to set up a single copy of such a shared row, and merely copy its address rather than its full value. In such a case, however, the shared row must not be selectively updated, since an update made through one matrix would also change every other matrix sharing that row.
(4) The tree representation is recommended even for single-dimensional arrays if the size of the individual elements is highly variable, and for multidimensional arrays if the lengths of the rows are highly variable.
Exercise
The character set of an input device includes only thirty characters, defined by enumeration; they include the characters space, newline and newpage. Characters may be read in one at a time from an input device to a buffer, using the procedure call read next character. They should be assembled line by line into an array page'display page,