II. Notes on Data Structuring * - Cornell University

20.03.2013 Views
152 C.A.R. HOARE

10.1.1. Sequential Representation

The simplest representation of a sparse array type is as a sequence of entries; i.e. sparse array D of R is represented as if it had been declared

(default: R; s: sequence (key: D; information: R)).

One of the possible sequence representations must now be chosen, in accordance with the same criteria that are used in the case of a sequence. But when a sequence is used to represent a sparse array, the order of the entries is immaterial, and does not have to reflect the relative times at which the entries were made. Thus the entries are often sorted into order of their key-value, particularly if this is the order in which they are going to be scanned.

The chief disadvantage of the sequential representation is the length of time taken to access the element corresponding to a random subscript. In the case of structures of any great size, the program designer usually goes to considerable trouble to ensure that entries are accessed in the same standard order that they are stored in the sequence; and that if new entries are to be inserted, these are also sorted and then merged with the original sequence. Thus the standard commercial practice of batch processing and updating of sequential files may be regarded as a practical implementation of the abstract concept of a sparse array on the rather unsympathetic medium of magnetic tape.

10.1.2. Tabular Representation

If there is an acceptably low upper limit N to the number of entries in a sparse mapping, a great increase in speed of lookup can be achieved by the tabular representation, in which the sparse mapping sparse array D of R is represented as a nonsparse array

(default: R; occupied: powerset 0..N; array 0..N of (key: D; information: R)).

If all the significant entries are collected before they are used, the table can be sorted, and then the entry with a given key can be rapidly located by logarithmic search.
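The sorted-table case can be illustrated with a minimal Python sketch. The class name `SortedTable` and its fields are hypothetical, chosen only for this illustration; the sketch assumes that all entries are available before the first lookup and that keys can be compared with `<`.

```python
from bisect import bisect_left


class SortedTable:
    """Sparse array held as a table sorted by key (illustrative names only)."""

    def __init__(self, default, entries):
        # Sort once, after all significant entries have been collected.
        pairs = sorted(entries, key=lambda entry: entry[0])
        self.default = default
        self.keys = [key for key, _ in pairs]
        self.information = [info for _, info in pairs]

    def __getitem__(self, key):
        # Logarithmic (binary) search for the position of the key.
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.information[i]
        # Keys with no entry yield the default value of the sparse array.
        return self.default


table = SortedTable("none", [(30, "c"), (10, "a"), (20, "b")])
print(table[20], table[25])
```

Keeping the keys in their own list lets the standard-library `bisect_left` perform the search without touching the information fields.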
If access to the elements of the array is interleaved with addition of new entries, some form of hash-table technique is indicated. For this an arbitrary "hashing" function is chosen, which maps the domain type D into an integer in the range 0..N. When the entry is inserted, it is placed at this position in the table; so whenever that entry is accessed, use of the same hashing function will find it there. If that position is already occupied by an entry with a different key, some other vacant position in the table must be found. It is quite usual to search for such a vacant position in the next following locations of the table; but when the table is nearly full, this may cause undesirable bunching around an area of the table which happens to be popular. A solution to this problem is to choose N + 1 as a prime number, and to use a second hashing function to compute an arbitrary step length from any given key. The next position to try when any given position is full is obtained by adding the step length (modulo N + 1) to the previous position.

10.1.3. Indexed Representation

The tabular method of storage is suitable only when the whole table can be accommodated in the main store of the computer. In the common case when this is not possible, a mixture of the tabular and sequential methods is often used. In this a sparse array is represented as a table, each of whose entries is a sequence:

(default: R; table: array 1..N of (max: D; seq: sequence (key: D; information: R))).

Every entry is placed on that sequence i such that its key falls between table[i-1].max (or D.min if i = 1) and table[i].max. The table is sorted so that the appropriate sequence can be quickly located. This technique may be likened to the organisation of a multivolume encyclopaedia, in which the keys of the first and last entries of each volume are indicated on the spine, so that the right volume can be quickly identified, without extracting the volumes from the shelf.

When using this representation, it is desirable to ensure that all sequences are of roughly the same length. Indeed, if disc backing store is used, it is very advantageous to ensure that each of them is fitted onto a single cylinder, so that a random access will not involve more than a single head movement. Thus, when one sequence gets too long, it must exchange material with the adjacent sequence.
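The lookup path of the indexed representation can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `IndexedSparseArray` and its methods are hypothetical, the sequences are held in memory rather than on backing store, the upper bound table[i].max is taken as inclusive, and no rebalancing between sequences is attempted.

```python
from bisect import bisect_left


class IndexedSparseArray:
    """Table of (max, seq) pairs; each seq holds (key, information) entries."""

    def __init__(self, default, maxima):
        # maxima[i] plays the role of table[i].max and must be ascending.
        self.default = default
        self.maxima = list(maxima)
        self.seqs = [[] for _ in self.maxima]

    def section(self, key):
        # First i with key <= maxima[i], so maxima[i-1] < key <= maxima[i].
        i = bisect_left(self.maxima, key)
        return i if i < len(self.maxima) else None

    def store(self, key, information):
        i = self.section(key)
        if i is None:
            raise ValueError("key exceeds the largest max in the table")
        seq = self.seqs[i]
        for j, (k, _) in enumerate(seq):
            if k == key:
                seq[j] = (key, information)  # overwrite an existing entry
                return
        seq.append((key, information))

    def lookup(self, key):
        i = self.section(key)
        if i is not None:
            # Only one sequence is scanned, as with one encyclopaedia volume.
            for k, information in self.seqs[i]:
                if k == key:
                    return information
        return self.default


a = IndexedSparseArray(0, [100, 200, 300])
a.store(150, "x")
print(a.lookup(150), a.lookup(250))
```

The search over `maxima` corresponds to reading the spines of the volumes; only the selected sequence is then scanned for the key.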
Exchanging material between adjacent sequences involves extracting the entries with the largest and/or smallest keys, and is best done when all the sequences are sorted into order of key-value. The sorting and reshuffling is often carried out as a separate operation at regular intervals; and the general method of file organisation is known as "indexed sequential".

Naturally in this method of representation, it is an advantage to keep the sequences as short as possible, say less than a single track on disc. Consequently, the table itself may get so large that it will no longer fit in main store. In this case the table itself is split up into sections, and a second-level table may be set up to point to its sections, using the same principle again. Thus at least two accesses to backing store will in general be required for each access to an element of the array, and it is strongly recommended i
