May June 1980 - Commodore Computers


42 COMPUTE. MAY/JUNE, 1980. ISSUE 4<br />

BIG FILES ON A<br />

SMALL COMPUTER<br />

Elizabeth Deal<br />

337 W. First Ave.<br />

Malvern, Pa. 19355<br />

The program described here demonstrates a way of<br />

reducing data storage requirements by a factor of<br />

eight. It is written in Microsoft Basic for a PET<br />

computer.<br />

I have seen several programs that create and<br />

use cross-index files for library search, statistical<br />

surveys and similar applications. They usually require<br />

large computers, such as a 48K system with two<br />

disk drives. A very thorough file handling system<br />

has been described recently by Dr. Sanger in the<br />

November, 1979, issue of Microcomputing. In his<br />

article each attribute is coded as two letters and six<br />

attributes are permitted for each record. This requires<br />

twelve letters and, therefore, twelve bytes.<br />

In the method described here, each attribute is<br />

coded as yes or no and the user can have as many<br />

attributes as he desires. If the application lends itself<br />

to such coding into a list of keys or attributes,<br />

then this system will permit the handling of large<br />

amounts of data in core at one time. It also permits<br />

the use of logical AND, OR or NOT operators in<br />

retrieval with any combination of attributes.<br />

By way of illustration, a library search requires<br />

quick access to those entries that contain desired<br />

subject matter. Two, three, or six byte coding of<br />

each key is very core consuming, and limits the<br />

number of records that can be in core at one time.<br />

The solution I propose is twofold: (1) set up a<br />

smart coding procedure for classification of subjects<br />

described in an article into keys that can be scored<br />

yes or no, and (2) "pack" the data for storing it<br />

in core, on tape or on disk, and then "unpack"<br />

it, one record at a time, during the search for the<br />

applicable attributes. This paper describes an efficient<br />

way to "pack" and "unpack" the data so that a<br />

larger file can be searched on a small computer<br />

without the use of accessory memory devices, such as<br />

disks. Of course, if one has a system with a disk<br />

the method described here would permit use of an even<br />

larger file. We are aware that the first part of the<br />

solution (setting up the coding procedure) is challenging.<br />

It is the real problem and the performance of<br />

the system depends on how logical and meaningful<br />

the selected keys are.<br />

Each logical record consists of the text part and<br />

the data part. The text part must be adequate for<br />

positive identification of the articles being searched,<br />

but the length should be kept to a minimum. Name,<br />

date, and page might be enough. The data part is<br />

what we can compress. The yes-no or 1-0 codes<br />

are entered in groups of fifteen ones and zeros.<br />

These, in turn, are packed into the two-byte integer<br />

variable S%.<br />

Fifteen attributes require two bytes, thirty attributes<br />

require four bytes, and so on. A user of the<br />

system need not concern himself with what the<br />

program does with binary numbers. He only needs to<br />

know that there will be as many S% values per<br />

record as there are groups of fifteen keys. The user<br />

then needs to provide a decision for retrieving<br />

the records of interest to him. The decision is<br />

written as a statement at the beginning of a program<br />

and is immediately edited for syntax-type errors.<br />

Logical operators AND, OR, NOT, as well as arithmetic<br />

ones (=, <>, <, >) are used. The<br />

decision can be written on one or more lines leading<br />

to a combining variable TR. TR is set to one if true,<br />

and all records meeting the TR condition are then<br />

displayed. Complete instructions for writing TR lines<br />

are listed in lines 2970 to 3420.<br />
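The idea of a decision line can be illustrated with a short sketch. It is written in Python rather than the PET Basic of the listing, and the record texts, the helper `key`, and the particular decision (key 1 AND key 5 AND NOT key 2) are all hypothetical, chosen only to show how a TR value of one selects a record.

```python
# Hypothetical records: a short text part plus fifteen unpacked 1/0 key values.
records = [
    ("Article A, Nov 1979, p. 12", [1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]),
    ("Article B, Jan 1980, p. 40", [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]),
    ("Article C, Mar 1980, p. 7", [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
]


def key(values, k):
    """Return key k (1 to 15, counted left to right, as the user sees it)."""
    return values[k - 1]


def decision(values):
    """Example decision: key 1 AND key 5 AND NOT key 2. Returns TR as 1 or 0."""
    tr = key(values, 1) == 1 and key(values, 5) == 1 and not key(values, 2) == 1
    return 1 if tr else 0


# Display every record whose TR comes out as one.
selected = [text for text, values in records if decision(values) == 1]
print(selected)
```

Only the first hypothetical record satisfies this decision; changing the body of `decision` is the equivalent of rewriting the TR lines.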

How is it done? For once those long tables of<br />

powers of two, that are a part of every book on<br />

programming, come in handy. The program is set up<br />

in such a way that the user thinks of the list of<br />

fifteen keys from left to right, 1 to 15. The program<br />

sees them as being numbered from right to left, 0 to 14.<br />

Like this:<br />

- Key numbers B%(k), k = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15<br />

- Program sees as m = 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0<br />

- Input key values = 1 0 0 0 1 1 1 1 1 1 0 0 0 0 0<br />

The program now takes key values and wherever it<br />

finds a "1" it raises 2 to the m-th power. The sum of<br />

all this is then stored in integer variable S%(record<br />

number, sum number). Two bytes are used instead<br />

of at least 15. During the process of retrieval the<br />

opposite procedure takes place - the sum is "unpacked"<br />

into working storage of 15 values. The same<br />

values are, of course, reused by all records. The lines<br />

of the program that drive this system are 1470 to<br />

1510 for packing and 1920 to 1990 the other way. It seems like<br />

a lot of hassle, but the core saving is tremendous.<br />
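The packing and unpacking described above can be sketched as follows. This is a minimal illustration in Python, not the author's Basic lines 1470 to 1510 and 1920 to 1990; the function names `pack15` and `unpack15` are assumptions made for the example.

```python
def pack15(bits):
    """Pack fifteen 1/0 key values (key 1 first) into one integer sum."""
    if len(bits) != 15:
        raise ValueError("expected exactly fifteen key values")
    total = 0
    for k, bit in enumerate(bits, start=1):
        m = 15 - k  # key 1 maps to m = 14, key 15 maps to m = 0
        if bit == 1:
            total += 2 ** m
    return total


def unpack15(total):
    """Recover the fifteen key values (key 1 first) from a packed sum."""
    bits = []
    for m in range(14, -1, -1):
        if total >= 2 ** m:
            bits.append(1)
            total -= 2 ** m
        else:
            bits.append(0)
    return bits


example = [1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
packed = pack15(example)
print(packed)  # 18400
print(unpack15(packed) == example)  # True
```

With all fifteen keys set the sum is 32767, the largest value a two-byte signed integer can hold, which is presumably why the keys are grouped by fifteen rather than sixteen.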

The loops that do the packing and unpacking take<br />

from 0.2 second to 0.9 second, the latter representing<br />

all fifteen bits on. (These times could be reduced by<br />

rewriting these two loops as machine code subroutines.)<br />

Another way to save time would be to set up<br />

the most frequently used keys next to one another as<br />

this will leave the loop sooner. In the example shown<br />

above, the program will loop ten times. Had a "1"<br />

been in position 4 or 11 the loop would be executed<br />

five times.<br />
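One plausible reading of "leaving the loop sooner" is an unpacking loop that stops as soon as the remaining sum reaches zero. The Python sketch below works under that assumption; it is not taken from the listing, and `unpack15_early_exit` is a hypothetical name. For the example shown above it makes ten passes, which agrees with the count given in the text.

```python
def unpack15_early_exit(total):
    """Unpack a sum into fifteen key values, stopping once nothing remains.

    Returns the key values (key 1 first) and the number of loop passes made.
    """
    bits = [0] * 15
    passes = 0
    for m in range(14, -1, -1):
        if total == 0:
            break  # no "1" keys remain, so the rest stay at zero
        passes += 1
        if total >= 2 ** m:
            bits[14 - m] = 1  # m = 14 is key 1, stored at index 0
            total -= 2 ** m
    return bits, passes


# 18400 is the packed sum of the example 1 0 0 0 1 1 1 1 1 1 0 0 0 0 0.
bits, passes = unpack15_early_exit(18400)
print(bits)
print(passes)  # 10
```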

The program now has two sections. One packs<br />

the data, the other unpacks it. In between, the<br />

values should be stored on tape. And at the beginning,<br />

routines for creating and updating files should be<br />

provided. As listed, the program works as if it were a<br />

file system. It can be used as a training ground in<br />

writing decision lines. It should be used as a part<br />

of a larger system.
