May June 1980 - Commodore Computers
42 COMPUTE. MAY/JUNE, 1980. ISSUE 4<br />
BIG FILES ON A<br />
SMALL COMPUTER<br />
Elizabeth Deal<br />
337 W. First Ave.<br />
Malvern, Pa. 19355<br />
The program described here demonstrates a way of<br />
reducing data storage requirements by a factor of<br />
eight. It is written in Microsoft Basic for a PET<br />
computer.<br />
I have seen several programs that create and<br />
use cross-index files for library search, statistical<br />
surveys and similar applications. They usually require<br />
large computers, such as a 48K system with two<br />
disk drives. A very thorough file handling system<br />
has been described recently by Dr. Sanger in the<br />
November, 1979, issue of Microcomputing. In his<br />
article each attribute is coded as two letters and six<br />
attributes are permitted for each record. This requires<br />
twelve letters and, therefore, twelve bytes.<br />
In the method described here, each attribute is<br />
coded as yes or no and the user can have as many<br />
attributes as he desires. If the application lends itself<br />
to such coding into a list of keys or attributes,<br />
then this system will permit the handling of large<br />
amounts of data in core at one time. It also permits<br />
the use of logical AND, OR or NOT operators in<br />
retrieval with any combination of attributes.<br />
By way of illustration, a library search requires<br />
quick access to those entries that contain desired<br />
subject matter. Two, three, or six byte coding of<br />
each key is very core consuming, and limits the<br />
number of records that can be in core at one time.<br />
The solution I propose is twofold: (1) set up a<br />
smart coding procedure for classification of subjects<br />
described in an article into keys that can be scored<br />
yes or no, and (2) "pack" the data for storing it<br />
in core, on tape or on disk, and then "unpack"<br />
it, one record at a time, during the search for the<br />
applicable attributes. This paper describes an efficient<br />
way to "pack" and "unpack" the data so that a<br />
larger file can be searched on a small computer<br />
without the use of accessory memory devices, such as<br />
disks. Of course, if one has a system with a disk<br />
the method described here would permit use of an even<br />
larger file. We are aware that the first part of the<br />
solution (setting up the coding procedure) is challenging.<br />
It is the real problem and the performance of<br />
the system depends on how logical and meaningful<br />
the selected keys are.<br />
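
As a rough illustration of step (1), the sketch below scores one article against a fixed list of keys and produces the string of ones and zeros that would be typed in. It is written in Python rather than the PET's Microsoft Basic, and the key names, the `KEYS` list and the `score_article` helper are all hypothetical; the article does not give a key list.

```python
# Hypothetical list of fifteen yes/no keys; not taken from the article.
KEYS = [
    "hardware", "software", "basic", "machine code", "graphics",
    "sound", "tape", "disk", "printer", "games",
    "business", "education", "reviews", "utilities", "math",
]


def score_article(subjects):
    """Return a 15-character string of 1s and 0s, key 1 first (leftmost)."""
    wanted = {s.lower() for s in subjects}
    unknown = wanted - set(KEYS)
    if unknown:
        # A subject outside the key list cannot be scored yes or no.
        raise ValueError(f"not in the key list: {sorted(unknown)}")
    return "".join("1" if key in wanted else "0" for key in KEYS)


print(score_article(["BASIC", "tape", "utilities"]))  # 001000100000010
```

Rejecting subjects that are not in the list is a design choice made here, on the view that a key list is only useful if every record is scored against the same keys.
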
Each logical record consists of the text part and<br />
the data part. The text part must be adequate for<br />
positive identification of the articles being searched,<br />
but the length should be kept to a minimum. Name,<br />
date, and page might be enough. The data part is<br />
what we can compress. The yes-no or 1-0 codes<br />
are entered in groups of fifteen ones and zeros.<br />
These, in turn, are packed into the two byte integer<br />
variable S%.<br />
Fifteen attributes require two bytes, thirty attributes<br />
require four bytes, and so on. A user of the<br />
system need not concern himself with what the<br />
program does with binary numbers. He only needs to<br />
know that there will be as many S% values per<br />
record as there are groups of fifteen keys. The user<br />
then needs to provide a decision for retrieving<br />
the records of interest to him. The decision is<br />
written as a statement at the beginning of a program<br />
and is immediately edited for syntax-type errors.<br />
Logical operators AND, OR, NOT, as well as arithmetic<br />
ones (=, <>, <, >) are used. The<br />
decision can be written on one or more lines leading<br />
to a combining variable TR. TR is set to one if true,<br />
and all records meeting TR condition are then<br />
displayed. Complete instructions for writing TR lines<br />
are listed in lines 2970 to 3420.<br />
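
The instructions for writing TR lines (lines 2970 to 3420) are not reproduced here, so the following is only a sketch of the idea, in Python rather than Microsoft Basic. It assumes the fifteen key values of a record are available in a list indexed from 1, as B%(k) is in the article. The particular decision, the sample records, and the names `decision` and `select` are made up for illustration.

```python
def decision(b):
    """Example decision: key 1 AND (key 5 OR key 12) AND NOT key 3.

    b[k] is 1 or 0 for key k; b[0] is unused padding so that the
    indices match the article's 1-to-15 key numbers.
    Returns 1 if the record qualifies, else 0, like the article's TR.
    """
    tr = b[1] == 1 and (b[5] == 1 or b[12] == 1) and not b[3] == 1
    return 1 if tr else 0


def select(records):
    """Return the text parts of the records that satisfy the decision."""
    hits = []
    for text, bits in records:
        b = [0] + [int(ch) for ch in bits]  # shift to 1-based key numbers
        if decision(b) == 1:
            hits.append(text)
    return hits


# Made-up records: (text part, fifteen key values typed left to right).
records = [
    ("Smith, May 1980, p. 12", "100011111100000"),
    ("Jones, Jan 1980, p. 40", "101010000000000"),
    ("Brown, Mar 1980, p. 7", "000010000001000"),
]
print(select(records))  # ['Smith, May 1980, p. 12']
```

Only the first record passes: the second has key 3 set, and the third lacks key 1.
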
How is it done? For once those long tables of<br />
powers of two, that are a part of every book on<br />
programming, come in handy. The program is set up<br />
in such a way that the user thinks of the list of<br />
fifteen keys from left to right, 1 to 15. The program<br />
sees them as being numbered from right to left, 0 to 14.<br />
Like this:<br />
-Key numbers B%(k) k =   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15<br />
-Program sees as   m =  14 13 12 11 10  9  8  7  6  5  4  3  2  1  0<br />
-Input key values        1  0  0  0  1  1  1  1  1  1  0  0  0  0  0<br />
The program now takes key values and wherever it<br />
finds a "1" it raises 2 to the m-th power. The sum of<br />
all this is then stored in integer variable S%(record<br />
number, sum number). Two bytes are used instead<br />
of at least 15. During the process of retrieval the<br />
opposite procedure takes place: the sum is "unpacked"<br />
into working storage of 15 values. The same<br />
values are, of course, reused by all records. The lines<br />
of the program that drive this system are 1470 to<br />
1510 for packing and 1920 to 1990 for unpacking. It seems like<br />
a lot of hassle, but the core saving is tremendous.<br />
The loops that do the packing and unpacking take<br />
from 0.2 second to 0.9 second, the latter representing<br />
all fifteen bits on. (These times could be reduced by<br />
rewriting these two loops as machine code subroutines.)<br />
Another way to save time would be to set up<br />
the most frequently used keys next to one another as<br />
this will leave the loop sooner. In the example shown<br />
above, the program will loop ten times. Had a "1"<br />
been in position 4 or 11 the loop would be executed<br />
five times.<br />
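
The packing and unpacking loops can be sketched as follows. This is Python, not the Microsoft Basic of the program's own lines, and the function names are assumptions. The unpacking loop is assumed to work down from m = 14 and to stop as soon as nothing is left of the sum. That is one design that reproduces the ten passes quoted for the example; the listing itself may do it differently.

```python
def pack(bits):
    """Pack a string of 15 ones and zeros (key 1 first) into one integer.

    Key k corresponds to m = 15 - k, so key 1 carries 2**14 and
    key 15 carries 2**0. The result lies in 0..32767.
    """
    if len(bits) != 15 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly fifteen 0/1 characters")
    total = 0
    for k, ch in enumerate(bits, start=1):
        if ch == "1":
            total += 2 ** (15 - k)
    return total


def unpack(total):
    """Return (list of 15 key values, number of loop passes used)."""
    keys = [0] * 15
    passes = 0
    for m in range(14, -1, -1):
        if total == 0:          # nothing left of the sum: leave early
            break
        passes += 1
        if total >= 2 ** m:
            keys[14 - m] = 1    # m = 14 is key 1, stored at index 0
            total -= 2 ** m
    return keys, passes


def pack_record(bits):
    """Split a longer key string into groups of 15 and pack each group."""
    padded = bits + "0" * (-len(bits) % 15)
    return [pack(padded[i:i + 15]) for i in range(0, len(padded), 15)]


s = pack("100011111100000")
keys, passes = unpack(s)
print(s, passes)                        # 18400 10
print("".join(map(str, keys)))          # 100011111100000
print(len(pack_record("1" * 30)) * 2)   # 4 (bytes for thirty keys)
```

Fifteen keys per integer keeps every sum between 0 and 32767, the non-negative range of a two-byte signed integer such as S%.
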
The program now has two sections. One packs<br />
the data, the other unpacks it. In between, the<br />
values should be stored on tape. And at the beginning,<br />
routines for creating and updating files should be<br />
provided. As listed, the program works as if it were a<br />
file system. It can be used as a training ground in<br />
writing decision lines. It should be used as a part<br />
of a larger system.<br />