
Apress.Expert.Oracle.Database.Architecture.9i.and.10g.Programming.Techniques.and.Solutions.Sep.2005


CHAPTER 10 ■ DATABASE TABLES

What is the point of an IOT? You might ask the converse, actually: what is the point of a heap organized table? Since all tables in a relational database are supposed to have a primary key anyway, isn’t a heap organized table just a waste of space? We have to make room for both the table and the index on the primary key of the table when using a heap organized table. With an IOT, the space overhead of the primary key index is removed, as the index is the data and the data is the index. The fact is that an index is a complex data structure that requires a lot of work to manage and maintain, and the maintenance requirements increase as the width of the row to store increases. A heap, on the other hand, is trivial to manage by comparison. There are efficiencies in a heap organized table over an IOT. That said, IOTs have some definite advantages over their heap counterparts. For example, I remember once building an inverted list index on some textual data (this predated the introduction of interMedia and related technologies). I had a table full of documents, and I would parse the documents and find words within them. I had a table that then looked like this:

create table keywords
( word varchar2(50),
  position int,
  doc_id int,
  primary key(word,position,doc_id)
);

Here I had a table that consisted solely of columns of the primary key. I had over 100 percent overhead; the size of my table and primary key index were comparable (actually, the primary key index was larger since it physically stored the rowid of the row it pointed to, whereas a rowid is not stored in the table; it is inferred). I only used this table with a WHERE clause on the WORD or WORD and POSITION columns. That is, I never used the table; I used only the index on the table. The table itself was no more than overhead. I wanted to find all documents containing a given word (or “near” another word, and so on). The heap table was useless, and it just slowed down the application during maintenance of the KEYWORDS table and doubled the storage requirements. This is a perfect application for an IOT.
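As a minimal sketch of what that could look like, the same KEYWORDS definition can be turned into an IOT by adding the ORGANIZATION INDEX clause, so the rows live in the primary key structure and no separate heap segment is created. The bind variable :w and the data dictionary check are illustrative assumptions added here, not part of the original example.

```sql
-- Same columns as the heap version; ORGANIZATION INDEX stores the rows
-- in the primary key index itself instead of in a separate heap segment.
create table keywords
( word     varchar2(50),
  position int,
  doc_id   int,
  primary key(word, position, doc_id)
)
organization index;

-- Typical access path: a lookup on the leading column(s) of the key.
-- :w is a hypothetical bind variable holding the word to search for.
select doc_id, position
  from keywords
 where word = :w;

-- Optional check: IOT_TYPE is non-null for index organized tables.
select table_name, iot_type
  from user_tables
 where table_name = 'KEYWORDS';
```

Because every column is part of the primary key, nothing is stored twice; the queries on WORD, or on WORD and POSITION, read the same structure they would have read as an index on the heap table.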

Another implementation that begs for an IOT is a code lookup table. Here you might have a ZIP_CODE to STATE lookup, for example. You can now do away with the heap table and just use the IOT itself. Anytime you have a table that you access via its primary key exclusively, it is a candidate for an IOT.
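A lookup table of this kind might be sketched as follows. The table name ZIP_LOOKUP, the column sizes, and the bind variable :zip are hypothetical choices for illustration only.

```sql
-- Hypothetical ZIP_CODE to STATE lookup stored as an IOT:
-- the primary key index is the table, so there is no heap segment.
create table zip_lookup
( zip_code varchar2(5) primary key,
  state    varchar2(2)
)
organization index;

-- The only access path is by primary key.
-- :zip is a hypothetical bind variable.
select state
  from zip_lookup
 where zip_code = :zip;
```

With a heap table, this lookup would read the primary key index to find a rowid and then visit the table block; with the IOT, the STATE value is found in the index structure itself.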

When you want to enforce co-location of data or you want data to be physically stored in a specific order, the IOT is the structure for you. For users of Sybase and SQL Server, this is where you would have used a clustered index, but IOTs go one better. A clustered index in those databases may have up to a 110 percent overhead (similar to the previous KEYWORDS table example). Here, we have a 0 percent overhead since the data is stored only once. A classic example of when you might want this physically co-located data would be in a parent/child relationship. Let’s say the EMP table had a child table containing addresses. You might have a home address entered into the system when the employee is initially sent an offer letter for a job, and later he adds his work address. Over time, he moves and changes the home address to a previous address and adds a new home address. Then he has a school address he added when he went back for a degree, and so on. That is, the employee has three or four (or more) detail records, but these details arrive randomly over time. In a normal heap-based table, they just go “anywhere.” The odds that two or more of the address records would be on the same database block in the heap table are very near zero. However, when you query an employee’s
