
On the other hand, an index on a true/false field is not so useful, again depending on the data in the field. If the distribution is 50/50, then you return half of the records. If the distribution is 5% true and 95% false, and you are searching for true, then the index might help. The problem in a case like this is that the DBMS may incorrectly assume that the index is going to return so many records that it simply ignores the index during the query optimization phase. If that happens, then you have an index that takes up space on the disk (and has to be updated on inserts or changes) but is never used.
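
To make this concrete, here is a minimal sketch in SQL Server syntax; the table, column, and index names are hypothetical and only illustrate the point, they are not taken from this chapter. The flag column is assumed to be heavily skewed, so a search for the rare value is where the index has a chance of being used.

-- Hypothetical table; all names here are illustrative assumptions.
CREATE TABLE Subscribers (
    SubscriberID INT PRIMARY KEY,
    Email        VARCHAR(100),
    SignupDate   DATETIME,
    IsCancelled  BIT           -- true/false flag; assume roughly 5% are 1
);

-- Index on the true/false column.
CREATE INDEX IX_Subscribers_IsCancelled ON Subscribers (IsCancelled);

-- Searching for the rare value (about 5% of the rows) may use the index;
-- searching for the common value, or a 50/50 split, will likely make the
-- optimizer ignore the index and scan the table instead.
SELECT SubscriberID, Email
FROM   Subscribers
WHERE  IsCancelled = 1;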

I have read that in cases where the distribution is 50/50, it is better just to leave the column unindexed and use a table scan to find the data. I find this difficult to believe, at least on large tables, since the disk I/O to search through an index and pull out all of the pointers to the records on the disk would seem to be much less than the disk I/O to search through every record. Remember that an index entry holds just the column data plus the pointer to the record, and therefore an index tends to be very compact relative to the table itself and thus can stay in the cache longer.

If you are using this index by itself and you are pulling a 50/50 mix, then that may very well be true, since you have to perform the disk I/O anyway to retrieve all those records. However, if you are using the index in conjunction with other indexes to narrow down and pull only a small set of records off the disk, then having the index saves the work of actually looking in each data record to see whether it matches the search criteria.
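
As a rough sketch of that idea, again using the hypothetical Subscribers table from above, suppose a second index exists on the signup date. The date predicate narrows the candidate rows first, and the flag index can then be combined with it (for example, by an index intersection), so the DBMS does not have to read every data record just to check the flag.

-- Assumes the hypothetical Subscribers table and IsCancelled index above.
CREATE INDEX IX_Subscribers_SignupDate ON Subscribers (SignupDate);

-- The selective date condition cuts the candidate set down; combined with
-- the flag index, few (if any) non-matching data rows need to be read.
SELECT SubscriberID, Email
FROM   Subscribers
WHERE  SignupDate >= '2004-01-01'
  AND  IsCancelled = 1;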

Indexes in general tend to be used often, and thus they stay in the cache. In addition, the actual data pulled from using indexes tends to be cached better, since only those rows actually used get pulled into the cache. Since you aren’t pulling huge blocks of data, you aren’t inadvertently flushing the cache of useful data. Indexed reads examine only the exact pieces of the bigger data block containing the data records, reducing the amount of CPU time spent examining the data. And finally, indexed reads can scale up as the table size grows, giving stable and consistent results for data retrieval, even when the table itself grows huge.

Table Scans — What Are They?


As the name implies, a table scan is where the DBMS has to physically retrieve every record in the table to examine one or more columns to see whether the data in that column matches a join or WHERE condition. A table scan is what happens when no index exists on a column that has to be compared to something else in a query. In the absence of an index, the DBMS scans the table, looking at the data item in that field in every record in the table.

Data is usually organized in fixed-size blocks, with multiple grouped blocks called extents. In SQL Server, for example, the database uses 8K blocks, called pages (this is also the maximum size of a data record), with eight of these pages read or written at one time by the database, resulting in 64K of data being read or written at once. See Figure 13-5 for an illustration of a table scan.
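
As a final hedged sketch, still using the hypothetical Subscribers table, a query that filters on a column with no index generally forces exactly this kind of scan; in SQL Server, the execution plan will show a Table Scan (or a Clustered Index Scan, which likewise reads every row) for it.

-- No index exists on Email, so the DBMS must look at every record.
SELECT SubscriberID
FROM   Subscribers
WHERE  Email = 'someone@example.com';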
