Apress.Expert.Oracle.Database.Architecture.9i.and.10g.Programming.Techniques.and.Solutions.Sep.2005


CHAPTER 11 ■ INDEXES 439

an 8KB blocksize, we will find about 100 rows per block. That means the table has approximately 1,000 blocks. From here, the math is very easy. We are going to read 20,000 rows via the index; this will mean, quite likely, 20,000 TABLE ACCESS BY ROWID operations. We will process 20,000 table blocks to execute this query. There are only about 1,000 blocks in the entire table, however! We would end up reading and processing each block in the table on average 20 times. Even if we increased the size of the row by an order of magnitude to 800 bytes per row, and 10 rows per block, we would have only 10,000 blocks in the table. Index access for 20,000 rows would still cause us to read each block on average two times. In this case, a full table scan will be much more efficient than using an index, as it has to touch each block only once. Any query that used this index to access the data would not be very efficient until it accessed on average less than 5 percent of the data for the 800-byte rows (at which point we access about 5,000 blocks), and even less for the 80-byte rows (about 0.5 percent or less).

Physical Organization

How the data is organized physically on disk deeply impacts these calculations, as it materially affects how expensive (or inexpensive) index access will be. Suppose you have a table where the rows have a primary key populated by a sequence. As data is added to the table, rows with sequential sequence numbers might be in general “next” to each other.

■Note The use of features such as ASSM or multiple freelists/freelist groups will affect how the data is organized on disk. Those features tend to spread the data out, and this natural clustering by primary key may not be observed.

The table is naturally clustered in order by the primary key (since the data is added in more or less that order).
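As a rough way to see how well a table's rows are clustered with respect to an index, one can compare the index's clustering factor with the table's block and row counts in the data dictionary. The query below is a minimal sketch, not taken from the text: it assumes a table named T (the name is only illustrative) in the current schema whose optimizer statistics have already been gathered. CLUSTERING_FACTOR, BLOCKS, and NUM_ROWS are standard columns of USER_INDEXES and USER_TABLES.

```sql
-- Assumes a table named T (hypothetical) owned by the current user,
-- with statistics already gathered (for example via dbms_stats.gather_table_stats).
select i.index_name,
       i.clustering_factor,  -- how ordered the table rows are relative to the index keys
       t.blocks,             -- used data blocks in the table
       t.num_rows            -- rows in the table, as of the last statistics gathering
  from user_indexes i, user_tables t
 where t.table_name = i.table_name
   and i.table_name = 'T';
```

A clustering factor close to BLOCKS suggests the rows are stored in roughly index-key order. A value close to NUM_ROWS suggests they are scattered, so a range scan will tend to visit a different block for almost every row.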
The clustering will not be strictly in key order, of course (we would have to use an index-organized table, or IOT, to achieve that), but in general rows with primary keys that are close in value will be “close” together in physical proximity. Now when you issue the query

    select * from T where primary_key between :x and :y

the rows you want are typically located on the same blocks. In this case, an index range scan may be useful even if it accesses a large percentage of rows, simply because the database blocks that we need to read and reread will most likely be cached, since the data is co-located. On the other hand, if the rows are not co-located, using that same index may be disastrous for performance. A small demonstration will drive this fact home. We’ll start with a table that is pretty much ordered by its primary key:

    ops$tkyte@ORA10G> create table colocated ( x int, y varchar2(80) );
    Table created.

    ops$tkyte@ORA10G> begin
        for i in 1 .. 100000
        loop
            insert into colocated(x,y)
            values (i, rpad(dbms_random.random,75,'*') );
        end loop;
    end;
    /
    PL/SQL procedure successfully completed.

    ops$tkyte@ORA10G> alter table colocated
        add constraint colocated_pk
        primary key(x);
    Table altered.

    ops$tkyte@ORA10G> begin
        dbms_stats.gather_table_stats( user, 'COLOCATED', cascade=>true );
    end;
    /
    PL/SQL procedure successfully completed.

This table fits the description we laid out earlier, with about 100 rows per block in a database with an 8KB block size. In this table, there is a very good chance that the rows with X=1, 2, 3 are on the same block. Now, we’ll take this table and purposely “disorganize” it. In the COLOCATED table, we created the Y column with a leading random number, and we’ll use that fact to “disorganize” the data so that it will definitely not be ordered by primary key anymore:

    ops$tkyte@ORA10G> create table disorganized
        as
        select x,y
          from colocated
         order by y;
    Table created.

    ops$tkyte@ORA10G> alter table disorganized
        add constraint disorganized_pk
        primary key (x);
    Table altered.

    ops$tkyte@ORA10G> begin
        dbms_stats.gather_table_stats( user, 'DISORGANIZED', cascade=>true );
    end;
    /
    PL/SQL procedure successfully completed.

Arguably, these are the same tables: it is a relational database, so physical organization has no bearing on the answers returned (at least that’s what they teach in theoretical database courses). In fact, the performance characteristics of these two tables are as different as night and day, while the answers returned are identical. Given the same exact question, using the same exact query plans, and reviewing the TKPROF (SQL trace) output, we see the following:

