Apress.Expert.Oracle.Database.Architecture.9i.and.10g.Programming.Techniques.and.Solutions.Sep.2005
CHAPTER 11 ■ INDEXES

an 8KB blocksize, we will find about 100 rows per block. That means the table has approximately 1,000 blocks. From here, the math is very easy. We are going to read 20,000 rows via the index; this will mean, quite likely, 20,000 TABLE ACCESS BY ROWID operations. We will process 20,000 table blocks to execute this query. There are only about 1,000 blocks in the entire table, however! We would end up reading and processing each block in the table on average 20 times. Even if we increased the size of the row by an order of magnitude to 800 bytes per row, and 10 rows per block, we now have 10,000 blocks in the table. Index accesses for 20,000 rows would cause us to still read each block on average two times. In this case, a full table scan will be much more efficient than using an index, as it has to touch each block only once. Any query that used this index to access the data would not be very efficient until it accesses on average less than 5 percent of the data for the 800-byte column (then we access about 5,000 blocks) and even less for the 80-byte column (about 0.5 percent or less).

Physical Organization

How the data is organized physically on disk deeply impacts these calculations, as it materially affects how expensive (or inexpensive) index access will be. Suppose you have a table where the rows have a primary key populated by a sequence. As data is added to the table, rows with sequential sequence numbers might be in general “next” to each other.

■Note The use of features such as ASSM or multiple freelist/freelist groups will affect how the data is organized on disk. Those features tend to spread the data out, and this natural clustering by primary key may not be observed.

The table is naturally clustered in order by the primary key (since the data is added in more or less that order). It will not be strictly clustered in order by the key, of course (we would have to use an IOT to achieve that), but in general rows with primary keys that are close in value will be “close” together in physical proximity. Now when you issue the query

select * from T where primary_key between :x and :y

the rows you want are typically located on the same blocks. In this case, an index range scan may be useful even if it accesses a large percentage of rows, simply because the database blocks that we need to read and reread will most likely be cached, since the data is co-located. On the other hand, if the rows are not co-located, using that same index may be disastrous for performance. A small demonstration will drive this fact home. We’ll start with a table that is pretty much ordered by its primary key:

ops$tkyte@ORA10G> create table colocated ( x int, y varchar2(80) );
Table created.

ops$tkyte@ORA10G> begin
    for i in 1 .. 100000
    loop
        insert into colocated(x,y)
        values (i, rpad(dbms_random.random,75,'*') );
    end loop;
end;
/
PL/SQL procedure successfully completed.

ops$tkyte@ORA10G> alter table colocated
    add constraint colocated_pk
    primary key(x);
Table altered.

ops$tkyte@ORA10G> begin
    dbms_stats.gather_table_stats( user, 'COLOCATED', cascade=>true );
end;
/
PL/SQL procedure successfully completed.

This table fits the description we laid out earlier with about 100 rows/block in an 8KB database. In this table, there is a very good chance that the rows with X=1, 2, 3 are on the same block. Now, we’ll take this table and purposely “disorganize” it. In the COLOCATED table, we created the Y column with a leading random number, and we’ll use that fact to “disorganize” the data so that it will definitely not be ordered by primary key anymore:

ops$tkyte@ORA10G> create table disorganized
    as
    select x,y
      from colocated
     order by y;
Table created.

ops$tkyte@ORA10G> alter table disorganized
    add constraint disorganized_pk
    primary key (x);
Table altered.

ops$tkyte@ORA10G> begin
    dbms_stats.gather_table_stats( user, 'DISORGANIZED', cascade=>true );
end;
/
PL/SQL procedure successfully completed.

Arguably, these are the same tables; it is a relational database, so physical organization has no bearing on the answers returned (at least that’s what they teach in theoretical database courses). In fact, the performance characteristics of these two tables are as different as night and day, while the answers returned are identical. Given the same exact question, using the same exact query plans, and reviewing the TKPROF (SQL trace) output, we see the following:
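A trace of this kind could be captured along the following lines. This is only a sketch under stated assumptions: the range 20000 to 40000 is chosen here to mirror the 20,000-row example from the earlier arithmetic, the INDEX hints are added to push both queries toward the same index-driven plan, and the session is assumed to have the privilege needed to issue ALTER SESSION.

```sql
-- Sketch only: the predicate range and the hints are assumptions made for
-- illustration, so that both queries fetch roughly 20,000 rows through
-- their primary key index.

-- Start writing SQL trace data for this session.
alter session set sql_trace = true;

-- Rows loaded in primary key order, so neighboring keys tend to share blocks.
select /*+ index( colocated colocated_pk ) */ *
  from colocated
 where x between 20000 and 40000;

-- Same rows, but stored in order of the random Y value.
select /*+ index( disorganized disorganized_pk ) */ *
  from disorganized
 where x between 20000 and 40000;

-- Stop tracing.
alter session set sql_trace = false;
```

The raw trace file written by the server would then be run through the TKPROF command-line tool (tkprof followed by the trace file name and a report file name) so the two statements can be compared side by side.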