Apress.Expert.Oracle.Database.Architecture.9i.and.10g.Programming.Techniques.and.Solutions.Sep.2005

rekharaghuram
05.11.2015

CHAPTER 11 ■ INDEXES 443

select * from colocated a15 where x between 20000 and 40000

Rows     Row Source Operation
-------  ---------------------------------------------------
  20001  TABLE ACCESS BY INDEX ROWID COLOCATED (cr=2899 pr=0 pw=0 time=120125...
  20001   INDEX RANGE SCAN COLOCATED_PK (cr=1374 pr=0 pw=0 time=40072 us)(...

select * from colocated a100 where x between 20000 and 40000

Rows     Row Source Operation
-------  ---------------------------------------------------
  20001  TABLE ACCESS BY INDEX ROWID COLOCATED (cr=684 pr=0 pw=0 ...)
  20001   INDEX RANGE SCAN COLOCATED_PK (cr=245 pr=0 pw=0 ...

The first query was executed with the ARRAYSIZE of 15, and the (cr=nnnn) values in the Row Source Operation show we performed 1,374 logical I/Os against the index and then 1,525 logical I/Os against the table (2,899 minus 1,374; the numbers are cumulative in the Row Source Operation steps). When we increased the ARRAYSIZE to 100 from 15, the amount of logical I/O against the index dropped to 245, which was the direct result of not having to reread the index leaf blocks from the buffer cache every 15 rows, but only every 100 rows. To understand this, assume that we were able to store 200 rows per leaf block. As we are scanning through the index reading 15 rows at a time, we would have to retrieve the first leaf block 14 times to get all 200 entries off it. On the other hand, when we array fetch 100 rows at a time, we need to retrieve this same leaf block only two times from the buffer cache to exhaust all of its entries.

The same thing happened in this case with the table blocks. Since the table was sorted in the same order as the index keys, we would tend to retrieve each table block less often, as we would get more of the rows from it with each fetch call.

So, if this was good for the COLOCATED table, it must have been just as good for the DISORGANIZED table, right? Not so.
The results from the DISORGANIZED table would look like this:

select /*+ index( a15 disorganized_pk ) */ * from disorganized a15 where x between 20000 and 40000

Rows     Row Source Operation
-------  ---------------------------------------------------
  20001  TABLE ACCESS BY INDEX ROWID DISORGANIZED (cr=21357 pr=0 pw=0 ...
  20001   INDEX RANGE SCAN DISORGANIZED_PK (cr=1374 pr=0 pw=0 ...

select /*+ index( a100 disorganized_pk ) */ * from disorganized a100 where x between 20000 and 40000

Rows     Row Source Operation
-------  ---------------------------------------------------
  20001  TABLE ACCESS BY INDEX ROWID OBJ#(75652) (cr=20228 pr=0 pw=0 ...
  20001   INDEX RANGE SCAN OBJ#(75653) (cr=245 pr=0 pw=0 time=20281 us)(...
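The leaf-block arithmetic above (14 visits versus 2 for a 200-entry leaf block) and the effect of row placement on table block visits can be explored with a small model. This is only a sketch in Python, not a reproduction of how Oracle counts logical I/O: the function `count_gets`, the figure of 80 table rows per block, and the random shuffle standing in for the DISORGANIZED layout are all assumptions made for illustration.

```python
import random


def count_gets(block_ids, array_size):
    """Count block gets when rows are fetched array_size at a time.

    block_ids[i] is the block that holds the i-th row in index-key order.
    Simplifying assumption: within one fetch call a block is reused while
    consecutive rows come from it; a new call or a change of block costs
    one get.
    """
    gets = 0
    for start in range(0, len(block_ids), array_size):
        previous = None  # nothing carries over from the previous fetch call
        for block in block_ids[start:start + array_size]:
            if block != previous:
                gets += 1
                previous = block
    return gets


ROWS = 20001            # rows returned by the example query
LEAF_ENTRIES = 200      # entries per index leaf block, as assumed in the text
ROWS_PER_BLOCK = 80     # hypothetical number of table rows per block

leaf = [i // LEAF_ENTRIES for i in range(ROWS)]
colocated = [i // ROWS_PER_BLOCK for i in range(ROWS)]
disorganized = colocated[:]
random.Random(42).shuffle(disorganized)  # same blocks, rows scattered at random

# One 200-entry leaf block: 14 visits at 15 rows per fetch, 2 at 100.
print(count_gets([0] * LEAF_ENTRIES, 15), count_gets([0] * LEAF_ENTRIES, 100))

for name, layout in (("index", leaf), ("colocated", colocated),
                     ("disorganized", disorganized)):
    print(name, count_gets(layout, 15), count_gets(layout, 100))
```

In this model the index and the colocated layout need far fewer gets at an array size of 100, while the shuffled layout needs roughly one get per row at either setting, because consecutive index entries almost never point at the same table block. The absolute counts depend on the assumed block capacities and will not match the tkprof figures.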

The results against the index here were identical, which makes sense, as the data stored in the index is just the same regardless of how the table is organized. The logical I/O went from 1,374 for a single execution of this query to 245, just as before. But overall the amount of logical I/O performed by this query did not differ significantly: 21,357 versus 20,228. The reason? The amount of logical I/O performed against the table did not differ at all. If you subtract the logical I/O against the index from the total logical I/O performed by each query, you'll find that both queries did 19,983 logical I/Os against the table. This is because the odds that any two of the N rows we wanted in a given fetch would be on the same block were very small, so there was no opportunity to get multiple rows from a table block in a single call.

Every professional programming language I have seen that can interact with Oracle implements this concept of array fetching. In PL/SQL, you may use BULK COLLECT or rely on the implicit array fetch of 100 that is performed for implicit cursor for loops. In Java/JDBC, there is a prefetch method on a connection or statement object. Oracle Call Interface (OCI; a C API) allows you to programmatically set the prefetch size, as does Pro*C. As you can see, this can have a material and measurable effect on the amount of logical I/O performed by your query, and it deserves your attention.
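As one more illustration of the array fetching described above, the Python DB-API 2.0 defines a `cursor.arraysize` attribute that sets the default batch size for `fetchmany()`. The sketch below is an assumption-laden stand-in: it uses the in-process `sqlite3` module and a made-up `colocated` table so that it can run anywhere, and the helper `fetch_all_in_batches` is hypothetical. It shows only how the number of fetch calls changes with the array size. It does not measure logical I/O, and an Oracle driver may handle the setting differently.

```python
import sqlite3


def fetch_all_in_batches(cursor, array_size):
    """Drain a cursor with fetchmany(); return (rows, fetch_calls).

    fetch_calls includes the final call that returns no rows.
    """
    cursor.arraysize = array_size  # DB-API attribute: default size for fetchmany()
    rows = 0
    calls = 0
    while True:
        batch = cursor.fetchmany()
        calls += 1
        if not batch:
            break
        rows += len(batch)
    return rows, calls


# Stand-in table; the names mirror the example but the data is invented.
conn = sqlite3.connect(":memory:")
conn.execute("create table colocated (x integer primary key, y text)")
conn.executemany(
    "insert into colocated values (?, ?)",
    ((i, "row %d" % i) for i in range(1, 50001)),
)

QUERY = "select * from colocated where x between 20000 and 40000"

for size in (15, 100):
    cur = conn.execute(QUERY)
    rows, calls = fetch_all_in_batches(cur, size)
    print("arraysize=%d rows=%d fetch calls=%d" % (size, rows, calls))
```

For the 20,001 rows in this range the loop makes 1,335 calls at an array size of 15 and 202 at 100, counting the final empty call in each case.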
Just to wrap up this example, let's look at what happens when we full scan the DISORGANIZED table:

select * from disorganized where x between 20000 and 40000

call     count    cpu  elapsed  disk  query  current    rows
-------  -----  -----  -------  ----  -----  -------  ------
Parse        5   0.00     0.00     0      0        0       0
Execute      5   0.00     0.00     0      0        0       0
Fetch     6675   0.53     0.54     0  12565        0  100005
-------  -----  -----  -------  ----  -----  -------  ------
total     6685   0.53     0.54     0  12565        0  100005

Rows     Row Source Operation
-------  ---------------------------------------------------
  20001  TABLE ACCESS FULL DISORGANIZED (cr=2513 pr=0 pw=0 time=60115 us)

That shows that in this particular case, the full scan is very appropriate due to the way the data is physically stored on disk: it costs 2,513 logical I/Os per execution (the totals above cover five executions), versus more than 20,000 through the index. This raises the question, "Why didn't the optimizer full scan in the first place for this query?" Well, it would have if left to its own devices, but in the first example query against DISORGANIZED I purposely hinted the query and told the optimizer to construct a plan that used the index. In the second case, I let the optimizer pick the best overall plan.

The Clustering Factor

Next, let's look at some of the information Oracle will use. We are specifically going to look at the CLUSTERING_FACTOR column found in the USER_INDEXES view. The Oracle Reference manual tells us this column has the following meaning:

