Expert Oracle Database Architecture: 9i and 10g Programming Techniques and Solutions (Apress, September 2005)
CHAPTER 11 ■ INDEXES

If we compare the two examples (unindexed versus indexed), we find that the insert took a little more than twice as long to run. However, the select went from over a second to effectively “instantly.” The important things to note here are the following:

• The insertion of 9,999 records took approximately two times longer. Indexing a user-written function will necessarily affect the performance of inserts and some updates. You should realize that any index will impact performance, of course. For example, I did a simple test without the MY_SOUNDEX function, just indexing the ENAME column itself. That caused the INSERT to take about one second to execute, so the PL/SQL function is not responsible for the entire overhead. Since most applications insert and update singleton entries, and each row took less than 1/10,000 of a second to insert, you probably won’t even notice this in a typical application. Since we insert a row only once, we pay the price of executing the function on the column once, not the thousands of times we query the data.

• While the insert ran two times slower, the query ran many times faster. It evaluated the MY_SOUNDEX function a few times instead of almost 20,000 times. The difference in performance of our query here is measurable and quite large. Also, as the size of our table grows, the full scan query will take longer and longer to execute. The index-based query will always execute with nearly the same performance characteristics as the table gets larger.

• We had to use SUBSTR in our query. This is not as nice as just coding WHERE MY_SOUNDEX(ename)=MY_SOUNDEX( 'King' ), but we can easily get around that, as we will see shortly.

So, the insert was affected, but the query ran incredibly fast. The payoff for a small reduction in insert/update performance is huge.
Additionally, if you never update the columns involved in the MY_SOUNDEX function call, the updates are not penalized at all (MY_SOUNDEX is invoked only if the ENAME column is modified and its value changed).

Now let’s see how to make it so the query does not have to use the SUBSTR function call. The use of the SUBSTR call could be error-prone: our end users have to know to SUBSTR from 1 for six characters. If they use a different size, the index will not be used. Also, we want to control in the server the number of bytes to index. This will allow us to reimplement the MY_SOUNDEX function later with 7 bytes instead of 6 if we want to. We can hide the SUBSTR with a view quite easily as follows:

    ops$tkyte@ORA10G> create or replace view emp_v
      2  as
      3  select ename, substr(my_soundex(ename),1,6) ename_soundex, hiredate
      4    from emp
      5  /
    View created.

    ops$tkyte@ORA10G> exec stats.cnt := 0;
    PL/SQL procedure successfully completed.

    ops$tkyte@ORA10G> set timing on
    ops$tkyte@ORA10G> select ename, hiredate
      2    from emp_v
      3   where ename_soundex = my_soundex('Kings')
      4  /

    ENAME      HIREDATE
    ---------- ---------
    Ku$_Chunk_ 10-AUG-04
    Ku$_Chunk_ 10-AUG-04

    Elapsed: 00:00:00.03
    ops$tkyte@ORA10G> set timing off
    ops$tkyte@ORA10G> exec dbms_output.put_line( stats.cnt )
    2
    PL/SQL procedure successfully completed.

We see the same sort of query plan we did with the base table. All we have done here is hide the SUBSTR( F(X), 1, 6 ) in the view itself. The optimizer still recognizes that this virtual column is, in fact, the indexed column and does the “right thing.” We see the same performance improvement and the same query plan. Using this view is as good as using the base table, and arguably better, because it hides the complexity and allows us to change the size of the SUBSTR later.

Indexing Only Some of the Rows

In addition to transparently helping out queries that use built-in functions like UPPER, LOWER, and so on, function-based indexes can be used to selectively index only some of the rows in a table. As we’ll discuss a little later, B*Tree indexes do not contain entries for entirely NULL keys. That is, if you have an index I on a table T:

    Create index I on t(a,b);

and you have a row where A and B are both NULL, there will be no entry in the index structure. This comes in handy when you are indexing just some of the rows in a table.

Consider a large table with a NOT NULL column called PROCESSED_FLAG that may take one of two values, Y or N, with a default value of N. New rows are added with a value of N to signify not processed, and as they are processed, they are updated to Y to signify processed. We would like to index this column to be able to retrieve the N records rapidly, but there are millions of rows and almost all of them are going to have a value of Y. The resulting B*Tree index will be large, and the cost of maintaining it as we update from N to Y will be high.
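
The statement that a B*Tree index holds no entry for an entirely NULL key can be checked directly. What follows is a minimal sketch, not a listing from the text: T and I are the hypothetical table and index named above, and the column datatypes and the inserted rows are assumptions chosen purely for illustration. It relies on ANALYZE INDEX ... VALIDATE STRUCTURE, which populates the INDEX_STATS view for the current session.

```sql
-- Hypothetical table and index, named after the T and I used in the text.
-- The INT datatypes are an assumption made for this sketch.
create table t ( a int, b int );
create index i on t(a,b);

-- Three rows; only the last one has an entirely NULL key.
insert into t values ( 1, 1 );
insert into t values ( 1, null );
insert into t values ( null, null );
commit;

-- Populate INDEX_STATS for this session.
analyze index i validate structure;

-- If entirely NULL keys are skipped, LF_ROWS (leaf entries in the index)
-- should come out one lower than the row count of the table.
select name, lf_rows from index_stats;
select count(*) from t;
```

Note that a partially NULL key such as (1, NULL) is still indexed. Only rows in which every indexed column is NULL are left out of the structure, and that is the property the rest of this section relies on.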
This table sounds like a candidate for a bitmap index (this is low cardinality, after all!), but this is a transactional system and lots of people will be inserting records at the same time with the processed column set to N and, as we discussed earlier, bitmaps are not good for concurrent modifications. When we factor in the constant updating of N to Y in this table as well, then bitmaps would be out of the question, as this process would serialize entirely.

So, what we would really like is to index only the records of interest (the N records). We’ll see how to do this with function-based indexes, but before we do, let’s see what happens if we just use a regular index. Using the standard BIG_TABLE script described in the setup, we’ll update the TEMPORARY column, flipping the Ys to Ns and the Ns to Ys: