Expert Oracle Database Architecture: 9i and 10g Programming Techniques and Solutions (Apress, September 2005)

rekharaghuram
05.11.2015 Views

CHAPTER 11 ■ INDEXES 463

If we compare the two examples (unindexed versus indexed), we find that the insert took a little more than twice as long to run. However, the select went from over a second to effectively “instantly.” The important things to note here are the following:

• The insertion of 9,999 records took approximately two times longer. Indexing a user-written function will necessarily affect the performance of inserts and some updates. You should realize that any index will impact performance, of course. For example, I did a simple test without the MY_SOUNDEX function, just indexing the ENAME column itself. That caused the INSERT to take about one second to execute, so the PL/SQL function is not responsible for the entire overhead. Since most applications insert and update singleton entries, and each row took less than 1/10,000 of a second to insert, you probably won’t even notice this in a typical application. Since we insert a row only once, we pay the price of executing the function on the column once, not the thousands of times we query the data.

• While the insert ran two times slower, the query ran many times faster. It evaluated the MY_SOUNDEX function a few times instead of almost 20,000 times. The difference in performance of our query here is measurable and quite large. Also, as the size of our table grows, the full scan query will take longer and longer to execute. The index-based query will always execute with nearly the same performance characteristics as the table gets larger.

• We had to use SUBSTR in our query. This is not as nice as just coding WHERE MY_SOUNDEX(ename)=MY_SOUNDEX( 'King' ), but we can easily get around that, as we will see shortly.

So, the insert was affected, but the query ran incredibly fast. The payoff for a small reduction in insert/update performance is huge.
Additionally, if you never update the columns involved in the MY_SOUNDEX function call, the updates are not penalized at all (MY_SOUNDEX is invoked only if the ENAME column is modified and its value changed).

Now let’s see how to make it so the query does not have to use the SUBSTR function call. The use of the SUBSTR call could be error-prone: our end users have to know to SUBSTR from 1 for six characters. If they use a different size, the index will not be used. Also, we want to control in the server the number of bytes to index. This will allow us to reimplement the MY_SOUNDEX function later with 7 bytes instead of 6 if we want to. We can hide the SUBSTR with a view quite easily as follows:

    ops$tkyte@ORA10G> create or replace view emp_v
      2  as
      3  select ename, substr(my_soundex(ename),1,6) ename_soundex, hiredate
      4    from emp
      5  /
    View created.

    ops$tkyte@ORA10G> exec stats.cnt := 0;
    PL/SQL procedure successfully completed.

    ops$tkyte@ORA10G> set timing on
    ops$tkyte@ORA10G> select ename, hiredate
      2    from emp_v
      3   where ename_soundex = my_soundex('Kings')
      4  /

    ENAME      HIREDATE
    ---------- ---------
    Ku$_Chunk_ 10-AUG-04
    Ku$_Chunk_ 10-AUG-04

    Elapsed: 00:00:00.03
    ops$tkyte@ORA10G> set timing off
    ops$tkyte@ORA10G> exec dbms_output.put_line( stats.cnt )
    2
    PL/SQL procedure successfully completed.

We see the same sort of query plan we did with the base table. All we have done here is hide the SUBSTR( F(X), 1, 6 ) in the view itself. The optimizer still recognizes that this virtual column is, in fact, the indexed column and does the “right thing.” We see the same performance improvement and the same query plan. Using this view is as good as using the base table; better, even, because it hides the complexity and allows us to change the size of the SUBSTR later.

Indexing Only Some of the Rows

In addition to transparently helping out queries that use built-in functions like UPPER, LOWER, and so on, function-based indexes can be used to selectively index only some of the rows in a table. As we’ll discuss a little later, B*Tree indexes do not contain entries for entirely NULL keys. That is, if you have an index I on a table T:

    create index I on t(a,b);

and you have a row where A and B are both NULL, there will be no entry in the index structure. This comes in handy when you are indexing just some of the rows in a table.

Consider a large table with a NOT NULL column called PROCESSED_FLAG that may take one of two values, Y or N, with a default value of N. New rows are added with a value of N to signify not processed, and as they are processed, they are updated to Y to signify processed. We would like to index this column to be able to retrieve the N records rapidly, but there are millions of rows and almost all of them are going to have a value of Y. The resulting B*Tree index will be large, and the cost of maintaining it as we update from N to Y will be high.

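
The behavior of entirely NULL keys described above can be checked directly. The following is a minimal sketch rather than a listing from the text: it assumes a scratch table T with an index I on (A, B), matching the CREATE INDEX statement shown earlier, and it uses ANALYZE INDEX ... VALIDATE STRUCTURE to populate the INDEX_STATS view so that the number of index entries can be compared with the number of table rows.

```sql
-- Hypothetical scratch objects; T and I mirror the names used in the text.
create table t ( a int, b int );
create index i on t ( a, b );

insert into t values ( 1, 1 );        -- both columns set: indexed
insert into t values ( 1, null );     -- partly NULL key: still indexed
insert into t values ( null, 1 );     -- partly NULL key: still indexed
insert into t values ( null, null );  -- entirely NULL key: no index entry
commit;

-- VALIDATE STRUCTURE fills INDEX_STATS for the current session.
analyze index i validate structure;

-- LF_ROWS is the number of leaf entries held by the index.
select name, lf_rows from index_stats;

-- Compare with the number of rows in the table.
select count(*) from t;
```

With these four rows, LF_ROWS should come back as 3 while the table holds 4 rows: the rows with only one NULL column are indexed, and only the row where both A and B are NULL is missing from the index.
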
This table sounds like a candidate for a bitmap index (this is low cardinality, after all!), but this is a transactional system and lots of people will be inserting records at the same time with the processed column set to N and, as we discussed earlier, bitmaps are not good for concurrent modifications. When we factor in the constant updating of N to Y in this table as well, bitmaps are out of the question, as this process would serialize entirely.

So, what we would really like is to index only the records of interest (the N records). We’ll see how to do this with function-based indexes, but before we do, let’s see what happens if we just use a regular index. Using the standard BIG_TABLE script described in the setup, we’ll update the TEMPORARY column, flipping the Ys to Ns and the Ns to Ys:
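
One way to write such a flip is sketched below. This is an illustration rather than a listing from the text, and it assumes that BIG_TABLE has a TEMPORARY column holding only 'Y' or 'N'. The CASE expression swaps the two values in a single statement, and the query that follows shows how the rows are distributed afterward.

```sql
-- Sketch only: assumes BIG_TABLE.TEMPORARY contains just 'Y' or 'N'.
-- Swap the two values in one pass; any other value is left unchanged.
update big_table
   set temporary = case temporary
                     when 'Y' then 'N'
                     when 'N' then 'Y'
                     else temporary
                   end;
commit;

-- Check the resulting distribution of the two flag values.
select temporary, count(*)
  from big_table
 group by temporary;
```

Doing the swap in one statement matters: two separate updates (Y to N, then N to Y) would turn every row into the same value.
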

