Expert Oracle Database Architecture: 9i and 10g Programming Techniques and Solutions (Apress, September 2005)


CHAPTER 14 ■ PARALLEL EXECUTION

  8  /

48250 rows created.

ops$tkyte-ORA10G> commit;

Commit complete.

Just to see what happened here, we can query the newly inserted data and group by SESSION_ID to see first how many parallel execution servers were used, and second how many rows each processed:

ops$tkyte-ORA10G> select session_id, count(*)
  2    from t2
  3   group by session_id;

SESSION_ID   COUNT(*)
---------- ----------
       241       8040
       246       8045
       253       8042
       254       8042
       258       8040
       260       8041

6 rows selected.

Apparently, we used six parallel execution servers for the SELECT component of this parallel operation, and each one processed about 8,040 records.

As you can see, Oracle parallelized our process, but only after we put it through a fairly radical rewrite. This is a long way from the original implementation. So, while Oracle can process our routine in parallel, we may well not have any routines that are coded to be parallelized. If a rather large rewrite of your procedure is not feasible, you may be interested in the next implementation: DIY parallelism.

Do-It-Yourself Parallelism

Say we have that same process as in the preceding section: the serial, simple procedure. We cannot afford a rather extensive rewrite of the implementation, but we would like to execute it in parallel. What can we do? My approach has often been to use rowid ranges to break up the table into some number of ranges that don’t overlap (yet completely cover the table). Conceptually, this is very similar to how Oracle performs a parallel query. If you think of a full table scan, Oracle processes that by coming up with some method to break the table into many “small” tables, each of which is processed by a parallel execution server. We are going to do the same thing using rowid ranges. In early releases, Oracle’s parallel implementation actually used rowid ranges itself.
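To make the idea of non-overlapping, table-covering rowid ranges concrete, here is one way such ranges might be derived from a table's extents. This is a minimal sketch under stated assumptions, not necessarily the query the author uses: it assumes a non-partitioned heap table called BIG_TABLE owned by the current user, SELECT access to DBA_EXTENTS, and that rowids within one table compare by relative file number and then by block. The choice of 8 groups, the use of NTILE (which balances extent counts rather than block counts), and the LO_RID and HI_RID aliases are illustrative.

```sql
-- Sketch only: split BIG_TABLE into up to 8 rowid ranges built from extent
-- boundaries. Extents are ordered by (file, first block) and dealt into
-- groups; each group becomes one range from its first block to its last.
select grp,
       -- row number 0 in the first block of the group
       dbms_rowid.rowid_create( 1, data_object_id, lo_fno, lo_block, 0 ) lo_rid,
       -- 10000 is assumed to exceed any real row number within a block
       dbms_rowid.rowid_create( 1, data_object_id, hi_fno, hi_block, 10000 ) hi_rid
  from (
        select grp,
               min(relative_fno)
                 keep (dense_rank first order by relative_fno, block_id) lo_fno,
               min(block_id)
                 keep (dense_rank first order by relative_fno, block_id) lo_block,
               max(relative_fno)
                 keep (dense_rank last order by relative_fno, block_id) hi_fno,
               max(block_id + blocks - 1)
                 keep (dense_rank last order by relative_fno, block_id) hi_block
          from (
                select relative_fno, block_id, blocks,
                       ntile(8) over (order by relative_fno, block_id) grp
                  from dba_extents
                 where segment_name = 'BIG_TABLE'
                   and owner = user
               )
         group by grp
       ),
       (
        select data_object_id
          from user_objects
         where object_name = 'BIG_TABLE'
           and object_type = 'TABLE'
       )
 order by grp;
```

Because every extent lands in exactly one group and the groups are contiguous in (file, block) order, the resulting ranges should neither overlap nor leave gaps. A table with fewer than 8 extents simply yields fewer ranges.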
We’ll use a BIG_TABLE of 1,000,000 rows, as this technique works best on big tables with lots of extents, and the method I use for creating rowid ranges depends on extent boundaries. The more extents used, the better the data distribution. So, after creating the BIG_TABLE with 1,000,000 rows, we’ll create T2 like this:

big_table-ORA10G> create table t2
  2  as
  3  select object_id id, object_name text, 0 session_id
  4    from big_table
  5   where 1=0;

Table created.

We are going to use the job queues built into the database to process our procedure in parallel. We will schedule some number of jobs. Each job is our procedure, slightly modified to process just the rows in a given rowid range.

■ Note In Oracle 10g, you could use the scheduler for something this simple, but in order to make the example 9i compatible, we’ll use the job queues here.

To efficiently support the job queues, we’ll use a parameter table to pass inputs to our jobs:

big_table-ORA10G> create table job_parms
  2  ( job number primary key,
  3    lo_rid rowid,
  4    hi_rid rowid
  5  )
  6  /

Table created.

This will allow us to just pass the job ID into our procedure, so it can query this table to get the rowid range it is to process. Now for our procedure. The code in bold is the new code we’ll be adding:

big_table-ORA10G> create or replace
  2  procedure serial( p_job in number )
  3  is
  4      l_rec job_parms%rowtype;
  5  begin
  6      select * into l_rec
  7        from job_parms
  8       where job = p_job;
  9
 10      for x in ( select object_id id, object_name text
 11                   from big_table
 12                  where rowid between l_rec.lo_rid
 13                                  and l_rec.hi_rid )
 14      loop
 15          -- complex process here
 16          insert into t2 (id, text, session_id )
 17          values ( x.id, x.text, p_job );
 18      end loop;
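Once SERIAL is compiled, the scheduling step described above (one job per rowid range, with each job's ID as the key into JOB_PARMS) could look roughly like the following. This is a hedged sketch rather than the author's listing: RID_RANGES(lo_rid, hi_rid) is a hypothetical table assumed to hold one row per precomputed range, and the instance is assumed to have JOB_QUEUE_PROCESSES set above zero so that queued jobs actually run.

```sql
-- Sketch only: RID_RANGES is a hypothetical table of precomputed rowid ranges.
declare
    l_job number;
begin
    for x in ( select lo_rid, hi_rid from rid_ranges )
    loop
        -- JOB inside the WHAT string is replaced by the job's own number
        -- when the job runs, so each job calls serial(<its own job ID>)
        dbms_job.submit( l_job, 'serial(JOB);' );

        -- record the range this job should process, keyed by its job ID
        insert into job_parms ( job, lo_rid, hi_rid )
        values ( l_job, x.lo_rid, x.hi_rid );
    end loop;
end;
/

-- the submitted jobs become visible to the job queue only after the commit
commit;
```

Submitting the job and inserting its parameter row in the same transaction means a job can never start before its JOB_PARMS row exists. After the jobs finish, grouping T2 by SESSION_ID, as was done for the parallel insert earlier, would show how many rows each job processed.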

