Apress.Expert.Oracle.Database.Architecture.9i.and.10g.Programming.Techniques.and.Solutions.Sep.2005

rekharaghuram
05.11.2015 Views

CHAPTER 14 ■ PARALLEL EXECUTION 639

we can see that as we load more and more data into the table UNIFORM_TEST using parallel direct path operations, the space utilization gets worse over time. We would want to use a significantly smaller uniform extent size or use AUTOALLOCATE. AUTOALLOCATE may well generate more extents over time, but the space utilization is superior due to the extent trimming that takes place.

Parallel Recovery

Another form of parallel execution in Oracle is the ability to perform parallel recovery. Parallel recovery may be performed at the instance level, for example to speed up the recovery that must be performed after a software, operating system, or general system failure. Parallel recovery may also be applied during media recovery (e.g., restoration from backups). It is not my goal to cover recovery-related topics in this book, so I’ll just mention the existence of parallel recovery in passing. I recommend the following Oracle manuals for further reading on the topic:

• Oracle Backup and Recovery Basics for information regarding parallel media recovery
• Oracle Performance Tuning Guide for information regarding parallel instance recovery

Procedural Parallelism

I would like to discuss two types of procedural parallelism:

• Parallel pipelined functions, which are a feature of Oracle.
• “Do-it-yourself (DIY) parallelism,” which is the application to your own applications of the same techniques that Oracle applies to parallel full table scans. DIY parallelism is more of a development technique than anything built into Oracle directly.
Many times you’ll find that applications, typically batch processes, designed to execute serially will look something like the following procedure:

Create procedure process_data
As
Begin
    For x in ( select * from some_table )
    Loop
        Perform complex process on X
        Update some other table, or insert the record somewhere else
    End loop
End

In this case, Oracle’s parallel query or PDML won’t help a bit (in fact, parallel execution of the SQL by Oracle here would likely only cause the database to consume more resources and take longer). If Oracle were to execute the simple SELECT * FROM SOME_TABLE in parallel, it would provide this algorithm no apparent increase in speed whatsoever. If Oracle were to perform in parallel the UPDATE or INSERT after the complex process, it would have no positive effect (it is a single-row UPDATE/INSERT, after all).
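For concreteness, the pseudocode above might look something like the following in real PL/SQL. This is only a sketch under stated assumptions: the column lists of SOME_TABLE and SOME_OTHER_TABLE are hypothetical, and the call to UPPER merely stands in for the “complex process.”

```sql
-- Hypothetical tables, for illustration only:
--   some_table       ( id number, text varchar2(30) )   -- rows to read
--   some_other_table ( id number, text varchar2(30) )   -- rows to write
create or replace procedure process_data
as
    l_text some_other_table.text%type;
begin
    -- One row at a time: fetch, process, then a single-row insert.
    for x in ( select id, text from some_table )
    loop
        -- Stand-in for the "complex process"; real logic would go here.
        l_text := upper( x.text );

        insert into some_other_table ( id, text )
        values ( x.id, l_text );
    end loop;
end;
/
```

Every statement in the loop touches a single row, which is why parallelizing the individual SQL statements gains nothing: the work is serialized by the procedural loop itself.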

There is one obvious thing you could do here: use array processing for the UPDATE/INSERT after the complex process. However, that isn’t going to give you a 50 percent reduction or more in runtime, and often that is what you are looking for. Don’t get me wrong, you definitely want to implement array processing for the modifications here, but it won’t make this process run two, three, four, or more times faster.

Now, suppose this process runs at night on a machine with four CPUs, and it is the only activity taking place. You have observed that only one CPU is partially used on this system, and the disk system is not being used very much at all. Further, this process is taking hours, and every day it takes a little longer as more data is added. You need to reduce the runtime by many times (it needs to run four or eight times faster), so incremental percentage improvements will not be sufficient. What can you do?

There are two approaches you can take. One approach is to implement a parallel pipelined function, whereby Oracle will decide on appropriate degrees of parallelism (assuming you have opted for that, which is recommended). Oracle will create the sessions, coordinate them, and run them, very much like the previous example with parallel DDL where, by using CREATE TABLE AS SELECT or INSERT /*+ APPEND */, Oracle fully automated parallel direct path loads for us. The other approach is DIY parallelism. We’ll take a look at both approaches in the sections that follow.

Parallel Pipelined Functions

We’d like to take that very serial process PROCESS_DATA from earlier and have Oracle execute it in parallel for us. To accomplish this, we need to turn the routine “inside out.” Instead of selecting rows from some table, processing them, and inserting them into another table, we will insert into another table the results of fetching some rows and processing them.
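To make the “inside out” idea concrete, the target shape is roughly the following. This is a minimal sketch, not the author’s listing: the type names OUT_ROW_TYPE and OUT_TAB_TYPE, the function name PROCESS_ROWS, and the column lists of SOME_TABLE and SOME_OTHER_TABLE are all hypothetical, and UPPER again stands in for the complex process.

```sql
-- Hypothetical object and collection types describing one output row.
create or replace type out_row_type as object
( id   number,
  text varchar2(30)
);
/
create or replace type out_tab_type as table of out_row_type;
/

-- The procedural logic now lives in a function that returns rows.
-- PARALLEL_ENABLE with PARTITION ... BY ANY tells Oracle it may split
-- the input cursor among parallel execution servers in any way it likes.
create or replace function process_rows( p_cursor in sys_refcursor )
return out_tab_type
pipelined
parallel_enable ( partition p_cursor by any )
is
    l_id   number;
    l_text varchar2(30);
begin
    loop
        fetch p_cursor into l_id, l_text;
        exit when p_cursor%notfound;
        -- Stand-in for the "complex process"; the result is emitted
        -- as a row instead of being inserted here.
        pipe row ( out_row_type( l_id, upper( l_text ) ) );
    end loop;
    close p_cursor;
    return;
end;
/

-- The "inside out" statement: insert the results of fetching and
-- processing, rather than inserting from inside the loop.
insert into some_other_table ( id, text )
select id, text
  from table( process_rows( cursor( select id, text from some_table ) ) );
```

As written, nothing here forces parallel execution; the function is merely eligible for it. Whether Oracle actually runs it in parallel depends on settings not shown in this sketch, such as a parallel hint or parallel attribute on the source query.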
We will remove the INSERT at the bottom of that loop and replace it in the code with a PIPE ROW clause. The PIPE ROW clause allows our PL/SQL routine to generate table data as its output, so we’ll be able to SELECT from our PL/SQL process. The PL/SQL routine that used to procedurally process the data becomes a table, in effect, and the rows we fetch and process are the outputs. We’ve seen this many times throughout this book every time we’ve issued the following:

select * from table(dbms_xplan.display);

That is a PL/SQL routine that reads the PLAN_TABLE; restructures the output, even to the extent of adding rows; and then outputs this data using PIPE ROW to send it back to the client. We’re going to do the same thing here in effect, but we’ll allow for it to be processed in parallel.

We’re going to use two tables in this example: T1 and T2. T1 is the table we were reading previously, and T2 is the table we need to move this information into. Assume this is some sort of ETL process we run to take the transactional data from the day and convert it into reporting information for tomorrow. The two tables we’ll use are as follows:

ops$tkyte-ORA10G> create table t1
    as
    select object_id id, object_name text
      from all_objects;

Table created.

ops$tkyte-ORA10G> begin
    dbms_stats.set_table_stats

