Expert Oracle Database Architecture: 9i and 10g Programming Techniques and Solutions (Apress, September 2005)
CHAPTER 14 ■ PARALLEL EXECUTION

...we can see that as we load more and more data into the table UNIFORM_TEST using parallel direct path operations, the space utilization gets worse over time. We would want to use a significantly smaller uniform extent size or use AUTOALLOCATE. AUTOALLOCATE may well generate more extents over time, but the space utilization is superior due to the extent trimming that takes place.

Parallel Recovery

Another form of parallel execution in Oracle is the ability to perform parallel recovery. Parallel recovery may be performed at the instance level, for example to speed up the recovery that must be performed after a software, operating system, or general system failure. Parallel recovery may also be applied during media recovery (e.g., restoration from backups). It is not my goal to cover recovery-related topics in this book, so I’ll just mention the existence of parallel recovery in passing. I recommend the following Oracle manuals for further reading on the topic:

• Oracle Backup and Recovery Basics for information regarding parallel media recovery
• Oracle Performance Tuning Guide for information regarding parallel instance recovery

Procedural Parallelism

I would like to discuss two types of procedural parallelism:

• Parallel pipelined functions, which are a feature of Oracle.
• “Do-it-yourself (DIY) parallelism,” which is the application to your own applications of the same techniques that Oracle applies to parallel full table scans. DIY parallelism is more of a development technique than anything built into Oracle directly.

Many times you’ll find that applications (typically batch processes) designed to execute serially will look something like the following procedure:

    create procedure process_data
    as
    begin
        for x in ( select * from some_table )
        loop
            -- perform complex process on X
            -- update some other table, or insert the record somewhere else
        end loop;
    end;

In this case, Oracle’s parallel query or PDML won’t help a bit (in fact, parallel execution of the SQL by Oracle here would likely only cause the database to consume more resources and take longer). If Oracle were to execute the simple SELECT * FROM SOME_TABLE in parallel, it would provide this algorithm no apparent increase in speed whatsoever, because the rows would still be fetched and processed one at a time by a single session. If Oracle were to perform in parallel the UPDATE or INSERT after the complex process, it would have no positive effect (it is a single-row UPDATE/INSERT, after all).

There is one obvious thing you could do here: use array processing for the UPDATE/INSERT after the complex process. However, that isn’t going to give you a 50 percent reduction or more in runtime, and often that is what you are looking for. Don’t get me wrong, you definitely want to implement array processing for the modifications here, but it won’t make this process run two, three, four, or more times faster.
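
As a rough illustration of the array processing mentioned above, the following sketch fetches rows in batches with BULK COLLECT and writes them with FORALL. The tables SOME_TABLE and OTHER_TABLE, their ID and TEXT columns, and the batch size of 500 are assumptions made for illustration only, and UPPER() merely stands in for the real complex process.

```sql
-- Hypothetical tables: some_table(id, text) is the source,
-- other_table(id, text) is the target.
create or replace procedure process_data
as
    type id_array   is table of some_table.id%type   index by pls_integer;
    type text_array is table of some_table.text%type index by pls_integer;

    l_ids    id_array;
    l_texts  text_array;

    cursor c is select id, text from some_table;
begin
    open c;
    loop
        -- Fetch up to 500 rows per call instead of one row at a time.
        fetch c bulk collect into l_ids, l_texts limit 500;

        -- Placeholder for the "complex process"; still executed row by row.
        for i in 1 .. l_ids.count
        loop
            l_texts(i) := upper( l_texts(i) );
        end loop;

        -- One bulk-bound INSERT per batch instead of one INSERT per row.
        forall i in 1 .. l_ids.count
            insert into other_table ( id, text )
            values ( l_ids(i), l_texts(i) );

        -- Checked after processing so the final, partial batch is not lost.
        exit when c%notfound;
    end loop;
    close c;
end;
/
```

Batching of this kind trims the per-row overhead of the fetches and inserts, but all of the work still runs in one session on one CPU, which is why it does not deliver the multiple-times speedup discussed here.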

Now, suppose this process runs at night on a machine with four CPUs, and it is the only activity taking place. You have observed that only one CPU is partially used on this system, and the disk system is not being used very much at all. Further, this process is taking hours, and every day it takes a little longer as more data is added. You need to reduce the runtime by many times (it needs to run four or eight times faster), so incremental percentage increases will not be sufficient. What can you do?

There are two approaches you can take. One approach is to implement a parallel pipelined function, whereby Oracle will decide on appropriate degrees of parallelism (assuming you have opted for that, which is recommended). Oracle will create the sessions, coordinate them, and run them, very much like the previous example with parallel DDL where, by using CREATE TABLE AS SELECT or INSERT /*+APPEND*/, Oracle fully automated parallel direct path loads for us. The other approach is DIY parallelism. We’ll take a look at both approaches in the sections that follow.
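
For comparison, the kind of fully automated parallel direct path load referred to above can be sketched as follows. DEMO_SOURCE and DEMO_TARGET are hypothetical tables, and the hints only request parallelism; whether Oracle actually runs the statement in parallel, and at what degree, depends on how the instance and the tables are configured.

```sql
-- Hypothetical tables: demo_source(id, text) and demo_target(id, text).

-- Parallel DML is disabled by default and must be enabled per session.
alter session enable parallel dml;

-- APPEND requests a direct path load; the PARALLEL hints ask Oracle to
-- parallelize both the insert and the query feeding it.
insert /*+ append parallel(demo_target) */ into demo_target ( id, text )
select /*+ parallel(demo_source) */ id, text
  from demo_source;

-- A direct path insert must be committed (or rolled back) before this
-- session can read or modify the table again.
commit;
```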

Parallel Pipelined Functions

We’d like to take that very serial process PROCESS_DATA from earlier and have Oracle execute it in parallel for us. To accomplish this, we need to turn the routine “inside out.” Instead of selecting rows from some table, processing them, and inserting them into another table, we will insert into another table the results of fetching some rows and processing them. We will remove the INSERT at the bottom of that loop and replace it in the code with a PIPE ROW statement. The PIPE ROW statement allows our PL/SQL routine to generate table data as its output, so we’ll be able to SELECT from our PL/SQL process. The PL/SQL routine that used to procedurally process the data becomes a table, in effect, and the rows we fetch and process are the outputs. We’ve seen this many times throughout this book every time we’ve issued the following:

    select * from table(dbms_xplan.display);

That is a PL/SQL routine that reads the PLAN_TABLE; restructures the output, even to the extent of adding rows; and then outputs this data using PIPE ROW to send it back to the client. We’re going to do the same thing here in effect, but we’ll allow for it to be processed in parallel.
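
To make the “inside out” idea concrete, here is a minimal sketch of what such a routine could look like. It is an illustration under stated assumptions, not the implementation developed in this chapter: the types DEMO_ROW_T and DEMO_TAB_T, the function DEMO_PROCESS, and the tables DEMO_SOURCE and DEMO_TARGET are all hypothetical names, and UPPER() stands in for the complex process.

```sql
-- Hypothetical types describing one output row and a collection of such rows.
create or replace type demo_row_t as object
( id    number,
  text  varchar2(128)
)
/
create or replace type demo_tab_t as table of demo_row_t
/

-- The cursor parameter supplies the input rows. PARALLEL_ENABLE with
-- PARTITION ... BY ANY permits Oracle to divide those rows among parallel
-- execution servers in whatever way it chooses.
create or replace function demo_process( p_cursor in sys_refcursor )
return demo_tab_t
pipelined
parallel_enable ( partition p_cursor by any )
is
    l_id    number;
    l_text  varchar2(128);
begin
    loop
        fetch p_cursor into l_id, l_text;
        exit when p_cursor%notfound;

        -- Placeholder for the "complex process".
        l_text := upper( l_text );

        -- Where the serial routine issued an INSERT, hand the row to the caller.
        pipe row( demo_row_t( l_id, l_text ) );
    end loop;
    close p_cursor;
    return;
end;
/

-- The routine is now queried like a table, and the INSERT wraps the query.
insert into demo_target ( id, text )
select id, text
  from table( demo_process( cursor( select id, text from demo_source ) ) );
```

As written, nothing forces parallel execution. The function is merely eligible for it, and whether Oracle uses it depends on factors such as a PARALLEL hint or a parallel degree set on the source table.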

We’re going to use two tables in this example: T1 and T2. T1 is the table we were reading previously, and T2 is the table we need to move this information into. Assume this is some sort of ETL process we run to take the transactional data from the day and convert it into reporting information for tomorrow. The two tables we’ll use are as follows:

    ops$tkyte-ORA10G> create table t1
      as
      select object_id id, object_name text
        from all_objects;
    Table created.

    ops$tkyte-ORA10G> begin
      dbms_stats.set_table_stats