Expert Oracle Database Architecture: 9i and 10g Programming Techniques and Solutions (Apress, September 2005)
CHAPTER 14 ■ PARALLEL EXECUTION 617

So, before applying parallel execution, you need the following two things to be true:

• You must have a very large task, such as the full scan of 50GB of data.
• You must have sufficient available resources. Before parallel full scanning 50GB of data, you would want to make sure that there is sufficient free CPU (to accommodate the parallel processes) as well as sufficient I/O. The 50GB should be spread over more than one physical disk to allow for many concurrent read requests to happen simultaneously, there should be sufficient I/O channels from the disk to the computer to retrieve the data from disk in parallel, and so on.

If you have a small task, as generally typified by the queries carried out in an OLTP system, or you have insufficient available resources, again as is typical in an OLTP system where CPU and I/O resources are often already used to their maximum, then parallel execution is not something you’ll want to consider.

A Parallel Processing Analogy

I often use an analogy to describe parallel processing and why you need both a large task and sufficient free resources in the database. It goes like this: suppose you have two tasks to complete. The first is to write a one-page summary of a new product. The other is to write a ten-chapter comprehensive report, with each chapter being very much independent of the others. For example, consider this book. This chapter, “Parallel Execution,” is very much separate and distinct from the chapter titled “Redo and Undo”—they did not have to be written sequentially.

How do you approach each task? Which one do you think would benefit from parallel processing?

One-Page Summary

In this analogy, the one-page summary you have been assigned is not a large task. You would either do it yourself or assign it to a single individual. Why? Because the amount of work required to “parallelize” this process would exceed the work needed just to write the paper yourself.
You would have to sit down, figure out that there should be 12 paragraphs, determine that each paragraph is not dependent on the other paragraphs, hold a team meeting, pick 12 individuals, explain to them the problem and assign each person a paragraph, act as the coordinator and collect all of their paragraphs, sequence them into the right order, verify they are correct, and then print the report. This is all likely to take longer than it would to just write the paper yourself, serially. The overhead of managing a large group of people on a project of this scale will far outweigh any gains to be had from having the 12 paragraphs written in parallel.

The exact same principle applies to parallel execution in the database. If you have a job that takes seconds or less to complete serially, then the introduction of parallel execution and its associated managerial overhead will likely make the entire thing take longer.

Ten-Chapter Report

Now let’s examine the second task. If you want that ten-chapter report fast—as fast as possible—the slowest way to accomplish it would be to assign all of the work to a single individual (trust me, I know—look at this book! Some days I wished there were 15 of me working on it).
Here you would hold the meeting, review the process, assign the work, act as the coordinator, collect the results, bind up the finished report, and deliver it. It would not have been done in one-tenth the time, but perhaps one-eighth or so.

Again, I say this with the proviso that you have sufficient free resources. If you have a large staff that is currently not actually doing anything, then splitting up the work makes complete sense. However, consider that as the manager, your staff is multitasking and they have a lot on their plates. In that case, you have to be careful with that big project. You need to be sure not to overwhelm your staff; you don’t want to work them beyond the point of exhaustion. You can’t delegate out more work than your resources (your people) can cope with, otherwise they’ll quit. If your staff is already fully utilized, adding more work will cause all schedules to slip and all projects to be delayed.

Parallel execution in Oracle is very much the same. If you have a task that takes many minutes, hours, or days, then the introduction of parallel execution may be the thing that makes it run eight times faster. But then again, if you are already seriously low on resources (the overworked team of people), then the introduction of parallel execution would be something to avoid, as the system will become even more bogged down. While the Oracle server processes won’t “quit” in protest, they could start running out of RAM and failing, or just suffer from such long waits for I/O or CPU as to make it appear as if they were doing no work whatsoever. If you keep that in mind, remembering never to take an analogy to illogical extremes, you’ll have the commonsense guiding rule to see if parallelism can be of some use. If you have a job that takes seconds, it is doubtful that parallel execution can be used to make it go faster—the converse would be more likely.
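
To put rough numbers on the analogy, here is a small Python sketch. It assumes a deliberately simple cost model that is not from the text: a fixed coordination overhead (the meeting, the assigning, the collecting) plus the serial work divided evenly among the workers. The function names and every number below are hypothetical, chosen only so that the ten-chapter report comes out near the "one-eighth" figure.

```python
def parallel_elapsed(serial_hours: float, workers: int, overhead_hours: float) -> float:
    """Toy model (an assumption): fixed coordination overhead plus an even split of the work."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    if workers == 1:
        return serial_hours  # working alone needs no coordination
    return overhead_hours + serial_hours / workers


def speedup(serial_hours: float, workers: int, overhead_hours: float) -> float:
    """How many times faster the parallel approach is than doing it serially."""
    return serial_hours / parallel_elapsed(serial_hours, workers, overhead_hours)


# Hypothetical figures: a 1-hour summary split 12 ways, and an 80-hour report
# split 10 ways, each with 2 hours of coordination overhead.
summary = speedup(serial_hours=1.0, workers=12, overhead_hours=2.0)
report = speedup(serial_hours=80.0, workers=10, overhead_hours=2.0)

print(f"one-page summary:   {summary:.2f}x")  # below 1.0 means slower than serial
print(f"ten-chapter report: {report:.2f}x")
```

Under these assumed numbers, the summary takes roughly twice as long in parallel as it would serially, while the report finishes in about one-eighth of the serial time rather than one-tenth. The model ignores resource contention entirely; it only shows why fixed overhead swamps a small job and barely dents a large one.
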
If you are low on resources already (i.e., your resources are fully utilized), adding parallel execution would likely make things worse, not better. Parallel execution is excellent when you have a really big job and plenty of excess capacity. In this chapter, we’ll take a look at some of the ways we can exploit those resources.

Parallel Query

Parallel query allows a single SQL SELECT statement to be divided into many smaller queries, with each component query being run concurrently, and then the results from each combined to provide the final answer. For example, consider the following query:

big_table@ORA10G> select count(status) from big_table;

Using parallel query, this query could use some number of parallel sessions; break the BIG_TABLE into small, nonoverlapping slices; and ask each parallel session to read the table and count its section of rows. The parallel query coordinator for this session would then receive each of the aggregated counts from the individual parallel sessions and further aggregate them, returning the final answer to the client application. Graphically, it might look like Figure 14-1.

The P000, P001, P002, and P003 processes are known as parallel execution servers, sometimes also referred to as parallel query (PQ) slaves. Each of these parallel execution servers is a separate session connected as if it were a dedicated server process. Each one is responsible for scanning a nonoverlapping region of BIG_TABLE, aggregating its subset of the results, and sending its output back to the coordinating server—the original session’s server process—which will aggregate the subresults into the final answer.
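
The divide, count, and combine flow described above can be mimicked outside the database. What follows is a conceptual Python sketch only: it uses threads and an in-memory list, and it says nothing about how Oracle actually slices a table or passes results between processes. The names split_into_slices, count_status, and parallel_count are hypothetical, as is the sample data; None stands in for a NULL STATUS, which count(status) does not count.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_slices(total_rows: int, workers: int) -> list:
    """Divide row positions 0..total_rows-1 into contiguous, nonoverlapping ranges."""
    base, extra = divmod(total_rows, workers)
    slices, start = [], 0
    for i in range(workers):
        size = base + (1 if i < extra else 0)  # spread any remainder over the first slices
        slices.append(range(start, start + size))
        start += size
    return slices


def count_status(table: list, row_slice: range) -> int:
    """One worker's job: count the non-null STATUS values in its own slice only."""
    return sum(1 for i in row_slice if table[i] is not None)


def parallel_count(table: list, workers: int = 4) -> int:
    """Coordinator: hand out the slices, then add up the partial counts."""
    slices = split_into_slices(len(table), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda s: count_status(table, s), slices))
    return sum(partials)


# Hypothetical stand-in for the STATUS column of BIG_TABLE.
big_table = ["VALID", None, "INVALID", "VALID", None, "VALID", "VALID"] * 1000

print(parallel_count(big_table, workers=4))
```

Each worker returns a single number rather than its rows, so the coordinator only has to add four partial counts. That mirrors the point made above: the aggregation happens first in the parallel execution servers and then once more in the coordinating session.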