Expert Oracle Database Architecture: 9i and 10g Programming Techniques and Solutions (Apress, September 2005)


CHAPTER 14 ■ PARALLEL EXECUTION

So, before applying parallel execution, you need the following two things to be true:

• You must have a very large task, such as the full scan of 50GB of data.
• You must have sufficient available resources. Before parallel full scanning 50GB of data, you would want to make sure that there is sufficient free CPU (to accommodate the parallel processes) as well as sufficient I/O. The 50GB should be spread over more than one physical disk to allow for many concurrent read requests to happen simultaneously, there should be sufficient I/O channels from the disk to the computer to retrieve the data from disk in parallel, and so on.

If you have a small task, as generally typified by the queries carried out in an OLTP system, or you have insufficient available resources, again as is typical in an OLTP system where CPU and I/O resources are often already used to their maximum, then parallel execution is not something you’ll want to consider.

A Parallel Processing Analogy

I often use an analogy to describe parallel processing and why you need both a large task and sufficient free resources in the database. It goes like this: suppose you have two tasks to complete. The first is to write a one-page summary of a new product. The other is to write a ten-chapter comprehensive report, with each chapter being very much independent of the others. For example, consider this book. This chapter, “Parallel Execution,” is very much separate and distinct from the chapter titled “Redo and Undo”—they did not have to be written sequentially. How do you approach each task? Which one do you think would benefit from parallel processing?

One-Page Summary

In this analogy, the one-page summary you have been assigned is not a large task. You would either do it yourself or assign it to a single individual. Why? Because the amount of work required to “parallelize” this process would exceed the work needed just to write the paper yourself. You would have to sit down, figure out that there should be 12 paragraphs, determine that each paragraph is not dependent on the other paragraphs, hold a team meeting, pick 12 individuals, explain to them the problem and assign each person a paragraph, act as the coordinator and collect all of their paragraphs, sequence them into the right order, verify they are correct, and then print the report. This is all likely to take longer than it would to just write the paper yourself, serially. The overhead of managing a large group of people on a project of this scale will far outweigh any gains to be had from having the 12 paragraphs written in parallel.

The exact same principle applies to parallel execution in the database. If you have a job that takes seconds or less to complete serially, then the introduction of parallel execution and its associated managerial overhead will likely make the entire thing take longer.

Ten-Chapter Report

Now let’s examine the second task. If you want that ten-chapter report fast—as fast as possible—the slowest way to accomplish it would be to assign all of the work to a single individual (trust me, I know—look at this book! Some days I wished there were 15 of me working on it).

Here you would hold the meeting, review the process, assign the work, act as the coordinator, collect the results, bind up the finished report, and deliver it. It would not have been done in one-tenth the time, but perhaps one-eighth or so. Again, I say this with the proviso that you have sufficient free resources. If you have a large staff that is currently not actually doing anything, then splitting up the work makes complete sense.

However, consider that as the manager, your staff is multitasking and they have a lot on their plates. In that case, you have to be careful with that big project. You need to be sure not to overwhelm your staff; you don’t want to work them beyond the point of exhaustion. You can’t delegate out more work than your resources (your people) can cope with; otherwise, they’ll quit. If your staff is already fully utilized, adding more work will cause all schedules to slip and all projects to be delayed.

Parallel execution in Oracle is very much the same. If you have a task that takes many minutes, hours, or days, then the introduction of parallel execution may be the thing that makes it run eight times faster. But then again, if you are already seriously low on resources (the overworked team of people), then the introduction of parallel execution would be something to avoid, as the system will become even more bogged down. While the Oracle server processes won’t “quit” in protest, they could start running out of RAM and failing, or just suffer from such long waits for I/O or CPU as to make it appear as if they were doing no work whatsoever.

If you keep that in mind, remembering never to take an analogy to illogical extremes, you’ll have the commonsense guiding rule to see if parallelism can be of some use. If you have a job that takes seconds, it is doubtful that parallel execution can be used to make it go faster—the converse would be more likely. If you are low on resources already (i.e., your resources are fully utilized), adding parallel execution would likely make things worse, not better. Parallel execution is excellent for when you have a really big job and plenty of excess capacity. In this chapter, we’ll take a look at some of the ways we can exploit those resources.

Parallel Query

Parallel query allows a single SQL SELECT statement to be divided into many smaller queries, with each component query being run concurrently, and then the results from each combined to provide the final answer. For example, consider the following query:

big_table@ORA10G> select count(status) from big_table;

Using parallel query, this query could use some number of parallel sessions; break the BIG_TABLE into small, nonoverlapping slices; and ask each parallel session to read the table and count its section of rows. The parallel query coordinator for this session would then receive each of the aggregated counts from the individual parallel sessions and further aggregate them, returning the final answer to the client application. Graphically, it might look like Figure 14-1.

The P000, P001, P002, and P003 processes are known as parallel execution servers, sometimes also referred to as parallel query (PQ) slaves. Each of these parallel execution servers is a separate session connected as if it were a dedicated server process. Each one is responsible for scanning a nonoverlapping region of BIG_TABLE, aggregating its result subset, and sending back its output to the coordinating server—the original session’s server process—which will aggregate the subresults into the final answer.
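As a minimal sketch (not taken from the book’s own example scripts), one common way to request parallel query for a statement like this is with a PARALLEL hint; the degree of 4 used here is purely illustrative, as is running the example against the same BIG_TABLE. While such a query is executing, the parallel execution servers the instance has started can be observed in the V$PX_PROCESS view from another session:

big_table@ORA10G> select /*+ parallel(big_table, 4) */ count(status) from big_table;

big_table@ORA10G> select server_name, status from v$px_process;

If the hinted query is in fact executed in parallel, the second query would list servers with names along the lines of P000 through P003, corresponding to the parallel execution servers described above.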

