Beginning Microsoft SQL Server 2008 ... - S3 Tech Training


cdn.s3techtraining.com
17.06.2013

Chapter 9: SQL Server Storage and Index Structures

But the overhead doesn’t stop there. Since we’re in a tree arrangement, you have the possibility for something of a cascading action. When you create the new page (because of the split), you need to make another entry in the parent node. This entry in the parent node also has the potential to cause a page split at that level, and the process starts all over again. Indeed, this possibility extends all the way up to and can even affect the root node.

If the root node splits, then you actually end up creating two additional pages. Because there can be only one root node, the page that was formerly the root node is split into two pages and becomes a new intermediate level of the tree. An entirely new root node is then created, and will have two entries (one to the old root node, one to the split page).

Needless to say, page splits can have a very negative impact on system performance and are characterized by behavior where your process on the server seems to just pause for a few seconds (while the pages are being split and rewritten).

We will talk about page-split prevention before we’re done with this chapter.

While page splits at the leaf level are a common fact of life, page splits at intermediate nodes happen far less frequently. As your table grows, every layer of the index will experience page splits, but because the intermediate nodes have only one entry for several entries on the next lower node, page splits become less and less frequent as you move further up the tree. Still, for a split to occur above the leaf level, there must have already been a split at the next lower level; this means that page splits up the tree are cumulative (and expensive performance-wise) in nature.

SQL Server has a number of different types of indexes (which we will discuss shortly), but they all make use of this B-Tree approach in some way or another.
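To make the cascading behavior concrete, here is a minimal Python sketch of a toy B-Tree that counts page splits at each level. It is an illustration under stated assumptions, not SQL Server's actual storage engine: the names `Page`, `BTree`, and `PAGE_CAPACITY` are hypothetical, a page holds only four entries (real pages are far larger), and keys are assumed to be unique.

```python
import bisect

PAGE_CAPACITY = 4  # hypothetical: max entries per page, kept tiny to force splits


class Page:
    def __init__(self, leaf, keys=None, children=None):
        self.leaf = leaf
        self.keys = keys or []          # sorted keys (leaf) or separator keys (non-leaf)
        self.children = children or []  # child pages; empty on leaf pages


class BTree:
    def __init__(self):
        self.root = Page(leaf=True)
        self.height = 1
        self.splits_by_level = {}       # level 0 is the leaf level

    def insert(self, key):
        promoted = self._insert(self.root, key, self.height - 1)
        if promoted is not None:
            # The old root was split in two, so a brand-new root is created
            # with two entries: one for the old root, one for the split page.
            separator, right = promoted
            self.root = Page(leaf=False, keys=[separator],
                             children=[self.root, right])
            self.height += 1

    def _insert(self, page, key, level):
        if page.leaf:
            bisect.insort(page.keys, key)
        else:
            idx = bisect.bisect_right(page.keys, key)
            promoted = self._insert(page.children[idx], key, level - 1)
            if promoted is None:
                return None
            # A split below means one more entry in this (parent) page,
            # which may in turn overflow and split.
            separator, right = promoted
            page.keys.insert(idx, separator)
            page.children.insert(idx + 1, right)
        if len(page.keys) <= PAGE_CAPACITY:
            return None
        return self._split(page, level)

    def _split(self, page, level):
        self.splits_by_level[level] = self.splits_by_level.get(level, 0) + 1
        mid = len(page.keys) // 2
        if page.leaf:
            right = Page(leaf=True, keys=page.keys[mid:])
            page.keys = page.keys[:mid]
            return right.keys[0], right
        separator = page.keys[mid]
        right = Page(leaf=False, keys=page.keys[mid + 1:],
                     children=page.children[mid + 1:])
        page.keys = page.keys[:mid]
        page.children = page.children[:mid + 1]
        return separator, right

    def keys_in_order(self):
        def walk(page):
            if page.leaf:
                yield from page.keys
            else:
                for child in page.children:
                    yield from walk(child)
        return list(walk(self.root))


tree = BTree()
for key in range(1000):
    tree.insert(key)
print(tree.height, tree.splits_by_level)
```

Running it with sequential keys shows the pattern described above: the split count shrinks at every level as you move up the tree, and the tree only gains a level when the root itself splits.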
All of these index types are very similar in structure, thanks to the flexible nature of a B-Tree. Still, we shall see that there are some significant differences, and these can have an impact on the performance of our system.

For an SQL Server index, the nodes of the tree come in the form of pages, but you can actually apply this concept of a root node, the non-leaf level, the leaf level, and the tree structure to more than just SQL Server or even just databases.

How Data Is Accessed in SQL Server

In the broadest sense, there are only two ways in which SQL Server retrieves the data you request:

❑ Using a table scan
❑ Using an index

Which method SQL Server uses to run your particular query will depend on what indexes are available, what columns you are asking about, what kind of joins you are doing, and the size of your tables.

Use of Table Scans

A table scan is a pretty straightforward process. When a table scan is performed, SQL Server starts at the physical beginning of the table, looking through every row in the table. As it finds rows that match the criteria of your query, it includes them in the result set.
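As a rough illustration of the table scan just described, the following Python sketch walks a list of rows from start to finish and counts how many rows it reads. The `Row` fields and the `table_scan` function are hypothetical names chosen for the example, and an in-memory list stands in for the table's physical storage.

```python
from dataclasses import dataclass


@dataclass
class Row:
    # Hypothetical columns, for illustration only.
    order_id: int
    customer: str
    total: float


def table_scan(rows, predicate):
    """Read every row from the physical start of the table; keep the matches."""
    rows_read = 0
    result = []
    for row in rows:            # physical order, no shortcuts
        rows_read += 1
        if predicate(row):
            result.append(row)
    return result, rows_read


rows = [Row(i, "acme" if i % 10 == 0 else "other", float(i))
        for i in range(1, 101)]
matches, rows_read = table_scan(rows, lambda r: r.customer == "acme")
print(len(matches), rows_read)
```

However selective the predicate is, the scan reads the whole table; only the size of the result set changes.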

You may hear lots of bad things about table scans, and in general, they will be true. However, table scans can actually be the fastest method of access in some instances. Typically, this is the case when retrieving data from rather small tables. The exact size where this becomes the case will vary widely according to the width of your table and the specific nature of the query.

See if you can spot why the use of EXISTS in the WHERE clause of your queries has so much to offer performance-wise when it fits the problem. When you use the EXISTS operator, SQL Server stops as soon as it finds one record that matches the criteria. If you had a million-record table and it found a matching record on the third record, then use of the EXISTS option would have saved you the reading of 999,997 records! NOT EXISTS works in much the same way.

Use of Indexes

When SQL Server decides to use an index, the process actually works somewhat similarly to a table scan, but with a few shortcuts. During the query optimization process, the optimizer takes a look at all the available indexes and chooses the best one (this is primarily based on the information you specify in your joins and WHERE clause, combined with statistical information SQL Server keeps on index makeup). Once that index is chosen, SQL Server navigates the tree structure to the point of data that matches your criteria and again extracts only the records it needs. The difference is that since the data is sorted, the query engine knows when it has reached the end of the current range it is looking for. It can then end the query, or move on to the next range of data as necessary.

If you ponder the query topics we’ve studied thus far (Chapter 7 specifically), you may notice some striking resemblances to how the EXISTS option works. The EXISTS keyword allowed a query to quit running the instant that it found a match.
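The two "stop early" behaviors above can be sketched side by side in Python. This is an analogy under stated assumptions rather than a model of the query engine: `exists_scan` and `index_range_seek` are hypothetical names, and a binary search over a sorted list (via the standard `bisect` module) stands in for navigating the index tree.

```python
import bisect


def exists_scan(rows, predicate):
    """EXISTS-style check: stop at the first match. Returns (found, rows_read)."""
    rows_read = 0
    for row in rows:
        rows_read += 1
        if predicate(row):
            return True, rows_read
    return False, rows_read


def index_range_seek(sorted_keys, low, high):
    """Seek to the first key >= low, then read forward until a key exceeds high."""
    position = bisect.bisect_left(sorted_keys, low)   # the "seek"
    matches = []
    entries_read = 0
    while position < len(sorted_keys):
        key = sorted_keys[position]
        entries_read += 1
        if key > high:      # sorted order: nothing further can match
            break
        matches.append(key)
        position += 1
    return matches, entries_read


table = list(range(1, 1_000_001))         # a million-record "table"

found, rows_read = exists_scan(table, lambda value: value == 3)
print(found, rows_read, len(table) - rows_read)

matches, entries_read = index_range_seek(table, 500, 509)
print(matches, entries_read)
```

With the match sitting on the third record, the EXISTS-style scan skips the remaining 999,997 records, the figure quoted above. The range seek reads only the ten matching entries plus the one entry that tells it the range has ended.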
The performance gains from using an index are similar to or better than those of EXISTS, since the process of searching for data can work in a similar fashion; that is, the server can use the sort order of the index to know when there is nothing left that’s relevant and can stop things right there. Even better, however, is that by using an index, we don’t have to limit ourselves to Boolean situations (does the piece of data I was after exist, yes or no?). We can apply this same notion to both the beginning and end of a range. We are able to gather ranges of data with essentially the same benefits that using an index gives to finding data. What’s more, we can do a very fast lookup (called a SEEK) of our data rather than hunting through the entire table.

Don’t get the impression from my comparing what indexes do to what the EXISTS operator does that indexes replace the EXISTS operator altogether (or vice versa). The two are not mutually exclusive; they can be used together, and often are. I mention them here together only because they have the similarity of being able to tell when their work is done, and quit before getting to the physical end of the table.

Index Types and Index Navigation

Although there are nominally two types of base index structures in SQL Server (clustered and non-clustered), there are actually, internally speaking, three different types:

❑ Clustered indexes
❑ Non-clustered indexes, which comprise:
    ❑ Non-clustered indexes on a heap
    ❑ Non-clustered indexes on a clustered index

