Beginning Microsoft SQL Server 2008 ... - S3 Tech Training


cdn.s3techtraining.com, 17.06.2013

Chapter 9: SQL Server Storage and Index Structures

As you can see, it looks essentially identical to the more generic B-Trees we discussed earlier in the chapter.

In this case, we're doing a range search (something clustered indexes are particularly good at) for numbers 158–400. All we have to do is navigate to the first record and include all remaining records on that page. We know we need the rest of that page because the information from the node one level up lets us know that we'll also need data from a few other pages. Because this is an ordered list, we can be sure it's continuous: if the next page has records that should be included, then the rest of this page must be included. We can just start spewing out data from those pages without having to do any verification.

We start off by navigating to the root node. SQL Server is able to locate the root node based on an entry that you can see in the system metadata view called sys.indexes.

By looking through the page that serves as the root node, we can figure out which page we need to examine next (the second page on the second level as we have it drawn here). We then continue the process. With each step we take down the tree, we are getting to smaller and smaller subsets of data. Eventually, we will get to the leaf level of the index. In the case of our clustered index, getting to the leaf level of the index means that we are also at our desired row(s) and our desired data.

I can't stress enough the importance of this distinction: with a clustered index, when you've fully navigated the index, you've fully navigated to your data. How much of a performance difference this can make will really become apparent as we look at non-clustered indexes, particularly when the non-clustered index is built over a clustered index.

Non-Clustered Indexes on a Heap

Non-clustered indexes on a heap work very similarly to clustered indexes in most ways.
They do, however, have a few notable differences:

The leaf level is not the data. Instead, it is the level at which you are able to obtain a pointer to that data. This pointer comes in the form of a row identifier or RID, which, as we described earlier in the chapter, is made up of the extent, page, and row offset for the particular row being pointed to by the index. Even though the leaf level is not the actual data (instead, it has the RID), we have only one more step than with a clustered index. Because the RID has the full information on the location of the row, we can go directly to the data.

Don't, however, misunderstand this "one more step" to mean that there's only a small amount of overhead difference and that non-clustered indexes on a heap will run close to as fast as a clustered index. With a clustered index, the data is physically in the order of the index. That means, for a range of data, when you find the row that has the beginning of your data on it, there's a good chance that the other rows are on that page with it (that is, you're already physically almost at the next record, since they are stored together). With a heap, the data is not linked together in any way other than through the index. From a physical standpoint, there is absolutely no sorting of any kind. This means that, from a physical read standpoint, your system may have to retrieve records from all over the file. Indeed, it's quite possible (even probable) that you will wind up fetching data from the same page several separate times. SQL Server has no way of knowing it will have to come back to that physical location because there was no link between the data. With the clustered index, it knows that's the physical sort, and can therefore grab it all in just one visit to the page.

Just to be fair to the non-clustered index on a heap here vs. the clustered index, the odds are extremely high that any page that was already read once will still be in the memory cache and, as such, will be retrieved extremely quickly. Still, it does add some additional logical operations to retrieve the data.

Figure 9-6 shows the same search we performed on the clustered index, only with a non-clustered index on a heap this time. Through most of the index navigation, things work exactly as they did before. We start out at the same root node, and we traverse the tree dealing with more and more focused pages until we get to the leaf level of our index. This is where we run into the difference. With a clustered index, we could have stopped right here, but with a non-clustered index, we have more work to do. If the non-clustered index is on a heap, then we have just one more level to go. We take the Row ID from the leaf level page and navigate to it. It is not until this point that we are at our actual data.

[Figure 9-6: Non-clustered index on a heap, looking for records 158 through 400. Levels shown: Root, Non-Leaf Level, Leaf Level, Data Pages.]

Non-Clustered Indexes on a Clustered Table

With non-clustered indexes on a clustered table, the similarities continue, but so do the differences. Just as with non-clustered indexes on a heap, the non-leaf level of the index looks pretty much as it did for a clustered index. The difference does not come until we get to the leaf level.
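The two lookups described so far are a clustered index range search and a non-clustered index on a heap. They can be compared side by side in a small model. What follows is a minimal Python sketch, not SQL Server's real page format or engine code. `PAGE_CAPACITY`, the `Page` class, the helper functions and the sample rows are all hypothetical. A RID is simplified to a (page number, slot) pair. The clustered index returns rows straight from its leaf pages and reads each leaf page once. The heap-based index makes one extra hop per matching key and can return to the same data page more than once.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional

PAGE_CAPACITY = 3  # hypothetical: tiny pages so the example tree has more than one level


@dataclass
class Page:
    entries: list  # (key, payload) pairs, sorted by key
    is_leaf: bool
    next_page: Optional["Page"] = None  # leaf pages are chained in key order


def build_tree(sorted_entries):
    """Build a B-tree-style index bottom-up from non-empty, key-sorted (key, payload) pairs."""
    leaves = [Page(sorted_entries[i:i + PAGE_CAPACITY], True)
              for i in range(0, len(sorted_entries), PAGE_CAPACITY)]
    for left, right in zip(leaves, leaves[1:]):
        left.next_page = right
    level = leaves
    while len(level) > 1:
        # Each parent entry records a child page's first key and points at that child.
        parent_entries = [(child.entries[0][0], child) for child in level]
        level = [Page(parent_entries[i:i + PAGE_CAPACITY], False)
                 for i in range(0, len(parent_entries), PAGE_CAPACITY)]
    return level[0]  # the root page


def seek(root, key):
    """Walk from the root down to the leaf page that should hold `key` (keys assumed unique)."""
    page = root
    while not page.is_leaf:
        first_keys = [k for k, _ in page.entries]
        child_index = max(bisect_right(first_keys, key) - 1, 0)
        page = page.entries[child_index][1]
    return page


def range_scan(root, low, high, visited=None):
    """Seek once to the leaf holding `low`, then follow the leaf chain until a key passes `high`."""
    page = seek(root, low)
    while page is not None:
        if visited is not None:
            visited.append(page)  # record every leaf page we read
        for key, payload in page.entries:
            if key > high:
                return
            if key >= low:
                yield payload
        page = page.next_page


def build_heap(rows):
    """Store rows in arrival order (no sorting); return the pages plus sorted (key, RID) pairs."""
    pages = [rows[i:i + PAGE_CAPACITY] for i in range(0, len(rows), PAGE_CAPACITY)]
    rid_entries = [(row[0], (page_no, slot))
                   for page_no, page in enumerate(pages)
                   for slot, row in enumerate(page)]
    return pages, sorted(rid_entries)


def heap_range_lookup(heap_pages, index_root, low, high):
    """Non-clustered index on a heap: the leaf yields a RID, then one extra hop fetches the row."""
    found, page_fetches = [], []
    for page_no, slot in range_scan(index_root, low, high):
        page_fetches.append(page_no)  # every RID lookup visits a data page
        found.append(heap_pages[page_no][slot])
    return found, page_fetches


# Hypothetical sample rows (key, name), inserted in no particular order.
rows = [(270, "Alice"), (158, "Bruno"), (401, "Chen"), (300, "Dana"), (157, "Eve"),
        (200, "Farid"), (399, "Gita"), (410, "Hugo"), (159, "Ines")]

# Clustered index: the leaf pages are the data pages, stored in key order.
clustered_root = build_tree(sorted((row[0], row) for row in rows))
leaf_pages_read = []
clustered_hits = list(range_scan(clustered_root, 158, 400, leaf_pages_read))

# Non-clustered index on a heap: the leaf holds (key, RID); the rows live elsewhere.
heap_pages, rid_entries = build_heap(rows)
nc_root = build_tree(rid_entries)
heap_hits, data_page_fetches = heap_range_lookup(heap_pages, nc_root, 158, 400)
```

Both searches use the sample rows and look for keys 158 through 400, and both return the same six rows. The clustered scan reads three leaf pages, once each. The heap lookup makes six data-page fetches that land on only three distinct pages, so each page is visited twice. In SQL Server those repeat visits would usually be served from the memory cache. That matches the point above: the cost shows up as extra logical operations rather than extra disk reads.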

