Beginning Microsoft SQL Server 2008 ... - S3 Tech Training
Chapter 9: SQL Server Storage and Index Structures<br />
As you can see, it looks essentially identical to the more generic B-Trees we discussed earlier in the chapter.<br />
In this case, we’re doing a range search (something clustered indexes are particularly good at) for<br />
numbers 158–400. All we have to do is navigate to the first record and include all remaining records on<br />
that page. We know we need the rest of that page because the information from the node one level up<br />
lets us know that we’ll also need data from a few other pages. Because this is an ordered list, we can be<br />
sure it’s continuous — that means if the next page has records that should be included, then the rest of<br />
this page must be included. We can just start spewing out data from those pages without having to do<br />
any verification.<br />
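The range search just described can be sketched in a few lines of Python. This is only a toy model, not SQL Server's actual implementation: the page size of 52 rows, the key range 1 to 500, and the names `build_leaf_pages` and `range_scan` are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

def build_leaf_pages(keys, rows_per_page):
    """Split sorted keys into fixed-size 'pages' (hypothetical page size)."""
    keys = sorted(keys)
    return [keys[i:i + rows_per_page] for i in range(0, len(keys), rows_per_page)]

def range_scan(leaf_pages, low, high):
    """Return the keys in [low, high] and the number of leaf pages touched."""
    # The first key of each page plays the role of the entries one level up.
    first_keys = [page[0] for page in leaf_pages]
    # Navigate to the page that holds the first qualifying record.
    start = max(bisect_right(first_keys, low) - 1, 0)
    rows, pages_touched = [], 0
    for page in leaf_pages[start:]:
        if page[0] > high:
            break                        # past the end of the range
        pages_touched += 1
        lo = bisect_left(page, low)      # 0 on every page after the first
        hi = bisect_right(page, high)    # len(page) on every page before the last
        rows.extend(page[lo:hi])         # middle pages are taken whole
    return rows, pages_touched

leaf_pages = build_leaf_pages(range(1, 501), rows_per_page=52)
rows, pages_touched = range_scan(leaf_pages, 158, 400)
print(len(rows), "rows from", pages_touched, "pages")
```

Only the first and last pages need any searching inside the page; everything in between is copied out as is, which is the "no verification" behavior described above.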
We start off by navigating to the root node. SQL Server is able to locate the root node based on an entry<br />
that you can see in the system metadata view called sys.indexes.<br />
By looking through the page that serves as the root node, we can figure out what the next page we need<br />
to examine is (the second page on the second level as we have it drawn here). We then continue the<br />
process. With each step we take down the tree, we are getting to smaller and smaller subsets of data.<br />
Eventually, we will get to the leaf level of the index. In the case of our clustered index, getting to the leaf<br />
level of the index means that we are also at our desired row(s) and our desired data.<br />
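As a rough illustration of the walk from the root down to the leaf level, here is a minimal Python sketch. The `Node` class, the key values, and the "row N" payloads are hypothetical; a real index page holds far more entries and is located through page pointers rather than Python references.

```python
from bisect import bisect_right
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list                                      # non-leaf: first key of each child; leaf: row keys
    children: list = field(default_factory=list)    # empty at the leaf level
    rows: dict = field(default_factory=dict)        # leaf only: key -> row data

def leaf(keys):
    """Build a leaf page whose rows are placeholder strings."""
    return Node(keys=keys, rows={k: f"row {k}" for k in keys})

def find_leaf(root, key):
    """Walk from the root to the leaf page that would hold `key`."""
    node, path = root, [root]
    while node.children:
        # Pick the last child whose first key is <= the key we want.
        idx = max(bisect_right(node.keys, key) - 1, 0)
        node = node.children[idx]
        path.append(node)
    return node, path

mid1 = Node(keys=[1, 53, 104],
            children=[leaf([1, 2, 52]), leaf([53, 54, 103]), leaf([104, 105, 156])])
mid2 = Node(keys=[157, 270, 401],
            children=[leaf([157, 158, 269]), leaf([270, 271, 400]), leaf([401, 410, 411])])
root = Node(keys=[1, 157], children=[mid1, mid2])

found, path = find_leaf(root, 158)
print(len(path), "pages read;", found.rows[158])
```

Because this models a clustered index, the leaf page returned by `find_leaf` already contains the row itself, so no further lookup is needed.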
I can’t stress enough the importance of this distinction: With a clustered index, when<br />
you’ve fully navigated the index, you’ve fully navigated to your data. How much of<br />
a performance difference this can make will become clear as we look at non-clustered<br />
indexes, particularly when the non-clustered index is built over a clustered<br />
index.<br />
Non-Clustered Indexes on a Heap<br />
Non-clustered indexes on a heap work very similarly to clustered indexes in most ways. They do, however,<br />
have a few notable differences:<br />
The leaf level is not the data. Instead, it is the level at which you are able to obtain a pointer to that<br />
data. This pointer comes in the form of a row identifier, or RID (introduced earlier in the chapter),<br />
which records the file, page, and slot of the particular row being pointed to by the<br />
index. Even though the leaf level is not the actual data (instead, it has the RID), we have only one more<br />
step than with a clustered index. Because the RID has the full information on the location of the row, we<br />
can go directly to the data.<br />
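The extra step can be pictured with a small Python sketch. Everything here is made up for illustration: the `RID` tuple is simplified to a page number and a slot, and the page numbers, keys, and names are hypothetical rather than anything SQL Server exposes.

```python
from bisect import bisect_left
from typing import NamedTuple

class RID(NamedTuple):
    page: int   # which heap page holds the row (simplified; no file component)
    slot: int   # position of the row within that page

# Heap pages: rows sit in no particular order.
heap_pages = {
    3: ["Sue", "Bob"],
    7: ["Tony", "Ralph"],
    9: ["Ashley"],
}

# Leaf level of a non-clustered index: sorted keys, each paired with a RID.
index_leaf = [
    (157, RID(3, 1)),
    (158, RID(7, 0)),
    (269, RID(9, 0)),
    (270, RID(3, 0)),
    (271, RID(7, 1)),
]

def fetch_row(index_leaf, heap_pages, key):
    """Find `key` at the index leaf level, then follow its RID into the heap."""
    keys = [k for k, _ in index_leaf]
    i = bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return None                      # key is not in the index
    rid = index_leaf[i][1]
    return heap_pages[rid.page][rid.slot]   # the one extra step

print(fetch_row(index_leaf, heap_pages, 158))
```

Note that the leaf entry only tells us where the row lives; the row is not read until the final line of `fetch_row` goes to the heap page itself.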
Don’t, however, misunderstand this “one more step” to mean that there’s only a small amount of overhead<br />
difference and that non-clustered indexes on a heap will run close to as fast as a clustered index.<br />
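To put some rough numbers behind that warning, the following Python sketch counts page visits for the same range search under both layouts. It is a toy model under stated assumptions: 500 rows, 50 rows per page, and a seeded shuffle standing in for the arbitrary placement of rows in a heap. It counts visits to pages, not physical disk reads.

```python
import random

ROWS_PER_PAGE = 50                    # hypothetical page capacity
ALL_KEYS = list(range(1, 501))        # hypothetical table with keys 1-500
WANTED = range(158, 401)              # the range search used in this section

# Clustered index: rows are stored in key order, so key k sits on page (k - 1) // 50.
clustered_page_of = {k: (k - 1) // ROWS_PER_PAGE for k in ALL_KEYS}

# Heap: rows sit wherever they happened to land; a seeded shuffle fakes that.
shuffled = ALL_KEYS[:]
random.Random(0).shuffle(shuffled)
heap_page_of = {k: pos // ROWS_PER_PAGE for pos, k in enumerate(shuffled)}

def clustered_page_visits(wanted):
    """Each page is visited once and every wanted row on it is taken in that visit."""
    return len({clustered_page_of[k] for k in wanted})

def heap_page_visits(wanted):
    """One RID lookup per row: return (total visits, distinct pages visited)."""
    visits = [heap_page_of[k] for k in wanted]
    return len(visits), len(set(visits))

clustered = clustered_page_visits(WANTED)
heap_total, heap_distinct = heap_page_visits(WANTED)
print("clustered index:", clustered, "page visits")
print("heap via RIDs:  ", heap_total, "page visits over", heap_distinct, "distinct pages")
```

In this model the heap needs one page visit per row, and the same few pages are visited again and again, while the clustered layout touches each page in the range exactly once.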
With a clustered index, the data is physically in the order of the index. That means, for a range of data,<br />
when you find the row that has the beginning of your data on it, there’s a good chance that the other<br />
rows are on that page with it (that is, you’re already physically almost to the next record since they are<br />
stored together). With a heap, the data is not linked together in any way other than through the index.<br />
From a physical standpoint, there is absolutely no sorting of any kind. This means that from a physical<br />
read standpoint, your system may have to retrieve records from all over the file. Indeed, it’s quite possible<br />
(perhaps even probable) that you will wind up fetching data from the same page several separate<br />
times. SQL Server has no way of knowing it will have to come back to that physical location because