Beginning Microsoft SQL Server 2008 ... - S3 Tech Training
Chapter 9: SQL Server Storage and Index Structures

The Pros

Clustered indexes are best for queries when the column(s) in question will frequently be the subject of a ranged query. This kind of query is typified by use of the BETWEEN keyword or the < and > operators. Queries that use a GROUP BY and make use of the MAX, MIN, and COUNT aggregate functions are also great examples of queries that use ranges and love clustered indexes. Clustering works well here because the search can go straight to a particular point in the physical data, keep reading until it gets to the end of the range, and then stop. It is extremely efficient.

Clusters can also be excellent when you want your data sorted (using ORDER BY) based on the cluster key.

The Cons

There are two situations in which you don’t want to create that clustered index. The first is fairly obvious: when there’s a better place to use it. I know I’m sounding repetitive here, but don’t use a clustered index on a column just because it seems like the thing to do (primary keys are the common culprit here). Be sure first that there isn’t another column it’s better suited to.

Perhaps the much bigger no-no, however, is using a clustered index when you are going to be doing a lot of inserts in a non-sequential order. Remember that concept of page splits? Well, here’s where it can come back and haunt you big time.

Imagine this scenario: You are creating an accounting system. You would like to make use of the concept of a transaction number for your primary key in your transaction files, but you would also like those transaction numbers to be somewhat indicative of what kind of transaction it is (it really helps troubleshooting for your accountants). So you come up with something of a scheme: you’ll place a prefix on all the transactions indicating what sub-system they come out of.
They will look something like this:

ARXXXXXX   Accounts Receivable Transactions
GLXXXXXX   General Ledger Transactions
APXXXXXX   Accounts Payable Transactions

where XXXXXX will be a sequential numeric value.

This seems like a great idea, so you implement it, leaving the primary key with its default clustered index. At first glance, everything about this setup looks fine. You’re going to have unique values, and the accountants will love the fact that they can infer where something came from based on the transaction number. The clustered index seems to make sense since they will often be querying for ranges of transaction IDs.

Ah, if only it were that simple. Think about your inserts for a bit. With a clustered index, we originally had a nice mechanism to avoid much of the overhead of page splits. When a new record was inserted that was to go after the last record in the table, then even if there was a page split, only that record would go to the new page; SQL Server wouldn’t try to move around any of the old data. Now we’ve messed things up, though. New records inserted from the General Ledger will wind up going on the end of the file just fine (GL is last alphabetically, and the numbers will be sequential). The AR and AP transactions have a major problem, though: they are going to be doing non-sequential inserts. When AP000025 gets inserted and there
isn’t room on the page, SQL Server is going to see AR000001 in the table and know that it’s not a sequential
insert. Roughly half the records from the old page will be moved to a new page before AP000025 is inserted.
The overhead of this can be staggering. Remember that we’re dealing with a clustered index, and that
the clustered index is the data. The data is in index order. This means that when you move the index to a
new page, you are also moving the data. Now imagine that you’re running this accounting system in a
typical OLTP environment (you don’t get much more OLTP-like than an accounting system) with a
bunch of data-entry people keying in vendor invoices or customer orders as fast as they can. You’re going
to have page splits occurring constantly, and every time one occurs, you’re going to see a brief hesitation for
users of that table while the system moves data around.
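
To make that cost concrete, here is a toy Python model of a clustered index's leaf level. It is an illustration under stated assumptions, not SQL Server's actual storage engine: each page is a sorted list of keys with a hypothetical capacity of four rows, an insert that lands past the last row of the table starts a new page without moving anything, and any other insert into a full page moves the upper half of that page to a new page, as the text describes. The names `insert`, `rows_moved`, and `PAGE_CAPACITY` are made up for this sketch.

```python
from bisect import bisect_right, insort

# Hypothetical capacity. Real SQL Server data pages are 8 KB, so the
# number of rows per page depends on row size.
PAGE_CAPACITY = 4


def insert(pages: list[list[str]], key: str) -> int:
    """Insert `key` into sorted leaf pages; return how many existing rows had to move."""
    if not pages:
        pages.append([key])
        return 0
    # Locate the page whose key range should hold the new key (clustered order).
    firsts = [page[0] for page in pages]
    idx = max(bisect_right(firsts, key) - 1, 0)
    page = pages[idx]
    if len(page) < PAGE_CAPACITY:
        insort(page, key)  # room on the page: no split
        return 0
    if idx == len(pages) - 1 and key > page[-1]:
        # Past the last row of the table: only the new row goes to a new page.
        pages.append([key])
        return 0
    # Mid-table split: move the upper half of the full page to a new page.
    half = len(page) // 2
    upper = page[half:]
    del page[half:]
    pages.insert(idx + 1, upper)
    insort(page if key < upper[0] else upper, key)
    return len(upper)


def rows_moved(keys: list[str]) -> tuple[int, int]:
    """Insert keys in arrival order; return (rows moved by splits, final page count)."""
    pages: list[list[str]] = []
    moved = sum(insert(pages, k) for k in keys)
    return moved, len(pages)


# Transactions arrive interleaved from the three sub-systems...
prefixed = [f"{p}{n:06d}" for n in range(1, 11) for p in ("AR", "GL", "AP")]
# ...versus an identity-style key that only ever grows.
ascending = [f"{n:08d}" for n in range(1, 31)]

print("prefixed keys:  (rows moved, pages) =", rows_moved(prefixed))
print("ascending keys: (rows moved, pages) =", rows_moved(ascending))
```

With the prefixed keys, AP and AR rows keep landing on full pages in the middle of the table, so existing rows get shuffled; with the ascending key, every new row lands at the end and no existing row ever moves.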

Fortunately, there are a couple of ways to avoid this scenario:
❑ Choose a cluster key whose values will be inserted in sequential order. You can either create an identity
column for this, or you may already have another column whose values logically increase with every
transaction entered, regardless of the sub-system.
❑ Choose not to use a clustered index on this table. This is often the best option in a situation like
this, since an insert into a non-clustered index on a heap is usually faster than one on a cluster key.

Even as I’ve told you to lean toward sequential cluster keys to avoid page splits, you also have to realize
that there’s a cost there. Among the downsides of sequential cluster keys is concurrency: every new row
goes to the last page of the table, so two or more people inserting at the same time are all trying to get
to the same page at once. It’s all about balancing what you want, what you’re doing, and what it’s
going to cost you elsewhere.
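
The concurrency side of the trade-off can be sketched as well. The toy Python model below (again an illustration, not how SQL Server tracks pages, and it ignores locking entirely) takes a snapshot of the first key on each leaf page and counts which page each incoming insert would target. With an ascending key, every insert targets the last page, which is where concurrent data-entry sessions collide; the prefixed keys spread the same inserts across three regions of the table, at the price of the page splits described earlier. `ROWS_PER_PAGE`, `page_first_keys`, and `insert_targets` are hypothetical names.

```python
from bisect import bisect_right
from collections import Counter

ROWS_PER_PAGE = 10  # hypothetical page capacity for the snapshot


def page_first_keys(existing: list[str]) -> list[str]:
    """First key on each leaf page, assuming full pages in clustered (sorted) order."""
    ordered = sorted(existing)
    return [ordered[i] for i in range(0, len(ordered), ROWS_PER_PAGE)]


def insert_targets(firsts: list[str], new_keys: list[str]) -> Counter:
    """Count how many incoming inserts would land on each page index."""
    return Counter(max(bisect_right(firsts, k) - 1, 0) for k in new_keys)


# Ascending, identity-style keys: 100 existing rows, 20 new ones.
asc_pages = page_first_keys([f"{n:08d}" for n in range(1, 101)])
asc_hits = insert_targets(asc_pages, [f"{n:08d}" for n in range(101, 121)])

# Prefixed keys: 30 existing rows per sub-system, 6 new ones per sub-system.
pre_pages = page_first_keys([f"{p}{n:06d}" for p in ("AP", "AR", "GL") for n in range(1, 31)])
pre_hits = insert_targets(pre_pages, [f"{p}{n:06d}" for n in range(31, 37) for p in ("AR", "GL", "AP")])

print("ascending:", dict(asc_hits))  # all 20 inserts hit the last page
print("prefixed: ", dict(pre_hits))  # inserts split across three pages
```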

This is perhaps one of the best examples of why I have gone into so much depth about how things work.
You need to think through how things are actually going to get done before you can get a good feel for
which index to use (or not to use).

Column Order Matters

Just because an index has two columns, it doesn’t mean that the index is useful for any query that refers
to either column.

An index can only be used to seek directly to the rows you want if the first column listed in the index is
used in the query. The bright side is that there doesn’t have to be an exact one-for-one match to every
column, just the first. Naturally, the more columns that match (in order), the better, but only a missing
first column rules out a seek altogether.

Think about things this way. Imagine that you are using a phone book. Everything is indexed by last
name and then first name. Does this sorting do you any real good if all you know is that the person
you want to call is named Fred? On the other hand, if all you know is that his last name is Blake, the
index will still serve to narrow the field for you.

One of the more common mistakes that I see in index construction is the belief that one index that includes
all the columns is going to be helpful for all situations. In reality, what you’re doing is storing all the
data a second time. If the first column of the index isn’t mentioned in the JOIN, ORDER BY, or WHERE
clauses of the query, the index’s sort order does you no good; at best, SQL Server may scan the entire
index, which, for an index that holds every column, is little better than scanning the table itself.