Expert Oracle Database Architecture: 9i and 10g Programming Techniques and Solutions (Apress, September 2005)
CHAPTER 9 ■ REDO AND UNDO

Why Can’t I Allocate a New Log?

I get this question all of the time. You are getting warning messages to this effect (this will be found in alert.log on your server):

Thread 1 cannot allocate new log, sequence 1466
Checkpoint not complete
Current log# 3 seq# 1465 mem# 0: /home/ora10g/oradata/ora10g/redo03.log

It might say Archival required instead of Checkpoint not complete, but the effect is pretty much the same. This is really something the DBA should be looking out for. This message will be written to alert.log on the server whenever the database attempts to reuse an online redo log file and finds that it cannot. This will happen when DBWR has not yet finished checkpointing the data protected by the redo log or ARCH has not finished copying the redo log file to the archive destination. At this point in time, the database effectively halts as far as the end user is concerned. It stops cold. DBWR or ARCH will be given priority to flush the blocks to disk. Upon completion of the checkpoint or archival, everything goes back to normal. The reason the database suspends user activity is that there is simply no place to record the changes the users are making. Oracle is attempting to reuse an online redo log file, but because either the file would be needed to recover the database in the event of a failure (Checkpoint not complete), or the archiver has not yet finished copying it (Archival required), Oracle must wait (and the end users will wait) until the redo log file can safely be reused.

If you see that your sessions spend a lot of time waiting on a “log file switch,” “log buffer space,” or “log file switch checkpoint or archival incomplete,” then you are most likely hitting this. You will notice it during prolonged periods of database modifications if your log files are sized incorrectly, or because DBWR and ARCH need to be tuned by the DBA or system administrator.
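One quick way to gauge how often you are hitting this condition is to scan alert.log for the warning and tally the reason given. The following is a minimal Python sketch, not an Oracle-supplied tool. The function name count_stalls is hypothetical, the sample entries after the first one are made up for illustration, and the code assumes the reason line is the next non-blank line after the warning, as in the excerpt quoted above.

```python
import re
from collections import Counter

# First line of the warning, as quoted from alert.log in the text.
STALL = re.compile(r"Thread (\d+) cannot allocate new log, sequence (\d+)")


def count_stalls(lines):
    """Tally 'cannot allocate new log' warnings by their reason line.

    Assumption: the reason (for example 'Checkpoint not complete' or
    'Archival required') is the next non-blank line after the warning.
    """
    reasons = Counter()
    pending = False  # True when the previous non-blank line was a warning
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if pending:
            reasons[line] += 1
            pending = False
        elif STALL.search(line):
            pending = True
    if pending:
        # The log ended right after a warning, so no reason was recorded.
        reasons["(no reason line)"] += 1
    return reasons


# Illustrative input: the first entry is from the text, the rest are invented.
sample = """\
Thread 1 cannot allocate new log, sequence 1466
Checkpoint not complete
  Current log# 3 seq# 1465 mem# 0: /home/ora10g/oradata/ora10g/redo03.log
Thread 1 cannot allocate new log, sequence 1467
Archival required
Thread 1 cannot allocate new log, sequence 1468
Checkpoint not complete
"""

stalls = count_stalls(sample.splitlines())
for reason, n in stalls.most_common():
    print(f"{n:3d}  {reason}")
```

To run this against a real alert log, pass an open file object in place of sample.splitlines(). A tally dominated by Checkpoint not complete points toward DBWR and redo log sizing, while Archival required points toward ARCH.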
I frequently see this issue with the “starter” database that has not been customized. The “starter” database typically sizes the redo logs far too small for any sizable amount of work (including the initial database build of the data dictionary itself). As soon as you start loading up the database, you will notice that the first 1,000 rows go fast, and then things start going in spurts: 1,000 go fast, then hang, then go fast, then hang, and so on. These are the indications you are hitting this condition.

There are several things you can do to solve this issue:

• Make DBWR faster. Have your DBA tune DBWR by enabling ASYNC I/O, using DBWR I/O slaves, or using multiple DBWR processes. Look at the I/O on the system and see if one disk, or a set of disks, is “hot,” in which case you need to spread out the data. The same general advice applies for ARCH as well. The pros of this are that you get “something for nothing” here: increased performance without really changing any logic/structures/code. There really are no downsides to this approach.

• Add more redo log files. This will postpone the Checkpoint not complete in some cases and, after a while, it will postpone the Checkpoint not complete so long that it perhaps doesn’t happen (because you gave DBWR enough breathing room to checkpoint). The same applies to the Archival required message. The benefit to this approach is the removal of the “pauses” in your system. The downside is it consumes more disk, but the benefit far outweighs any downside here.
• Re-create the log files with a larger size. This will extend the amount of time between the time you fill the online redo log and the time you need to reuse it. The same applies to the Archival required message, if the redo log file usage is “bursty.” If you have a period of massive log generation (nightly loads, batch processes) followed by periods of relative calm, then having larger online redo logs can buy enough time for ARCH to catch up during the calm periods. The pros and cons are identical to the preceding approach of adding more files. Additionally, it may postpone a checkpoint from happening until later, since checkpoints happen at each log switch (at least), and the log switches will now be further apart.

• Cause checkpointing to happen more frequently and more continuously. Use a smaller block buffer cache (not entirely desirable) or various parameter settings such as FAST_START_MTTR_TARGET, LOG_CHECKPOINT_INTERVAL, and LOG_CHECKPOINT_TIMEOUT. This will force DBWR to flush dirty blocks more frequently. The benefit to this approach is that recovery time from a failure is reduced. There will always be less work in the online redo logs to be applied. The downside is that blocks may be written to disk more frequently if they are modified often. The buffer cache will not be as effective as it could be, and it can defeat the block cleanout mechanism discussed in the next section.

The approach you take will depend on your circumstances. This is something that must be fixed at the database level, taking the entire instance into consideration.

Block Cleanout

In this section, we’ll discuss block cleanouts, or the removal of “locking”-related information on the database blocks we’ve modified. This concept is important to understand when we talk about the infamous ORA-01555: snapshot too old error in a subsequent section. If you recall from Chapter 6, we talked about data locks and how they are managed.
I described how they are actually attributes of the data, stored on the block header. A side effect of this is that the next time that block is accessed, we may have to “clean it out,” in other words, remove the transaction information. This action generates redo and causes the block to become dirty if it wasn’t already, meaning that a simple SELECT may generate redo and may cause lots of blocks to be written to disk with the next checkpoint. Under most normal circumstances, however, this will not happen. If you have mostly small- to medium-sized transactions (OLTP), or you have a data warehouse that performs direct path loads or uses DBMS_STATS to analyze tables after load operations, then you’ll find the blocks are generally “cleaned” for you.

If you recall from the earlier section titled “What Does a COMMIT Do?” one of the steps of COMMIT-time processing is to revisit our blocks if they are still in the SGA, if they are accessible (no one else is modifying them), and then clean them out. This activity is known as a commit cleanout and is the activity that cleans out the transaction information on our modified block. Optimally, our COMMIT can clean out the blocks so that a subsequent SELECT (read) will not have to clean it out. Only an UPDATE of this block would truly clean out our residual transaction information, and since the UPDATE is already generating redo, the cleanout is not noticeable.

We can force a cleanout to not happen, and therefore observe its side effects, by understanding how the commit cleanout works. In a commit list associated with our transaction, Oracle will record lists of blocks we have modified. Each of these lists is 20 blocks long, and Oracle will allocate as many of these lists as it needs—up to a point. If the sum of the