Expert Oracle Database Architecture: 9i and 10g Programming Techniques and Solutions (Apress, Sep. 2005)
CHAPTER 12 ■ DATATYPES 499

Table 12-1. Four Basic String Types (string type: description)
• VARCHAR2(size [BYTE|CHAR]): size is a number between 1 and 4,000 for up to 4,000 bytes of storage. In the following section, we’ll examine in detail the differences and nuances of the BYTE versus CHAR modifier in that clause.
• CHAR(size [BYTE|CHAR]): size is a number between 1 and 2,000 for up to 2,000 bytes of storage.
• NVARCHAR2(size): size is a number greater than 0 whose upper bound is dictated by your national character set.
• NCHAR(size): size is a number greater than 0 whose upper bound is dictated by your national character set.

Bytes or Characters

The VARCHAR2 and CHAR types support two methods of specifying lengths:
• In bytes: VARCHAR2(10 byte). This will support up to 10 bytes of data, which could be as few as two characters in a multibyte character set.
• In characters: VARCHAR2(10 char). This will support up to 10 characters of data, which could be as much as 40 bytes of information.

When using a multibyte character set such as UTF8, you would be well advised to use the CHAR modifier in the VARCHAR2/CHAR definition. That is, use VARCHAR2(80 CHAR), not VARCHAR2(80), since your intention is likely to define a column that can in fact store 80 characters of data.

You may also use the session or system parameter NLS_LENGTH_SEMANTICS to change the default behavior from BYTE to CHAR. I do not recommend changing this setting at the system level. Instead, use it as part of an ALTER SESSION setting in your database schema installation scripts. Any application that requires a database to have a specific set of NLS settings makes for an “unfriendly” application. Such applications generally cannot be installed into a database alongside other applications that do not want these settings but instead rely on the defaults being in place.

One other important thing to remember is that the upper bound of the number of bytes stored in a VARCHAR2 is 4,000.
However, even if you specify VARCHAR2(4000 CHAR), you may not be able to fit 4,000 characters into that field. In fact, you may be able to fit as few as 1,000 characters in that field if all of the characters take 4 bytes to be represented in your chosen character set!

The following small example demonstrates the differences between BYTE and CHAR, and how the upper bounds come into play. We’ll create a table with three columns, the first two of which will be 1 byte and 1 character, respectively, with the last column being 4,000 characters. Notice that we’re performing this test on a multibyte character set database using the character set AL32UTF8, which supports the latest version of the Unicode standard and encodes characters in a variable-length fashion using from 1 to 4 bytes for each character:
ops$tkyte@O10GUTF> select *
  from nls_database_parameters
 where parameter = 'NLS_CHARACTERSET';

PARAMETER                      VALUE
------------------------------ --------------------
NLS_CHARACTERSET               AL32UTF8

ops$tkyte@O10GUTF> create table t
  ( a varchar2(1),
    b varchar2(1 char),
    c varchar2(4000 char)
  )
/
Table created.
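Before continuing, you can check how Oracle recorded the length semantics of the three columns. The query below is a minimal sketch. It assumes that the table T created above exists in the current schema, and it reads the standard USER_TAB_COLUMNS dictionary view:

```sql
-- Sketch: inspect how the columns of T (created above) were defined.
select column_name,
       data_length,   -- length of the column in bytes
       char_length,   -- length of the column in characters
       char_used      -- 'B' = BYTE semantics, 'C' = CHAR semantics
  from user_tab_columns
 where table_name = 'T'
 order by column_id;
```

Column A was declared without a modifier, so under the default BYTE semantics it should report CHAR_USED = 'B'. Columns B and C should report 'C'.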
Now, if we try to insert into our table a single character that is 2 bytes long in UTF-8, we observe the following:

ops$tkyte@O10GUTF> insert into t (a) values (unistr('\00d6'));
insert into t (a) values (unistr('\00d6'))
*
ERROR at line 1:
ORA-12899: value too large for column "OPS$TKYTE"."T"."A" (actual: 2, maximum: 1)
This example demonstrates two things:
• VARCHAR2(1) is in bytes, not characters. We have a single Unicode character, but it won’t fit into a single byte.
• As you migrate an application from a single-byte, fixed-width character set to a multibyte character set, you might find that text that used to fit into your fields no longer does.

The reason for the second point is that a 20-character string in a single-byte character set is 20 bytes long and will absolutely fit in a VARCHAR2(20). However, a 20-character string could be as long as 80 bytes in a multibyte character set, and 20 Unicode characters may well not fit in 20 bytes. You might consider changing your DDL to use VARCHAR2(20 CHAR), or setting the NLS_LENGTH_SEMANTICS session parameter mentioned previously when running the DDL that creates your tables.
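The two remedies from the preceding paragraph can be sketched as follows. The first states CHAR semantics explicitly in the DDL. The second follows the earlier advice to set NLS_LENGTH_SEMANTICS with ALTER SESSION in an installation script rather than at the system level. The table names ADDRESSES_V1 and ADDRESSES_V2 are hypothetical. The final statement assumes that the session started with the default BYTE semantics.

```sql
-- Option 1: say what you mean in the DDL itself.
create table addresses_v1
( street varchar2(20 char),   -- room for 20 characters, whatever their byte size
  city   varchar2(20 char)
);

-- Option 2: change the default for this session only, as an installation
-- script might, so that unqualified lengths are taken as characters.
alter session set nls_length_semantics = char;

create table addresses_v2
( street varchar2(20),        -- treated as VARCHAR2(20 CHAR) in this session
  city   varchar2(20)
);

-- Put the session back (assumed default: BYTE) so later DDL is unaffected.
alter session set nls_length_semantics = byte;
```

Option 2 keeps the DDL unchanged, but the result depends on the session setting in effect when the script runs. Option 1 makes each column's intent visible in the definition itself.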
If we insert that single character into a field set up to hold a single character, we will observe the following:

ops$tkyte@O10GUTF> insert into t (b) values (unistr('\00d6'));
1 row created.
ops$tkyte@O10GUTF> select length(b), lengthb(b), dump(b) dump from t;

 LENGTH(B) LENGTHB(B) DUMP
---------- ---------- --------------------
         1          2 Typ=1 Len=2: 195,150
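The DUMP output 195,150 is the two-byte UTF-8 encoding of U+00D6 (0xC3 0x96 in hexadecimal). LENGTH and LENGTHB can be applied the same way to other characters to show why a VARCHAR2(4000 CHAR) column may hold fewer than 4,000 characters under the 4,000-byte upper bound. The PL/SQL block below is a sketch. It assumes an AL32UTF8 database character set, as in the example above. The euro sign (U+20AC) is an extra test character chosen here because it needs 3 bytes; it does not appear in the original example.

```sql
set serveroutput on

declare
  -- Assumes an AL32UTF8 database character set.
  -- U+00D6 is the character inserted above; U+20AC (euro sign) is an
  -- illustrative extra character that needs 3 bytes in UTF-8.
  l_o_umlaut varchar2(1 char) := unistr('\00d6');
  l_euro     varchar2(1 char) := unistr('\20ac');

  -- Print the character count, the byte count, and how many copies of the
  -- character fit within the 4,000-byte limit of a VARCHAR2 column.
  procedure report( p_label in varchar2, p_str in varchar2 ) is
  begin
    dbms_output.put_line(
      p_label || ': ' || length(p_str) || ' char(s), ' ||
      lengthb(p_str) || ' byte(s), at most ' ||
      floor(4000 / lengthb(p_str)) || ' per VARCHAR2(4000 CHAR) value' );
  end report;
begin
  report( 'U+00D6', l_o_umlaut );
  report( 'U+20AC', l_euro );
end;
/
```

The arithmetic is simply 4,000 divided by the bytes per character: 2,000 copies of the 2-byte U+00D6 and 1,333 euro signs. A character that needs 4 bytes, such as one outside the Unicode Basic Multilingual Plane, gives the figure of 1,000 mentioned earlier.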