Apress.Expert.Oracle.Database.Architecture.9i.and.10g.Programming.Techniques.and.Solutions.Sep.2005

rekharaghuram
05.11.2015

CHAPTER 12 ■ DATATYPES

Table 12-1. Four Basic String Types

String Type                    Description
VARCHAR2(<SIZE> <BYTE|CHAR>)   <SIZE> is a number between 1 and 4,000 for up to 4,000 bytes of storage. In the following section, we’ll examine in detail the differences and nuances of the BYTE versus CHAR modifier in that clause.
CHAR(<SIZE> <BYTE|CHAR>)       <SIZE> is a number between 1 and 2,000 for up to 2,000 bytes of storage.
NVARCHAR2(<SIZE>)              <SIZE> is a number greater than 0 whose upper bound is dictated by your national character set.
NCHAR(<SIZE>)                  <SIZE> is a number greater than 0 whose upper bound is dictated by your national character set.

Bytes or Characters

The VARCHAR2 and CHAR types support two methods of specifying lengths:

• In bytes: VARCHAR2(10 byte). This will support up to 10 bytes of data, which could be as few as two characters in a multibyte character set.
• In characters: VARCHAR2(10 char). This will support up to 10 characters of data, which could be as much as 40 bytes of information.

When using a multibyte character set such as UTF8, you would be well advised to use the CHAR modifier in the VARCHAR2/CHAR definition. That is, use VARCHAR2(80 CHAR), not VARCHAR2(80), since your intention is likely to define a column that can in fact store 80 characters of data.

You may also use the session or system parameter NLS_LENGTH_SEMANTICS to change the default behavior from BYTE to CHAR. I do not recommend changing this setting at the system level. Instead, use it as part of an ALTER SESSION setting in your database schema installation scripts. Any application that requires a database to have a specific set of NLS settings makes for an “unfriendly” application. Such applications generally cannot be installed into a database alongside other applications that do not want these settings and rely on the defaults being in place.

One other important thing to remember is that the upper bound of the number of bytes stored in a VARCHAR2 is 4,000.
However, even if you specify VARCHAR2(4000 CHAR), you may not be able to fit 4,000 characters into that field. In fact, you may be able to fit as few as 1,000 characters in that field if all of the characters take 4 bytes to be represented in your chosen character set! The following small example demonstrates the differences between BYTE and CHAR, and how the upper bounds come into play. We’ll create a table with three columns, the first two of which will be 1 byte and 1 character, respectively, with the last column being 4,000 characters. Notice that we’re performing this test on a multibyte character set database using the character set AL32UTF8, which supports the latest version of the Unicode standard and encodes characters in a variable-length fashion using from 1 to 4 bytes for each character:



ops$tkyte@O10GUTF> select *
  2  from nls_database_parameters
  3  where parameter = 'NLS_CHARACTERSET';

PARAMETER                      VALUE
------------------------------ --------------------
NLS_CHARACTERSET               AL32UTF8

ops$tkyte@O10GUTF> create table t
  2  ( a varchar2(1),
  3    b varchar2(1 char),
  4    c varchar2(4000 char)
  5  )
  6  /

Table created.

Now, if we try to insert into our table a single character that is 2 bytes long in UTF-8, we observe the following:

ops$tkyte@O10GUTF> insert into t (a) values (unistr('\00d6'));
insert into t (a) values (unistr('\00d6'))
                          *
ERROR at line 1:
ORA-12899: value too large for column "OPS$TKYTE"."T"."A" (actual: 2, maximum: 1)

This example demonstrates two things:

• VARCHAR2(1) is in bytes, not characters. We have a single Unicode character, but it won’t fit into a single byte.
• As you migrate an application from a single-byte, fixed-width character set to a multibyte character set, you might find that text that used to fit into your fields no longer does.
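The first point can be checked outside the database. AL32UTF8 is Oracle's name for standard UTF-8, so a value's size in bytes is simply the length of its UTF-8 encoding. The sketch below is a minimal Python model of the length rules described in this chapter. It assumes an AL32UTF8 database and counts one character per Unicode code point. It is a simplification, not Oracle's implementation, and `fits_varchar2` and `MAX_VARCHAR2_BYTES` are hypothetical names. It replays the insert into column A, the insert into column B that follows, and the VARCHAR2(10 BYTE) and VARCHAR2(4000 CHAR) limits discussed earlier.

```python
# Simplified model of VARCHAR2 length checking, ASSUMING an AL32UTF8 (UTF-8)
# database character set and one character per Unicode code point.
# fits_varchar2 and MAX_VARCHAR2_BYTES are hypothetical names, not Oracle APIs.
MAX_VARCHAR2_BYTES = 4000  # upper bound on the bytes stored in a VARCHAR2


def fits_varchar2(value: str, size: int, semantics: str = "BYTE") -> bool:
    """Return True if value would fit in VARCHAR2(size BYTE) or VARCHAR2(size CHAR)."""
    nbytes = len(value.encode("utf-8"))  # roughly what LENGTHB reports
    nchars = len(value)                  # roughly what LENGTH reports
    if semantics == "BYTE":
        return nbytes <= size
    if semantics == "CHAR":
        # Characters are counted, but the 4,000-byte ceiling still applies.
        return nchars <= size and nbytes <= MAX_VARCHAR2_BYTES
    raise ValueError("semantics must be 'BYTE' or 'CHAR'")


O_UMLAUT = "\u00d6"    # unistr('\00d6'): 2 bytes in UTF-8
G_CLEF = "\U0001d11e"  # a supplementary character: 4 bytes in UTF-8

print(fits_varchar2(O_UMLAUT, 1, "BYTE"))          # column A: False
print(fits_varchar2(O_UMLAUT, 1, "CHAR"))          # column B: True
print(fits_varchar2(G_CLEF * 2, 10, "BYTE"))       # 10 bytes hold 2 such chars: True
print(fits_varchar2(G_CLEF * 3, 10, "BYTE"))       # but not 3 (12 bytes): False
print(fits_varchar2(G_CLEF * 1000, 4000, "CHAR"))  # column C, 4,000 bytes: True
print(fits_varchar2(G_CLEF * 1001, 4000, "CHAR"))  # column C, 4,004 bytes: False
```

The CHAR branch checks both limits. A VARCHAR2(4000 CHAR) column therefore holds only 1,000 characters when every character needs 4 bytes.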

The reason for the second point is that a 20-character string in a single-byte character set is 20 bytes long and will absolutely fit in a VARCHAR2(20). However, a 20-character string could be as long as 80 bytes in a multibyte character set, and 20 Unicode characters may well not fit in 20 bytes. You might consider changing your DDL to use VARCHAR2(20 CHAR), or setting the NLS_LENGTH_SEMANTICS session parameter mentioned previously when running the DDL that creates your tables.
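Before such a migration, it can help to scan existing values for strings that fit a column's declared size in characters but would overflow it in bytes. The Python sketch below assumes the target character set is AL32UTF8 (standard UTF-8). `utf8_bytes` and `overflows_byte_semantics` are hypothetical helpers. The hard-coded samples stand in for values you would actually read from your tables.

```python
# Migration check, ASSUMING the target database character set is AL32UTF8 (UTF-8).
# utf8_bytes and overflows_byte_semantics are hypothetical helpers.


def utf8_bytes(value: str) -> int:
    """Number of bytes the value would occupy once encoded as UTF-8."""
    return len(value.encode("utf-8"))


def overflows_byte_semantics(value: str, size: int) -> bool:
    """True if value has at most `size` characters but needs more than `size`
    UTF-8 bytes, so it would be rejected by a BYTE-semantics VARCHAR2(size)."""
    return len(value) <= size and utf8_bytes(value) > size


samples = [
    "A" * 20,           # ASCII: 1 byte per character -> 20 bytes
    "\u00d6" * 20,      # O with diaeresis: 2 bytes each -> 40 bytes
    "\u20ac" * 20,      # euro sign: 3 bytes each -> 60 bytes
    "\U0001d11e" * 20,  # supplementary character: 4 bytes each -> 80 bytes
]

for s in samples:
    # characters, UTF-8 bytes, and whether VARCHAR2(20) with BYTE semantics overflows
    print(len(s), utf8_bytes(s), overflows_byte_semantics(s, 20))
```

Only the ASCII sample still fits a VARCHAR2(20) with byte semantics. The other three are 20 characters long but would need VARCHAR2(20 CHAR).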

If we insert that single character into a field set up to hold a single character, we will observe the following:

ops$tkyte@O10GUTF> insert into t (b) values (unistr('\00d6'));

1 row created.

ops$tkyte@O10GUTF> select length(b), lengthb(b), dump(b) dump from t;

 LENGTH(B) LENGTHB(B) DUMP
---------- ---------- --------------------
         1          2 Typ=1 Len=2: 195,150
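The DUMP output lists the two bytes, 195 and 150 (0xC3 0x96), that encode the single character U+00D6 in AL32UTF8. Because AL32UTF8 is standard UTF-8, the same numbers can be reproduced outside the database. The Python sketch below does this. `length_lengthb_dump` is a hypothetical helper that only approximates LENGTH, LENGTHB, and the byte list shown by DUMP, not DUMP's full output format.

```python
# Reproduces the LENGTH/LENGTHB/DUMP figures above, ASSUMING an AL32UTF8
# (UTF-8) database character set. length_lengthb_dump is a hypothetical helper.


def length_lengthb_dump(value: str) -> tuple[int, int, list[int]]:
    """Return (characters, bytes, byte values) for value encoded as UTF-8."""
    encoded = value.encode("utf-8")
    return len(value), len(encoded), list(encoded)


length, lengthb, dump_bytes = length_lengthb_dump("\u00d6")
print(length, lengthb, dump_bytes)  # 1 2 [195, 150]
print(f"Len={lengthb}: " + ",".join(str(b) for b in dump_bytes))  # Len=2: 195,150
```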
