11.07.2015 Views

acknowledgements for ansi/nist-itl 1-2011 - NIST Visual Image ...

ANSI/NIST-ITL 1-2011 - UPDATE 2013 DRAFT VERSION

Annex A: Character encoding information

Normative

Field 1.015 Character encoding / DCS allows the user to specify the character set for certain fields and record types, as described in Section 5.5. This Annex lists the codes for the different characters commonly used.

Several fields in the standard require Hexadecimal or Base-64 representations, which are also described in this annex.

A.1: 7-bit ASCII

7-bit ASCII is required for all fields in Record Type-1: Transaction information record. If Field 1.015 Character encoding / DCS is not included in the transaction, the default character set encoding is 7-bit ASCII with the leftmost (eighth) bit padded with zero. ASCII is defined in ANSI X3.4-1986 (R1992) (See Section 3 Normative references). See Table 108 for the allowed values.

A.2: Unicode and UTF encoding

Field 1.015 Character encoding / DCS allows the user to select an alternate character encoding listed in Table 4 Character encoding. UTF-8 and UTF-16 allow for special national characters such as ü, é, ß and ñ. They also allow for certain other character sets, such as Cyrillic and Arabic.

Table 108 does not list all of these characters; it includes only a few examples. In Table 108, the character ç is in only the 8-bit Latin set, unlike the English language characters, which are in both the 7-bit (default) character encoding set and the 8-bit set. The Chinese character 白 is not in the 8-bit Latin character set, but it is in UTF-8. Characters from these extended character sets shall only appear where the record layout tables specify 'U' or 'user-defined' for the character type.

UTF-8 encoding is variable width. The first 128 characters use one byte and are equivalent to US-ASCII. The next 1,920 characters require two bytes to encode. Three and four bytes are also possible for certain rarer characters. Note that the UTF-8 and UTF-16 encodings are substantially different. Note: Table 108 shows UTF-16BE (Big Endian) values. It is recommended that UTF-8 be used in preference to UTF-16 or UTF-32.

[2013n>] The code for the space was listed as being alphabetic in 2011. It is changed to be a special character in this update, to bring the standard into alignment with standard programming terminology. [
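
The variable-width behaviour of UTF-8 and its difference from UTF-16BE can be illustrated with a minimal Python sketch. It uses three of the characters mentioned in the text (A, ç and 白) and relies only on Python's built-in codecs and the standard `base64` module. The helper name `describe` and its dictionary keys are hypothetical and chosen for illustration; the values it prints are produced by Python and are not copied from Table 108.

```python
import base64


def describe(ch: str) -> dict:
    """Return several encoded forms of a single character (illustrative helper)."""
    utf8 = ch.encode("utf-8")
    utf16be = ch.encode("utf-16-be")  # big-endian, no byte order mark
    return {
        "code_point": f"U+{ord(ch):04X}",
        # Code points below 128 fit in 7-bit ASCII.
        "is_7bit_ascii": ord(ch) < 128,
        "utf8_hex": utf8.hex().upper(),
        "utf8_bytes": len(utf8),
        "utf16be_hex": utf16be.hex().upper(),
        # Base-64 form of the UTF-8 bytes.
        "utf8_base64": base64.b64encode(utf8).decode("ascii"),
    }


for ch in ["A", "ç", "白"]:
    print(ch, describe(ch))
```

Under these assumptions, A occupies one byte in UTF-8, ç two and 白 three, while each of the three occupies two bytes in UTF-16BE, so the same text yields different hexadecimal and Base-64 strings depending on the encoding chosen.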
