27.06.2013 Views

OT1 encoding draft specification

OT1 encoding draft specification

OT1 encoding draft specification

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Abstract<br />

The <strong>OT1</strong> <strong>encoding</strong> is an attempt to describe the <strong>encoding</strong><br />

of the text fonts in Donald E. Knuth’s Computer<br />

Modern family of fonts [3]. This is an impossibile<br />

goal, since these fonts present no less than five<br />

different <strong>encoding</strong>s, but they are nontheless largely<br />

treated by L ATEX as having identical <strong>encoding</strong>s. Due<br />

to the ambiguities this creates, and in view of that<br />

the T1 <strong>encoding</strong> supersedes <strong>OT1</strong> as text font <strong>encoding</strong>,<br />

there is little point in using the text commands<br />

for distiguishing mandatory features of <strong>OT1</strong> from the<br />

ordinary.<br />

<strong>OT1</strong> is however also used as a math font <strong>encoding</strong><br />

(to go with OML, OMS, and OMX), and that use is<br />

today the by far most important since there is no obvious<br />

alternative to the standard math font set-up.<br />

Therefore this <strong>specification</strong> considers a feature of the<br />

<strong>encoding</strong> to be mandatory if it is (i) necessary for the<br />

standard math set-up or (ii) is a text feature which<br />

works for all <strong>encoding</strong> variants.<br />

Automatic processing of this document as a data<br />

file requires fontinst v 1.918 or higher.<br />

1 Encoding variants<br />

This document aims to record all the <strong>encoding</strong><br />

variations that are present within the set of fonts<br />

classified by L ATEX as having the <strong>OT1</strong> <strong>encoding</strong>. It<br />

turns out that they can all be described using only<br />

two parameters. One of these might be called<br />

‘ligaturing’ as one factor it affects is how many<br />

ligatures there will be in the font. The other<br />

parameter is best called ‘italicizing’ as it is different<br />

<strong>OT1</strong> <strong>encoding</strong> <strong>draft</strong> <strong>specification</strong><br />

Lars Hellström<br />

2001/09/09<br />

1<br />

between italic and non-italic fonts. Most of the<br />

Computer Modern fonts have ligaturing = 2 and<br />

italicizing = 0.<br />

Default ligaturing = 2<br />

Default italicizing = 0<br />

2 Mandatory characters<br />

2.1 Latin letters<br />

Slot 16 ‘dotlessi’<br />

Unicode character U+0131, latin small letter<br />

dotless i.<br />

A dotless i ‘ı’, used to produce accented letters such<br />

as ‘ī’. It is not used for math.<br />

Slot 17 ‘dotlessj’<br />

Unicode character U+F6BE, latin small letter<br />

dotless j.<br />

A dotless j ‘j’, used to produce accented letters such<br />

as ‘¯j’. It is not used for math. The Unicode<br />

standard does not define this character, but Adobe<br />

has assigned code point U+F6BE (which is in the<br />

private use subarea assigned by Adobe) to it.<br />

Slot 25 ‘germandbls’<br />

Unicode character U+00DF, latin small letter<br />

sharp s.<br />

This slot is not used in math.<br />

Slot 26 ‘ae’<br />

Unicode character U+00E6, latin small letter ae.<br />

This slot is not used in math.<br />

Slot 27 ‘oe’<br />

Unicode character U+0153, latin small ligature<br />

oe.


This is a single letter, and should not be faked with<br />

‘oe’. It is not used in math.<br />

Slot 28 ‘oslash’<br />

Unicode character U+00F8, latin small letter o<br />

with stroke.<br />

This slot is not used in math.<br />

Slot 29 ‘AE’<br />

Unicode character U+00C6, latin capital letter<br />

ae.<br />

This slot is not used in math.<br />

Slot 30 ‘OE’<br />

Unicode character U+0152, latin capital<br />

ligature oe.<br />

This is a single letter, and should not be faked with<br />

‘OE’. It is not used in math.<br />

Slot 31 ‘Oslash’<br />

Unicode character U+00D8, latin capital letter<br />

o with stroke.<br />

This slot is not used in math.<br />

Slot 65 ‘A’<br />

Unicode character U+0041, latin capital letter a.<br />

Slot 66 ‘B’<br />

Unicode character U+0042, latin capital letter b.<br />

Slot 67 ‘C’<br />

Unicode character U+0043, latin capital letter c.<br />

Slot 68 ‘D’<br />

Unicode character U+0044, latin capital letter d.<br />

Slot 69 ‘E’<br />

Unicode character U+0045, latin capital letter e.<br />

Slot 70 ‘F’<br />

Unicode character U+0046, latin capital letter f.<br />

Slot 71 ‘G’<br />

Unicode character U+0047, latin capital letter g.<br />

Slot 72 ‘H’<br />

Unicode character U+0048, latin capital letter h.<br />

Slot 73 ‘I’<br />

Unicode character U+0049, latin capital letter i.<br />

Slot 74 ‘J’<br />

Unicode character U+004A, latin capital letter j.<br />

2<br />

Slot 75 ‘K’<br />

Unicode character U+004B, latin capital letter k.<br />

Slot 76 ‘L’<br />

Unicode character U+004C, latin capital letter l.<br />

Slot 77 ‘M’<br />

Unicode character U+004D, latin capital letter<br />

m.<br />

Slot 78 ‘N’<br />

Unicode character U+004E, latin capital letter n.<br />

Slot 79 ‘O’<br />

Unicode character U+004F, latin capital letter o.<br />

Slot 80 ‘P’<br />

Unicode character U+0050, latin capital letter p.<br />

Slot 81 ‘Q’<br />

Unicode character U+0051, latin capital letter q.<br />

Slot 82 ‘R’<br />

Unicode character U+0052, latin capital letter r.<br />

Slot 83 ‘S’<br />

Unicode character U+0053, latin capital letter s.<br />

Slot 84 ‘T’<br />

Unicode character U+0054, latin capital letter t.<br />

Slot 85 ‘U’<br />

Unicode character U+0055, latin capital letter u.<br />

Slot 86 ‘V’<br />

Unicode character U+0056, latin capital letter v.<br />

Slot 87 ‘W’<br />

Unicode character U+0057, latin capital letter<br />

w.<br />

Slot 88 ‘X’<br />

Unicode character U+0058, latin capital letter x.<br />

Slot 89 ‘Y’<br />

Unicode character U+0059, latin capital letter y.<br />

Slot 90 ‘Z’<br />

Unicode character U+005A, latin capital letter z.<br />

Slot 97 ‘a’<br />

Unicode character U+0061, latin small letter a.<br />

Slot 98 ‘b’<br />

Unicode character U+0062, latin small letter b.


Slot 99 ‘c’<br />

Unicode character U+0063, latin small letter c.<br />

Slot 100 ‘d’<br />

Unicode character U+0064, latin small letter d.<br />

Slot 101 ‘e’<br />

Unicode character U+0065, latin small letter e.<br />

Slot 102 ‘f’<br />

Unicode character U+0066, latin small letter f.<br />

If ligaturing = 2 then<br />

Ordinary ligature f ∗ f → ff<br />

Ordinary ligature f ∗ i → fi<br />

Ordinary ligature f ∗ l → fl<br />

Fi<br />

Slot 103 ‘g’<br />

Unicode character U+0067, latin small letter g.<br />

Slot 104 ‘h’<br />

Unicode character U+0068, latin small letter h.<br />

Slot 105 ‘i’<br />

Unicode character U+0069, latin small letter i.<br />

Slot 106 ‘j’<br />

Unicode character U+006A, latin small letter j.<br />

Slot 107 ‘k’<br />

Unicode character U+006B, latin small letter k.<br />

Slot 108 ‘l’<br />

Unicode character U+006C, latin small letter l.<br />

Slot 109 ‘m’<br />

Unicode character U+006D, latin small letter m.<br />

Slot 110 ‘n’<br />

Unicode character U+006E, latin small letter n.<br />

Slot 111 ‘o’<br />

Unicode character U+006F, latin small letter o.<br />

Slot 112 ‘p’<br />

Unicode character U+0070, latin small letter p.<br />

Slot 113 ‘q’<br />

Unicode character U+0071, latin small letter q.<br />

Slot 114 ‘r’<br />

Unicode character U+0072, latin small letter r.<br />

3<br />

Slot 115 ‘s’<br />

Unicode character U+0073, latin small letter s.<br />

Slot 116 ‘t’<br />

Unicode character U+0074, latin small letter t.<br />

Slot 117 ‘u’<br />

Unicode character U+0075, latin small letter u.<br />

Slot 118 ‘v’<br />

Unicode character U+0076, latin small letter v.<br />

Slot 119 ‘w’<br />

Unicode character U+0077, latin small letter w.<br />

Slot 120 ‘x’<br />

Unicode character U+0078, latin small letter x.<br />

Slot 121 ‘y’<br />

Unicode character U+0079, latin small letter y.<br />

Slot 122 ‘z’<br />

Unicode character U+007A, latin small letter z.<br />

2.2 Greek letters<br />

Slot 0 ‘Gamma’<br />

Unicode character U+0393, greek capital letter<br />

gamma.<br />

Slot 1 ‘Delta’<br />

Unicode character U+0394, greek capital letter<br />

delta.<br />

Slot 2 ‘Theta’<br />

Unicode character U+0398, greek capital letter<br />

theta.<br />

Slot 3 ‘Lambda’<br />

Unicode character U+039B, greek capital letter<br />

lambda.<br />

Slot 4 ‘Xi’<br />

Unicode character U+039E, greek capital letter<br />

xi.<br />

Slot 5 ‘Pi’<br />

Unicode character U+03A0, greek capital letter<br />

pi.<br />

Slot 6 ‘Sigma’<br />

Unicode character U+03A3, greek capital letter<br />

sigma.


Slot 7 ‘Upsilon1’<br />

Unicode character U+03D2, greek upsilon with<br />

hook symbol.<br />

This is primarily a math character and it is should<br />

be visually distinct from the character in slot 89 (Y).<br />

This is not generally the case for the normal Upsilon<br />

U+03A5 (greek capital letter upsilon), which<br />

is by the way what usually has the glyph name<br />

Upsilon. An argument for the latter is however<br />

that all the purely mathematical Upsilons that exist<br />

in the Unicode standard are described as variants of<br />

the non-hook character.<br />

Slot 8 ‘Phi’<br />

Unicode character U+03A6, greek capital letter<br />

phi.<br />

Slot 9 ‘Psi’<br />

Unicode character U+03A8, greek capital letter<br />

psi.<br />

Slot 10 ‘Omega’<br />

Unicode character U+03A9, greek capital letter<br />

omega.<br />

2.3 Digits<br />

Slot 48 ‘zero’<br />

Unicode character U+0030, digit zero.<br />

Slot 49 ‘one’<br />

Unicode character U+0031, digit one.<br />

Slot 50 ‘two’<br />

Unicode character U+0032, digit two.<br />

Slot 51 ‘three’<br />

Unicode character U+0033, digit three.<br />

Slot 52 ‘four’<br />

Unicode character U+0034, digit four.<br />

Slot 53 ‘five’<br />

Unicode character U+0035, digit five.<br />

Slot 54 ‘six’<br />

Unicode character U+0036, digit six.<br />

Slot 55 ‘seven’<br />

Unicode character U+0037, digit seven.<br />

4<br />

Slot 56 ‘eight’<br />

Unicode character U+0038, digit eight.<br />

Slot 57 ‘nine’<br />

Unicode character U+0039, digit nine.<br />

2.4 Accents<br />

Slot 18 ‘grave’<br />

Unicode character U+0300, combining grave<br />

accent.<br />

Slot 19 ‘acute’<br />

Unicode character U+0301, combining acute<br />

accent.<br />

Slot 20 ‘caron’<br />

Unicode character U+030C, combining caron.<br />

The caron or háček accent ‘ˇ’.<br />

Slot 21 ‘breve’<br />

Unicode character U+0306, combining breve.<br />

Slot 22 ‘macron’<br />

Unicode character U+0304, combining macron.<br />

Slot 23 ‘ring’<br />

Unicode character U+030A, combining ring above.<br />

The text definition of ‘\r{A}’ assumes that this<br />

glyph has the same width as that in slot 65 (A).<br />

Slot 24 ‘cedilla’<br />

Unicode character U+0327, combining cedilla.<br />

This slot is not required for math, but is used by<br />

the \c L ATEX command.<br />

Slot 94 ‘circumflex’<br />

Unicode character U+0302, combining circumflex<br />

accent.<br />

Unicode character U+005E, circumflex accent.<br />

If ligaturing > 0 then<br />

Slot 95 ‘dotaccent’<br />

Unicode character U+0307, combining dot above.<br />

Note that the ligaturing = 0 assignment to slot 95<br />

(underscore) is ordinary, not mandatory.<br />

Fi


Slot 126 ‘tilde’<br />

Unicode character U+0303, combining tilde.<br />

Unicode character U+007E, tilde.<br />

Slot 127 ‘dieresis’<br />

Unicode character U+0308, combining diaeresis.<br />

2.5 Symbols<br />

Slot 33 ‘exclam’<br />

Unicode character U+0021, exclamation mark.<br />

Mandatory ligature<br />

exclam ∗ quoteleft → exclamdown<br />

Slot 35 ‘numbersign’<br />

Unicode character U+0023, number sign.<br />

This slot is not used for math.<br />

If italicizing = 0 then<br />

Slot 36 ‘dollar’<br />

Unicode character U+0024, dollar sign.<br />

Else<br />

Slot 36 ‘sterling’<br />

Unicode character U+00A3, pound sign.<br />

Fi<br />

Slot 37 ‘percent’<br />

Unicode character U+0025, percent sign.<br />

This slot is not used for math.<br />

Slot 38 ‘ampersand’<br />

Unicode character U+0026, ampersand.<br />

This slot is not used for math.<br />

Slot 39 ‘quoteright’<br />

Unicode character U+2019, right single<br />

quotation mark.<br />

This slot is not used for math.<br />

If ligaturing > 0 then<br />

Mandatory ligature<br />

quoteright ∗ quoteright → quotedblright<br />

Fi<br />

Slot 40 ‘parenleft’<br />

Unicode character U+0028, left parenthesis.<br />

5<br />

Slot 41 ‘parenright’<br />

Unicode character U+0029, right parenthesis.<br />

Slot 42 ‘asterisk’<br />

Unicode character U+002A, asterisk.<br />

This slot is not used for math.<br />

Slot 43 ‘plus’<br />

Unicode character U+002B, plus sign.<br />

Slot 44 ‘comma’<br />

Unicode character U+002C, comma.<br />

This slot is not used for math.<br />

Slot 45 ‘hyphen’<br />

Unicode character U+002D, hyphen-minus.<br />

This slot is not used for math.<br />

If ligaturing > 0 then<br />

Mandatory ligature hyphen ∗ hyphen → endash<br />

Fi<br />

Slot 46 ‘period’<br />

Unicode character U+002E, full stop.<br />

This slot is not used for math.<br />

Slot 47 ‘slash’<br />

Unicode character U+002F, solidus.<br />

Slot 58 ‘colon’<br />

Unicode character U+003A, colon.<br />

Slot 59 ‘semicolon’<br />

Unicode character U+003B, semicolon.<br />

Slot 61 ‘equal’<br />

Unicode character U+003D, equals sign.<br />

Slot 63 ‘question’<br />

Unicode character U+003F, question mark.<br />

Mandatory ligature<br />

question ∗ quoteleft → questiondown<br />

Slot 64 ‘at’<br />

Unicode character U+0040, commercial at.<br />

This slot is not used for math.<br />

Slot 91 ‘bracketleft’<br />

Unicode character U+005B, left square bracket.<br />

Slot 93 ‘bracketright’<br />

Unicode character U+005D, right square bracket.


Slot 96 ‘quoteleft’<br />

Unicode character U+2018, left single<br />

quotation mark.<br />

This slot is not used for math.<br />

If ligaturing > 0 then<br />

Mandatory ligature<br />

quoteleft ∗ quoteleft → quotedblleft<br />

Fi<br />

3 Ordinary characters<br />

3.1 Letters<br />

If ligaturing = 2 then<br />

Slot 11 ‘ff’<br />

Unicode character U+FB00, latin small ligature<br />

ff.<br />

This glyph should be two characters wide in a<br />

monowidth font.<br />

Ordinary ligature ff ∗ i → ffi<br />

Ordinary ligature ff ∗ l → ffl<br />

Slot 12 ‘fi’<br />

Unicode character U+FB01, latin small ligature<br />

fi.<br />

This glyph should be two characters wide in a<br />

monowidth font.<br />

Slot 13 ‘fl’<br />

Unicode character U+FB02, latin small ligature<br />

fl.<br />

This glyph should be two characters wide in a<br />

monowidth font.<br />

Slot 14 ‘ffi’<br />

Unicode character U+FB03, latin small ligature<br />

ffi.<br />

This glyph should be three characters wide in a<br />

monowidth font.<br />

Slot 15 ‘ffl’<br />

Unicode character U+FB04, latin small ligature<br />

ffl.<br />

This glyph should be three characters wide in a<br />

monowidth font.<br />

Fi<br />

6<br />

Slot 138 ‘Lslash’<br />

Unicode character U+0141, latin capital letter<br />

l with stroke.<br />

The letter ‘̷L’.<br />

Slot 170 ‘lslash’<br />

Unicode character U+0142, latin small letter l<br />

with stroke.<br />

The letter ‘̷l’.<br />

3.2 Accents<br />

If ligaturing > 0 then<br />

Slot 125 ‘hungarumlaut’<br />

Unicode character U+030B, combining double<br />

acute accent.<br />

The long Hungarian umlaut ‘˝’.<br />

Fi<br />

3.3 Symbols<br />

If ligaturing < 2 then<br />

Slot 11 ‘arrowup’<br />

Unicode character U+2191, upwards arrow.<br />

Slot 12 ‘arrowdown’<br />

Unicode character U+2193, downwards arrow.<br />

Slot 13 ‘quotesingle’<br />

Unicode character U+0027, apostrophe.<br />

Slot 14 ‘exclamdown’<br />

Unicode character U+00A1, inverted<br />

exclamation mark.<br />

Slot 15 ‘questiondown’<br />

Unicode character U+00BF, inverted question<br />

mark.<br />

Fi<br />

If ligaturing > 0 then<br />

Slot 32 ‘lslashslash’<br />

When this character is followed by U+004C (latin<br />

capital letter l) then the combined result is<br />

U+0141 (latin capital letter l with stroke);


this is used by the \L command. When this<br />

character is followed by U+006C (latin small<br />

letter l) then the combined result is U+0142<br />

(latin small letter l with stroke); this is<br />

used by the \l command. There are no further<br />

semantics for this character.<br />

Ordinary ligature lslashslash ∗ L → Lslash<br />

Ordinary ligature lslashslash ∗ l → lslash<br />

These ligatures, and the characters they point to,<br />

were added by Alan Jeffrey early in fontinst<br />

development in order to allow the character in this<br />

slot to exhibit the correct behaviour even in fonts<br />

where there is no corresponding glyph.<br />

Else<br />

Slot 32 ‘visiblespace’<br />

Unicode character U+2423, open box.<br />

A visible space glyph ‘ ’.<br />

Fi<br />

If ligaturing > 0 then<br />

Slot 34 ‘quotedblright’<br />

Unicode character U+201D, right double<br />

quotation mark.<br />

Else<br />

Slot 34 ‘quotedbl’<br />

Unicode character U+0022, quotation mark.<br />

Fi<br />

If ligaturing = 2 then<br />

Slot 60 ‘exclamdown’<br />

Unicode character U+00A1, inverted<br />

exclamation mark.<br />

Slot 62 ‘questiondown’<br />

Unicode character U+00BF, inverted question<br />

mark.<br />

Else<br />

Slot 60 ‘less’<br />

Unicode character U+003C, less-than sign.<br />

7<br />

Slot 62 ‘greater’<br />

Unicode character U+003E, greater-than sign.<br />

Fi<br />

If ligaturing > 0 then<br />

Slot 92 ‘quotedblleft’<br />

Unicode character U+201C, left double<br />

quotation mark.<br />

Else<br />

Slot 92 ‘backslash’<br />

Unicode character U+005C, reverse solidus.<br />

Slot 95 ‘underscore’<br />

Unicode character U+005F, low line.<br />

Note that the ligaturing > 0 assignment to slot 95<br />

(dotaccent) is mandatory, not ordinary.<br />

Fi<br />

If ligaturing > 0 then<br />

Slot 123 ‘endash’<br />

Unicode character U+2013, en dash.<br />

Mandatory ligature endash ∗ hyphen → emdash<br />

Slot 124 ‘emdash’<br />

Unicode character U+2014, em dash.<br />

In a monowidth font this character is preferably<br />

given the width of two normal characters.<br />

Else<br />

Slot 123 ‘braceleft’<br />

Unicode character U+007B, left curly bracket.<br />

Slot 124 ‘bar’<br />

Unicode character U+007C, vertical line.<br />

Slot 125 ‘braceright’<br />

Unicode character U+007D, right curly bracket.<br />

Fi


4 Fontdimens<br />

Fontdimen 1 is italicslant<br />

Fontdimen 2 is interword<br />

Fontdimen 3 is stretchword<br />

Fontdimen 4 is shrinkword<br />

Fontdimen 5 is xheight<br />

Fontdimen 6 is quad<br />

Fontdimen 7 is extraspace<br />

5 Coding scheme<br />

If ligaturing = 2 then<br />

Default s(codingscheme) = TEX TEXT<br />

Else if ligaturing = 1 then<br />

Default s(codingscheme) =<br />

TEX TEXT WITHOUT F-LIGATURES<br />

Else<br />

Default s(codingscheme) = TEX TYPEWRITER TEXT<br />

Fi fi<br />

6 Discussion<br />

6.1 <strong>OT1</strong> as math font <strong>encoding</strong><br />

As source for the math requirements on the <strong>OT1</strong> <strong>encoding</strong><br />

was used the standard L ATEX math set-up [4].<br />

Those definitions that make use of <strong>OT1</strong> fonts either<br />

use the operators symbol font/font family 0 or have<br />

math class 7 (\mathalpha).<br />

There are two slots used for math that have different<br />

assignments in different <strong>encoding</strong> variants. Slot<br />

36 (dollar/sterling) depends on the italicizing,<br />

and is used by the commands \mathdollar (called by<br />

\$) and \mathsterling (called by \pounds). This<br />

works because these commands explicitly take the<br />

character from a font with the correct italicizing.<br />

Slot 95 (dotaccent/underline) depends on the<br />

ligaturing and is used by the \dot command. This<br />

command does not work for fonts with ligaturing =<br />

0, and hence e.g. the formula $\mathtt{\dot{x}}$<br />

does not produce a typewriter dotted ‘x’, but rather<br />

8<br />

an ‘x’ and underscore printed on top of each other.<br />

This is thus an error, but due to that it is unlikely<br />

that the error will be missed by the author who encounters<br />

it, the error is probably harmless.<br />

6.2 <strong>OT1</strong> as text font <strong>encoding</strong><br />

The sources used to determine the text requirements<br />

on the <strong>OT1</strong> <strong>encoding</strong> were [1].<br />

It turns out that there are LICR commands which<br />

rely on non-mandatory features of the font. It is<br />

mostly for the ligaturing = 0 fonts that commands<br />

produce incorrect results. The following commands<br />

work with all <strong>OT1</strong> fonts which have ligaturing > 0 but<br />

fail with the others: \L, \l, \., \H, \textemdash,<br />

\textendash, and \textquotedblleft. In addition,<br />

the command \textquotedblright produces a reasonable,<br />

but not exactly correct, result.<br />

Traditionally the \textexclamdown and<br />

\textquestiondown have only worked for fonts<br />

which have ligaturing = 2, but that can be fixed;<br />

see [2].<br />

References<br />

[1] Johannes Braams, David Carlisle, Alan Jeffrey,<br />

Frank Mittelbach, Chris Rowley, and Rainer<br />

Schöpf: ltoutenc.dtx, v 1.91 (2000); the file<br />

ltoutenc.dtx in the L ATEX base distribution.<br />

[2] Lars Hellström: <strong>OT1</strong> def. of \textexclamdown<br />

and \textquestiondown, L ATEX bugs database<br />

entry latex/3368, 2001.<br />

[3] Donald E. Knuth: Computer Modern Typefaces,<br />

volume E of Computers & Typesetting, Addison–<br />

Wesley, 1986, xvi+588 pp.; ISBN 0-201-13446-2.<br />

[4] Frank Mittelbach and Rainer Schöpf: The<br />

fontdef.dtx file, v 2.2x (1999); the file<br />

fontdef.dtx in the L ATEX base distribution.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!