The history of LuaTeX 2006–2009 / v 0.50 - Pragma ADE

collapser was written. In the (pre)loaded Lua tables we have stored information about what combinations of characters need to be combined into another character. So, an a followed by an ` becomes à and an e followed by " becomes ë. This process is repeated until no more sequences combine. After a few alternatives we arrived at a solution that is acceptably fast: mere milliseconds per average page. Experiments demonstrated that we could not gain much by implementing this in pure C, but we did gain some speed by using a dedicated loop-over-utf-string function.
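The following is a minimal sketch of that approach, not the actual ConTeXt code: a preloaded table maps a base character plus a combining mark onto the precomposed character, and we loop with LuaTeX's loop-over-utf-string helper string.utfcharacters until nothing combines any more (the utf8 library calls assume a current Lua; the original code predates that library).

  local grave     = utf8.char(0x0300) -- combining grave accent
  local diaeresis = utf8.char(0x0308) -- combining diaeresis

  local combined = {
      a = { [grave]     = utf8.char(0x00E0) }, -- a + ` becomes à
      e = { [diaeresis] = utf8.char(0x00EB) }, -- e + " becomes ë
  }

  local function collapse(str)
      local again = true
      while again do -- repeat until no sequence combines any more
          again = false
          local out, prev = { }, nil
          for c in string.utfcharacters(str) do
              local hit = prev and combined[prev] and combined[prev][c]
              if hit then
                  out[#out] = hit -- replace the previously stored character
                  prev, again = hit, true
              else
                  out[#out+1] = c
                  prev = c
              end
          end
          str = table.concat(out)
      end
      return str
  end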

A second utf-related issue is utf-16. This coding scheme comes in two endian variants. We wanted to do the conversion in Lua, but decided to play a bit with a multi-byte file read function. After some experiments we quickly learned that hard coding such methods in TeX was doomed to be complex, and the whole idea behind LuaTeX is to make things less complex. The complexity has to do with the fact that we need some control over the different linebreak triggers, that is, (combinations of) characters 10 and/or 13. In the end, the multi-byte readers were removed from the code and we ended up with a pure Lua solution, which could be sped up by using a multi-byte loop-over-string function.
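To illustrate the linebreak issue, here is a sketch (ours, not the LuaTeX source) of what a tolerant splitter looks like in a few lines of Lua: a lone 10, a lone 13 and the sequence 13 plus 10 are all accepted as triggers.

  local function splitlines(data)
      data = string.gsub(data, "\013\010?", "\010") -- 13 and 13+10 become 10
      local result = { }
      -- a trailing 10 is appended so that the last line is always seen
      for line in string.gmatch(data .. "\010", "(.-)\010") do
          result[#result+1] = line
      end
      return result
  end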

Instead of hard coding solutions in LuaTeX a couple of fast loop-over-string functions were added to the Lua string function repertoire and the solutions were coded in Lua. We did extensive timing with huge utf-16 encoded files, and are confident that fast solutions can be found. Keep in mind that reading files is never the bottleneck anyway. The only drawback of an efficient utf-16 reader is that the file is loaded into memory, but this is hardly a problem.
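One of those helpers, string.bytepairs, runs over a string two bytes at a time, which is exactly what utf-16 asks for. The following converter is a sketch along those lines for the little endian variant; surrogate pairs are folded, but the byte order mark and error checking (a trailing odd byte makes hi nil) are left out.

  local function utf16le_to_utf8(data)
      local out, high = { }, nil
      for lo, hi in string.bytepairs(data) do
          local unit = hi * 256 + lo -- the low byte comes first
          if unit >= 0xD800 and unit < 0xDC00 then
              high = unit -- high surrogate: wait for its partner
          elseif unit >= 0xDC00 and unit < 0xE000 then
              out[#out+1] = utf8.char(0x10000
                  + (high - 0xD800) * 0x400 + (unit - 0xDC00))
              high = nil
          else
              out[#out+1] = utf8.char(unit)
          end
      end
      return table.concat(out)
  end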

Concerning arbitrary input encodings, we can be brief. It's rather easy to loop over a string and replace characters in the 0–255 range by their utf counterparts. All one needs is to maintain conversion tables, and TeX macro packages have always done that.
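As a sketch, with just two latin-2 entries standing in for a full conversion table: only the bytes above 127 need an entry, the rest is already valid utf-8.

  local latin2_to_utf = {
      [string.char(0xE9)] = utf8.char(0x00E9), -- é
      [string.char(0xF5)] = utf8.char(0x0151), -- ő
      -- ... one entry per byte in the 128–255 range
  }

  local function to_utf(line)
      -- bytes without a table entry are kept as they are
      return (string.gsub(line, "[\128-\255]", latin2_to_utf))
  end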

Yet another (more obscure) kind of remapping concerns those special TeX characters. If we use a traditional TeX auxiliary file, then we must make sure that for instance percent signs, hashes, dollars and other characters are handled right. If we set the catcode of the percent sign to 'letter', then we get into trouble when such a percent sign ends up in the table of contents and is read in under a different catcode regime (and becomes for instance a comment symbol). One way to deal with such situations is to temporarily move the problematic characters into a private Unicode area and deal with them accordingly. In that case they can no longer interfere.
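A sketch of that trick; the offset and the set of characters are arbitrary choices here, not what any macro package actually uses:

  local private = 0xE000 -- start of a Unicode private use area

  local function hide(str) -- before writing the auxiliary file
      return (string.gsub(str, "[%%#$&]", function(c)
          return utf8.char(private + string.byte(c))
      end))
  end

  local function unhide(str) -- after reading it back
      return (string.gsub(str, utf8.charpattern, function(c)
          local u = utf8.codepoint(c)
          if u >= private and u < private + 256 then
              return string.char(u - private)
          end
      end))
  end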

Where do we handle such conversions? There are two places where we can hook converters into the input:

1. each time we read a line from a file, i.e. we can hook conversion code into the read callbacks

2. using the special process_input_buffer callback which is called whenever TeX needs a new line of input; a sketch follows below
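A sketch of the second hook, with the to_utf converter from above plugged in; callback.register is the real LuaTeX interface, the converter is ours:

  callback.register("process_input_buffer", function(line)
      return to_utf(line) -- every incoming line passes through here
  end)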

