
do
    -- local: the byte reader is hidden from code outside this block
    local function nextbyte(f)
        return f:read(1)
    end

    -- return the iterator plus its state, ready for use in a generic for
    function io.bytes(f)
        return nextbyte, f
    end
end

f = io.open("somefile.dat")
for b in io.bytes(f) do
    do_something(b)
end
f:close()
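The trick is Lua's generic for protocol: io.bytes hands back an iterator function together with its state, the file handle, and the loop keeps calling nextbyte(f) until it returns nil at the end of the file.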

Of course in practice one will need to integrate this into one of the reader callbacks, but the principle stays the same. In case you wonder if calling functions for each byte is fast enough ... it's more than fast enough for normal purposes, especially if we keep in mind that other tasks, like reading, preparing and dealing with fonts, or processing token lists, take way more time. You can be sure that when half a second of runtime is spent on reading a file, processing may take minutes. If one wants to squeeze more performance out of this part, it's always an option to write special libraries for that, but this is beyond standard LuaTeX. We found out that the speed of loading data from files in Lua is mostly related to the small size of Lua's file buffer. Reading data stored in tables is extremely fast, and even faster when precompiled into bytecode.
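To illustrate that last remark: a table can be saved as a Lua chunk that returns it, precompiled once, and then loaded without parsing. A rough sketch (the filenames and data are made up for the example):

-- serialize a table as a chunk that returns it (real data is of course bigger)
local f = io.open("data.lua", "w")
f:write("return { 1, 2, 3, foo = 'bar' }")
f:close()

-- compile the chunk and dump the bytecode to a second file
local chunk = assert(loadfile("data.lua"))
local g = io.open("data.luc", "wb")
g:write(string.dump(chunk))
g:close()

-- loading the bytecode skips the parsing stage, which is where the gain is
local data = assert(loadfile("data.luc"))()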
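And coming back to the reader callbacks mentioned at the start of this paragraph: a minimal sketch, using LuaTeX's open_read_file callback, could look as follows. This is not the MkIV code, just an illustration; the callback is expected to return a table with a reader function (and optionally a close function), and note that it hands over lines rather than single bytes.

callback.register("open_read_file", function(asked_name)
    local f = io.open(asked_name, "rb")
    if not f then
        return nil -- signal that the file could not be opened
    end
    return {
        -- called repeatedly; each call returns the next line, nil at end of file
        reader = function(env)
            return f:read("*l")
        end,
        -- called once when TeX is done with the file
        close = function(env)
            f:close()
        end,
    }
end)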

tables<br />

When Taco and I were experimenting with the callbacks that intercept tokens and nodes, we wondered what the impact would be on performance. Although in MkIV we allocate quite some memory due to font handling, we were pretty sure that handling TeX's internal lists could also have its impact. Data related to fonts is not always subjected to garbage collection, simply because it has to be available permanently. List processing, on the other hand, involves a lot of temporarily allocated tables. During a run a really huge number of tokens passes through the machinery. When digested, they become nodes. For testing we normally use this document (with the name mk.tex) and at the time of writing this, it has some 48 pages.
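As an impression of what intercepting such a list involves, here is a minimal sketch of a callback that walks the node list just before line breaking; the callback name and helper functions are part of the LuaTeX interface, but this is not the code we used in our tests.

local seen = 0

callback.register("pre_linebreak_filter", function(head, groupcode)
    -- walk the list that TeX is about to break into lines
    for n in node.traverse(head) do
        seen = seen + 1
    end
    return true -- true means: keep the list untouched
end)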

This document is of moderate complexity, but not as complex as the documents that I normally process; they come with lots of graphics, layers, structural elements, maybe a bit of xml parsing, etc. Nevertheless, we're talking of some 24 million tokens entering the machinery.
