

Chris Babcock <cbabcock@asciiking.com> wrote:
>
> Since LPeG is concerned with semantics rather than presentation, the
> code point is the right unit for captures and counting. Given the Lua
> implementation of numbers as floats, using UCS-4 for the internal
> representation is probably "the Lua thing to do"...

No, keep strings as UTF-8 blobs. Erlang has found that this is the way to
keep things efficient, since most data you are dealing with does not need
serious per-codepoint analysis. lpeg can parse at the byte level and
identify the parts of the string that need more intensive processing.
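To illustrate the byte-level approach, here is a minimal sketch. It is written in Python rather than Lua/LPeg, and the `key=value;...` record format and the helpers `split_fields` and `count_code_points` are hypothetical. The sketch relies on a property of UTF-8: every byte of a multi-byte sequence has its high bit set, so ASCII delimiters can be matched on the raw bytes without decoding. Only the field that needs per-code-point work is examined further.

```python
# A sketch (not LPeg itself): keep data as UTF-8 bytes, scan at the byte
# level, and do per-code-point work only on the fields that need it.
# The record format and helper names are hypothetical.


def split_fields(record):
    """Split b'k=v;k=v' on ASCII delimiters without decoding.

    Safe for UTF-8 because every byte of a multi-byte sequence is >= 0x80,
    so the ASCII bytes b';' and b'=' never occur inside a character.
    """
    fields = {}
    for pair in record.split(b";"):
        key, _, value = pair.partition(b"=")
        fields[key] = value
    return fields


def count_code_points(data):
    """Count code points in valid UTF-8 by skipping continuation bytes (10xxxxxx)."""
    return sum(1 for b in data if b & 0xC0 != 0x80)


record = "name=José;city=München;id=42".encode("utf-8")
fields = split_fields(record)

# Only the field that needs per-code-point analysis is looked at closely.
city = fields[b"city"]
print(len(city), count_code_points(city))  # byte length vs. code points
print(city.decode("utf-8").upper())
```

Here `len(city)` is 8 bytes while `count_code_points(city)` is 7, because "ü" occupies two bytes. The `id` field is never decoded at all.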

Tony.
-- 
f.anthony.n.finch  <dot@dotat.at>  http://dotat.at/
Humber, Thames, Dover, Wight, Portland, Plymouth, North Biscay: Westerly or
southwesterly 3 or 4, increasing 5 to 7 later. Slight or moderate,
occasionally rough in Plymouth and north Biscay. Mainly fair. Moderate or
good.