- Subject: Re: LPeg support for utf-8
- From: Tony Finch <dot@...>
- Date: Mon, 4 Apr 2011 11:20:55 +0100
Chris Babcock <cbabcock@asciiking.com> wrote:
>
> Since LPeG is concerned with semantics rather than presentation, the
> code point is the right unit for captures and counting. Given the Lua
> implementation of numbers as floats, using UCS-4 for the internal
> representation is probably "the Lua thing to do"...
No, keep strings as UTF-8 blobs. Erlang has found that this is the way to
keep things efficient, since most data you are dealing with does not need
serious per-codepoint analysis. LPeg can parse at the byte level and
identify the parts of the string that need more intensive processing.
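
To illustrate the byte-level approach, here is a minimal sketch. It is written in Python rather than LPeg, only to show the idea, and it assumes the input is already valid UTF-8. The names NON_ASCII_RUN and non_ascii_spans are made up for this example. The scan works on raw bytes and decodes only the runs that contain multi-byte sequences, so the ASCII bulk of the string is never turned into code points.

```python
import re

# One or more consecutive bytes with the high bit set. In valid UTF-8 every
# byte of a multi-byte character has the high bit set, so each run holds
# only whole characters. ASCII bytes never match.
NON_ASCII_RUN = re.compile(rb"[\x80-\xff]+")


def non_ascii_spans(blob: bytes):
    """Return (byte_offset, decoded_text) for each non-ASCII run in blob.

    Assumes blob is valid UTF-8; invalid input raises UnicodeDecodeError.
    """
    spans = []
    for m in NON_ASCII_RUN.finditer(blob):
        # Only these slices are decoded into code points.
        spans.append((m.start(), m.group().decode("utf-8")))
    return spans


if __name__ == "__main__":
    blob = "naïve café, plain ASCII tail".encode("utf-8")
    for offset, text in non_ascii_spans(blob):
        print(offset, text)
```

The offsets are byte offsets into the original blob, so the caller can keep slicing the string as bytes and hand only the flagged spans to anything that needs per-codepoint work.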
Tony.
--
f.anthony.n.finch <dot@dotat.at> http://dotat.at/
Humber, Thames, Dover, Wight, Portland, Plymouth, North Biscay: Westerly or
southwesterly 3 or 4, increasing 5 to 7 later. Slight or moderate,
occasionally rough in Plymouth and north Biscay. Mainly fair. Moderate or
good.