- Subject: Re: LPeg support for utf-8
- From: Tony Finch <dot@...>
- Date: Fri, 8 Apr 2011 12:54:02 +0100
Dimiter "malkia" Stanev <malkia@gmail.com> wrote:
> > No, keep strings as UTF-8 blobs. Erlang has found that this is the way to
> > keep things efficient, since most data you are dealing with does not need
> > serious per-codepoint analysis. lpeg can parse at the byte level and
> > identify the parts of the string that need more intensive processing.
>
> Off topic, but I believe Erlang stores each character as a 32-bit value,
> according to this:
>
> http://schemecookbook.org/Erlang/StringBasics
Right, hence efficient string handling for bulk data is done with binaries
rather than the usual string type.
http://www.erlang.org/doc/efficiency_guide/binaryhandling.html
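
To illustrate the "keep it as a UTF-8 blob, scan at the byte level" idea
quoted above, here is a minimal sketch in Python rather than LPeg or Erlang.
The function name non_ascii_spans is hypothetical and not taken from any
library. It assumes the input is valid UTF-8. Every byte of a multi-byte
character has its high bit set, so a run of such bytes always covers whole
characters. That run is the only part decoded to code points.

```python
import re

# Runs of bytes with the high bit set: UTF-8 lead and continuation bytes.
NON_ASCII_RUN = re.compile(rb"[\x80-\xff]+")


def non_ascii_spans(blob: bytes):
    """Return (start, end, text) for each run of non-ASCII bytes in a UTF-8 blob.

    Hypothetical helper for illustration; assumes blob is valid UTF-8.
    """
    spans = []
    for m in NON_ASCII_RUN.finditer(blob):
        # Only these runs are decoded to code points; ASCII bytes are never touched.
        spans.append((m.start(), m.end(), m.group().decode("utf-8")))
    return spans


if __name__ == "__main__":
    blob = "naïve café, plain ASCII tail".encode("utf-8")
    for start, end, text in non_ascii_spans(blob):
        print(start, end, text)
```

The offsets are byte offsets into the blob, so the ASCII parts can still be
sliced and passed around without ever being decoded.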
Tony.
--
f.anthony.n.finch <dot@dotat.at> http://dotat.at/
Forties, Cromarty, Forth: West 5 to 7, occasionally gale 8 in Cromarty,
becoming variable 3 or 4 later. Slight or moderate, but rough or very rough in
Forties at first. Fair. Moderate or good, occasionally poor later.