Re: LPeg support for utf-8

lua-l archive


Subject: Re: LPeg support for utf-8
From: "Dimiter \"malkia\" Stanev" <malkia@...>
Date: Thu, 07 Apr 2011 15:57:11 -0700

> No, keep strings as UTF-8 blobs. Erlang has found that this is the way to
> keep things efficient, since most data you are dealing with does not need
> serious per-codepoint analysis. lpeg can parse at the byte level and
> identify the parts of the string that need more intensive processing.
>
> Tony.
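
As a rough illustration of the byte-level approach described above, here is a minimal sketch in Python rather than LPeg. The function name `non_ascii_spans` is made up for this example and is not part of any library. It walks a UTF-8 blob one byte at a time and reports only the runs of bytes >= 0x80, which are the parts that might need per-codepoint handling; plain ASCII is skipped without decoding.

```python
def non_ascii_spans(data: bytes):
    """Return (start, end) byte offsets of maximal runs of non-ASCII bytes.

    In valid UTF-8, every byte of a multi-byte sequence is >= 0x80, so each
    run covers whole code points and can be decoded on its own.
    """
    spans = []
    start = None
    for i, b in enumerate(data):
        if b >= 0x80:
            if start is None:
                start = i          # a non-ASCII run begins here
        elif start is not None:
            spans.append((start, i))  # run ended at the previous byte
            start = None
    if start is not None:
        spans.append((start, len(data)))  # run reaches the end of the blob
    return spans


if __name__ == "__main__":
    blob = "naïve café".encode("utf-8")
    for s, e in non_ascii_spans(blob):
        # Only these slices need "more intensive processing".
        print(s, e, blob[s:e].decode("utf-8"))
```

The rest of the string stays an opaque blob; only the reported slices would be handed to slower, codepoint-aware code.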

Off topic, but I believe Erlang stores each character as a 32-bit value, according to this:


http://schemecookbook.org/Erlang/StringBasics

"To understand why Erlang string handling is less efficient than a language like Perl, you need to know that each character uses 8 bytes of memory. That's right -- 8 bytes, not 8 bits! Erlang stores each character as a 32-bit integer, with a 32-bit pointer for the next item in the list (remember, strings are lists of characters.)"
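
To put numbers on the quoted claim, here is a back-of-the-envelope sketch in Python. It assumes the layout described in the quote (one machine word for the character plus one word for the next-item pointer, 4 bytes each on a 32-bit system) and ignores any other runtime overhead. The function names and the `word_bytes` parameter are hypothetical, chosen only for this illustration.

```python
def list_string_bytes(s: str, word_bytes: int = 4) -> int:
    """Estimated size of a string stored as a linked list of characters.

    Assumption from the quoted text: each character costs one word for the
    value and one word for the pointer to the next list item.
    """
    return len(s) * 2 * word_bytes


def utf8_blob_bytes(s: str) -> int:
    """Size of the same string kept as a UTF-8 byte blob."""
    return len(s.encode("utf-8"))


if __name__ == "__main__":
    for sample in ("hello", "naïve café"):
        print(sample, list_string_bytes(sample), utf8_blob_bytes(sample))
```

For a 5-character ASCII string this gives 40 bytes for the list form against 5 bytes for the blob, which is the 8x difference the quote describes.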

