No, keep strings as UTF-8 blobs. Erlang has found that this is the way to keep things efficient, since most of the data you deal with does not need serious per-codepoint analysis. LPeg can parse at the byte level and identify the parts of the string that need more intensive processing. Tony.
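To illustrate the byte-level idea, here is a minimal sketch in Python rather than LPeg. It scans a UTF-8 blob one byte at a time and reports only the runs that contain non-ASCII bytes, which are the parts that might need per-codepoint work. The function name `non_ascii_spans` is made up for this example and is not part of any library.

```python
def non_ascii_spans(data: bytes) -> list:
    """Return (start, end) byte offsets of each run of non-ASCII bytes.

    In UTF-8 every byte of a multi-byte sequence is >= 0x80, so a plain
    byte scan is enough to separate ASCII text from everything else.
    """
    spans = []
    start = None
    for i, b in enumerate(data):
        if b >= 0x80:
            if start is None:
                start = i          # a non-ASCII run begins here
        elif start is not None:
            spans.append((start, i))  # run ended at the previous byte
            start = None
    if start is not None:
        spans.append((start, len(data)))
    return spans


blob = "naïve café".encode("utf-8")
for s, e in non_ascii_spans(blob):
    # Only these slices need decoding; the rest stays an opaque blob.
    print(s, e, blob[s:e].decode("utf-8"))
```

The ASCII portions are never decoded, which is where the efficiency argument comes from.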
Off topic, but I believe Erlang stores each character as a 32-bit value, according to this:
http://schemecookbook.org/Erlang/StringBasics
"To understand why Erlang string handling is less efficient than a language like Perl, you need to know that each character uses 8 bytes of memory. That's right -- 8 bytes, not 8 bits! Erlang stores each character as a 32-bit integer, with a 32-bit pointer for the next item in the list (remember, strings are lists of characters.)"
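As a rough back-of-the-envelope check of the figures in that quote, the sketch below compares the two representations in Python. It assumes the 32-bit layout described above (one word for the character, one word for the next pointer) and ignores any header or allocator overhead. Both function names are hypothetical.

```python
def list_string_bytes(text: str, word_bytes: int = 4) -> int:
    """Estimated size of a string stored as a linked list of characters.

    Each character costs one word for the integer value plus one word
    for the pointer to the next cell (assumed layout, per the quote).
    """
    return len(text) * 2 * word_bytes


def utf8_blob_bytes(text: str) -> int:
    """Size of the same string stored as a flat UTF-8 byte blob."""
    return len(text.encode("utf-8"))


sample = "hello"
print(list_string_bytes(sample), utf8_blob_bytes(sample))
```

For plain ASCII text this is the 8-to-1 ratio the quoted passage describes.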