Re: Plea for the support of unicode escape sequences

lua-l archive


Subject: Re: Plea for the support of unicode escape sequences
From: Marc Balmer <marc@...>
Date: Wed, 29 Jun 2011 19:11:27 +0200

On 29.06.2011 18:55, David Kolf wrote:

Edgar Toernig wrote:

I know that Lua's authors try to avoid bloat, but these additional
176 bytes (that's what an implementation of the \u4x/\U8x variant on
x86-32 costs) are IMHO very well spent.
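
For a sense of scale, the core of such an escape is a routine that turns a code point into UTF-8 bytes. The following is a minimal sketch, not the code from the proposed patch: the name utf8_encode is hypothetical, and it assumes the escapes carry four or eight hex digits and that only code points up to U+10FFFF (excluding surrogates) are accepted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not taken from Lua or from the proposed patch):
   encode one Unicode code point as UTF-8.
   Writes at most 4 bytes to out and returns the byte count,
   or 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not valid scalar values */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* out of range */
}
```

A lexer would call something like this once per escape and append the resulting bytes to the string being built. Nothing else in the string library changes.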


It's not just those 176 bytes. To support UTF-8 properly the string
pattern functions (find, match, ...) would also need to recognize UTF-8
characters as single characters in character sets. And then you would
need to extend all the predefined classes (%a, %c, %g, ...). This
updated pattern matching would break the classic C character handling.

To avoid incompatibilities a second version of the pattern matching
functions would be needed. (Though I guess they could share a lot of
code). Maybe this could be a compile time option.

A half-baked solution (just escapes, not patterns) should be avoided in
my opinion, as I guess many users will try something like
string.match(s, "[\u2013\u2026]").
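
To see why that pattern would surprise people, consider what a byte-oriented matcher does with the set. The sketch below is in plain C rather than Lua's actual matcher; the names dash_or_ellipsis and first_byte_in_set are made up for illustration. It assumes the escapes expand to UTF-8, so the set holds the six bytes E2 80 93 E2 80 A6, and a single class item tests only one byte of the subject.

```c
#include <assert.h>
#include <string.h>

/* Assumed expansion of "\u2013\u2026" to UTF-8:
   U+2013 EN DASH = E2 80 93, U+2026 HORIZONTAL ELLIPSIS = E2 80 A6.
   To a byte-oriented matcher this is the byte set {E2, 80, 93, A6}. */
static const char dash_or_ellipsis[] = "\xE2\x80\x93\xE2\x80\xA6";

/* Hypothetical stand-in for one byte-oriented "[...]" item:
   returns 1 if the first byte of s is a member of set, else 0. */
int first_byte_in_set(const char *s, const char *set)
{
    return *s != '\0' && strchr(set, *s) != NULL;
}
```

With these definitions, first_byte_in_set("\xE2\x80\x94", dash_or_ellipsis) is 1 even though the subject is U+2014 EM DASH, which is not in the intended set. The same happens for U+20AC EURO SIGN (E2 82 AC), because it shares the lead byte E2. Even on a correct hit the match covers a single byte, so the captured result is a fragment of a character rather than the character itself.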


I strongly second this.  Unicode is far more than a few escape sequences...

If you need Unicode support, provide the needed functions to Lua through the C API, I'd say.

References:
- Plea for the support of unicode escape sequences, Edgar Toernig
- Re: Plea for the support of unicode escape sequences, David Kolf
