Re: Plea for the support of unicode escape sequences

lua-l archive


Subject: Re: Plea for the support of unicode escape sequences
From: Edgar Toernig <froese@...>
Date: Wed, 29 Jun 2011 23:06:58 +0200

David Kolf wrote:
> 
> I wonder how compact you can store the character classes for the 65k
> codepoints in the BMP and the lowercase/uppercase pairs (for
> string.lower, string.upper).

There's already a UTF-8 version of the string library called slnunicode.
IIRC, it uses the tables from Tcl, which are about 13k.  The whole library
is about 32k.

With it and the unicode escape sequences you could write, e.g.,

    unicode.utf8.gsub(s, "[\uf000-\uffff]", "?")

With Lua 5.1 that would be

    unicode.utf8.gsub(s, "[\239\128\128-\239\191\191]", "?")

Now, which one is cleaner?
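
For reference, the decimal escapes in the Lua 5.1 version can be derived mechanically from the code points. The following is only a minimal sketch: `utf8_bytes` and `decimal_escapes` are hypothetical helper names, not part of slnunicode or stock Lua, and the code assumes code points in the BMP without any special handling of surrogates.

```lua
-- Hypothetical helper: encode a BMP code point (0..0xFFFF) as UTF-8 bytes,
-- using only arithmetic that is available in Lua 5.1 (no bit library).
local function utf8_bytes(cp)
  if cp < 0x80 then
    return { cp }
  elseif cp < 0x800 then
    return { 0xC0 + math.floor(cp / 0x40), 0x80 + cp % 0x40 }
  else
    return {
      0xE0 + math.floor(cp / 0x1000),
      0x80 + math.floor(cp / 0x40) % 0x40,
      0x80 + cp % 0x40,
    }
  end
end

-- Hypothetical helper: render those bytes as Lua decimal escapes.
local function decimal_escapes(cp)
  local parts = {}
  for i, b in ipairs(utf8_bytes(cp)) do
    parts[i] = string.format("\\%d", b)
  end
  return table.concat(parts)
end

-- Prints the range used in the pattern above.
print(decimal_escapes(0xF000) .. "-" .. decimal_escapes(0xFFFF))
```

Having to run something like this by hand for every non-ASCII character is exactly the bookkeeping that an escape sequence in the lexer would take over.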

> Maybe that can be compressed far enough to be included in official Lua
> (5.3?). That would be great.

I think that's not really necessary.  You need both versions anyway: the
simple byte-oriented variant to parse and match arbitrary byte sequences
(incl. binary data) and the UTF-8 version for unicode character strings.
An external library would be good enough.  But you want the escape sequences
to make the external library a pleasure to use (see above).
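
To illustrate why both views are needed, here is a small sketch in plain Lua that looks at the same string once as bytes and once as characters. The character count is only an approximation of what a UTF-8 aware library would do: it assumes the input is valid UTF-8 and simply counts every byte that is not a continuation byte.

```lua
-- "é" written as its two UTF-8 bytes with Lua 5.1 decimal escapes.
local s = "caf\195\169"

-- Byte-oriented view: the stock string library counts bytes.
local byte_len = #s

-- Character-oriented view, approximated by counting every byte that is
-- not a UTF-8 continuation byte (0x80-0xBF).  Assumes valid UTF-8.
local _, char_len = string.gsub(s, "[^\128-\191]", "")

print(byte_len, char_len)
```

The byte count is what you want when handling binary data; the character count is what you want for text.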

Ciao, ET.

