- Subject: Re: UTF-8 patterns in Lua 5.3
- From: Andrew Starks <andrew.starks@...>
- Date: Sun, 20 Apr 2014 09:30:26 -0500
On Sat, Apr 19, 2014 at 2:36 PM, Hisham <h@hisham.hm> wrote:
> On 19 April 2014 11:03, Dirk Laurie <dirk.laurie@gmail.com> wrote:
>> 2014-04-19 10:20 GMT+02:00 Philipp Janda <siffiejoe@gmx.net>:
>>> On 19.04.2014 09:47, Dirk Laurie wrote:
>>>> The proposal allows for customizable character classes. We already
>>>> have that. Nothing (except the vast effort of actually doing it) stops you
>>>> from defining your own locale ...
>>> Do you think that locales were a good idea? We inherited those from C but
>>> there's no reason to make the same mistake again just because C made it
>>> decades ago.
>>
>> Whether they are a good idea or not, they are there, accessible from
>> Lua. And if they are there, somebody will use them.
>>
>> You come from a country (I guess) where people would expect
>> string.upper"über" to come out as "ÜBER". Is that such a very bad idea?
>
> Unfortunately that doesn't work in modern locales, right? At least I
> couldn't get "ÜBER" out of ("über"):upper() here, after trying several
> combinations of values of os.setlocale, $LC_ALL and different terminal
> emulators. I'm sure I could get it to work if I configured my whole
> system (display, encoding, input) to ISO-8859-X, but it's a pain (and
> then other things break).
>
> I think at some point in the future it won't make sense to talk about
> single-byte encodings. (The future doesn't arrive everywhere at the
> same time, of course -- in some places this future has already arrived,
> in others it will take a long time). In this future, three things stop
> making sense in the Lua API:
>
> * string.lower
> * string.upper
> * % character classes in patterns.
>
> As far as I can see these are the only *text* oriented features of the
> string library; the rest of it is an 8-bit clean, locale-agnostic,
> bytestream library. (I say that as a compliment, the fact that this
> list is so small is a testament to the genericity of the string
> library!)
I like this point a lot.
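To make the split between the byte-oriented and the text-oriented parts
concrete, here is a minimal sketch. It assumes the "C" locale (set
explicitly below) and spells "über" as UTF-8 bytes with decimal escapes,
so the result does not depend on how the file itself is encoded. The
variable names are only illustrative.

```lua
-- Force the "C" locale so the text-oriented functions behave predictably.
os.setlocale("C")

-- "über" as UTF-8 bytes: U+00FC (ü) encodes as the two bytes 195 188.
local s = "\195\188ber"

-- Byte-oriented operations: the length and string.byte see five bytes,
-- not four characters.
local nbytes = #s
local b1, b2 = s:byte(1, 2)

-- Text-oriented operation: in the "C" locale string.upper only maps
-- ASCII a-z, so the two bytes of ü pass through unchanged.
local up = s:upper()

-- %a / %A are text-oriented too: in the "C" locale neither byte of ü
-- counts as a letter, so removing all non-letters drops both of them.
local letters = s:gsub("%A", "")

print(nbytes, b1, b2, up == "\195\188BER", letters)
```

Under those assumptions upper() yields the bytes of "üBER" rather than
"ÜBER", and only "ber" survives the %A filter, while the length and
byte access are unaffected by what the bytes mean.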
The string library mostly deals with strings of bytes. Strings of
input and output are fundamental to using a language: you can't do
much without tools to parse what goes in and tools to format what
comes out.
Text is an über :) common application of that facility.
This utf8 discussion is mirroring the math library discussion, and it's
doing so for exactly the same reason: neither has much to do with the
language itself.
And so, the takeaway that I'm getting from this is that when
application libraries are added to Lua, you set yourself up for a
lifetime of well-considered, valid, and eventually repetitive and
therefore nauseating discussion on what's missing from them and why it
can't, shouldn't, or should be added.
IMHO, of course.
-Andrew