On 2018-07-10 06:20 PM, Gregg Reynolds wrote:
Your point being?

I mean, it's a joke, really, but if I were to actually redesign Unicode, I'd throw away all those annoying character tables and encode that information directly in the bits of each codepoint.

That would solve all the practical problems with Unicode. But we aren't going to get that, so we should instead stick with no Unicode support for the time being, at least until they finally decide that Unicode was a huge mistake and restart the whole thing.


On Tue, Jul 10, 2018, 4:15 PM Soni "They/Them" L. <fakedme@gmail.com> wrote:



    On 2018-07-10 05:31 PM, Gregg Reynolds wrote:
    >
    >
    > On Tue, Jul 10, 2018, 9:00 AM Dirk Laurie <dirk.laurie@gmail.com> wrote:
    >
    >     2018-07-10 15:30 GMT+02:00 Lorenzo Donati <lorenzodonatibz@tiscali.it>:
    >
    >     > Unicode is great for typesetting (I regularly use LaTeX and it's
    >     > fun to find almost every symbol you may imagine, even ancient
    >     > German runic scripts!), but it sucks (IMHO) for general programming
    >     > or computer-related stuff. Too much mental overhead to use correctly
    >     > for little gain.
    >
    >     Yes, yes, but, if you will allow me to return to Lua and UTF-8,
    >     there would be more gain for a programmer if we had (if it is not
    >     too late already for Lua 5.4) utf8 versions of find, sub, match,
    >     gsub, gmatch, reverse. Just those, not asking for upper/lower,
    >     operating only on simple codepoints, no combining characters,
    >     no need for a C library.
    >
    >
    > UTF-8 != Unicode. It's an encoding; you don't get to pick a subset and
    > still claim Unicode support.
    >
    > "Simple codepoints"? Does Unicode define that? If not, who decides
    > what that means? Zero-width space is pretty simple.
    >
    > No combining chars? Ok, but that would not be Unicode. Practical
    > result: massive confusion and complaining. You cannot accept Unicode
    > and reject combining chars.
    >
    >
    >
    >     utf8.find ("Hélène",'n')  --> 5 5
    >     utf8.sub ("Hélène",5)   --> 'ne'
    >     utf8.gsub ("Hélène","[éè]","e")  --> 'Helene' 2
    >     utf8.reverse ("Hélène")   --> 'enèléH'
    >

    https://gist.github.com/SoniEx2/ecd119507f160d9c26e3eabd9e012dc0
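
For readers following along, here is a rough sketch of what codepoint-level helpers along the lines Dirk describes could look like on top of the stock Lua 5.3+ utf8 library. It is not the code in the gist above and not a proposed standard API: the table name u8 and the function replace_chars are made up for illustration, find only does plain (non-pattern) matching, and sub only handles positive indices. The source file is assumed to be saved as UTF-8.

```lua
-- Sketch only: requires Lua 5.3+ for the standard utf8 library.
local u8 = {}  -- hypothetical module name, not part of Lua

-- Substring by codepoint index (positive indices only in this sketch).
function u8.sub(s, i, j)
  local len = assert(utf8.len(s), "invalid UTF-8")
  i = math.max(i or 1, 1)
  j = math.min(j or len, len)
  if i > j then return "" end
  local first = utf8.offset(s, i)         -- byte where codepoint i starts
  local last = utf8.offset(s, j + 1) - 1  -- byte just before codepoint j+1
  return string.sub(s, first, last)
end

-- Plain find (no patterns, non-empty needle) returning codepoint positions.
function u8.find(s, needle, init)
  assert(needle ~= "", "empty needle not handled in this sketch")
  local start_byte = utf8.offset(s, init or 1)
  if not start_byte then return nil end
  local b, e = string.find(s, needle, start_byte, true)
  if not b then return nil end
  -- utf8.len(s, 1, k) counts the codepoints that start within bytes 1..k.
  return utf8.len(s, 1, b), utf8.len(s, 1, e)
end

-- Stand-in for gsub with a character class: Lua patterns work on bytes,
-- so "[éè]" cannot work as written. Replace whole codepoints via a table.
function u8.replace_chars(s, map)
  local n = 0
  local result = string.gsub(s, utf8.charpattern, function(ch)
    local r = map[ch]
    if r then n = n + 1 end
    return r  -- nil keeps the original character
  end)
  return result, n
end

-- Reverse by codepoint (this does NOT keep combining sequences together).
function u8.reverse(s)
  local chars = {}
  for _, cp in utf8.codes(s) do
    chars[#chars + 1] = utf8.char(cp)
  end
  local out = {}
  for k = #chars, 1, -1 do
    out[#out + 1] = chars[k]
  end
  return table.concat(out)
end

print(u8.find("Hélène", "n"))                                   --> 5 5
print(u8.sub("Hélène", 5))                                      --> ne
print(u8.replace_chars("Hélène", { ["é"] = "e", ["è"] = "e" })) --> Helene 2
print(u8.reverse("Hélène"))                                     --> enèléH
```

Gregg's objection still applies to a sketch like this: with a decomposed input such as "e\u{301}" (e followed by a combining acute accent), u8.reverse moves the accent in front of the e, so the helpers are only safe on text known to be free of combining sequences.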