Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8

lua-l archive


Subject: Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8
From: "Soni \"They/Them\" L." <fakedme@...>
Date: Tue, 10 Jul 2018 18:15:04 -0300



On 2018-07-10 05:31 PM, Gregg Reynolds wrote:

On Tue, Jul 10, 2018, 9:00 AM Dirk Laurie <dirk.laurie@gmail.com> wrote:
    2018-07-10 15:30 GMT+02:00 Lorenzo Donati <lorenzodonatibz@tiscali.it>:

    > Unicode is great for typesetting (I regularly use LaTeX and it's fun
    > to find almost every symbol you may imagine, even ancient German
    > runic scripts!), but it sucks (IMHO) for general programming or
    > computer-related stuff. Too much mind overhead to use correctly for
    > little gain.

    Yes, yes, but, if you will allow me to return to Lua and UTF-8, there
    would be more gain for a programmer if we had (if it is not too late
    already for Lua 5.4) utf8 versions of find, sub, match, gsub, gmatch,
    reverse. Just those, not asking for upper/lower, operating only on
    simple codepoints, no combining characters, no need for a C library.
UTF-8 != Unicode. It's an encoding; you don't get to pick a subset and still claim Unicode support.
"Simple codepoints"? Does Unicode define that? If not, who decides what that means? Zero-width space is pretty simple.
No combining chars? OK, but that would not be Unicode. Practical result: massive confusion and complaining. You cannot accept Unicode and reject combining chars.
    utf8.find ("Hélène",'n')  --> 5 5
    utf8.sub ("Hélène",5)   --> 'ne'
    utf8.gsub ("Hélène","[éè]","e")  --> 'Helene' 2
    utf8.reverse ("Hélène")   --> 'enèléH'


https://gist.github.com/SoniEx2/ecd119507f160d9c26e3eabd9e012dc0
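
For readers following the quoted examples, here is a minimal sketch of how three of the proposed calls (find with a plain needle, sub, reverse) could be written in pure Lua on top of the standard utf8 library from Lua 5.3. It is written independently of the gist linked above and is not taken from it. The names u8find, u8sub and u8reverse are hypothetical, chosen only for illustration; pattern-based calls such as the gsub example with "[éè]" are not covered. It assumes valid UTF-8 input and works on code points only.

```lua
-- Sketch only: code-point versions of find/sub/reverse, Lua 5.3 or later.
-- u8find, u8sub and u8reverse are hypothetical names, not a standard API.

-- Plain (non-pattern) search; returns code-point indices instead of bytes.
local function u8find(s, needle)
  if needle == "" then return 1, 0 end          -- mirror string.find
  local b, e = string.find(s, needle, 1, true)  -- byte positions
  if not b then return nil end
  -- utf8.len(s, 1, k) counts the characters that start in bytes 1..k.
  return utf8.len(s, 1, b), utf8.len(s, 1, e)
end

-- Like string.sub, but i and j count code points (negative = from the end).
local function u8sub(s, i, j)
  local n = assert(utf8.len(s), "invalid UTF-8")
  i, j = i or 1, j or -1
  if i < 0 then i = n + i + 1 end
  if j < 0 then j = n + j + 1 end
  if i < 1 then i = 1 end
  if j > n then j = n end
  if i > j then return "" end
  local first = utf8.offset(s, i)         -- first byte of code point i
  local last = utf8.offset(s, j + 1) - 1  -- last byte of code point j
  return string.sub(s, first, last)
end

-- Reverses the order of code points (not of grapheme clusters).
local function u8reverse(s)
  local chars = {}
  for _, cp in utf8.codes(s) do
    chars[#chars + 1] = utf8.char(cp)
  end
  local n = #chars
  for k = 1, n // 2 do
    chars[k], chars[n - k + 1] = chars[n - k + 1], chars[k]
  end
  return table.concat(chars)
end

-- Assumes this file is saved as UTF-8 with precomposed é and è.
print(u8find("Hélène", "n"))  --> 5 5
print(u8sub("Hélène", 5))     --> ne
print(u8reverse("Hélène"))    --> enèléH
```

Because u8reverse moves code points individually, a base letter followed by a combining accent comes out with the accent in front of the letter, which is exactly the combining-character objection raised in the quoted reply.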

