Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8

On Tue, Jul 10, 2018, 4:28 PM Soni "They/Them" L. <fakedme@gmail.com> wrote:

On 2018-07-10 06:20 PM, Gregg Reynolds wrote:
> Your point being?

I mean, it's a joke, really, but if I were to actually redesign Unicode,
I'd throw away all those annoying character tables and encode them as
part of the bits.

It would solve all practical problems with Unicode. But we aren't gonna
have that, so we should instead stick with no Unicode support for the
time being. At least until they finally decide that Unicode was a huge
mistake and restart the whole thing.

Well, I'll give the Unicode folks credit for playing a bad hand about as well as could be expected. The RTL stuff is an utter monstrosity, but they did not really have the option of fixing it; they had to be compatible with stuff that was already broken (e.g. numbers in LTR scripts).

>
> On Tue, Jul 10, 2018, 4:15 PM Soni "They/Them" L. <fakedme@gmail.com> wrote:
>
>
>
> On 2018-07-10 05:31 PM, Gregg Reynolds wrote:
> >
> >
> > On Tue, Jul 10, 2018, 9:00 AM Dirk Laurie <dirk.laurie@gmail.com> wrote:
> >
> > 2018-07-10 15:30 GMT+02:00 Lorenzo Donati <lorenzodonatibz@tiscali.it>:
> >
> > > Unicode is great for typesetting (I use regularly LaTeX and it's fun to find
> > > almost every symbol you may imagine, even ancient German runic scripts!),
> > > but it sucks (IMHO) for general programming or computer-related stuff. Too
> > > much mind overhead to use correctly for little gain.
> >
> > Yes, yes, but — if you will allow me to return to Lua and UTF-8 — there would
> > be more gain for a programmer if we had (if it is not too late already for Lua 5.4)
> > utf8 versions of find, sub, match, gsub, gmatch, reverse. Just those, not asking
> > for upper/lower, operating only on simple codepoints, no combining characters,
> > no need for a C library.
> >
> >
> > Utf8 != Unicode. It's an encoding; you don't get to pick a subset and
> > still claim Unicode support.
> >
> > "Simple codepoints"? Does Unicode define that? If not, who decides
> > what that means? Zero-width space is pretty simple.
> >
> > No combining chars? Ok, but that would not be Unicode. Practical
> > result: massive confusion and complaining. You cannot accept Unicode
> > and reject combining chars.
> >
> >
> >
> > utf8.find ("Hélène",'n') --> 5 5
> > utf8.sub ("Hélène",5) --> 'ne'
> > utf8.gsub ("Hélène","[éè]","e") --> 'Helene' 2
> > utf8.reverse ("Hélène") --> 'enèléH'
> >
>
> https://gist.github.com/SoniEx2/ecd119507f160d9c26e3eabd9e012dc0
>
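
For anyone following along, here is a rough sketch of what the codepoint-only behaviour in the quoted utf8.find / utf8.sub / utf8.gsub / utf8.reverse examples could look like on top of the standard utf8 library in Lua 5.3 or later. The names u8find, u8sub, u8map and u8reverse are made up for this illustration; they are not the proposed API and are not taken from the gist linked above. It assumes valid UTF-8 input and a UTF-8 encoded source file, handles only non-negative indices, and u8map only does a single-codepoint table lookup rather than real pattern classes. The last lines show the combining-character objection in practice.

```lua
-- Sketch only: hypothetical helpers, Lua 5.3+ (standard utf8 library).

-- Plain (non-pattern) find; init and the results count codepoints, not bytes.
local function u8find(s, needle, init)
  local start_byte = 1
  if init then
    start_byte = utf8.offset(s, init)
    if not start_byte then return nil end
  end
  local b, e = s:find(needle, start_byte, true)
  if not b then return nil end
  -- utf8.len(s, 1, k) counts the characters that start in bytes 1..k.
  return utf8.len(s, 1, b), utf8.len(s, 1, e)
end

-- Substring by codepoint index (negative indices are not handled here).
local function u8sub(s, i, j)
  local n = assert(utf8.len(s), "invalid UTF-8")
  i = math.max(i or 1, 1)
  j = math.min(j or n, n)
  if i > j then return "" end
  local first = utf8.offset(s, i)         -- byte where codepoint i starts
  local last = utf8.offset(s, j + 1) - 1  -- last byte of codepoint j
  return s:sub(first, last)
end

-- Replace single codepoints via a lookup table; unmapped ones are kept.
local function u8map(s, map)
  return (s:gsub(utf8.charpattern, map))
end

-- Reverse the order of codepoints (not of grapheme clusters).
local function u8reverse(s)
  local chars = {}
  for ch in s:gmatch(utf8.charpattern) do
    chars[#chars + 1] = ch
  end
  local out = {}
  for k = #chars, 1, -1 do
    out[#out + 1] = chars[k]
  end
  return table.concat(out)
end

print(u8find("Hélène", "n"))                         --> 5 5
print(u8sub("Hélène", 5))                            --> ne
print(u8map("Hélène", { ["é"] = "e", ["è"] = "e" })) --> Helene
print(u8reverse("Hélène"))                           --> enèléH

-- The combining-character problem: the same visible "é" can be one
-- codepoint (U+00E9) or two (e followed by U+0301).
local composed, decomposed = "\u{E9}", "e\u{301}"
print(utf8.len(composed), utf8.len(decomposed))      --> 1 2
-- Reversing by codepoint moves the accent from the "e" onto the "l".
print(u8reverse("He\u{301}l") == "l\u{301}eH")       --> true
```

The helpers only give the expected answers when every visible character is a single codepoint, which is exactly the "simple codepoints" restriction being argued about: with decomposed input the indices shift and reverse attaches accents to the wrong letter.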