Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8

Subject: Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8
From: Gregg Reynolds <dev@ ... >
Date: Tue, 10 Jul 2018 15:31:05 -0500

On Tue, Jul 10, 2018, 9:00 AM Dirk Laurie <dirk.laurie@gmail.com> wrote:

2018-07-10 15:30 GMT+02:00 Lorenzo Donati <lorenzodonatibz@tiscali.it>:

> Unicode is great for typesetting (I regularly use LaTeX and it's fun to find
> almost every symbol you may imagine, even ancient German runic scripts!),
> but it sucks (IMHO) for general programming or computer-related stuff. Too
> much mind overhead to use correctly for little gain.

Yes, yes, but, if you will allow me to return to Lua and UTF-8, there would
be more gain for a programmer if we had (if it is not too late already for
Lua 5.4) utf8 versions of find, sub, match, gsub, gmatch, reverse. Just those,
not asking for upper/lower, operating only on simple codepoints, no combining
characters, no need for a C library.

UTF-8 != Unicode. It's an encoding; you don't get to pick a subset and still claim Unicode support.
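
To make the encoding-versus-character-set distinction concrete, here is a minimal sketch, assuming Lua 5.3 or later with the standard utf8 library. It uses U+00A0, the non-breaking space from the subject line: one Unicode codepoint, which UTF-8 happens to encode as two bytes.

```lua
-- Assumes Lua 5.3+ (standard utf8 library).
local nbsp = utf8.char(0xA0)      -- U+00A0 NO-BREAK SPACE

print(utf8.codepoint(nbsp))       --> 160  (the Unicode codepoint)
print(utf8.len(nbsp))             --> 1    (one codepoint)
print(#nbsp)                      --> 2    (two bytes once encoded as UTF-8)
print(string.format("%02X %02X", nbsp:byte(1, -1)))  --> C2 A0
```

The codepoint (160) is a Unicode fact; the byte sequence C2 A0 is a UTF-8 fact. Another encoding would store the same codepoint differently.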

"Simple codepoints"? Does Unicode define that? If not, who decides what that means? Zero-width space is pretty simple.

No combining chars? OK, but that would not be Unicode. Practical result: massive confusion and complaints. You cannot accept Unicode and reject combining chars.
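
Both objections can be illustrated with a short sketch, assuming Lua 5.3 or later. The helper `reverse_codepoints` is hypothetical, written only for this illustration. It shows that the same visible name can be spelled with or without combining marks, that a codepoint-level reverse moves those marks onto the wrong base letters, and that a zero-width space is a perfectly ordinary single codepoint that still defeats a naive search.

```lua
-- Assumes Lua 5.3+ (standard utf8 library, "\u{...}" escapes).

-- Hypothetical helper: reverse a string codepoint by codepoint.
local function reverse_codepoints(s)
  local out = {}
  for _, cp in utf8.codes(s) do
    table.insert(out, 1, utf8.char(cp))
  end
  return table.concat(out)
end

local composed   = "H\u{E9}l\u{E8}ne"      -- precomposed U+00E9 and U+00E8
local decomposed = "He\u{301}le\u{300}ne"  -- e + combining acute, e + combining grave

print(utf8.len(composed))      --> 6
print(utf8.len(decomposed))    --> 8
print(composed == decomposed)  --> false (same rendering, different codepoints)

-- Reversing codepoints detaches each accent from its "e":
-- the grave now follows "n" and the acute now follows "l".
local r = reverse_codepoints(decomposed)
print(r == "en\u{300}el\u{301}eH")  --> true

-- A zero-width space is one codepoint, yet it hides inside "ab".
local s = "a\u{200B}b"
print(utf8.len(s))             --> 3
print(s:find("ab", 1, true))   --> nil
```

Handling these cases properly would need normalization and grapheme-cluster segmentation, neither of which the standard utf8 library provides.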

utf8.find ("Hélène",'n') --> 5 5
utf8.sub ("Hélène",5) --> 'ne'
utf8.gsub ("Hélène","[éè]","e") --> 'Helene' 2
utf8.reverse ("Hélène") --> 'enèléH'
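
For reference, codepoint-level versions of some of these can already be sketched in pure Lua on top of the Lua 5.3 utf8 library. The names `u8find`, `u8sub` and `u8reverse` below are hypothetical, not part of any Lua release, and the sketch assumes valid UTF-8 input. `u8find` only does plain-text search; the pattern-based functions (match, gsub, gmatch with sets such as "[éè]") need multi-byte-aware pattern matching and are not attempted here.

```lua
-- Assumes Lua 5.3+ and valid UTF-8 input. All three names are hypothetical.

-- Plain-text find returning codepoint indices instead of byte indices.
local function u8find(s, needle)
  local b, e = s:find(needle, 1, true)
  if not b then return nil end
  -- utf8.len(s, 1, k) counts the characters that start at byte positions 1..k.
  return utf8.len(s, 1, b), utf8.len(s, 1, e)
end

-- Substring by codepoint indices; negative indices count from the end.
local function u8sub(s, i, j)
  local n = utf8.len(s)
  i, j = i or 1, j or n
  if i < 0 then i = n + i + 1 end
  if j < 0 then j = n + j + 1 end
  if i < 1 then i = 1 end
  if j > n then j = n end
  if i > j then return "" end
  -- utf8.offset(s, k) is the byte position where the k-th character starts.
  return s:sub(utf8.offset(s, i), utf8.offset(s, j + 1) - 1)
end

-- Reverse by codepoints (not by grapheme clusters).
local function u8reverse(s)
  local chars = {}
  for _, cp in utf8.codes(s) do
    chars[#chars + 1] = utf8.char(cp)
  end
  local n = #chars
  for k = 1, n // 2 do
    chars[k], chars[n - k + 1] = chars[n - k + 1], chars[k]
  end
  return table.concat(chars)
end

local name = "H\u{E9}l\u{E8}ne"           -- precomposed accents
print(u8find(name, "n"))                  --> 5 5
print(u8sub(name, 5))                     --> ne
print(u8reverse(name) == "en\u{E8}l\u{E9}H")  --> true
```

These reproduce the results above only because the accents are precomposed; with combining marks the indices and the reversal come out differently.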