On 2018-07-10 05:31 PM, Gregg Reynolds wrote:
> On Tue, Jul 10, 2018, 9:00 AM Dirk Laurie <firstname.lastname@example.org> wrote:
> 2018-07-10 15:30 GMT+02:00 Lorenzo Donati <firstname.lastname@example.org>:
> > Unicode is great for typesetting (I use regularly LaTeX and it's fun to find
> > almost every symbol you may imagine, even ancient German runic
> > but it sucks (IMHO) for general programming or computer-related stuff. Too
> > much mind overhead to use correctly for little gain.
> Yes, yes, but — if you will allow me to return to Lua and UTF-8 —
> there would be more gain for a programmer if we had (if it is not too late
> already for Lua 5.4) utf8 versions of find, sub, match, gsub, gmatch, reverse.
> Just those, not asking for upper/lower, operating only on simple codepoints,
> no combining
> no need for a C library.
> Utf8 != Unicode. It's an encoding; you don't get to pick a subset and
> still claim Unicode support.
> "Simple codepoints"? Does Unicode define that? If not, who decides
> what that means? Zero-width space is pretty simple.
> No combining chars? Ok, but that would not be Unicode. Practical
> result: massive confusion and complaining. You cannot accept Unicode
> and reject combining chars.
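
To make the combining-character point concrete, here is a small sketch, assuming Lua 5.3 or later and its standard utf8 library. The helper reverse_codepoints is a hypothetical name used only for illustration. It shows that the same visible name has two different codepoint sequences, and that a codepoint-by-codepoint reversal moves a combining accent onto the wrong letter.

```lua
-- Lua 5.3+ (standard utf8 library). \u{...} escapes keep the example
-- independent of how this message itself is encoded.
local precomposed = "H\u{E9}l\u{E8}ne"      -- accented letters as single codepoints
local decomposed  = "He\u{301}le\u{300}ne"  -- e + combining acute, e + combining grave

-- Same text on screen, different codepoint counts, different strings.
print(utf8.len(precomposed), utf8.len(decomposed))  -- 6 and 8
print(precomposed == decomposed)                    -- false

-- Hypothetical helper: reverse a string one codepoint at a time.
local function reverse_codepoints(s)
  local out = {}
  for _, cp in utf8.codes(s) do
    table.insert(out, 1, utf8.char(cp))  -- prepend; fine for short strings
  end
  return table.concat(out)
end

-- After reversal the combining accent follows "l", so it attaches to "l".
local r = reverse_codepoints("He\u{301}l")
print(r == "l\u{301}eH")  -- true
```

Handling this properly would need grapheme-cluster segmentation or normalization, neither of which the standard utf8 library provides.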
> utf8.find ("Hélène",'n') --> 5 5
> utf8.sub ("Hélène",5) --> 'ne'
> utf8.gsub ("Hélène","[éè]","e") --> 'Helene' 2
> utf8.reverse ("Hélène") --> 'enèléH'
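
For reference, codepoint-level versions of sub, reverse and a plain-text find can be approximated in pure Lua 5.3 on top of the existing utf8 library. What follows is only a sketch under stated assumptions: u_sub, u_reverse and u_find are hypothetical names, not part of Lua; the input is assumed to be valid UTF-8; only positive indices are handled; and nothing here is aware of combining characters. Pattern-based calls such as the gsub example with a character class are not covered, since Lua patterns work on bytes.

```lua
-- Sketch only, Lua 5.3+. All three function names are hypothetical.

-- Substring by codepoint positions i..j (positive indices only).
local function u_sub(s, i, j)
  local n = utf8.len(s)
  j = j or n
  if i < 1 then i = 1 end
  if j > n then j = n end
  if i > j then return "" end
  local first = utf8.offset(s, i)         -- byte where codepoint i starts
  local last = utf8.offset(s, j + 1) - 1  -- last byte of codepoint j
  return s:sub(first, last)
end

-- Reverse by codepoints: split with utf8.charpattern, then join backwards.
local function u_reverse(s)
  local chars = {}
  for ch in s:gmatch(utf8.charpattern) do
    chars[#chars + 1] = ch
  end
  local out = {}
  for k = #chars, 1, -1 do
    out[#out + 1] = chars[k]
  end
  return table.concat(out)
end

-- Plain (non-pattern) find that reports codepoint positions, not bytes.
local function u_find(s, needle, init)
  local byte_init = utf8.offset(s, init or 1)
  if not byte_init then return nil end
  local b, e = s:find(needle, byte_init, true)  -- true = plain search
  if not b then return nil end
  -- utf8.len(s, 1, p) counts the characters that start at or before byte p
  return utf8.len(s, 1, b), utf8.len(s, 1, e)
end

local name = "H\u{E9}l\u{E8}ne"  -- precomposed accents, written as escapes
print(u_find(name, "n"))         -- 5 and 5
print(u_sub(name, 5))            -- ne
print(u_reverse(name) == "en\u{E8}l\u{E9}H")  -- true
```

With precomposed accents this reproduces the find, sub and reverse results quoted above. With decomposed input the positions and the reversal would differ.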