- Subject: Re: Should Lua be more strict about Unicode errors?
- From: Dirk Laurie <dirk.laurie@...>
- Date: Sat, 5 Sep 2015 15:41:26 +0200
2015-09-04 23:23 GMT+02:00 Jay Carlson <email@example.com>:
> I’ll still take the INTERNET STANDARD over some Wikipedia page
> as my appeal to authority.
Oh, sure. Just like some of the abstruse discussions involving
various C standards that we've had here. Someone once
called me a language lawyer on this list, but I humbly doff my
cap to my superiors.
> If you have requirements for UTF-8-like string handling which
> require non-standard behavior, please call the derived format
> something else. “UTF-8” really does mean something.
That has already been done: the Lua library in question is called
"utf8", not "UTF-8". I'll support the notion that the Lua manual should
be careful in its use of the precise name "UTF-8". It already mostly
is, but one could, for example, rephrase the description of
`utf8.charpattern`. Not because the present phrasing is imprecise
(on the contrary, it is very careful), but because the rephrasing
implicitly defines the term "utf8 string", allowing its use elsewhere.
    The pattern (a string, not a function)
    "[\0-\x7F\xC2-\xF4][\x80-\xBF]*" (see §6.4.1), which matches exactly
    one utf8 byte sequence. If the subject is a valid UTF-8 string, the
    pattern matches a UTF-8 byte sequence.
On the other hand, my support for this is only lukewarm, and
if the Lua team thinks its target readership cannot be misled by
the documentation as it stands, I'll be equally happy.
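
To illustrate the distinction drawn above, here is a minimal sketch, assuming Lua 5.3 or later. The pattern is written out literally as quoted in the proposed wording rather than read from `utf8.charpattern`, since its value may differ between Lua versions. The helper name `pieces` and the sample strings are made up for this illustration.

```lua
-- Pattern as quoted in the proposed manual wording above.
local charpattern = "[\0-\x7F\xC2-\xF4][\x80-\xBF]*"

-- Hypothetical helper: split a string into the pieces the pattern matches.
local function pieces(s)
  local out = {}
  for piece in s:gmatch(charpattern) do
    out[#out + 1] = piece
  end
  return out
end

-- Valid UTF-8: "a" (1 byte), U+00E9 (2 bytes), U+20AC (3 bytes).
local valid = "a\xC3\xA9\xE2\x82\xAC"
local v = pieces(valid)
print(#v, #v[1], #v[2], #v[3])

-- Not valid UTF-8: lead byte \xC3 followed by three continuation bytes.
-- The pattern still matches all four bytes as a single piece.
local invalid = "\xC3\xA9\xA9\xA9"
local w = pieces(invalid)
print(#w, #w[1])

-- utf8.len, by contrast, rejects the second string (its first result is nil).
print(utf8.len(valid))
print(utf8.len(invalid))
```

On the valid string each piece is one UTF-8 byte sequence. On the invalid string the pattern still produces a match, which is the sense in which a "utf8 byte sequence" is a looser notion than a UTF-8 one.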