- Subject: Re: Should Lua be more strict about Unicode errors?
- From: Jay Carlson <nop@...>
- Date: Fri, 4 Sep 2015 17:23:58 -0400
On 2015-09-02, at 2:59 PM, Dirk Laurie <dirk.laurie@gmail.com> wrote:
> I estimate that not more than 1% of people who
> have read the Lua manual have also read RFC3629. Quite a
> few more have read the Wikipedia page,
I’ll still take the INTERNET STANDARD over some Wikipedia page as my appeal to authority.
I suppose I could fix the Wikipedia page. This part needs editing, and/or to be moved to the “Derivatives” section:
===
> Whether an actual application should do this is debatable, as it makes
> it impossible to store invalid UTF-16 (that is, UTF-16 with unpaired
> surrogate halves) in a UTF-8 string.
===
It is impossible to represent invalid UTF-16-like sequences as a UTF-8 sequence. UTF-8 and UTF-16 encode exactly the same set of code points (the surrogate range U+D800..U+DFFF is excluded from both), so where would you put the extra codes in UTF-8?
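As a rough illustration of the point, here is a minimal Python sketch (Python is my choice for the example, not anything from this thread). It assumes CPython 3's built-in "utf-8" codec and its "surrogatepass" error handler; the variable names are made up for the example. The strict codec refuses an unpaired surrogate, and the bytes you get by forcing it through are not accepted back as UTF-8.

```python
# An unpaired high surrogate: legal inside a Python str, but not a
# Unicode scalar value, so it has no UTF-8 encoding under RFC 3629.
lone_surrogate = "\ud800"

# Strict UTF-8 encoding should refuse it.
try:
    lone_surrogate.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Forcing it through with the non-standard "surrogatepass" handler
# produces a UTF-8-like byte sequence.
non_standard = lone_surrogate.encode("utf-8", "surrogatepass")

# A strict UTF-8 decoder rejects those bytes.
try:
    non_standard.decode("utf-8")
    decodes = True
except UnicodeDecodeError:
    decodes = False

# Both encodings cover the same scalar values:
# U+0000..U+10FFFF minus the 2048 surrogates.
scalar_values = 0x110000 - 0x800

print(strict_ok, non_standard, decodes, scalar_values)
```

The forced bytes are the kind of output a derived format produces, which is why it deserves its own name.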
If your UTF-8-like string handling needs non-standard behavior, please call the derived format something else. “UTF-8” really does mean something.
Jay