On 2015-09-02, at 2:59 PM, Dirk Laurie <dirk.laurie@gmail.com> wrote:

> I estimate that not more than 1% of people who
> have read the Lua manual have also read RFC3629. Quite a
> few more have read the Wikipedia page,

I’ll still take the INTERNET STANDARD over some Wikipedia page as my appeal to authority.

I suppose I could fix the Wikipedia page. This passage needs editing, moving to the “Derivatives” section, or both:

===
> Whether an actual application should do this is debatable, as it makes
> it impossible to store invalid UTF-16 (that is, UTF-16 with unpaired
> surrogate halves) in a UTF-8 string.
===

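It is impossible to represent invalid UTF-16-like sequences (for example, unpaired surrogate halves) as UTF-8. UTF-8 and UTF-16 encode exactly the same set of code points: the Unicode scalar values U+0000 through U+10FFFF, excluding the surrogate range U+D800 through U+DFFF. So where would you put the extra codes in UTF-8?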
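To make the counting argument concrete, here is a minimal Python sketch of the RFC 3629 encoding rules. The helper `encode_scalar` and the count constants are hypothetical names used only for illustration. The sketch refuses surrogate code points, as RFC 3629 requires. It also checks that UTF-8 and UTF-16 can each encode exactly 1,112,064 values, which leaves no spare slot for an unpaired surrogate.

```python
def encode_scalar(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (illustrative sketch of RFC 3629)."""
    if 0xD800 <= cp <= 0xDFFF:
        # RFC 3629 forbids encoding surrogate code points in UTF-8.
        raise ValueError(f"U+{cp:04X} is a surrogate, not a scalar value")
    if cp < 0:
        raise ValueError("negative code point")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError(f"U+{cp:X} is beyond U+10FFFF")

# UTF-8: everything up to U+10FFFF minus the 2048 surrogates.
UTF8_COUNT = 0x110000 - 0x800
# UTF-16: BMP minus surrogates, plus 1024 * 1024 surrogate pairs.
UTF16_COUNT = (0x10000 - 0x800) + 0x400 * 0x400

# Cross-check the sketch against Python's built-in (strict) UTF-8 codec.
for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    assert encode_scalar(cp) == chr(cp).encode("utf-8")

print(UTF8_COUNT, UTF16_COUNT)  # 1112064 1112064
```

Because the two counts are equal, any scheme that also maps lone surrogates into bytes must use byte sequences that valid UTF-8 does not contain.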

If your UTF-8-like string handling needs non-standard behavior, please call the derived format something else. “UTF-8” really does mean something.
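The following sketch shows what such a derived format looks like in practice. Python’s `surrogatepass` error handler serves here only as one example of non-standard behavior. It writes a lone surrogate using the 3-byte UTF-8 bit pattern. A conforming UTF-8 encoder refuses the input, and a conforming decoder refuses the output. The variable names are illustrative.

```python
lone = "\ud800"  # an unpaired high surrogate, as found in invalid UTF-16

# A conforming UTF-8 encoder rejects the lone surrogate.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# A "UTF-8-like" encoding that lets the surrogate through anyway.
derived = lone.encode("utf-8", "surrogatepass")

# A conforming UTF-8 decoder rejects the resulting bytes.
try:
    derived.decode("utf-8")
    round_trip_ok = True
except UnicodeDecodeError:
    round_trip_ok = False

print(strict_ok, derived, round_trip_ok)  # False b'\xed\xa0\x80' False
```

The bytes are only readable by a decoder that opts into the same extension. That extension is a different format, not UTF-8.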

Jay