Re: Should Lua be more strict about Unicode errors?

Subject: Re: Should Lua be more strict about Unicode errors?
From: Jay Carlson <nop@...>
Date: Fri, 4 Sep 2015 17:23:58 -0400

On 2015-09-02, at 2:59 PM, Dirk Laurie <dirk.laurie@gmail.com> wrote:

> I estimate that not more than 1% of people who
> have read the Lua manual have also read RFC3629. Quite a
> few more have read the Wikipedia page,

I’ll still take the INTERNET STANDARD over some Wikipedia page as my appeal to authority.

I suppose I could fix the Wikipedia page. This part needs editing, and/or to be moved to the “Derivatives” section:

===
> Whether an actual application should do this is debatable, as it makes
> it impossible to store invalid UTF-16 (that is, UTF-16 with unpaired
> surrogate halves) in a UTF-8 string.
===

It is impossible to represent invalid UTF-16-like sequences as a UTF-8 sequence. UTF-8 and UTF-16 encode exactly the same set of code points (RFC 3629 explicitly prohibits encoding the surrogate range U+D800..U+DFFF in UTF-8), so where would you put the extra codes in UTF-8?

If you have requirements for UTF-8-like string handling which require non-standard behavior, please call the derived format something else. “UTF-8” really does mean something.
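To make the distinction concrete, here is a small sketch in Python rather than Lua, chosen only because Python's built-in "utf-8" codec follows the RFC 3629 rule on surrogates. The helper names strict_utf8_encode and strict_utf8_decode are made up for this illustration. The "surrogatepass" error handler is Python's opt-in way of producing the non-standard, UTF-8-like byte pattern for a lone surrogate, which is the kind of derived format that deserves its own name.

```python
# A lone (unpaired) high surrogate. A Python str can hold it,
# but it is not a Unicode scalar value.
lone_surrogate = "\ud800"

# The three bytes that the generic UTF-8 bit layout would give U+D800
# if the surrogate prohibition were ignored.
surrogate_bytes = b"\xed\xa0\x80"


def strict_utf8_encode(text):
    """Return UTF-8 bytes, or None if text holds something UTF-8 cannot carry."""
    try:
        return text.encode("utf-8")
    except UnicodeEncodeError:
        return None


def strict_utf8_decode(data):
    """Return the decoded str, or None if data is not well-formed UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Strict UTF-8 refuses the surrogate in both directions.
print(strict_utf8_encode(lone_surrogate))
print(strict_utf8_decode(surrogate_bytes))

# The derived, UTF-8-like format has to be requested explicitly.
print(lone_surrogate.encode("utf-8", "surrogatepass"))

# Both UTF-8 and UTF-16 cover the same scalar values:
# all of U+0000..U+10FFFF minus the 2048 surrogates.
print(0x110000 - (0xE000 - 0xD800))
```

Under these assumptions the strict helpers return None for the surrogate in either direction, and only the explicitly lax handler yields the ED A0 80 byte pattern.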

Jay
