RE: Should Lua be more strict about Unicode errors?

lua-l archive

Subject: RE: Should Lua be more strict about Unicode errors?
From: Richter, Jörg <Joerg.Richter@...>
Date: Mon, 31 Aug 2015 07:20:54 +0000

> > For example, "\u{d800}" is valid in Lua 5.3, but not in LuaJIT.
> >
> > Should Lua be more strict about Unicode errors?
> >
> > [1] https://github.com/LuaJIT/LuaJIT/issues/72
> 
>   It depends.  I recently read (although I can't seem to find it now) that
> one way to preserve invalid UTF-8 sequences is to encode the invalid bytes
> in the D880 to D8FF range, and to reserve D800 as an alternative NUL byte
> sequence (another NUL byte sequence is the literal byte sequence 0xC0 0x80
> [1]).  By doing this, you can transform the "fixed" UTF-8 sequence back
> into the original byte stream.

I think you mean "UTF-8B".  Quoting [1]:

"utf-8b is a mapping from byte streams to unicode codepoint streams that provides 
an exceptionally clean handling of garbage (i.e., non-utf-8) bytes (i.e., bytes 
that are not part of a utf-8 encoding) in the input stream. They are mapped to 
256 different, guaranteed undefined, unicode codepoints."
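
For a concrete feel of this kind of mapping, here is a minimal sketch in
Python rather than Lua, since Python ships a comparable scheme as the
"surrogateescape" error handler (PEP 383).  It is an illustration of the
idea, not the UTF-8B reference implementation.  Note that Python maps each
undecodable byte 0xNN to the lone surrogate U+DCNN (U+DC80 to U+DCFF), which
is not the range recalled in the quoted message above.  The sample bytes and
variable names are made up for the example.

```python
# Round-trip arbitrary bytes through a text string using Python's
# built-in "surrogateescape" error handler (PEP 383).

# Hypothetical input: 0xFF and 0x80 are not valid UTF-8 in this position.
raw = b"abc\xff\x80def"

# Decoding never fails: each undecodable byte 0xNN becomes U+DCNN.
text = raw.decode("utf-8", errors="surrogateescape")

# Collect the code points that stand in for the garbage bytes.
escaped = [hex(ord(ch)) for ch in text if 0xDC80 <= ord(ch) <= 0xDCFF]

# Encoding with the same handler turns U+DCNN back into byte 0xNN,
# so the original byte stream is recovered exactly.
restored = text.encode("utf-8", errors="surrogateescape")

# A strict encoder refuses the lone surrogates, which is the kind of
# strictness this thread is asking about.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

The round trip only works because the decoder is allowed to produce lone
surrogates in the first place, so a strict reader and a byte-preserving
reader are necessarily at odds on input like this.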

- Jörg

[1] http://hyperreal.org/~est/freeware/

References:
- Should Lua be more strict about Unicode errors?, Soni L.
- Re: Should Lua be more strict about Unicode errors?, Sean Conner
