lua-users home
lua-l archive




On 7-Dec-06, at 7:25 PM, David Given wrote:

It will also fail on any encoding that uses low-bit characters as part of an extended sequence. If there's an encoding that uses <high> <low1> <low2> as part of a single character, then <low1> and <low2> may potentially confuse the parser. This scheme would only work on encodings where *all* bytes of an extended character have the top bit set. I believe that includes Shift-JIS as well as UTF-8.

Actually, both Shift-JIS and Big5 use second bytes in the range 0x40-0xFE (or so, there are a few illegal codes, iirc), and so does GB 18030-2000
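
For anyone who wants to check this, here is a minimal Python sketch (not Lua, since stock Lua has no encoding tables). It relies on Python's built-in `shift_jis` and `utf-8` codecs; the helper name `low_trail_bytes` and the sample string are made up for illustration. It collects any byte below 0x80 that turns up inside a multi-byte character:

```python
def low_trail_bytes(text, encoding):
    """Return the bytes below 0x80 found inside multi-byte characters.

    Hypothetical helper for illustration only: it encodes each character
    separately and keeps the 7-bit bytes of any multi-byte result.
    """
    found = []
    for ch in text:
        encoded = ch.encode(encoding)
        if len(encoded) > 1:
            found.extend(b for b in encoded if b < 0x80)
    return found


# Two characters whose Shift-JIS second byte is 0x5C, the ASCII backslash.
sample = "表ソ"

sjis_low = low_trail_bytes(sample, "shift_jis")
utf8_low = low_trail_bytes(sample, "utf-8")

print([hex(b) for b in sjis_low])  # ['0x5c', '0x5c']
print(utf8_low)                    # []
```

In Shift-JIS both characters end in 0x5C, so a byte-oriented lexer that treats a backslash as an escape would misread them. In UTF-8 every byte of a multi-byte character has the top bit set, so the list comes back empty.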