>>>>> "Dirk" == Dirk Laurie <dirk.laurie@gmail.com> writes:

 Dirk> Lua in no way even comes close to validating against the current
 Dirk> UTF-8 standard. We've been through this before. Marc Balmer in
 Dirk> particular has been quite trenchant on this point.

Other than failing to reject encoded surrogates, which invalid
sequences does the code in Lua 5.3.5 accept?

 Dirk> All that Lua does is to verify that a string satisfies the basic
 Dirk> UTF-8 encoding: ASCII or a starting byte whose introductory
 Dirk> string of 1's says how many bytes in total are being encoded,
 Dirk> followed by the right number of 10... bytes.

That's ... not what the 5.3.5 utf8_decode does. Did you read it? Test
it?
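
For anyone who wants to test rather than argue, here is a rough sketch of
the distinction in question. It is NOT the code from lua 5.3.5; the names
tail_count, check and MIN_VALUE are made up for illustration. With
strict=False it does only the check described in the quoted paragraph
(lead byte plus the right number of 10xxxxxx bytes); with strict=True it
also rejects overlong forms and values above 0x10FFFF, and optionally
encoded surrogates.

```python
# Illustrative sketch only; hypothetical names, not Lua's utf8_decode.

# Smallest code point that legitimately needs n continuation bytes.
MIN_VALUE = (0x0, 0x80, 0x800, 0x10000)


def tail_count(lead):
    """Number of continuation bytes announced by a lead byte, or None."""
    if lead < 0x80:
        return 0
    if 0xC0 <= lead < 0xE0:
        return 1
    if 0xE0 <= lead < 0xF0:
        return 2
    if 0xF0 <= lead < 0xF8:
        return 3
    return None  # stray continuation byte or 0xF8..0xFF


def check(data, strict, reject_surrogates=True):
    """Return True if every sequence in `data` passes the chosen check."""
    i = 0
    while i < len(data):
        lead = data[i]
        n = tail_count(lead)
        if n is None:
            return False
        tail = data[i + 1:i + 1 + n]
        # Structural check: right number of 10xxxxxx bytes.
        if len(tail) != n or any((t & 0xC0) != 0x80 for t in tail):
            return False
        if strict:
            # Decode the value, then apply the range checks.
            value = lead if n == 0 else lead & (0x3F >> n)
            for t in tail:
                value = (value << 6) | (t & 0x3F)
            if value < MIN_VALUE[n] or value > 0x10FFFF:
                return False  # overlong or out of range
            if reject_surrogates and 0xD800 <= value <= 0xDFFF:
                return False  # encoded surrogate
        i += 1 + n
    return True


if __name__ == "__main__":
    samples = {
        "overlong NUL": b"\xC0\x80",
        "0x110000": b"\xF4\x90\x80\x80",
        "surrogate D800": b"\xED\xA0\x80",
        "U+00E9": b"\xC3\xA9",
    }
    for name, s in samples.items():
        print(name, check(s, strict=False), check(s, strict=True))
```

The first three samples pass the structural-only check and fail the
strict one; the surrogate passes the strict check only when
reject_surrogates=False. Feeding the same byte strings to utf8.len in
the Lua version under discussion shows which of the two it resembles.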

-- 
Andrew.