lua-l archive

On May 11, 2015, at 7:47 AM, Gaspard Bucher <gaspard@teti.ch> wrote:

> xml - very fast xml parser
>     http://github.com/lubyk/xml

Glancing through the source code, I don't see any code that rejects invalid UTF-8 sequences when parsing in UTF-8 mode. That matters to some people in the Lua community, since many UTF-8-related security faults simply go away if invalid byte sequences are rejected on input; Prosody is a good example of software engineered this way.
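
Roughly what input-side rejection involves, as a minimal Python sketch. The names utf8_error_offset and require_utf8 are hypothetical, not part of lubyk/xml. It accepts only the well-formed byte sequences of RFC 3629, rejecting overlong forms, encoded surrogates, code points above U+10FFFF and truncated sequences. In Python itself, bytes.decode("utf-8") already applies the same rules; a C parser would need an equivalent loop.

```python
def utf8_error_offset(data: bytes) -> int:
    """Return the offset of the first ill-formed UTF-8 sequence, or -1 if valid.

    Accepts only the well-formed sequences of RFC 3629: no overlong forms,
    no encoded surrogates (U+D800..U+DFFF), nothing above U+10FFFF.
    """
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:  # ASCII byte
            i += 1
            continue
        # Pick the sequence length and the legal range for the second byte.
        if 0xC2 <= b <= 0xDF:
            length, lo, hi = 2, 0x80, 0xBF
        elif b == 0xE0:
            length, lo, hi = 3, 0xA0, 0xBF  # excludes overlong 3-byte forms
        elif 0xE1 <= b <= 0xEC or 0xEE <= b <= 0xEF:
            length, lo, hi = 3, 0x80, 0xBF
        elif b == 0xED:
            length, lo, hi = 3, 0x80, 0x9F  # excludes surrogates
        elif b == 0xF0:
            length, lo, hi = 4, 0x90, 0xBF  # excludes overlong 4-byte forms
        elif 0xF1 <= b <= 0xF3:
            length, lo, hi = 4, 0x80, 0xBF
        elif b == 0xF4:
            length, lo, hi = 4, 0x80, 0x8F  # excludes code points > U+10FFFF
        else:
            return i  # stray continuation byte, C0/C1, or F5..FF
        if i + length > n:
            return i  # truncated sequence at end of input
        if not lo <= data[i + 1] <= hi:
            return i
        for k in range(2, length):
            if not 0x80 <= data[i + k] <= 0xBF:
                return i
        i += length
    return -1


def require_utf8(data: bytes) -> bytes:
    """Reject input up front instead of passing bad bytes downstream."""
    pos = utf8_error_offset(data)
    if pos >= 0:
        raise ValueError(f"invalid UTF-8 at byte offset {pos}")
    return data
```

Reporting the offset rather than just a boolean lets the parser point at the offending byte, which is the "closer to the source of the error" property argued for later in this message.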

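For XML, there is a second route to the same UTF-8 faults: numeric character references such as &#xD800; or &#55296;, which both name U+D800, a surrogate code point that has no valid UTF-8 encoding.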
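
A byte-level input check never sees these, because the reference is plain ASCII until it is decoded; a naive encoder would then emit ED A0 80 for U+D800, exactly the invalid UTF-8 the input check would have refused. A minimal Python sketch, with decode_char_ref as a hypothetical helper name (not lubyk/xml's API), covering only the surrogate and range checks. XML's "Legal Character" well-formedness constraint is stricter still, since the referenced character must also match the Char production.

```python
import re

# XML 1.0 CharRef: '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'  (lowercase x only)
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def decode_char_ref(ref: str) -> str:
    """Decode one numeric character reference, refusing code points that
    cannot be encoded as UTF-8 (surrogates and values above U+10FFFF)."""
    m = _CHAR_REF.fullmatch(ref)
    if m is None:
        raise ValueError(f"not a numeric character reference: {ref!r}")
    hex_digits, dec_digits = m.groups()
    cp = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
    if 0xD800 <= cp <= 0xDFFF:
        # &#xD800; and &#55296; both land here.
        raise ValueError(f"{ref} refers to surrogate U+{cp:04X}")
    if cp > 0x10FFFF:
        raise ValueError(f"{ref} is beyond U+10FFFF")
    return chr(cp)
```

The check runs on the integer, after the hex/decimal distinction is gone, so both spellings of the same code point are caught by one comparison.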

More generally, XML applications may behave in bug-like ways when handed characters that do not match the Char production (http://www.w3.org/TR/REC-xml/#NT-Char). An app that just passes input through to output unchanged is usually better off blowing up on input, close to the source of the error, than leaving it for some conforming XML processor, distant in space and time, to discover later.
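
The Char production is only a handful of ranges, so checking it on input is cheap. A minimal Python sketch operating on already-decoded text; is_xml_char, first_non_char and check_xml_text are hypothetical names, and the ranges are copied from the XML 1.0 (Fifth Edition) Char production.

```python
def is_xml_char(cp: int) -> bool:
    """Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]"""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def first_non_char(text: str) -> int:
    """Index of the first character outside Char, or -1 if all match."""
    for i, ch in enumerate(text):
        if not is_xml_char(ord(ch)):
            return i
    return -1


def check_xml_text(text: str) -> str:
    """Fail on input, near the source, rather than in some later processor."""
    i = first_non_char(text)
    if i >= 0:
        raise ValueError(f"U+{ord(text[i]):04X} at index {i} is not an XML 1.0 Char")
    return text
```

Note that several code points which are perfectly valid UTF-8, such as U+0000, U+000C and U+FFFE, still fail this test, so UTF-8 validation alone does not make text safe to emit as XML.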

I have a hobby version of Lua in which the string.* operations blow up when handed invalid UTF-8. The check is not very expensive when handled at the language level: results can be memoized, interned strings only need to be checked once, and the concatenation of two valid strings is known to be valid. And if we are already hashing a whole string, checking whether any byte has its high bit set is almost free; if none does, the string is ASCII and therefore valid UTF-8.
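
A rough Python model of that bookkeeping, not the code of the Lua fork described here. CheckedStr, intern_bytes and the full_checks counter are hypothetical names. Validity is cached per string object, the ASCII test rides along in the hash loop, interned strings are validated at most once, and concatenating two known-valid strings yields a known-valid result without rescanning. FNV-1a is only a stand-in hash; it is not Lua's own string hash.

```python
from typing import Dict, Optional, Tuple


def _fnv1a_and_ascii(data: bytes) -> Tuple[int, bool]:
    """Hash the bytes (32-bit FNV-1a) and, in the same pass, OR them together
    to learn whether any byte has its high bit set."""
    h, seen = 0x811C9DC5, 0
    for b in data:
        h = ((h ^ b) * 0x01000193) & 0xFFFFFFFF
        seen |= b
    return h, seen < 0x80


class CheckedStr:
    """Immutable byte string that remembers whether it is valid UTF-8."""

    __slots__ = ("data", "hash", "_valid")
    full_checks = 0  # number of full validations run (for illustration)

    def __init__(self, data: bytes, known_valid: Optional[bool] = None):
        self.data = data
        self.hash, is_ascii = _fnv1a_and_ascii(data)
        # ASCII is valid UTF-8, so the hash loop settles it for free.
        self._valid = True if is_ascii else known_valid

    def is_valid(self) -> bool:
        if self._valid is None:  # unknown: validate once, then memoize
            CheckedStr.full_checks += 1
            try:
                self.data.decode("utf-8")  # strict codec rejects ill-formed input
                self._valid = True
            except UnicodeDecodeError:
                self._valid = False
        return self._valid

    def concat(self, other: "CheckedStr") -> "CheckedStr":
        # valid + valid is valid (UTF-8 sequences are self-delimiting);
        # anything else is left unknown and checked lazily.
        both_valid = self._valid is True and other._valid is True
        return CheckedStr(self.data + other.data, True if both_valid else None)


_interned: Dict[bytes, CheckedStr] = {}


def intern_bytes(data: bytes) -> CheckedStr:
    """Return the single shared CheckedStr for these bytes."""
    s = _interned.get(data)
    if s is None:
        s = _interned[data] = CheckedStr(data)
    return s
```

Only valid + valid is propagated because the reverse does not hold: b"\xe2\x82" and b"\xac" are each invalid, yet together they spell U+20AC, so any pair involving an invalid or unchecked part is rechecked.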

Anyway, thank you for the 5.3 support. In an earlier exchange, lhf politely pointed out that the "lua" command still invokes 5.2 on my machine... :-)

Jay