lua-users home
lua-l archive

On May 11, 2015, at 2:34 PM, Tim Hill <drtimhill@gmail.com> wrote:

> 
>> On May 11, 2015, at 10:53 AM, Jay Carlson <nop@nop.com> wrote:
>> 
>> On May 11, 2015, at 7:47 AM, Gaspard Bucher <gaspard@teti.ch> wrote:
>> 
>>> xml - very fast xml parser
>>>   http://github.com/lubyk/xml
>> 
>> Glancing through the source code, I don't see any code that rejects invalid UTF-8 sequences when parsing in UTF-8 mode. This matters to some people in the Lua community, since a lot of UTF-8-related security faults just go away if invalid byte sequences are rejected on input; Prosody is a good example of engineering for this.
>> 
> 
> Which would you reject?
> - UTF-8 has several non-canonical ways to encode a value (e.g. by using more bytes than needed). I’ve seen (bad) encoders that emit these. Reject/accept?
> - UTF-8 is sometimes used to encode UTF-16 values (such as the BOM), some of which are now accepted. Reject/accept?
> - UTF-8 can encode high/low UTF-16 surrogate pairs, which should be invalid but could be converted to a code point. Reject/accept?
> ... and so on.

http://tools.ietf.org/html/rfc3629 is one of relatively few documents labeled a full "INTERNET STANDARD" at the top.
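
For what it's worth, the ABNF in section 4 of that RFC answers the three questions above directly: overlong forms and encoded surrogates (U+D800..U+DFFF) are not well-formed, nothing above U+10FFFF is allowed, and U+FEFF (the BOM) is an ordinary, valid character. Below is a minimal sketch of a validator that follows that grammar. The name is_valid_utf8 is hypothetical; it is not part of lubyk/xml, Prosody, or any other library, and it is only meant to illustrate the rules, not to be a tuned implementation.

```lua
-- Hypothetical helper (not from any library): checks a string against
-- the UTF-8 grammar in RFC 3629, section 4.
-- Returns true if s is well-formed, otherwise false plus the byte
-- offset where the first bad sequence starts.
local function is_valid_utf8(s)
  local i, n = 1, #s
  while i <= n do
    local b = s:byte(i)
    local len, lo, hi  -- sequence length; allowed range of the 2nd byte
    if b <= 0x7F then
      len = 1
    elseif b >= 0xC2 and b <= 0xDF then
      len, lo, hi = 2, 0x80, 0xBF
    elseif b == 0xE0 then
      len, lo, hi = 3, 0xA0, 0xBF  -- excludes overlong 3-byte forms
    elseif b == 0xED then
      len, lo, hi = 3, 0x80, 0x9F  -- excludes surrogates U+D800..U+DFFF
    elseif b >= 0xE1 and b <= 0xEF then
      len, lo, hi = 3, 0x80, 0xBF
    elseif b == 0xF0 then
      len, lo, hi = 4, 0x90, 0xBF  -- excludes overlong 4-byte forms
    elseif b >= 0xF1 and b <= 0xF3 then
      len, lo, hi = 4, 0x80, 0xBF
    elseif b == 0xF4 then
      len, lo, hi = 4, 0x80, 0x8F  -- excludes values above U+10FFFF
    else
      -- 0x80..0xC1 and 0xF5..0xFF can never start a sequence
      -- (0xC0 and 0xC1 would only produce overlong 2-byte forms).
      return false, i
    end
    if i + len - 1 > n then
      return false, i  -- truncated sequence at end of input
    end
    if len > 1 then
      local b2 = s:byte(i + 1)
      if b2 < lo or b2 > hi then return false, i end
      -- Any remaining bytes must be plain continuation bytes.
      for j = i + 2, i + len - 1 do
        local c = s:byte(j)
        if c < 0x80 or c > 0xBF then return false, i end
      end
    end
    i = i + len
  end
  return true
end

-- The three cases raised in the quoted message:
print(is_valid_utf8(string.char(0xC0, 0xAF)))        -- overlong "/": rejected
print(is_valid_utf8(string.char(0xEF, 0xBB, 0xBF)))  -- U+FEFF (BOM): accepted
print(is_valid_utf8(string.char(0xED, 0xA0, 0x80)))  -- U+D800 surrogate: rejected
```

The sketch sticks to string.byte and string.char so that it does not depend on the utf8 library, which only exists from Lua 5.3 on.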

Jay