On 2015-12-09 23:58, Coda Highland wrote:

utf8.len() will return false and the position of the first invalid
byte for an invalid UTF-8 string.

Indeed; however, my function's purpose is not to test whether a string is valid, but to perform the following flow:

[unknown string] => [black box] => [valid string].

in one simple step. This follows a Unicode recommendation. After that, I know that there are no 4- or 6-byte backslashes or quotes available for SQL injection, and no other fancy pitfalls.
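To make the "black box" concrete, here is a minimal sketch of such a step in plain Lua. It is not my utf8.validate itself: the names sanitize_utf8 and sequence_length are made up for this illustration, and the policy of replacing every offending byte with U+FFFD is an assumption (other replacement policies are possible). The byte ranges are the well-formed UTF-8 ranges from the Unicode Standard.

```lua
-- Hypothetical sketch: turn any byte string into well-formed UTF-8.
-- Assumed policy: each byte that cannot be part of a well-formed
-- sequence is replaced by U+FFFD (REPLACEMENT CHARACTER).
local REPLACEMENT = string.char(0xEF, 0xBF, 0xBD)  -- U+FFFD in UTF-8

-- Returns the length (1..4) of the well-formed sequence starting at
-- byte i of s, or nil if the bytes there are ill-formed.
local function sequence_length(s, i)
  local b1 = s:byte(i)
  if b1 < 0x80 then return 1 end           -- plain ASCII
  local n
  local lo, hi = 0x80, 0xBF                -- allowed range of the 2nd byte
  if b1 >= 0xC2 and b1 <= 0xDF then
    n = 2
  elseif b1 >= 0xE0 and b1 <= 0xEF then
    n = 3
    if b1 == 0xE0 then lo = 0xA0 end       -- rejects overlong 3-byte forms
    if b1 == 0xED then hi = 0x9F end       -- rejects surrogates
  elseif b1 >= 0xF0 and b1 <= 0xF4 then
    n = 4
    if b1 == 0xF0 then lo = 0x90 end       -- rejects overlong 4-byte forms
    if b1 == 0xF4 then hi = 0x8F end       -- rejects values above U+10FFFF
  else
    return nil                             -- 80..C1 and F5..FF never start a sequence
  end
  local b2 = s:byte(i + 1)
  if not b2 or b2 < lo or b2 > hi then return nil end
  for k = i + 2, i + n - 1 do              -- remaining continuation bytes
    local b = s:byte(k)
    if not b or b < 0x80 or b > 0xBF then return nil end
  end
  return n
end

local function sanitize_utf8(s)
  local out, i = {}, 1
  while i <= #s do
    local n = sequence_length(s, i)
    if n then
      out[#out + 1] = s:sub(i, i + n - 1)  -- copy the well-formed sequence
      i = i + n
    else
      out[#out + 1] = REPLACEMENT          -- drop the bad byte
      i = i + 1
    end
  end
  return table.concat(out)
end
```

With this sketch, sanitize_utf8(string.char(0xC0, 0xA7)) yields two U+FFFD characters rather than anything a later decoder could read as a quote, and a string that is already valid comes back unchanged.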

Today, non-shortest forms are very dangerous; Lua's utf8_decode is susceptible to them (there is no need to correct this as long as the string is valid). The conciseness of UTF-8 allows strings to be treated as plain ASCII ones; this is done frequently and can be very dangerous.
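To illustrate why a non-shortest form is a problem, here is a small sketch. The function naive_decode2 is hypothetical (it is not Lua's utf8_decode); it stands for any decoder that combines the payload bits of a 2-byte sequence without checking for the shortest form.

```lua
-- Hypothetical decoder for 2-byte sequences that omits the
-- shortest-form check: it only combines the payload bits.
local function naive_decode2(s)
  local b1, b2 = s:byte(1, 2)
  return (b1 % 0x20) * 0x40 + (b2 % 0x40)  -- 5 bits from b1, 6 bits from b2
end

-- Ill-formed, overlong 2-byte encoding of the apostrophe (U+0027).
local overlong_quote = string.char(0xC0, 0xA7)

-- A byte-level (ASCII) filter sees no quote in the string...
print(overlong_quote:find("'", 1, true))       -- nil
-- ...but the naive decoder still produces one.
print(naive_decode2(overlong_quote) == 0x27)   -- true
```

A filter that scans the bytes for a quote passes this string, yet a later lenient decoding step turns it back into a quote, which is exactly the pitfall.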

The first thing to do with an unknown string (just after its length is determined) is to validate it. After you have processed a string with my utf8.validate, you can apply less secure but very efficient functions (such as the utf8_decode mentioned above).

-- best regards

Cezary H. Noweta