On 2019-10-03 6:05 p.m., Roberto Ierusalimschy wrote:
> > the only concern I have is over existing usage of e.g.
> > utf8.codes(s:gsub(...)). it would probably be beneficial to make utf8.codes
> > accept a start index before the lax switch, or otherwise enforce that the
> > lax switch is not a number. (the start index is more appealing imo.)
>
> Accepting a start index would not help in that case, would it?
>
> -- Roberto
With a start index in that position the call would produce wrong results, but that would probably be better than producing unsafe results. Consider a sequence of gsubs that remove bad sequences. (Yeah, you aren't supposed to do it like this, but ppl do things like this all the time. For proper security you should always operate on decoded data, where all the arguments about overlong encodings and whatever being bad for security can be thrown out the window, but that doesn't stop ppl from doing it anyway. I could rant about this all day lol.)
Anyway, I digress. Consider a sequence of gsubs that remove bad sequences, with the result passed straight into utf8.codes. And then they switch to the new version. Now some things that were invalid are suddenly valid, because the 2nd return value from gsub (the substitution count, a number, and any number is truthy in Lua) is being used as the lax switch and enabling unsafe mode. So some of the things they're getting rid of are suddenly going through.
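To make the mechanism concrete, here is a minimal sketch of how the count leaks into the second argument. `probe` is a hypothetical stand-in for any function with a `(s, lax)` signature like the one being discussed; it is not a real library function and only reports what it received.

```lua
-- Hypothetical stand-in for a function with an optional second parameter,
-- such as a codes(s, lax) iterator factory. It only returns its arguments.
local function probe(s, lax)
  return s, lax
end

-- Input containing an encoded surrogate (bytes ED A0 80), decimal escapes.
local input = "ab\237\160\128cd"

-- gsub returns the new string AND the number of substitutions made.
local cleaned, count = input:gsub("\237\160\128", "")

-- Passing the call directly forwards both return values.
local s1, lax1 = probe(input:gsub("\237\160\128", ""))

-- Even a count of 0 is truthy in Lua, so a boolean switch would be on.
local s2, lax2 = probe(input:gsub("zzz", ""))
local zero_is_truthy = lax2 and true or false

-- Extra parentheses truncate the call to its first value only.
local s3, lax3 = probe((input:gsub("\237\160\128", "")))

print(cleaned, count, lax1, lax2, zero_is_truthy, lax3)
```

Existing code written as `utf8.codes(s:gsub(...))` has no such parentheses, which is why the count would end up in the lax slot without anyone noticing.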
In this case, I think it'd be better to just crash or produce broken but safe results than to let invalid UTF-8 through. But maybe that's just me.
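For what the "just crash" option could look like from user code, here is a rough sketch of the "enforce that the lax switch is not a number" idea. `checked_codes` is a hypothetical wrapper, not part of the standard utf8 library, and the sketch assumes Lua 5.3 or later so that `utf8.codes` exists.

```lua
-- Hypothetical wrapper (not a standard function): refuses a numeric
-- second argument, which is what a stray gsub count would look like.
local function checked_codes(s, lax)
  if type(lax) == "number" then
    error("lax switch must not be a number (stray gsub count?)", 2)
  end
  return utf8.codes(s, lax)
end

-- The gsub count (1) leaks into the lax slot, so the wrapper raises.
local ok_leaky = pcall(checked_codes, ("axb"):gsub("x", ""))

-- With the call truncated to one value, the wrapper passes it through.
local ok_clean = pcall(checked_codes, (("axb"):gsub("x", "")))

-- Normal use: iterate over the code points of the cleaned string.
local points = {}
for _, c in checked_codes((("axb"):gsub("x", ""))) do
  points[#points + 1] = c
end

print(ok_leaky, ok_clean, #points)
```

This trades silent acceptance for a loud failure at the call site, which is the safer of the two outcomes argued for above.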