On Tue, May 12, 2015 at 1:27 AM, Gaspard Bucher
<gaspard@teti.ch> wrote:
local validate = require 'utf8validator'
local orig_load = xml.load
function xml.load(string)
  return orig_load(validate(string))
end
The "validate" function could take an optional "invalid char handler" function as an argument, letting end users decide what to do with invalid characters instead of blowing up.
my 2c...
Is there a reasonable default "sanitize" step that could be mechanically applied to common but technically invalid sequences, or perhaps a small core set of transformations to choose from? Like how %q escapes special characters. Maybe a parameter that lets you choose amongst throw, fixup/re-encode, delete offending chars, or a custom handler?
Would this cause all kinds of havoc? I admit shallow knowledge of the subject.
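To make the question concrete, here is a rough sketch of what I have in mind. The name `sanitize` and the mode names "throw", "replace" and "delete" are made up for illustration and do not come from any existing module. It assumes plain Lua with no libraries, and it handles one offending byte at a time. It does not attempt the "re-encode" option, since that would mean guessing the source encoding.

```lua
-- Hypothetical sketch, not an existing API.
-- sanitize(s, mode): mode is "throw" (default), "replace", "delete",
-- or a function(bad_byte, position) returning a replacement string.

local REPLACEMENT = "\239\191\189" -- U+FFFD encoded as UTF-8

-- Returns the byte length of the well-formed UTF-8 sequence starting at
-- position i, or nil if the bytes there are not well-formed.
local function seq_len(s, i)
  local b = s:byte(i)
  local n, lo, hi -- sequence length, allowed range of the second byte
  if b < 0x80 then return 1
  elseif b >= 0xC2 and b <= 0xDF then n, lo, hi = 2, 0x80, 0xBF
  elseif b == 0xE0 then n, lo, hi = 3, 0xA0, 0xBF -- rejects overlong forms
  elseif b == 0xED then n, lo, hi = 3, 0x80, 0x9F -- rejects surrogates
  elseif b >= 0xE1 and b <= 0xEF then n, lo, hi = 3, 0x80, 0xBF
  elseif b == 0xF0 then n, lo, hi = 4, 0x90, 0xBF -- rejects overlong forms
  elseif b >= 0xF1 and b <= 0xF3 then n, lo, hi = 4, 0x80, 0xBF
  elseif b == 0xF4 then n, lo, hi = 4, 0x80, 0x8F -- nothing above U+10FFFF
  else return nil end -- stray continuation byte, or 0xC0, 0xC1, 0xF5..0xFF
  local c = s:byte(i + 1)
  if not c or c < lo or c > hi then return nil end
  for k = i + 2, i + n - 1 do
    c = s:byte(k)
    if not c or c < 0x80 or c > 0xBF then return nil end
  end
  return n
end

local function sanitize(s, mode)
  mode = mode or "throw"
  local out, i = {}, 1
  while i <= #s do
    local n = seq_len(s, i)
    if n then
      -- Well-formed sequence: copy it through unchanged.
      out[#out + 1] = s:sub(i, i + n - 1)
      i = i + n
    else
      if mode == "throw" then
        error(("invalid UTF-8 byte at position %d"):format(i), 2)
      elseif mode == "replace" then
        out[#out + 1] = REPLACEMENT
      elseif type(mode) == "function" then
        out[#out + 1] = mode(s:sub(i, i), i) or ""
      elseif mode ~= "delete" then
        error("unknown mode: " .. tostring(mode), 2)
      end
      i = i + 1 -- skip the offending byte and resynchronise
    end
  end
  return table.concat(out)
end
```

With something like this, the wrapper quoted above could call `orig_load(sanitize(string, "replace"))` instead of `validate(string)`. Whether silently dropping or replacing bytes is ever a safe default for an XML parser is exactly the part I am unsure about.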