On Tue, May 12, 2015 at 1:27 AM, Gaspard Bucher <firstname.lastname@example.org> wrote:
local validate = require 'utf8validator'
local orig_load = xml.load
The "validate" function could take an optional "invalid char handler" function as an argument, letting end users decide what to do with invalid characters instead of blowing up.
Is there a reasonable default "sanitize" step that could be mechanically applied to common but technically invalid sequences, or perhaps a small core set of transformations to choose from, much like how %q escapes special characters? Maybe a parameter that lets you choose among throw, fix up/re-encode, delete the offending chars, or a custom handler?
Would this cause all kinds of havoc? I admit shallow knowledge of the subject.
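To make the question concrete, here is a rough sketch of what I have in mind. It is plain Lua and does not use the utf8validator module at all; the names sanitize, seq_len and policies are made up for illustration, and the three built-in policies ("throw", "replace", "delete") are only my guess at a sensible core set. "replace" substitutes U+FFFD, the Unicode replacement character, for each offending byte.

```lua
-- Hypothetical sketch, NOT the API of the 'utf8validator' module.

-- Length in bytes of the well-formed UTF-8 sequence starting at s[i],
-- or nil if the bytes there are not valid UTF-8.
local function seq_len(s, i)
  local b = s:byte(i)
  if b < 0x80 then return 1 end
  local n, min
  if b >= 0xC2 and b <= 0xDF then n, min = 2, 0x80
  elseif b >= 0xE0 and b <= 0xEF then n, min = 3, 0x800
  elseif b >= 0xF0 and b <= 0xF4 then n, min = 4, 0x10000
  else return nil end                    -- stray continuation or bad lead byte
  local cp = b % (2 ^ (7 - n))           -- payload bits of the lead byte
  for k = 1, n - 1 do
    local c = s:byte(i + k)
    if not c or c < 0x80 or c > 0xBF then return nil end
    cp = cp * 64 + (c % 64)
  end
  -- reject overlong forms, surrogates, and code points above U+10FFFF
  if cp < min or cp > 0x10FFFF or (cp >= 0xD800 and cp <= 0xDFFF) then
    return nil
  end
  return n
end

-- Built-in policies (an assumed core set). Each receives the offending
-- byte and its position and returns the replacement text.
local policies = {
  throw = function(byte, pos)
    error(("invalid UTF-8 byte 0x%02X at position %d"):format(byte, pos), 0)
  end,
  replace = function() return "\239\191\189" end,  -- U+FFFD
  delete  = function() return "" end,
}

-- on_invalid is a policy name or a custom handler function(byte, pos).
local function sanitize(s, on_invalid)
  on_invalid = on_invalid or "throw"
  local handler = type(on_invalid) == "function" and on_invalid
                  or policies[on_invalid]
  if not handler then
    error("unknown policy: " .. tostring(on_invalid), 2)
  end
  local out, i = {}, 1
  while i <= #s do
    local n = seq_len(s, i)
    if n then
      out[#out + 1] = s:sub(i, i + n - 1)   -- copy the valid sequence as is
      i = i + n
    else
      out[#out + 1] = handler(s:byte(i), i) or ""
      i = i + 1                             -- skip one bad byte and resync
    end
  end
  return table.concat(out)
end
```

With this, sanitize("a\255b", "delete") gives "ab", sanitize("a\255b", "replace") gives "a" followed by U+FFFD and "b", and the default policy raises an error. A custom handler could escape the byte instead, e.g. function(b) return ("\\x%02X"):format(b) end. Handling one bad byte at a time is just the simplest way to resynchronise; whether a run of bad bytes should collapse into a single replacement is exactly the kind of detail I am unsure about.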