On Tue, May 12, 2015 at 1:27 AM, Gaspard Bucher wrote:
local validate = require 'utf8validator'
local orig_load = xml.load
function xml.load(str)
  return orig_load(validate(str))
end

The "validate" function could take an optional "invalid char handler" function as an argument, letting end users decide what to do with invalid characters instead of blowing up.

my 2c...
Is there a reasonable default "sanitize" step that could be mechanically applied to common but technically invalid sequences, or perhaps a small core set of transformations to choose from, much like how %q in string.format escapes special characters? Maybe a parameter that lets you choose among throwing an error, fixing up/re-encoding, deleting the offending chars, or calling a custom handler?
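
To make the question concrete, here is a rough sketch of the kind of thing I mean. Everything in it is hypothetical: the names "sanitize" and "seq_len" and the policy strings "error", "replace" and "delete" are made up for illustration and are not part of utf8validator or any existing module. It also assumes the simplest possible recovery, treating each offending byte on its own.

```lua
-- Hypothetical sketch: names and policies are illustrative, not an existing API.

-- Return the length (1-4) of a well-formed UTF-8 sequence starting at
-- byte i of s, or nil if the bytes there are not valid UTF-8.
local function seq_len(s, i)
  local b1 = s:byte(i)
  if b1 < 0x80 then return 1 end           -- plain ASCII
  local n, lo, hi = nil, 0x80, 0xBF        -- lo/hi bound the second byte
  if b1 >= 0xC2 and b1 <= 0xDF then
    n = 2
  elseif b1 >= 0xE0 and b1 <= 0xEF then
    n = 3
    if b1 == 0xE0 then lo = 0xA0           -- reject overlong forms
    elseif b1 == 0xED then hi = 0x9F end   -- reject UTF-16 surrogates
  elseif b1 >= 0xF0 and b1 <= 0xF4 then
    n = 4
    if b1 == 0xF0 then lo = 0x90           -- reject overlong forms
    elseif b1 == 0xF4 then hi = 0x8F end   -- reject code points > U+10FFFF
  else
    return nil                             -- stray continuation or invalid lead byte
  end
  for k = 1, n - 1 do
    local b = s:byte(i + k)
    if not b then return nil end           -- sequence truncated by end of string
    local l, h = 0x80, 0xBF
    if k == 1 then l, h = lo, hi end
    if b < l or b > h then return nil end
  end
  return n
end

-- policy: "error" (default), "replace", "delete", or a function
-- handler(bad_byte, position) returning the replacement text.
local function sanitize(s, policy)
  policy = policy or "error"
  local out, i = {}, 1
  while i <= #s do
    local n = seq_len(s, i)
    if n then
      out[#out + 1] = s:sub(i, i + n - 1)
      i = i + n
    else
      if policy == "error" then
        error(("invalid UTF-8 at byte %d"):format(i), 2)
      elseif policy == "replace" then
        out[#out + 1] = "\239\191\189"     -- U+FFFD REPLACEMENT CHARACTER
      elseif type(policy) == "function" then
        out[#out + 1] = policy(s:sub(i, i), i) or ""
      elseif policy ~= "delete" then
        error("unknown policy: " .. tostring(policy), 2)
      end
      i = i + 1                            -- skip one offending byte, then resync
    end
  end
  return table.concat(out)
end

-- Example custom handler: assume stray bytes are Latin-1 and re-encode them.
local function latin1(bad_byte)
  local b = bad_byte:byte()
  return string.char(0xC0 + math.floor(b / 64), 0x80 + b % 64)
end

print(sanitize("caf\233", latin1))         -- Latin-1 "e acute" re-encoded as UTF-8
```

The "fixup/re-encode" option ends up being just another custom handler here, which might be an argument for keeping the built-in choices down to throw, replace and delete. A wrapper like the xml.load one quoted above could pass the chosen policy straight through.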

Would this cause all kinds of havoc? I admit shallow knowledge of the subject.

Brigham Toskin