lua-users home
lua-l archive

On Tue, Sep 2, 2014 at 6:02 PM, Tim Hill <drtimhill@gmail.com> wrote:
>
> On Sep 2, 2014, at 1:02 PM, Coda Highland <chighland@gmail.com> wrote:
>
>> The XML spec says that a UTF-8 BOM is definitely SUPPOSED to be legal.
>> If your XML parser can't handle it, then either the device is
>> generating the BOM incorrectly (it should be 0xEF 0xBB 0xBF), or
>> you're munging it somewhere between receiving it from the socket and
>> writing it to a file (possibly implicitly, for example if something
>> you're using is trying to be too clever and doing a charset conversion
>> when it shouldn't), or your XML parser is in violation of the spec. I
>> suspect the first one is the most likely.
>>
>> /s/ Adam
>>
>
> Technically a BOM is never legal in UTF-8, XML spec notwithstanding.
>
> —Tim

No, according to the Unicode Consortium, it's LEGAL -- just not
required or recommended. It's a SHOULD not, not a MUST not.

The XML spec takes neutral ground on what was once a big flame war:
it permits the BOM without promoting it. Microsoft, on the other hand,
actively promotes the use of the BOM in UTF-8. Notepad will insert one
if you save a file as UTF-8, and it relies on the presence of the BOM
to identify files encoded in UTF-8.
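
If the parser on the receiving end really can't cope, one workaround is
to drop the three BOM bytes (0xEF 0xBB 0xBF) before handing the data
over. A minimal sketch in Python, assuming the payload arrives as raw
bytes; strip_utf8_bom and the sample payload are made up for
illustration and are not taken from anyone's actual code:

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical payload: a BOM followed by a tiny XML document.
raw = b"\xef\xbb\xbf<?xml version='1.0' encoding='UTF-8'?><root/>"
clean = strip_utf8_bom(raw)
```

Input that has no BOM passes through unchanged, so it should be safe to
apply this unconditionally.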

/s/ Adam