lua-l archive



On Jun 16, 2013, at 7:30 AM, David Heiko Kolf wrote:

> The BOM in UTF-8 is mainly annoying for plain ASCII applications where
> UTF-8 should be transparent in strings. But as far as I remember it is
> not invalid UTF-8 (though its only use is to show that text is indeed
> UTF-8). A Unicode-aware application can just ignore it.

Yeah, but it should be dropped to avoid ZWNBSPs (U+FEFF read as ZERO WIDTH NO-BREAK SPACE) randomly littering the interior of concatenated texts, which complicates things like search. It's harmless for applications that understand semantics above the codepoint level, but I'm proceeding on the assumption that Lua will not have those.
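
A minimal sketch of "drop it before concatenating", assuming byte strings in UTF-8. The helper names `strip_bom` and `concat_utf8` are hypothetical, not part of any Lua or Python library; `codecs.BOM_UTF8` is the real Python constant for the bytes EF BB BF. Python's `utf-8-sig` codec does the same stripping at decode time, but doing it on raw bytes mirrors how plain Lua strings would be handled.

```python
import codecs

def strip_bom(data: bytes) -> bytes:
    """Return data without one leading UTF-8 BOM (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def concat_utf8(*chunks: bytes) -> bytes:
    """Join UTF-8 byte strings, dropping each chunk's leading BOM so that
    no U+FEFF ends up in the interior of the result as a ZWNBSP."""
    return b"".join(strip_bom(c) for c in chunks)

if __name__ == "__main__":
    a = codecs.BOM_UTF8 + "hello ".encode("utf-8")
    b = codecs.BOM_UTF8 + "world".encode("utf-8")
    print(concat_utf8(a, b).decode("utf-8"))  # prints "hello world"
```

Only a BOM at the very start of each chunk is removed; a U+FEFF already sitting mid-text is left alone, since deciding what to do with that is a separate policy question.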

I think there may be some advantages to declaring U+FEFF invalid even though it technically is valid; it has no business being in the interior of any recently generated text. I'll go see if my codebase vomits at the idea...

http://www.unicode.org/faq/utf_bom.html#bom6
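
A rough sketch of the stricter policy floated above: accept only well-formed UTF-8 and treat U+FEFF as invalid anywhere except, optionally, as a single BOM at position 0. The function `check_utf8_no_zwnbsp` and its `allow_leading_bom` flag are hypothetical illustrations of the idea, not an existing API; Python's strict `bytes.decode("utf-8")` is used for the basic well-formedness check.

```python
def check_utf8_no_zwnbsp(data: bytes, allow_leading_bom: bool = True) -> bool:
    """Return True if data is well-formed UTF-8 containing no U+FEFF,
    except (when allow_leading_bom is set) one BOM at the very start."""
    try:
        text = data.decode("utf-8")  # strict mode: raises on malformed input
    except UnicodeDecodeError:
        return False
    if allow_leading_bom and text.startswith("\ufeff"):
        text = text[1:]  # tolerate a single leading BOM
    return "\ufeff" not in text  # any remaining U+FEFF is rejected

if __name__ == "__main__":
    print(check_utf8_no_zwnbsp(b"\xef\xbb\xbfabc"))  # True: leading BOM only
    print(check_utf8_no_zwnbsp(b"ab\xef\xbb\xbfc"))  # False: interior ZWNBSP
```

Whether a leading BOM should also be rejected is left to the caller via the flag; the stricter setting matches treating U+FEFF as "not valid" outright.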

Jay