Glenn Maynard wrote:
> On Thu, Dec 07, 2006 at 03:44:05PM -0500, Brian Weed wrote:
>> Asko Kauppi wrote:
>> >But there may be some identifier "stamp" that can be used to know a 
>> >file is UTF-8, no?
>> There are two that I know of.  I don't know how "standard" they are.  
>> One is called a BOM Header, which is some binary code in the first 2 
>> bytes of the "text" file.
> 
> Three: 0xEF 0xBB 0xBF.  Don't use that unless you're writing
> Windows-specific stuff and you really need to be compatible with
> other Windows applications that expect it--it's not "binary" any
> more than any other UTF-8 character, but text file encodings do not
> have headers!  (And if you--the reader, not Brian Weed--do use this,
> make it a save-time option and disable it by default if possible.)
> 

I just yesterday broke down and added a UTF-8 BOM (0xEF 0xBB 0xBF)
"handler" to luaL_loadfile() because I foolishly said in my
documentation to save config files (i.e., Lua sources in disguise) as
UTF-8.  The resulting flurry of support requests about errors like
	test-utf8bom.conf:1: `=' expected near `»'
or
	test-utf8bom.conf:1: unexpected symbol near `ï'
all because Notepad was being used as the editor, made my last few
months reasonably uncomfortable.

All I do now is, upon loading the file, look for those dreaded three
bytes at the very start and skip them.
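
For anyone wanting to do the same, here is a rough sketch of that check
in plain C. It is not the actual change made inside luaL_loadfile();
skip_utf8_bom is a hypothetical helper name, and it assumes a seekable
FILE* positioned at the start of the file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of the Lua API.
   If the stream begins with the UTF-8 BOM (0xEF 0xBB 0xBF), the read
   position is left just past it and 1 is returned. Otherwise the stream
   is rewound to the beginning and 0 is returned.
   Assumes f is seekable and currently at offset 0. */
static int skip_utf8_bom(FILE *f)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
    unsigned char buf[3];
    size_t n = fread(buf, 1, sizeof buf, f);

    if (n == sizeof buf && memcmp(buf, bom, sizeof bom) == 0)
        return 1;

    /* No BOM, or the file is shorter than three bytes: start over.
       rewind() also clears the EOF and error indicators. */
    rewind(f);
    return 0;
}
```

Calling this right after fopen() and before handing the stream to the
chunk reader should be enough; files without the BOM are read from the
first byte as before.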

Robby