- Subject: Re: luaL_loadfile doesn't like the UTF-8 BOM
- From: TNHarris <telliamed@...>
- Date: Mon, 9 Jul 2012 16:00:52 -0400
On Monday, July 09, 2012 03:07:03 PM Owen Shepherd wrote:
> http://unicode.org/faq/utf_bom.html#bom5 says otherwise. They just say
> its use is discouraged or invalid in some cases where the encoding is
> already known.
Isn't design-by-committee wonderful?
The most sensible approach to encoding identification has been prefix lines as
introduced by Emacs and adopted by other editors and languages. Quoting from
Python PEP #263 [1]:

    the first or second line must match the regular
    expression "coding[:=]\s*([-\w.]+)". The first group of this
    expression is then interpreted as encoding name.

So a Lua script would start with

    --*- coding: utf-8 -*-

with or without a BOM. (If there is a BOM, the declared encoding is ignored.)
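
As a rough illustration of that rule, here is a minimal sketch in Python (chosen only because the quoted expression is a Python regular expression; a host application would do the equivalent in its own language). The function name detect_source_encoding is hypothetical, and letting the BOM override the declaration follows the suggestion in this message, not any behaviour of Lua itself.

```python
import re
from typing import Optional

UTF8_BOM = b"\xef\xbb\xbf"

# Regular expression as quoted from PEP 263, applied to raw bytes.
CODING_RE = re.compile(rb"coding[:=]\s*([-\w.]+)")


def detect_source_encoding(data: bytes) -> Optional[str]:
    """Return the encoding implied by a BOM or a coding line, else None.

    Hypothetical helper: not part of Lua or of any existing library.
    """
    if data.startswith(UTF8_BOM):
        # Assumption from the post: a BOM wins over any declared encoding.
        return "utf-8"
    # Only the first or second line may carry the declaration.
    for line in data.splitlines()[:2]:
        match = CODING_RE.search(line)
        if match:
            return match.group(1).decode("ascii")
    return None


if __name__ == "__main__":
    script = b"--*- coding: utf-8 -*-\nprint('hello')\n"
    print(detect_source_encoding(script))
```

A script with neither a BOM nor a coding line yields None, and the caller then has to fall back on whatever default it prefers.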
Of course, as far as Lua is concerned, UTF-8 looks the same as ISO-8859-*.
Strings are just bytes, as has often been discussed on the list. But knowing
the source encoding allows your application to apply a transformation before
displaying any strings to the user.
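
To make the last point concrete, here is a small sketch of such a transformation, again in Python. It assumes the application already knows the script's encoding; the name to_display_text is hypothetical. The byte strings stand in for what a Lua script would hand back: the same word is a different byte sequence in each encoding, and only the declared encoding tells the application how to read it.

```python
def to_display_text(raw: bytes, source_encoding: str) -> str:
    """Decode bytes produced by a script into text for display.

    Hypothetical helper. Undecodable bytes are replaced with U+FFFD
    rather than raising, which is one possible policy among several.
    """
    return raw.decode(source_encoding, errors="replace")


if __name__ == "__main__":
    latin1_bytes = b"caf\xe9"      # "cafe" with e-acute in ISO-8859-1
    utf8_bytes = b"caf\xc3\xa9"    # the same word encoded as UTF-8
    same = (to_display_text(latin1_bytes, "iso-8859-1")
            == to_display_text(utf8_bytes, "utf-8"))
    print(same)
```

Decoding the ISO-8859-1 bytes as UTF-8 would instead produce a replacement character, which is the kind of mistake a known source encoding avoids.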
[1] http://www.python.org/dev/peps/pep-0263
--
tom <telliamed@whoopdedo.org>