lua-l archive



On Monday, July 09, 2012 03:07:03 PM Owen Shepherd wrote:
> http://unicode.org/faq/utf_bom.html#bom5  says otherwise. They just say
> its use is discouraged or invalid in some cases where the encoding is
> already known.

Isn't design-by-committee wonderful?

The most sensible approach to identifying the encoding has been a declaration 
in the first line or two of the file, as introduced by Emacs and adopted by 
other editors and languages. Quoting from Python PEP 263 [1]:

    the first or second line must match the regular
    expression "coding[:=]\s*([-\w.]+)". The first group of this
    expression is then interpreted as encoding name.

So a Lua script would start with:

    --*- coding: utf-8 -*-

This works with or without a BOM. (If there is a BOM, the declared encoding is 
ignored.)
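As a rough illustration, here is a minimal Python sketch of the convention 
described above. It is written in Python only because the quoted regular 
expression uses Python's `re` syntax; it is not part of Lua. The function name 
`detect_source_encoding` and the `utf-8` fallback are assumptions, and only the 
UTF-8 BOM is checked. Following the proposal above, a BOM overrides any 
declaration (Python itself is stricter: PEP 263 rejects a BOM combined with a 
non-UTF-8 declaration).

```python
import codecs
import re

# Regex quoted from PEP 263, applied to bytes so that \w stays ASCII-only.
# re.search finds it anywhere on the line, e.g. after a Lua "--" comment marker.
CODING_RE = re.compile(rb"coding[:=]\s*([-\w.]+)")


def detect_source_encoding(source: bytes, default: str = "utf-8") -> str:
    """Hypothetical helper: guess a script's encoding from a BOM or a declaration.

    The default fallback encoding is an assumption, not part of any spec.
    """
    # A UTF-8 BOM wins; any coding declaration is ignored in that case.
    if source.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # Only the first or second line may carry the declaration
    # (the first is often a "#!" line).
    for line in source.splitlines()[:2]:
        match = CODING_RE.search(line)
        if match:
            return match.group(1).decode("ascii").lower()
    return default
```

For example, `detect_source_encoding(b"#!/usr/bin/env lua\n-- -*- coding: latin-1 -*-\n")` 
returns `"latin-1"`, while a declaration on the third line is not seen.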

Of course, as far as Lua is concerned, UTF-8 looks the same as ISO-8859-*: 
strings are just bytes, as has often been discussed on the list. But knowing 
the source encoding lets your application convert strings to the right form 
before displaying them to the user.
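A minimal Python sketch of that display-time transformation, assuming the 
source encoding is already known (for instance from the coding line). The name 
`to_display_text` is hypothetical, and replacing undecodable bytes with U+FFFD 
is a design choice, not something Lua does. It also shows why the declaration 
matters: the same bytes read as different text under UTF-8 and ISO-8859-1.

```python
def to_display_text(raw: bytes, source_encoding: str) -> str:
    """Hypothetical helper: turn a string's raw bytes into text for display.

    Lua never does this itself; the bytes pass through it untouched.
    Undecodable bytes become U+FFFD instead of raising an error.
    """
    return raw.decode(source_encoding, errors="replace")


raw = b"caf\xc3\xa9"  # the bytes of "café" as written in a UTF-8 file
as_utf8 = to_display_text(raw, "utf-8")      # "café"
as_latin1 = to_display_text(raw, "latin-1")  # "cafÃ©"
```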

[1] http://www.python.org/dev/peps/pep-0263

-- 
tom <telliamed@whoopdedo.org>