Re: Is it possible to add utf-8 lua source file support in lua 5.2?

lua-l archive


Subject: Re: Is it possible to add utf-8 lua source file support in lua 5.2?
From: Daniel Silverstone <dsilvers@...>
Date: Tue, 28 Sep 2010 10:49:44 +0100

On Tue, Sep 28, 2010 at 06:38:53AM -0300, Luiz Henrique de Figueiredo wrote:
> > The utf-8 bom, by definition of unicode, is actually a "space" character.
> > Shall we just treat the utf-8 bom like a normal space character, instead of
> > stripping it off? Is that easier to handle in the lexer?
> 
> In Lua 5.2 you don't even have to patch the lexer: just edit lctype.c
> and say that 0xFF and 0xFE are whitespace. This of course is not the
> perfect solution, because BOM is a 2-byte entity, not a 1-byte one...

Note that the utf-8 representation of the BOM is in fact 0xEF,0xBB,0xBF, not
0xFF,0xFE; 0xFF,0xFE is the UTF-16 little-endian BOM. In iso-8859-1 those three
utf-8 bytes are lowercase-i-with-diaeresis, right-chevron and
upside-down-question-mark.  While uncommon, they're very much not whitespace.

So if Windows Notepad is adding 0xFF, 0xFE then not only is it adding a BOM to
a file encoding for which the Unicode standard does not recommend one, but it's
actually adding the *wrong* marker.
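The markers can be told apart by their leading bytes. A minimal C sketch, assuming a plain byte buffer; the `bom_kind` enum and `detect_bom` are illustrative names, not from Lua or any library. UTF-32 BOMs are ignored for brevity, so a UTF-32LE BOM (FF FE 00 00) would be reported as UTF-16LE by this check.

```c
#include <stddef.h>

/* Illustrative (hypothetical) names: encodings identifiable from a leading BOM. */
typedef enum { BOM_NONE, BOM_UTF8, BOM_UTF16LE, BOM_UTF16BE } bom_kind;

/* Inspect the first bytes of buf and report which BOM, if any, is present.
   UTF-32 is deliberately not handled in this sketch. */
bom_kind detect_bom(const unsigned char *buf, size_t len) {
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return BOM_UTF8;      /* U+FEFF encoded as utf-8 */
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return BOM_UTF16LE;   /* U+FEFF as a little-endian 16-bit unit */
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return BOM_UTF16BE;   /* U+FEFF as a big-endian 16-bit unit */
    return BOM_NONE;
}
```

A file starting with 0xFF,0xFE is therefore flagged as UTF-16LE, not utf-8, which is the mismatch described above.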

No concessions should be made in Lua for this.  If someone wants to do a
Windows-specific fix for this, they're welcome, but as you say, they should
just patch the Lua core themselves.
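For anyone who does write such a patch, a minimal C sketch of a narrower approach: recognise the utf-8 BOM (0xEF,0xBB,0xBF) only at offset 0 of a chunk and skip it, rather than declaring those bytes whitespace everywhere in lctype.c, where they would break legitimate iso-8859-1 text. The function name `utf8_bom_length` is hypothetical and not part of the Lua C API; a caller would advance its buffer by the returned count before passing the rest on.

```c
#include <stddef.h>
#include <string.h>

/* The utf-8 encoding of U+FEFF (the byte-order mark). */
static const unsigned char UTF8_BOM[3] = {0xEF, 0xBB, 0xBF};

/* Hypothetical helper, not part of the Lua C API.
   Returns 3 if buf begins with the utf-8 BOM, otherwise 0. Only offset 0
   is checked, so the same bytes later in the chunk (e.g. iso-8859-1
   text inside a string literal) are left untouched. */
size_t utf8_bom_length(const unsigned char *buf, size_t len) {
    if (len >= sizeof UTF8_BOM && memcmp(buf, UTF8_BOM, sizeof UTF8_BOM) == 0)
        return sizeof UTF8_BOM;
    return 0;
}
```

Keeping the check positional is the design point: the three bytes are only a marker at the very start of a file, and ordinary characters anywhere else.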

All in all, Microsoft should not be encouraged to let this abomination stand.

D.

-- 
Daniel Silverstone                         http://www.digital-scurf.org/
PGP mail accepted and encouraged.            Key Id: 3CCE BABE 206C 3B69

Follow-Ups:
- Re: Is it possible to add utf-8 lua source file support in lua 5.2?, David Kolf
- Re: Is it possible to add utf-8 lua source file support in lua 5.2?, Robert Raschke

References:
- Is it possible to add utf-8 lua source file support in lua 5.2?, Xpol Wan
- Re: Is it possible to add utf-8 lua source file support in lua 5.2?, Pan Shi Zhu
- Re: Is it possible to add utf-8 lua source file support in lua 5.2?, Robert Raschke
- Re: Is it possible to add utf-8 lua source file support in lua 5.2?, J.Jørgen von Bargen
- Re: Is it possible to add utf-8 lua source file support in lua 5.2?, Robert Raschke
- Re: Is it possible to add utf-8 lua source file support in lua 5.2?, Mike Pall
- Re: Is it possible to add utf-8 lua source file support in lua 5.2?, Pan Shi Zhu
- Re: Is it possible to add utf-8 lua source file support in lua 5.2?, Luiz Henrique de Figueiredo
