

It was thus said that the Great Tim Hill once stated:
> 
> However, no “simple” feature comes without hidden costs. The back-quote
> syntax appears to isolate source code from character coding issues, but
> does it? One approach is to always assume UTF-8 encoding, which is
> consistent across platforms, but may differ from the local encoding. This
> means that `a` ~= string.byte("a") on (say) EBCDIC platforms. Another
> approach is to use the local platform encoding, but this also doesn’t work
> since the locale at compile time may differ from the locale at run-time
> (even if the code is run directly after compile).

  It can even change at runtime!  
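
  A minimal sketch of what that looks like, assuming a stock Lua
interpreter (where pattern classes like %a follow the current C locale).
The locale name "fr_FR.ISO-8859-1" and the helper is_letter() are just
illustrations; that locale may not be installed on a given system, in which
case os.setlocale() returns nil and nothing changes.

```lua
-- Byte 233 is "é" in ISO-8859-1; in the "C" locale it is not a letter.
local byte = "\233"

-- Hypothetical helper: true if the current locale treats the single
-- byte in s as alphabetic (the %a class is locale dependent).
local function is_letter(s)
  return s:find("^%a$") ~= nil
end

-- Start from the "C" locale, which is always available.
os.setlocale("C", "ctype")
local in_c = is_letter(byte)        -- false: only A-Z and a-z are letters

-- Assumed locale name; os.setlocale returns nil if it cannot be honored.
local switched = os.setlocale("fr_FR.ISO-8859-1", "ctype")
local in_latin1 = is_letter(byte)   -- may now be true, same byte, same code

print(in_c, switched, in_latin1)
```

  Same string, same pattern, and the answer depends on what some other part
of the program did to the locale in the meantime.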

  One project I've been working on [1] parses email [2], which takes a lot
of character set manipulation (not dealt with in [2]).  The collection of
emails I pull from uses at least a dozen character sets, if not more.

  -spc

[1]	Long term, when I get around to it, not really important, but a fun
	diversion.  That type of project.

[2]	Obligatory email header parsing code:

	https://github.com/spc476/LPeg-Parsers/blob/master/email.lua