On Monday 20 October 2003 14:20, Roberto Ierusalimschy wrote:
> If the system may use two different representations, the simplest
> solution is to translate to a fixed representation as soon as you read
> something. If you can assume that all relevant utf-8 text can be mapped
> to ISO-8859-1, it is better to use ISO-8859-1 internally. It is easy
> to write a function to translate utf-8 to ISO-8859-1:
>
> function toISO (s)
>   if string.find(s, "[\224-\255]") then error("non-ISO char") end
>   s = string.gsub(s, "([\192-\223])(.)", function (c1, c2)
>         c1 = string.byte(c1) - 192
>         c2 = string.byte(c2) - 128
>         return string.char(c1 * 64 + c2)
>       end)
>   return s
> end

Thanks, this seems in fact to be the easiest way; I like the idea of doing it 
all in Lua without resorting to C or, worse, having to dig for obscure (and 
possibly non-portable) system calls. A small extra complication will be 
user-supplied text files, but I could just add a line at the beginning of the 
file to specify its format (just like email messages or Web pages).
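A minimal sketch of that header-line idea, assuming Lua 5.1 or later (for string.match). The "charset:" marker and the decodeText name are hypothetical, picked only for illustration, and files without a marker are assumed to be ISO-8859-1 already:

```lua
-- UTF-8 to ISO-8859-1 conversion, following the function quoted above.
local function toISO(s)
  if string.find(s, "[\224-\255]") then error("non-ISO char") end
  return (string.gsub(s, "([\192-\223])(.)", function (c1, c2)
    return string.char((string.byte(c1) - 192) * 64 + (string.byte(c2) - 128))
  end))
end

-- Hypothetical convention: the first line may read "charset: <name>".
-- Returns the remaining text as ISO-8859-1.
local function decodeText(text)
  local charset, body =
    string.match(text, "^charset:%s*([%w%-]+)[ \t]*\r?\n(.*)$")
  if not charset then
    return text                 -- no marker: assume ISO-8859-1 (assumption)
  end
  charset = string.lower(charset)
  if charset == "utf-8" then
    return toISO(body)
  elseif charset == "iso-8859-1" then
    return body
  else
    error("unsupported charset: " .. charset)
  end
end
```

With this, decodeText("charset: utf-8\ncaf\195\169") and decodeText("charset: iso-8859-1\ncaf\233") both return "caf\233".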

It's a pity there's no way to distinguish between the two types of text files 
by looking at their contents (apart maybe from statistical analysis...).
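Short of real statistical analysis, one rough heuristic is to check whether the bytes are structurally well-formed UTF-8, since ISO-8859-1 text containing accented characters rarely is. The sketch below is only that: it assumes Lua 5.1 or later, the name looksLikeUTF8 is made up for this example, it does not reject every overlong or surrogate encoding, and a positive answer is a guess, not a proof. Pure ASCII text passes trivially, which is harmless because it is identical in both encodings.

```lua
-- Returns true if s is structurally plausible UTF-8, plus a second value
-- telling whether any multi-byte sequence was actually seen.
local function looksLikeUTF8(s)
  local i, n = 1, #s
  local sawMultibyte = false
  while i <= n do
    local b = string.byte(s, i)
    local extra                       -- number of continuation bytes expected
    if b < 128 then extra = 0
    elseif b >= 194 and b <= 223 then extra = 1
    elseif b >= 224 and b <= 239 then extra = 2
    elseif b >= 240 and b <= 244 then extra = 3
    else return false end             -- stray continuation or invalid lead byte
    for j = 1, extra do
      local c = string.byte(s, i + j)
      if not c or c < 128 or c > 191 then return false end
    end
    if extra > 0 then sawMultibyte = true end
    i = i + extra + 1
  end
  return true, sawMultibyte
end
```

For example, looksLikeUTF8("caf\195\169") returns true, while looksLikeUTF8("caf\233 au lait") returns false.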

  Enrico