lua-l archive

> I'd like to write an application that operates on text, including string 
> containing accented letters such as "è" (I hope it shows correctly, it's an 
> accented "e").

If the system uses ISO-8859-1, there is no problem at all. If it uses
utf-8 and the program source is also written in utf-8, the comparison
still works correctly. (Both `s' and "caffè" will have the same
internal representation, with 6 bytes, because utf-8 encodes "è" as
two bytes.)
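
To make the byte counts concrete, here is a small sketch. The variable
names are only for illustration, and the strings are written with
decimal escapes so the result does not depend on how this message or
your source file is encoded:

```lua
-- "caffè" spelled out byte by byte in each encoding.
local utf8_caffe = "caff\195\168"   -- utf-8: "è" is the two bytes 195, 168
local iso_caffe  = "caff\232"       -- ISO-8859-1: "è" is the single byte 232

print(string.len(utf8_caffe))   -- 6
print(string.len(iso_caffe))    -- 5

-- String comparison is bytewise, so two strings in the same encoding
-- compare equal...
print(utf8_caffe == "caff\195\168")   -- true
-- ...but the same word in two different encodings does not.
print(utf8_caffe == iso_caffe)        -- false
```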

If the system may use two different representations, the simplest
solution is to translate to a fixed representation as soon as you read
any input. If you can assume that all relevant utf-8 text can be mapped
to ISO-8859-1, it is better to use ISO-8859-1 internally, because then
every character is exactly one byte. It is easy to write a function to
translate utf-8 to ISO-8859-1:

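function toISO (s)
  -- Lead bytes 196-255 start sequences for code points above 255,
  -- which have no ISO-8859-1 equivalent.
  if string.find(s, "[\196-\255]") then error("non-ISO char") end
  -- Fold each 2-byte sequence (lead byte, continuation byte) into one byte.
  s = string.gsub(s, "([\192-\195])(.)", function (c1, c2)
        c1 = string.byte(c1) - 192
        c2 = string.byte(c2) - 128
        return string.char(c1 * 64 + c2)
      end)
  return s
end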
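
The arithmetic in that callback can be checked by hand on "è", whose
utf-8 bytes are 195 and 168. The sketch below does that check and also
shows one possible inverse mapping, in case the result has to be
written back out as utf-8. The names decodePair and toUTF8 are
hypothetical helpers that do not appear in the function above; treat
this as an illustration under those assumptions, not a finished
library:

```lua
-- Hypothetical helper: combine the lead byte and continuation byte of a
-- 2-byte utf-8 sequence into a single code point.
local function decodePair (b1, b2)
  return (b1 - 192) * 64 + (b2 - 128)
end

print(decodePair(195, 168))   -- 232, the ISO-8859-1 code for "è"

-- Hypothetical inverse: translate an ISO-8859-1 string to utf-8.
-- Bytes below 128 are plain ASCII and are left untouched.
local function toUTF8 (s)
  return (string.gsub(s, "[\128-\255]", function (c)
    local b = string.byte(c)
    local hi = math.floor(b / 64)   -- top 2 bits go into the lead byte
    local lo = b - hi * 64          -- low 6 bits go into the continuation byte
    return string.char(192 + hi, 128 + lo)
  end))
end

print(toUTF8("caff\232") == "caff\195\168")   -- true
```

The extra parentheses around string.gsub drop its second result (the
substitution count), so toUTF8 returns just the string.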

-- Roberto