On 22/02/2014 9.35, Thijs Schreijer wrote:
UTF8 was mentioned as a possible feature to be included in future
versions. If that happens, the argument for handling control characters
without data mangling gets a lot stronger.

I may be mistaken, not being a Unicode expert (to put it mildly), but I am under the impression that using a 'traditional' line input function for UTF-8 (with or without '\0') could open another, larger can of worms.

The sets of line terminators and whitespace characters seem to differ from the ASCII ones; for example, U+2028 is a line separator, and it cannot be recognized by a simple test on the value returned by getc(), because in UTF-8 it is encoded as three bytes (E2 80 A8). A UTF-8-oriented line iterator would probably be needed.
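
To make the point concrete, here is a rough sketch in C of the kind of test such an iterator would need. The function name utf8_line_end and its interface are made up for illustration and are not part of Lua or of any library. It works on a byte buffer with an explicit length (so an embedded '\0' is just data) and recognizes only "\n", U+2028 and U+2029; a real implementation would have to decide about "\r\n", U+0085 and so on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a Lua or libc function).
   Returns the offset of the first line terminator in s[0..len),
   or len if there is none. *term_len receives the length of that
   terminator in bytes (0 if none was found).
   Assumption: only "\n", U+2028 and U+2029 count as terminators. */
size_t utf8_line_end(const unsigned char *s, size_t len, size_t *term_len)
{
    assert(s != NULL && term_len != NULL);
    for (size_t i = 0; i < len; i++) {
        if (s[i] == '\n') {              /* single-byte terminator */
            *term_len = 1;
            return i;
        }
        /* U+2028 is E2 80 A8 and U+2029 is E2 80 A9 in UTF-8, so three
           consecutive bytes must be examined, not just one. */
        if (s[i] == 0xE2 && len - i >= 3 && s[i + 1] == 0x80 &&
            (s[i + 2] == 0xA8 || s[i + 2] == 0xA9)) {
            *term_len = 3;
            return i;
        }
    }
    *term_len = 0;                       /* no terminator in the buffer */
    return len;
}
```

A line iterator built on this would emit s[0..offset) as the line and then skip term_len bytes before scanning again. A getc()-based reader would need the same three-byte lookahead, plus a way to push back the bytes when the sequence turns out to be something else.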

P.S. It is not my intention to start a thread about what a line is :-)

--
  Enrico