On Saturday, February 22, 2014, Enrico Colombini <erix@erix.it> wrote:
On 22/02/2014 9.35, Thijs Schreijer wrote:
UTF-8 was mentioned as a possible feature to be included in future
versions. If that happens, the arguments for handling control
characters without data mangling get a lot stronger.

I may be mistaken, not being a Unicode expert (to put it mildly), but I am under the impression that using a 'traditional' line input function for UTF-8 (with or without '\0') could open another, larger can of worms.

The set of line terminators and white space characters seems to be different; for example, U+2028 is a line separator and cannot be recognized by a simple test on the value returned by getc(). A UTF-8 oriented line iterator would probably be needed.
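
To illustrate the point about U+2028: in UTF-8 it is the three-byte sequence E2 80 A8, so a splitter has to match byte sequences rather than test single bytes. The following is only a rough sketch; the name utf8_lines and the chosen set of terminators are assumptions made for illustration, not an existing Lua API, and the search is not optimized.

```lua
-- Line terminators as UTF-8 byte sequences (illustrative set, not exhaustive).
local TERMINATORS = {
  "\r\n",           -- CR LF (listed before lone CR so it wins a tie)
  "\n",             -- LF
  "\r",             -- CR
  "\226\128\168",   -- U+2028 LINE SEPARATOR      (E2 80 A8)
  "\226\128\169",   -- U+2029 PARAGRAPH SEPARATOR (E2 80 A9)
}

-- Hypothetical helper: returns an iterator over the lines of a UTF-8 string.
local function utf8_lines(s)
  local pos, done = 1, false
  return function()
    if done then return nil end
    -- Find the terminator that starts earliest at or after pos.
    local first, last
    for _, term in ipairs(TERMINATORS) do
      local a, b = s:find(term, pos, true)  -- plain (non-pattern) search
      if a and (not first or a < first) then
        first, last = a, b
      end
    end
    if first then
      local line = s:sub(pos, first - 1)
      pos = last + 1
      return line
    end
    -- No terminator left: return the remainder once, if there is any.
    done = true
    if pos <= #s then return s:sub(pos) end
    return nil
  end
end
```

Used as `for line in utf8_lines(text) do ... end`, this treats U+2028 like any other line break, which a byte-at-a-time test on getc() cannot do.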

P.S. It is not my intention to start a thread about what a line is :-)

--
  Enrico


And this was the thinking behind my suggestion to deprecate it. 

UTF-8 is becoming more prominent. People also use other encodings. 

Even ASCII has subtle problems with the definition of "line" and "eof".

Making something work for a growing variety of use cases that are becoming legitimate is beyond the scope of core Lua.

Making something work in Lua with file:read and the "*n" option is very simple and makes a nice Lua 101 tutorial.
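
As a rough idea of what such a tutorial exercise might look like, here is a hand-rolled line iterator built on file:read. It is a sketch under assumptions: it reads fixed-size chunks by byte count (file:read(size)) and splits on "\n" only, and the name chunked_lines and the default chunk size are made up for illustration.

```lua
-- Hypothetical helper: iterate over "\n"-terminated lines of an open file,
-- reading fixed-size chunks so that bytes such as "\0" are passed through
-- as ordinary data.
local function chunked_lines(file, chunk_size)
  chunk_size = chunk_size or 4096   -- assumed default, not a tuned value
  local buffer, eof = "", false
  return function()
    while true do
      -- Hand out a complete line if the buffer already holds one.
      local nl = buffer:find("\n", 1, true)
      if nl then
        local line = buffer:sub(1, nl - 1)
        buffer = buffer:sub(nl + 1)
        return line
      end
      -- At end of file, flush whatever is left as the final line.
      if eof then
        if buffer == "" then return nil end
        local line = buffer
        buffer = ""
        return line
      end
      -- Otherwise pull in another chunk; file:read(n) returns nil at EOF.
      local chunk = file:read(chunk_size)
      if chunk then buffer = buffer .. chunk else eof = true end
    end
  end
end
```

It would be used as `for line in chunked_lines(f) do ... end`. What counts as a line terminator is a single, visible decision in the code, which is the kind of choice this thread suggests leaving to the application.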

Sometimes you have to code to get stuff done. 

With the code saved in Lua sources, we can get [insert hot button feature request here], or it will work on even more microwaves! :)

-Andrew