On 22/02/2014 9.35, Thijs Schreijer wrote:
UTF8 was mentioned as a possible feature to be included in future versions. If that happens, the argument for handling control characters without data mangling gets a lot stronger.
I may be mistaken, not being a Unicode expert (to put it mildly), but I am under the impression that using a 'traditional' line input function for UTF-8 (with or without '\0') could open another, larger can of worms.
The sets of line terminators and white space characters seem to differ from the ASCII ones; for example, U+2028 is a line separator that UTF-8 encodes as the three bytes E2 80 A8, so it cannot be recognized by a simple test on the single value returned by getc(). A UTF-8-oriented line iterator would probably be needed.
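To make the point concrete, here is a rough sketch in C of what such an iterator might look like. The name read_line_utf8 and its return convention are made up for illustration; this is not an existing Lua or libc function. It assumes that only '\n', U+2028 and U+2029 end a line, and it ignores everything else Unicode may count as a line break (CR, NEL and so on).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Store one byte, silently dropping it if the buffer is full
   (one slot is always kept free for the final '\0'). */
static void put_byte(char *buf, size_t cap, size_t *n, int c)
{
    if (*n + 1 < cap)
        buf[(*n)++] = (char)c;
}

/* Hypothetical helper: read one line from fp into buf.
   A line ends at '\n', U+2028 (E2 80 A8) or U+2029 (E2 80 A9);
   the terminator is not stored. Returns the number of bytes stored,
   so embedded '\0' bytes survive, or -1 at end of file. */
long read_line_utf8(FILE *fp, char *buf, size_t cap)
{
    size_t n = 0;
    int pending = 0; /* 0: nothing held, 1: E2 held, 2: E2 80 held */
    int seen = 0;    /* did we read at least one byte? */
    int c;

    assert(buf != NULL && cap > 0);

    while ((c = getc(fp)) != EOF) {
        seen = 1;
        if (pending == 2 && (c == 0xA8 || c == 0xA9)) {
            pending = 0;          /* full separator seen: end of line */
            break;
        }
        if (pending >= 1 && !(pending == 1 && c == 0x80)) {
            /* Not a separator after all: emit the held-back bytes. */
            put_byte(buf, cap, &n, 0xE2);
            if (pending == 2)
                put_byte(buf, cap, &n, 0x80);
            pending = 0;
        }
        if (pending == 1) {       /* c is 0x80 here */
            pending = 2;
            continue;
        }
        if (c == '\n')
            break;
        if (c == 0xE2) {          /* possible start of a separator */
            pending = 1;
            continue;
        }
        put_byte(buf, cap, &n, c);
    }

    /* End of file in the middle of a candidate sequence: keep the bytes. */
    if (pending >= 1)
        put_byte(buf, cap, &n, 0xE2);
    if (pending == 2)
        put_byte(buf, cap, &n, 0x80);

    if (!seen)
        return -1;
    buf[n] = '\0';
    return (long)n;
}
```

Even this toy version needs up to two bytes of lookahead and a small state machine, which is exactly what a single comparison on the result of getc() cannot provide. Lines longer than the buffer are truncated here rather than split, which a real implementation would have to handle differently.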
P.S. It is not my intention to start a thread about what a line is :-)
-- Enrico