On Feb 21, 2014, at 8:34, Ulrich Schmidt wrote:
Hi all.
I have read the entire thread so far and (I am sorry) I can't find any good idea in here.
What are we discussing? We are talking about 8-bit charset text streams. Everyone who has dealt with them, including me, knows that 8-bit charsets are... outdated (to put it kindly). When you receive an 8-bit text file, you probably know nothing about it.
UTF-8 is the current hotness, and actually my 8-bit streams are usually UTF-8.
- What codepage was used?
- Maybe it is an old CP/M text file where ^Z is used to mark the end of the text. (CP/M file sizes are a multiple of 128 bytes.)
- Are UTF-8 extensions in use?
... and many more questions about how to read the text that I can't answer.
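Two of these questions can at least be probed mechanically. The following is a minimal Python sketch (Python chosen only for brevity; `strip_cpm_eof` and `is_valid_utf8` are hypothetical helper names, not part of any Lua API). It assumes a ^Z (0x1A) byte always marks end-of-text, which holds for CP/M-padded files but not for arbitrary data. A strict UTF-8 decode can only show that bytes are *consistent* with UTF-8, not that they were written as UTF-8; pure ASCII passes for every ASCII-compatible codepage.

```python
CPM_EOF = b"\x1a"  # Ctrl-Z, the CP/M end-of-text marker

def strip_cpm_eof(data: bytes) -> bytes:
    """Drop everything from the first ^Z on (CP/M pads files to 128-byte records).

    Assumption: any 0x1A byte is an end-of-text marker, not real content.
    """
    idx = data.find(CPM_EOF)
    return data if idx == -1 else data[:idx]

def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8 (a plausibility check only)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

if __name__ == "__main__":
    # A 128-byte CP/M record: 7 bytes of text, padded with ^Z.
    record = b"HELLO\r\n" + CPM_EOF * (128 - 7)
    print(len(record), strip_cpm_eof(record))
    word = "Gr\u00fc\u00dfe"
    print(is_valid_utf8(word.encode("utf-8")), is_valid_utf8(word.encode("latin-1")))
```

Neither check tells you *which* 8-bit codepage a non-UTF-8 file uses; that still needs knowledge of where the file came from.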
There is not, and never will be, a fire-and-forget solution for reading 8-bit text streams.
I would like to see a Lua version that works with UTF-16. If someone wants to read 8-bit text, they can convert it to UTF-16, using their knowledge of the text's history. And please don't blame Lua for this 8-bit mess.
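The "convert it using your knowledge about the text" step can be sketched as below. This is a hedged Python illustration, not a Lua API: `to_utf16` is a hypothetical helper, and it assumes the caller already knows the correct codepage name. It shows why that knowledge matters: the same byte 0x84 is "ä" in CP437 but "„" in Windows-1252.

```python
def to_utf16(data: bytes, codepage: str) -> bytes:
    """Decode 8-bit bytes with a codepage the caller *knows*, re-encode as UTF-16LE.

    Assumption: `codepage` is correct; Python cannot verify that for you.
    """
    text = data.decode(codepage)     # raises UnicodeDecodeError on bytes undefined in this codepage
    return text.encode("utf-16-le")  # little-endian, no BOM ("utf-16" would prepend one)

if __name__ == "__main__":
    raw = b"\x84"
    print(to_utf16(raw, "cp437").hex())   # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
    print(to_utf16(raw, "cp1252").hex())  # U+201E DOUBLE LOW-9 QUOTATION MARK
```

The conversion itself is trivial; the hard part, as argued above, is choosing the codepage argument.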
Please open your own thread for your UTF-16 vote ;-) (Many consider UTF-8 superior, though: it offers more efficient storage for mostly-ASCII text, and UTF-16 needs surrogate pairs as well, so it is also a variable-length, multi-unit encoding.)
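To make the surrogate-pair and storage points concrete, here is a small Python sketch (the helper `utf16_surrogates` is hypothetical; the arithmetic follows the standard UTF-16 encoding of code points U+10000 to U+10FFFF). It also prints per-character byte counts, which show that UTF-8 is smaller only for ASCII-heavy text: the euro sign takes 3 bytes in UTF-8 but 2 in UTF-16, while characters outside the BMP take 4 bytes in both.

```python
def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points need a surrogate pair")
    v = cp - 0x10000                            # 20-bit value
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)  # high (lead), low (trail) surrogate

if __name__ == "__main__":
    hi, lo = utf16_surrogates(0x1F600)
    print(f"U+1F600 -> {hi:04X} {lo:04X}")
    # code point, UTF-8 bytes, UTF-16 bytes
    for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
        print(f"U+{ord(ch):04X}", len(ch.encode("utf-8")), len(ch.encode("utf-16-le")))
```

Either way, code that indexes "characters" by fixed offsets is wrong for both encodings.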