Hi Francisco,

I do know that. I always use binary mode with files and remove or add '\r' as needed. Example: a Windows user uploads a configuration file via HTTP onto a Linux-based device.
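
A minimal sketch of that binary-mode approach in Lua, assuming the whole file fits in memory and only "\r\n" and "\n" occur. The helper names (read_binary, to_unix, to_windows) are made up for illustration and are not part of any library.

```lua
-- Read a whole file in binary mode so the C library does no
-- end-of-line translation on any platform.
local function read_binary(path)
  local f = assert(io.open(path, "rb"))
  local data = f:read("*a")  -- "*a" works in Lua 5.1 through 5.4
  f:close()
  return data
end

-- Remove '\r': turn every "\r\n" pair into a bare "\n".
local function to_unix(data)
  return (data:gsub("\r\n", "\n"))
end

-- Add '\r': normalize first so an existing "\r\n" does not become "\r\r\n".
local function to_windows(data)
  return (to_unix(data):gsub("\n", "\r\n"))
end
```

For the upload example, the device would call to_unix(read_binary(path)) before parsing the configuration, and to_windows() before sending a file back to the Windows user.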

On 25.03.22 at 08:50, Francisco Olarte wrote:
Hi Olier:

Of course it will not. In Linux this is not a text file, so using
text-file functions, like C fgets or Lua read("*l"), will not work quite
right.


I have one provider who codes and recodes and cuts and pastes
mercilessly, so his files contain:
- Latin 1 chars ( win 1252 really ).
- UTF-8 sequences.
- "Bicoded" UTF-8 ( convert 1 Latin 1 char to 2 UTF-8 bytes, then treat
each byte as Latin 1 and reencode it in UTF-8 ).
- "Tricoded" UTF-8, two passes of the above.
- Optional BOM.
- Optional "bicoded" BOM ( so far no tricoded BOM ).
- Single \r, single \n, \r\n, \r\r\n and \n\r as line delimiters.
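
To make the "bicoded" and "tricoded" cases concrete, here is a hedged sketch of peeling the extra layers off again. It needs Lua 5.3 or later for the utf8 library. It assumes strict Latin 1 for the intermediate step (with real win 1252 the bytes 0x80-0x9F map to other code points and would need a lookup table), and the names undo_one_layer and fix_multicoded are invented for this example.

```lua
-- Undo one layer: decode UTF-8 and, if every code point fits in one byte,
-- emit those bytes. Returns nil when the input is not valid UTF-8 or holds
-- a code point above 0xFF, so it cannot be re-encoded Latin 1.
local function undo_one_layer(s)
  if not utf8.len(s) then return nil end  -- invalid UTF-8
  local bytes = {}
  for _, cp in utf8.codes(s) do
    if cp > 0xFF then return nil end
    bytes[#bytes + 1] = string.char(cp)
  end
  return table.concat(bytes)
end

-- Peel layers while the result is still valid UTF-8 and actually changes.
-- max_layers = 2 covers "tricoded" input.
local function fix_multicoded(s, max_layers)
  for _ = 1, max_layers or 2 do
    local prev = undo_one_layer(s)
    if not prev or prev == s or not utf8.len(prev) then break end
    s = prev
  end
  return s
end

-- "é" is C3 A9 in UTF-8; its bicoded form is C3 83 C2 A9.
assert(fix_multicoded("\xC3\x83\xC2\xA9") == "\xC3\xA9")
```

This is a heuristic: text that legitimately contains a sequence such as "Ã©" would be altered too. Since the encodings are mixed within one file, it would have to be applied per line or per run of non-ASCII bytes rather than to the file as a whole.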

All of this ( except the BOM, because there can only be one ) in a single
file. ( He seems to open files in different editors, key something in, and
save without checking the previous encoding. ) And all of it can be more
or less detected, compensated for, and translated to unix-utf8. \r is
the easy part, as at least he does not have embedded \r in lines. I
prefilter the files, and if you have to deal with lots of text it normally
pays to do it that way, so you know your files are text files and you
can use all the text-file oriented routines in your language of
choice.
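
A rough sketch of such a prefilter for the BOM and the line delimiters, in Lua 5.3 or later. It is only an illustration, not the actual filter: it assumes no line contains an embedded \r, tries the longest delimiter first at each break, and the names strip_bom and normalize_newlines are hypothetical.

```lua
local BOM = "\xEF\xBB\xBF"
-- The same three bytes, each read as Latin 1 and re-encoded as UTF-8.
local BICODED_BOM = "\xC3\xAF\xC2\xBB\xC2\xBF"

-- Drop a leading plain or bicoded BOM, if there is one.
local function strip_bom(data)
  for _, bom in ipairs({ BICODED_BOM, BOM }) do
    if data:sub(1, #bom) == bom then
      return data:sub(#bom + 1)
    end
  end
  return data
end

-- Delimiters tried longest-first at each line break.
local DELIMS = { "\r\r\n", "\r\n", "\n\r", "\r", "\n" }

-- Replace every delimiter with a single "\n" in one left-to-right pass.
local function normalize_newlines(data)
  local out, pos = {}, 1
  while true do
    local brk = data:find("[\r\n]", pos)
    if not brk then
      out[#out + 1] = data:sub(pos)  -- tail after the last delimiter
      break
    end
    out[#out + 1] = data:sub(pos, brk - 1)
    out[#out + 1] = "\n"
    for _, d in ipairs(DELIMS) do
      if data:sub(brk, brk + #d - 1) == d then
        pos = brk + #d  -- always advances: "\r" or "\n" matches at brk
        break
      end
    end
  end
  return table.concat(out)
end
```

Calling normalize_newlines(strip_bom(data)) on the raw bytes gives text that read("*l") or fgets can split correctly. Some inputs stay ambiguous (for example, "\r\r\n" could be one break or an empty line followed by a break); this sketch always picks the longest match.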


BOM, another can of worms.