Markus wrote:
> How can I match all possible EndOfLines in strfind() with a Lua pattern?
> EndOfLine is \n\r
> or \r\n
> or \n
> or \r
> but \n\n are two EndOfLines with an empty string between
> also \r\r
> Since one hour I try to define a pattern that matches all possible
> EndOfLine definitions.

Except in badly formed text files, or in cases I am not aware of (I
would like to hear of any), I doubt you will meet the \n\r combination. \r\n is
used in the DOS/Windows world, \n in the Unix-like world, and \r in the
Macintosh/Apple world, but I know of no system that uses \n\r.

Gavin stated that it is quite rare to see mixed EOLs in a text file.
That is true, or rather, it shouldn't occur.
But it is possible. I have seen source files on a network that could be
accessed from Unix or Windows boxes, where some editors were able to handle the
foreign EOL (displaying it correctly) but still inserted the EOL of their own
system when the user hit Return...
SciTE, among others, automatically detects the EOL used in a file and sets
its EOL mode accordingly, but the user can choose to switch mode without
converting the whole file...
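
To check whether a given file really mixes styles, one could count each kind
of EOL. A minimal sketch, assuming Lua 5.1 or later (string.gsub rather than
the global gsub); count_eols is a hypothetical name, not part of any library:

```lua
-- Hypothetical helper: count each EOL style found in text.
local function count_eols(text)
	-- Count the \r\n pairs and strip them, so their halves are not recounted.
	local rest, crlf = string.gsub(text, "\r\n", "")
	-- Whatever \r or \n is left stands alone.
	local _, cr = string.gsub(rest, "\r", "")
	local _, lf = string.gsub(rest, "\n", "")
	return { crlf = crlf, cr = cr, lf = lf }
end
```

A string is mixed when more than one of the three counts is non-zero.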

Anyway, I won't answer your question directly, as others did it more
precisely than I could.
But here is my own function, which tries to detect the EOL mode of a given
string (which could be the whole content of a file...). Despite what I wrote
above, for ease of processing, I don't handle mixed EOLs...

-- Get the end of line used in the given string and return it.
-- Check only the first one, as we assume the string is consistent.
function GetEol(text)
	local b, _, eol1, eol2, eol
	-- Find the first EOL character, whichever it is
	b, _, eol1 = strfind(text, "([\r\n])")
	if b == nil then
		return nil	-- no EOL in this string
	end
	-- Care is taken in case the first line finishes with two EOLs
	eol2 = strsub(text, b+1, b+1)
	if eol1 == '\r' then
		if eol2 == '\n' then
			-- Windows style
			eol = '\r\n'
		else
			-- Mac style
			eol = '\r'
		end
	else -- eol1 == '\n'
		-- Unix style
		eol = '\n'
	end
	return eol
end
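
The same detection can be sketched for Lua 5.1 or later, where the global
strfind and strsub became string.find and string.sub. The names detect_eol and
split_lines are hypothetical, and the splitter assumes, like the function
above, that the string uses one EOL style throughout:

```lua
-- Hypothetical Lua 5.1+ port of the detection logic above.
local function detect_eol(text)
	local pos, _, first = string.find(text, "([\r\n])")
	if pos == nil then
		return nil -- no EOL in this string
	end
	if first == "\r" then
		if string.sub(text, pos + 1, pos + 1) == "\n" then
			return "\r\n" -- Windows style
		end
		return "\r" -- Mac style
	end
	return "\n" -- Unix style
end

-- Split text into lines on the detected EOL.
local function split_lines(text)
	local lines = {}
	local eol = detect_eol(text)
	if eol == nil then
		lines[1] = text -- a single line without EOL
		return lines
	end
	local start = 1
	while true do
		-- Fourth argument true: plain search, no pattern matching.
		local s, e = string.find(text, eol, start, true)
		if s == nil then
			break
		end
		lines[#lines + 1] = string.sub(text, start, s - 1)
		start = e + 1
	end
	-- Whatever follows the last EOL (possibly an empty string).
	lines[#lines + 1] = string.sub(text, start)
	return lines
end
```

Two consecutive EOLs yield an empty string between them, which matches the
behaviour asked for in the original question.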

Note: if preserving the content/format of the file isn't critical, and if it
is small enough to fit in memory, you can normalize everything to \n: first
replace every \r\n pair by \n, then replace all remaining \r (the lone,
Mac-style ones) by \n.
Something like (untested) gsub(file, "\r\n", "\n") followed by gsub(file,
"\r", "\n"). A stray \n\r would come out as two line ends.
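
A minimal sketch of such a normalization for Lua 5.1 or later (string.gsub
instead of the global gsub); normalize_eol is a hypothetical name. It collapses
the \r\n pairs first and then converts whatever \r remains, so no neighbouring
character is consumed:

```lua
-- Hypothetical helper: convert every EOL in text to "\n".
local function normalize_eol(text)
	-- Collapse Windows pairs first so their "\r" is not converted twice.
	text = string.gsub(text, "\r\n", "\n")
	-- Any "\r" still present is a lone, Mac-style EOL.
	text = string.gsub(text, "\r", "\n")
	return text
end
```

The order of the two calls matters: converting the lone \r first would turn
each \r\n into two line ends.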


Philippe Lhoste (Paris -- France)
Professional programmer and amateur artist
