As you say, in general you want to process the longest keys first. The simplest way to do this is to build an auxiliary array whose values are the keys of your table, and then sort that array by string length, longest first. Note that a plain table.sort(tSubsArray) orders strings lexicographically, not by length, so pass a comparison function. Then iterate over THIS array when doing the substitution: it gives you each key, and the value is a simple lookup in the original table.

For example:

tSubs = { …. }	-- Fill in the strings table as you currently have it
tSubsArray = {}
for k in pairs(tSubs) do
	table.insert(tSubsArray, k)
end
table.sort(tSubsArray, function(a, b) return #a > #b end)	-- Sorts longest strings first

function Replace_Substrings_in_String(s, tSubs, tSubsArray)
	for ix = 1, #tSubsArray do	-- Array is already ordered longest key first
		local k = tSubsArray[ix]
		local v = tSubs[k]
		-- Now you have k and v; the rest is the same
	end
	return s
end
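
One thing to watch for when filling in the loop body: string.gsub treats the key as a Lua pattern, and "-" is a magic character in patterns, so keys such as "--" and "---" are not matched literally unless they are escaped. Below is a minimal, self-contained sketch of the whole approach under that assumption. The names escape_pattern, escape_replacement, keys_longest_first and Replace_Substrings_Longest_First are made up for illustration and are not part of your script.

```lua
-- Escape Lua pattern magic characters so a key is matched as plain text.
local function escape_pattern(text)
	return (text:gsub("[%^%$%(%)%%%.%[%]%*%+%-%?]", "%%%0"))
end

-- Escape "%" in the replacement so string.gsub inserts it literally.
local function escape_replacement(text)
	return (text:gsub("%%", "%%%%"))
end

-- Collect the keys of tSubs into an array, longest first.
-- Ties are broken alphabetically so the order is deterministic.
local function keys_longest_first(tSubs)
	local keys = {}
	for k in pairs(tSubs) do
		keys[#keys + 1] = k
	end
	table.sort(keys, function(a, b)
		if #a ~= #b then return #a > #b end
		return a < b
	end)
	return keys
end

-- Hypothetical name: apply every substitution, longest key first.
function Replace_Substrings_Longest_First(s, tSubs)
	for _, k in ipairs(keys_longest_first(tSubs)) do
		s = s:gsub(escape_pattern(k), escape_replacement(tSubs[k]))
	end
	return s
end
```

This rebuilds and sorts the key array on every call, which keeps the sketch short. If the table only changes when the user edits it in the GUI, you would probably build the sorted array once there and pass it in, as in the version above.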


On May 20, 2013, at 7:11 PM, marbux <marbux@gmail.com> wrote:

> Hi, All,
> 
> I'm working on an autoreplace script and am hoping for a tip that
> might get me past a problem in unpredictability of key names. (Users
> enter key/value pairs in a GUI to build a table of strings that will
> replace other strings.)
> 
> Consider:
> 
> function Replace_Substrings_in_String(s, tSubs)
>  for k, v in pairs(tSubs) do
>    s = string.gsub(s, k, v)
>  end
>  return s
> end -- function
> 
> tSubs = {
>    ["---"] = "—", -- em dash
>    ["--"] = "–",   -- en dash
>    ["sss"] = "§", -- section
>    ["ssss"] = "§§", --- sections
>    ["ppp"] = "¶", -- paragraph
>    ["pppp"] = "¶¶", -- paragraphs
> }
> 
> s = Replace_Substrings_in_String(s, tSubs)
> 
> (Special characters are UTF-8.)
> 
> Because the order in which Lua returns non-array keys is
> unpredictable, this type of substitution is problematic. For example,
> if Lua returns the key for the en dash (two hyphens) before the key
> for the em dash (three hyphens), the script will produce instead of an
> em dash an en dash trailed by a hyphen.
> 
> I'm particularly concerned with this problem because such
> abbreviations are used nearly universally by power users of word
> processors and thus the likelihood is high that my users will create
> them.
> 
> So my question is how I can assure that when multiple abbreviations
> share the same leading sequence of identical characters, the keys are
> processed in longest to shortest order?  (I don't anticipate any
> problems if all keys were processed in longest to shortest order.)
> 
> Thanks in advance,
> 
> Paul
>