- Subject: Re: Feature request: plain option for gsub
- From: Jonas Thiem <jonasthiem@...>
- Date: Thu, 21 Aug 2014 16:54:03 +0200
Also, is it a good idea to depend on a "sane locale" for something that
is possibly security relevant? (This is not a rhetorical question: I am not sure
what a "not sane" locale is, or how easily an attacker could tamper
with it.)
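One way to turn the "sane locale" assumption into something checkable is to test it once at startup. This is a minimal sketch, assuming the %p escape approach only needs %p to cover Lua's magic pattern characters (^ $ ( ) % . [ ] * + - ?, as listed in the Lua reference manual); the names MAGIC and punct_covers_magic are hypothetical:

```lua
-- Lua's "magic" pattern characters: ^ $ ( ) % . [ ] * + - ?
local MAGIC = "^$()%.[]*+-?"

-- Returns true if the class %p matches every magic character under the
-- current ctype locale; otherwise returns false and the first offender.
local function punct_covers_magic()
  for i = 1, #MAGIC do
    local c = MAGIC:sub(i, i)
    if not c:find("%p") then
      return false, c
    end
  end
  return true
end

local ok, missing = punct_covers_magic()
if not ok then
  -- os.setlocale(nil, "ctype") only queries the locale, it changes nothing
  error("%p does not match '" .. missing .. "' in ctype locale "
        .. tostring(os.setlocale(nil, "ctype")))
end
```

This only detects a broken setup; it does not say who changed the locale. A program that never calls os.setlocale stays in the "C" locale, where the C standard guarantees the check passes.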
Because I lack knowledge of these things, I wrote this lengthy
function, which is supposed to behave like gsub with plain=true:
string.replace = function(base, search, replace)
  --[[-- Replace all occurrences of the search string in the given base
  string with the replace string.
  Unlike string.gsub, no Lua pattern processing takes place: the search
  string is matched exactly, character by character, as written. ]]
  if #search == 0 then
    return base
  end
  local startindex = 1
  while true do
    -- 4th argument true = plain find, no pattern interpretation
    local index = base:find(search, startindex, true)
    if index == nil then
      -- no further occurrences left
      return base
    end
    -- everything before the match (sub(1, 0) is ""), then the replacement
    local result = base:sub(1, index - 1) .. replace
    -- check if there is something left after the matched piece
    if index + #search - 1 < #base then
      -- resume searching right AFTER the inserted replacement; starting at
      -- #result (its last character) could rescan the replacement and loop
      -- forever, e.g. for replace("ab", "a", "aa")
      startindex = #result + 1
      base = result .. base:sub(index + #search)
    else
      -- the match reached the end of the string
      return result
    end
  end
end
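The function above rebuilds the whole string after every match, which is quadratic for many matches. A minimal sketch of the same plain replacement that collects pieces in a table and joins them once with table.concat (plain_replace is a hypothetical name, not part of Lua's string library):

```lua
-- Plain (non-pattern) replace-all; linear in the length of the input.
local function plain_replace(base, search, replacement)
  if #search == 0 then
    return base  -- same convention as above: nothing to search for
  end
  local pieces = {}
  local start = 1
  while true do
    -- plain find (4th argument true): no pattern interpretation at all
    local i, j = base:find(search, start, true)
    if not i then break end
    pieces[#pieces + 1] = base:sub(start, i - 1)  -- text before the match
    pieces[#pieces + 1] = replacement
    start = j + 1  -- continue after the match; replacement is never rescanned
  end
  pieces[#pieces + 1] = base:sub(start)  -- tail after the last match
  return table.concat(pieces)
end

print(plain_replace("1+1=2", "+", "%1"))
```

Because the search always resumes in the original string, the replacement text can never be matched again, which rules out the infinite-loop case by construction.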
I would rather not have done that, but it felt safer than not being
sure whether everything was always escaped properly in every case.
If gsub had a plain option like find does, I would simply have used that instead.
About the %p pattern... I still cannot judge whether it is safe under all
circumstances. If it is, that would be a great solution too
(and certainly a shorter one)!
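For comparison, a minimal sketch of the escaping approach that does not rely on %p or the locale at all: the set of magic characters is spelled out explicitly, and "%" is also doubled in the replacement string, since gsub treats "%" specially there too. The helper names escape_pattern, escape_replacement and gsub_plain are hypothetical:

```lua
-- Prefix every Lua magic character with "%" so the pattern matches `s`
-- literally. Explicit class instead of %p: independent of ispunct/locale.
local function escape_pattern(s)
  return (s:gsub("[%^%$%(%)%%%.%[%]%*%+%-%?]", "%%%0"))
end

-- In a gsub replacement string only "%" is special; double it.
local function escape_replacement(s)
  return (s:gsub("%%", "%%%%"))
end

-- Like gsub with a hypothetical plain=true: returns string and count.
local function gsub_plain(s, search, replacement)
  if #search == 0 then
    return s, 0  -- an empty pattern would match at every position
  end
  return s:gsub(escape_pattern(search), escape_replacement(replacement))
end

print(gsub_plain("1+1=2", "+", "%1"))
```

Forgetting escape_replacement is an easy mistake: with the raw replacement "%1", gsub would insert the first capture (here the whole match) instead of the literal text.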
Regards,
Jonas Thiem
On Thu, Aug 21, 2014 at 4:23 PM, Jonas Thiem <jonasthiem@googlemail.com> wrote:
> Hm I see. Still, wouldn't a plain option be more consistent with find
> and easier to use for beginners?
>
> On Thu, Aug 21, 2014 at 4:21 PM, Roberto Ierusalimschy
> <roberto@inf.puc-rio.br> wrote:
>>> I think this already demonstrates my point. Coming up with a regex
>>> that is safe and escapes everything is not trivial.
>>>
>>> [...]
>>> >
>>> > On my system, '%p' does not match '[+$^]', so '%p' should become '[%p+$^]'.
>>
>> This seems like a bug in his system (or else he is using some weird
>> locale...). '%p' corresponds to 'ispunct', and the C standard says this:
>>
>> In an implementation that uses the seven-bit US ASCII character set, the
>> printing characters are those whose values lie from 0x20 (space) through
>> 0x7E (tilde);
>>
>> [...]
>>
>> In the "C" locale, ispunct returns true for every printing character for
>> which neither isspace nor isalnum is true.
>>
>> So, '[+$^]' must all be punctuation characters (and therefore match '%p').
>>
>> If you assume a correct libC and a sane locale, '%p' is all you need.
>>
>> -- Roberto
>>