lua-l archive



Ideally the pattern should be anchored at the start with '^' and at the end with '$' to avoid matching in the middle. For example, in "https://example.org/some.md/something.md#section" the first ".md" is not the part that should be rewritten. The same applies to the link anchor, which always starts at the first occurrence of '#'. If a parent folder name in the path ends in ".md", I don't think the transform is valid: you would rewrite the parent folder rather than the extension at the end of the path.

function Link (link)
  -- Anchored at both ends: only targets of the form "<path>.md#<anchor>" are rewritten.
  link.target = link.target:gsub('^(.+)%.md%#(.+)$', '%1.html#%2')
  return link
end

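With '^' and '$' added, the first and second patterns given are no longer equivalent: the anchored first pattern matches only targets that end in ".md", and the anchored second one matches only targets that carry a link anchor. The link anchor (including the '#') therefore still has to be optional. Note that Lua patterns accept a '?' quantifier only after a single character class, not after a capturing group, so the optional part cannot be written as '(#.+)?' and needs another approach, such as splitting the target at the first '#' before rewriting.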
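One way to combine both filters is sketched below, assuming the goal is to rewrite only a trailing ".md" in the path and to leave any "#anchor" untouched. The helper name md_to_html is made up for illustration and is not part of pandoc or of the original scripts.

```lua
-- Hypothetical helper (not from the thread): rewrite a final ".md" in the
-- path part of a link target and keep any "#anchor" part unchanged.
local function md_to_html (target)
  -- Split at the first '#': path contains no '#'; fragment is either
  -- empty or starts with '#'.
  local path, fragment = target:match('^([^#]*)(.*)$')
  -- Anchored at the end, so only a final ".md" extension is replaced.
  path = path:gsub('%.md$', '.html')
  return path .. fragment
end

-- Pandoc Lua filter entry point, as in the original scripts.
function Link (link)
  link.target = md_to_html(link.target)
  return link
end
```

With this sketch, "page.md" becomes "page.html", "page.md#section" becomes "page.html#section", and a target such as "docs.md/readme.txt" is left alone because its ".md" is not at the end of the path.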

In Lua, it is generally best to anchor patterns, especially for technical things that may have security implications. Anchoring at the start with '^' also avoids false positives and is faster when the text does not match: the pattern is tried only at the first character, so a failed match is reported immediately instead of being retried at every later position. Anchoring at the end with '$' prevents a premature match when you want to be sure the pattern covers the full text. That case can be slower, because the matcher has to reach the end of the text. Still, Lua patterns are much simpler than full regular expressions: there is no alternation, quantifiers such as '?', '+' and '*' apply only to a single character class, and matching works directly on the subject string without a separate buffer. A greedy item such as '.+' is first extended as far as possible and then shortened one character at a time until the rest of the pattern matches.
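The effect of both anchors can be checked with a few lines of plain Lua. This is only a sketch: the two function names and the example URL are invented for illustration.

```lua
-- Hypothetical helpers for illustration only.
-- The extra parentheses drop gsub's second result (the substitution count).
function rewrite_unanchored (target)
  return (target:gsub('%.md', '.html'))
end

function rewrite_anchored (target)
  return (target:gsub('%.md$', '.html'))
end

local url = 'https://example.org/some.md/other.txt'

-- Unanchored: matches in the middle, so the folder name is rewritten.
print(rewrite_unanchored(url))  -- https://example.org/some.html/other.txt

-- Anchored at the end: the target does not end in ".md", so nothing changes.
print(rewrite_anchored(url))    -- https://example.org/some.md/other.txt

-- '^' restricts the attempt to position 1; without it, later positions are tried.
print(('xxabc'):find('abc'))    -- 3       5
print(('xxabc'):find('^abc'))   -- nil
```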


On Thu, 21 Jan 2021 at 18:06, Albert Krewinkel <albert+lua@zeitkraut.de> wrote:
Hello Peter,

Peter Matulis <pmatulis@gmail.com> writes:

> Hi, I am very new to Lua

Welcome :)

> but I do have two tiny Pandoc Lua filters that
> work. They are extremely similar and I would like to combine them. I'm just
> not sure how to introduce the conditional aspect.
>
> 1. This script simply exchanges the '.md' file extension for the '.html'
> extension:
>
>
> function Link (link)
>   link.target = link.target:gsub('(.+)%.md', '%1.html')
>   return link
> end
>
>
> 2. This script does the same except it handles the case where the link
> contains an anchor:
>
>
> function Link (link)
>   link.target = link.target:gsub('(.+)%.md%#(.+)', '%1.html#%2')
>   return link
> end

As far as I can tell, the first version should also work for the second
use case. A pattern may match anywhere in the string (unless "start of
string" or "end of string" are explicitly matched in the pattern).
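A quick check of this in plain Lua, as a sketch; the helper name apply_pattern is made up for illustration, and the patterns are the two from the scripts quoted above.

```lua
-- Hypothetical helper: apply a pattern/replacement pair and return only
-- the resulting string (gsub's second result, the count, is dropped).
function apply_pattern (target, pattern, replacement)
  return (target:gsub(pattern, replacement))
end

local first  = '(.+)%.md'         -- pattern from script 1
local second = '(.+)%.md%#(.+)'   -- pattern from script 2

-- The unanchored first pattern matches inside "page.md#section" and leaves
-- the trailing "#section" in place, so it covers the second use case too.
print(apply_pattern('page.md#section', first, '%1.html'))      -- page.html#section
print(apply_pattern('page.md#section', second, '%1.html#%2'))  -- page.html#section

-- The second pattern requires a '#', so it does not cover plain ".md" links.
print(apply_pattern('page.md', second, '%1.html#%2'))          -- page.md
```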

BTW, we are also happy to answer pandoc-related issues over at the
pandoc-discuss mailing list:
https://groups.google.com/forum/#!forum/pandoc-discuss

Cheers,
Albert


--
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124