> On 25. Dec 2019, at 22:44, Philippe Verdy <verdy_p@wanadoo.fr> wrote:
> 
> It matches too many things […]

When dealing with _valid HTML_, any string that starts with 'http://', ends with '.mp3', and contains no spaces is, heuristically, almost certainly a URL pointing at (something that claims to be) an MP3. (The other pattern works, too.) [So a somewhat better pattern than the one I initially suggested would be "http://%S+%.mp3", which also excludes line breaks.]
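
A minimal sketch of how that pattern could be applied with string.gmatch. The sample HTML and the helper name `extract_mp3_links` are made up for illustration and are not from the thread:

```lua
-- Hypothetical sample input; real input would be the HTML from the other party.
local html = [[
<a href="http://example.com/audio/one.mp3">One</a>
<a href="http://example.com/page.html">Not audio</a>
<a href="http://example.com/audio/two.mp3">Two</a>
]]

-- Collect every run of non-whitespace characters that starts with
-- "http://" and ends with ".mp3". (Illustrative helper, not a library function.)
local function extract_mp3_links(text)
  local links = {}
  for url in text:gmatch("http://%S+%.mp3") do
    links[#links + 1] = url
  end
  return links
end

local links = extract_mp3_links(html)
for _, url in ipairs(links) do
  print(url)
end
```

Note that %S+ is greedy: if two such URLs sit in the same whitespace-free run (say, the link text repeats the URL), they come out as one match. The lazy form "http://%S-%.mp3" would stop at the first ".mp3" instead, which may or may not be what you want.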

When you're not dealing with random / adversarial strings, that is good enough and you don't have to care about all those intricacies. From what I gathered, the goal is one-off semi-manual extraction of links from HTML generated by some other party, so even potential errors don't really matter… (The human in the loop can notice / fix things.)

-- nobody