Hello,

I am writing a small natural language processing module in Lua and was considering using just gsub() for my pattern-matching needs (i.e., instead of wrapping regex functions in C), but I wasn't able to express an 'or' in gsub() patterns (like the | character in regex).
In particular, I have code like:

local word_table = { n = 0 }
gsub(text, pattern, function (word)
                         tinsert(%word_table, word)
                       end)

for separating tokens, but I couldn't find a way to split strings like "Hello, world!" into ["Hello", ",", "world", "!"]. With regex I would use a pattern like "\w+|[^\w\s]+" for this.
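To make the problem concrete, here is a minimal sketch of what each half of that alternation gives on its own. It assumes Lua 5.1-style syntax (string.gmatch and the # operator) rather than the older global gsub()/tinsert() used above, and the helper name collect is hypothetical. Each pattern works separately, but there is no | to merge them into a single pass that keeps the tokens in order.

```lua
-- Hypothetical helper: collect every match of `pattern` in `text` into a list.
-- Assumes Lua 5.1 or later (string.gmatch, # length operator).
local function collect(text, pattern)
  local out = {}
  for token in string.gmatch(text, pattern) do
    out[#out + 1] = token
  end
  return out
end

local text = "Hello, world!"

-- Lua pattern equivalent of \w+ : runs of alphanumeric characters.
local words = collect(text, "%w+")

-- Lua pattern equivalent of [^\w\s]+ : runs that are neither alphanumeric nor whitespace.
local punct = collect(text, "[^%w%s]+")

print(table.concat(words, " | "))  -- Hello | world
print(table.concat(punct, " | "))  -- , | !
```

The two lists come out separately, so the original order of words and punctuation is lost, which is exactly what the 'or' would preserve.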
Any suggestions on how to implement this kind of alternation with gsub()?

Tiago Tresoldi