lua-l archive



On Thu, Apr 17, 2014 at 1:32 AM, Dirk Laurie <dirk.laurie@gmail.com> wrote:
> This is not so easy to achieve in "pure" Lua. My poor attempt to code it
> myself has been described on this list as bogus, wrong, etc. I wish
> someone with more knowledge than me had done it, someone who
> not only can throw around words like normalization and glyph, but knows
> exactly what they mean.

I don't claim to have that much knowledge, but you may find the ustring library I wrote for MediaWiki's Scribunto extension interesting:

https://git.wikimedia.org/tree/mediawiki%2Fextensions%2FScribunto.git/master/engines%2FLuaCommon%2Flualib%2Fustring

It has normalization, and also pattern matching that treats Lua's character classes as the corresponding Unicode general categories (e.g. %a = "Letter", %d = "Decimal_Number"). Contributions are welcome, but note that it has to remain compatible with Lua 5.1.
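
For anyone unsure what those two features amount to, here is a minimal Python sketch using the standard `unicodedata` module. It is not the ustring code (which is Lua), just an illustration of the ideas. Only the %a = Letter and %d = Decimal_Number pairs come from the description above; the other class mappings, the choice of NFC, and the helper names `same_text` and `matches_class` are hypothetical.

```python
import unicodedata

def same_text(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization (NFC is an assumed choice).

    Precomposed "é" (U+00E9) and "e" + combining acute (U+0301) are different
    code point sequences but the same text once normalized.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Lua class letter -> test on the Unicode general category.
# %a -> Letter and %d -> Decimal_Number follow the post; the rest are assumptions.
CLASS_CATEGORIES = {
    "a": lambda cat: cat.startswith("L"),                 # Letter (any L*)
    "d": lambda cat: cat == "Nd",                         # Decimal_Number
    "l": lambda cat: cat == "Ll",                         # Lowercase_Letter (assumed)
    "u": lambda cat: cat == "Lu",                         # Uppercase_Letter (assumed)
    "p": lambda cat: cat.startswith("P"),                 # Punctuation (assumed)
    "w": lambda cat: cat.startswith("L") or cat == "Nd",  # %a or %d (assumed)
}

def matches_class(ch: str, cls: str) -> bool:
    """Return True if the single character ch belongs to Lua class %cls.

    As in Lua, an uppercase class letter (e.g. "A") means the complement.
    """
    test = CLASS_CATEGORIES[cls.lower()]
    result = test(unicodedata.category(ch))
    return not result if cls.isupper() else result
```

With this, `matches_class("\u0663", "d")` is true for ARABIC-INDIC DIGIT THREE, whereas a byte-oriented `%d` in plain Lua 5.1 only knows the ASCII digits.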


There's also some PHP code elsewhere in that source tree to convert a Lua (5.1) pattern into a PCRE regex, which people already using a PCRE module might find helpful.
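
To give a rough idea of what such a translation involves, here is a small Python sketch. It is not a port of the Scribunto PHP code. It targets Python's `re` module rather than PCRE; the output is also valid PCRE, except that PCRE spells the strict end-of-subject anchor `\z` rather than `\Z`. `lua_to_regex` is a hypothetical name, classes use ASCII (C locale) semantics, and only a subset of Lua 5.1 syntax is handled: sets, `%b`, `%f`, back-references and position captures raise `NotImplementedError`.

```python
import re

# ASCII (C locale) contents of a subset of Lua 5.1 classes.
_CLASSES = {
    "a": "A-Za-z",
    "d": "0-9",
    "l": "a-z",
    "u": "A-Z",
    "s": r" \t\n\r\f\x0b",  # space, tab, newline, CR, form feed, vertical tab
    "w": "A-Za-z0-9",
    "x": "0-9A-Fa-f",
}

def lua_to_regex(pat: str) -> str:
    """Translate a subset of Lua 5.1 pattern syntax into Python `re` syntax."""
    out = []
    i, n = 0, len(pat)
    if pat.startswith("^"):          # '^' anchors only at the very start in Lua
        out.append(r"\A")
        i = 1
    can_quantify = False             # may the next * + ? - act as a quantifier?
    while i < n:
        c = pat[i]
        if c == "$" and i == n - 1:  # '$' anchors only at the very end in Lua
            out.append(r"\Z")
            can_quantify = False
        elif c in "*+?-" and can_quantify:
            out.append("*?" if c == "-" else c)  # Lua '-' is a lazy '*'
            can_quantify = False
        elif c == ".":
            out.append(r"[\s\S]")    # Lua '.' also matches newline
            can_quantify = True
        elif c == "%":
            if i + 1 >= n:
                raise ValueError("malformed pattern (ends with '%')")
            i += 1
            k = pat[i]
            if k.lower() in _CLASSES:
                neg = "^" if k.isupper() else ""  # %A, %D, ... are complements
                out.append("[" + neg + _CLASSES[k.lower()] + "]")
            elif k.isalnum():
                raise NotImplementedError("%" + k)
            else:
                out.append(re.escape(k))  # %. %% %( etc. are literals
            can_quantify = True
        elif c == "[":
            raise NotImplementedError("character sets")
        elif c == "(":
            if pat.startswith("()", i):
                raise NotImplementedError("position captures")
            out.append("(")
            can_quantify = False
        elif c == ")":
            out.append(")")
            can_quantify = False
        else:
            # Anything else (including a stray quantifier, or ^ and $ in the
            # middle of the pattern) is a literal character in Lua.
            out.append(re.escape(c))
            can_quantify = True
        i += 1
    return "".join(out)
```

For example, the Lua pattern `<.->` becomes `<[\s\S]*?>`, and `(a)*` becomes `(a)\*`, because Lua 5.1 does not allow a quantifier after a capture.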

https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FScribunto.git/b6b66fb9b569cae8eb18fadcf5b683a9713a7431/engines%2FLuaCommon%2FUstringLibrary.php#L240


--
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation