Hello Lua users,

I have been using Lua for a few years now and thought I knew its regular expressions pretty well, but this one caught me out:

> print (string.match ("luaXusers", "lua[.]users"))

That prints nil. I was trying to match any character between "lua" and "users". Now I know this works:

> print (string.match ("luaXusers", "lua.users"))

However, I put the dot inside the brackets to make it more obvious to the reader that I was matching any character, and not just a literal dot (as in lua.users), which is what a casual read of the regular expression might suggest.
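
The difference between the two patterns can be checked side by side. This is a minimal sketch, assuming a stock Lua 5.1 or later interpreter; the variable names are illustrative only:

```lua
-- Assumes the standard string library of stock Lua 5.1 or later.
local subject = "luaXusers"

-- Bracketed dot: the "X" in the subject is not matched.
local bracketed = string.match(subject, "lua[.]users")

-- Bare dot: matches any single character, including "X".
local bare = string.match(subject, "lua.users")

print(bracketed)  -- nil
print(bare)       -- luaXusers
```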

Referring to the documentation in Programming In Lua (2nd edition), I see this (page 180):

The following table lists all character classes:

.  all characters
%a letters
%c control characters

... and so on ...

Thus, the character "." is defined as a "character class".

Moving on to page 181, the book says:

"A char-set allows you to create your own character classes, combining different character classes and single characters between square brackets".

Thus the regular expression "[.]" should match any single character. It consists of a char-set, and inside the char-set is a character class, namely ".". If you want to match a period, you should really use this: "[%.]".

After all, the documentation states that a "." is a "magic character" and should be escaped with a "%" in order to have its natural meaning.

I acknowledge that changing Lua to do this may break a whole heap of regular expressions currently in use, but perhaps the documentation could be clarified to state that a period inside a char-set is "itself" and not "all characters".
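
The actual behaviour can be confirmed with a short check. This is a sketch, assuming a stock Lua 5.1 or later interpreter: a period inside a char-set matches only a literal period, with or without the "%" escape, while "%"-prefixed classes such as %a keep their class meaning inside the brackets.

```lua
-- Assumes the standard string library of stock Lua 5.1 or later.

-- Inside a char-set, "." stands for itself rather than "all characters".
print(string.match("lua.users", "lua[.]users"))   -- lua.users
print(string.match("luaXusers", "lua[.]users"))   -- nil

-- Escaping the period inside the char-set gives the same result.
print(string.match("lua.users", "lua[%.]users"))  -- lua.users
print(string.match("luaXusers", "lua[%.]users"))  -- nil

-- "%"-prefixed classes such as %a do keep their meaning inside a char-set.
print(string.match("luaXusers", "lua[%a]users"))  -- luaXusers
print(string.match("lua.users", "lua[%a]users"))  -- nil
```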

- Nick Gammon