lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]



On 21-Dec-05, at 6:43 PM, PA wrote:

Hello,

I'm trying to implement a "natural order" comparator as described here:

http://www.naturalordersort.org/
http://sourcefrog.net/projects/natsort/

Something along these lines:

local aList = { "rfc1000.txt", "rfc2086.txt", "rfc822.txt" }

table.sort( aList, compare )

The main reason the code you present is going to be slow is that you split up every string on every comparison (both strings in fact). Quicksort (the algorithm used by table.sort) does quite a few comparisons, so each string is going to be split up quite a few times. You'd do better by doing it only once, and then sorting an array of indices into the string/canonicalised string array.

In particular, is there a way the construct the regular expression "(%d*%.?%d*)(%D*)" so it returns only one capture, either one made only of digits or characters?

Not easily. However, except for cases where you have version-number stuff like this:

  lua-5.0.2.tar.gz

you're going to have strict alternation between strings and numbers. So it shouldn't be a problem that you capture a number/string pair. (You would need to have two cases, one for strings starting with a digit and one for strings not starting with a digit. You can then normalize all the vectors so that they start with a text string, by inserting "" at the beginning of the translation of a string which starts with a digit.)

I think dotted numbers are a difficult case. Should 5.9 come before or after 5.10 ? What about 192.168.0.9 and 192.168.0.10 ? Or 2005.12.9 and 2005.12.10 ? I don't have a solution, though. If you're going to take the '.' as always starting a decimal fraction, you can treat .%d* as a text string, whereas if you take the '.' as always being a separator you can treat '.' as a text string.

In any event,