lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


2013/5/10 Dirk Laurie <dirk.laurie@gmail.com>:

> I have simply used your strategy with my C merge subroutine, and
> it's a little slower than my implementation of Fisher's strategy.
> That is to be expected, since that routine does not call anything
> smaller than 12. I'll experiment with doing say 8-element or
> 16-element bins with insert sort.

There are two possible strategies:

1. Top-down, as Fisher does, bisecting each time, with recursion.
2. Bottom-up, as you do, doubling each time, with iteration.

For each, there is an minimum chunk size below which one
should use a non-merge routine.

There is also the question of merge shortcuts.

A. No shortcut, as Fisher does.
B. Test whether the two halves are already sorted, as Decker does.
   This can be implemented directly at the level of the merge
   routine (which is in C).

That gives us four possibilities, with minimum chunk size fixed
a priori.  Here are the results (chunk size 12 for Strategy 1,
16 for Strategy 2 — but there is no reason why it must be a power
of 2, actually).

== test1: 1000000 duples with 20 distinct values
== test2: 1000000 duples with approximately 20 equals of every value
== test3: 1000000 random duples
== test4: 1000000 cut deck
== test5: 1000000 sorted elements with 1000 random swaps

                   Time in seconds, negative time means sort was unstable

                                  #1     #2     #3     #4     #5    Total
                    table.sort  -6.53  -6.43   6.71   2.60   1.43   23.70
         power-of-2 merge sort   7.66   4.59   7.18   1.18   1.48   22.09
    ditto, with shortcut check   7.46   4.22   7.31   0.28   1.19   20.46
                    stablesort   6.71   4.70   6.75   1.25   1.60   21.01
    ditto, with shortcut check   6.62   4.01   6.75   0.41   1.25   19.04

It must be stressed that most of the reason for the much faster times
in $4 and #5 is that the sort criterion is just x<y, done at the C
level with no function call.

These are five good sort routines, with not much to choose in terms
of speed. The question is which virtue one prefers, negligible storage
overheads (table.sort) or stability (the others).

Strategy B wins eight times out of ten, with one tie. The merge routines
win when there are very many equals (that is a well-known hard case for
two-way quicksort, which can be cured by using a three-way variation).

The attached zipfile contains the code.

There is still more tuning possible, e.g. size-dependent minimum
chunk length a la Timsort, or implementing the supervising routine
in C too.

Attachment: stablesort-1.1.zip
Description: Zip archive