lhf wrote:

I've written a version of ldump.c that saves only a single
instance of any string (of course, I've also written the
corresponding lundump.c):
http://www.tecgraf.puc-rio.br/~lhf/tmp/dump.tar.gz

I've been dealing with some automatically generated data
files with lots of duplicate strings, but also lots of
unique strings.  For example, file-b.lua has 1923663
strings in total, or 283801 distinct strings.  The 8
most-duplicated strings account for about half of the
total; 37590 strings occur only once.  (The average
length is 7 characters.)

All four .luac files were generated with the -s switch.
There's not a massive decrease in size:

33997307 file-a.lua
30841144 file-a-old-dump.luac
30689172 file-a-new-dump.luac
37521826 file-b.lua
35676747 file-b-old-dump.luac
35209112 file-b-new-dump.luac

Here are some typical timings, with the .lua files included
for comparison.  The difference between the two undumps is
lost in the noise:

$ time lua file-a.lua

real    0m16.934s
user    0m9.743s
sys     0m0.996s
$ time lua file-a-old-dump.luac

real    0m9.407s
user    0m3.552s
sys     0m0.798s
$ time src/*lhf*/src/lua file-a-new-dump.luac

real    0m10.202s
user    0m3.692s
sys     0m0.719s
$ time lua file-b.lua

real    0m17.402s
user    0m10.776s
sys     0m0.953s
$ time lua file-b-old-dump.luac

real    0m10.454s
user    0m3.914s
sys     0m0.827s
$ time src/*lhf*/src/lua file-b-new-dump.luac

real    0m9.929s
user    0m3.980s
sys     0m0.840s

Hope that's helpful.

--
Aaron
http://arundelo.com/