- Subject: Re: Load large amount of data fast
- From: Alexander Gladysh <agladysh@...>
- Date: Sun, 17 Oct 2010 06:10:06 +0400
On Sun, Oct 17, 2010 at 05:56, Alexander Gladysh <agladysh@gmail.com> wrote:
> Jerome, Petite, list,
>>> I'm trying to load those 3M entries into a Lua table in memory faster
>>> than I do it now.
> Thanks for the answers.
> Sorry, it is 6 AM now; I'll try your solutions tomorrow evening.
Small addendum:
1. The proper solution to my whole task is to use a DB (like Tokyo
Cabinet). But the question still stands: (a) I'm quite curious about
this, and (b) rewriting the code to use a DB will take time that I don't
have right now. (That is another reason for my homework remark; thanks
for the help, guys!)
2. What I want to try (feel free to do this for me if you want):
A. Try to distribute the loaded data into buckets and provide a proxy
for accessing them (I have to use newproxy() since I need __len, but
that is OK here).
B. Play with the GC; perhaps a more aggressive setting will help.
C. Measure what happens with plain Lua (I'm using LuaJIT2 beta 5).
D. Try to read the data in large chunks (4096 bytes) and
search-and-replace {...}; with _A{...};, where _A is an append function
available in the environment. The problem is how to write the regexp so
that a table split across two chunks is still wrapped correctly.
E. BTW, I tried to loadstring() each line separately; it is much
slower. I need to loadstring, say, 1000 lines at a time.
F. Custom parsers: no fun. But try to convert the data to luabins format
line by line (one line -> (size_t len, luabins blob)) and load that.
Z. Proper profiling.
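Item A can be sketched roughly as follows. This is Python rather than Lua, so `__len__` and `__getitem__` stand in for the `__len` and `__index` metamethods that a newproxy() userdata would carry; the class name `BucketedList` and the default bucket size are hypothetical, and indices are 0-based here, unlike Lua's 1-based arrays. The sketch only shows the index arithmetic behind the proxy; whether splitting actually speeds up loading in LuaJIT is what the experiment would have to measure.

```python
class BucketedList:
    """Append-only sequence stored in fixed-size buckets.

    Hypothetical stand-in for item A: several small tables hidden
    behind one proxy object that still reports a total length.
    """

    def __init__(self, bucket_size=65536):
        self.bucket_size = bucket_size
        self.buckets = []
        self.count = 0

    def append(self, item):
        # Start a new bucket whenever the current one is full.
        if self.count % self.bucket_size == 0:
            self.buckets.append([])
        self.buckets[-1].append(item)
        self.count += 1

    def __len__(self):
        # Plays the role of the __len metamethod on the proxy.
        return self.count

    def __getitem__(self, i):
        # Plays the role of __index: map a global index to (bucket, offset).
        if not 0 <= i < self.count:
            raise IndexError(i)
        b, off = divmod(i, self.bucket_size)
        return self.buckets[b][off]


bl = BucketedList(bucket_size=3)
for i in range(10):
    bl.append(i * i)
print(len(bl), len(bl.buckets), bl[7])
```

With a bucket size of 3, ten appends fill four buckets, and element 7 is found at offset 1 of bucket 2.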
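For item D, one common way around the chunk-boundary problem is not a cleverer regexp but carrying the incomplete tail of each chunk over to the next read, so the pattern only ever sees whole records. A minimal sketch in Python, assuming (hypothetically) that each record sits on its own line in the form `{...};`; the function name `wrap_chunks` is illustrative, and only the `_A` prefix comes from the post. The same carry-over idea should translate to a Lua loop over file:read(4096), though the pattern itself would need adapting, since Lua's `^` only anchors at the start of the subject string.

```python
import io
import re

# Assumed record layout: one "{...};" table constructor per line.
RECORD_START = re.compile(r"^\{", re.MULTILINE)

def wrap_chunks(stream, chunk_size=4096):
    """Yield text in which every line starting with '{' now starts with '_A{'.

    The trailing partial line of each chunk is carried over to the next
    read, so a record split across two chunks is rewritten only once it
    is complete.
    """
    carry = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        data = carry + chunk
        cut = data.rfind("\n") + 1  # end of the last complete line (0 if none)
        carry = data[cut:]
        if cut:
            yield RECORD_START.sub("_A{", data[:cut])
    if carry:
        # Last record in a file that does not end with a newline.
        yield RECORD_START.sub("_A{", carry)


sample = "{1, 'a'};\n{2, 'b'};\n{3, 'c'};\n"
# Prints three lines, each starting with _A{
print("".join(wrap_chunks(io.StringIO(sample), chunk_size=5)), end="")
```

Because every piece handed to the regexp starts at a line boundary, the chunked output is identical to rewriting the whole file at once. The rewritten text could then be passed to loadstring() in batches, as item E suggests.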
Alexander.
P.S. Progress to date:
at line 100000 : Sun Oct 17 04:47:10 2010
at line 200000 : Sun Oct 17 04:48:29 2010
at line 300000 : Sun Oct 17 04:50:18 2010
at line 400000 : Sun Oct 17 04:53:02 2010
at line 500000 : Sun Oct 17 04:55:52 2010
at line 600000 : Sun Oct 17 04:58:55 2010
at line 700000 : Sun Oct 17 05:01:26 2010
at line 800000 : Sun Oct 17 05:07:00 2010
at line 900000 : Sun Oct 17 05:10:18 2010
at line 1000000 : Sun Oct 17 05:15:32 2010
at line 1100000 : Sun Oct 17 05:21:47 2010
at line 1200000 : Sun Oct 17 05:26:03 2010
at line 1300000 : Sun Oct 17 05:33:27 2010
at line 1400000 : Sun Oct 17 05:36:38 2010
at line 1500000 : Sun Oct 17 05:43:18 2010
at line 1600000 : Sun Oct 17 05:52:59 2010
at line 1700000 : Sun Oct 17 06:00:46 2010
at line 1800000 : Sun Oct 17 06:05:41 2010
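The timestamps above are enough to quantify the slowdown. The first block of 100,000 lines took 79 s, while later blocks took up to 581 s, and the 1.7M lines from line 100000 to line 1800000 took 4711 s (about 361 lines/s on average). Per-block time that grows as the table grows is consistent with a cost that scales with the data already loaded, such as GC traversal (which item B would probe), but the log alone does not prove that. The snippet below simply recomputes these figures from the quoted timestamps; it is arithmetic, not profiling.

```python
from datetime import datetime

# Timestamps copied from the progress log above, one per 100,000 lines
# (line 100000 through line 1800000, all on Sun Oct 17 2010).
LOG = """\
04:47:10 04:48:29 04:50:18 04:53:02 04:55:52 04:58:55 05:01:26 05:07:00
05:10:18 05:15:32 05:21:47 05:26:03 05:33:27 05:36:38 05:43:18 05:52:59
06:00:46 06:05:41"""

times = [datetime.strptime(t, "%H:%M:%S") for t in LOG.split()]

# Seconds spent on each successive block of 100,000 lines.
deltas = [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]
total = int((times[-1] - times[0]).total_seconds())

print("seconds per 100k lines:", deltas)
print("total seconds:", total)
print("average lines/s:", round(100_000 * len(deltas) / total))
```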