- Subject: Re: Poor md5 module performance
- From: Nilson <nilson.brazil@...>
- Date: Sun, 7 Nov 2010 22:39:17 -0200
On Sun, Nov 7, 2010 at 7:44 PM, Marcin Jurczuk <mjurczuk@gmail.com> wrote:
>> return fh and md5.sumhexa(io.open(fname,"rb"):read(10485760)) or nil
>
> Try using read("*a").
>
> This will read the whole file, and those files are usually > 500MB.
> I need only the first 10MB to calculate the checksum.
>
If I understood the implementation correctly, for every LUAL_BUFFERSIZE
* LUA_MINSTACK/2 bytes read (BUFSIZ x 10 in the out-of-the-box
configuration), Lua concatenates the buffers (one for each fread) into
a single buffer of (BUFSIZ x 10) bytes.
BUFSIZ is a C constant usually between 128 and 16K (bytes). In
Windows, BUFSIZ is 512. Let's assume a 4K value (larger is better).
Therefore each single buffer will have 40KB (4KB x 10).
Calculating... 10MB / 40KB requires about 250 concatenations, and for
the <i>-th concatenation a new string of ( 40KB x <i> ) bytes is created.
So, the total amount of memory allocated to read 10MB from the file is
given by the arithmetic series S = n(a1+an) / 2 => S = 250 x (0 + 10MB) / 2
= 1250MB, or about 1.25GB! As you have only 64MB, Lua needs to perform
a lot of garbage collection, which degrades performance.
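
To check the arithmetic, here is a minimal sketch in C that adds up the sizes of the intermediate strings under the model described above (one new string per concatenation, holding everything read so far). The function name total_alloc_bytes is made up for this illustration. It only models the estimate and does not call into Lua.

```c
#include <assert.h>

/* Total bytes allocated if, after every `chunk` bytes read, a new
   string holding everything read so far is created (the model used
   in the estimate above). Hypothetical helper, not part of Lua. */
unsigned long long total_alloc_bytes(unsigned long long total,
                                     unsigned long long chunk)
{
    unsigned long long sum = 0;   /* bytes allocated over all steps */
    unsigned long long have = 0;  /* bytes read so far */

    assert(chunk > 0);
    while (have < total) {
        unsigned long long left = total - have;
        have += (left < chunk) ? left : chunk;
        sum += have;              /* one new string of `have` bytes */
    }
    return sum;
}
```

With the exact binary sizes, total_alloc_bytes(10485760ULL, 40960ULL) takes 256 steps and returns 1347420160, which is in line with the rough 1250MB figure. Under the same model, a 2MB chunk needs only 5 steps and about 30MB in total.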
How to improve it? Many ways:
Simple solutions (constant change):
a) Enlarge LUA_MINSTACK from 20 to 1024 or even 4096. If you don't
create many stacks, the impact should be minimal.
b) Enlarge LUAL_BUFFERSIZE to 1024K, but this will affect all files and
consume more memory.
More complex solutions (code change):
c) Create a trigger constant (LARGE_BUFFERSIZE) that automatically
enlarges the buffer size.
d) Signal to the io.read function that you want a LARGE MEMORY BUFFER
by passing a NEGATIVE value in the io.read call, like this:
"fh:read(-10485760)"
Non-portable solution:
f) Use the value inside the FILE* structure to discover the current
buffer size for a FILE.
Then in the function
static int read_chars (lua_State *L, FILE *f, size_t n) {
change the line
rlen = LUAL_BUFFERSIZE; /* try to read that much each time */
to something like (depends on the FILE definition in your C compiler):
rlen = f->_bufsize; /* Get the current file buffer size */
and, before calling io.read, use file:setvbuf to set the buffer size.
Fully specific and dynamic solution.
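
For comparison with the code-change options, here is a minimal sketch in plain C, outside Lua, of reading the first n bytes into a single buffer allocated up front, so that no intermediate strings are created. This is not the liolib.c code and not a patch for it. The name read_prefix is hypothetical, and a real change to read_chars would still have to hand the result to Lua (for example with lua_pushlstring).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read up to n bytes from f into one malloc'ed buffer.
   The number of bytes actually read is stored in *out_len.
   Returns NULL if the allocation fails; the caller frees the result.
   Hypothetical helper for illustration only. */
char *read_prefix(FILE *f, size_t n, size_t *out_len)
{
    char *buf;
    size_t got = 0;

    assert(f != NULL && out_len != NULL);
    buf = malloc(n ? n : 1);      /* single allocation of the final size */
    if (buf == NULL)
        return NULL;
    while (got < n) {
        size_t r = fread(buf + got, 1, n - got, f);
        if (r == 0)
            break;                /* end of file or read error */
        got += r;
    }
    *out_len = got;
    return buf;
}
```

Called as read_prefix(fp, 10485760, &len), this allocates 10MB once regardless of BUFSIZ, at the cost of reserving the whole amount even when the file turns out to be shorter.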
--
Nilson