On Sun, Nov 7, 2010 at 7:44 PM, Marcin Jurczuk <mjurczuk@gmail.com> wrote:
>> return fh and md5.sumhexa(io.open(fname,"rb"):read(10485760)) or nil
>
> Try using read("*a").
>
> This will read whole file - a those files have > 500MB usually.
> I need only first 10MB to calculate checksum..
>

If I understood the implementation correctly, every time
LUAL_BUFFERSIZE * LUA_MINSTACK/2 bytes (BUFSIZ x 10 in the
out-of-the-box configuration) have been read, Lua concatenates the
pending buffers (one for each fread) into a single buffer of
BUFSIZ x 10 bytes.

BUFSIZ is a C constant usually between 128 and 16K (bytes). On
Windows, BUFSIZ is 512. Let's assume a 4K value (larger is better).
Each merged buffer will therefore hold 40KB (4KB x 10).

Calculating: 10MB / 40KB requires about 250 concatenations, and the
i-th concatenation creates a new string of (40KB x i) bytes. So the
total amount of memory allocated to read 10MB from the file is the sum
of an arithmetic series, S = n(a1+an) / 2. Taking a1 as roughly 0,
S = 250 x (0 + 10MB) / 2 = 1250MB, or about 1.25GB! As you have only
64MB, Lua has to run the garbage collector many times, which degrades
performance.
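
To check the arithmetic above, here is a minimal C sketch of the same
model. It assumes, as the estimate does, that every `chunk` bytes read
produce one new string holding everything read so far. The function
name total_alloc_bytes is made up for this illustration and is not
part of Lua.

```c
#include <stddef.h>

/* Hypothetical model, not Lua code: every `chunk` bytes read, a new
   string containing all bytes read so far is allocated. Returns the
   total number of bytes allocated for those intermediate strings. */
unsigned long long total_alloc_bytes(unsigned long long total,
                                     unsigned long long chunk) {
  unsigned long long allocated = 0;
  unsigned long long size;
  if (chunk == 0) return 0;  /* avoid an endless loop */
  /* the i-th concatenation creates a string of i * chunk bytes */
  for (size = chunk; size <= total; size += chunk)
    allocated += size;
  return allocated;
}
```

With decimal units, total_alloc_bytes(10000000, 40000) sums 250 terms
and returns 1255000000, which agrees with the 1.25GB estimate. The
small difference comes from using a1 = 40KB instead of 0.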

How to improve it? Many ways:

Simple solutions (constant change):

a) Enlarge LUA_MINSTACK from 20 to 1024 or even 4096. If you don't
create many stacks, the impact should be minimal.

b) Enlarge LUAL_BUFFERSIZE to 1024K, but this will affect all files
and consume more memory.

More complex solutions (code change):

c) Create a trigger constant (LARGE_BUFFERSIZE) that automatically
enlarges the buffer size.

d) Signal to the read function that you want a LARGE MEMORY BUFFER
by passing a NEGATIVE value in the read call, like this:
fh:read(-10485760)
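
Option (d) would need the C side to tell the two cases apart. A
minimal sketch of that decoding step follows. Nothing like this exists
in stock Lua: decode_read_count, its `long` argument and the
use_large_buffer flag are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper, not part of Lua: interpret the numeric argument
   of read() under proposal (d). A negative count means "read |n| bytes
   with a large buffer"; a non-negative count keeps the usual
   behaviour. */
size_t decode_read_count(long n, int *use_large_buffer) {
  if (n < 0) {
    *use_large_buffer = 1;
    /* compute |n| without negating LONG_MIN directly */
    return (size_t)(-(n + 1)) + 1;
  }
  *use_large_buffer = 0;
  return (size_t)n;
}
```

A patched read function could call this first and then choose the
chunk size according to the flag.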

Non-portable solution:

e) Use the value inside the FILE structure to discover the current
buffer size of a FILE.

Then in the function

  static int read_chars (lua_State *L, FILE *f, size_t n) {

change the line

  rlen = LUAL_BUFFERSIZE;  /* try to read that much each time */

to something like this (it depends on the FILE definition in your C
compiler):

  rlen = f->_bufsize; /* Get the current file buffer size */

and, before calling fh:read, use fh:setvbuf to set the buffer size.

Fully specific and dynamic solution.
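
The underlying idea (choose the stdio buffer size with setvbuf, then
issue reads of that size) can also be tried outside Lua. Below is a
minimal sketch in portable C, not a patch to liolib.c. It reads into
one preallocated buffer instead of concatenating strings, which is a
different technique from Lua's luaL_Buffer. The name read_prefix and
its parameters are made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read up to `n` bytes from the start of `path` into one malloc'ed
   buffer. `bufsize` is given to setvbuf() as the stdio buffer size and
   is also the size of each fread() request. Returns the buffer (the
   caller frees it) and stores the byte count in *out_len. Returns NULL
   on error. */
char *read_prefix(const char *path, size_t n, size_t bufsize,
                  size_t *out_len) {
  FILE *f;
  char *data;
  size_t got = 0;
  *out_len = 0;
  if (n == 0 || bufsize == 0) return NULL;
  f = fopen(path, "rb");
  if (f == NULL) return NULL;
  /* setvbuf must be called before any other operation on the stream */
  if (setvbuf(f, NULL, _IOFBF, bufsize) != 0) { fclose(f); return NULL; }
  data = (char *)malloc(n);
  if (data == NULL) { fclose(f); return NULL; }
  while (got < n) {
    size_t want = (n - got < bufsize) ? n - got : bufsize;
    size_t nr = fread(data + got, 1, want, f);
    got += nr;
    if (nr < want) break;  /* end of file or read error: stop early */
  }
  fclose(f);
  *out_len = got;
  return data;
}
```

For the case in this thread, a call such as
read_prefix(fname, 10485760, 65536, &len) would fetch the first 10MB
with a single allocation. The 64K buffer size is only an example value.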


-- 
Nilson