
Hi,

The really annoying part of Lua is even worse...

Imagine you want to read in one huge array of doubles that you will
need in a C array. The best you can do is the following.

Define
function load_array(lua_array)
  local n=table.getn(lua_array)       -- number of elements (Lua 5.0 style)
  local c_array=c_array_create(n)     -- allocate the C-side array
  for i=1,n do                        -- the for loop declares i itself
    c_array_set(c_array,i,lua_array[i])
  end
  return c_array
end

This uses C code bound to Lua (e.g. using tolua).

c_array * c_array_create(int n);
void c_array_set(c_array *a, int i, double val);
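The post gives only the two prototypes. The following is a minimal sketch of what they might look like. It assumes a hypothetical `c_array` struct (element count plus a `double` buffer), the 1-based indices that `load_array` passes in, and an extra `c_array_free()` that the original interface does not mention:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layout: the post only shows the two prototypes. */
typedef struct c_array {
    int n;          /* number of elements */
    double *data;   /* contiguous storage, 0-based on the C side */
} c_array;

/* Allocate an array of n doubles, zero-initialized; NULL on failure. */
c_array *c_array_create(int n)
{
    c_array *a;

    assert(n >= 0);
    a = malloc(sizeof *a);
    if (a == NULL)
        return NULL;
    a->data = calloc((size_t)n, sizeof *a->data);
    if (a->data == NULL && n > 0) {
        free(a);
        return NULL;
    }
    a->n = n;
    return a;
}

/* Store val at Lua-style (1-based) index i, as load_array calls it. */
void c_array_set(c_array *a, int i, double val)
{
    assert(a != NULL && i >= 1 && i <= a->n);
    a->data[i - 1] = val;
}

/* Release the array (hypothetical, not part of the original interface). */
void c_array_free(c_array *a)
{
    if (a != NULL) {
        free(a->data);
        free(a);
    }
}
```

The index shift lives in `c_array_set`, so the Lua code can keep its usual 1..n loop.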

Now you can write the data file as

a=load_array{1,2,3,5,12.33}
... lua code using a

But what happens when this file is read by Lua?
1) The whole file is loaded into memory (?).
2) The file is compiled to bytecode, duplicating all the data.
3) The bytecode is executed. Only then does the data get where it is needed.

So all the data enters memory three times (I'm not sure; at least twice),
while it is needed only once. We are talking about 10^6 to 10^7 values here.
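To put these sizes in perspective, here is the arithmetic for the upper end. It assumes 8-byte IEEE doubles and counts only the binary values, not the source text or any per-value interpreter overhead:

```latex
\[
  10^7\ \text{values} \times 8\ \tfrac{\text{bytes}}{\text{value}}
  = 8 \times 10^7\ \text{bytes} \approx 80\ \text{MB per copy}
\]
\[
  2 \times 80\ \text{MB} = 160\ \text{MB}, \qquad
  3 \times 80\ \text{MB} = 240\ \text{MB}
\]
```

So at 10^7 values, each redundant copy costs as much memory as the C array that is actually needed.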


What about binary data in strings? IMHO the proposed ASCII representation
is much too long for this case. One could use base64, though, but IMHO
this has considerable decoding overhead.

My workaround so far is a mechanism which subdivides input files
into chunks separated by lines containing a single $ character.
So the example above would be

a=c_array_create(5)
Data{a}
$
1
2
3
5
12.33
$
... lua code using a
[EOF]

When the file is executed, the first chunk is loaded, compiled to bytecode
and executed. The Data statement internally specifies how to parse the next
chunk and where to put the data. Then the middle chunk is parsed by another
parser (written by hand...) that transfers the data directly into the C
array. The last chunk is again handled by Lua. Lua 5 supports this chunk
handling perfectly; for Lua 4, I published the lua_dolines patch on the wiki.
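The hand-written parser for the middle chunk is not shown in the post. The following is a minimal C sketch. It assumes one number per line and a closing line consisting of a single `$`. The function name `parse_ascii_chunk` is hypothetical:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one number per line from f until a line that is exactly "$",
   storing at most cap values in dst. Returns the number of values
   stored, or -1 on a malformed line, a full dst, or EOF before "$". */
long parse_ascii_chunk(FILE *f, double *dst, long cap)
{
    char line[256];
    long count = 0;

    assert(f != NULL && (dst != NULL || cap == 0));
    while (fgets(line, sizeof line, f) != NULL) {
        char *end;
        double v;

        line[strcspn(line, "\r\n")] = '\0';    /* drop the line terminator */
        if (strcmp(line, "$") == 0)
            return count;                      /* closing "$": chunk done */
        if (line[0] == '\0')
            continue;                          /* tolerate blank lines */

        v = strtod(line, &end);
        if (end == line)
            return -1;                         /* not a number */
        while (isspace((unsigned char)*end))
            end++;                             /* allow trailing blanks */
        if (*end != '\0' || count >= cap)
            return -1;                         /* junk after number, or full */
        dst[count++] = v;                      /* straight into the C array */
    }
    return -1;                                 /* EOF before closing "$" */
}
```

On success, the stream is left just after the closing `$` line, which is where the last (Lua) chunk begins.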


To handle binary data, I do the following:

a=c_array_create(5)
Data{a, encoding="native"}
$
/=)EPEJDPDJP°D!"
$
... lua code using a
[EOF]

where /=)EPEJDPDJP°D!" is _pure_ binary data (native format, neither base64
nor xdr encoded). It is read directly by fread() without any overhead.
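Reading a native chunk can then be a single fread() call into the target buffer. This sketch assumes that the binary bytes start right after the opening `$` line and that the file was opened in binary mode ("rb"). `read_native_chunk` is a hypothetical name:

```c
#include <assert.h>
#include <stdio.h>

/* Read n doubles in the machine's own binary layout ("native" encoding)
   straight into dst, with no text parsing and no intermediate copy.
   Returns 0 on success, -1 on a short read. */
int read_native_chunk(FILE *f, double *dst, size_t n)
{
    assert(f != NULL && (dst != NULL || n == 0));
    return fread(dst, sizeof *dst, n, f) == n ? 0 : -1;
}
```

The caller still has to skip the closing `$` line before handing the rest of the file back to Lua.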


If you want portable binary files, you can use xdr encoding instead of
native. Base64 would be conceivable as well.
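XDR (RFC 4506) stores a double as an 8-byte IEEE 754 value in big-endian byte order, so the reader only has to fix up the byte order. This sketch assumes that the host `double` is IEEE 754 binary64 and has the same byte order as `uint64_t` (true on common platforms). `xdr_decode_double` is a hypothetical helper, not the Sun RPC `xdr_double()` routine:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Decode one XDR double: 8 bytes, most significant byte first. */
double xdr_decode_double(const unsigned char b[8])
{
    uint64_t bits = 0;
    double d;
    int i;

    assert(sizeof d == sizeof bits);
    for (i = 0; i < 8; i++)
        bits = (bits << 8) | b[i];     /* assemble big-endian bytes */
    memcpy(&d, &bits, sizeof d);       /* reinterpret without aliasing UB */
    return d;
}
```

Compared with native encoding, this costs a little shuffling per value, but the same file can be read on machines with either byte order.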

You can also write

Data{a, encoding="native", file="f", pos=12334}

Then the data is read from another file by the very same mechanism.
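For the `file=`/`pos=` variant, here is a minimal sketch. It assumes that `pos` is a byte offset into the other file; the post does not say so. `read_native_from_file` is a hypothetical name:

```c
#include <assert.h>
#include <stdio.h>

/* Open path, seek to byte offset pos, and fread n native doubles into dst.
   Treating pos as a byte offset is an assumption. Returns 0 or -1. */
int read_native_from_file(const char *path, long pos, double *dst, size_t n)
{
    FILE *f;
    int ok;

    assert(path != NULL && (dst != NULL || n == 0));
    f = fopen(path, "rb");                  /* binary mode matters on Windows */
    if (f == NULL)
        return -1;
    ok = fseek(f, pos, SEEK_SET) == 0
         && fread(dst, sizeof *dst, n, f) == n;
    fclose(f);
    return ok ? 0 : -1;
}
```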

In reality, Data sets a linehandler or a binhandler that handles
the next chunk; these handlers can be written in C or Lua.
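Here is one possible shape for such a handler mechanism on the C side. It uses hypothetical `line_handler`/`bin_handler` function-pointer types, a driver `run_chunk()`, and an example handler `store_double_line`. None of these names come from the original code. A handler written in Lua would need an extra C wrapper that calls into Lua, which is omitted here:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handler types; a return value of 0 means success. */
typedef int (*line_handler)(void *ud, const char *line);
typedef int (*bin_handler)(void *ud, FILE *f);

typedef struct {
    line_handler on_line;  /* text chunk: called once per line      */
    bin_handler on_bin;    /* binary chunk: reads from f by itself  */
    void *ud;              /* target of the data, e.g. a c_array    */
} chunk_handler;

/* Read one line without its terminator; returns 0 at end of file. */
int next_line(FILE *f, char *buf, int size)
{
    if (fgets(buf, size, f) == NULL)
        return 0;
    buf[strcspn(buf, "\r\n")] = '\0';
    return 1;
}

/* Hand the chunk after the current "$" line to h, then consume the
   closing "$" line so that Lua can take over again. 0 on success. */
int run_chunk(FILE *f, const chunk_handler *h)
{
    char line[256];

    assert(f != NULL && h != NULL);
    if (h->on_line != NULL) {
        while (next_line(f, line, (int)sizeof line)) {
            if (strcmp(line, "$") == 0)
                return 0;
            if (h->on_line(h->ud, line) != 0)
                return -1;
        }
        return -1;                      /* no closing "$" */
    }
    if (h->on_bin == NULL || h->on_bin(h->ud, f) != 0)
        return -1;
    while (next_line(f, line, (int)sizeof line))
        if (strcmp(line, "$") == 0)     /* skip up to the closing "$" */
            return 0;
    return -1;
}

/* Example line handler: parse one double per line into a buffer. */
typedef struct {
    double *dst;
    size_t n, cap;
} double_sink;

int store_double_line(void *ud, const char *line)
{
    double_sink *s = ud;
    char *end;
    double v = strtod(line, &end);

    if (end == line || *end != '\0' || s->n >= s->cap)
        return -1;
    s->dst[s->n++] = v;
    return 0;
}
```

This split keeps the policy (which handler, which target array) in the Lua chunk that calls Data, while the per-value work stays in C.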

Using [[ ]] strings instead of these chunks would keep the data in memory
as a string, thus doubling the needed data space.

Please note that I looked into XML for this. It gives no better
solution, because you are left alone with pure ASCII data chunks. Pure
binary (not base64) is not possible at all.

Matlab and co. have slow parsers, IMHO. Some communities use CDF and
HDF, which IMHO are incredibly bloated and opaque. I don't know
about Perl, Python and Ruby, as I _love_ Lua.
 
While I see my approach more as a workaround than as a solution, I
really think that Lua could gain from being able to handle huge data
without bloat. My code is part of a larger system. Time permitting, I
could try to extract the basics and make them available.
 
Juergen



Juergen Fuhrmann
 __  __  __  __                  Numerical Mathematics & Scientific Computing
|W  |I  |A  |S     Weierstrass Institute for Applied Analysis and Stochastics
Mohrenstr. 39 10117 Berlin      fon:+49 30 20372560        fax:+49 30 2044975
http://www.wias-berlin.de/~fuhrmann            mailto:fuhrmann@wias-berlin.de