[Date Prev][Date Next][Thread Prev][Thread Next]
- Subject: Re: Repeated processing of large datasets
- From: Geoff Leyland <geoff_leyland@...>
- Date: Wed, 29 Mar 2017 12:36:53 +1300
> On 29/03/2017, at 12:21 PM, John Logsdon <firstname.lastname@example.org> wrote:
> Thanks Geoff
> Yes, they are local to the function. cdata is a good idea although I am
> not sure how that would fit with sqlite which may not be able to store
> binary data. In essence though if stored as cdata I could just read the
> whole table in a single read. Hmm. Interesting.
Can you explain your use case a little further?
If you're provided with a CSV file, which you read once, and then process once, then I don't see a problem with reading the CSV into memory in lua and then processing it in memory (possibly using cdata) without getting SQLite involved.
If you read and process the CSV file multiple times, then it might be worth converting the CSV into something quicker to read, if the CSV reading is taking a significant amount of time (which I doubt if you're running some statistical/machine learning/optimisation model on the data).
My current trick with large data I want to read fast is to mmap it, so that loading is more or less instant. So you'd read the CSV file once, write it to a memory-mapped file, and then for each processing run, mmap the data back in.
(I have nothing against SQLite, I use it often, I'm just not sure it offers anything for what I imagine you're doing.)
Shameless plug for https://github.com/geoffleyland/lua-mmapfile