- Subject: Using Lua to combine CSV files
- From: "chandan datta" <datta.chandan@...>
- Date: Tue, 26 Jun 2007 12:38:47 -0400
Hi Shane,
I'm interested in using Lua to make this faster and more efficient.
To be more specific, logfile1.csv has data like:
1166212618.66,Fri Dec 15 14:56:58,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,1,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,1,0,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
3.2,3.2,-3.2,-3.2,0,0,1,-1,1,1,1,0,0,1,1,1,0,1,0,2,82,0,9,4,3902,3.79,0,318.2,0,0,1,1
1166212618.72,Fri Dec 15 14:56:58,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,1,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,1,0,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
3.2,3.2,-3.2,-3.2,0,0,1,-1,1,1,1,0,0,1,1,1,0,1,0,2,82,0,9,4,3902,3.79,0,318.2,0,0,1,1
1166212618.78,Fri Dec 15 14:56:58,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,1,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,1,0,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
3.2,3.2,-3.2,-3.2,0,0,1,-1,1,1,1,0,0,1,1,1,0,1,0,2,82,0,9,4,3902,3.79,0,318.2,0,0,1,1
So the first field is the timestamp (Unix epoch seconds, as used on Linux).
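To make the record layout concrete, here is a minimal sketch of pulling the timestamp out of one record. The name timestamp_of is made up for illustration, and it assumes each record sits on a single line in the real file (the line breaks inside the records above come from mail wrapping).

```lua
-- Sketch only: timestamp_of is a hypothetical helper name.
-- Assumes the timestamp is the first comma-separated field of the line.
local function timestamp_of(line)
  -- Capture everything before the first comma, ignoring leading spaces.
  local field = line:match("^%s*([^,]+)")
  -- tonumber gives nil for non-numeric text, so bad lines yield nil.
  return field and tonumber(field)
end
```

A line that does not start with a number (or an empty line) gives nil, so a caller can skip it.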
logfile10.csv has the same type of data:
1166212618.84,Fri Dec 15 14:56:58,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,1,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,1,0,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3.2,3.2,-3.2,-3.2,0,0,1,-1,1,1,1,0,0,1,1,1,0,1,0,2,82,0,9,4,3902,
3.79,0,318.3,0,0,1,1
1166212618.91,Fri Dec 15 14:56:58,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,1,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,1,0,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3.2,3.2,-3.2,-3.2,0,0,1,-1,1,1,1,0,0,1,1,1,0,1,0,2,82,0,9,4,3902,
3.79,0,318.3,0,0,1,1
1166212618.97,Fri Dec 15 14:56:58,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,1,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,1,0,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3.2,3.2,-3.2,-3.2,0,0,1,-1,1,1,1,0,0,1,1,1,0,1,0,2,82,0,9,4,3902,
3.79,0,318.3,0,0,1,1
1166212619.03,Fri Dec 15 14:56:59,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,1,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,1,0,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3.2,3.2,-3.2,-3.2,0,0,1,-1,1,1,1,0,0,1,1,1,0,1,0,2,82,0,9,4,3902,
3.79,0,318.6,0,0,1,1
1166212619.09,Fri Dec 15 14:56:59,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,1,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,1,0,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3.2,3.2,-3.2,-3.2,0,0,1,-1,1,1,1,0,0,1,1,1,0,1,0,2,82,0,9,4,3902,
3.79,0,318.6,0,0,1,1
One advantage I have is that my logfiles are named sequentially in the
directory, like logfile1.csv, logfile2.csv and so on. So when I want to
combine all the data between logfile1.csv and logfile10.csv, it has
sequentially increasing timestamps (logfile10.csv always has higher
timestamps than logfile1.csv).
Now, when timestamp1=1166212618.72 (a match in logfile1.csv) and
timestamp2=1166212619.03 (a match in logfile10.csv), I would take all
the data after timestamp1's record from logfile1.csv, all the data from
logfile2.csv, logfile3.csv ... up to timestamp2's record from
logfile10.csv, and write it into the new CSV, outlogfile.csv.
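The selection described above could be sketched roughly as below. This is only an illustration under stated assumptions: copy_range and its parameters are hypothetical names, every record is assumed to be on one line starting with the timestamp, and both end records are treated as included (change the comparisons if the timestamp1 record itself should be left out).

```lua
-- Sketch only: copy_range and its parameter names are hypothetical.
-- Copies every record with t1 <= timestamp <= t2 from prefix<first>.csv
-- through prefix<last>.csv into outname; returns the number copied.
local function copy_range(prefix, first, last, t1, t2, outname)
  local out = assert(io.open(outname, "w"))
  local copied = 0
  for n = first, last do
    local f = assert(io.open(prefix .. n .. ".csv", "r"))
    for line in f:lines() do
      local field = line:match("^%s*([^,]+)")
      local ts = field and tonumber(field)
      if ts then
        if ts > t2 then
          -- Timestamps only increase, so nothing later can match.
          f:close()
          out:close()
          return copied
        end
        if ts >= t1 then
          out:write(line, "\n")
          copied = copied + 1
        end
      end
    end
    f:close()
  end
  out:close()
  return copied
end
```

For the example in this mail the call would look something like copy_range("logfile", 1, 10, 1166212618.72, 1166212619.03, "outlogfile.csv"). It reads line by line, so memory use stays small, but it still scans logfile1.csv from the start.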
My problem is that the logfiles are huge. I have a few that are 1-4 GB
in size, so searching for the timestamps and copying the corresponding
records has to be fast and efficient. As of now I'm using a MATLAB
script to do so, but it's very slow and inefficient.
Please suggest a way which is fast and efficient.
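Since the timestamps only increase within a file, one possible way to avoid scanning a multi-GB file from the top is a binary search over byte offsets using file:seek. The sketch below is untested on files of that size. The names next_start and first_at_or_after are hypothetical, and it assumes one record per line, no blank lines in the middle of the file, and a file opened in binary mode ("rb"). Whether seek works past 2 GB depends on how Lua and the C library were built, so that would need checking first.

```lua
-- Sketch only: next_start and first_at_or_after are hypothetical helpers.

-- Position f at the first line start at or after byte offset p; return it.
local function next_start(f, p)
  if p == 0 then
    f:seek("set", 0)
    return 0
  end
  f:seek("set", p - 1)
  f:read()          -- consume up to and including the next newline
  return f:seek()   -- current position is now the start of a line
end

-- Return the byte offset of the first record whose timestamp is >= target,
-- or the file size if every record is earlier than target.
local function first_at_or_after(f, target)
  local size = f:seek("end")
  local lo, hi = 0, size
  while lo < hi do
    local mid = math.floor((lo + hi) / 2)
    next_start(f, mid)
    local line = f:read()
    local ts = line and tonumber((line:match("^([^,]*)")))
    if ts == nil or ts >= target then
      hi = mid        -- the wanted record starts at or before this line
    else
      lo = mid + 1    -- this line is too early, look further on
    end
  end
  return next_start(f, lo)
end
```

After this, f:seek("set", offset) followed by reading lines until the timestamp passes timestamp2 would copy only the wanted part, with a few dozen seeks instead of a full scan.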
Thanks, Shmuel, for the code, but it would really help if someone came up with a faster way, or could say whether doing this in Lua is a good idea at all.
Keep those suggestions coming in!
--
Regards,
Chandan