Lpeg and malformed input / Lpeg and subjects that do not fit into memory

lua-l archive

Subject: Lpeg and malformed input / Lpeg and subjects that do not fit into memory
From: Aladdin Lampé <genio570@...>
Date: Thu, 18 Sep 2008 11:42:37 +0200

Hi List!
I've recently been "playing" with the Lpeg library, which is IMHO a great piece of code. I have two questions, which are related to each other:


1. Lpeg is great when the subject strictly follows a given grammar. But how can one parse *malformed* CSV files, for instance (and maybe generate "warnings" or "errors")?

In my case, it is a big CSV-like file of 5 GB:
- in the general case, the records have only one line, so I can use Lpeg to parse with: subject = current line
- the exception is that some of my records span an unknown number of lines, because some *unquoted* fields contain embedded '\n' characters...
Counting the number of fields helps to detect most errors... but if the last field is unquoted and contains some '\n', I get the right number of fields but the remaining part of the last field is on the next line(s)...
Basically, I think I need to successfully parse the next record to be sure that the current one is OK (weird, I know; if you have a better idea, it is welcome :-) ).
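
The lookahead idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested solution for the real file: each physical line is parsed with a CSV line pattern modelled on the example in the LPeg manual, and a pending record is accepted only once the following line parses as a full record on its own. The function name assemble_records, the nfields parameter and the warning texts are made up for this sketch, and it assumes that a continuation line never looks like a complete record by itself.

```lua
local lpeg = require "lpeg"
local P, S, C, Cs, Ct = lpeg.P, lpeg.S, lpeg.C, lpeg.Cs, lpeg.Ct

-- One physical line of CSV: quoted fields (with "" as an escaped quote)
-- or unquoted fields, separated by commas, up to the end of the line.
local quoted   = P'"' * Cs(((P(1) - P'"') + P'""' / '"')^0) * P'"'
local unquoted = C((P(1) - S',"\n')^0)
local field    = quoted + unquoted
local line_pat = Ct(field * (P"," * field)^0) * P(-1)

-- Hypothetical helper: groups physical lines into records of `nfields` fields.
-- A pending record is emitted only when the next line is itself a full record.
local function assemble_records(lines, nfields)
  local records, warnings = {}, {}
  local pending

  local function flush(lineno)
    if not pending then return end
    if #pending == nfields then
      records[#records + 1] = pending
    else
      warnings[#warnings + 1] = ("record ending before line %d has %d fields, expected %d")
        :format(lineno, #pending, nfields)
    end
    pending = nil
  end

  for i, line in ipairs(lines) do
    local fields = line_pat:match(line)
    if not fields then
      warnings[#warnings + 1] = ("line %d: cannot be parsed"):format(i)
    elseif not pending then
      pending = fields
    elseif #pending >= nfields and #fields == nfields then
      -- The next line is a full record, so the pending one is finished.
      flush(i)
      pending = fields
    else
      -- Treat the line as the continuation of the last (unquoted) field.
      pending[#pending] = pending[#pending] .. "\n" .. fields[1]
      for j = 2, #fields do pending[#pending + 1] = fields[j] end
    end
  end
  flush(#lines + 1)
  return records, warnings
end
```

Calling assemble_records on a table of lines with the expected field count returns the accepted records and a list of warning strings; lines would normally come from io.lines rather than from a table.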


2. Is there a way of designing patterns and grammars so that it becomes possible to parse a subject that does not fit entirely into Lua memory?

What I would like to do with Lpeg is the following:
subject = read a chunk of N=4096 bytes of my big CSV file; when |!.| matches in the defined grammar (i.e. 'end of subject'), use an Lpeg callback to see if:
- more input is needed (because the record is not matched yet)
- or if the match was successful
- or if it is the real end of subject (i.e. 'end of file')
Is that possible with the current version of Lpeg, or is another way of solving this planned in the near future?
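
As far as I can tell, lpeg.match works on a complete Lua string and has no way to ask for more input in the middle of a match, so one possible workaround is to do the buffering outside the pattern. The sketch below is only that, a sketch: it matches as many complete records as the buffer holds, uses a position capture to find where the unfinished tail begins, and prepends that tail to the next chunk. The names parse_chunks, string_chunker, next_chunk and on_record are invented for the illustration, and the record pattern handles only unquoted fields terminated by '\n'.

```lua
local lpeg = require "lpeg"
local P, S, C, Ct, Cp = lpeg.P, lpeg.S, lpeg.C, lpeg.Ct, lpeg.Cp

-- Simplified record: unquoted fields separated by commas.
local field    = C((P(1) - S",\n")^0)
local row      = Ct(field * (P"," * field)^0)
local record   = row * P"\n"           -- a complete record ends with a newline
local complete = Ct(record^0) * Cp()   -- all complete records + position of the tail
local last_row = row * P(-1)           -- final record without a trailing newline

-- next_chunk() returns the next piece of input, or nil at end of file.
-- on_record(fields) is called once per complete record.
local function parse_chunks(next_chunk, on_record)
  local buffer = ""
  while true do
    local chunk = next_chunk()
    if not chunk then break end        -- real end of input
    buffer = buffer .. chunk
    local recs, pos = complete:match(buffer)
    for _, r in ipairs(recs) do on_record(r) end
    buffer = buffer:sub(pos)           -- keep the unfinished tail for the next round
  end
  if buffer ~= "" then
    local r = last_row:match(buffer)
    if r then on_record(r) end
  end
end

-- Stand-in for a file reader, used here only to exercise the loop.
local function string_chunker(s, n)
  local i = 1
  return function()
    if i > #s then return nil end
    local piece = s:sub(i, i + n - 1)
    i = i + n
    return piece
  end
end
```

With a real file, next_chunk could be something like function() return f:read(4096) end, where f was opened with io.open(path, "rb"); only the current chunk plus one unfinished record stay in memory at a time.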


Any ideas, suggestions, pointers? (Lpeg samples would be great!)
Thanks,
Aladdin

Follow-Ups:
- Re: Lpeg and malformed input / Lpeg and subjects that do not fit into memory, Roberto Ierusalimschy
