- Subject: Re: Pattern matching among multiple chunks
- From: Sam Pagenkopf <ssaammp@...>
- Date: Wed, 16 Jan 2019 21:58:39 -0600
- Use a heuristic to figure out whether your match hit a buffer boundary, if possible. A good example would be checking for unmatched braces.
- If you can, chunk the file by a delimiter that guarantees each chunk is parseable on its own. The "lines" method might solve your problem, though it could impose limits on the format.
- Is this for parsing something being sent over a network? Try parsing it before sending it over.
- Make your own finite-state machine that can work on chunks of input.
- Make a library that generates the above automatically using Lua-style regex.
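
As a rough illustration of the first and fourth points, here is a minimal sketch of a tiny state machine that carries brace depth from one chunk to the next. The name new_brace_tracker and the choice of "{" and "}" as the only significant characters are assumptions made for the example; a real parser would also have to handle strings, comments and so on.

```lua
-- Hypothetical helper: tracks brace depth across chunks so the caller
-- can tell whether the input seen so far might be a complete unit.
local function new_brace_tracker()
  local depth = 0          -- state carried from one chunk to the next
  local tracker = {}

  -- Feed one chunk; returns true when every "{" seen so far is closed.
  function tracker.feed(chunk)
    for c in chunk:gmatch("[{}]") do
      if c == "{" then
        depth = depth + 1
      else
        depth = depth - 1
        if depth < 0 then
          error("unbalanced '}' in input")
        end
      end
    end
    return depth == 0
  end

  return tracker
end

local t = new_brace_tracker()
print(t.feed("{ a = { b"))   -- false: two braces still open, wait for more
print(t.feed(" = 1 } }"))    -- true: balanced, safe to run the real parser
```

When feed returns false, a failed match on the data so far probably just means the chunk boundary fell inside a construct, so it is worth waiting for the next chunk before giving up.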
On 1/16/19 11:16 AM, pocomane wrote:
> I have a working parser that acts on a single string, but now I would
> like to extend it to accept input split among several chunks,
> without waiting for the whole data unless it is necessary.
> We consider, for example, the pattern
> to match against the following input:
> Suppose it is split into two chunks right in the middle.
> On the first chunk the match fails. But it may match if I wait for
> another chunk. How can I check for this?
I think you need to implement a finite-state automaton if
you need to know the parsing state at any character of the input.
Alternatively, you may limit the maximum size of a "valid" caption,
say to 8K, and then implement a sliding-window buffer of that
size with stock pattern matching.
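
The sliding-window idea could look roughly like the sketch below. The name new_window_matcher, the bracketed "[caption]" pattern and the 16-byte limit are all made up for the example. It assumes the pattern is self-delimiting (it ends on an explicit terminator, so a match cannot grow when more data arrives), uses no "^" anchor, and never matches more than max_len bytes.

```lua
-- Hypothetical helper: sliding-window matcher built on stock string.find.
-- max_len (>= 1) is an assumed upper bound on the length of a valid match.
local function new_window_matcher(pattern, max_len)
  local buf = ""           -- unconsumed tail of the input seen so far
  local m = {}

  -- Feed one chunk; returns a list of the matches completed by it.
  function m.feed(chunk)
    buf = buf .. chunk
    local found = {}
    local pos = 1
    while true do
      local s, e = buf:find(pattern, pos)
      if not s then break end
      found[#found + 1] = buf:sub(s, e)
      -- Always advance, even on an empty match, so the loop terminates.
      pos = (e >= s) and e + 1 or s + 1
    end
    -- Drop what was consumed, then keep only the last max_len bytes:
    -- an unfinished match cannot start any earlier than that.
    buf = buf:sub(pos)
    if #buf > max_len then
      buf = buf:sub(-max_len)
    end
    return found
  end

  return m
end

local m = new_window_matcher("%[%w+%]", 16)
print(#m.feed("xx [cap"))      -- 0: no match yet, the tail is kept
print(m.feed("tion] yy")[1])   -- [caption]
```

Memory use stays bounded by max_len plus the size of one chunk, at the cost of silently dropping any candidate longer than max_len.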