- Subject: Re: [Announce] Alpha release of a Lua debugger
- From: Dave Dodge <dododge@...>
- Date: Fri, 13 Apr 2007 12:23:44 -0400
On Fri, Apr 13, 2007 at 04:31:54PM +0100, David Given wrote:
> C is notoriously irritating to parse because the parse rules for a token can
> change during compilation.
Aside: C's even worse than that. In one subtle but very ugly case,
even basic tokenization is sensitive to higher-level context. In
translation phase 3 you decompose the source file into preprocessing
tokens and whitespace; so let's say you see this:
  "foo.h"
Is that a string-literal or a header-name token? It matches the
syntax for both. The only way to know which one you've got is to
examine the preceding tokens to see if this is part of an #include
directive, even though preprocessing directives aren't really supposed
to be dealt with until phase 4.
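To make the context dependence concrete, here is a minimal sketch in C of the check a tokenizer ends up doing. The names (classify_quoted, skip_blanks, enum quoted_kind) are made up for illustration, and the sketch assumes one logical line with no comments, line splices, or digraphs; a real preprocessor has to handle all of those.

```c
#include <string.h>

enum quoted_kind { STRING_LITERAL, HEADER_NAME };

/* Skip spaces and tabs only; a newline would end the directive. */
static const char *skip_blanks(const char *p)
{
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}

/*
 * Classify the quoted sequence starting at `quote`, which must point
 * into the logical line beginning at `line`. It is a header-name only
 * if everything before it on the line is "#", then "include".
 */
enum quoted_kind classify_quoted(const char *line, const char *quote)
{
    const char *p = skip_blanks(line);

    if (*p != '#')
        return STRING_LITERAL;
    p = skip_blanks(p + 1);
    if (strncmp(p, "include", 7) != 0)
        return STRING_LITERAL;
    p = skip_blanks(p + 7);
    return p == quote ? HEADER_NAME : STRING_LITERAL;
}
```

The same quoted text gets a different token kind depending only on what came before it on the line, which is exactly the phase 3 / phase 4 entanglement described above.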
But wait, it's worse! You might think you can just default to reading
it in as a string literal, and fix it up in phase 4 when you have the
context. But suppose you see this:
  "\n"
One of the reasons a header-name is a distinct token from a
string-literal is that the character sets work differently between the
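quotes. Read as a string-literal, the \n between the quotes is an
escape sequence representing a single newline; read as a header-name,
the backslash has no special meaning and the same text is two
characters. (Strictly, the standard leaves a backslash inside a
header-name undefined.) Ugh, what a mess.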
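A rough sketch in C of that difference, counting how many characters the text between the quotes denotes under each reading. The function name quoted_char_count and its flag are hypothetical, and the string-literal branch assumes only simple two-character escapes (no octal, hex, or universal character names).

```c
#include <stddef.h>

/*
 * Count the characters denoted between the quotes of the sequence
 * starting at `quote` (which must point at the opening '"').
 * Returns (size_t)-1 if there is no closing quote.
 */
size_t quoted_char_count(const char *quote, int as_header_name)
{
    size_t n = 0;
    const char *p = quote + 1;

    while (*p != '\0' && *p != '"') {
        /* In a string-literal, a backslash and the character after it
           form one escape; in a header-name, the backslash is just
           another character. */
        if (!as_header_name && *p == '\\' && p[1] != '\0')
            p++;
        p++;
        n++;
    }
    return *p == '"' ? n : (size_t)-1;
}
```

Given source text consisting of a quote, a backslash, an n, and a quote, this returns 1 when as_header_name is 0 and 2 when it is nonzero. In this sketch the two readings can even disagree about where the token ends, since a backslash followed by a quote only continues the token in the string-literal case.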