On 18 Apr 2007, at 03:48, Dave Dodge wrote:

On Mon, Apr 16, 2007 at 11:14:27AM +0200, Philippe Lhoste wrote:
Really? I am not a compiler expert, far from it, but as a C programmer I
thought preprocessing phase takes place before anything else.

The C standard (in section 5.1.1.2) breaks translation into 8 distinct
phases.  The early phases are probably a lot more fine-grained than
you're used to thinking about.  Paraphrased:

1) trigraphs and multibyte source code
2) splicing of lines ending with backslashes
3) comment removal, and source code is converted to a sequence
   of preprocessing tokens
4) preprocessing directives are executed, included files are read and
   run through phases 1-4
5) character escape sequences
6) adjacent string literals are concatenated
7) preprocessing tokens are converted to tokens, which are compiled as
   a translation unit
8) linking

Of course most compilers actually implement multiple phases at once,
but the important thing is that the compiler must produce results
consistent with the phases being done separately and in the above
order.
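
As a small illustration of why the order is observable (a sketch only; the function names are made up for this example): escape sequences are converted in phase 5, before adjacent string literals are joined in phase 6, so "\x1" "2" is the two characters 0x01 and '2' rather than the single character 0x12. Likewise, the backslash-newline splice of phase 2 happens before phase 3 tokenizes, so a string literal can be continued across source lines.

```c
#include <assert.h>
#include <string.h>

/* Phase 5 runs before phase 6: "\x1" is converted to the single
   character 0x01 before it is joined with "2". */
size_t concat_length(void) {
    return strlen("\x1" "2");
}

/* Phase 2 runs before phase 3: the backslash-newline pair is deleted
   before the literal is tokenized, so this is the literal "foobar". */
const char *spliced_literal(void) {
    return "foo\
bar";
}

/* Hypothetical helper that checks both observations. */
void check_phase_order(void) {
    assert(concat_length() == 2);
    assert(strcmp(spliced_literal(), "foobar") == 0);
}
```

If phases 5 and 6 were swapped, the first literal would be read as "\x12" and have length 1.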

The problem is that when you follow the grammar, phase 3 ends up
needing a little bit of knowledge from phase 4.  It's pretty subtle,
and not the sort of thing you notice until you actually try to
implement them separately.

Phase 3 does not need a little bit of knowledge from Phase 4.

The problem you referred to earlier was that of differentiating the string "foo\nar" from the included file "foo\nar". In phase 3 there is no distinction; both are just pp-tokens. You can create a problem for yourself if you decide that your frontmost lexer must distinguish strings from included files, but the C standard says that strings don't become strings as we know them until phase 5, when their escape sequences are converted.

Yes, this makes syntax highlighting tricky.

A similar issue exists for the difference between pp-numbers and numbers. The following is a valid program:

#define NDEBUG
#include <assert.h>

int main(void) {
  assert(1e1e1e);
  return 0;
}

but it becomes invalid if the initial #define NDEBUG is removed, because the conversion of the pp-token 1e1e1e to a token fails in phase 7. (With NDEBUG defined, assert expands to ((void)0), so that pp-token never reaches phase 7.)
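
The same point can be made without assert. A minimal sketch, assuming a helper macro STR and function names that are invented here for illustration: the # operator turns the spelling of the pp-number into a string literal during phase 4, so nothing is left to convert in phase 7.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: stringify the spelling of the argument. */
#define STR(x) #x

/* 1e1e1e is lexed in phase 3 as one pp-number. The # operator consumes
   it in phase 4, so it is never converted to a numeric constant. */
const char *pp_number_spelling(void) {
    return STR(1e1e1e);
}

/* Hypothetical helper that checks the resulting spelling. */
void check_pp_number(void) {
    assert(strcmp(pp_number_spelling(), "1e1e1e") == 0);
}
```

Writing double d = 1e1e1e; instead should be rejected, since there the pp-number does survive to phase 7 and is not a valid floating constant.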

drj