farblog

by Malcolm Rowe

Tricky Preprocessors

You learn something every day. A quick quiz from a mailing list I’m on: what does this C program print?

#include <stdio.h>

#define MINUS(x) -x

int main()
{
  int n = 5;
  printf("%d\n", MINUS(-n));
}

I like this one: there are three possible ways of coming up with an answer, depending upon how deeply you look into it. The obvious approach is to say “well, MINUS() returns the negative value of whatever it’s passed, and we’re passing -5, so the answer is 5”. And you’d be right, but for the wrong reason.

Slightly more thought, and you might come to the conclusion that the macro expands with two leading minus signs, like this:

printf("%d\n", --n);

And --n is obviously the prefix decrement operator applied to n, so the printed value is the result of n after that decrement, or 4 (and due to a bug we’ll get to in a second, this happens to be what you get if you compile with Microsoft’s Visual C).

But that’s not the whole story. If you compile the program using gcc, you’ll see that the answer printed (and the correct answer) is 5. So what’s going on? Well, the C and C++ standards say that the output of the preprocessing stage is a stream of tokens, not characters, so the macro expands to ‘operator-negation operator-negation n’: two separate unary minus tokens, which are never glued back together into a single -- token. Visual C doesn’t happen to follow this part of the standard, so it takes the character string --n from the output of the preprocessor and interprets it as the prefix decrement operator.

If you’ve used gcc’s preprocessor directly, you’ll know that it also outputs a character stream. So how does gcc work? If you take a look at the preprocessed source (gcc -E -x c), you’ll see that the output for the relevant line is actually:

printf("%d\n", - -n);

What’s going on here is that gcc’s preprocessor inserts just enough extra whitespace to ensure that the two negation operators can’t be mistaken for the prefix decrement operator. In other words, while the preprocessor outputs a stream of characters, it manipulates the output so that (from the point of view of a consumer) it behaves ‘as if’ it had output a stream of tokens. (And if you replace the definition of MINUS with the more typical -(x), making the expansion -(-n), you’ll see that gcc no longer adds in that extra whitespace: it’s only added when leaving it out would change how the output is tokenized.)