3.8 Preprocessing directives

For an overview of the philosophy behind the preprocessor, see §2.1.1.2.

Different implementations have had different notions about whether white space is permissible before and/or after the # signalling a preprocessor line. The Committee decided to allow any white space before the #, and horizontal white space (spaces or tabs) between the # and the directive, since the white space introduces no ambiguity, causes no particular processing problems, and allows maximum flexibility in coding style. Note that similar considerations apply for comments, which are reduced to white space early in the phases of translation (§2.1.1.2):

         /* here a comment */ #if BLAH
        #/* there a comment */ if BLAH
        # if /* every-
            where a comment */ BLAH

The lines all illustrate legitimate placement of comments.

3.8.1 Conditional inclusion

For a discussion of evaluation of expressions following #if, see §3.4.

The operator defined has been added to make it possible to write boolean combinations of defined flags with one another and with other inclusion conditions. If the identifier defined were itself defined as a macro, defined(X) would mean the macro expansion in C text proper and the operator expression in a preprocessing directive (or else the operator would no longer be available). To avoid this problem, such a definition is not permitted (§3.8.8).
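
As an illustration of the boolean combinations described above, the following minimal sketch tests two flags in a single #if. The flag names HAVE_FAST_PATH and FORCE_PORTABLE, the macro SELECTED_PATH, and the function select_path are hypothetical, invented only for this example.

```c
/* Hypothetical configuration flag, invented for this example. */
#define HAVE_FAST_PATH
/* FORCE_PORTABLE is deliberately left undefined. */

/* defined may be written with or without parentheses, and combines
   with the other operators allowed in a #if expression. */
#if defined(HAVE_FAST_PATH) && !defined FORCE_PORTABLE
#define SELECTED_PATH 1
#else
#define SELECTED_PATH 0
#endif

/* Returns 1 when the fast path was selected at translation time. */
int select_path(void)
{
    return SELECTED_PATH;
}
```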

#elif has been added to minimize the stacking of #endif directives in multi-way conditionals.
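
A multi-way conditional written with #elif might look like the sketch below; the names LEVEL, LEVEL_NAME, and level_name are assumptions made for the example. Written with nested #if and #else instead, the same chain would need three #endif directives rather than one.

```c
/* LEVEL is a hypothetical setting invented for this example. */
#define LEVEL 2

/* With #elif, a single #endif closes the whole chain. */
#if LEVEL == 1
#define LEVEL_NAME "low"
#elif LEVEL == 2
#define LEVEL_NAME "medium"
#elif LEVEL == 3
#define LEVEL_NAME "high"
#else
#define LEVEL_NAME "unknown"
#endif

/* Returns the name chosen at translation time. */
const char *level_name(void)
{
    return LEVEL_NAME;
}
```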

Processing of skipped material is defined such that an implementation need only examine a logical line for the # and then for a directive name. Thus, assuming that xxx is undefined, in this example:

        # ifndef xxx
        # define xxx "abc"
        # elif xxx > 0
            /* ... */
        # endif

an implementation is not required to diagnose an error for the #elif directive, even though a syntax error would be detected if the directive were processed (xxx would expand to a string literal, which is not permitted in a #if expression).
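
A simpler way to see the same rule at work is a group skipped by #if 0. In the sketch below, which is only an illustration, the header name and the function skipped_group_ok are hypothetical; the two directives inside the skipped group are recognized by name but never executed.

```c
/* A skipped group is examined only far enough to find directive
   names, so the two directives inside it are never executed. */
#if 0
#error never reached: this directive lies in a skipped group
#include <no_such_header_for_this_example.h>
#endif

/* Translation reaches this point without a diagnostic. */
int skipped_group_ok(void)
{
    return 1;
}
```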

Various proposals were considered for permitting text other than comments at the end of directives, particularly #endif and #else, presumably to label them for easier matchup with their corresponding #if directives. The Committee rejected all such proposals because of the difficulty of specifying exactly what would be permitted, and how the translator would have to process it.

Various proposals were considered for permitting additional unary expressions to be used for the purpose of testing for the system type, testing for the presence of a file before #include, and other extensions to the preprocessing language. These proposals were all rejected on the grounds of insufficient prior art and/or insufficient utility.

3.8.2 Source file inclusion

Specification of the #include directive raises distinctive grammatical problems because the file name is conventionally parsed quite differently than an ``ordinary'' token sequence:

The angle brackets are not operators, but delimiters.
The double quotes do not delimit a string literal with all its defined escape sequences. (In some systems, backslash is a legitimate character in a filename.) The construct just looks like a string literal.
White space or characters not in the C repertoire may be permissible and significant within either or both forms.

These points in the description of phases of translation are of particular relevance to the parse of the #include directive:

Any character otherwise unrecognized during tokenization is an instance of an ``invalid token.'' As with valid tokens, the spelling is retained so that later phases can, if necessary, map a token sequence (back) into a sequence of characters.
Preprocessing phases must maintain the spelling of preprocessing tokens; the filename is based on the original spelling of the tokens, not on any interpretation of escape sequences.
The filename on the #include (and #line) directive, if it does not begin with " or <, is macro expanded prior to execution of the directive. Allowing macros in the include directive facilitates the parameterization of include file names, an important issue in transportability.
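
The parameterization of include file names mentioned in the last point can be sketched as follows. The macro name IO_HEADER and the function header_was_included are assumptions for this example; a build might define IO_HEADER differently for each target.

```c
/* IO_HEADER is a hypothetical name; a build could set it per target. */
#ifndef IO_HEADER
#define IO_HEADER <stdio.h>
#endif

/* The operand begins with neither " nor <, so it is macro expanded
   before the directive is executed. */
#include IO_HEADER

/* EOF comes from the included header and is required to be negative. */
int header_was_included(void)
{
    return EOF < 0;
}
```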

The file search rules used for the filename in the #include directive were left as implementation-defined. The Standard intends that the rules which are eventually provided by the implementor correspond as closely as possible to the original K&R rules. The primary reason that explicit rules were not included in the Standard is the infeasibility of describing a portable file system structure. It was considered unacceptable to include UNIX-like directory rules due to significant differences between this structure and other popular commercial file system structures.

Nested include files raise an issue of interpreting the file search rules. In UNIX C a #include directive found within an include file entails a search for the named file relative to the file system directory that holds the outer #include. Other implementations, including the earlier UNIX C described in K&R, always search relative to the same current directory. The Committee decided, in principle, in favor of the K&R approach, but was unable to provide explicit search rules as explained above.

The Standard specifies a set of include file names which must map onto distinct host file names. In the absence of such a requirement, it would be impossible to write portable programs using include files.

§2.2.4.1 on translation limits contains the required number of nesting levels for include files. The limits chosen were intended to reflect reasonable needs for users constrained by reasonable system resources available to implementors.

By defining a failure to read an include file as a syntax error, the Standard requires that the failure be diagnosed. More than one proposal was presented for some form of conditional include, or a directive such as #ifincludable, but none were accepted by the Committee due to lack of prior art.

3.8.3 Macro replacement

The specification of macro definition and replacement in the Standard was based on these principles:

Interfere with existing code as little as possible.
Keep the preprocessing model simple and uniform.
Allow macros to be used wherever functions can be.
Define macro expansion such that it produces the same token sequence whether the macro calls appear in open text, in macro arguments, or in macro definitions.

Preprocessing is specified in such a way that it can be implemented as a separate (text-to-text) pre-pass or as a (token-oriented) portion of the compiler itself. Thus, the preprocessing grammar is specified in terms of tokens.

However, the new-line character must be a token during preprocessing, because the preprocessing grammar is line-oriented. The presence or absence of white space is also important in several contexts, such as between the macro name and a following parenthesis in a #define directive. To avoid overly constraining the implementation, the Standard allows the preservation of each white space character (which is easy for a text-to-text pre-pass) or the mapping of white space into a single ``white space'' token (which is easier for token-oriented translators).
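
The significance of white space between the macro name and a following parenthesis can be seen in this small sketch. The macro names INC and PAREN_X and both functions are hypothetical, chosen for illustration.

```c
/* No white space before '(': a function-like macro with parameter x. */
#define INC(x) ((x) + 1)

/* White space before '(': an object-like macro whose replacement
   list is the three tokens ( x ). */
#define PAREN_X (x)

int use_function_like(void)
{
    return INC(2);          /* expands to ((2) + 1) */
}

int use_object_like(int x)
{
    return PAREN_X;         /* expands to (x), naming the parameter x */
}
```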

The Committee desired to disallow ``pernicious redefinitions'' such as

(in header1.h)

        #define NBUFS 10

(in header2.h)

        #define NBUFS 12

which are clearly invitations to serious bugs in a program. There remained, however, the question of ``benign redefinitions,'' such as

(in header1.h)

        #define NULL_DEV 0

(in header2.h)

        #define NULL_DEV 0

The Committee concluded that safe programming practice is better served by allowing benign redefinition where the definitions are the same. This allows independent headers to specify their understanding of the proper value for a symbol of interest to each, with diagnostics generated only if the definitions differ.

The definitions are considered ``the same'' if the identifier-lists, token sequences, and occurrences of white-space (ignoring the spelling of white-space) in the two definitions are identical.
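
As a sketch of this rule, the two pairs of definitions below should count as ``the same'': the tokens match and white space occurs in the same places, although it is spelled differently. The macro AREA and the function area_of are hypothetical additions to the NULL_DEV example, and both sets of definitions are placed in one file only for brevity.

```c
/* As if from a hypothetical header1.h */
#define NULL_DEV 0
#define AREA(w, h) ((w) * (h))

/* As if from a hypothetical header2.h: same tokens, and white space
   in the same places, though spelled differently (extra blanks and a
   comment), so these are benign redefinitions. */
#define NULL_DEV    0
#define AREA(w, h) ((w)   * /* times */ (h))

int area_of(int w, int h)
{
    return AREA(w, h) + NULL_DEV;
}
```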

Existing implementations have differed on whether keywords can be redefined by macro definitions. The Committee has decided to allow this usage; it sees such redefinition as useful during the transition from existing to Standard-conforming translators.

These definitions illustrate possible uses:

        # define char   signed char
        # define sizeof (int) sizeof
        # define const

The first case might be useful in moving extant code from a signed-char implementation to one in which char is unsigned. The second case might be useful in adapting code which assumes that sizeof results in an int value. The redefinition of const could be useful in retrofitting more modern C code to an older implementation.

As with any other powerful language feature, keyword redefinition is subject to abuse. Users cannot expect any meaningful behavior to come about from source files starting with

        #define int double
        #include <stdio.h>

or similar subversions of common sense.

3.8.3.1 Argument substitution

3.8.3.2 The `#` operator

Some implementations have decided to replace identifiers found within a string literal if they match a macro argument name. The replacement text is a ``stringized'' form of the actual argument token sequence. This practice appears to be contrary to the definition, in K&R, of preprocessing in terms of token sequences. The Committee declined to elaborate the syntax of string literals to the point where this practice could be condoned. However, since the facility provided by this mechanism seems to be widely used, the Committee introduced a more tractable mechanism of comparable power.

The # operator has been introduced for stringizing. It may only be used in a #define expansion. It causes the formal parameter name following to be replaced by a string literal formed by stringizing the actual argument token sequence. In conjunction with string literal concatenation (see §3.1.4), use of this operator permits the construction of strings as effectively as by identifier replacement within a string. An example in the Standard illustrates this feature.

One problem with defining the effect of stringizing is the treatment of white space occurring in macro definitions. Where this could be discarded in the past, now as much as one logical line's worth (over 500 characters) may have to be retained. As a compromise between token-based and character-based preprocessing disciplines, the Committee decided to permit white space to be retained as one bit of information: none or one. Arbitrary white space is replaced in the string by one space character.

The remaining problem with stringizing was to associate a ``spelling'' with each token. (The problem arises in token-based preprocessors, which might, for instance, convert a numeric literal to a canonical or internal representation, losing information about base, leading 0's, etc.) In the interest of simplicity, the Committee decided that each token should expand to just those characters used to specify it in the original source text.
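
The rules for white space and spelling described above can be checked with a short sketch. The macro name STR and the three function names are assumptions for this example; each function returns 1 when the stringized result has the expected spelling.

```c
#include <string.h>

#define STR(x) #x

int whitespace_collapses(void)
{
    /* Runs of white space between argument tokens become one space;
       white space before the first and after the last token is dropped. */
    return strcmp(STR(  a   +   b  ), "a + b") == 0;
}

int spelling_is_kept(void)
{
    /* Each literal keeps its base and its leading zeros. */
    return strcmp(STR(0x1F), "0x1F") == 0 && strcmp(STR(007), "007") == 0;
}

int joins_with_literals(void)
{
    /* Adjacent string literals are concatenated after preprocessing. */
    return strcmp("n = " STR(10), "n = 10") == 0;
}
```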

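QUIET CHANGE

A macro that relies on formal parameter substitution within a string literal will produce different results.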

3.8.3.3 The `##` operator

Another facility relied on in much current practice but not specified in the Base Document is ``token pasting,'' or building a new token by macro argument substitution. One existing implementation is to replace a comment within a macro expansion by zero characters, instead of the single space called for in K&R. The Committee considered this practice unacceptable.

As with ``stringizing,'' the facility was considered desirable, but not the extant implementation of this facility, so the Committee invented another preprocessing operator. The ## operator within a macro expansion causes concatenation of the tokens on either side of it into a new composite token. The specification of this pasting operator is based on these principles:

Paste operations are explicit in the source.
The ## operator is associative.
A formal parameter as an operand for ## is not expanded before pasting. (The actual is substituted for the formal, but the actual is not expanded:
```
        #define a(n) aaa ## n
        #define b    2
```
Given these definitions, the expansion of a(b) is aaab, not aaa2 or aaan.)
A normal operand for ## is not expanded before pasting.
Pasting does not cross macro replacement boundaries.
The token resulting from a paste operation is subject to further macro expansion.

These principles codify the essential features of prior art, and are consistent with the specification of the stringizing operator.
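
Two of these principles can be exercised in a short sketch: an argument that is an operand of ## is not expanded before pasting, and the pasted token is subject to further expansion. The definitions of a and b are those given above; CAT, ANSWER, the variable aaab, and the function names are hypothetical additions.

```c
#define a(n) aaa ## n
#define b    2

/* CAT, ANSWER and the variable aaab are hypothetical names
   added for this example. */
#define CAT(x, y) x ## y
#define ANSWER 42

static int aaab = 7;

int paste_unexpanded_argument(void)
{
    return a(b);            /* pastes aaa and b: the identifier aaab */
}

int paste_then_rescan(void)
{
    return CAT(ANS, WER);   /* pastes to ANSWER, which then expands to 42 */
}

/* Keep the short macro names from leaking into later code. */
#undef a
#undef b
```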

3.8.3.4 Rescanning and further replacement

A problem faced by most current preprocessors is how to use a macro name in its expansion without suffering ``recursive death.'' The Committee agreed simply to turn off the definition of a macro for the duration of the expansion of that macro. An example of this feature is included in the Standard.
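
One common use of this rule is a macro that wraps a function of the same name. The sketch below is an illustration only; the counter calls and the functions square, traced_square, and call_count are hypothetical.

```c
static int calls;           /* hypothetical counter for this example */

static int square(int x)
{
    return x * x;
}

/* While this macro is being expanded, the name square inside its
   replacement list is not expanded again, so it refers to the
   function defined above. */
#define square(x) (calls++, square(x))

int traced_square(int x)
{
    return square(x);       /* expands to (calls++, square(x)) */
}

int call_count(void)
{
    return calls;
}
```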

The rescanning rules incorporate an ambiguity. Given the definitions

        #define  f(a)  a*g
        #define  g     f

it is clear (or at least unambiguous) that the expansion of f(2)(9) is 2*f(9) --- the f in the result clearly was introduced during the expansion of the original f, so is not further expanded.
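
The unambiguous case can be made concrete by supplying a real function named f for the leftover f(9) to call. In this sketch the function body and the name expand_unambiguous are assumptions added for illustration; the two macro definitions are the ones given above.

```c
/* A real function named f, defined before the macro of the same
   name; its body is a hypothetical choice for this example. */
static int f(int x)
{
    return x + 100;
}

#define  f(a)  a*g
#define  g     f

int expand_unambiguous(void)
{
    /* f(2) is replaced by 2*g, g is replaced by f, and that f is
       not expanded again, leaving 2*f(9): a call to the function. */
    return f(2)(9);
}

/* Keep the short macro names from leaking into later code. */
#undef f
#undef g
```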

However, given the definitions

        #define f(a)  a*g
        #define g(a)  f(a)

the expansion rules allow the result to be either 2*f(9) or 2*9*g --- it is unclear whether the f(9) token string (resulting from the initial expansion of f and the examination of the rest of the source file) should be considered as nested within the expansion of f or not. The Committee intentionally left this behavior ambiguous: it saw no useful purpose in specifying all the quirks of preprocessing for such questionably useful constructs.

3.8.3.5 Scope of macro definitions

Some pre-Standard implementations maintain a stack of #define instances for each identifier; #undef simply pops the stack. The Committee agreed that more than one level of #define was more likely to cause errors than to be useful.

It is explicitly permitted to #undef a macro that has no current definition. This capability is exploited in conjunction with the standard library (see §4.1.3).
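
A sketch of the library usage referred to here: after #undef, a library name can only refer to the actual function, whether or not the implementation also provided a macro. The names NEVER_DEFINED_HERE and is_digit_function are hypothetical.

```c
#include <ctype.h>

/* Harmless whether or not the implementation defines isdigit as a
   macro: afterwards the name can only refer to the library function. */
#undef isdigit

/* NEVER_DEFINED_HERE is a hypothetical name with no definition. */
#undef NEVER_DEFINED_HERE

int is_digit_function(int c)
{
    return isdigit(c) != 0;
}
```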

3.8.4 Line control

Aside from giving values to __LINE__ and __FILE__ (see §3.8.8), the effect of #line is unspecified. A good implementation will presumably provide line and file information in conjunction with most diagnostics.
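
The effect on __LINE__ and __FILE__ can be observed directly, as in this minimal sketch. The file name hypothetical_source.y and both function names are invented for the example; a program generator might emit such a directive to point diagnostics back at its own input.

```c
#include <string.h>

int line_after_directive(void)
{
#line 100 "hypothetical_source.y"
    return __LINE__;        /* this source line is now numbered 100 */
}

int file_after_directive(void)
{
    /* __FILE__ keeps the presumed name until the next #line. */
    return strcmp(__FILE__, "hypothetical_source.y") == 0;
}
```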

3.8.5 Error directive

The directive #error has been introduced to provide an explicit mechanism for forcing translation to fail under certain conditions. (Formally, the Standard requires, and can only require, that a diagnostic be issued when the #error directive is executed. It is the intent of the Committee, however, that translation cease immediately upon encountering this directive, if this is feasible in the implementation; further diagnostics on text beyond the directive are apt to be of little value.) Traditionally such failure has had to be forced by inserting text so ill-formed that the translator gagged on it.
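
A typical use guards an assumption the code depends on. The sketch below only shows the form: the condition is one that a conforming implementation is not expected to satisfy, and the function name int_is_wide_enough is hypothetical.

```c
#include <limits.h>

/* A conforming implementation always provides INT_MAX >= 32767, so
   this branch is not expected to be taken; it only shows the form. */
#if INT_MAX < 32767
#error int is narrower than 16 bits; this code cannot be translated
#endif

int int_is_wide_enough(void)
{
    return INT_MAX >= 32767;
}
```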

3.8.6 Pragma directive

The #pragma directive has been added as the universal method for extending the space of directives.

3.8.7 Null directive

The existing practice of using empty # lines for spacing is supported in the Standard.

3.8.8 Predefined macro names

The rule that these macros may not be redefined or undefined reduces the complexity of the name space that the programmer and implementor must understand; it recognizes that these macros have special built-in properties.

The macros __DATE__ and __TIME__ have been added to make available the time of translation. A particular format for the expansion of these macros has been specified to aid in parsing strings initialized by them.

The macros __LINE__ and __FILE__ have been added to give programmers access to the source line number and file name.

The macro __STDC__ allows for conditional translation on whether the translator claims to be standard-conforming or not. It is defined as having value 1; future versions of the Standard could define it as 2, 3, ..., to allow for conditional compilation on which version of the Standard a translator conforms to. This macro should be of use in the transition toward conformance to the Standard.
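
The predefined macros can be examined with a short sketch such as the following; all four function names are hypothetical. The lengths checked correspond to the formats the Standard specifies, "Mmm dd yyyy" for __DATE__ and "hh:mm:ss" for __TIME__.

```c
#include <string.h>

/* The fixed formats are "Mmm dd yyyy" for __DATE__ (11 characters)
   and "hh:mm:ss" for __TIME__ (8 characters). */
int date_length(void) { return (int)strlen(__DATE__); }
int time_length(void) { return (int)strlen(__TIME__); }

/* Returns the value of __STDC__, or 0 where the translator does
   not define it. */
int stdc_value(void)
{
#ifdef __STDC__
    return __STDC__;
#else
    return 0;
#endif
}

/* __LINE__ grows by one from one source line to the next. */
int lines_are_consecutive(void)
{
    int first = __LINE__;
    int second = __LINE__;
    return second == first + 1;
}
```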
