For an overview of the philosophy behind the preprocessor, see §2.1.1.2.
Different implementations have had different notions about whether
white space is permissible before and/or after the # signalling a preprocessor line.
The Committee decided to allow any white space before the #,
and horizontal white space (spaces or tabs)
between the # and the directive,
since the white space introduces no ambiguity,
causes no particular processing problems,
and allows maximum flexibility in coding style.
Note that similar considerations apply for comments,
which are reduced to white space early in the phases of
translation (§2.1.1.2):

    /* here a comment */ #if BLAH
    #/* there a comment */ if BLAH
    # if /* every-
           where a comment */ BLAH

The lines all illustrate legitimate placement of comments.
For a discussion of evaluation of expressions following #if, see §3.4.
The operator defined has been added
to make it possible to write
boolean combinations of defined flags with
one another and with other inclusion conditions.
If the identifier defined were to be defined as a macro,
defined(X) would mean the macro expansion in C text proper
and the operator expression in a preprocessing directive
(or else that the operator would no longer be available).
To avoid this problem, such a definition is not permitted (§3.8.8).
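As a minimal sketch of the boolean combinations described above, the following fragment mixes defined with an ordinary integer test in a single #if. The flag names HAVE_FAST_PATH, HAVE_LEGACY_IO and BUFFER_KB, and the function io_strategy, are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Hypothetical configuration flags, invented for this illustration. */
#define HAVE_FAST_PATH
#define BUFFER_KB 64
/* HAVE_LEGACY_IO is deliberately left undefined. */

/* defined combines with !, && and ordinary integer tests in one #if. */
#if defined(HAVE_FAST_PATH) && !defined(HAVE_LEGACY_IO) && BUFFER_KB >= 32
#define IO_STRATEGY 1   /* fast path with a large buffer */
#else
#define IO_STRATEGY 0   /* conservative fallback */
#endif

/* Returns the strategy selected at translation time. */
int io_strategy(void)
{
    assert(IO_STRATEGY == 0 || IO_STRATEGY == 1);
    return IO_STRATEGY;
}
```

With #ifdef alone, the same condition would need three nested conditionals.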
#elif has been added to minimize the stacking of
#endif directives in multi-way conditionals.
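A multi-way conditional written with #elif might look like the sketch below. TARGET, TARGET_NAME and target_name are hypothetical names, and the values 1 to 3 are arbitrary.

```c
#include <assert.h>
#include <string.h>

/* TARGET is a hypothetical selector; the values 1..3 are arbitrary. */
#define TARGET 2

/* One #if ... #endif pair covers all alternatives. Without #elif each
   extra case would need its own nested #if and a matching #endif. */
#if TARGET == 1
#define TARGET_NAME "small"
#elif TARGET == 2
#define TARGET_NAME "medium"
#elif TARGET == 3
#define TARGET_NAME "large"
#else
#define TARGET_NAME "unknown"
#endif

/* Returns the name chosen at translation time. */
const char *target_name(void)
{
    assert(strlen(TARGET_NAME) > 0);
    return TARGET_NAME;
}
```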
Processing of skipped material is defined such that an implementation
need only examine a logical line for the # and then for a directive name.
Thus, assuming that xxx is undefined, in this example:

    # ifndef xxx
    # define xxx "abc"
    # elif xxx > 0
        /* ... */
    # endif

an implementation is not required to diagnose an error for the
#elif directive,
even though, if it were processed,
a syntactic error would be detected.
Various proposals were considered for permitting text other than comments
at the end of directives,
particularly #endif and #else,
presumably to label them for easier matchup with their corresponding
#if directives.
The Committee rejected all such proposals
because of the difficulty of specifying exactly what would be permitted,
and how the translator would have to process it.
Various proposals were considered for permitting additional unary
expressions to be used for the purpose of testing for the system type,
testing for the presence of a file before #include,
and other extensions to the preprocessing language.
These proposals were all rejected on the grounds of insufficient
prior art and/or insufficient utility.
Specification of the #include directive raises distinctive grammatical problems
because the file name is conventionally parsed quite differently than an
``ordinary'' token sequence.
The file name in the #include (and #line) directive,
if it does not begin with " or <,
is macro expanded prior to execution of the directive.
Allowing macros in the include
directive facilitates the parameterization of include file names,
an important issue in transportability.
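A minimal sketch of such parameterization follows. STRING_HEADER and name_length are hypothetical names. The assumption is that a build could define STRING_HEADER differently, for example on the translator's command line, to select another header.

```c
#include <assert.h>

/* STRING_HEADER is a hypothetical name. A build could define it
   differently to pick another header. */
#ifndef STRING_HEADER
#define STRING_HEADER <string.h>
#endif

/* The directive does not begin with " or <, so STRING_HEADER is macro
   expanded first and the result is treated as the header name. */
#include STRING_HEADER

/* Uses strlen from whichever header was selected. */
unsigned long name_length(const char *name)
{
    assert(name != 0);
    return (unsigned long)strlen(name);
}
```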
The file search rules for the #include
directive were left as implementation-defined.
The Standard intends that the rules which are eventually provided by
the implementor correspond as closely as possible to the original K&R rules.
The primary reason that explicit rules were not included in the Standard
is the infeasibility of describing a portable file system structure.
It was considered unacceptable to include UNIX-like directory rules due
to significant differences between this structure and other popular
commercial file system structures.
Nested include files raise an issue of interpreting the file search rules.
In UNIX C an include statement found within an include file
entails a search for the named file relative to the file system
directory that holds the outer #include.
Other implementations, including the earlier UNIX C described in K&R,
always search relative to the same current directory.
The Committee decided, in principle, in favor of the K&R approach,
but was unable to provide explicit search rules as explained above.
The Standard specifies a set of include file names which must map onto distinct host file names. In the absence of such a requirement, it would be impossible to write portable programs using include files.
§2.2.4.1 on translation limits specifies the required number of nesting levels for include files. The limits chosen were intended to reflect reasonable needs for users constrained by reasonable system resources available to implementors.
By defining a failure to read an include file as a syntax error,
the Standard requires that the failure be diagnosed.
More than one proposal was presented for some form of conditional include,
or a directive such as #ifincludable,
but none were accepted by the Committee
due to lack of prior art.
The specification of macro definition and replacement in the Standard was based on these principles:
However, the new-line character must be a token during preprocessing,
because the preprocessing grammar is line-oriented.
The presence or absence of white space is also important in several contexts,
such as between the macro name and a following
parenthesis in a #define directive.
To avoid overly constraining the implementation,
the Standard allows the preservation of each white space character
(which is easy for a text-to-text pre-pass)
or the mapping of white space into a single ``white space'' token
(which is easier for token-oriented translators).
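The significance of white space between the macro name and a following parenthesis can be sketched as follows. TWICE, OFFSET and spacing_demo are hypothetical names chosen for this illustration.

```c
#include <assert.h>

/* No space before "(": TWICE is a function-like macro with parameter x. */
#define TWICE(x) ((x) + (x))

/* Space before "(": OFFSET is an object-like macro whose replacement
   list is the token sequence (n) + 1. */
#define OFFSET (n) + 1

int spacing_demo(void)
{
    int n = 4;
    int a = TWICE(n);   /* ((n) + (n)) */
    int b = OFFSET;     /* (n) + 1, using the local n */

    assert(a == 8);
    return b;
}
```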
The Committee desired to disallow ``pernicious redefinitions'' such as

    (in header1.h)
        #define NBUFS 10
    (in header2.h)
        #define NBUFS 12

which are clearly invitations to serious bugs in a program. There remained, however, the question of ``benign redefinitions,'' such as

    (in header1.h)
        #define NULL_DEV 0
    (in header2.h)
        #define NULL_DEV 0

The Committee concluded that safe programming practice is better served by allowing benign redefinition where the definitions are the same. This allows independent headers to specify their understanding of the proper value for a symbol of interest to each, with diagnostics generated only if the definitions differ.
The definitions are considered ``the same'' if the identifier-lists, token sequences, and occurrences of white-space (ignoring the spelling of white-space) in the two definitions are identical.
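The sketch below shows redefinitions that count as ``the same'' under this rule. MASK and null_dev are hypothetical names, and the two copies of each definition stand in for definitions arriving from different headers.

```c
#include <assert.h>

/* Imagine each pair of definitions arriving from a different header. */
#define NULL_DEV 0
#define NULL_DEV 0          /* identical: a benign redefinition */

#define MASK(x) ((x) & 0xff)
#define MASK(x)   ((x)   &   0xff)   /* white space in the same places,
                                        spelled differently: still the same */

/* A later "#define NULL_DEV 1" would differ and call for a diagnostic. */

int null_dev(void)
{
    assert(MASK(0x1ff) == 0xff);
    return NULL_DEV;
}
```

Writing the second MASK as ((x)&0xff) would not qualify, because the white space around & would then be missing.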
Existing implementations have differed on whether keywords can be redefined by macro definitions. The Committee has decided to allow this usage; it sees such redefinition as useful during the transition from existing to Standard-conforming translators.
These definitions illustrate possible uses:

    # define char signed char
    # define sizeof (int) sizeof
    # define const

The first case might be useful in moving extant code from a signed-char implementation to one in which char is unsigned.
The second case might be useful in adapting code which assumes that
sizeof results in an int value.
The redefinition of const
could be useful in
retrofitting more modern C code to an older implementation.
As with any other powerful language feature, keyword redefinition is subject to abuse. Users cannot expect any meaningful behavior to come about from source files starting with

    #define int double
    #include <stdio.h>

or similar subversions of common sense.
The # operator

Some implementations have decided to replace identifiers found within a string literal if they match a macro argument name. The replacement text is a ``stringized'' form of the actual argument token sequence. This practice appears to be contrary to the definition, in K&R, of preprocessing in terms of token sequences. The Committee declined to elaborate the syntax of string literals to the point where this practice could be condoned. However, since the facility provided by this mechanism seems to be widely used, the Committee introduced a more tractable mechanism of comparable power.
The # operator has been introduced for stringizing.
It may only be used in a #define expansion.
It causes the formal parameter name following to be replaced by
a string literal formed by stringizing the actual argument token
sequence.
In conjunction with string literal concatenation (see §3.1.4),
use of this operator
permits the construction of strings as effectively as by identifier
replacement within a string.
An example in the Standard illustrates this feature.
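As a small sketch (not the example from the Standard), stringizing combined with string literal concatenation can build a format string from an argument's spelling. FORMAT_INT and stringize_demo are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* # turns the tokens of the argument into a string literal. The adjacent
   literal " = %d" is then concatenated with it into one format string. */
#define FORMAT_INT(expr) #expr " = %d"

/* Returns 1 if formatting 3 through the macro yields "count = 3". */
int stringize_demo(void)
{
    char buf[32];
    int count = 3;

    /* "count" " = %d" becomes the single literal "count = %d". */
    assert(strcmp(FORMAT_INT(count), "count = %d") == 0);

    sprintf(buf, FORMAT_INT(count), count);
    return strcmp(buf, "count = 3") == 0;
}
```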
One problem with defining the effect of stringizing is the treatment of white space occurring in macro definitions. Where this could be discarded in the past, now upwards of one logical line worth (over 500 characters) may have to be retained. As a compromise between token-based and character-based preprocessing disciplines, the Committee decided to permit white space to be retained as one bit of information: none or one. Arbitrary white space is replaced in the string by one space character.
The remaining problem with stringizing was to associate a ``spelling'' with each token. (The problem arises in token-based preprocessors, which might, for instance, convert a numeric literal to a canonical or internal representation, losing information about base, leading 0's, etc.) In the interest of simplicity, the Committee decided that each token should expand to just those characters used to specify it in the original source text.
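Both decisions can be observed with a one-line stringizing macro. STR and spelling_demo are hypothetical names used only in this sketch.

```c
#include <assert.h>
#include <string.h>

#define STR(x) #x

/* Returns 1 if numeric tokens keep their source spelling. */
int spelling_demo(void)
{
    /* Runs of white space between tokens, including the new-line,
       collapse to one space. Leading and trailing white space is dropped. */
    const char *spaced = STR(  a   +
                               b  );
    /* Each token keeps its source spelling: base and leading zeros survive. */
    const char *hex = STR(0x0010);
    const char *oct = STR(007);

    assert(strcmp(spaced, "a + b") == 0);
    return strcmp(hex, "0x0010") == 0 && strcmp(oct, "007") == 0;
}
```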
The ## operator

Another facility relied on in much current practice but not specified in the Base Document is ``token pasting,'' or building a new token by macro argument substitution. One existing implementation is to replace a comment within a macro expansion by zero characters, instead of the single space called for in K&R. The Committee considered this practice unacceptable.
As with ``stringizing,''
the facility was considered desirable,
but not the extant implementation of this facility,
so the Committee invented another preprocessing operator.
The ## operator within a macro expansion causes concatenation
of the tokens on either side of it into a new composite token.
The specification of this pasting operator is based on these
principles:
* The ## operator is associative.
* A formal parameter as an operand for ## is not expanded
  before pasting. (The actual is substituted for the formal,
  but the actual is not expanded:

      #define a(n) aaa ## n
      #define b 2

  Given these definitions, the expansion of a(b) is aaab, not aaa2 or aaan.)
* A normal operand for ## is not expanded before pasting.
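The a(b) case can be checked directly by declaring variables with the candidate names. This is a sketch: paste_demo and the variable values are invented for the illustration, and the macros are undefined at the end only to keep the short names a and b from affecting later code.

```c
#include <assert.h>

#define a(n) aaa ## n
#define b 2

int paste_demo(void)
{
    int aaab = 10;   /* the name a(b) actually produces */
    int aaa2 = 20;   /* what it would be if b were expanded first */

    assert(aaa2 == 20);
    return a(b);     /* pastes aaa and b into the identifier aaab */
}

#undef a   /* keep these short names from affecting later code */
#undef b
```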
A problem faced by most current preprocessors is how to use a macro name in its expansion without suffering ``recursive death.'' The Committee agreed simply to turn off the definition of a macro for the duration of the expansion of that macro. An example of this feature is included in the Standard.
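A common use of this rule is to wrap a function in a macro of the same name. The following is a sketch, not the example from the Standard. The names square, square_calls and self_reference_demo are hypothetical.

```c
#include <assert.h>

static int square_calls = 0;   /* counts uses of the wrapper macro */

/* The real function behind the name. */
static int square(int x)
{
    return x * x;
}

/* The macro reuses its own name. While square is being replaced its
   definition is turned off, so the inner square(x) is not expanded
   again and refers to the function above. */
#define square(x) (++square_calls, square(x))

int self_reference_demo(void)
{
    int before = square_calls;
    int nine = square(3);
    int sixteen = square(4);

    assert(square_calls == before + 2);
    return nine + sixteen;
}
```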
The rescanning rules incorporate an ambiguity. Given the definitions

    #define f(a) a*g
    #define g f

it is clear (or at least unambiguous) that the expansion of
f(2)(9) is 2*f(9):
the f in the result clearly
was introduced during the expansion of the original f, so
is not further expanded.
However, given the definitions

    #define f(a) a*g
    #define g(a) f(a)

the expansion rules allow the result to be either
2*f(9) or 2*9*g:
it is unclear whether the f(9) token string
(resulting from the initial expansion of f and the examination
of the rest of the source file) should be considered as nested
within the expansion of f or not.
The Committee intentionally left this behavior ambiguous:
it saw no useful purpose in specifying all the quirks of preprocessing
for such questionably useful constructs.
Some pre-Standard implementations maintain a stack of #define
instances for each identifier;
#undef simply pops the stack.
The Committee agreed that more than one level of #define
was more likely to cause errors than to be useful.
It is explicitly permitted to #undef
a macro that has no current definition.
This capability is exploited in conjunction with the standard library
(see §4.1.3).
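A sketch of that use follows. It assumes only that a library may supply toupper as a macro in addition to the function. NOT_DEFINED_ANYWHERE and upper_demo are made-up names.

```c
#include <assert.h>
#include <ctype.h>

/* Permitted even though NOT_DEFINED_ANYWHERE (a made-up name) has no
   current definition. */
#undef NOT_DEFINED_ANYWHERE

/* A library may supply toupper as a macro in addition to the function.
   Because #undef of an undefined name is harmless, this line is safe
   whether or not such a macro exists, and it ensures that the name
   below refers to the actual function. */
#undef toupper

int upper_demo(void)
{
    int (*convert)(int) = toupper;   /* needs a real function */

    assert(convert != 0);
    return convert('a');
}
```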
Aside from giving values to __LINE__ and __FILE__ (see §3.8.8),
the effect of #line is unspecified.
A good implementation will presumably provide line
and file information in conjunction with most diagnostics.
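The effect on __LINE__ and __FILE__ can be sketched as follows. The line number 1000, the file name "generated.y" and the function line_demo are arbitrary choices for this illustration.

```c
#include <assert.h>
#include <string.h>

int line_demo(void)
{
    int line;
    const char *file;

/* The number and the file name are arbitrary, chosen for illustration. */
#line 1000 "generated.y"
    line = __LINE__;   /* the line after the directive is line 1000 */
    file = __FILE__;   /* now "generated.y" */

    assert(strcmp(file, "generated.y") == 0);
    return line;
}
```

Program generators typically use the directive this way so that diagnostics refer to the generator's input instead of the generated C text.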
The directive #error has been introduced to provide an explicit mechanism for
forcing translation to fail
under certain conditions.
(Formally, the Standard requires, and can only require, that
a diagnostic be issued
when the #error directive is effected.
It is the intent of the Committee, however,
that translation cease immediately upon encountering this directive,
if this is feasible in the implementation;
further diagnostics on text beyond the directive are apt to be
of little value.)
Traditionally such failure has had to be forced by inserting text
so ill-formed that the translator gagged on it.
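A typical guard might be sketched as below. The assumption of 8-bit bytes and the helper byte_of are hypothetical, chosen only to give the directive something to protect.

```c
#include <assert.h>
#include <limits.h>

/* The code below assumes 8-bit bytes. If that does not hold, stop
   translation with a readable message instead of relying on
   deliberately ill-formed text. */
#if CHAR_BIT != 8
#error "this module assumes CHAR_BIT == 8"
#endif

/* Extracts byte number index (0 = least significant) from value. */
unsigned byte_of(unsigned long value, unsigned index)
{
    assert(index < sizeof value);
    return (unsigned)((value >> (index * CHAR_BIT)) & 0xffu);
}
```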
The #pragma directive has been added as the universal method for extending
the space of directives.
The existing practice of using empty # lines
for spacing is supported in the Standard.
The rule that these macros may not be redefined or undefined reduces the complexity of the name space that the programmer and implementor must understand; it recognizes that these macros have special built-in properties.
The macros __DATE__ and __TIME__
have been added to make available the time of translation.
A particular format for the expansion of these macros has been specified
to aid in parsing strings initialized by them.
The macros __LINE__ and __FILE__ have been added
to give programmers access to the source line number and file name.
The macro __STDC__
allows for conditional translation on whether the translator claims
to be standard-conforming or not.
It is defined as having value 1;
future versions of the Standard could define it as 2, 3, ...,
to allow for conditional compilation on which version of the Standard
a translator conforms to.
This macro should be of use in the transition toward conformance
to the Standard.
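The predefined macros can be exercised as in the sketch below. CLAIMS_CONFORMANCE, build_stamp_is_well_formed and claims_conformance are hypothetical names. The sketch relies on the specified expansions "Mmm dd yyyy" for __DATE__ and "hh:mm:ss" for __TIME__.

```c
#include <assert.h>
#include <string.h>

/* A translator that claims conformance defines __STDC__ as 1. */
#if defined(__STDC__) && __STDC__ == 1
#define CLAIMS_CONFORMANCE 1
#else
#define CLAIMS_CONFORMANCE 0
#endif

/* __DATE__ has the fixed form "Mmm dd yyyy" and __TIME__ "hh:mm:ss",
   so their lengths and separators are known in advance. */
int build_stamp_is_well_formed(void)
{
    const char *date = __DATE__;
    const char *time_of_day = __TIME__;

    return strlen(date) == 11 && strlen(time_of_day) == 8
        && time_of_day[2] == ':' && time_of_day[5] == ':';
}

/* Reports whether the translator claimed conformance. */
int claims_conformance(void)
{
    assert(__LINE__ > 0);
    return CLAIMS_CONFORMANCE;
}
```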