Several closely-related topics are involved in the precise specification of expression evaluation: precedence, associativity, grouping, sequence points, agreement points, order of evaluation, and interleaving. The latter three terms are discussed in §2.1.2.3.
The rules of precedence are encoded into the syntactic rules for each operator. For example, the syntax for additive-expression includes the rule
    additive-expression + multiplicative-expression
which implies that a+b*c parses as a+(b*c).
The rules of associativity are similarly encoded into the syntactic rules. For example, the syntax for assignment-expression includes the rule
    unary-expression assignment-operator assignment-expression
which implies that a=b=c parses as a=(b=c).
With rules of precedence and associativity thus embodied in the syntax rules, the Standard specifies, in general, the grouping (association of operands with operators) in an expression.
The Base Document describes C as a language in which the operands of successive identical commutative associative operators can be regrouped. The Committee has decided to remove this license from the Standard, thus bringing C into accord with most other major high-level languages.
This change was motivated primarily by the desire to make C more suitable for floating point programming. Floating point arithmetic does not obey many of the mathematical rules that real arithmetic does. For instance, the two expressions (a+b)+c and a+(b+c) may well yield different results: suppose that b is greater than 0, a equals -b, and c is positive but substantially smaller than b. (That is, suppose c/b is less than DBL_EPSILON.) Then (a+b)+c is 0+c, or c, while a+(b+c) equals a+b, or 0. That is to say, floating point addition (and multiplication) is not associative.
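The effect is easy to reproduce. The following is a minimal sketch, assuming IEEE 754 double arithmetic with round-to-nearest; the function names are illustrative only, and the volatile temporaries are there merely to force each inner sum to be rounded to double before it is used.

```c
#include <float.h>

/* Computes (a+b)+c, rounding the inner sum to double first. */
double sum_left(double a, double b, double c)
{
    volatile double t = a + b;
    return t + c;
}

/* Computes a+(b+c), rounding the inner sum to double first. */
double sum_right(double a, double b, double c)
{
    volatile double t = b + c;
    return a + t;
}
```

Under these assumptions, with b equal to 1.0, a equal to -1.0, and c equal to DBL_EPSILON/4, sum_left yields c while sum_right yields 0.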
The Base Document's rule imposes a high cost on translation of numerical code to C. Much numerical code is written in FORTRAN, which does provide a no-regrouping guarantee; indeed, this is the normal semantic interpretation in most high-level languages other than C. The Base Document's advice, ``rewrite using explicit temporaries,'' is burdensome to those with tens or hundreds of thousands of lines of code to convert, a conversion which in most other respects could be done automatically.
Elimination of the regrouping rule does not in fact prohibit much regrouping of integer expressions. The bitwise logical operators can be arbitrarily regrouped, since any regrouping gives the same result as if the expression had not been regrouped. This is also true of integer addition and multiplication in implementations with twos-complement arithmetic and silent wraparound on overflow. Indeed, in any implementation, regroupings which do not introduce overflows behave as if no regrouping had occurred. (Results may also differ in such an implementation if the expression as written results in overflows: in such a case the behavior is undefined, so any regrouping couldn't be any worse.)
The types of lvalues that may be used to access an object have been restricted so that an optimizer is not required to make worst-case aliasing assumptions.
In practice, aliasing arises with the use of pointers. A contrived example to illustrate the issues is
    int a;
    void f(int * b)
    {
        a = 1;
        *b = 2;
        g(a);
    }
It is tempting to generate the call to g as if the source expression were g(1), but b might point to a, so this optimization is not safe.
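The hazard can be observed directly. In this sketch the call g(a) is replaced by a return of a, so that the value the call would have received is visible to the caller; the names store_then_read and unrelated are hypothetical.

```c
int a;
int unrelated;

/* Sets a, stores through b, then reports the value that g(a)
   would have received at that point. */
int store_then_read(int *b)
{
    a = 1;
    *b = 2;
    return a;
}
```

A call store_then_read(&a) returns 2, while store_then_read(&unrelated) returns 1; a translator that substituted the constant 1 would get the first case wrong.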
On the other hand, consider
    int a;
    void f( double * b )
    {
        a = 1;
        *b = 2.0;
        g(a);
    }
Again the optimization is incorrect only if b points to a. However, this would only have come about if the address of a were somewhere cast to (double*). The Committee has decided that such dubious possibilities need not be allowed for.
In principle, then, aliasing only need be allowed for when the lvalues all have the same type. In practice, the Committee has recognized certain prevalent exceptions:
    struct fi { float f; int i; };
    void f( struct fi * fip, int * ip )
    {
        static struct fi a = {2.0, 1};
        *ip = 2;
        *fip = a;
        g(*ip);
        *fip = a;
        *ip = 2;
        g(fip->i);
    }
It is not safe to optimize the first call to g as g(2), or the second as g(1), since the call to f could quite legitimately have been
    struct fi x;
    f( &x, &x.i );
These observations explain the other exception to the same-type principle.
A primary expression may be void (parenthesized call to a function returning void), a function designator (identifier or parenthesized function designator), an lvalue (identifier or parenthesized lvalue), or simply a value expression. Constraints ensure that a void primary expression is no part of a further expression, except that a void expression may be cast to void, may be the second or third operand of a conditional operator, or may be an operand of a comma operator.
The Committee found no reason to disallow the symmetry that permits a[i] to be written as i[a].
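The symmetry follows from the definition of E1[E2] as *((E1)+(E2)), in which the addition is commutative. A brief illustration, with hypothetical names:

```c
const int sample[] = {10, 20, 30};

/* Both functions designate the same element of a. */
int conventional(const int *a, int i) { return a[i]; }
int reversed(const int *a, int i)     { return i[a]; }
```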
The syntax and semantics of multidimensional arrays follow logically from the definition of arrays and the subscripting operation. The material in the Standard on multidimensional arrays introduces no new language features, but clarifies the C treatment of this important abstract data type.
Pointers to functions may be used either as (*pf)() or as pf(). The latter construct, not sanctioned in the Base Document, appears in some present versions of C, is unambiguous, invalidates no old code, and can be an important shorthand. The shorthand is useful for packages that present only one external name, which designates a structure full of pointers to objects and functions: member functions can be called as graphics.open(file) instead of (*graphics.open)(file).
The treatment of function designators can lead to some curious, but valid, syntactic forms. Given the declarations:
    int f(), (*pf)();
then all of the following expressions are valid function calls:
    (&f)(); f(); (*f)(); (**f)(); (***f)();
    pf(); (*pf)(); (**pf)(); (***pf)();
The first expression on each line was discussed in the previous paragraph. The second is conventional usage. All subsequent expressions take advantage of the implicit conversion of a function designator to a pointer value, in nearly all expression contexts. The Committee saw no real harm in allowing these forms; outlawing forms like (*f)(), while still permitting *a (for int a[]), simply seemed more trouble than it was worth.
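That all nine forms denote the same call can be checked with a small sketch. It uses prototypes rather than the old-style declarations shown above, and the function body and the name call_all_forms are assumptions made for illustration.

```c
static int f(void) { return 7; }
static int (*pf)(void) = f;

/* Calls f through every form listed above and sums the results. */
int call_all_forms(void)
{
    int n = 0;

    n += (&f)(); n += f();     n += (*f)();   n += (**f)();   n += (***f)();
    n += pf();   n += (*pf)(); n += (**pf)(); n += (***pf)();
    return n;
}
```

Each of the nine calls returns 7, so the sum is 63.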
The rule for implicit declaration of functions has been retained, but various past ambiguities have been resolved by describing this usage in terms of a corresponding explicit declaration.
For compatibility with past practice, all argument promotions occur as described in the Base Document in the absence of a prototype declaration, including the (not always desirable) promotion of float to double. A prototype gives the implementor explicit license to pass a float as a float rather than a double, or a char as a char rather than an int, or an argument in a special register, etc. If the definition of a function in the presence of a prototype would cause the function to expect other than the default promotion types, then clearly the calls to this function must be made in the presence of a compatible prototype.
To clarify this and other relationships between function calls and function definitions, the Standard describes an equivalence between a function call or definition which does occur in the presence of a prototype and one that does not.
Thus a prototyped function with no ``narrow'' types and no variable argument list must be callable in the absence of a prototype, since the types actually passed in a call are equivalent to the explicit function definition prototype. This constraint is necessary to retain compatibility with past usage of library functions. (See §4.1.3.)
This provision constrains the latitude of an implementor because the parameter passing conventions of prototype and non-prototype function calls must be the same for functions accepting a fixed number of arguments. Implementations in environments where efficient function calling mechanisms are available must, in effect, use the efficient calling sequence either in all ``fixed argument list'' calls or in none. Since efficient calling sequences often do not allow for variable argument functions, the fixed part of a variable argument list may be passed in a completely different fashion than in a fixed argument list with the same number and type of arguments.
The existing practice of omitting trailing parameters in a call if it is known that the parameters will not be used has consistently been discouraged. Since omission of such parameters creates an inequivalence between the call and the declaration, the behavior in such cases is undefined, and a maximally portable program will avoid this usage. Hence an implementation is free to implement a function calling mechanism for fixed argument lists which would (perhaps fatally) fail if the wrong number or type of arguments were to be provided.
Strictly speaking then, calls to printf are obliged to be in the scope of a prototype (as by #include <stdio.h>), but implementations are not obliged to fail on such a lapse. (The behavior is undefined.)
Since the language now permits structure parameters, structure assignment and functions returning structures, the concept of a structure expression is now part of the C language. A structure value can be produced by an assignment, by a function call, by a comma operator expression or by a conditional operator expression:
    s1 = (s2 = s3)
    sf(x)
    (x, s1)
    x ? s1 : s2
In these cases, the result is not an lvalue; hence it cannot be assigned to nor can its address be taken.
Similarly, x.y is an lvalue only if x is an lvalue. Thus none of the following valid expressions are lvalues:
    sf(3).a
    (s1=s2).a
    ((i==6)?s1:s2).a
    (x,s1).a
Even when x.y is an lvalue, it may not be modifiable:
    const struct S s1;
    s1.a = 3; /* invalid */
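A short sketch of structure values in use follows. The definitions of struct S and sf are assumptions, since the text leaves them unspecified; the members of such values can be read but not assigned to.

```c
struct S { int a; };

/* Hypothetical function returning a structure by value. */
struct S sf(int x)
{
    struct S s;
    s.a = x;
    return s;
}

int member_of_call(void)
{
    /* sf(3).a = 4;  would be invalid: sf(3).a is not an lvalue */
    return sf(3).a;
}

int member_of_conditional(int i)
{
    struct S s1, s2;
    s1.a = 1;
    s2.a = 2;
    return ((i == 6) ? s1 : s2).a;
}
```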
The Standard requires that an implementation diagnose a constraint error in the case that the member of a structure or union designated by the identifier following a member selection operator (. or ->) does not appear in the type of the structure or union designated by the first operand. The Base Document is unclear on this point.
The Committee has not endorsed the practice in some implementations of considering post-increment and post-decrement operator expressions to be lvalues.
See §3.3.2.4.
Some implementations have not allowed the & operator to be applied to an array or a function. (The construct was permitted in early versions of C, then later made optional.) The Committee has endorsed the construct since it is unambiguous, and since data abstraction is enhanced by allowing the important & operator to apply uniformly to any addressable entity.
Unary plus was adopted by the Committee from several implementations, for symmetry with unary minus.
The bitwise complement operator ~, and the other bitwise operators, have now been defined arithmetically for unsigned operands. Such operations are well-defined because of the restriction of integral representations to ``binary numeration systems.'' (See §3.1.2.5.)
sizeof operator
It is fundamental to the correct usage of functions such as malloc and fread that sizeof(char) be exactly one. In practice, this means that a byte in C terms is the smallest unit of storage, even if this unit is 36 bits wide; and all objects are composed of an integral number of these smallest units. (See §1.6.)
The Standard, like the Base Document, defines the result of the sizeof operator to be a constant of an unsigned integral type. Common implementations, and common usage, have often presumed that the resulting type is int. Old code that depends on this behavior has never been portable to implementations that define the result to be a type other than int. The Committee did not feel it was proper to change the language to protect incorrect code.
The type of sizeof, whatever it is, is published (in the library header <stddef.h>) as size_t, since it is useful for the programmer to be able to refer to this type. This requirement implicitly restricts size_t to be a synonym for an existing unsigned integer type, thus quashing any notion that the largest declarable object might be too big to span even with an unsigned long. This also restricts the maximum number of elements that may be declared in an array, since for any array a of N elements,
    N == sizeof(a)/sizeof(a[0])
Thus size_t is also a convenient type for array sizes, and is so used in several library functions. (See §4.9.8.1, §4.9.8.2, §4.10.3.1, etc.)
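The identity above is commonly packaged as a macro. The following sketch uses a hypothetical macro name, COUNT_OF, and an arbitrary array; the macro is meaningful only when applied to an array, not to a pointer.

```c
#include <stddef.h>

/* Number of elements in an array; the result has type size_t. */
#define COUNT_OF(arr) (sizeof(arr) / sizeof((arr)[0]))

static double table[25];

size_t table_count(void)
{
    return COUNT_OF(table);
}
```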
The Standard specifies that the argument to sizeof can be any value except a bit field, a void expression, or a function designator. This generality allows for interesting environmental enquiries; given the declarations
    int *p, *q;
these expressions determine the size of the type used for ...
    sizeof(F(x))  /* ... F's return value */
    sizeof(p-q)   /* ... pointer difference */
(The last type is of course available as ptrdiff_t in <stddef.h>.)
A (void) cast is explicitly permitted, more for documentation than for utility.
Nothing portable can be said about casting integers to pointers, or vice versa, since the two are now incommensurate.
The definition of these conversions adopted in the Standard resembles that in the Base Document, but with several significant differences. The Base Document required that a pointer successfully converted to an integer must be guaranteed to be convertible back to the same pointer. This integer-to-pointer conversion is now specified as implementation-defined. While a high-quality implementation would preserve the same address value whenever possible, it was considered impractical to require that the identical representation be preserved. The Committee noted that, on some current machine implementations, identical representations are required for efficient code generation for pointer comparisons and arithmetic operations.
The conversion of the integer constant 0 to a pointer is defined similarly to the Base Document. The resulting pointer must not address any object, must appear to be equal to an integer value of 0, and may be assigned to or compared for equality with any other pointer. This definition does not necessarily imply a representation by a bit pattern of all zeros: an implementation could, for instance, use some address which causes a hardware trap when dereferenced.
The type char must have the least strict alignment of any type, so char * has often been used as a portable type for representing arbitrary object pointers. This usage creates an unfortunate confusion between the ideas of arbitrary pointer and character or string pointer. The new type void *, which has the same representation as char *, is therefore preferable for arbitrary pointers.
It is possible to cast a pointer of some qualified type (§3.5.3) to an unqualified version of that type. Since the qualifier defines some special access or aliasing property, however, any dereference of the cast pointer results in undefined behavior.
The Standard (§3.2.1.4) requires that a cast of one floating point type to another (e.g., double to float) results in an actual conversion.
There was considerable sentiment for giving more portable semantics to division (and hence remainder) by specifying some way of giving less machine dependent results for negative operands. Few Committee members wanted to require this by default, lest existing fast code be gravely slowed. One suggestion was to make signed int a type distinct from plain int, and require better-defined semantics for signed int division and remainder. This suggestion was opposed on the grounds that effectively adding several types would have consequences out of proportion to the benefit to be obtained; the Committee twice rejected this approach. Instead the Committee has adopted new library functions div and ldiv which produce integral quotient and remainder with well-defined sign semantics. (See §4.10.6.2, §4.10.6.3.)
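A brief sketch of the library functions with negative operands follows; the wrapper names are illustrative. The quotient is the integer of lesser magnitude nearest the algebraic quotient, and the remainder satisfies quot*denom + rem == numer.

```c
#include <stdlib.h>

/* -7 divided by 2: the quotient truncates toward zero. */
int quotient_example(void)  { return div(-7, 2).quot; }
int remainder_example(void) { return div(-7, 2).rem; }

/* The same rule applied to long operands with a negative divisor. */
long long_quotient_example(void)  { return ldiv(7L, -2L).quot; }
long long_remainder_example(void) { return ldiv(7L, -2L).rem; }
```

Here the int results are -3 and -1, and the long results are -3 and 1, whatever the implementation's / and % operators do with negative operands.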
The Committee rejected extending the % operator to work on floating types; such usage would duplicate the facility provided by fmod. (See §4.5.6.5.)
As with the sizeof operator, implementations have taken different approaches in defining a type for the difference between two pointers (see §3.3.3.4). It is important that this type be signed, in order to obtain proper algebraic ordering when dealing with pointers within the same array. However, the magnitude of a pointer difference can be as large as the size of the largest object that can be declared. (And since that is an unsigned type, the difference between two pointers may cause an overflow.)
The type of pointer minus pointer is defined to be int in K&R. The Standard defines the result of this operation to be a signed integer, the size of which is implementation-defined. The type is published as ptrdiff_t, in the standard header <stddef.h>. Old code recompiled by a conforming compiler may no longer work if the implementation defines the result of such an operation to be a type other than int and if the program depended on the result to be of type int. This behavior was considered by the Committee to be correctable. Overflow was considered not to break old code since it was undefined by K&R. Mismatch of types between actual and formal argument declarations is correctable by including a properly defined function prototype in the scope of the function invocation.
An important endorsement of widespread practice is the requirement that a pointer can always be incremented to just past the end of an array, with no fear of overflow or wraparound:
    SOMETYPE array[SPAN];
    /* ... */
    for (p = &array[0]; p < &array[SPAN]; p++)
This stipulation merely requires that every object be followed by one byte whose address is representable. That byte can be the first byte of the next object declared for all but the last object located in a contiguous segment of memory. (In the example, the address &array[SPAN] must address a byte following the highest element of array.) Since the pointer expression p+1 need not (and should not) be dereferenced, it is unnecessary to leave room for a complete object of size sizeof(*p).
In the case of p-1, on the other hand, an entire object would have to be allocated prior to the array of objects that p traverses, so decrement loops that run off the bottom of an array may fail. This restriction allows segmented architectures, for instance, to place objects at the start of a range of addressable memory.
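A complete instance of the increment loop might look as follows; the element type, the contents, and the function name total are chosen purely for illustration.

```c
#define SPAN 4

int total(void)
{
    static const int array[SPAN] = {1, 2, 3, 4};
    const int *p;
    int sum = 0;

    /* &array[SPAN] is formed and compared, but never dereferenced. */
    for (p = &array[0]; p < &array[SPAN]; p++)
        sum += *p;
    return sum;
}
```

The corresponding downward loop, testing p >= &array[0] after decrementing past the first element, enjoys no such guarantee.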
See §3.3.3.3 for a discussion of the arithmetic definition of these operators.
The description of shift operators in K&R suggests that shifting by a long count should force the left operand to be widened to long before being shifted. A more intuitive practice, endorsed by the Committee, is that the type of the shift count has no bearing on the type of the result.
Shifting by a long count no longer coerces the shifted operand to long.
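The rule can be checked with sizeof, as in the sketch below (the function name is hypothetical). The check is informative only on implementations where long is wider than int; elsewhere it holds trivially.

```c
/* Returns 1 if shifting an int by a long count yields an int-sized
   result with the expected value. */
int shift_result_is_int(void)
{
    int value = 1;
    long count = 3L;

    return sizeof(value << count) == sizeof(int)
        && (value << count) == 8;
}
```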
For an explanation of why the pointer comparison of the object pointer P with the pointer expression P+1 is always safe, see Rationale §3.3.6.
The Committee considered, on more than one occasion, permitting comparison of structures for equality. Such proposals foundered on the problem of holes in structures. A byte-wise comparison of two structures would require that the holes assuredly be set to zero so that all holes would compare equal, a difficult task for automatic or dynamically allocated variables. (The possibility of union-type elements in a structure raises insuperable problems with this approach.) Otherwise the implementation would have to be prepared to break a structure comparison into an arbitrary number of member comparisons; a seemingly simple expression could thus expand into a substantial stretch of code, which is contrary to the spirit of C.
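In the absence of a structure equality operator, the comparison is written out member by member. The sketch below uses a hypothetical structure that is likely, though not guaranteed, to contain a hole after its first member; the two objects are deliberately filled with different byte patterns before their members are set.

```c
#include <string.h>

struct point { char tag; double x; };

/* Compares members only; any holes are never examined. */
int point_equal(const struct point *p, const struct point *q)
{
    return p->tag == q->tag && p->x == q->x;
}

int holes_do_not_matter(void)
{
    struct point p, q;

    memset(&p, 0x00, sizeof p);   /* different fill patterns, so any */
    memset(&q, 0xFF, sizeof q);   /* holes may well differ           */
    p.tag = q.tag = 'a';
    p.x = q.x = 1.5;
    return point_equal(&p, &q);
}
```

A byte-wise comparison of p and q could report a difference wherever holes exist, which is precisely the difficulty described above.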
In pointer comparisons, one of the operands may be of type void *. In particular, this allows NULL, which can be defined as (void *)0, to be compared to any object pointer.
The syntactic restrictions on the middle operand of the conditional operator have been relaxed to include more than just logical-OR-expression: several extant implementations have adopted this practice.
The type of a conditional operator expression can be void, a structure, or a union; most other operators do not deal with such types. The rules for balancing type between pointer and integer have, however, been tightened, since now only the constant 0 can portably be coerced to pointer.
The Standard allows one of the second or third operands to be of type void *, if the other is a pointer type. Since the result of such a conditional expression is void *, an appropriate cast must be used.
Certain syntactic forms of assignment operators have been discontinued, and others tightened up (see §3.1.5).
The storage assignment need not take place until the next sequence point. (A restriction in earlier drafts that the storage take place before the value of the expression is used has been removed.) As a consequence, a straightforward syntactic test for ambiguous expressions can be stated. Some definitions: A side effect is a storage to any data object, or a read of a volatile object. An ambiguous expression is one whose value depends upon the order in which side effects are evaluated. A pure function is one with no side effects; an impure function is any other. A sequenced expression is one whose major operator defines a sequence point: comma, &&, ||, or conditional operator; an unsequenced expression is any other. We can then say that an unsequenced expression is ambiguous if more than one operand invokes any impure function, or if more than one operand contains an lvalue referencing the same object and one or more operands specify a side-effect to that object. Further, any expression containing an ambiguous expression is ambiguous.
The optimization rules for factoring out assignments can also be stated. Let X(i,S) be an expression which contains no impure functions or sequenced operators, and suppose that X contains a storage S(i) to i. The storage expressions, and related expressions, are
    S(i):      Sval(i):   Snew(i):
    ++i        i+1        i+1
    i++        i          i+1
    --i        i-1        i-1
    i--        i          i-1
    i = y      y          y
    i op= y    i op y     i op y
Then X(i,S) can be replaced by either
    (T = i, i = Snew(i), X(T,Sval))
or
    (T = X(i,Sval), i = Snew(i), T)
provided that neither i nor y has side effects itself.
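As a worked instance, take X(i,S) to be a[i++] * 2, so that S(i) is i++, Sval(i) is i, and Snew(i) is i+1. The sketch below writes out the expression and its two replacements; the array, the function names, and the use of *i for the object being stored to are assumptions made for illustration.

```c
static const int a[] = {5, 6, 7};

/* X(i,S) as written. */
int as_written(int *i)
{
    return a[(*i)++] * 2;
}

/* (T = i, i = Snew(i), X(T,Sval)) */
int first_form(int *i)
{
    int T;
    return (T = *i, *i = *i + 1, a[T] * 2);
}

/* (T = X(i,Sval), i = Snew(i), T) */
int second_form(int *i)
{
    int T;
    return (T = a[*i] * 2, *i = *i + 1, T);
}

/* Returns 1 if all three agree on both the value and the final i. */
int forms_agree(int start)
{
    int i1 = start, i2 = start, i3 = start;
    int r1 = as_written(&i1);
    int r2 = first_form(&i2);
    int r3 = second_form(&i3);

    return r1 == r2 && r2 == r3 && i1 == i2 && i2 == i3;
}
```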
Structure assignment has been added: its use was foreshadowed even in K&R, and many existing implementations already support it.
The rules for type compatibility in assignment also apply to argument compatibility between actual argument expressions and their corresponding argument types in a function prototype.
An implementation need not correctly perform an assignment between overlapping operands. Overlapping operands occur most naturally in a union, where assigning one field to another is often desirable to effect a type conversion in place; the assignment may well work properly in all simple cases, but it is not maximally portable. Maximally portable code should use a temporary variable as an intermediate in such an assignment.
The importance of requiring that the left operand lvalue be evaluated only once is not a question of efficiency, although that is one compelling reason for using the compound assignment operators. Rather, it is to assure that any side effects of evaluating the left operand are predictable.
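The point is visible when the left operand contains a call to an impure function. In this sketch, with hypothetical names throughout, the subscript is computed by a function that counts its own invocations.

```c
int calls;
int slots[4];

/* Impure: each call advances the counter. */
int next_index(void)
{
    return calls++;
}

int bump_once(void)
{
    slots[next_index()] += 10;   /* left operand evaluated exactly once */
    return calls;
}
```

Had the statement been written as slots[next_index()] = slots[next_index()] + 10, the function would be called twice, and the element read would not be the element written.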
The left operand of a comma operator may be void, since only the right-hand operand is relevant to the type of the expression.
The example in the Standard clarifies that commas separating arguments ``bind'' tighter than the comma operator in expressions.
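A sketch along the lines of that example: the parenthesized middle argument contains a comma operator, while the unparenthesized commas separate three arguments. The function names and the outer argument values are illustrative.

```c
/* Returns its middle argument. */
int second_of_three(int first, int second, int third)
{
    (void)first;
    (void)third;
    return second;
}

int comma_example(void)
{
    int t;

    /* Three arguments; the second is the comma expression (t=3, t+2). */
    return second_of_three(1, (t = 3, t + 2), 9);
}
```

The call passes three arguments, the second of which has the value 5.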