3.3 Expressions

Several closely-related topics are involved in the precise specification of expression evaluation: precedence, associativity, grouping, sequence points, agreement points, order of evaluation, and interleaving. The latter three terms are discussed in §2.1.2.3.

The rules of precedence are encoded into the syntactic rules for each operator. For example, the syntax for additive-expression includes the rule

additive-expression

+

multiplicative-expression

which implies that a+b*c parses as a+(b*c). The rules of associativity are similarly encoded into the syntactic rules. For example, the syntax for assignment-expression includes the rule

unary-expression assignment-operator assignment-expression

which implies that a=b=c parses as a=(b=c).

With rules of precedence and associativity thus embodied in the syntax rules, the Standard specifies, in general, the grouping (association of operands with operators) in an expression.

The Base Document describes C as a language in which the operands of successive identical commutative associative operators can be regrouped. The Committee has decided to remove this license from the Standard, thus bringing C into accord with most other major high-level languages.

This change was motivated primarily by the desire to make C more suitable for floating point programming. Floating point arithmetic does not obey many of the mathematical rules that real arithmetic does. For instance, the two expressions (a+b)+c and a+(b+c) may well yield different results: suppose that b is greater than 0, a equals -b, and c is positive but substantially smaller than b. (That is, suppose c/b is less than DBL_EPSILON.) Then (a+b)+c is 0+c, or c, while a+(b+c) equals a+b, or 0. That is to say, floating point addition (and multiplication) is not associative.

The Base Document's rule imposes a high cost on translation of numerical code to C. Much numerical code is written in FORTRAN, which does provide a no-regrouping guarantee; indeed, this is the normal semantic interpretation in most high-level languages other than C. The Base Document's advice, ``rewrite using explicit temporaries,'' is burdensome to those with tens or hundreds of thousands of lines of code to convert, a conversion which in most other respects could be done automatically.

Elimination of the regrouping rule does not in fact prohibit much regrouping of integer expressions. The bitwise logical operators can be arbitrarily regrouped, since any regrouping gives the same result as if the expression had not been regrouped. This is also true of integer addition and multiplication in implementations with twos-complement arithmetic and silent wraparound on overflow. Indeed, in any implementation, regroupings which do not introduce overflows behave as if no regrouping had occurred. (Results may also differ in such an implementation if the expression as written results in overflows: in such a case the behavior is undefined, so any regrouping couldn't be any worse.)

The types of lvalues that may be used to access an object have been restricted so that an optimizer is not required to make worst-case aliasing assumptions.

In practice, aliasing arises with the use of pointers. A contrived example to illustrate the issues is

        int a;

        void f(int * b)
        {
            a = 1;
            *b = 2;
            g(a);
        }

It is tempting to generate the call to g as if the source expression were g(1), but b might point to a, so this optimization is not safe. On the other hand, consider

        int a;
        void f( double * b )
        {
            a = 1;
            *b = 2.0;
            g(a);
        }

Again the optimization is incorrect only if b points to a. However, this would only have come about if the address of a were somewhere cast to (double*). The Committee has decided that such dubious possibilities need not be allowed for.

In principle, then, aliasing only need be allowed for when the lvalues all have the same type. In practice, the Committee has recognized certain prevalent exceptions:

The lvalue types may differ in signedness. In the common range, a signed integral type and its unsigned variant have the same representation; it was felt that an appreciable body of existing code is not ``strictly typed'' in this area.
Character pointer types are often used in the bytewise manipulation of objects; a byte stored through such a character pointer may well end up in an object of any type.
A qualified version of the object's type, though formally a different type, provides the same interpretation of the value of the object.

Structure and union types also have problematic aliasing properties:

        struct fi{ float f; int i;};

        void f( struct fi * fip, int * ip )
        {
            static struct fi a = {2.0, 1};
            *ip = 2;
            *fip = a;
            g(*ip);

            *fip = a;
            *ip = 2;
            g(fip->i);
        }

It is not safe to optimize the first call to g as g(2), or the second as g(1), since the call to f could quite legitimately have been

        struct fi  x;
        f( &x, &x.i );

These observations explain the other exception to the same-type principle.

3.3.1 Primary expressions

A primary expression may be void (parenthesized call to a function returning void), a function designator (identifier or parenthesized function designator), an lvalue (identifier or parenthesized lvalue), or simply a value expression. Constraints ensure that a void primary expression is no part of a further expression, except that a void expression may be cast to void, may be the second or third operand of a conditional operator, or may be an operand of a comma operator.

3.3.2 Postfix operators

3.3.2.1 Array subscripting

The Committee found no reason to disallow the symmetry that permits a[i] to be written as i[a].

The syntax and semantics of multidimensional arrays follow logically from the definition of arrays and the subscripting operation. The material in the Standard on multidimensional arrays introduces no new language features, but clarifies the C treatment of this important abstract data type.

3.3.2.2 Function calls

Pointers to functions may be used either as (*pf)() or as pf(). The latter construct, not sanctioned in the Base Document, appears in some present versions of C, is unambiguous, invalidates no old code, and can be an important shorthand. The shorthand is useful for packages that present only one external name, which designates a structure full of pointers to objects and functions: member functions can be called as graphics.open(file) instead of (*graphics.open)(file).

The treatment of function designators can lead to some curious, but valid, syntactic forms. Given the declarations:

        int f(), (*pf)();

then all of the following expressions are valid function calls:

        (&f)(); f(); (*f)(); (**f)(); (***f)();
        pf();   (*pf)(); (**pf)(); (***pf)();

The first expression on each line was discussed in the previous paragraph. The second is conventional usage. All subsequent expressions take advantage of the implicit conversion of a function designator to a pointer value, in nearly all expression contexts. The Committee saw no real harm in allowing these forms; outlawing forms like (*f)(), while still permitting *a (for int a[]), simply seemed more trouble than it was worth.

The rule for implicit declaration of functions has been retained, but various past ambiguities have been resolved by describing this usage in terms of a corresponding explicit declaration.

For compatibility with past practice, all argument promotions occur as described in the Base Document in the absence of a prototype declaration, including the (not always desirable) promotion of float to double. A prototype gives the implementor explicit license to pass a float as a float rather than a double, or a char as a char rather than an int, or an argument in a special register, etc. If the definition of a function in the presence of a prototype would cause the function to expect other than the default promotion types, then clearly the calls to this function must be made in the presence of a compatible prototype.

To clarify this and other relationships between function calls and function definitions, the Standard describes an equivalence between a function call or definition which does occur in the presence of a prototype and one that does not.

Thus a prototyped function with no ``narrow'' types and no variable argument list must be callable in the absence of a prototype, since the types actually passed in a call are equivalent to the explicit function definition prototype. This constraint is necessary to retain compatibility with past usage of library functions. (See §4.1.3.)

This provision constrains the latitude of an implementor because the parameter passing conventions of prototype and non-prototype function calls must be the same for functions accepting a fixed number of arguments. Implementations in environments where efficient function calling mechanisms are available must, in effect, use the efficient calling sequence either in all ``fixed argument list'' calls or in none. Since efficient calling sequences often do not allow for variable argument functions, the fixed part of a variable argument list may be passed in a completely different fashion than in a fixed argument list with the same number and type of arguments.

The existing practice of omitting trailing parameters in a call if it is known that the parameters will not be used has consistently been discouraged. Since omission of such parameters creates an inequivalence between the call and the declaration, the behavior in such cases is undefined, and a maximally portable program will avoid this usage. Hence an implementation is free to implement a function calling mechanism for fixed argument lists which would (perhaps fatally) fail if the wrong number or type of arguments were to be provided.

Strictly speaking then, calls to printf are obliged to be in the scope of a prototype (as by #include <stdio.h>), but implementations are not obliged to fail on such a lapse. (The behavior is undefined).

3.3.2.3 Structure and union members

Since the language now permits structure parameters, structure assignment and functions returning structures, the concept of a structure expression is now part of the C language. A structure value can be produced by an assignment, by a function call, by a comma operator expression or by a conditional operator expression:

        s1 = (s2 = s3)
        sf(x)
        (x, s1)
        x ? s1 : s2

In these cases, the result is not an lvalue; hence it cannot be assigned to nor can its address be taken.

Similarly, x.y is an lvalue only if x is an lvalue. Thus none of the following valid expressions are lvalues:

        sf(3).a
        (s1=s2).a
        ((i==6)?s1:s2).a
        (x,s1).a

Even when x.y is an lvalue, it may not be modifiable:

        const struct S s1;
        s1.a = 3;          /* invalid */

The Standard requires that an implementation diagnose a constraint error in the case that the member of a structure or union designated by the identifier following a member selection operator (. or ->) does not appear in the type of the structure or union designated by the first operand. The Base Document is unclear on this point.

3.3.2.4 Postfix increment and decrement operators

The Committee has not endorsed the practice in some implementations of considering post-increment and post-decrement operator expressions to be lvalues.

3.3.3 Unary operators

3.3.3.1 Prefix increment and decrement operators

See §3.3.2.4.

3.3.3.2 Address and indirection operators

Some implementations have not allowed the & operator to be applied to an array or a function. (The construct was permitted in early versions of C, then later made optional.) The Committee has endorsed the construct since it is unambiguous, and since data abstraction is enhanced by allowing the important & operator to apply uniformly to any addressable entity.

3.3.3.3 Unary arithmetic operators

Unary plus was adopted by the Committee from several implementations, for symmetry with unary minus.

The bitwise complement operator ~, and the other bitwise operators, have now been defined arithmetically for unsigned operands. Such operations are well-defined because of the restriction of integral representations to ``binary numeration systems.'' (See §3.1.2.5.)

3.3.3.4 The `sizeof` operator

It is fundamental to the correct usage of functions such as malloc and fread that sizeof (char) be exactly one. In practice, this means that a byte in C terms is the smallest unit of storage, even if this unit is 36 bits wide; and all objects are comprised of an integral number of these smallest units. (See §1.6.)

The Standard, like the Base Document, defines the result of the sizeof operator to be a constant of an unsigned integral type. Common implementations, and common usage, have often presumed that the resulting type is int. Old code that depends on this behavior has never been portable to implementations that define the result to be a type other than int. The Committee did not feel it was proper to change the language to protect incorrect code.

The type of sizeof, whatever it is, is published (in the library header <stddef.h>) as size_t, since it is useful for the programmer to be able to refer to this type. This requirement implicitly restricts size_t to be a synonym for an existing unsigned integer type, thus quashing any notion that the largest declarable object might be too big to span even with an unsigned long. This also restricts the maximum number of elements that may be declared in an array, since for any array a of N elements,

        N == sizeof(a)/sizeof(a[0])

Thus size_t is also a convenient type for array sizes, and is so used in several library functions. (See §4.9.8.1, §4.9.8.2, §4.10.3.1, etc.)

The Standard specifies that the argument to sizeof can be any value except a bit field, a void expression, or a function designator. This generality allows for interesting environmental enquiries; given the declarations

        int  *p, *q;

these expressions determine the size of the type used for ...

        sizeof(F(x))    /* ... F's return value */
        sizeof(p-q)     /* ... pointer difference */

(The last type is of course available as ptrdiff_t in <stddef.h>.)

3.3.4 Cast operators

A (void) cast is explicitly permitted, more for documentation than for utility.

Nothing portable can be said about casting integers to pointers, or vice versa, since the two are now incommensurate.

The definition of these conversions adopted in the Standard resembles that in the Base Document, but with several significant differences. The Base Document required that a pointer successfully converted to an integer must be guaranteed to be convertible back to the same pointer. This integer-to-pointer conversion is now specified as implementation-defined. While a high-quality implementation would preserve the same address value whenever possible, it was considered impractical to require that the identical representation be preserved. The Committee noted that, on some current machine implementations, identical representations are required for efficient code generation for pointer comparisons and arithmetic operations.

The conversion of the integer constant 0 to a pointer is defined similarly to the Base Document. The resulting pointer must not address any object, must appear to be equal to an integer value of 0, and may be assigned to or compared for equality with any other pointer. This definition does not necessarily imply a representation by a bit pattern of all zeros: an implementation could, for instance, use some address which causes a hardware trap when dereferenced.

The type char must have the least strict alignment of any type, so char * has often been used as a portable type for representing arbitrary object pointers. This usage creates an unfortunate confusion between the ideas of arbitrary pointer and character or string pointer. The new type void *, which has the same representation as char *, is therefore preferable for arbitrary pointers.

It is possible to cast a pointer of some qualified type (§3.5.3) to an unqualified version of that type. Since the qualifier defines some special access or aliasing property, however, any dereference of the cast pointer results in undefined behavior.

The Standard (§3.2.1.4) requires that a cast of one floating point type to another (e.g., double to float) results in an actual conversion.

3.3.5 Multiplicative operators

There was considerable sentiment for giving more portable semantics to division (and hence remainder) by specifying some way of giving less machine dependent results for negative operands. Few Committee members wanted to require this by default, lest existing fast code be gravely slowed. One suggestion was to make signed int a type distinct from plain int, and require better-defined semantics for signed int division and remainder. This suggestion was opposed on the grounds that effectively adding several types would have consequences out of proportion to the benefit to be obtained; the Committee twice rejected this approach. Instead the Committee has adopted new library functions div and ldiv which produce integral quotient and remainder with well-defined sign semantics. (See §4.10.6.2, §4.10.6.3.)

The Committee rejected extending the % operator to work on floating types; such usage would duplicate the facility provided by fmod. (See §4.5.6.5.)

3.3.6 Additive operators

As with the sizeof operator, implementations have taken different approaches in defining a type for the difference between two pointers (see §3.3.3.4). It is important that this type be signed, in order to obtain proper algebraic ordering when dealing with pointers within the same array. However, the magnitude of a pointer difference can be as large as the size of the largest object that can be declared. (And since that is an unsigned type, the difference between two pointers may cause an overflow.)

The type of pointer minus pointer is defined to be int in K&R. The Standard defines the result of this operation to be a signed integer, the size of which is implementation-defined. The type is published as ptrdiff_t, in the standard header <stddef.h>. Old code recompiled by a conforming compiler may no longer work if the implementation defines the result of such an operation to be a type other than int and if the program depended on the result to be of type int. This behavior was considered by the Committee to be correctable. Overflow was considered not to break old code since it was undefined by K&R. Mismatch of types between actual and formal argument declarations is correctable by including a properly defined function prototype in the scope of the function invocation.

An important endorsement of widespread practice is the requirement that a pointer can always be incremented to just past the end of an array, with no fear of overflow or wraparound:

        SOMETYPE array[SPAN];
        /* ... */
        for (p = &array[0]; p < &array[SPAN]; p++)

This stipulation merely requires that every object be followed by one byte whose address is representable. That byte can be the first byte of the next object declared for all but the last object located in a contiguous segment of memory. (In the example, the address &array[SPAN] must address a byte following the highest element of array.) Since the pointer expression p+1 need not (and should not) be dereferenced, it is unnecessary to leave room for a complete object of size sizeof(*p).

In the case of p-1, on the other hand, an entire object would have to be allocated prior to the array of objects that p traverses, so decrement loops that run off the bottom of an array may fail. This restriction allows segmented architectures, for instance, to place objects at the start of a range of addressable memory.

3.3.7 Bitwise shift operators

See §3.3.3.3 for a discussion of the arithmetic definition of these operators.

The description of shift operators in K&R suggests that shifting by a long count should force the left operand to be widened to long before being shifted. A more intuitive practice, endorsed by the Committee, is that the type of the shift count has no bearing on the type of the result.

QUIET CHANGE

long

The Committee has affirmed the freedom in implementation granted by the Base Document in not requiring the signed right shift operation to sign extend, since such a requirement might slow down fast code and since the usefulness of sign extended shifts is marginal. (Shifting a negative twos-complement integer arithmetically right one place is not the same as dividing by two!)

3.3.8 Relational operators

For an explanation of why the pointer comparison of the object pointer P with the pointer expression P+1 is always safe, see Rationale §3.3.6.

3.3.9 Equality operators

The Committee considered, on more than one occasion, permitting comparison of structures for equality. Such proposals foundered on the problem of holes in structures. A byte-wise comparison of two structures would require that the holes assuredly be set to zero so that all holes would compare equal, a difficult task for automatic or dynamically allocated variables. (The possibility of union-type elements in a structure raises insuperable problems with this approach.) Otherwise the implementation would have to be prepared to break a structure comparison into an arbitrary number of member comparisons; a seemingly simple expression could thus expand into a substantial stretch of code, which is contrary to the spirit of C.

In pointer comparisons, one of the operands may be of type void *. In particular, this allows NULL, which can be defined as (void *)0, to be compared to any object pointer.

3.3.10 Bitwise AND operator

See §3.3.3.3 for a discussion of the arithmetic definition of the bitwise operators.

3.3.11 Bitwise exclusive OR operator

See §3.3.3.3.

3.3.12 Bitwise inclusive OR operator

See §3.3.3.3.

3.3.13 Logical AND operator

3.3.14 Logical OR operator

3.3.15 Conditional operator

The syntactic restrictions on the middle operand of the conditional operator have been relaxed to include more than just logical-OR-expression: several extant implementations have adopted this practice.

The type of a conditional operator expression can be void, a structure, or a union; most other operators do not deal with such types. The rules for balancing type between pointer and integer have, however, been tightened, since now only the constant 0 can portably be coerced to pointer.

The Standard allows one of the second or third operands to be of type void *, if the other is a pointer type. Since the result of such a conditional expression is void *, an appropriate cast must be used.

3.3.16 Assignment operators

Certain syntactic forms of assignment operators have been discontinued, and others tightened up (see §3.1.5).

The storage assignment need not take place until the next sequence point. (A restriction in earlier drafts that the storage take place before the value of the expression is used has been removed.) As a consequence, a straightforward syntactic test for ambiguous expressions can be stated. Some definitions: A side effect is a storage to any data object, or a read of a volatile object. An ambiguous expression is one whose value depends upon the order in which side effects are evaluated. A pure function is one with no side effects; an impure function is any other. A sequenced expression is one whose major operator defines a sequence point: comma, &&, ||, or conditional operator; an unsequenced expression is any other. We can then say that an unsequenced expression is ambiguous if more than one operand invokes any impure function, or if more than one operand contains an lvalue referencing the same object and one or more operands specify a side-effect to that object. Further, any expression containing an ambiguous expression is ambiguous.

The optimization rules for factoring out assignments can also be stated. Let X(i,S) be an expression which contains no impure functions or sequenced operators, and suppose that X contains a storage S(i) to i. The storage expressions, and related expressions, are

        S(i):      Sval(i):       Snew(i):
        ++i        i+1            i+1
        i++        i              i+1
        --i        i-1            i-1
        i--        i              i-1
        i = y      y              y
        i op= y    i op y         i op y

Then X(i,S) can be replaced by either

        (T = i, i = Snew(i), X(T,Sval))

        (T = X(i,Sval), i = Snew(i), T)

provided that neither i nor y have side effects themselves.

3.3.16.1 Simple assignment

Structure assignment has been added: its use was foreshadowed even in K&R, and many existing implementations already support it.

The rules for type compatibility in assignment also apply to argument compatibility between actual argument expressions and their corresponding argument types in a function prototype.

An implementation need not correctly perform an assignment between overlapping operands. Overlapping operands occur most naturally in a union, where assigning one field to another is often desirable to effect a type conversion in place; the assignment may well work properly in all simple cases, but it is not maximally portable. Maximally portable code should use a temporary variable as an intermediate in such an assignment.

3.3.16.2 Compound assignment

The importance of requiring that the left operand lvalue be evaluated only once is not a question of efficiency, although that is one compelling reason for using the compound assignment operators. Rather, it is to assure that any side effects of evaluating the left operand are predictable.

3.3.17 Comma operator

The left operand of a comma operator may be void, since only the right-hand operator is relevant to the type of the expression.

The example in the Standard clarifies that commas separating arguments ``bind'' tighter than the comma operator in expressions.

3.2 Conversions

ANSI C Rationale

3.4 Constant Expressions Index