Since the publication of K&R,
a serious divergence has occurred among implementations of C
in the evolution of integral promotion rules.
Implementations fall into two major camps,
which may be characterized as
unsigned preserving
and
value preserving.
The difference between these approaches
centers on the treatment of unsigned char and unsigned short,
when widened by the integral promotions,
but the decision has an impact on
the typing of constants as well (see §3.1.3.2).
The unsigned preserving approach calls for promoting the two smaller
unsigned types to unsigned int.
This is a simple rule,
and yields a type which is independent of execution environment.
The value preserving approach calls for promoting those types to signed int,
if that type can properly represent all the values of the original type,
and otherwise for promoting those types to unsigned int.
Thus, if the execution environment represents short as something
smaller than int, unsigned short becomes int;
otherwise it becomes unsigned int.
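A short illustration of the value preserving rule, assuming the common
configuration of 16-bit short and 32-bit int:

    unsigned char  uc;   /* promotes to int: a 32-bit int can represent
                            all 8-bit unsigned char values              */
    unsigned short us;   /* promotes to int: likewise for 16-bit values */
    /* If int were only 16 bits wide, unsigned short could not be
       properly represented as int, and us would instead promote to
       unsigned int, just as under the unsigned preserving rules.       */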
Both schemes give the same answer in the vast majority of cases, and both
give the same effective result in even more cases in implementations with
twos-complement arithmetic and quiet wraparound on signed overflow --- that
is, in most current implementations. In such implementations, differences
between the two only appear when these two conditions are both true:

1. An expression involving an unsigned char or unsigned short produces an
   int-wide result in which the sign bit is set: i.e., either a unary
   operation on such a type, or a binary operation in which the other
   operand is an int or ``narrower'' type.

2. The result of the preceding expression is used in a context in which
   its signedness is significant:
   - sizeof(int) < sizeof(long) and it is in a context where it must be
     widened to a long type, or
   - it is the left operand of the right-shift operator (in an
     implementation where this shift is defined as arithmetic), or
   - it is either operand of /, %, <, <=, >, or >=.
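The following sketch shows such a divergence; it assumes 16-bit short,
32-bit int, twos-complement arithmetic, and quiet wraparound:

    #include <stdio.h>

    int main(void)
    {
        unsigned short us = 0xFFFF;
        /* Value preserving: us promotes to the int 65535, so -us is
           the negative int -65535 and the test succeeds.
           Unsigned preserving: us promotes to unsigned int, so -us is
           the large unsigned value 0xFFFF0001 and the test fails.      */
        if (-us < 0)
            printf("value preserving rules\n");
        else
            printf("unsigned preserving rules\n");
        return 0;
    }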
In such circumstances a genuine ambiguity of interpretation arises.
The result must be dubbed questionably signed,
since a case can be made
for either the signed or unsigned interpretation.
Exactly the same ambiguity arises whenever an unsigned int
confronts a signed int across an operator,
and the signed int has a negative value.
(Neither scheme does any better, or any worse, in resolving the ambiguity
of this confrontation.)
Suddenly, the negative signed int becomes a very large unsigned int,
which may be surprising ---
or it may be exactly what is desired by a knowledgeable programmer.
Of course,
all of these ambiguities can be avoided by a judicious use of casts.
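A minimal sketch of the confrontation and its repair, assuming 32-bit int:

    #include <stdio.h>

    int main(void)
    {
        /* The usual arithmetic conversions turn -1 into a very large
           unsigned int (4294967295 here), so the test is false.        */
        if (-1 < 1U)
            printf("compared as signed\n");
        else
            printf("compared as unsigned\n");   /* this branch is taken */

        /* A judicious cast states the intended interpretation.         */
        if (-1 < (int)1U)
            printf("compared as signed, as intended\n");
        return 0;
    }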
One of the important outcomes of exploring this problem is the understanding that high-quality compilers might do well to look for such questionable code and offer (optional) diagnostics, and that conscientious instructors might do well to warn programmers of the problems of implicit type conversions.
The unsigned preserving rules greatly increase the number of situations
where unsigned int
confronts signed int
to yield a questionably signed result,
whereas the value preserving rules minimize such confrontations.
Thus, the value preserving rules were considered to be safer for the novice,
or unwary, programmer.
After much discussion, the Committee decided in favor of value preserving
rules, despite the fact that the UNIX C compilers had evolved in the
direction of unsigned preserving.
Precise rules are now provided for converting to and from unsigned integers. On a twos-complement machine, the operation is still virtual (no change of representation is required), but the rules are now stated independent of representation.
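For instance, conversion to an unsigned type now yields the original value
reduced modulo one more than the maximum of that type; a sketch:

    void conversion_examples(void)
    {
        unsigned char uc = -1;   /* UCHAR_MAX (255 with 8-bit chars):
                                    -1 plus (UCHAR_MAX + 1)             */
        unsigned int  ui = -1;   /* UINT_MAX, whatever the width of int */
        /* On a twos-complement machine neither conversion changes the
           representation; the rules merely pin down the values.        */
    }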
There was strong agreement that floating values should truncate toward zero when converted to an integral type, the specification adopted in the Standard. Although the Base Document permitted negative floating values to truncate away from zero, no Committee member knew of current hardware that functions in such a manner. [Footnote: We have since been informed of one such implementation.]
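Hence both positive and negative values truncate toward zero:

    void truncation_examples(void)
    {
        int i = (int) 1.9;    /* yields  1 */
        int j = (int)-1.9;    /* yields -1, not -2: truncation is
                                 toward zero                            */
    }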
The Standard, unlike the Base Document, does not require rounding in the
double to float conversion.
Some widely used IEEE floating point processor chips control
floating to integral conversion with the same mode bits
as for double-precision to single-precision conversion;
since truncation-toward-zero is the appropriate setting
for C in the former case,
it would be expensive to require such implementations to round to float.
The rules in the Standard for these conversions are slight modifications of those in the Base Document: the modifications accommodate the added types and the value preserving rules (see §3.2.1.1). Explicit license has been added to perform calculations in a ``wider'' type than absolutely necessary, since this can sometimes produce smaller and faster code (not to mention the correct answer more often). Calculations can also be performed in a ``narrower'' type, by the as if rule, so long as the same end result is obtained. Explicit casting can always be used to obtain exactly the intermediate types required.
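As a sketch of that last point, an explicit cast can force a wider
intermediate type (here assuming long is wider than int):

    long scaled_product(int a, int b)
    {
        /* Without the cast the multiplication would be performed in
           int, where it may overflow before the result is widened;
           the cast makes the wider intermediate type explicit.         */
        return (long)a * b;
    }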
The Committee relaxed the requirement that float operands be converted to
double; an implementation may still choose to convert, but float operands
may now be computed at lower precision.
The Base Document had specified that all floating point operations be done
in double.
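A sketch of the change:

    float sum3(float a, float b, float c)
    {
        /* The Base Document required this sum to be computed in
           double.  Under the Standard it may be computed entirely in
           float, though an implementation may still choose to use
           double-precision intermediates.                              */
        return a + b + c;
    }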
A difference of opinion within the C community has centered around the meaning of lvalue, one group considering an lvalue to be any kind of object locator, another group holding that an lvalue is meaningful on the left side of an assigning operator. The Committee has adopted the definition of lvalue as an object locator. The term modifiable lvalue is used for the second of the above concepts.
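A brief sketch of the distinction (the identifiers are illustrative):

    void lvalue_examples(void)
    {
        const int limit = 10;
        int n;

        n = limit;          /* limit is an lvalue: it locates an object */
        /* limit = 20; */   /* constraint violation: limit is not a
                               modifiable lvalue, because its type is
                               const-qualified                          */
        n = n + 1;          /* n is a modifiable lvalue                 */
    }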
The role of array objects has been a classic source of confusion in C,
in large part because of the numerous contexts in which an
array reference is converted to a pointer to its first element.
While this conversion neatly handles the semantics of subscripting,
the fact that a[i] is itself a modifiable lvalue while a is not
has puzzled many students of the language.
A more precise description has therefore been incorporated in the
Standard, in the hopes of combatting this confusion.
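For instance:

    void array_examples(void)
    {
        int a[10];
        int *p;

        a[3] = 1;        /* a[i] is a modifiable lvalue                 */
        p = a;           /* here a converts to a pointer to its first
                            element                                     */
        /* a = p; */     /* constraint violation: a is an lvalue, but
                            an array is not a modifiable lvalue         */
    }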
The description of operators and expressions is simplified by saying that
void yields a value,
with the understanding that the value has no representation,
hence requires no storage.
C has now been implemented on a wide range of architectures. While some of these architectures feature uniform pointers which are the size of some integer type, maximally portable code may not assume any necessary correspondence between different pointer types and the integral types.
The use of void * (``pointer to void'') as a generic object pointer type
is an invention of the Committee.
Adoption of this type was stimulated by
the desire to specify function prototype arguments
that either quietly convert arbitrary pointers (as in fread)
or complain if the argument type does not exactly match (as in strcmp).
Nothing is said about pointers to functions,
which may be incommensurate with object pointers and/or integers.
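The contrast can be sketched with the library prototypes themselves:

    #include <stdio.h>   /* size_t fread(void *, size_t, size_t, FILE *) */
    #include <string.h>  /* int strcmp(const char *, const char *)       */

    void prototype_examples(FILE *fp)
    {
        double buf[10];

        fread(buf, sizeof(double), 10, fp);   /* double * converts
                                                 quietly to void *      */
        /* strcmp(buf, "end"); */   /* constraint violation: double *
                                       does not quietly convert to
                                       const char *                     */
    }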
Since pointers and integers are now considered incommensurate, the only integer that can be safely converted to a pointer is the constant 0. The result of converting any other integer to a pointer is machine dependent.
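A sketch:

    void pointer_conversion_examples(void)
    {
        char *p = 0;               /* the constant 0 yields a null
                                      pointer on any implementation     */
        char *q = (char *)40000;   /* machine dependent: the result may
                                      not be a valid pointer at all     */
    }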
Consequences of the treatment of pointer types in the Standard include:
An invalid pointer might be created in several ways. An arbitrary value can be assigned (via a cast) to a pointer variable. (This could even create a valid pointer, depending on the value.) A pointer to an object becomes invalid if the memory containing the object is deallocated. Pointer arithmetic can produce pointers outside the range of an array.
Regardless of how an invalid pointer is created, any use of it yields undefined behavior. Even assignment, comparison with a null pointer constant, or comparison with itself, might on some systems result in an exception.
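A sketch of the ways an invalid pointer can arise (using the standard
allocation functions):

    #include <stdlib.h>

    void invalid_pointer_examples(void)
    {
        int *p = malloc(10 * sizeof(int));
        int *q = (int *)77;   /* arbitrary value assigned via a cast    */
        int *r = p + 20;      /* pointer arithmetic outside the array   */

        free(p);              /* p now points to deallocated memory;
                                 any use of p, q, or r --- even mere
                                 comparison --- is undefined behavior   */
    }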
Consider a hypothetical segmented architecture, on which pointers comprise a segment descriptor and an offset. Suppose that segments are relatively small, so that large arrays are allocated in multiple segments. While the segments are valid (allocated, mapped to real memory), the hardware, operating system, or C implementation can make these multiple segments behave like a single object: pointer arithmetic and relational operators use the defined mapping to impose the proper order on the elements of the array. Once the memory is deallocated, the mapping is no longer guaranteed to exist; use of the segment descriptor might now cause an exception, or the hardware addressing logic might return meaningless data.