Since the publication of K&R,
a serious divergence has occurred among implementations of C
in the evolution of integral promotion rules.
Implementations fall into two major camps,
which may be characterized as
unsigned preserving
and
value preserving.
The difference between these approaches
centers on the treatment of unsigned char and unsigned short,
when widened by the integral promotions,
but the decision has an impact on
the typing of constants as well (see §3.1.3.2).
The unsigned preserving approach calls for promoting the two smaller
unsigned types to unsigned int.
This is a simple rule,
and yields a type which is independent of execution environment.
The value preserving approach calls for promoting those types to signed int,
if that type can properly represent all the values of the original type,
and otherwise for promoting those types to unsigned int.
Thus, if the execution environment represents short as something
smaller than int, unsigned short becomes int;
otherwise it becomes unsigned int.
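A short illustration of the value preserving rule, assuming the common
configuration of 16-bit short and 32-bit int:

    unsigned char  uc;   /* promotes to int: a 32-bit int can represent
                            all 8-bit unsigned char values              */
    unsigned short us;   /* promotes to int: likewise for 16-bit values */
    /* If int were only 16 bits wide, unsigned short could not be
       properly represented as int, and us would instead promote to
       unsigned int, just as under the unsigned preserving rules.       */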
Both schemes give the same answer in the vast majority of cases, and both
give the same effective result in even more cases in implementations with
twos-complement arithmetic and quiet wraparound on signed overflow --- that
is, in most current implementations. In such implementations, differences
between the two only appear when these two conditions are both true:

1. An expression involving an unsigned char or unsigned short produces an
   int-wide result in which the sign bit is set: i.e., either a unary
   operation on such a type, or a binary operation in which the other
   operand is an int or ``narrower'' type.

2. The result of the preceding expression is used in a context in which
   its signedness is significant:
   - sizeof(int) < sizeof(long) and it is in a context where it must be
     widened to a long type, or
   - it is the left operand of the right-shift operator (in an
     implementation where this shift is defined as arithmetic), or
   - it is either operand of /, %, <, <=, >, or >=.
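The following sketch shows such a divergence; it assumes 16-bit short,
32-bit int, twos-complement arithmetic, and quiet wraparound:

    #include <stdio.h>

    int main(void)
    {
        unsigned short us = 0xFFFF;
        /* Value preserving: us promotes to the int 65535, so -us is
           the negative int -65535 and the test succeeds.
           Unsigned preserving: us promotes to unsigned int, so -us is
           the large unsigned value 0xFFFF0001 and the test fails.      */
        if (-us < 0)
            printf("value preserving rules\n");
        else
            printf("unsigned preserving rules\n");
        return 0;
    }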
In such circumstances a genuine ambiguity of interpretation arises.
The result must be dubbed questionably signed,
since a case can be made
for either the signed or unsigned interpretation.
Exactly the same ambiguity arises whenever an unsigned int
confronts a signed int across an operator,
and the signed int has a negative value.
(Neither scheme does any better, or any worse, in resolving the ambiguity
of this confrontation.)
Suddenly, the negative signed int becomes a very large unsigned int,
which may be surprising ---
or it may be exactly what is desired by a knowledgeable programmer.
Of course,
all of these ambiguities can be avoided by a judicious use of casts.
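A minimal sketch of the confrontation and its repair, assuming 32-bit int:

    #include <stdio.h>

    int main(void)
    {
        /* The usual arithmetic conversions turn -1 into a very large
           unsigned int (4294967295 here), so the test is false.        */
        if (-1 < 1U)
            printf("compared as signed\n");
        else
            printf("compared as unsigned\n");   /* this branch is taken */

        /* A judicious cast states the intended interpretation.         */
        if (-1 < (int)1U)
            printf("compared as signed, as intended\n");
        return 0;
    }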
One of the important outcomes of exploring this problem is the understanding that high-quality compilers might do well to look for such questionable code and offer (optional) diagnostics, and that conscientious instructors might do well to warn programmers of the problems of implicit type conversions.
The unsigned preserving rules greatly increase the number of situations
where unsigned int
confronts signed int
to yield a questionably signed result,
whereas the value preserving rules minimize such confrontations.
Thus, the value preserving rules were considered to be safer for the novice,
or unwary, programmer.
After much discussion, the Committee decided in favor of value preserving
rules, despite the fact that the UNIX C compilers had evolved in the
direction of unsigned preserving.
Precise rules are now provided for converting to and from unsigned integers. On a twos-complement machine, the operation is still virtual (no change of representation is required), but the rules are now stated independent of representation.
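For instance, conversion to an unsigned type now yields the original value
reduced modulo one more than the maximum of that type; a sketch:

    void conversion_examples(void)
    {
        unsigned char uc = -1;   /* UCHAR_MAX (255 with 8-bit chars):
                                    -1 plus (UCHAR_MAX + 1)             */
        unsigned int  ui = -1;   /* UINT_MAX, whatever the width of int */
        /* On a twos-complement machine neither conversion changes the
           representation; the rules merely pin down the values.        */
    }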
There was strong agreement that floating values should truncate toward zero when converted to an integral type, the specification adopted in the Standard. Although the Base Document permitted negative floating values to truncate away from zero, no Committee member knew of current hardware that functions in such a manner. [Footnote: We have since been informed of one such implementation.]
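Hence both positive and negative values truncate toward zero:

    void truncation_examples(void)
    {
        int i = (int) 1.9;    /* yields  1 */
        int j = (int)-1.9;    /* yields -1, not -2: truncation is
                                 toward zero                            */
    }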
The Standard, unlike the Base Document, does not require rounding in the
double to float conversion.
Some widely used IEEE floating point processor chips control
floating to integral conversion with the same mode bits
as for double-precision to single-precision conversion;
since truncation-toward-zero is the appropriate setting
for C in the former case,
it would be expensive to require such implementations to round to float.
The rules in the Standard for these conversions are slight modifications of those in the Base Document: the modifications accommodate the added types and the value preserving rules (see §3.2.1.1). Explicit license has been added to perform calculations in a ``wider'' type than absolutely necessary, since this can sometimes produce smaller and faster code (not to mention the correct answer more often). Calculations can also be performed in a ``narrower'' type, by the as if rule, so long as the same end result is obtained. Explicit casting can always be used to obtain exactly the intermediate types required.
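As a sketch of that last point, an explicit cast can force a wider
intermediate type (here assuming long is wider than int):

    long scaled_product(int a, int b)
    {
        /* Without the cast the multiplication would be performed in
           int, where it may overflow before the result is widened;
           the cast makes the wider intermediate type explicit.         */
        return (long)a * b;
    }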
The Committee relaxed the requirement that float operands be converted to
double; an implementation may still choose to convert, but float operands
may now be computed at lower precision.
The Base Document had specified that all floating point operations be done
in double.
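A sketch of the change:

    float sum3(float a, float b, float c)
    {
        /* The Base Document required this sum to be computed in
           double.  Under the Standard it may be computed entirely in
           float, though an implementation may still choose to use
           double-precision intermediates.                              */
        return a + b + c;
    }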
A difference of opinion within the C community has centered around the meaning of lvalue, one group considering an lvalue to be any kind of object locator, another group holding that an lvalue is meaningful on the left side of an assigning operator. The Committee has adopted the definition of lvalue as an object locator. The term modifiable lvalue is used for the second of the above concepts.
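A brief sketch of the distinction (the identifiers are illustrative):

    void lvalue_examples(void)
    {
        const int limit = 10;
        int n;

        n = limit;          /* limit is an lvalue: it locates an object */
        /* limit = 20; */   /* constraint violation: limit is not a
                               modifiable lvalue, because its type is
                               const-qualified                          */
        n = n + 1;          /* n is a modifiable lvalue                 */
    }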
The role of array objects has been a classic source of confusion in C,
in large part because of the numerous contexts in which an
array reference is converted to a pointer to its first element.
While this conversion neatly handles the semantics of subscripting,
the fact that a[i] is itself a modifiable lvalue while a is not
has puzzled many students of the language.
A more precise description has therefore been incorporated in the
Standard, in the hopes of combatting this confusion.
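For instance:

    void array_examples(void)
    {
        int a[10];
        int *p;

        a[3] = 1;        /* a[i] is a modifiable lvalue                 */
        p = a;           /* here a converts to a pointer to its first
                            element                                     */
        /* a = p; */     /* constraint violation: a is an lvalue, but
                            an array is not a modifiable lvalue         */
    }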
The description of operators and expressions is simplified by saying that
void yields a value,
with the understanding that the value has no representation,
hence requires no storage.
C has now been implemented on a wide range of architectures. While some of these architectures feature uniform pointers which are the size of some integer type, maximally portable code may not assume any necessary correspondence between different pointer types and the integral types.
The use of void * (``pointer to void'') as a generic object pointer type
is an invention of the Committee.
Adoption of this type was stimulated by
the desire to specify function prototype arguments
that either quietly convert arbitrary pointers (as in fread)
or complain if the argument type does not exactly match (as in strcmp).
Nothing is said about pointers to functions,
which may be incommensurate with object pointers and/or integers.
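The contrast can be sketched with the library prototypes themselves:

    #include <stdio.h>   /* size_t fread(void *, size_t, size_t, FILE *) */
    #include <string.h>  /* int strcmp(const char *, const char *)       */

    void prototype_examples(FILE *fp)
    {
        double buf[10];

        fread(buf, sizeof(double), 10, fp);   /* double * converts
                                                 quietly to void *      */
        /* strcmp(buf, "end"); */   /* constraint violation: double *
                                       does not quietly convert to
                                       const char *                     */
    }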
Since pointers and integers are now considered incommensurate, the only integer that can be safely converted to a pointer is the constant 0. The result of converting any other integer to a pointer is machine dependent.
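A sketch:

    void pointer_conversion_examples(void)
    {
        char *p = 0;               /* the constant 0 yields a null
                                      pointer on any implementation     */
        char *q = (char *)40000;   /* machine dependent: the result may
                                      not be a valid pointer at all     */
    }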
Consequences of the treatment of pointer types in the Standard include:
An invalid pointer might be created in several ways. An arbitrary value can be assigned (via a cast) to a pointer variable. (This could even create a valid pointer, depending on the value.) A pointer to an object becomes invalid if the memory containing the object is deallocated. Pointer arithmetic can produce pointers outside the range of an array.
Regardless of how an invalid pointer is created, any use of it yields undefined behavior. Even assignment, comparison with a null pointer constant, or comparison with itself, might on some systems result in an exception.
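A sketch of the ways an invalid pointer can arise (using the standard
allocation functions):

    #include <stdlib.h>

    void invalid_pointer_examples(void)
    {
        int *p = malloc(10 * sizeof(int));
        int *q = (int *)77;   /* arbitrary value assigned via a cast    */
        int *r = p + 20;      /* pointer arithmetic outside the array   */

        free(p);              /* p now points to deallocated memory;
                                 any use of p, q, or r --- even mere
                                 comparison --- is undefined behavior   */
    }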
Consider a hypothetical segmented architecture, on which pointers comprise a segment descriptor and an offset. Suppose that segments are relatively small, so that large arrays are allocated in multiple segments. While the segments are valid (allocated, mapped to real memory), the hardware, operating system, or C implementation can make these multiple segments behave like a single object: pointer arithmetic and relational operators use the defined mapping to impose the proper order on the elements of the array. Once the memory is deallocated, the mapping is no longer guaranteed to exist; use of the segment descriptor might now cause an exception, or the hardware addressing logic might return meaningless data.