Many implementations of the C runtime environment (most notably the UNIX operating system) provide, aside from the standard I/O library (fopen, fclose, fread, fwrite, fseek), a set of unbuffered I/O services (open, close, read, write, lseek). The Committee has decided not to standardize the latter set of functions.
A suggested semantics for these functions in the UNIX world may be found in the emerging IEEE P1003 standard. The standard I/O library functions use a file pointer for referring to the desired I/O stream. The unbuffered I/O services use a file descriptor (a small integer) to refer to the desired I/O stream.
Due to weak implementations of the standard I/O library, many implementors have assumed that the standard I/O library was used for small records and that the unbuffered I/O library was used for large records. However, a good implementation of the standard I/O library can match the performance of the unbuffered services on large records. The user also has the capability of tuning the performance of the standard I/O library (with setvbuf) to suit the application.
Some subtle differences between the two sets of services can make the implementation of the unbuffered I/O services difficult:
The macros _IOFBF, _IOLBF, and _IONBF are enumerations of the third argument to setvbuf, a function adopted from UNIX System V.
SEEK_CUR, SEEK_END, and SEEK_SET have been moved to <stdio.h> from a header specified in the Base Document and not retained in the Standard.
FOPEN_MAX and TMP_MAX are added environmental limits of some interest to programs that manipulate multiple temporary files.
FILENAME_MAX is provided so that buffers to hold file names can be conveniently declared. If the target system supports arbitrarily long filenames, the implementor should provide some reasonable value (80?, 255?, 509?) rather than something unusable like USHRT_MAX.
C inherited its notion of text streams from the UNIX environment in which it was born. Having each line delimited by a single new-line character, regardless of the characteristics of the actual terminal, supported a simple model of text as a sort of arbitrary length scroll or ``galley.'' Having a channel that is ``transparent'' (no file structure or reserved data encodings) eliminated the need for a distinction between text and binary streams.
Many other environments have different properties, however. If a program written in C is to produce a text file digestible by other programs, by text editors in particular, it must conform to the text formatting conventions of that environment.
The I/O facilities defined by the Standard are both more complex and more restrictive than the ancestral I/O facilities of UNIX. This is justified on pragmatic grounds: most of the differences, restrictions and omissions exist to permit C I/O implementations in environments which differ from the UNIX I/O model.
Troublesome aspects of the stream concept include:
Some environments represent text lines as blank-filled fixed-length records. Thus the Standard specifies that it is implementation-defined whether trailing blanks are removed from a line on input. (This specification also addresses the problems of environments which represent text as variable-length records, but do not allow a record length of 0: an empty line may be written as a one-character record containing a blank, and the blank is stripped on input.)
ftell returns a file position indicator, which has no necessary interpretation except that an fseek operation with that indicator value will position the file to the same place. Thus an implementation may encode whatever file positioning information is most appropriate for a text file, subject only to the constraint that the encoding be representable as a long. Use of fgetpos and fsetpos removes even this constraint.
setbuf and setvbuf functions, but permitting great latitude in their implementation. A conforming library need neither attempt the impossible nor respond to a program attempt to improve efficiency by introducing additional overhead.
Thus, the Standard imposes a clear distinction between text streams, which must be mapped to suit local custom, and binary streams, for which no mapping takes place. Local custom on UNIX (and related) systems is of course to treat the two sorts of streams identically, and nothing in the Standard requires any changes to this practice.
Even the specification of binary streams requires some changes to accommodate a wide range of systems. Because many systems do not keep track of the length of a file to the nearest byte, an arbitrary number of characters may appear on the end of a binary stream directed to a file. The Standard cannot forbid this implementation, but does require that this padding consist only of null characters. The alternative would be to restrict C to producing binary files digestible only by other C programs; this alternative runs counter to the spirit of C.
The characters required to be preserved in text stream I/O are those needed for writing C programs; the intent is that the Standard should permit a C translator to be written in a maximally portable fashion. Control characters such as backspace are not required for this purpose, so their handling in text streams is not mandated.
It was agreed that some minimum maximum line length must be mandated; 254 was chosen.
The as if principle is once again invoked to define the nature of input and output in terms of just two functions, fgetc and fputc. The actual primitives in a given system may be quite different.
Buffering, and unbuffering, is defined in a way suggesting the desired interactive behavior; but an implementation may still be conforming even if delays (in a network or terminal controller) prevent output from appearing in time. It is the intent that matters here.
No constraints are imposed upon file names, except that they must be representable as strings (with no embedded null characters).
remove function
The Base Document provides the unlink system call to remove files. The UNIX-specific definition of this function prompted the Committee to replace it with a portable function.
rename function
This function has been added to provide a system-independent atomic operation to change the name of an existing file; the Base Document only provided the link system call, which gives the file a new name without removing the old one, and which is extremely system-dependent.
The Committee considered a proposal that rename should quietly copy a file if simple renaming couldn't be performed in some context, but rejected this as potentially too expensive at execution time.
rename is meant to give access to an underlying facility of the execution environment's operating system. When the new name is the name of an existing file, some systems allow the renaming (and delete the old file or make it inaccessible by that name), while others prohibit the operation. The effect of rename is thus implementation-defined.
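Because the effect of rename on an existing target is implementation-defined, a cautious program may prefer to check for the target first. The following is a minimal sketch only; the helper name rename_no_clobber and its return codes are hypothetical, and the probe-then-rename sequence is not atomic, so another program could still create the target in between.

```c
#include <stdio.h>

/* Hypothetical helper: rename old_name to new_name, but refuse to try
   if new_name can already be opened, since the effect of rename on an
   existing target is implementation-defined.
   Returns 0 on success, -1 if the target exists, -2 if rename failed. */
int rename_no_clobber(const char *old_name, const char *new_name)
{
    FILE *probe = fopen(new_name, "rb");

    if (probe != NULL) {
        fclose(probe);
        return -1;
    }
    return rename(old_name, new_name) == 0 ? 0 : -2;
}
```

A caller that receives -2 must decide for itself whether copying the file is worth the cost; the library will not do so quietly.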
tmpfile function
The tmpfile function is intended to allow users to create binary ``scratch'' files. The as if principle implies that the information in such a file need never actually be stored on a file-structured device.
The temporary file is created in binary update mode, because it will presumably be first written and then read as transparently as possible. Trailing null-character padding may cause problems for some existing programs.
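The write-then-read pattern described here might look as follows. This is an illustrative sketch; the helper name scratch_round_trip is hypothetical, and error checking on fwrite is omitted for brevity.

```c
#include <stdio.h>

/* Hypothetical helper: write len bytes to a scratch file, rewind, and
   read them back into out. Returns the number of bytes read back, or 0
   if the scratch file could not be created. */
size_t scratch_round_trip(const char *data, size_t len, char *out)
{
    size_t got;
    FILE *fp = tmpfile();          /* opened in binary update mode */

    if (fp == NULL)
        return 0;
    fwrite(data, 1, len, fp);
    rewind(fp);                    /* reposition before switching to input */
    got = fread(out, 1, len, fp);
    fclose(fp);                    /* the scratch file is removed here */
    return got;
}
```

The function asks for exactly len bytes, so any trailing null padding an implementation might add is never read back.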
tmpnam function
This function allows for more control than tmpfile: a file can be opened in binary mode or text mode, and files are not erased at completion.
There is always some time between the call to tmpnam and the use (in fopen) of the returned name. Hence it is conceivable that in some implementations the name, which named no file at the call to tmpnam, has been used as a filename by the time of the call to fopen. Implementations should devise name-generation strategies which minimize this possibility, but users should allow for this possibility.
fclose function
On some operating systems it is difficult, or impossible, to create a file unless something is written to the file. A maximally portable program which relies on a file being created must write something to the associated stream before closing it.
fflush function
The fflush function ensures that output has been forced out of internal I/O buffers for a specified stream. Occasionally, however, it is necessary to ensure that all output is forced out, and the programmer may not conveniently be able to specify all the currently-open streams (perhaps because some streams are manipulated within library packages). [Footnote: For instance, on a system (such as UNIX) which supports process forks, it is usually necessary to flush all output buffers just prior to the fork.]
To provide an implementation-independent method of flushing all output buffers, the Standard specifies that this is the result of calling fflush with a NULL argument.
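As an illustration, the sketch below writes through one stream, calls fflush(NULL) without naming that stream, and then counts the bytes visible through a second stream. The helper name and the use of a named file are assumptions made for the example; on a system that pads binary files, the count could exceed the number of bytes written.

```c
#include <stdio.h>

/* Hypothetical helper: write msg to path through a buffered stream,
   flush every open output stream with fflush(NULL), and return the
   number of bytes then visible through a second stream, or -1L. */
long visible_length_after_flush_all(const char *path, const char *msg)
{
    long len = -1L;
    FILE *in;
    FILE *out = fopen(path, "wb");

    if (out == NULL)
        return -1L;
    fputs(msg, out);               /* probably still held in the buffer */
    if (fflush(NULL) == 0) {       /* no stream needs to be named */
        in = fopen(path, "rb");
        if (in != NULL) {
            len = 0L;
            while (getc(in) != EOF)
                len++;
            fclose(in);
        }
    }
    fclose(out);
    return len;
}
```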
fopen function
The b type modifier has been added to deal with the text/binary dichotomy (see §4.9.2). Because of the limited ability to seek within text files (see §4.9.9.1), an implementation is at liberty to treat the old update + modes as if b were also specified. Table 4.1 tabulates the capabilities and actions associated with the various specified mode string arguments to fopen.
                                       r   w   a   r+  w+  a+
file must exist before open            x   -   -   x   -   -
old file contents discarded on open    -   x   -   -   x   -
stream can be read                     x   -   -   x   x   x
stream can be written                  -   x   x   x   x   x
stream can be written only at end      -   -   x   -   -   x
Table 4.1: fopen modes
setvbuf function. (See §4.9.5.6.)
An implementation may choose to allow additional file specifications as part of the mode string argument. For instance,
    file1 = fopen(file1name,"wb,reclen=80");
might be a reasonable way, on a system which provides record-oriented binary files, for an implementation to allow a programmer to specify record length.
A change of input/output direction on an update file is only allowed following an fsetpos, fseek, rewind, or fflush operation, since these are precisely the functions which assure that the I/O buffer has been flushed.
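A short sketch of the rule in practice follows. The helper name is hypothetical; the point of interest is the fseek of zero bytes from the current position, which keeps the place in the file while satisfying the requirement for a positioning call between output and input.

```c
#include <stdio.h>

/* Hypothetical helper: overwrite the first byte of an update stream and
   then read the byte that follows it. Returns that byte, or EOF. */
int overwrite_then_read_next(FILE *fp, int ch)
{
    if (fseek(fp, 0L, SEEK_SET) != 0)   /* position before writing */
        return EOF;
    if (fputc(ch, fp) == EOF)
        return EOF;
    /* Output is about to be followed by input, so a positioning call
       (or fflush) must come in between. */
    if (fseek(fp, 0L, SEEK_CUR) != 0)
        return EOF;
    return fgetc(fp);
}
```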
The Standard (§4.9.2) imposes the requirement that binary files not be truncated when they are updated. This rule does not preclude an implementation from supporting additional file types that do truncate when written to, even when they are opened with the same sort of fopen call. Magnetic tape files are an example of a file type that must be handled this way. (On most tape hardware it is impossible to write to a tape without destroying immediately following data.) Hence tape files are not ``binary files'' within the meaning of the Standard. A conforming hosted implementation must provide (and document) at least one file type (on disk, most likely) that behaves exactly as specified in the Standard.
freopen function
setbuf function
setbuf is subsumed by setvbuf, but has been retained for compatibility with old code.
setvbuf function
setvbuf has been adopted from UNIX System V, both to control the nature of stream buffering and to specify the size of I/O buffers. An implementation is not required to make actual use of a buffer provided for a stream, so a program must never expect the buffer's contents to reflect I/O operations. Further, the Standard does not require that the requested buffering be implemented; it merely mandates a standard mechanism for requesting whatever buffering services might be provided.
Although three types of buffering are defined, an implementation may choose to make one or more of them equivalent. For example, a library may choose to implement line-buffering for binary files as equivalent to unbuffered I/O or may choose to always implement full-buffering as equivalent to line-buffering.
The general principle is to provide portable code with a means of requesting the most appropriate popular buffering style, but not to require an implementation to support these styles.
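A request for a particular buffering style might be written as in the sketch below. The helper name is hypothetical. A zero return means only that the request was accepted, not that the implementation buffers in exactly the style requested.

```c
#include <stdio.h>

/* Hypothetical helper: ask for full buffering with a library-allocated
   buffer of size bytes. Must be called after the stream is opened and
   before any other operation on it. Returns 0 if the request was
   accepted, nonzero otherwise. */
int request_full_buffering(FILE *fp, size_t size)
{
    return setvbuf(fp, NULL, _IOFBF, size);
}
```

Passing a null pointer leaves the allocation to the library, which avoids any temptation to inspect the buffer's contents.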
fprintf function
Use of the L modifier with floating conversions has been added to deal with formatted output of the new type long double.
Note that the %X and %x formats expect a corresponding int argument; %lX or %lx must be supplied with a long int argument.
The conversion specification %p has been added for pointer conversion, since the size of a pointer is not necessarily the same as the size of an int. Because an implementation may support more than one size of pointer, the corresponding argument is expected to be a (void *) pointer.
The %n format has been added to permit ascertaining the number of characters converted up to that point in the current invocation of the formatter.
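The following sketch exercises the l modifier, %n, and %p. Both helper names are hypothetical, and sprintf is used in the first only so that the result can be inspected; the conversions behave the same way in fprintf. Some present-day environments restrict or disable %n, so its availability is an assumption here.

```c
#include <stdio.h>

/* Hypothetical helper: format a long value in hexadecimal into buf and
   return, via %n, how many characters had been written when the number
   ended. The caller must supply a large enough buf. */
int format_hex_long(char *buf, unsigned long value)
{
    int count = 0;

    sprintf(buf, "%lx%n units", value, &count);   /* l modifier for long */
    return count;
}

/* Hypothetical helper: print the address held in p. The cast to
   (void *) supplies the pointer type that %p expects. */
int print_address(FILE *out, int *p)
{
    return fprintf(out, "%p\n", (void *)p);
}
```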
Some pre-Standard implementations switch formats for %g at an exponent of -3 instead of (the Standard's) -4: existing code which requires the format switch at -3 will have to be changed.
Some existing implementations provide %D and %O as synonyms or replacements for %ld and %lo. The Committee considered the latter notation preferable.
The Committee has reserved lower case conversion specifiers for future standardization.
The use of leading zero in field widths to specify zero padding has been superseded by a precision field. The older mechanism has been retained.
Some implementations have provided the format %r as a means of indirectly passing a variable-length argument list. The functions vfprintf, etc., are considered to be a more controlled method of effecting this indirection, so %r was not adopted in the Standard. (See §4.9.6.7.)
The printing formats for numbers are not entirely specified. The requirements of the Standard are loose enough to allow implementations to handle such cases as signed zero, not-a-number, and infinity in an appropriate fashion.
fscanf function
fscanf is based in part on these principles:
fscanf. Given the invalid field ``-.x'', the characters ``-.'' are not pushed back.
fscanf are compatible with those performed by strtod and strtol.
%p has been added, although it is obviously risky, for symmetry with fprintf.
The %i format has been added to permit the scanner to determine the radix of the number in the input stream; the %n format has been added to make available the number of characters scanned thus far in the current invocation of the scanner.
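The two formats combine naturally, as in the sketch below. The helper name is hypothetical, and sscanf is used in place of fscanf so that the example needs no file; the conversions are the same.

```c
#include <stdio.h>

/* Hypothetical helper: parse an integer whose radix is given by its
   prefix (0x for hexadecimal, 0 for octal, otherwise decimal). Stores
   the value in *value and returns the number of characters consumed,
   or -1 if no integer was found. */
int parse_any_radix(const char *text, int *value)
{
    int used = 0;

    /* %n stores a count but does not add to the return value. */
    if (sscanf(text, "%i%n", value, &used) != 1)
        return -1;
    return used;
}
```

For example, "0x1F" yields the value 31 with 4 characters consumed, and "017" yields 15 with 3 consumed.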
White space is now defined by the isspace function. (See §4.3.1.9.)
An implementation must not use the ungetc function to perform the necessary one-character pushback. In particular, since the unmatched text is left ``unread,'' the file position indicator as reported by the ftell function must be the position of the character remaining to be read. Furthermore, if the unread characters were themselves pushed back via ungetc calls, the pushback in fscanf must not affect the push-back stack in ungetc. A scanf call that matches N characters from a stream must leave the stream in the same state as if N consecutive getc calls had been issued.
printf function
See comments in section §4.9.6.1 above.
scanf function
See comments in section §4.9.6.2 above.
sprintf function
See §4.9.6.1 for comments on output formatting. In the interests of minimizing redundancy, sprintf has subsumed the older, rather uncommon, ecvt, fcvt, and gcvt.
sscanf function
The behavior of sscanf on encountering end of string has been clarified. See also comments in section §4.9.6.2 above.
vfprintf function
The functions vfprintf, vprintf, and vsprintf have been adopted from UNIX System V to facilitate writing special purpose formatted output functions.
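A special purpose output function of the kind envisaged might be sketched as follows. The function name report_error and its "error: " prefix are hypothetical choices made for the example.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical special purpose output function: writes "error: ", the
   formatted message, and a new-line to out. Returns the total number of
   characters written, or -1 on an output error. */
int report_error(FILE *out, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (fputs("error: ", out) == EOF)
        return -1;
    va_start(ap, fmt);
    n = vfprintf(out, fmt, ap);    /* forward the caller's arguments */
    va_end(ap);
    if (n < 0 || fputc('\n', out) == EOF)
        return -1;
    return n + 8;                  /* 7 for the prefix, 1 for the new-line */
}
```

A call such as report_error(stderr, "code %d", 42) needs no knowledge of how many arguments follow the format; the va_list carries them through.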
vprintf function
See §4.9.6.7.
vsprintf function
See §4.9.6.7.
fgetc function
Because much existing code assumes that fgetc and fputc are the actual functions equivalent to the macros getc and putc, the Standard requires that they not be implemented as macros.
fgets function
This function subsumes gets, which has no limit to prevent storage overwrite on arbitrary input (see §4.9.7.7).
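The bounded replacement for a gets call might be sketched as below. The helper name read_line is hypothetical. Unlike gets, fgets keeps the new-line character, so the sketch removes it when present.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf, which holds size
   characters, dropping the trailing new-line if one was read. A line
   longer than the buffer is returned in pieces over successive calls.
   Returns 1 on success, 0 at end of file or on error. */
int read_line(FILE *in, char *buf, int size)
{
    char *nl;

    if (fgets(buf, size, in) == NULL)
        return 0;
    nl = strchr(buf, '\n');
    if (nl != NULL)
        *nl = '\0';
    return 1;
}
```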
fputc function
See §4.9.7.1.
fputs function
getc function
getc and putc have often been implemented as unsafe macros, since it is difficult in such a macro to touch the stream argument only once. Since this danger is common in prior art, these two functions are explicitly permitted to evaluate stream more than once.
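In practice this means the stream argument should be an expression without side effects, as in the sketch below (the helper name is hypothetical). An argument such as *fpp++ would be unsafe with getc and calls for fgetc instead.

```c
#include <stdio.h>

/* Hypothetical helper: count the characters remaining on a stream.
   The stream argument to getc is a plain variable, so it is harmless
   if a macro implementation evaluates it more than once. */
long count_chars(FILE *in)
{
    long n = 0L;
    int c;                         /* int, so that EOF is distinguishable */

    while ((c = getc(in)) != EOF)
        n++;
    return n;
}
```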
getchar function
gets function
See §4.9.7.2.
putc function
See §4.9.7.5.
putchar function
puts function
puts(s) is not exactly equivalent to fputs(s,stdout); puts also writes a new line after the argument string. This incompatibility reflects existing practice.
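The difference can be made concrete with a sketch that reproduces the behavior of puts on an arbitrary stream. The helper name put_line is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: write s followed by a new-line to out, which is
   what puts does for stdout. fputs alone writes only the string.
   Returns 0 on success, EOF on an output error. */
int put_line(const char *s, FILE *out)
{
    if (fputs(s, out) == EOF)
        return EOF;
    return fputc('\n', out) == EOF ? EOF : 0;
}
```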
ungetc function
The Base Document requires that at least one character be read before ungetc is called, in certain implementation-specific cases. The Committee has removed this requirement, thus obliging a FILE structure to have room to store one character of pushback regardless of the state of the buffer; it felt that this degree of generality makes clearer the ways in which the function may be used.
It is permissible to push back a different character than that which was read; this accords with common existing practice. The last-in, first-out nature of ungetc has been clarified.
ungetc is typically used to handle algorithms, such as tokenization, which involve one-character lookahead in text files. fseek and ftell are used for random access, typically in binary files. So that these disparate file-handling disciplines are not unnecessarily linked, the value of a text file's file position indicator immediately after ungetc has been specified as indeterminate.
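A tokenizer step with one-character lookahead might be sketched as follows. The helper name read_number is hypothetical, and overflow of the accumulated value is not checked.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical tokenizer step: skip white space, then read an unsigned
   decimal number. The first character that is not part of the number is
   pushed back so that the next call sees it. Returns 1 and stores the
   number in *value if one was read, 0 otherwise. */
int read_number(FILE *in, unsigned long *value)
{
    int c;
    unsigned long v = 0;

    do {
        c = getc(in);
    } while (c != EOF && isspace(c));
    if (c == EOF || !isdigit(c)) {
        if (c != EOF)
            ungetc(c, in);         /* not a number: leave it unread */
        return 0;
    }
    do {
        v = v * 10 + (unsigned long)(c - '0');
        c = getc(in);
    } while (c != EOF && isdigit(c));
    if (c != EOF)
        ungetc(c, in);             /* return the lookahead character */
    *value = v;
    return 1;
}
```

The sketch never consults the file position indicator after a pushback, so it is unaffected by that value being indeterminate.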
Existing practice relies on two different models of the effect of ungetc. One model can be characterized as writing the pushed-back character ``on top of'' the previous character. This model implies an implementation in which the pushed-back characters are stored within the file buffer and bookkeeping is performed by setting the file position indicator to the previous character position. (Care must be taken in this model to recover the overwritten character values when the pushed-back characters are discarded as a result of other operations on the stream.) The other model can be characterized as pushing the character ``between'' the current character and the previous character. This implies an implementation in which the pushed-back characters are specially buffered (within the FILE structure, say) and accounted for by a flag or count. In this model it is natural not to move the file position indicator. The indeterminacy of the file position indicator while pushed-back characters exist accommodates both models.
Mandating either model (by specifying the effect of ungetc on a text file's file position indicator) creates problems with implementations that have assumed the other model. Requiring the file position indicator not to change after ungetc would necessitate changes in programs which combine random access and tokenization on text files, and rely on the file position indicator marking the end of a token even after pushback. Requiring the file position indicator to back up would create severe implementation problems in certain environments, since in some file organizations it can be impossible to find the previous input character position without having read the file sequentially to the point in question. [Footnote: Consider, for instance, a sequential file of variable-length records in which a line is represented as a count field followed by the characters in the line. The file position indicator must encode a character position as the position of the count field plus an offset into the line; from the position of the count field and the length of the line, the next count field can be found. Insufficient information is available for finding the previous count field, so backing up from the first character of a line necessitates, in the general case, a sequential read from the start of the file.]
fread function
size_t is the appropriate type both for an object size and for an array bound (see §3.3.3.4), so this is the type of size and nelem.
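The pairing of an object size with an element count, both of type size_t, might be used as in this sketch (the helper name is hypothetical):

```c
#include <stdio.h>

/* Hypothetical helper: read up to nelem doubles from a binary stream
   into dst. Returns the number of complete elements read, which may be
   fewer than nelem at end of file or on error. */
size_t read_doubles(FILE *in, double *dst, size_t nelem)
{
    return fread(dst, sizeof dst[0], nelem, in);
}
```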
fwrite function
See §4.9.8.1.
fgetpos function
fgetpos and fsetpos have been added to allow random access operations on files which are too large to handle with fseek and ftell.
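A position saved with fgetpos is opaque and is meaningful only when handed back to fsetpos on the same stream. The sketch below (the helper name is hypothetical) uses the pair to look at the next character without consuming it.

```c
#include <stdio.h>

/* Hypothetical helper: return the next character on fp without
   consuming it, by saving the position with fgetpos and restoring it
   with fsetpos. Returns EOF at end of file or if positioning fails. */
int peek_char(FILE *fp)
{
    fpos_t here;                   /* opaque: no arithmetic is possible */
    int c;

    if (fgetpos(fp, &here) != 0)
        return EOF;
    c = getc(fp);
    if (fsetpos(fp, &here) != 0)
        return EOF;
    return c;
}
```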
fseek function
Whereas a binary file can be treated as an ordered sequence of bytes, counting from zero, a text file need not map one-to-one to its internal representation (see §4.9.2). Thus, only seeks to an earlier reported position are permitted for text files. The need to encode both record position and position within a record in a long value may constrain the size of text files upon which fseek-ftell can be used to be considerably smaller than the size of binary files.
Given these restrictions, the Committee still felt that this function has enough utility, and is used in sufficient existing code, to warrant its retention in the Standard. fgetpos and fsetpos have been added to deal with files which are too large to handle with fseek and ftell.
The fseek function will reset the end-of-file flag for the stream; the error flag is not changed unless an error occurs, when it will be set.
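The resetting of the end-of-file flag can be observed with a sketch such as the following (the helper name is hypothetical):

```c
#include <stdio.h>

/* Hypothetical helper: read fp to end of file, then seek back to the
   start. Returns 1 if the end-of-file flag was set by the reads and
   cleared by the successful fseek, 0 otherwise. */
int eof_cleared_by_seek(FILE *fp)
{
    while (getc(fp) != EOF)
        ;                          /* consume the rest of the stream */
    if (!feof(fp))
        return 0;                  /* stopped on an error, not end of file */
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return 0;
    return !feof(fp);
}
```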
fsetpos function
ftell function
ftell can fail for at least two reasons:
long int.
ftell to report failure has been specified. See also §4.9.9.1.
rewind function
Resetting the end-of-file and error indicators was added to the specification of rewind to make the specification more logically consistent.
clearerr function
feof function
ferror function
perror function
At various times, the Committee considered providing a form of perror that delivers up an error string version of errno without performing any output. It ultimately decided to provide this capability in a separate function, strerror. (See §4.11.6.1.)
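The separation lets a program obtain the message text and decide for itself where, or whether, to write it. In the sketch below the helper name is hypothetical, and it is an assumption that fopen sets errno on failure; the Standard does not require it to, although many implementations do.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: try to open path for reading. Returns NULL if
   the open succeeds, otherwise the message text for errno, with no
   output performed. Assumes that a failing fopen sets errno. */
const char *open_failure_reason(const char *path)
{
    FILE *fp;

    errno = 0;
    fp = fopen(path, "rb");
    if (fp != NULL) {
        fclose(fp);
        return NULL;
    }
    return strerror(errno);        /* perror would write to stderr instead */
}
```

The string returned by strerror may be overwritten by a later call, so a caller that needs to keep it should copy it.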