4.9 Input/Output <stdio.h>

Many implementations of the C runtime environment (most notably the UNIX operating system) provide, aside from the standard I/O library (fopen, fclose, fread, fwrite, fseek), a set of unbuffered I/O services (open, close, read, write, lseek). The Committee has decided not to standardize the latter set of functions.

A suggested semantics for these functions in the UNIX world may be found in the emerging IEEE P1003 standard. The standard I/O library functions use a file pointer for referring to the desired I/O stream. The unbuffered I/O services use a file descriptor (a small integer) to refer to the desired I/O stream.

Due to weak implementations of the standard I/O library, many implementors have assumed that the standard I/O library was used for small records and that the unbuffered I/O library was used for large records. However, a good implementation of the standard I/O library can match the performance of the unbuffered services on large records. The user also has the capability of tuning the performance of the standard I/O library (with setvbuf) to suit the application.

Some subtle differences between the two sets of services can make the implementation of the unbuffered I/O services difficult:

The model of a file used in the unbuffered I/O services is an array of characters. Many C environments do not support this file model.
Difficulties arise when handling the new-line character. Many hosts use conventions other than an in-stream new-line character to mark the end of a line. The unbuffered I/O services assume that no translation occurs between the program's data and the file data when performing I/O, so either the new-line character translation would be lost (which breaks programs) or the implementor must be aware of the new-line translation (which results in non-portable programs).
On UNIX systems, file descriptors 0, 1, and 2 correspond to the standard input, output, and error streams. This convention may be problematic for other systems in that (1) file descriptors 0, 1, and 2 may not be available or may be reserved for another purpose, (2) the operating system may use a different set of services for terminal I/O than file I/O.

In summary, the Committee chose not to standardize the unbuffered I/O services because:

They duplicate the facilities provided by the standard I/O services.
The performance of the standard I/O services can be the same as or better than that of the unbuffered I/O services.
The unbuffered I/O file model may not be appropriate for many C language environments.

4.9.1 Introduction

The macros _IOFBF, _IOLBF, and _IONBF enumerate the permitted values of the third argument to setvbuf, a function adopted from UNIX System V.

SEEK_CUR, SEEK_END, and SEEK_SET have been moved to <stdio.h> from a header specified in the Base Document and not retained in the Standard.

FOPEN_MAX and TMP_MAX are added environmental limits of some interest to programs that manipulate multiple temporary files.

FILENAME_MAX is provided so that buffers to hold file names can be conveniently declared. If the target system supports arbitrarily long filenames, the implementor should provide some reasonable value (80?, 255?, 509?) rather than something unusable like USHRT_MAX.

4.9.2 Streams

C inherited its notion of text streams from the UNIX environment in which it was born. Having each line delimited by a single new-line character, regardless of the characteristics of the actual terminal, supported a simple model of text as a sort of arbitrary length scroll or ``galley.'' Having a channel that is ``transparent'' (no file structure or reserved data encodings) eliminated the need for a distinction between text and binary streams.

Many other environments have different properties, however. If a program written in C is to produce a text file digestible by other programs, by text editors in particular, it must conform to the text formatting conventions of that environment.

The I/O facilities defined by the Standard are both more complex and more restrictive than the ancestral I/O facilities of UNIX. This is justified on pragmatic grounds: most of the differences, restrictions and omissions exist to permit C I/O implementations in environments which differ from the UNIX I/O model.

Troublesome aspects of the stream concept include:

The definition of lines.

In the UNIX model, division of a file into lines is effected by new-line characters. Different techniques are used by other systems --- lines may be separated by CR-LF (carriage return, line feed) or by unrecorded areas on the recording medium, or each line may be prefixed by its length. The Standard addresses this diversity by specifying that new-line be used as a line separator at the program level, but then permitting an implementation to transform the data read or written to conform to the conventions of the environment.

Some environments represent text lines as blank-filled fixed-length records. Thus the Standard specifies that it is implementation-defined whether trailing blanks are removed from a line on input. (This specification also addresses the problems of environments which represent text as variable-length records, but do not allow a record length of 0: an empty line may be written as a one-character record containing a blank, and the blank is stripped on input.)

Transparency.

Some programs require access to external data without modification. For instance, transformation of CR-LF to new-line character is usually not desirable when object code is processed. The Standard defines two stream types, text and binary, to allow a program to define, when a file is opened, whether the preservation of its exact contents or of its line structure is more important in an environment which cannot accurately reflect both.

Random access.

The UNIX I/O model features random access to data in a file, indexed by character number. On systems where a new-line character processed by the program represents an unknown number of physically recorded characters, this simple mechanism cannot be consistently supported for text streams. The Standard abstracts the significant properties of random access for text streams: the ability to determine the current file position and then later reposition the file to the same location. ftell returns a file position indicator, which has no necessary interpretation except that an fseek operation with that indicator value will position the file to the same place. Thus an implementation may encode whatever file positioning information is most appropriate for a text file, subject only to the constraint that the encoding be representable as a long. Use of fgetpos and fsetpos removes even this constraint.

Buffering.

UNIX allows the program to control the extent and type of buffering for various purposes. For example, a program can provide its own large I/O buffer to improve efficiency, or can request unbuffered terminal I/O to process each input character as it is entered. Other systems do not necessarily support this generality. Some systems provide only line-at-a-time access to terminal input; some systems support program-allocated buffers only by copying data to and from system-allocated buffers for processing. Buffering is addressed in the Standard by specifying UNIX-like setbuf and setvbuf functions, but permitting great latitude in their implementation. A conforming library need neither attempt the impossible nor respond to a program attempt to improve efficiency by introducing additional overhead.

Thus, the Standard imposes a clear distinction between text streams, which must be mapped to suit local custom, and binary streams, for which no mapping takes place. Local custom on UNIX (and related) systems is of course to treat the two sorts of streams identically, and nothing in the Standard requires any changes to this practice.

Even the specification of binary streams requires some changes to accommodate a wide range of systems. Because many systems do not keep track of the length of a file to the nearest byte, an arbitrary number of characters may appear on the end of a binary stream directed to a file. The Standard cannot forbid this implementation, but does require that this padding consist only of null characters. The alternative would be to restrict C to producing binary files digestible only by other C programs; this alternative runs counter to the spirit of C.

The set of characters required to be preserved in text stream I/O is the set needed for writing C programs; the intent is that the Standard should permit a C translator to be written in a maximally portable fashion. Control characters such as backspace are not required for this purpose, so their handling in text streams is not mandated.

It was agreed that some minimum maximum line length must be mandated; 254 was chosen.

4.9.3 Files

The as if principle is once again invoked to define the nature of input and output in terms of just two functions, fgetc and fputc. The actual primitives in a given system may be quite different.

Buffering, and unbuffering, is defined in a way suggesting the desired interactive behavior; but an implementation may still be conforming even if delays (in a network or terminal controller) prevent output from appearing in time. It is the intent that matters here.

No constraints are imposed upon file names, except that they must be representable as strings (with no embedded null characters).

4.9.4 Operations on files

4.9.4.1 The `remove` function

The Base Document provides the unlink system call to remove files. The UNIX-specific definition of this function prompted the Committee to replace it with a portable function.

4.9.4.2 The `rename` function

This function has been added to provide a system-independent atomic operation to change the name of an existing file; the Base Document only provided the link system call, which gives the file a new name without removing the old one, and which is extremely system-dependent.

The Committee considered a proposal that rename should quietly copy a file if simple renaming couldn't be performed in some context, but rejected this as potentially too expensive at execution time.

rename is meant to give access to an underlying facility of the execution environment's operating system. When the new name is the name of an existing file, some systems allow the renaming (and delete the old file or make it inaccessible by that name), while others prohibit the operation. The effect of rename is thus implementation-defined.

4.9.4.3 The `tmpfile` function

The tmpfile function is intended to allow users to create binary ``scratch'' files. The as if principle implies that the information in such a file need never actually be stored on a file-structured device.

The temporary file is created in binary update mode, because it will presumably be first written and then read as transparently as possible. Trailing null-character padding may cause problems for some existing programs.
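A minimal sketch of the write-then-read pattern described above. The helper name scratch_roundtrip is hypothetical and not part of the Standard; it reads back only as many characters as it wrote, so any trailing null-character padding is never seen.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write `text` to a scratch file, rewind, and read it back into `out`.
   Returns the number of characters read back, or -1 on failure.
   (scratch_roundtrip is a hypothetical helper for illustration.) */
long scratch_roundtrip(const char *text, char *out, size_t outsize)
{
    FILE *fp;
    size_t len, got;

    assert(text != NULL && out != NULL && outsize > 0);
    len = strlen(text);
    if (len >= outsize)
        return -1;

    fp = tmpfile();                 /* opened in binary update mode */
    if (fp == NULL)
        return -1;

    if (fwrite(text, 1, len, fp) != len) {
        fclose(fp);
        return -1;
    }
    rewind(fp);                     /* reposition before switching to input */
    got = fread(out, 1, len, fp);   /* ask only for what was written */
    out[got] = '\0';
    fclose(fp);                     /* the scratch file is removed on close */
    return (long)got;
}
```

A caller might use it as `scratch_roundtrip("hello", buf, sizeof buf)`, where buf is an array of at least six characters.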

4.9.4.4 The `tmpnam` function

This function allows for more control than tmpfile: a file can be opened in binary mode or text mode, and files are not erased at completion.

There is always some time between the call to tmpnam and the use (in fopen) of the returned name. Hence it is conceivable that in some implementations the name, which named no file at the call to tmpnam, has been used as a filename by the time of the call to fopen. Implementations should devise name-generation strategies which minimize this possibility, but users should allow for this possibility.

4.9.5 File access functions

4.9.5.1 The `fclose` function

On some operating systems it is difficult, or impossible, to create a file unless something is written to the file. A maximally portable program which relies on a file being created must write something to the associated stream before closing it.

4.9.5.2 The `fflush` function

The fflush function ensures that output has been forced out of internal I/O buffers for a specified stream. Occasionally, however, it is necessary to ensure that all output is forced out, and the programmer may not conveniently be able to specify all the currently-open streams (perhaps because some streams are manipulated within library packages). [Footnote: For instance, on a system (such as UNIX) which supports process forks, it is usually necessary to flush all output buffers just prior to the fork.] To provide an implementation-independent method of flushing all output buffers, the Standard specifies that this is the result of calling fflush with a NULL argument.

4.9.5.3 The `fopen` function

The b type modifier has been added to deal with the text/binary dichotomy (see §4.9.2). Because of the limited ability to seek within text files (see §4.9.9.1), an implementation is at liberty to treat the old update + modes as if b were also specified. Table 4.1 tabulates the capabilities and actions associated with the various specified mode string arguments to fopen.

                                          r   w   a   r+  w+  a+
    file must exist before open           x   -   -   x   -   -
    old file contents discarded on open   -   x   -   -   x   -
    stream can be read                    x   -   -   x   x   x
    stream can be written                 -   x   x   x   x   x
    stream can be written only at end     -   -   x   -   -   x

Table 4.1: Capabilities and actions associated with fopen mode strings

Other specifications for files, such as record length and block size, are not specified in the Standard, due to their widely varying characteristics in different operating environments. Changes to file access modes and buffer sizes may be specified using the setvbuf function. (See §4.9.5.6.) An implementation may choose to allow additional file specifications as part of the mode string argument. For instance,

        file1 = fopen(file1name,"wb,reclen=80");

might be a reasonable way, on a system which provides record-oriented binary files, for an implementation to allow a programmer to specify record length.

A change of input/output direction on an update file is only allowed following an fsetpos, fseek, rewind, or fflush operation, since these are precisely the functions which assure that the I/O buffer has been flushed.
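The rule on changing direction can be illustrated with a short sketch. The function poke_then_peek is a hypothetical name; it assumes the stream was opened in an update mode and uses a zero-offset fseek as the intervening positioning call between the write and the read.

```c
#include <assert.h>
#include <stdio.h>

/* Overwrite the first character of an update stream, then read the
   character that follows it. Returns that character, or EOF on failure.
   (poke_then_peek is a hypothetical helper for illustration.) */
int poke_then_peek(FILE *fp, int ch)
{
    assert(fp != NULL);
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return EOF;
    if (fputc(ch, fp) == EOF)
        return EOF;
    /* Output followed by input: a positioning call must intervene. */
    if (fseek(fp, 0L, SEEK_CUR) != 0)
        return EOF;
    return fgetc(fp);
}
```

Omitting the second fseek would leave the behavior of the following fgetc undefined, even though many implementations happen to tolerate it.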

The Standard (§4.9.2) imposes the requirement that binary files not be truncated when they are updated. This rule does not preclude an implementation from supporting additional file types that do truncate when written to, even when they are opened with the same sort of fopen call. Magnetic tape files are an example of a file type that must be handled this way. (On most tape hardware it is impossible to write to a tape without destroying immediately following data.) Hence tape files are not ``binary files'' within the meaning of the Standard. A conforming hosted implementation must provide (and document) at least one file type (on disk, most likely) that behaves exactly as specified in the Standard.

4.9.5.4 The `freopen` function

4.9.5.5 The `setbuf` function

setbuf is subsumed by setvbuf, but has been retained for compatibility with old code.

4.9.5.6 The `setvbuf` function

setvbuf has been adopted from UNIX System V, both to control the nature of stream buffering and to specify the size of I/O buffers. An implementation is not required to make actual use of a buffer provided for a stream, so a program must never expect the buffer's contents to reflect I/O operations. Further, the Standard does not require that the requested buffering be implemented; it merely mandates a standard mechanism for requesting whatever buffering services might be provided.

Although three types of buffering are defined, an implementation may choose to make one or more of them equivalent. For example, a library may choose to implement line-buffering for binary files as equivalent to unbuffered I/O or may choose to always implement full-buffering as equivalent to line-buffering.

The general principle is to provide portable code with a means of requesting the most appropriate popular buffering style, but not to require an implementation to support these styles.
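As a sketch of how a program might request a buffering style, the hypothetical helper below asks for full buffering with a program-supplied buffer. The buffer size (four times BUFSIZ) is an arbitrary choice for illustration, and the return value must be checked because the request need not be honored.

```c
#include <assert.h>
#include <stdio.h>

/* The buffer must remain valid for as long as the stream uses it,
   so it is given static storage duration. It serves one stream only. */
static char big_buffer[BUFSIZ * 4];

/* Request full buffering using big_buffer. Call this after the stream
   is opened and before any other operation is performed on it.
   Returns 0 if the request was accepted, nonzero otherwise.
   (request_full_buffering is a hypothetical helper for illustration.) */
int request_full_buffering(FILE *fp)
{
    assert(fp != NULL);
    return setvbuf(fp, big_buffer, _IOFBF, sizeof big_buffer);
}
```

Since the implementation is not required to use the buffer, the program should not inspect big_buffer to learn what has been read or written.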

4.9.6 Formatted input/output functions

4.9.6.1 The `fprintf` function

Use of the L modifier with floating conversions has been added to deal with formatted output of the new type long double.

Note that the %X and %x formats expect a corresponding int argument; %lX or %lx must be supplied with a long int argument.

The conversion specification %p has been added for pointer conversion, since the size of a pointer is not necessarily the same as the size of an int. Because an implementation may support more than one size of pointer, the corresponding argument is expected to be a (void *) pointer.
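The following sketch shows both points: the l modifier paired with a long-sized argument, and an explicit conversion to void * for %p. The function names are hypothetical. The sketch passes an unsigned long to %lx, since the x conversion is specified for unsigned values; the text printed by %p is implementation-defined.

```c
#include <assert.h>
#include <stdio.h>

/* Format a long-sized value in hexadecimal. The l modifier must match
   the width of the argument actually passed.
   (format_long_hex is a hypothetical helper for illustration.) */
int format_long_hex(char *buf, unsigned long value)
{
    assert(buf != NULL);
    return sprintf(buf, "%lx", value);
}

/* Print the address of an int. The pointer is converted explicitly to
   void *, the type that %p expects.
   (print_address is a hypothetical helper for illustration.) */
int print_address(FILE *out, int *p)
{
    assert(out != NULL);
    return fprintf(out, "%p\n", (void *)p);
}
```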

The %n format has been added to permit ascertaining the number of characters converted up to that point in the current invocation of the formatter.

Some pre-Standard implementations switch formats for %g at an exponent of -3 instead of (the Standard's) -4: existing code which requires the format switch at -3 will have to be changed.
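A small sketch for probing where a given library switches styles. The helper g_uses_exponent is a hypothetical name; it formats a value with the default precision and reports whether an exponent appears in the result.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if %g chose exponent style for value, 0 if it chose fixed
   style. (g_uses_exponent is a hypothetical helper for illustration.) */
int g_uses_exponent(double value)
{
    char buf[64];
    int n = sprintf(buf, "%g", value);

    assert(n > 0 && n < (int)sizeof buf);
    return strchr(buf, 'e') != NULL;
}
```

Under the Standard's rule, 0.0001 (exponent -4) is printed in fixed style, while 0.00001 (exponent -5) is printed in exponent style.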

Some existing implementations provide %D and %O as synonyms or replacements for %ld and %lo. The Committee considered the latter notation preferable.

The Committee has reserved lower case conversion specifiers for future standardization.

The use of leading zero in field widths to specify zero padding has been superseded by a precision field. The older mechanism has been retained.

Some implementations have provided the format %r as a means of indirectly passing a variable-length argument list. The functions vfprintf, etc., are considered to be a more controlled method of effecting this indirection, so %r was not adopted in the Standard. (See §4.9.6.7.)

The printing formats for numbers are not entirely specified. The requirements of the Standard are loose enough to allow implementations to handle such cases as signed zero, not-a-number, and infinity in an appropriate fashion.

4.9.6.2 The `fscanf` function

The specification of fscanf is based in part on these principles:

As soon as one specified conversion fails, the whole function invocation fails.
One-character pushback is sufficient for the implementation of fscanf. Given the invalid field ``-.x'', the characters ``-.'' are not pushed back.
If a ``flawed field'' is detected, no value is stored for the corresponding argument.
The conversions performed by fscanf are compatible with those performed by strtod and strtol.

Input pointer conversion with %p has been added, although it is obviously risky, for symmetry with fprintf. The %i format has been added to permit the scanner to determine the radix of the number in the input stream; the %n format has been added to make available the number of characters scanned thus far in the current invocation of the scanner.

White space is now defined by the isspace function. (See §4.3.1.9.)

An implementation must not use the ungetc function to perform the necessary one-character pushback. In particular, since the unmatched text is left ``unread,'' the file position indicator as reported by the ftell function must be the position of the character remaining to be read. Furthermore, if the unread characters were themselves pushed back via ungetc calls, the pushback in fscanf must not affect the push-back stack in ungetc. A scanf call that matches N characters from a stream must leave the stream in the same state as if N consecutive getc calls had been issued.
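The behavior of %i, %n, and a failed conversion can be sketched with sscanf, which shares its conversion rules with fscanf. The function parse_three is a hypothetical helper, not part of the Standard.

```c
#include <assert.h>
#include <stdio.h>

/* Parse three integers whose radix is inferred from their prefixes
   (0x for hexadecimal, 0 for octal, otherwise decimal). The number of
   characters consumed is stored in *consumed only if all three
   conversions succeed; otherwise it is left at 0.
   Returns the number of successful conversions.
   (parse_three is a hypothetical helper for illustration.) */
int parse_three(const char *text, int out[3], int *consumed)
{
    assert(text != NULL && out != NULL && consumed != NULL);
    *consumed = 0;
    return sscanf(text, "%i %i %i%n", &out[0], &out[1], &out[2], consumed);
}
```

Given "0x1F 017 42" the three values are 31, 15, and 42. Given "12 zz 7" the call stops at the first failed conversion and returns 1, leaving the remaining array elements untouched.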

4.9.6.3 The `printf` function

See comments in section §4.9.6.1 above.

4.9.6.4 The `scanf` function

See comments in section §4.9.6.2 above.

4.9.6.5 The `sprintf` function

See §4.9.6.1 for comments on output formatting.

In the interests of minimizing redundancy, sprintf has subsumed the older, rather uncommon, ecvt, fcvt, and gcvt.

4.9.6.6 The `sscanf` function

The behavior of sscanf on encountering end of string has been clarified. See also comments in section §4.9.6.2 above.

4.9.6.7 The `vfprintf` function

The functions vfprintf, vprintf, and vsprintf have been adopted from UNIX System V to facilitate writing special purpose formatted output functions.
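A special purpose output function of the kind envisaged here might look like the following sketch. The name report_error and the "error: " prefix are assumptions made for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Write "error: ", a printf-style message, and a new-line to `out`.
   Returns the total number of characters written, or a negative value
   if any write fails.
   (report_error is a hypothetical helper for illustration.) */
int report_error(FILE *out, const char *fmt, ...)
{
    va_list ap;
    int head, body;

    assert(out != NULL && fmt != NULL);
    head = fprintf(out, "error: ");
    if (head < 0)
        return head;

    va_start(ap, fmt);
    body = vfprintf(out, fmt, ap);   /* forward the caller's arguments */
    va_end(ap);
    if (body < 0)
        return body;

    if (fputc('\n', out) == EOF)
        return -1;
    return head + body + 1;
}
```

A call such as `report_error(stderr, "code %d in %s", 42, "parser")` forwards its variable arguments unchanged, which is the indirection that %r attempted to provide.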

4.9.6.8 The `vprintf` function

See §4.9.6.7.

4.9.6.9 The `vsprintf` function

See §4.9.6.7.

4.9.7 Character input/output functions

4.9.7.1 The `fgetc` function

Because much existing code assumes that fgetc and fputc are the actual functions equivalent to the macros getc and putc, the Standard requires that they not be implemented as macros.

4.9.7.2 The `fgets` function

This function subsumes gets, which has no limit to prevent storage overwrite on arbitrary input (see §4.9.7.7).
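A sketch of a bounded line reader built on fgets follows. The helper read_line is a hypothetical name; unlike gets, it can never store more than size characters, and a line longer than the buffer is simply delivered in pieces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read at most size-1 characters of one line into buf and strip the
   trailing new-line, if one was read. Returns buf, or NULL at
   end-of-file or on a read error.
   (read_line is a hypothetical helper for illustration.) */
char *read_line(FILE *in, char *buf, int size)
{
    char *nl;

    assert(in != NULL && buf != NULL && size > 0);
    if (fgets(buf, size, in) == NULL)
        return NULL;
    nl = strchr(buf, '\n');
    if (nl != NULL)
        *nl = '\0';
    return buf;
}
```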

4.9.7.3 The `fputc` function

See §4.9.7.1.

4.9.7.4 The `fputs` function

4.9.7.5 The `getc` function

getc and putc have often been implemented as unsafe macros, since it is difficult in such a macro to touch the stream argument only once. Since this danger is common in prior art, these two functions are explicitly permitted to evaluate stream more than once.

4.9.7.6 The `getchar` function

4.9.7.7 The `gets` function

See §4.9.7.2.

4.9.7.8 The `putc` function

See §4.9.7.5.

4.9.7.9 The `putchar` function

4.9.7.10 The `puts` function

puts(s) is not exactly equivalent to fputs(s,stdout); puts also writes a new-line character after the argument string. This incompatibility reflects existing practice.

4.9.7.11 The `ungetc` function

The Base Document requires that at least one character be read before ungetc is called, in certain implementation-specific cases. The Committee has removed this requirement, thus obliging a FILE structure to have room to store one character of pushback regardless of the state of the buffer; it felt that this degree of generality makes clearer the ways in which the function may be used.

It is permissible to push back a different character than that which was read; this accords with common existing practice. The last-in, first-out nature of ungetc has been clarified.

ungetc is typically used to handle algorithms, such as tokenization, which involve one-character lookahead in text files. fseek and ftell are used for random access, typically in binary files. So that these disparate file-handling disciplines are not unnecessarily linked, the value of a text file's file position indicator immediately after ungetc has been specified as indeterminate.

Existing practice relies on two different models of the effect of ungetc. One model can be characterized as writing the pushed-back character ``on top of'' the previous character. This model implies an implementation in which the pushed-back characters are stored within the file buffer and bookkeeping is performed by setting the file position indicator to the previous character position. (Care must be taken in this model to recover the overwritten character values when the pushed-back characters are discarded as a result of other operations on the stream.) The other model can be characterized as pushing the character ``between'' the current character and the previous character. This implies an implementation in which the pushed-back characters are specially buffered (within the FILE structure, say) and accounted for by a flag or count. In this model it is natural not to move the file position indicator. The indeterminacy of the file position indicator while pushed-back characters exist accommodates both models.

Mandating either model (by specifying the effect of ungetc on a text file's file position indicator) creates problems with implementations that have assumed the other model. Requiring the file position indicator not to change after ungetc would necessitate changes in programs which combine random access and tokenization on text files, and rely on the file position indicator marking the end of a token even after pushback. Requiring the file position indicator to back up would create severe implementation problems in certain environments, since in some file organizations it can be impossible to find the previous input character position without having read the file sequentially to the point in question. [Footnote: Consider, for instance, a sequential file of variable-length records in which a line is represented as a count field followed by the characters in the line. The file position indicator must encode a character position as the position of the count field plus an offset into the line; from the position of the count field and the length of the line, the next count field can be found. Insufficient information is available for finding the previous count field, so backing up from the first character of a line necessitates, in the general case, a sequential read from the start of the file.]
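The one-character lookahead use of ungetc can be sketched as follows. The function read_number is a hypothetical helper; it relies only on the single character of pushback that the Standard guarantees, makes no use of the file position indicator, and does not guard against overflow of the accumulated value.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Read an unsigned decimal number from `in`, leaving the first
   non-digit character unread. Returns the value, or -1 if the next
   character is not a digit.
   (read_number is a hypothetical helper for illustration.) */
long read_number(FILE *in)
{
    long value = 0;
    int c, seen = 0;

    assert(in != NULL);
    while ((c = getc(in)) != EOF && isdigit(c)) {
        value = value * 10 + (c - '0');
        seen = 1;
    }
    if (c != EOF)
        ungetc(c, in);      /* push the lookahead character back */
    return seen ? value : -1;
}
```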

4.9.8 Direct input/output functions

4.9.8.1 The `fread` function

size_t is the appropriate type both for an object size and for an array bound (see §3.3.3.4), so this is the type of size and nelem.
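A brief sketch of the two size_t arguments in use: the element size comes from sizeof and the element count is an array bound. The helper roundtrip_ints is a hypothetical name and assumes the stream is positioned at its start and open for binary update.

```c
#include <assert.h>
#include <stdio.h>

/* Write n ints from src to fp, rewind, and read them back into dst.
   Returns the number of elements read back (0 if the write failed).
   (roundtrip_ints is a hypothetical helper for illustration.) */
size_t roundtrip_ints(FILE *fp, const int *src, int *dst, size_t n)
{
    assert(fp != NULL && src != NULL && dst != NULL);
    if (fwrite(src, sizeof src[0], n, fp) != n)
        return 0;
    rewind(fp);
    return fread(dst, sizeof dst[0], n, fp);
}
```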

4.9.8.2 The `fwrite` function

See §4.9.8.1.

4.9.9 File positioning functions

4.9.9.1 The `fgetpos` function

fgetpos and fsetpos have been added to allow random access operations on files which are too large to handle with fseek and ftell.
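A minimal sketch of saving and restoring a position with these functions. The helper peek_char is a hypothetical name; the fpos_t object is treated as opaque and is only ever passed back to fsetpos on the same stream.

```c
#include <assert.h>
#include <stdio.h>

/* Return the next character without consuming it, by recording the
   position with fgetpos and restoring it with fsetpos.
   Returns EOF at end-of-file or on failure.
   (peek_char is a hypothetical helper for illustration.) */
int peek_char(FILE *fp)
{
    fpos_t here;
    int c;

    assert(fp != NULL);
    if (fgetpos(fp, &here) != 0)
        return EOF;
    c = getc(fp);
    if (fsetpos(fp, &here) != 0)
        return EOF;
    return c;
}
```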

4.9.9.2 The `fseek` function

Whereas a binary file can be treated as an ordered sequence of bytes, counting from zero, a text file need not map one-to-one to its internal representation (see §4.9.2). Thus, only seeks to an earlier reported position are permitted for text files. The need to encode both record position and position within a record in a long value may constrain the size of text files upon which fseek-ftell can be used to be considerably smaller than the size of binary files.

Given these restrictions, the Committee still felt that this function has enough utility, and is used in sufficient existing code, to warrant its retention in the Standard. fgetpos and fsetpos have been added to deal with files which are too large to handle with fseek and ftell.

The fseek function will reset the end-of-file flag for the stream; the error flag is not changed unless an error occurs, in which case it will be set.

4.9.9.3 The `fsetpos` function

4.9.9.4 The `ftell` function

ftell can fail for at least two reasons:

the stream is associated with a terminal, or some other file type for which a file position indicator is meaningless; or
the file may be positioned at a location not representable in a long int.

Thus a method for ftell to report failure has been specified.
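A sketch of checking for that failure before relying on the result. The helper skip_line_and_return is a hypothetical name; it treats the value returned by ftell as an opaque token that is only handed back to fseek with SEEK_SET, which is the portable usage for text streams.

```c
#include <assert.h>
#include <stdio.h>

/* Remember the current position, skip the rest of the line, then
   return to the remembered position.
   Returns 0 on success, -1 if ftell or fseek fails.
   (skip_line_and_return is a hypothetical helper for illustration.) */
int skip_line_and_return(FILE *fp)
{
    long mark;
    int c;

    assert(fp != NULL);
    mark = ftell(fp);
    if (mark == -1L)            /* ftell reports failure with -1L */
        return -1;
    while ((c = getc(fp)) != EOF && c != '\n')
        ;                       /* discard the rest of the line */
    return fseek(fp, mark, SEEK_SET) == 0 ? 0 : -1;
}
```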

4.9.9.5 The `rewind` function

Resetting the end-of-file and error indicators was added to the specification of rewind to make the specification more logically consistent.

4.9.10 Error-handling functions

4.9.10.1 The `clearerr` function

4.9.10.2 The `feof` function

4.9.10.3 The `ferror` function

4.9.10.4 The `perror` function

At various times, the Committee considered providing a form of perror that delivers an error string version of errno without performing any output. It ultimately decided to provide this capability in a separate function, strerror. (See §4.11.6.1.)
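The difference can be sketched as follows: where perror would write a message to the standard error stream, strerror hands the text back to the program. The helper open_or_explain is a hypothetical name. Note the assumption that fopen sets errno on failure; many implementations do, but the C Standard does not require it.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Try to open a file for reading. On failure, copy a description of
   the error into msg instead of printing it; on success, msg is set
   to the empty string. Returns the stream, or NULL on failure.
   (open_or_explain is a hypothetical helper for illustration.) */
FILE *open_or_explain(const char *name, char *msg, size_t msgsize)
{
    FILE *fp;

    assert(name != NULL && msg != NULL && msgsize > 0);
    errno = 0;
    fp = fopen(name, "r");
    if (fp == NULL) {
        /* Assumption: fopen has set errno to describe the failure. */
        strncpy(msg, strerror(errno), msgsize - 1);
        msg[msgsize - 1] = '\0';
    } else {
        msg[0] = '\0';
    }
    return fp;
}
```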
