ISO C/200X Proposal: File Inquiry Functions
By David R. Tribble
C0X Revision Proposal
=====================

Title: File Inquiry Functions
Author: David R. Tribble
Author Affiliation: Self
Postal Address: *************** *************** USA
E-mail Address: david@tribble.com
Telephone Number: ***************
Sponsor: _____________
Revision: 2.4
Date: 2005-08-10
Supersedes: 2.3, 2003-12-30

Proposal Category:
    __ Editorial change/non-normative contribution
    __ Correction
    X_ New feature
    __ Addition to obsolescent feature list
    __ Addition to Future Directions
    __ Other (please specify) ______________________________

Area of Standard Affected:
    __ Environment
    __ Language
    __ Preprocessor
    X_ Library
        X_ Macro/typedef/tag name
        X_ Function
        __ Header
    __ Other (please specify) ____________________________________

Prior Art: The 'stat' structure, and 'stat()' and 'fstat()' functions of POSIX (Unix); the 'BY_HANDLE_FILE_INFORMATION' structure and the 'GetFileInformationByHandle()' function, and the 'WIN32_FIND_DATA' and 'WIN32_FILE_ATTRIBUTE_DATA' structures of Microsoft Windows (Win32). Other operating systems have similar functionality (MS-DOS, MacOS, VMS, etc.).

Target Audience: C programmers, and programmers who write programs that generate C code.

Related Documents (if any): None.

Proposal Attached: X_ Yes __ No, but what's your interest?

Abstract: The addition of standard functions that return information about a given file or I/O stream.
All hosted ISO C (ISO 9899:1999) implementations provide the concept of an I/O stream, also known as a file, i.e., a named collection of characters (bytes) that is accessible via the standard library functions declared in the <stdio.h> header, such as fopen().
This proposal describes a set of types and functions to be added to the standard C library to provide the means to interrogate the implementation for information about files and streams.
The functions and types suggested in this proposal codify, in a general way, existing practice among several popular implementations of C.
The following is a list of common programming operations that are not currently supported by ISO C (C99), or at best are expensive to do.
It is possible to use fopen() and then fclose() to check for the existence of a given file, but this is generally an expensive operation. Opening a file takes much longer than simply checking its status on most systems. Even if the fopen() call fails, this does not really indicate whether the file exists or not, because the execution unit (program) might not have the proper access permissions to the file.
It is reasonable to assume that some applications need to know only if a file exists (e.g., it might be used as a semaphore or locking mechanism) without needing to read or write its contents.
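The workaround described above can be sketched as follows, assuming nothing beyond C99's <stdio.h>. The function name can_open_for_reading is hypothetical, chosen to reflect what the probe actually establishes: a failure does not distinguish a missing file from one the program is not permitted to open.

```c
#include <stdbool.h>
#include <stdio.h>

/* Probe for a file by opening it, the only portable option in C99.
 * Returns true only if the file could be opened for reading.
 * A false result is ambiguous: the file may be missing, or it may
 * exist but be unreadable by this program. */
bool can_open_for_reading(const char *fname)
{
    FILE *fp = fopen(fname, "rb");

    if (fp == NULL)
        return false;
    fclose(fp);
    return true;
}
```

A caller that needs only a yes/no existence answer still pays for a full open and close, which is the cost this proposal aims to remove.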
A program may want to read the entire contents of the file into a memory buffer, or it may need to allocate a data object based on the size of the file, or it may need to take some action if the file is empty. The size of a file can be determined by opening the file and seeking to its end in order to count the bytes, but this incurs the overhead of actually opening the file, which might not be necessary for the application, particularly if it does not care about the actual contents of the file.
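A minimal sketch of the seek-to-end technique mentioned above, using only C99 library calls. The name size_by_seeking is hypothetical. Two caveats are assumptions worth stating: C99 does not require a binary stream to meaningfully support SEEK_END, and a long may be too narrow for large files on some hosts.

```c
#include <stdio.h>

/* Determine a file's size by opening it and seeking to its end.
 * Returns the size in bytes, or -1L if the file cannot be opened
 * or the position cannot be determined. */
long size_by_seeking(const char *fname)
{
    FILE *fp = fopen(fname, "rb");
    long size = -1L;

    if (fp == NULL)
        return -1L;
    if (fseek(fp, 0L, SEEK_END) == 0)
        size = ftell(fp);       /* ftell() itself returns -1L on failure */
    fclose(fp);
    return size;
}
```

Note that the file must be readable by the program for this to work at all, even though its contents are never examined.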
A program may want to perform some action based on the relative sizes of two different files, without regard to the contents of the files themselves.
ISO C does not provide the capability to determine these.
Knowing that a given file is a directory may allow a program to decide not to attempt to open or create it. It may also allow a program to use the directory name to construct the name of another file.
(Note that ISO C does not support the ability to create or remove directories. But see Proposal [P2] for a related proposal that addresses this.)
A program may take certain actions if it can determine that two files or I/O streams designate the same physical file.
A program may handle I/O for a particular file differently if it is not a regular file, such as sending different end-of-line characters to a writable stream.
A program can check the modification time of a file and compare it to the time it last read/wrote to the file, which could indicate that more data is available (written by some other process), or that there is a conflict because another process is using the file.
Many programmers provide these capabilities by wrapping portable functions around implementation-specific functions. This proposal suggests functions and types to be added to standard C to provide these common capabilities.
The following terms are used in this proposal. Some of the terms may need to be added to the ISO standard.
ISO C does not define or support the concept of directory.
The following constants are preprocessor macros defined in the <stdio.h> standard header.
The following constant is a preprocessor macro defined in the <time.h> standard header.
[Note]
These constants have names with a leading underscore, because such names are explicitly reserved for the implementation and (presumably) for the ISO standard library.
Implementations may provide other constants with names of the form _FILE_TYPE_XXX in addition to those specified in this proposal.
[Note]
The _FILE_TYPE_XXX macros described in this proposal are considered the minimum set of useful file types covering the broadest number of existing implementations. However, implementations may provide additional macros to reflect their support of other file types, such as:
- socket
- pipe (FIFO)
- symbolic link
- character I/O device
- block I/O device
- network device
- volume label
See Appendix A for further discussion.
Implementations may provide other constants with names of the form _FILE_PERM_XXX in addition to those specified in this proposal.
[Note]
The _FILE_PERM_XXX macros described in this proposal are considered the minimum set of useful file access permissions covering the broadest number of existing implementations. However, implementations may provide additional macros to reflect their support of file access permissions and modes, such as:
- user access
- group access
- set-user-ID mode
- set-group-ID mode
- backup/archival indicator
- system access
- hidden file
See Appendix B for further discussion.
Synopsis
    #include <stdio.h>
    #define _FILE_PERM_EXEC    integer-expression
This preprocessor macro evaluates to an integer constant expression. This value specifies that a file designates an executable binary image.
Synopsis
    #include <stdio.h>
    #define _FILE_PERM_READ    integer-expression
This preprocessor macro evaluates to an integer constant expression. This value specifies that a file can be read by the execution unit.
Synopsis
    #include <stdio.h>
    #define _FILE_PERM_SEARCH    integer-expression
This preprocessor macro evaluates to an integer constant expression. This value specifies that a file designates a directory and that it can be searched by the execution unit.
Synopsis
    #include <stdio.h>
    #define _FILE_PERM_WRITE    integer-expression
This preprocessor macro evaluates to an integer constant expression. This value specifies that a file can be written by the execution unit.
Synopsis
    #include <stdio.h>
    #define _FILE_TYPE_DIR    integer-expression
This preprocessor macro evaluates to an integer constant expression. This value specifies that a file designates a directory.
Synopsis
    #include <stdio.h>
    #define _FILE_TYPE_FILE    integer-expression
This preprocessor macro evaluates to an integer constant expression. This value specifies that a file designates a regular file.
Synopsis
    #include <stdio.h>
    #define _FILE_TYPE_UNKNOWN    integer-expression
This preprocessor macro evaluates to an integer constant expression. This value specifies that a file designates an external file that does not exist or has an unknown or unsupported type.
Synopsis
    #include <time.h>
    #define _TIME_ERROR    arithmetic-expression
This preprocessor macro evaluates to an arithmetic value of type time_t, and represents an unknown, indeterminate, or erroneous time value. It is guaranteed not to represent a valid time.
[Note]
This constant is meant to replace the use of (time_t)(-1) as an indicator of an erroneous or unknown time value, such as returned by the mktime() function. A time of -1 may be an otherwise valid time value on some implementations.
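For comparison, the existing C99 convention that this constant is meant to replace can be sketched as below. The wrapper name to_calendar_time is hypothetical; only mktime() and its (time_t)(-1) failure value come from the standard.

```c
#include <stdbool.h>
#include <time.h>

/* Convert a broken-down local time to a time_t using the C99 convention.
 * Returns false when mktime() reports failure with (time_t)(-1).
 * The weakness noted above is visible here: a genuine time whose
 * encoding happens to be -1 is rejected as well. */
bool to_calendar_time(struct tm *tm, time_t *out)
{
    time_t t = mktime(tm);

    if (t == (time_t)(-1))
        return false;
    *out = t;
    return true;
}
```

With a dedicated _TIME_ERROR value, the comparison would be against a constant guaranteed never to coincide with a valid time.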
[Note]
If the related Proposal [P3] is adopted into ISO C, the constant _LONGTIME_ERROR can be used instead.
The following types are defined in the <stdio.h> standard header.
[Note]
These types and structure tags have names with a leading underscore, because such names are explicitly reserved for the implementation and (presumably) for the ISO standard library.
[Note]
The constants, types, and functions defined in this proposal could instead be placed into a completely new standard header (perhaps named <stdfile.h>) rather than being added to the existing <stdio.h> header. If this approach is taken, the names defined in this proposal do not need leading underscores to keep them from intruding into the user namespace.
    #include <stdio.h>

    struct _fileinfo
    {
        int                 fi_type;        // File type
        unsigned long int   fi_perms;       // Access permissions
        long long int       fi_size;        // Size, in bytes
        time_t              fi_modified;    // Modification time
        time_t              fi_accessed;    // Last access time
        time_t              fi_created;     // Creation time
        time_t              fi_revised;     // Status update time
        long int            fi_id;          // Serial number
        char                fi_filesys[n];  // File system identity
    };
[Note]
The members having type time_t may instead be defined as having type longtime_t, an extended-precision time type, if the related Proposal [P3] is adopted into ISO C.
Description
This structure contains information about a file.
The structure contains the following members, in no particular order.
fi_accessed
This specifies the time that the file was last accessed. If this information cannot be determined (or if no such concept is supported by the implementation), this has a value equal to _TIME_ERROR.
[Note]
The meaning of "accessed" is implementation-defined, since different systems have different notions of what constitutes an "access" to a given file. POSIX systems, for example, may consider this to be the time of the last actual access to a file (corresponding to the st_atime member of the stat structure), or they may consider this to be either the last access time or the time of the last status change (corresponding to the st_ctime member), whichever is later.
fi_created
This specifies the time that the file was initially created. If this information cannot be determined (or if no such concept is supported by the implementation), this has a value equal to _TIME_ERROR.
[Note]
While many implementations are capable of providing the creation time of a file, POSIX is not. Therefore, POSIX implementations must set this member equal to _TIME_ERROR.
fi_filesys
This specifies the name of the file system on which the file resides, as a null-terminated character string. The length of the name (n) and its contents are implementation-defined. If this information cannot be determined (or if no such concept is supported by the implementation), the first character in the array is zero ('\0').
Two file system identities can be compared for equality using the strcmp() function.
[Note]
The intent is to be able to uniquely identify a file by its file system name and serial number. The implementation is expected to provide a suitable identification value so that two files residing on the same file system have file system name strings that compare equal.
This member has type char[n] in order to be as generic as possible so that it can hold a unique identification value of an arbitrary size.
It might be reasonable to define this to be type long int, similar to the fi_id member. However, there are probably implementations that uniquely identify their file systems by name instead of by number, or even by some other kind of data object.
This kind of non-numeric file system identifier is best represented as either an opaque object of some implementation-defined type, or as a generic character string. The latter approach is preferred because it does not require the existence of a standard structure member with an unknown type, and it lends itself to simple tests for equality using strcmp(). Implementations that use a binary data object for identification can simply encode the bytes of the object as a hexadecimal character string.
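The hexadecimal encoding suggested above could look something like the following sketch. The function name encode_fs_id_hex and its signature are hypothetical; the proposal does not prescribe how an implementation produces the string, only that the result can be compared with strcmp().

```c
#include <stddef.h>

/* Encode len bytes of a binary identifier as lowercase hexadecimal
 * digits in out, which has room for outsize characters.
 * Returns 0 on success, or -1 if out cannot hold 2*len digits plus
 * the terminating '\0'. */
int encode_fs_id_hex(const unsigned char *id, size_t len,
                     char *out, size_t outsize)
{
    static const char digits[] = "0123456789abcdef";
    size_t i;

    if (outsize < 2 * len + 1)
        return -1;
    for (i = 0; i < len; i++) {
        out[2 * i]     = digits[(id[i] >> 4) & 0x0F];   /* high nibble */
        out[2 * i + 1] = digits[id[i] & 0x0F];          /* low nibble */
    }
    out[2 * len] = '\0';
    return 0;
}
```

Because the encoding is deterministic, two identical binary identifiers always yield strings that compare equal.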
fi_id
This specifies the serial number of the file, which is a unique number within the file system on which the file resides. If this information cannot be determined (or if no such concept is supported by the implementation), this has a value of -1.
[Note]
The intent is that the serial numbers for two different files in the same file system do not compare equal. Conversely, if two files in the same file system have the same serial number, then they can be considered to be the same file (even if they can be opened with different names). It is not clear, however, that these semantics should be requirements mandated by the standard.
This member has type long int, but it might be more practical to define it as type char[n] (for some implementation-defined size n), so that it could hold a unique identification value of an arbitrary size.
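Taken together, the fi_filesys and fi_id members suggest an identity test along the following lines. This is only a sketch: the structure file_identity_sketch stands in for the two relevant members of the proposed struct _fileinfo, the array length 32 is an arbitrary assumption for n, and same_file_sketch is a hypothetical helper, not part of the proposal.

```c
#include <stdbool.h>
#include <string.h>

#define FS_NAME_LEN_SKETCH 32       /* assumed value for n */

/* Stand-in for the identity members of the proposed struct _fileinfo */
struct file_identity_sketch {
    long int fi_id;                             /* -1 if unknown */
    char     fi_filesys[FS_NAME_LEN_SKETCH];    /* "" if unknown */
};

/* Returns true only when both identities are known and equal.
 * Unknown serial numbers or file system names never match. */
bool same_file_sketch(const struct file_identity_sketch *a,
                      const struct file_identity_sketch *b)
{
    if (a->fi_id == -1 || b->fi_id == -1)
        return false;
    if (a->fi_filesys[0] == '\0' || b->fi_filesys[0] == '\0')
        return false;
    return a->fi_id == b->fi_id
        && strcmp(a->fi_filesys, b->fi_filesys) == 0;
}
```

Treating unknown values as "not the same" is a conservative design choice; a caller that cannot tell may still need a fallback such as comparing names.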
fi_modified
This specifies the time that the file was last modified. If this information cannot be determined (or if no such concept is supported by the implementation), this has a value equal to _TIME_ERROR.
[Note]
It is expected that most implementations will be able to provide at least this one piece of date/time information for a given file. On POSIX implementations, for instance, this corresponds to the st_mtime member of the stat structure.
fi_perms
This specifies the access permissions of the file, which is a bitwise or-ing of zero or more _FILE_PERM_XXX constants. If the permissions of the file cannot be determined, this has a value equal to zero (0).
Testing a particular permissions bit by and-ing this member value with one of the _FILE_PERM_XXX constants results in a non-zero (true) value if the execution unit is permitted such access, or a zero (false) value if the access is not permitted.
A regular file or directory can have the _FILE_PERM_READ and _FILE_PERM_WRITE bits set, indicating that it can be read or written, respectively, by the execution unit.
A regular file can have the _FILE_PERM_EXEC bit set, indicating that it is an executable binary image that can be executed from the current execution unit (in an implementation-defined manner, possibly via the system() function).
A directory can have the _FILE_PERM_SEARCH bit set, indicating that the execution unit can search the contents of the directory (in an implementation-defined manner).
[Note]
The permissions bits are intended to take into account user-ID, group-ID, physical file access modes, etc., in determining whether or not a particular _FILE_PERM_XXX access is granted to the program. For example, Unix determines read access to a file by checking three levels of permissions: one for the owner (user), one for the group, and one for everyone else (other). If the execution unit can access a given file according to these combined permissions, then the fi_perms member should have its _FILE_PERM_READ bit set to one (1).
Implementations may support other permissions bits, with corresponding _FILE_PERM_XXX constants, the meanings of which are implementation-defined.
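The bit tests described for this member can be illustrated with a small sketch. The enumerators below are hypothetical stand-ins for the _FILE_PERM_XXX macros, with arbitrary bit values (the proposal leaves the actual values to the implementation), and format_perms_sketch is an illustrative helper only.

```c
/* Hypothetical stand-ins for the _FILE_PERM_XXX constants */
enum {
    PERM_READ_SKETCH   = 0x1,
    PERM_WRITE_SKETCH  = 0x2,
    PERM_EXEC_SKETCH   = 0x4,
    PERM_SEARCH_SKETCH = 0x8
};

/* Write a three-character summary such as "rw-" into out.
 * The third position shows 's' (search) for a directory and
 * 'x' (execute) for any other file, mirroring the rule that
 * search applies to directories and exec to regular files. */
void format_perms_sketch(unsigned long perms, int is_dir, char out[4])
{
    out[0] = (perms & PERM_READ_SKETCH)  ? 'r' : '-';
    out[1] = (perms & PERM_WRITE_SKETCH) ? 'w' : '-';
    if (is_dir)
        out[2] = (perms & PERM_SEARCH_SKETCH) ? 's' : '-';
    else
        out[2] = (perms & PERM_EXEC_SKETCH) ? 'x' : '-';
    out[3] = '\0';
}
```

A value of zero produces "---", which matches the case where permissions cannot be determined.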
fi_revised
This specifies the time that control information about the file was last modified. (Note that this time is not necessarily the same as the time that the contents of the file were last modified; this control information may refer to other accounting data kept by the implementation.) If this information cannot be determined (or if no such concept is supported by the implementation), this has a value equal to _TIME_ERROR.
[Note]
This corresponds to the st_ctime member of the stat structure on POSIX implementations, which indicates the time of the last i-node status change for the file. Other implementations may not keep track of the time that file system information is updated for a given file. These implementations must therefore set this member to _TIME_ERROR.
fi_size
This specifies the size of the file. The size is represented in bytes (characters). Implementations may represent the file size as a value rounded up to the next multiple of an implementation-dependent block size. Note that the total file size might not include any pending (unflushed) writes to the file. If this information cannot be determined (or if no such concept is supported by the implementation), this has a value equal to -1.
[Note]
This member is signed so that a unique "unknown" sentinel value of -1 can be represented. The member is of type long long int, which should be large enough (at least 63 bits) for most implementations in the foreseeable future.
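The remark about pending writes can be made concrete on a POSIX host. The sketch below is an assumption-laden illustration rather than part of the proposal: it relies on the POSIX functions fileno() and fstat(), and stream_size_sketch is a hypothetical name. It optionally flushes the stream first so that buffered output is counted in the reported size.

```c
#define _POSIX_C_SOURCE 200809L     /* fileno() and fstat() are POSIX, not ISO C */
#include <stdio.h>
#include <sys/stat.h>

/* Return the size the host reports for the file behind fp, or -1
 * if it cannot be determined. When flush_first is nonzero, buffered
 * output is written out before the size is queried; otherwise the
 * result may omit data still held in the stream's buffer. */
long long stream_size_sketch(FILE *fp, int flush_first)
{
    struct stat st;

    if (fp == NULL)
        return -1;
    if (flush_first && fflush(fp) != 0)
        return -1;
    if (fstat(fileno(fp), &st) != 0)
        return -1;
    return (long long)st.st_size;
}
```

A portable program that needs an up-to-date size for a stream it is writing would therefore be wise to call fflush() before asking.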
fi_type
This specifies the type of the file, which is equal to one of the _FILE_TYPE_XXX constants.
A value equal to _FILE_TYPE_FILE indicates a regular file.
A value equal to _FILE_TYPE_DIR indicates a directory.
If the type cannot be determined, this has a value equal to _FILE_TYPE_UNKNOWN.
Implementations may support other file types, with corresponding _FILE_TYPE_XXX constants, the meanings of which are implementation-defined.
The structure may contain other implementation-defined members.
[Note]
These structure members are considered the minimum amount of useful information about files while also covering the widest number of existing implementations. However, implementations may provide additional members to reflect their support of other file information, such as:
- owner user-ID
- owner group-ID
- device ID
- number of links to the file
- number of blocks allocated to the file
- lock control bits
- last backup/archival time
- system access permissions
- network identification
The standard header <stdio.h> contains declarations for the following library functions.
[Note]
These functions have names with a leading underscore, because such names are explicitly reserved for the implementation and (presumably) for the ISO standard library.
    #include <stdio.h>

    extern int _getfileinfo(const char *fname, struct _fileinfo *info);
Description
This function retrieves information about the file with the name fname, placing the information into the structure pointed to by info.
The contents of the character string pointed to by fname are implementation-defined (and typically conform to the same rules dictated by the fopen() function). If pointer fname is null, the function fails.
If pointer info is null, the only meaningful result of the function is its return value (which can be used to determine if the file exists and is accessible by the execution unit).
Returns
If the named file exists and is accessible by the execution unit, and the retrieval of the information for the file is successful, the function returns a positive value. If unsuccessful, the function returns a negative value after modifying the value of errno (which is defined in the <errno.h> standard header).
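To show that these semantics are implementable over existing practice, here is a minimal sketch of how such a function might be layered on the POSIX stat() and access() calls. Everything named with a _SKETCH or _sketch suffix is hypothetical, only a subset of the proposed members is filled in, and the use of access() (which checks the real rather than effective user ID) is a simplifying assumption.

```c
#define _POSIX_C_SOURCE 200809L     /* stat() and access() are POSIX, not ISO C */
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

enum { FI_TYPE_UNKNOWN_SKETCH, FI_TYPE_FILE_SKETCH, FI_TYPE_DIR_SKETCH };
enum { FI_PERM_READ_SKETCH = 0x1, FI_PERM_WRITE_SKETCH = 0x2 };

/* Stand-in for a subset of the proposed struct _fileinfo */
struct fileinfo_sketch {
    int           fi_type;
    unsigned long fi_perms;
    long long     fi_size;
    time_t        fi_modified;
};

/* Returns a positive value on success, or a negative value with
 * errno set on failure. A null info pointer requests only the
 * existence/accessibility check. */
int getfileinfo_sketch(const char *fname, struct fileinfo_sketch *info)
{
    struct stat st;

    if (fname == NULL) {
        errno = EINVAL;
        return -1;
    }
    if (stat(fname, &st) != 0)
        return -1;                  /* stat() has already set errno */
    if (info == NULL)
        return 1;

    if (S_ISREG(st.st_mode))
        info->fi_type = FI_TYPE_FILE_SKETCH;
    else if (S_ISDIR(st.st_mode))
        info->fi_type = FI_TYPE_DIR_SKETCH;
    else
        info->fi_type = FI_TYPE_UNKNOWN_SKETCH;

    /* Permissions as seen by this program, not the raw mode bits */
    info->fi_perms = 0;
    if (access(fname, R_OK) == 0)
        info->fi_perms |= FI_PERM_READ_SKETCH;
    if (access(fname, W_OK) == 0)
        info->fi_perms |= FI_PERM_WRITE_SKETCH;

    info->fi_size = (long long)st.st_size;
    info->fi_modified = st.st_mtime;
    return 1;
}
```

The file is never opened, which is the property that makes the inquiry cheaper than the fopen()-based workarounds available in C99.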
    #include <stdio.h>

    extern int _fgetfileinfo(FILE *fp, struct _fileinfo *info);
Description
This function retrieves information about the I/O stream pointed to by fp, placing the information into the structure pointed to by info.
Pointer fp points to a previously opened I/O stream. If the stream pointed to by fp has been closed by a prior call to fclose(), the behavior is undefined. If pointer fp is null, the function fails.
If pointer info is null, the only meaningful result of the function is its return value (which can be used to determine if the file exists and is accessible by the execution unit).
Returns
If the retrieval of the information for the stream is successful, the function returns a positive value. If unsuccessful, the function returns a negative value after modifying the value of errno (which is defined in the <errno.h> standard header).
The following program displays information about a given file name.
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    void print_file_info(const char *fname)
    {
        struct _fileinfo    info;
        char                tbuf[40];

        // Retrieve information about the file name
        if (_getfileinfo(fname, &info) < 0)
        {
            // Failed
            printf("Can't get info for: %s, %s\n", fname, strerror(errno));
            return;
        }

        // Display the file type
        switch (info.fi_type)
        {
        case _FILE_TYPE_FILE:
            printf("f");
            break;
        case _FILE_TYPE_DIR:
            printf("d");
            break;
        case _FILE_TYPE_UNKNOWN:
        default:
            printf("?");
            break;
        }

        // Display the file permissions
        if ((info.fi_perms & _FILE_PERM_READ) != 0)
            printf("r");
        else
            printf("-");

        if ((info.fi_perms & _FILE_PERM_WRITE) != 0)
            printf("w");
        else
            printf("-");

        if (info.fi_type == _FILE_TYPE_FILE  &&
            (info.fi_perms & _FILE_PERM_EXEC) != 0)
            printf("x");
        else if (info.fi_type == _FILE_TYPE_DIR  &&
            (info.fi_perms & _FILE_PERM_SEARCH) != 0)
            printf("s");
        else
            printf("-");

        // Display the file size
        printf(" %12lld", info.fi_size);

        // Display the file modification time
        if (info.fi_modified != _TIME_ERROR)
        {
            struct tm   ts;

            ts = *localtime(&info.fi_modified);
            strftime(tbuf, sizeof(tbuf), "%Y-%m-%d %H:%M", &ts);
        }
        else
            sprintf(tbuf, "unknown");
        printf(" %16s", tbuf);

        // Display the file serial number
        if (info.fi_id != -1)
            printf(" %5ld", info.fi_id);
        else
            printf(" %5s", "-");

        // Display the file name
        printf(" %s\n", fname);
    }

    int main(int argc, char **argv)
    {
        // Print information about each filename arg
        for (int i = 1;  i < argc;  i++)
            print_file_info(argv[i]);

        return (EXIT_SUCCESS);
    }
This proposal defines a minimum useful set of file types. However, implementations are free to provide additional file types.
For example, POSIX (Unix) systems could provide the following macros:
Microsoft DOS and Windows systems could provide the following macros:
This proposal defines a minimum useful set of file permissions. However, implementations are free to provide additional permissions bits.
For example, the following values could be provided by POSIX and Unix implementations:
The following values could be provided by Microsoft Windows and DOS implementations:
There are difficulties with trying to integrate the POSIX stat structure into the ISO C library, however. While it is reasonable to exclude the structure members that are not general enough for all hosted implementations (e.g., st_gid) and simply allow these to be implemented in POSIX as extensions to ISO C, a few problems still remain.
Unless the same functionality (or a proper subset of the functionality) as POSIX is provided by a new ISO C function named stat(), that function name cannot be added to the standard, because this would render existing POSIX systems nonconforming.
Additions of the members described in items (a) and (b) above force POSIX to be modified in order to conform to ISO C, meaning that even a stripped-down version of the current struct stat specification could not be accepted into ISO C.
The conclusion is that rather than choosing one implementation (widespread as it is) as the model to standardize, it is better to invent a new, simpler model that (practically) all implementations can support.
Win32 also provides directory searching functions FindFirstFile() and FindNextFile() that fill a WIN32_FIND_DATA structure with file attributes.
Win32 provides the creation time as one of the attributes of a file.
Proof-of-concept source code is contained in these files:
The author wishes to express his gratitude to those who provided comments, suggestions, and criticism on this proposal.
Further discussion can be found on the comp.std.c newsgroup, under the subject "C0X: File information funcs".
This document is in the public domain. Permission is granted to freely redistribute, copy, or reference this document.
This document: http://david.tribble.com/text/c0xfstat.html.
Author's email address: david@tribble.com.
Author's home page: http://david.tribble.com.