ISO C 200X Proposal: File System Inquiry Functions
By David R. Tribble
C200X Revision Proposal
=======================

Title: File System Inquiry Functions
Author: David R. Tribble
Author Affiliation: Self
Postal Address: ***************
                ***************
                USA
E-mail Address: david@tribble.com
Telephone Number: ***************
Sponsor: _____________
Revision: 1.6
Date: 2004-01-19

Proposal Category:
    __ Editorial change/non-normative contribution
    __ Correction
    X_ New feature
    __ Addition to obsolescent feature list
    __ Addition to Future Directions
    __ Other (please specify) ______________________________

Area of Standard Affected:
    __ Environment
    __ Language
    __ Preprocessor
    X_ Library
        X_ Macro/typedef/tag name
        X_ Function
        __ Header
    __ Other (please specify) ______________________________

Prior Art:
    The statvfs() function of Standard Unix and POSIX;
    The statfs() function of BSD Unix; and
    the NTFS_VOLUME_DATA_BUFFER structure of Microsoft Windows.

Target Audience: _____________________________________________
Related Documents (if any): None.
Proposal Attached: X_ Yes __ No, but what's your interest?

Abstract:
    The addition of standard functions that return information about the
    local file system, in particular, the amount of file storage space
    available.
All hosted ISO C (ISO 9899:1999) implementations support the concept of an I/O stream, also known as a file, i.e., a named collection of characters (bytes) that is accessible via the standard library functions declared in the <stdio.h> header, such as fopen().
The majority of C implementations support the concept of a file system, also known as a structured storage device, which contains external files and directories. A file system typically represents a set of one or more files, usually arranged in a tree-like hierarchy of directories, subdirectories, and files. Each file corresponds to the ISO C notion of a binary or text I/O stream.
This proposal describes a set of types and functions to be added to the standard C library to provide the means to interrogate the implementation for information about the file system.
The following is a list of common programming operations that are not currently supported by ISO C.
Is the disk full?
One of the features missing from the current ISO C standard library is the ability to determine if the current file system is full or not, i.e., if all of its data storage space is allocated.
Is there enough space on the disk to write a particular amount of data?
A missing feature is the ability to determine if a given minimum amount of space is available on a file system prior to writing data to it.
What is the preferred file name length for a particular file system?
While conforming implementations provide the FILENAME_MAX constant, this is not guaranteed to be the appropriate size for all file names, particularly if the implementation supports several kinds of file systems.
Are file names treated differently than directory names?
Are nested subdirectories supported?
The following terms are used in this proposal. Some of the terms may need to be added to the ISO C standard.
The current ISO C standard does not define or require the concept of directory.
The following constants are defined in the <stdio.h> header file.
[Note]
Implementations are free to provide other implementation-defined constants specifying file system characteristics in addition to those defined below.
Synopsis
    #include <stdio.h>
    #define _FILESYS_IGNORE_CASE  integer-expression
Description
This is a preprocessor macro defined as a constant integer expression.
This constant is used in conjunction with the fs_flags member of the _filesys structure, and indicates whether or not a given file system supports file (and directory) names that are case-sensitive (i.e., where names such as "ABC" and "abc" refer to different files).
[Note]
For example, some operating systems (e.g., Unix) employ case-sensitive filenames, so that the names "foo" and "Foo" refer to different files. Other operating systems (e.g., Microsoft Windows and VAX VMS) employ case-insensitive filenames, so that the two names refer to the same file.
Synopsis
    #include <stdio.h>
    #define _FILESYS_NAMES_DIFF  integer-expression
Description
This is a preprocessor macro defined as a constant integer expression.
This constant is used in conjunction with the fs_flags member of the _filesys structure, and indicates whether or not a given file system makes a distinction between file names and directory names.
[Note]
For example, many operating systems (e.g., Unix and Microsoft Windows) use the same syntax for file and directory names, and the type of a given directory entry name cannot be determined by examining the name alone.
In contrast, other operating systems (e.g., VAX VMS) employ a special naming syntax (e.g., a suffix of ".DIR") for entries that specify subdirectories, so that it is possible to determine whether a given entry name designates a directory by examining the name by itself.
The value of this constant is unspecified for implementations that do not support the notion of directories or directory names.
Synopsis
    #include <stdio.h>
    #define _FILESYS_SUBDIRS  integer-expression
Description
This is a preprocessor macro defined as a constant integer expression.
This constant is used in conjunction with the fs_flags member of the _filesys structure, and indicates whether or not a given file system supports subdirectories, i.e., directories containing other directories.
[Note]
This allows for file systems that do not support the notion of nested subdirectories, or more specifically, directories containing other directories.
The value of this constant is equal to zero for implementations that do not support the notion of directories.
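The flag constants are meant to be tested against the fs_flags member with a bitwise AND. The following is a minimal sketch under several assumptions: the bit values 0x1, 0x2, and 0x4 are invented for illustration (the proposal leaves them to the implementation), a set _FILESYS_IGNORE_CASE bit is taken to mean that names are compared without regard to case, the structure is a reduced stand-in for struct _filesys, and same_name() is a hypothetical helper.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

// Assumed values; a real implementation would choose its own
#define _FILESYS_IGNORE_CASE  0x1u
#define _FILESYS_NAMES_DIFF   0x2u
#define _FILESYS_SUBDIRS      0x4u

// Reduced stand-in for the proposed structure (flags only)
struct _filesys
{
    unsigned int fs_flags;
};

// Hypothetical helper: do two names designate the same entry
// on the file system described by info?
bool same_name(const struct _filesys *info, const char *a, const char *b)
{
    // Case-sensitive file system: names must match exactly
    if ((info->fs_flags & _FILESYS_IGNORE_CASE) == 0)
        return (strcmp(a, b) == 0);

    // Case-insensitive file system: compare one character at a time
    while (*a != '\0'  &&  *b != '\0')
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return (false);
        a++;
        b++;
    }

    // Equal only if both names ended at the same position
    return (*a == *b);
}
```

A program could use such a helper to decide whether opening "Foo" would clobber an existing "foo" before it creates the file.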
The following types are defined in the <stdio.h> standard header.
[Note]
These types and structure tags have names with a leading underscore, because such names are explicitly reserved for the implementation and (presumably) for the ISO C standard library.
Synopsis
    #include <stdio.h>

    struct _filesys
    {
        // Contains the following members, in any order
        long long int   fs_nfiles;      // Files allocated
        long long int   fs_ndirs;       // Directories allocated
        long long int   fs_total;       // Total space, in blocks
        long long int   fs_free;        // Free space, in blocks
        long int        fs_blocksize;   // Block size, in bytes
        unsigned int    fs_flags;       // Bitflags
        int             fs_namelen;     // Max file name length
        char            fs_name[n];     // File system name
    };
Description
The members of this structure describe the physical characteristics of a file system within the host implementation.
[Note]
Several of these members are signed integer types so that they can represent a unique "unknown" sentinel value of -1. An alternative design would be to make these members unsigned integers and use a special sentinel value of -1 cast to the matching unsigned type. On most architectures, this would give these members a larger range of representable values than signed types provide.
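The two sentinel designs can be compared side by side. This is only an illustrative sketch; known_signed() and known_unsigned() are hypothetical helpers, not part of the proposal.

```c
#include <limits.h>
#include <stdbool.h>

// Signed design (as proposed): -1 means "unknown",
// so valid values range from 0 to LLONG_MAX
bool known_signed(long long int value)
{
    return (value != -1);
}

// Alternative design: -1 converted to the unsigned type (ULLONG_MAX)
// means "unknown", so valid values range from 0 to ULLONG_MAX-1
bool known_unsigned(unsigned long long int value)
{
    return (value != (unsigned long long int) -1);
}
```

With the unsigned design, a count just above LLONG_MAX is still a valid value, which is the extra range the alternative would buy.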
The structure contains the following members, in no particular order:
fs_blocksize
Specifies the size, in bytes, of the smallest fundamental unit of storage that is allocated to files in the file system. For implementations that can discern this information, this value is guaranteed to be greater than zero. If this information is not available, this value is equal to -1.
[Note]
Most operating systems employ a disk block size that is some power of 2, typically in the range of 512 to 65,536 bytes. Implementations could conceivably support block sizes as small as 1 byte, however.
This member is signed so that a unique "unknown" sentinel value of -1 can be represented.
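fs_flags
Contains bitflags that describe various characteristics of the file system.
The following constants designate the corresponding features:
- _FILESYS_IGNORE_CASE: case sensitivity of file and directory names
- _FILESYS_NAMES_DIFF: distinction between file names and directory names
- _FILESYS_SUBDIRS: support for subdirectories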
[Note]
Implementations are free to provide other implementation-defined bitflags specifying file system characteristics in addition to those defined here. For example, an implementation might provide additional file system attributes such as:
- files are compressed
- files are encrypted
- is removable media
- is a network device
- is read-only
- is not mounted
fs_free
Specifies the amount of free space, in blocks of size fs_blocksize, that is currently available (i.e., unallocated) on the file system. This value specifies a minimum size that may be smaller than the actual amount of free space available. If this information is not available, this value is equal to -1.
[Note]
The intended use is to provide the capability of determining if a given file system has exhausted all of its available space (i.e., a "disk full" condition), so that appropriate action can be taken by the program. Beyond receiving a return value from fwrite() or fputc() indicating an unspecified write failure, there is no such capability provided in the current ISO C standard.
Another intended use is to allow a program to verify that there is enough space on the file system before attempting to write data to a file (stream) residing on that file system.
This member is signed so that a unique "unknown" sentinel value of -1 can be represented. It is of type long long int, which should be large enough (at least 63 bits) for most implementations in the foreseeable future.
For implementations that enforce individual user disk quotas, this member may represent the number of free blocks available only to the user (whose identity the executing program possesses). Similarly, some implementations may provide only the number of blocks accessible by the executing program.
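Because fs_free is counted in blocks, a byte count requires multiplying by fs_blocksize while respecting both "unknown" sentinels. The following is a minimal sketch; the structure is a reduced stand-in for struct _filesys and free_bytes() is a hypothetical helper.

```c
#include <limits.h>

// Reduced stand-in for the proposed structure
struct _filesys
{
    long long int   fs_free;        // Free space, in blocks (-1 if unknown)
    long int        fs_blocksize;   // Block size, in bytes (-1 if unknown)
};

// Hypothetical helper: free space in bytes, or -1 if it is unknown
// or too large to be represented as a long long int
long long int free_bytes(const struct _filesys *info)
{
    // Either sentinel makes the byte count unknown
    if (info->fs_free < 0  ||  info->fs_blocksize <= 0)
        return (-1);

    // Reject products that would overflow
    if (info->fs_free > LLONG_MAX / info->fs_blocksize)
        return (-1);

    return (info->fs_free * info->fs_blocksize);
}
```

Since fs_free is described as a minimum, the result is best read as a lower bound on the space that can actually be written.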
fs_name
Specifies the name of the file system as a null-terminated character string. The length of the name (n) and its contents are implementation-defined. If this information cannot be determined (or if no such concept is supported by the implementation), this is an empty string, i.e., the first character of the array is a null character ('\0').
Two file system names can be compared for equality using the strcmp() function.
[Note]
The intent is to be able to uniquely identify a file system by some kind of name.
For example, a Microsoft Windows system might provide a name such as "C:" to represent the primary hard disk of a typical desktop PC. Unix systems might provide a device name such as "/dev/hd0".
The relationship between file system names and file names (i.e., names passed to the fopen() function) is obviously a decision best left to the implementation designer.
In the worst case, an implementation may simply provide an empty string (i.e., a single '\0' terminating character) for the file system name.
Given two _filesys structures corresponding to two different file names, which have fs_name members that compare equal, it can be assumed that the two files reside on the same file system.
This member has type char[n], instead of, say, an integer type, so that it is as generic as possible and can hold a unique identification value of arbitrary size. Implementations that support only numeric file system identifiers can easily convert such numbers into unique character strings.
This member is a fixed-length character array in order to avoid the obvious difficulties of allocating and deallocating the string(s) containing the file system name(s) that would result if it were a pointer instead.
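The name comparison described above can be wrapped in a small helper. This is a sketch only: the array length of 33 is an assumption, the structure is a reduced stand-in for struct _filesys, and same_filesys() is a hypothetical name. It treats an empty name as "unknown", since two empty names say nothing about where the files reside.

```c
#include <stdbool.h>
#include <string.h>

// Reduced stand-in for the proposed structure; the length 33 is assumed
struct _filesys
{
    char fs_name[33];               // File system name ("" if unknown)
};

// Hypothetical helper: do both structures describe the same file system?
bool same_filesys(const struct _filesys *a, const struct _filesys *b)
{
    // An empty name carries no identity, so nothing can be concluded
    if (a->fs_name[0] == '\0'  ||  b->fs_name[0] == '\0')
        return (false);

    return (strcmp(a->fs_name, b->fs_name) == 0);
}
```

A program might use such a check to decide whether a file can be renamed in place or must be copied to another device.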
fs_namelen
Specifies the maximum length (in characters) for file names allowed by the file system, including a terminating null character. In other words, this is the size of an array of char containing the longest file name string that the implementation guarantees can be opened on the file system. If the file system imposes no practical limit on file name lengths, this value should reflect the recommended size for file name strings. If this information is not available, this value is equal to -1.
[Note]
This value is related to the standard FILENAME_MAX constant.
For implementations supporting multiple file systems, each of which has a different maximum file name length, the FILENAME_MAX constant probably represents the minimum common length that is supported by all of them. This possibility is mentioned, in fact, in a footnote in §7.19.1 of the ISO 9899:1999 standard, which points out that FILENAME_MAX is not a guaranteed limit for all possible file names supported by an implementation.
In contrast, this structure member represents the guaranteed exact maximum (or recommended) length for a given file system.
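One way a program might use fs_namelen is to size its file name buffers, falling back to FILENAME_MAX when the value is unknown. The following sketch assumes a reduced stand-in for struct _filesys; name_buf_size() and alloc_name_buf() are hypothetical helpers.

```c
#include <stdio.h>
#include <stdlib.h>

// Reduced stand-in for the proposed structure
struct _filesys
{
    int fs_namelen;                 // Max file name length (-1 if unknown)
};

// Hypothetical helper: size of a buffer that can hold any file name
// on the given file system
size_t name_buf_size(const struct _filesys *info)
{
    // fs_namelen already counts the terminating null character
    if (info->fs_namelen > 0)
        return ((size_t) info->fs_namelen);

    // Unknown length: fall back to the standard constant
    return ((size_t) FILENAME_MAX);
}

// Hypothetical helper: allocate a zero-filled file name buffer;
// the caller releases it with free()
char *alloc_name_buf(const struct _filesys *info)
{
    return (calloc(name_buf_size(info), 1));
}
```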
fs_ndirs
Specifies the number of directories currently allocated in (i.e., in use by) the file system. If this information is not available, or if the implementation does not distinguish between files and directories, this value is equal to -1.
[Note]
While most operating systems employ the concept of files and directories, it is possible that some implementations may only support a very simplified concept of files without the need for directories. Indeed, there is nothing in the current ISO C standard that defines or stipulates the concept of a "directory".
An implementation might be able to determine only the number of directories to which the user (whose identity the executing program possesses) has appropriate access permissions.
Implementations for which determining the number of directories is an expensive operation (e.g., Win32) may choose to simply set this member to -1.
This member is signed so that a unique "unknown" sentinel value of -1 can be represented.
fs_nfiles
Specifies the number of files currently allocated in (i.e., in use by) the file system. For implementations that do not distinguish between files and directories, this value may represent the total sum of the number of files and directories in the file system. If this information is not available, this value is equal to -1.
[Note]
Ideally, this member represents the exact number of files currently allocated (in use) on the file system.
However, an implementation might be able to determine only the number of files to which the user (whose identity the executing program possesses) has appropriate access permissions.
An implementation might not be able to distinguish between files and directories, having instead a total count of all allocated file system "entries". In cases like this, this member represents that total count.
Implementations for which determining the number of files is an expensive operation (e.g., Win32) may choose to simply set this member to -1.
This member is signed so that a unique "unknown" sentinel value of -1 can be represented.
fs_total
Specifies the total amount of space, in blocks of size fs_blocksize, that is occupied by the file system. This value specifies a minimum size that may be smaller than the amount of space actually allocated. If this information is not available, this value is equal to -1.
[Note]
This member is signed so that a unique "unknown" sentinel value of -1 can be represented. It is of type long long int, which should be large enough (at least 63 bits) for most implementations in the foreseeable future.
For implementations that enforce individual user disk quotas, this member may represent the total number of blocks available only to the user (whose identity the executing program possesses). Similarly, some implementations may provide only the total number of blocks accessible by the executing program.
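Taken together, fs_total and fs_free allow a program to estimate how full a file system is. The following is a sketch under the assumption that both counts use the same block size; the structure is a reduced stand-in for struct _filesys and percent_used() is a hypothetical helper.

```c
#include <limits.h>

// Reduced stand-in for the proposed structure
struct _filesys
{
    long long int   fs_total;       // Total space, in blocks (-1 if unknown)
    long long int   fs_free;        // Free space, in blocks (-1 if unknown)
};

// Hypothetical helper: percentage of blocks in use (0 to 100), or -1 if
// either count is unknown or the two counts are inconsistent
int percent_used(const struct _filesys *info)
{
    long long int   total = info->fs_total;
    long long int   used;

    if (total <= 0  ||  info->fs_free < 0  ||  info->fs_free > total)
        return (-1);
    used = total - info->fs_free;

    // Halve both counts until used*100 cannot overflow
    while (total > LLONG_MAX / 100)
    {
        total /= 2;
        used /= 2;
    }

    return ((int) (used * 100 / total));
}
```

Because both members are described as minimum values, and may reflect per-user quotas, the result is an estimate rather than an exact figure.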
This structure may contain other implementation-defined members.
[Note]
For instance, an implementation may provide such things as:
- device serial number,
- type of the file system,
- indication of whether the file system is read-only,
- user-ID of the owner of the file system,
- network address for the device,
- time of the last access made to the device,
- etc.
Other members were considered for inclusion in this proposal, but were deemed too system-specific for a standard library. The resulting set of members described above is considered the minimum useful set of information, while also being the most general set of information across existing implementations.
[Note]
These functions have names with a leading underscore, because such names are explicitly reserved for the implementation and (presumably) for the ISO C standard library.
Synopsis
    #include <stdio.h>
    extern int _getfilesys(const char *name, struct _filesys *info);
Description
This function retrieves information about the file system on which a given file name resides.
Argument name points to a string containing the name of a file system or the name of a file (i.e., a name that could be passed to the fopen() function) that resides on a particular file system. If it constitutes a meaningful file system name or file name (according to the implementation), the structure pointed to by info is filled with information about the given file system.
[Note]
A Unix implementation, for example, might allow a file name such as "/my/files/foo.dat" to specify the file system mounted on directory "/my". Similarly, a file system name of "/" could represent the device on which the root file system is mounted.
A Microsoft DOS or Windows implementation might allow a file name such as "C:\my\files\foo.dat" to specify the C: disk drive of the host system. Similarly, a name such as "C:" could signify the name of a specific disk drive.
Pointer name may be null or point to an empty string (""), in which case it specifies the local file system currently associated with the executing program. (Exactly what this means is implementation-defined.)
[Note]
The local file system in Unix, for example, is known as the "current working directory". This is also true of Microsoft DOS and Windows, although these systems have the additional concept of a "current drive".
Pointer info may be null, in which case no information about the file system is returned, and the only meaningful result of the function is its return value. (This can be used to ascertain whether the implementation supports this function, or whether a given file system or device exists and is accessible on the host implementation.)
Returns
The function returns a non-negative value on success. On failure, the function returns a negative value after modifying the value of errno (which is defined in the <errno.h> standard header).
If name does not correspond to a valid file system within the implementation, or if the file system is not accessible by the execution unit, the function fails.
[Note]
An implementation could provide a means to associate an I/O device other than a file to a "file" name, such as a socket, hardware port, or internal memory area. In these cases, the name does not correspond to a file system, so the function fails.
The relationship between file system names and file names (i.e., names passed to the fopen() function) is obviously best left as being implementation-defined.
Synopsis
    #include <stdio.h>
    extern int _fgetfilesys(const FILE *fp, struct _filesys *info);
Description
This function retrieves information about the file system on which a given I/O stream resides.
Argument fp points to an I/O stream. If the contents of the stream reside on a file system (according to the implementation), the structure pointed to by info is filled with information about the given file system.
Pointer fp may be null, in which case it specifies the local file system currently associated with the executing program. (Exactly what this means is implementation-defined.)
[Note]
The local file system in Unix, for example, is known as the "current working directory". This is also true of Microsoft DOS and Windows, although these systems have the additional concept of a "current drive".
Pointer info may be null, in which case no information about the file system is returned, and the only meaningful result of the function is its return value. (This can be used to ascertain whether the implementation supports this function.)
Returns
The function returns a non-negative value on success. On failure, the function returns a negative value after modifying the value of errno (which is defined in the <errno.h> standard header).
If the stream does not correspond to a file residing on a file system within the implementation, the function fails.
[Note]
This allows for the possibility that a given I/O stream may not be attached to an actual file, but to some other kind of I/O device such as a socket, hardware port, or internal memory area.
In particular, an implementation may allow the predefined stdin, stdout, and stderr streams to be assigned to I/O devices that do not reside on a file system. For example, a Unix program may have its standard output stream redirected to /dev/null; similarly, a Microsoft DOS program may have its output redirected to nul.
If the stream pointed to by fp has been closed by a previous call to fclose(), the behavior is undefined.
The following function prints the name of the local file system.
    #include <stdio.h>

    void local_fs()
    {
        struct _filesys info;

        // Attempt to get local file system information
        if (_getfilesys(NULL, &info) >= 0)
            printf("Local file system: '%s'\n", info.fs_name);
        else
            printf("Can't get info for the local file system\n");
    }
The following function determines if a given file system exists.
    #include <stdbool.h>
    #include <stdio.h>

    bool fs_exists(const char *fsys)
    {
        // Attempt to get information for a given file system
        return (_getfilesys(fsys, NULL) >= 0);
    }
This could also be written as a simple preprocessor function macro:
#define fs_exists(f) (_getfilesys(f, NULL) >= 0)
The following function determines if a given file system is full.
    #include <stdbool.h>
    #include <stdio.h>

    bool fs_isfull(const char *fsys)
    {
        struct _filesys info;

        // Attempt to get information for a given file system
        if (_getfilesys(fsys, &info) < 0)
            return (false);

        // Determine if the file system is full
        return (info.fs_free == 0);
    }
The following function retrieves and prints information about the local file system.
    #include <stdio.h>

    void print_fs()
    {
        struct _filesys info;

        // Get information about the local file system
        if (_getfilesys(NULL, &info) < 0)
        {
            printf("Can't get info for the local file system\n");
            return;
        }

        // Display the file system information
        printf("Local file system:\n");
        printf(" name: %s\n", info.fs_name);

        if (info.fs_blocksize != -1)
            printf(" block size: %ld bytes\n", info.fs_blocksize);
        else
            printf(" block size: unknown\n");

        if (info.fs_total != -1)
            printf(" total space: %lld blocks\n", info.fs_total);
        else
            printf(" total space: unknown\n");

        if (info.fs_free != -1)
            printf(" free space: %lld blocks\n", info.fs_free);
        else
            printf(" free space: unknown\n");

        if (info.fs_nfiles != -1)
            printf(" files: %lld\n", info.fs_nfiles);
        else
            printf(" files: unknown\n");

        if (info.fs_ndirs != -1)
            printf(" directories: %lld\n", info.fs_ndirs);
        else
            printf(" directories: unknown\n");

        if (info.fs_namelen != -1)
            printf(" name length: %d+1\n", info.fs_namelen-1);
        else
            printf(" name length: unknown\n");
    }
The following function attempts to write a specified number of bytes from a buffer to a specified file, but only after checking that the file system has enough space available for the operation.
    #include <stdbool.h>
    #include <stdio.h>

    bool writebuf(const char *fname, const char *buf, int nbytes)
    {
        struct _filesys info;
        FILE *          out;
        size_t          nwritten;

        if (nbytes < 0)
            return (false);

        // Get the file system for the given file name
        if (_getfilesys(fname, &info) < 0)
            return (false);

        // Check that there is enough space to do the write operation,
        // provided the block size and free space are known (not -1)
        if (info.fs_blocksize > 0  &&  info.fs_free >= 0  &&
            (nbytes+info.fs_blocksize-1)/info.fs_blocksize > info.fs_free)
            return (false);

        // Open the file
        out = fopen(fname, "wb");
        if (out == NULL)
            return (false);

        // Write the buffer to the file; succeed only if all of it was written
        nwritten = fwrite(buf, 1, (size_t) nbytes, out);
        if (fclose(out) != 0)
            return (false);
        return (nwritten == (size_t) nbytes);
    }
The following is a typical set of declarations for the file system information types and functions:
    // <stdio.h>

    #define _FILESYS_VERS   1               // Structure version [*]

    struct _filesys
    {
        int             fs_vers;            // Structure version [*]
        long long int   fs_total;           // Total space, in blocks
        long long int   fs_free;            // Free space, in blocks
        long long int   fs_nfiles;          // Files allocated
        long long int   fs_ndirs;           // Directories allocated
        long int        fs_blocksize;       // Block size, in bytes
        int             fs_namelen;         // Max file name length
        char            fs_name[32+1];      // File system name
        char            fs_type[8+1];       // File system type [*]
        char            fs__r[22];          // (Reserved) [*]
    };

    extern int _getfilesys(const char *_nam, struct _filesys *_inf);
    extern int _fgetfilesys(const FILE *_fp, struct _filesys *_inf);
[*] These are implementation-specific extensions.
    // <sys/statvfs.h>

    struct statvfs
    {
        unsigned long   f_bsize;    // File system block size
        unsigned long   f_frsize;   // Fundamental file system block size
        fsblkcnt_t      f_blocks;   // Total number of blocks on file system
                                    //  in units of f_frsize
        fsblkcnt_t      f_bfree;    // Total number of free blocks
        fsblkcnt_t      f_bavail;   // Free blocks available to
                                    //  non-privileged process
        fsfilcnt_t      f_files;    // Total number of file serial numbers
        fsfilcnt_t      f_ffree;    // Total number of free file serial numbers
        fsfilcnt_t      f_favail;   // Number of file serial numbers available
                                    //  to non-privileged process
        unsigned long   f_fsid;     // File system ID
        unsigned long   f_flag;     // Bit mask of ST_XXX values
        unsigned long   f_namemax;  // Maximum filename length
    };
Most of these members have corresponding members in the proposed _filesys structure.
Of interest is the f_files member, which indicates the total number of file serial numbers, which equates to the total number of unique allocated files, directories, and devices on the file system. There are not separate totals for files and directories.
Also of interest are the f_bsize and f_frsize members. The former is the file system block size, but the latter is the "fundamental" block size, which implies that it may be smaller than the former. It is the latter size that is used to report the number of total and free blocks in the file system.
The f_bavail member indicates the number of free blocks available for use by non-privileged processes. The value of this member would probably be used to set the fs_free member for programs executing with a non-privileged user-ID, while privileged programs would probably use the value of the f_bfree member.
The data types of the members may not be the standard integer ISO C types. It is preferable to use the basic types int, long int, long long int, and their unsigned equivalents instead of relying on additional typedefs.
The file system identity is represented as an integer value in member f_fsid. This is not a generic enough representation for those implementations that do not enumerate file systems numerically.
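The correspondence between the statvfs members and the proposed members can be sketched for a POSIX host as follows. This is one possible mapping under stated assumptions, not a definitive implementation: posix_getfilesys(), posix_fgetfilesys(), and fill_filesys() are hypothetical names, the fs_name length of 33 is assumed, f_bavail is used for fs_free on the assumption of a non-privileged caller, the used file serial numbers (f_files minus f_ffree) stand in for fs_nfiles, and no attempt is made to set fs_flags.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/statvfs.h>

// Stand-in for the proposed structure; the name length 33 is assumed
struct _filesys
{
    long long int   fs_nfiles;
    long long int   fs_ndirs;
    long long int   fs_total;
    long long int   fs_free;
    long int        fs_blocksize;
    unsigned int    fs_flags;
    int             fs_namelen;
    char            fs_name[33];
};

// Copy the statvfs members into the proposed structure
static void fill_filesys(const struct statvfs *vfs, struct _filesys *info)
{
    info->fs_blocksize = (long int) vfs->f_frsize;
    info->fs_total     = (long long int) vfs->f_blocks;
    info->fs_free      = (long long int) vfs->f_bavail;
    info->fs_nfiles    = (long long int) (vfs->f_files - vfs->f_ffree);
    info->fs_ndirs     = -1;        // No separate directory count
    info->fs_flags     = 0;         // No flag mapping attempted here
    info->fs_namelen   = (int) vfs->f_namemax + 1;  // Count the '\0'

    // Convert the numeric identifier into a character string
    snprintf(info->fs_name, sizeof info->fs_name, "%lx",
        (unsigned long) vfs->f_fsid);
}

// Hypothetical counterpart of _getfilesys()
int posix_getfilesys(const char *name, struct _filesys *info)
{
    struct statvfs  vfs;

    if (name == NULL  ||  name[0] == '\0')
        name = ".";                 // Current working directory
    if (statvfs(name, &vfs) != 0)
        return (-1);                // errno was set by statvfs()
    if (info != NULL)
        fill_filesys(&vfs, info);
    return (0);
}

// Hypothetical counterpart of _fgetfilesys()
int posix_fgetfilesys(FILE *fp, struct _filesys *info)
{
    struct statvfs  vfs;

    if (fp == NULL)
        return (posix_getfilesys(NULL, info));
    if (fstatvfs(fileno(fp), &vfs) != 0)
        return (-1);                // errno was set by fstatvfs()
    if (info != NULL)
        fill_filesys(&vfs, info);
    return (0);
}
```

Note that fstatvfs() may succeed for descriptors that do not refer to ordinary files, so a fuller implementation would need an additional check to make the stream-based function fail in those cases.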
Microsoft Windows (Win32) employs the concept of disk clusters, which are the fundamental allocation units for disk drives; each cluster is composed of a number of smaller physical disk sectors. For example, a disk system may use 2,048-byte clusters, each composed of four 512-byte physical sectors. Win32 also employs the concept of file segments.
Win32 apparently does not keep track of the number of allocated files and directories on a given file system. Determining these totals would probably require counting them by traversing the top-level directory tree of a given device, which is undoubtedly an expensive operation. Thus it would make sense for such implementations to report an "unknown" number of files (fs_nfiles) and directories (fs_ndirs).
Microsoft Win32 provides an NTFS_VOLUME_DATA_BUFFER structure type, which contains information about a given volume. This structure is retrieved by an FSCTL_GET_NTFS_VOLUME_DATA query operation.
    typedef struct
    {
        LARGE_INTEGER   VolumeSerialNumber;
        LARGE_INTEGER   NumberSectors;
        LARGE_INTEGER   TotalClusters;
        LARGE_INTEGER   FreeClusters;
        LARGE_INTEGER   TotalReserved;
        DWORD           BytesPerSector;
        DWORD           BytesPerCluster;
        DWORD           BytesPerFileRecordSegment;
        DWORD           ClustersPerFileRecordSegment;
        LARGE_INTEGER   MftValidDataLength;
        LARGE_INTEGER   MftStartLcn;
        LARGE_INTEGER   Mft2StartLcn;
        LARGE_INTEGER   MftZoneStart;
        LARGE_INTEGER   MftZoneEnd;
    } NTFS_VOLUME_DATA_BUFFER;
Note that LARGE_INTEGER is a 64-bit integer type and DWORD is a 32-bit unsigned integer type.
Win32 also provides a _diskfree_t structure and a _getdiskfree() function for retrieving information about a given disk drive:
    // <direct.h>, <dos.h>

    struct _diskfree_t
    {
        unsigned int    total_clusters;
        unsigned int    avail_clusters;
        unsigned int    sectors_per_cluster;
        unsigned int    bytes_per_sector;
    };

    extern unsigned _getdiskfree(unsigned dev, struct _diskfree_t *info);
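The cluster and sector counts above translate into the proposed block-based members by simple multiplication, with a cluster playing the role of a block. The sketch below uses a structure that merely mirrors the four _diskfree_t members, so that it compiles on any host; struct diskfree_sample, cluster_bytes(), and avail_bytes() are hypothetical names.

```c
// Mirror of the four _diskfree_t members (hypothetical stand-in)
struct diskfree_sample
{
    unsigned int    total_clusters;
    unsigned int    avail_clusters;
    unsigned int    sectors_per_cluster;
    unsigned int    bytes_per_sector;
};

// Bytes per cluster; this is the value that would map onto fs_blocksize
unsigned long long cluster_bytes(const struct diskfree_sample *d)
{
    return ((unsigned long long) d->sectors_per_cluster
        * d->bytes_per_sector);
}

// Free space in bytes, computed in 64 bits because the product can
// exceed the range of unsigned int
unsigned long long avail_bytes(const struct diskfree_sample *d)
{
    return (cluster_bytes(d) * d->avail_clusters);
}
```

With four 512-byte sectors per cluster, cluster_bytes() yields the 2,048-byte cluster size used in the example earlier in this section.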
Proof-of-concept source code is contained in these files:
The author wishes to express his gratitude to those who provided comments, suggestions, and criticism on this proposal.
Further discussion can be found on the comp.std.c newsgroup, under the subject of "C0X: File system info funcs".
This document is in the public domain. Permission is granted to freely redistribute, copy, or reference this document.
This document: http://david.tribble.com/text/c0xfilesys.html.
Author's email:
david@tribble.com.
Author's home page:
http://david.tribble.com.