ISO C 200X Proposal: Directory Handling Functions
By David R. Tribble
Cover Sheet

C200X Revision Proposal
=======================

Title: Directory Handling Functions
Author: David R. Tribble
Author Affiliation: Self
Postal Address: *************** USA
Telephone Number: ***************
E-mail Address: david@tribble.com
Sponsor: _____________
Revision: 1.7, 2006-03-10
Supersedes: 1.6, 2006-03-07

Proposal Category:
    __ Editorial change/non-normative contribution
    __ Correction
    X_ New feature
    __ Addition to obsolescent feature list
    __ Addition to Future Directions
    __ Other (please specify) ______________________________

Area of Standard Affected:
    __ Environment
    __ Language
    __ Preprocessor
    X_ Library
    X_ Macro/typedef/tag name
    X_ Function
    X_ Header
    __ Other (please specify) ______________________________

Prior Art:
    The POSIX functions opendir(), readdir(), et al.
    The Microsoft Win32 functions FindFirstFile() and FindNextFile().
    Various other operating systems provide similar functionality (VMS, MS-DOS, MacOS, etc.).

Target Audience: _____________________________________________
Related Documents (if any): None.
Proposal Attached: X_ Yes __ No, but what's your interest?

Abstract:
    The addition of the concept of "directory" to complement the existing concept of "file" (I/O stream). The addition of standard functions providing the capability to create and search directories, and functions for manipulating file and directory names. These functions and types are defined in a new header file, <stddir.h>.
1. Introduction
All hosted ISO C (ISO 9899:1999) implementations support the concept of an I/O stream, also known as a file, i.e., a named collection of characters (bytes) that is accessible via the standard library functions declared in the <stdio.h> header, such as fopen().
The majority of hosted C implementations support the concept of a file system, also known as a structured storage device, containing external files. The majority of hosted C implementations also support the notion of directories containing files and possibly nested subdirectories.
A file system typically represents a set of one or more files, usually arranged in a tree-like hierarchy of directories, subdirectories, and files. Each file corresponds to the ISO C notion of a binary or text I/O stream.
This proposal describes a set of types and functions to be added to the standard C library to provide the means to manipulate directories. These new capabilities are intended to be provided by hosted implementations, and should not be required of free-standing implementations.
The following is a list of common programming operations that are not currently supported by ISO C.
- Create a file name for a file within a particular directory.
- Decompose a file name into its component directory and file name parts.
- Determine the current directory associated with the executing program.
- Set the current directory of the executing program.
- Search for entries within a particular directory.
This proposal describes a set of types and functions to be added to the standard C library to provide these capabilities.
In designing a set of types and functions to provide directory and file operations, it is useful to adhere to the following design goals and guidelines.
Efficiency.
Structure objects should not occupy excessive amounts of memory space,
nor should they require excessive amounts of CPU time to create and destroy.
Safety.
All functions should be thread-safe, and the use of modifiable global data
should be avoided.
Maximum lengths of variable-sized data (such as strings) should be passed as
extra arguments to functions to avoid data overruns.
Wide target universe.
The widest possible variety of existing implementations should be addressed.
Functions that will work on only 95% of existing systems are not a
sufficiently complete solution.
As widespread as they are, Win32 and POSIX systems are not the only
operating systems to be addressed.
Sufficiently general.
Proposed types and functions should be sufficiently general and give
implementers reasonably wide latitude in the way they may choose to
implement the functions.
This implies that certain details should be vague enough to allow for
many disparate operating systems, but well defined enough to allow for
portable coding with predictable behavior.
2. Definitions
The current ISO C standard (1999) does not define the concept of directory.
3. Constants
The following constants are defined in the <stddir.h> standard header.
[Note]
The constants, functions, and types in this proposal are to be defined in a new standard library header file. This isolates the new names in a new header file, thus preventing existing code from breaking.
    #include <stddir.h>
    #define __STDC_DIR__ integer-expression
This is a preprocessor macro defined as a constant integer expression. The expression evaluates to a non-zero (true) value if the implementation supports the notion of directories (specifically, if the implementation supports all of the types and functions declared in the <stddir.h> header), otherwise it evaluates to zero (false).
[Note]
This macro can be used to test at compile time whether or not an implementation supports the functions specified in this proposal.
This implies that a conforming hosted implementation will be expected to provide the <stddir.h> standard header file, but will not be expected to support the types and functions defined within it.
Presumably, systems that do in fact support the notion of directories will provide an operating set of directory functions and thus define the __STDC_DIR__ macro as true.
Systems that cannot support such a notion are free to provide a minimal header file containing only the definition of the __STDC_DIR__ macro, being defined to zero. Of course, such implementations could provide non-working versions of the directory handling functions (which simply return failure values), but this would probably be undesirable, and in any case such implementations should still define the macro as false.
In this way, ISO C will not force implementations to provide support for directories, particularly those that cannot (e.g., embedded systems having only a very simple notion of a file system or none at all).
The value of this macro could be defined in a fashion similar to other existing __STDC_XXX__ macros, specifying the date of the ISO C standard being supported by the implementation, as a number in the form yyyymmL. For example, this macro could be defined as 200910L, designating support for the ISO C standard dated Oct 2009.
    #include <stddir.h>
    #define __STDC_SETCURRDIR__ integer-expression
This is a preprocessor macro defined as a constant integer expression. The expression evaluates to a non-zero (true) value if the implementation supports the operation of changing the current directory (specifically, if it supports the setcurrdir() function), otherwise it evaluates to zero (false).
[Note]
This is to allow for implementations that do not support the notion of a current directory associated with the executing program, or for those that do but cannot change it at execution time.
This macro can be used to test at compile time whether or not the setcurrdir() function actually does anything useful.
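As a rough illustration of how the two feature-test macros might be consulted, the sketch below probes for the proposed header and folds the results into local macros. The names HAVE_STDDIR_DIRS, HAVE_STDDIR_SETCURRDIR, and stddir_support_summary() are hypothetical. The __has_include probe is a compiler feature (standardized in C23) used here only because <stddir.h> does not exist on current systems; under this proposal a plain #include <stddir.h> would suffice.

```c
#include <assert.h>

/* Probe for the proposed header; it is absent on today's systems. */
#if defined(__has_include)
#  if __has_include(<stddir.h>)
#    include <stddir.h>
#  endif
#endif

/* An undefined macro and a macro defined as zero both mean "unsupported". */
#if defined(__STDC_DIR__) && __STDC_DIR__
#  define HAVE_STDDIR_DIRS 1
#else
#  define HAVE_STDDIR_DIRS 0
#endif

/* Changing the current directory only makes sense if directories exist. */
#if HAVE_STDDIR_DIRS && defined(__STDC_SETCURRDIR__) && __STDC_SETCURRDIR__
#  define HAVE_STDDIR_SETCURRDIR 1
#else
#  define HAVE_STDDIR_SETCURRDIR 0
#endif

/* Hypothetical helper: describe what was detected at compile time. */
const char *stddir_support_summary(void)
{
    /* The local macros are consistent by construction. */
    assert(HAVE_STDDIR_DIRS || !HAVE_STDDIR_SETCURRDIR);

    if (!HAVE_STDDIR_DIRS)
        return "no directory support";
    return HAVE_STDDIR_SETCURRDIR
        ? "directories supported, current directory can be changed"
        : "directories supported, current directory cannot be changed";
}
```

Code that needs setcurrdir() could then be guarded with #if HAVE_STDDIR_SETCURRDIR and fall back to some other strategy otherwise.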
4. Types
The following types are defined in the <stddir.h> standard header.
[Note]
The constants, functions, and types in this proposal are to be defined in a new standard library header file. This isolates the new names in a new header file, thus preventing existing code from breaking.
    #include <stddir.h>
    typedef opaque-type DIR;
This object type is used to store the context of a directory search. This type is not an array type. The contents of this object are implementation-defined.
[Note]
This structure type is analogous to the standard FILE type. Whereas a FILE object embodies the context of reading or writing an I/O stream, a DIR object embodies the context of a directory search.
This type is modeled after the POSIX DIR type, which is an opaque type, presumably a structure.
This type is explicitly defined as not being an array type so that its address can be taken without any undesirable semantic side effects. This is a requirement to be able to pass pointers to this object type to the directory searching functions.
Objects of this type are created by the opendir() function and are destroyed by the closedir() function.
    #include <stddir.h>
    struct dirent;
This structure contains the following members, in no specific order:
    size_t d_namlen;    // Entry name length
    char   d_name[N];   // Entry name
This structure contains information about a directory entry. Such objects are created as a result of searching a directory.
[Note]
This structure is modeled after the POSIX and Unix dirent structure, stripped down to its bare essentials. Also, some of the member types are slightly different from the existing Unix members.
The members of the structure are described in further detail below.
Member d_name is a null-terminated string containing the name of a directory entry. The string may designate a file, a directory, or some other implementation-defined type of entry within a directory. The length of the array (N) and its contents are implementation-defined.
If the directory entry designates a regular file, this member string may be passed as an argument to the fopen() function to open an I/O stream for the named file.
[Note]
Since the type of this member is an array of char, there is an implicit assumption that directory entry names are single-byte or multi-byte character strings, and not wide character strings. This parallels the assumption made for filenames passed to the standard fopen() function.
If the directory entry designates a directory, this member string may be passed as an argument to the opendir() function to search the entries contained within the named directory.
The entry may designate some implementation-defined type of entry other than a regular file or directory.
[Note]
Implementations may provide directory entries that designate O/S entities other than files and directories, such as sockets, pipes, semaphores, volume labels, block devices, etc.
This member may be a flexible array member (i.e., an array with an unspecified size), provided that it is also the last member in the structure. This implies that this member should not be used as an operand of the sizeof operator.
[Note]
The length of the directory entry name is related to the standard FILENAME_MAX constant.
A footnote in §7.19.1 of the ISO 9899:1999 standard points out that FILENAME_MAX is not a guaranteed limit for all possible file names supported by an implementation (i.e., for an implementation that presumably supports multiple kinds of file systems).
Therefore it is not clear how to specify the relationship between FILENAME_MAX and the length of the d_name array.
Member d_namlen specifies the length of the name of a directory entry, i.e., the number of characters, not including the terminating null character, in the member array d_name.
[Note]
It would be better to name this member d_namelen, but existing practice (BSD Unix, et al.) already uses the name d_namlen.
The type of this member is size_t. This differs slightly from the Unix specification, which defines it as having type unsigned short.
The value of this member is related to the standard FILENAME_MAX constant.
As described in the note for member d_name above, it is not clear how to specify the relationship between FILENAME_MAX and the length of the d_name array, i.e., the value of d_namlen.
The dirent structure may contain other implementation-defined members.
[Note]
These structure members are considered the minimum amount of useful information about files while also being the most general for the widest number of existing implementations.
Some implementations (e.g., POSIX) provide only the entry name, whereas other implementations (e.g., Win32) provide several pieces of information about the directory entry.
However, implementations may provide additional members to reflect their support of, or need for, other directory search context information, such as:
- search "handle"
- entry record length
- directory lock control information
- entry type or "mode"
- access permissions
- entry serial number within the file system
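To make the relationship between the two required members concrete, here is a minimal sketch of one possible layout. The names sketch_dirent, SKETCH_NAME_MAX, and sketch_dirent_set() are hypothetical, and the fixed array length of 256 is an arbitrary assumption; an implementation could equally use a flexible array member, in which case sizeof could not be applied to d_name as it is here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NAME_MAX 256   /* hypothetical choice for N */

struct sketch_dirent {
    size_t d_namlen;                 /* name length, excluding the '\0' */
    char   d_name[SKETCH_NAME_MAX];  /* null-terminated entry name */
};

/*
 * Hypothetical helper: fill an entry from a name, keeping d_namlen and
 * d_name consistent. Returns 0 on success, or -1 with errno set to
 * ERANGE if the name does not fit in the array.
 */
int sketch_dirent_set(struct sketch_dirent *ent, const char *name)
{
    assert(ent != NULL && name != NULL);

    size_t len = strlen(name);
    if (len >= sizeof ent->d_name) {
        errno = ERANGE;
        return -1;
    }
    memcpy(ent->d_name, name, len + 1);  /* copy the terminator too */
    ent->d_namlen = len;
    return 0;
}
```

After a successful call, d_namlen always equals strlen(d_name), which is the invariant the member descriptions above imply.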
5. Functions
The following functions are used to manage directories for an execution unit. They are declared in the <stddir.h> standard header.
[Note]
There is no function defined in this proposal for removing directories (e.g., removedir() or rmdir()). This capability is presumably already provided by the standard remove() function. See the Prior Art section for further discussion.
    #include <stddir.h>
    int getcurrdir(char *dir, size_t max);
This function determines the current directory associated with the execution unit.
[Note]
This function is modeled after the POSIX getcwd() function. The POSIX function returns a char * type, however.
Like this function, the POSIX function takes a second length argument of type size_t, so the return type is the only difference between the two interfaces.
The dir argument points to a buffer that is to be filled with the name of the current directory, as determined by the implementation. The contents of the string are implementation-defined.
Argument max specifies the maximum number of characters, including a terminating null character, to write into the string pointed to by dir. If the name of the current directory, including the terminating null character, is longer than max characters, the function fails.
If the implementation does not support the concept of directories or a current directory associated with an execution unit, the function fails.
If successful, the function returns a non-negative value after modifying the contents of the string pointed to by dir.
On failure, the function returns a negative value after modifying the value of errno.
[Note]
The function could return a negative value whose absolute value indicates the number of characters required to hold the directory name.
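On a POSIX system, the return convention described above could be layered over getcwd() roughly as follows. This is only a sketch: the name sketch_getcurrdir() is hypothetical, and it returns a plain -1 on failure rather than the negated required length suggested in the note.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical POSIX-based sketch of the proposed getcurrdir().
 * Returns 0 on success and -1 on failure; on failure errno has been
 * set by getcwd() (for example ERANGE when max is too small).
 */
int sketch_getcurrdir(char *dir, size_t max)
{
    assert(dir != NULL);   /* the proposal leaves a null pointer undefined */

    if (getcwd(dir, max) == NULL)
        return -1;
    return 0;
}
```

A caller would typically pass a buffer sized from FILENAME_MAX or a similar limit and retry with a larger buffer if the call fails.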
    #include <stddir.h>
    int setcurrdir(const char *dir);
This function establishes the current directory associated with the execution unit, as determined by the implementation.
[Note]
This function is modeled after the POSIX chdir() function.
The proposed name is felt to be more meaningful, however, and better parallels the getcurrdir() function name.
The dir argument points to a string containing a directory name. The contents of the string are implementation-defined.
[Note]
This argument can be the value obtained by calling the getcurrdir(), getdirname(), or mkdirname() function.
If the implementation does not support the concept of directories or a current directory associated with an execution unit, the function fails.
Note that the __STDC_SETCURRDIR__ macro indicates whether or not the implementation supports this function.
If successful, the function returns a non-negative value after establishing a new current directory for the execution unit.
On failure, the function returns a negative value after modifying the value of errno.
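Since the function is modeled after chdir(), a POSIX implementation could be little more than a wrapper. The sketch below assumes a POSIX host; sketch_setcurrdir() is a hypothetical name.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical POSIX-based sketch of the proposed setcurrdir().
 * Returns 0 on success and -1 on failure, with errno set by chdir().
 */
int sketch_setcurrdir(const char *dir)
{
    assert(dir != NULL);

    return (chdir(dir) == 0) ? 0 : -1;
}
```

On such a system the __STDC_SETCURRDIR__ macro would presumably be defined as non-zero.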
    #include <stddir.h>
    int createdir(const char *dir);
This function creates a directory of a given name.
[Note]
This function is modeled after the POSIX mkdir() function.
It is simpler and more generic than the POSIX function, though, because it does not take a second argument specifying access permissions for the new directory. The concept of "access permissions" is not defined in ISO C.
There is no function defined in this proposal for performing the inverse operation of removing a directory (e.g., a removedir() or rmdir() function). This capability is already provided, presumably, by the standard remove() function. See the Prior Art section for further discussion.
The dir argument points to a null-terminated string containing the name of a new directory to create. The contents of the string are implementation-defined.
[Note]
This argument can be the value obtained by calling the getdirname() or mkdirname() function.
Some systems support the concept of "absolute" and "relative" directory names.
If the directory name is an "absolute" path name, then the current directory associated with the executing program is ignored when creating the new directory.
If the directory name is a "relative" path name, then the new directory is created as a subdirectory of the current directory associated with the executing program.
If the implementation has no concept of "absolute" or "relative" directory names, then the directory is created in an implementation-defined manner.
Some systems (e.g., Win32) support the concept of a "device" or "drive" associated with a given directory name. Given a directory name without a device name on such systems, a default device must be assumed by the implementation.
If the implementation does not support the concept of directories, the function fails.
If successful, the function returns a non-negative value. On failure, the function returns a negative value after modifying the value of errno.
The following functions are used to manipulate directory and file names.
    #include <stddir.h>
    int getdirname(char *dir, size_t max, const char *path);
This function extracts the directory component of a given file name.
[Note]
This function is modeled after the Unix dirname() function. It operates differently, however, because it does not modify the path name in place.
Argument path points to a string containing a file or directory name. The contents of the string are implementation-defined. If the string does not specify a valid directory or file path name, the function fails. If this pointer is null, the behavior is undefined.
Argument dir points to a buffer that is to be filled with the directory component of the path name pointed to by path. The contents of the resulting string are implementation-defined. The resulting directory name is suitable for use as an argument to the opendir(), mkfilename(), and mkdirname() functions. If this pointer is null, the behavior is undefined.
If the path name does not have a directory component, dir is set to the empty string (i.e., the first character in the string is set to '\0') and the function returns zero.
[Note]
The function should return zero if the given path name designates the "root" or "top-most" directory on the implementation, since such path names do not have a "parent" directory component.
It is possible that there may need to be another function that retrieves the name of the "root" or "top-most" directory, viz.:
    int getrootdir(char *dir, size_t max);
This function would be useful for constructing "absolute" path names.
Argument max specifies the maximum number of characters, including a terminating null character, to write into the string pointed to by dir. If the resulting directory name component, including the terminating null character, is longer than max characters, the function fails.
If the implementation does not support the concept of directory names, the function fails.
If successful, the function returns a positive value indicating the length of the resulting directory name component (i.e., the number of characters, excluding the terminating null character, written into dir).
On failure, the function returns a negative value after possibly modifying the value of errno.
[Note]
The function could return a negative value whose absolute value indicates the number of characters required to hold the resulting directory name.
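The rules above can be illustrated for one concrete naming scheme. The sketch below assumes POSIX-style names with "/" as the only separator and makes no attempt to normalize trailing or repeated separators; sketch_getdirname() is a hypothetical name, and the choice of EINVAL (a POSIX error code) for an empty path is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch of getdirname() for "/"-separated names.
 * Returns the length of the directory component, 0 if there is none
 * (a bare name, or the root "/" itself), or -1 on failure.
 */
int sketch_getdirname(char *dir, size_t max, const char *path)
{
    assert(dir != NULL && path != NULL);

    if (path[0] == '\0') {           /* empty string: not a valid name */
        errno = EINVAL;
        return -1;
    }

    const char *slash = strrchr(path, '/');
    size_t len;
    if (slash == NULL || path[1] == '\0')
        len = 0;                     /* "name" or "/": no parent component */
    else if (slash == path)
        len = 1;                     /* "/name": the parent is the root */
    else
        len = (size_t)(slash - path);

    if (len >= max || len > INT_MAX) {   /* no room for name plus '\0' */
        errno = ERANGE;
        return -1;
    }
    memcpy(dir, path, len);
    dir[len] = '\0';
    return (int)len;
}
```

Note that the input string is never modified, which is the stated difference from the Unix dirname() function.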
    #include <stddir.h>
    int getfilename(char *file, size_t max, const char *path);
This function extracts the file component of a given file name.
[Note]
This function is modeled after the Unix basename() function. It operates differently, however, because it does not modify the path name in place.
Argument path points to a string containing a file or directory name. The contents of the string are implementation-defined. If the string does not specify a valid directory or file name, the function fails. If this pointer is null, the behavior is undefined.
Argument file points to a buffer that is to be filled with the file component of the path name pointed to by path. The contents of the resulting string are implementation-defined. The resulting file name is suitable for use as an argument to the fopen(), mkfilename(), and mkdirname() functions. If this pointer is null, the behavior is undefined.
Argument max specifies the maximum number of characters, including a terminating null character, to write into the string pointed to by file. If the resulting file name component, including the terminating null character, is longer than max characters, the function fails.
If the implementation does not support the concept of directory names, the function simply copies string path into file.
If successful, the function returns a positive value indicating the length of the resulting file name component (i.e., the number of characters, excluding the terminating null character, written into file).
On failure, the function returns a negative value after possibly modifying the value of errno.
[Note]
The function could return a negative value whose absolute value indicates the number of characters required to hold the resulting file name.
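For "/"-separated names, the file component is simply whatever follows the last separator. The sketch below makes that assumption; sketch_getfilename() is a hypothetical name, and EINVAL for an empty path is an assumed (POSIX) error code.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch of getfilename() for "/"-separated names.
 * Returns the length of the file component, or -1 on failure.
 * A path with no separator is copied unchanged.
 */
int sketch_getfilename(char *file, size_t max, const char *path)
{
    assert(file != NULL && path != NULL);

    if (path[0] == '\0') {           /* empty string: not a valid name */
        errno = EINVAL;
        return -1;
    }

    const char *slash = strrchr(path, '/');
    const char *name = (slash != NULL) ? slash + 1 : path;
    size_t len = strlen(name);

    if (len >= max || len > INT_MAX) {   /* no room for name plus '\0' */
        errno = ERANGE;
        return -1;
    }
    memcpy(file, name, len + 1);     /* copy the terminator too */
    return (int)len;
}
```

In this sketch a path ending in a separator, such as "dir/", yields an empty file component and a return value of zero.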
    #include <stddir.h>
    int mkdirname(char *path, size_t max, const char *dir, const char *subdir);
This function creates a directory name by combining a directory name and the name of a subdirectory within that directory.
[Note]
This function is intended to provide a standard means for creating a directory name from a given directory name and subdirectory name. This function performs the inverse operation of the getdirname() and getfilename() functions, and complements the mkfilename() function.
The format of directory and file names is implementation-specific, as are the rules for combining them to create subdirectory names.
POSIX implementations, for instance, use a "/" separator between directory path names and subdirectory names. Microsoft DOS and Windows use a "\" or "/" separator between directory path components, and also allow for network host names and disk drive prefixes.
Implementations that do not require different formats for directory names and file names (e.g., Unix and Win32) might implement both the mkdirname() and mkfilename() functions in the same manner.
Some operating systems employ more complicated formats for subdirectory names, e.g., Digital VMS.
Argument dir points to a null-terminated string containing the name of a directory, presumably residing on a structured storage device within the implementation. The contents of the string are implementation-defined, and can be the directory name resulting from a call to getdirname() or mkdirname(). If the string is empty (""), it is assumed to specify the name of the current directory associated with the execution unit. If the implementation does not support such a concept, the function fails. If this pointer is null, the behavior is undefined.
Argument subdir points to a null-terminated string containing the name of a subdirectory located within the directory designated by the dir string. The contents of the string are implementation-defined, and can be the subdirectory name resulting from a call to getfilename(). If this pointer is null, the behavior is undefined.
Argument path points to a character array that is to be filled with the null-terminated directory name that results from combining the given directory name and subdirectory name. The method by which these components are combined into a single name and the contents of the resulting name are implementation-defined. If the components cannot be combined into a valid subdirectory name, the function fails. The resulting subdirectory name is suitable for use as an argument to the opendir() function. If this pointer is null, the behavior is undefined.
[Note]
Since these arguments are character strings, there is an implicit assumption that directory names are single-byte or multi-byte character strings, and not wide character strings. This parallels the assumption made for directory names passed to the opendir() function.
Argument max specifies the maximum number of characters, including a terminating null character, to write into the string pointed to by path. If the resulting subdirectory name contains more than max characters, the function fails.
If the implementation does not support the notion of directory names or the notion of subdirectories, the function fails.
[Note]
This function is related to another proposal [6]. Specifically, the fs_flags member of the proposed _filesys structure and the _FILESYS_SUBDIRS constant, which together specify whether or not a given file system supports subdirectories; and the _FILESYS_NAMES_DIFF constant, which specifies whether or not a file system makes a distinction between file and directory names.
It is assumed that a given implementation may support more than one type of file system, and that some of those support subdirectories while others do not. The _FILESYS_SUBDIRS attribute in particular can be examined at runtime to determine if a given file system meaningfully supports the construction of subdirectory names by the mkdirname() function.
If successful, the function returns a positive value indicating the length of the resulting subdirectory name (i.e., the number of characters, excluding the terminating null character, written into path), after modifying the contents of the string pointed to by path.
On failure, the function returns a negative value after possibly modifying the value of errno.
[Note]
The function could return a negative value whose absolute value indicates the number of characters required to hold the resulting directory name.
    #include <stddir.h>
    int mkfilename(char *path, size_t max, const char *dir, const char *file);
This function creates a file name by combining a directory name and the name of a file within that directory.
[Note]
This function is intended to provide a standard means for creating a file name from a given directory name and file (or subdirectory) name. This function performs the inverse operation of the getdirname() and getfilename() functions, and complements the mkdirname() function.
The format of directory and file names is implementation-specific, as are the rules for combining them to create file names.
POSIX implementations, for instance, use a "/" separator between directory path names and file names. Microsoft DOS and Windows use a "\" or "/" separator between directory path components and file names, and also allow for network host names and disk drive prefixes.
Some operating systems employ more complicated formats for specifying file names, e.g., Digital VMS.
Argument dir points to a null-terminated string containing the name of a directory, presumably residing on a structured storage device within the implementation. The contents of the string are implementation-defined, and can be the directory name resulting from a call to getdirname() or mkdirname(). If the string is empty (""), it is assumed to specify the name of the current directory associated with the execution unit. If the implementation does not support such a notion, the function fails. If this pointer is null, the behavior is undefined.
Argument file points to a null-terminated string containing the name of a file located within the directory designated by the dir string. The contents of the string are implementation-defined, and can be the file name resulting from a call to getfilename(). If this pointer is null, the behavior is undefined.
Argument path points to a character array that is to be filled with the null-terminated file name that results from combining the given directory name and file name. The method by which these components are combined into a file name and the contents of the resulting string are implementation-defined. If the components cannot be combined into a valid file name, the function fails. The resulting file name is suitable for use as an argument to the fopen() function. If this pointer is null, the behavior is undefined.
[Note]
Since these arguments are character strings, there is an implicit assumption that directory and file names are single-byte or multi-byte character strings, and not wide character strings. This parallels the assumption made for filenames passed to the standard fopen() function.
Argument max specifies the maximum number of characters, including a terminating null character, to write into the string pointed to by path. If the resulting file name contains more than max characters, the function fails.
If the implementation does not support the concept of directory names, the function ignores the dir argument.
If successful, the function returns a positive value indicating the length of the resulting file name (i.e., the number of characters, excluding the terminating null character, written into path), after modifying the contents of the string pointed to by path.
On failure, the function returns a negative value after possibly modifying the value of errno.
[Note]
The function could return a negative value whose absolute value indicates the number of characters required to hold the resulting file name.
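On systems where directory names and file names share one format, mkdirname() and mkfilename() could share one implementation, as the earlier note on mkdirname() suggests. The sketch below assumes "/"-separated names and treats an empty dir as the current directory by emitting the name alone. All three function names are hypothetical, as is the use of EINVAL (a POSIX error code) for an empty component name.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical shared helper: join dir and name with a single "/". */
static int sketch_join(char *path, size_t max,
                       const char *dir, const char *name)
{
    assert(path != NULL && dir != NULL && name != NULL);

    if (name[0] == '\0') {           /* nothing to append */
        errno = EINVAL;
        return -1;
    }

    size_t dlen = strlen(dir);
    size_t nlen = strlen(name);
    /* No separator after an empty dir or a dir already ending in "/". */
    size_t sep = (dlen > 0 && dir[dlen - 1] != '/') ? 1 : 0;
    size_t total = dlen + sep + nlen;

    if (total >= max || total > INT_MAX) {   /* result would not fit */
        errno = ERANGE;
        return -1;
    }
    memcpy(path, dir, dlen);
    if (sep)
        path[dlen] = '/';
    memcpy(path + dlen + sep, name, nlen + 1);   /* includes the '\0' */
    return (int)total;
}

int sketch_mkdirname(char *path, size_t max,
                     const char *dir, const char *subdir)
{
    return sketch_join(path, max, dir, subdir);
}

int sketch_mkfilename(char *path, size_t max,
                      const char *dir, const char *file)
{
    return sketch_join(path, max, dir, file);
}
```

An implementation with distinct directory and file syntaxes, such as VMS, would need two genuinely different functions here.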
    #include <stddir.h>
    int matchfilename(const char *path, const char *pattern);
This function determines if a given path name matches a filename pattern.
Argument path points to a null-terminated string containing a file or directory name. The contents of the string are implementation-defined. If the string is empty (""), it can only be matched by an empty filename pattern. If the string does not contain a valid file or directory name, the behavior is undefined. If this pointer is null, the behavior is undefined.
Argument pattern points to a null-terminated string containing a filename pattern. The format and contents of the string are implementation-defined. If the string is empty (""), it matches only an empty path name. If the string does not contain a valid filename pattern, the behavior is undefined. If this pointer is null, the behavior is undefined.
[Note]
A "pattern" contains pattern-matching characters, similar to a regular expression, but designed to match file and/or directory names. It is implementation-defined as to what constitutes a valid pattern.
Some systems allow fairly primitive patterns, e.g., Win32 allows the '?' and '*' characters, also known as "wildcard" characters. Other systems provide more elaborate pattern-matching, e.g., Unix, which also provides character sets, ranges, and exclusion sets.
Some implementations (e.g., Win32) provide case-insensitive filenames, so that the names "abc", "Abc", and "ABC" all refer to the same file or directory. Other systems (e.g., Unix) provide case-sensitive filenames, so that all three names refer to different files or directories.
If path contains a path name that matches the filename pattern pattern, the function returns a positive value. The criteria by which a pattern matches a given filename are implementation-defined.
[Note]
It is unspecified whether or not the path and pattern strings can contain directory components in addition to a filename component.
On failure, the function returns a negative value after possibly modifying the value of errno.
If the implementation does not support filename patterns, the function does a simple comparison of the path name string to the filename pattern string in a manner consistent with the naming of files and directories in the implementation.
[Note]
If the implementation does not support filename patterns, i.e., it does not provide "wildcard" filename expressions, the pattern argument must match the path argument.
For implementations that support case-sensitive filenames, this means that the two strings must match exactly, as if calling strcmp(). For implementations that support case-insensitive filenames, the comparison must treat upper and lower case alphabetic characters the same, so that, for instance, the names "abc", "Abc", and "ABC" all match each other.
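On a POSIX system the matching could plausibly be delegated to fnmatch(), which implements shell-style patterns. The sketch below makes that assumption. The name sketch_matchfilename() is hypothetical, and returning zero for "no match" is also an assumption, since the text above specifies only the positive (match) and negative (failure) results.

```c
#include <assert.h>
#include <errno.h>
#include <fnmatch.h>
#include <stddef.h>

/*
 * Hypothetical POSIX-based sketch of the proposed matchfilename().
 * Returns 1 if path matches pattern, 0 if it does not (assumed
 * convention), and -1 if fnmatch() reports an error.
 */
int sketch_matchfilename(const char *path, const char *pattern)
{
    assert(path != NULL && pattern != NULL);

    int rc = fnmatch(pattern, path, 0);   /* note the argument order */
    if (rc == 0)
        return 1;
    if (rc == FNM_NOMATCH)
        return 0;
    errno = EINVAL;                       /* assumed error code */
    return -1;
}
```

Because fnmatch() is case-sensitive, this sketch gives the Unix behavior described above, where "abc" and "ABC" do not match each other.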
The following set of functions are used to search directories.
    #include <stddir.h>
    DIR * opendir(const char *dir);
This function initiates a search within a directory.
[Note]
This function operates on directory searches in a fashion analogous to the way that fopen() operates on I/O streams. This function is modeled after the POSIX opendir() function. See the Prior Art section for further discussion.
Some implementations may require the use of an I/O stream for the directory searching operations.
Argument dir points to a string containing the name of a directory within the implementation. The contents of the string are implementation-defined, and can be the directory name resulting from a call to getdirname() or mkdirname().
[Note]
Since this argument is a character string, there is an implicit assumption that directory names are single-byte or multi-byte character strings, and not wide character strings. This parallels the assumption made for filenames passed to the standard fopen() function.
If the implementation does not support the concept of directories or directory searching, the function fails.
If the directory name pointed to by dir constitutes a meaningful directory name (according to the implementation), the function creates and returns a pointer to an object containing context information for searching the specified directory. This object can subsequently be passed to the other directory searching functions. The returned object is destroyed by a subsequent call to closedir().
On failure, the function returns null after modifying the value of errno.
    #include <stddir.h>
    int closedir(DIR *dp);
This function terminates a directory search and destroys the directory search context object that was created by a prior call to opendir().
[Note]
This function operates on directory searches in a fashion analogous to the way that fclose() operates on I/O streams. This function is modeled after the POSIX closedir() function.
Argument dp points to an object containing directory search context information that was created by a prior call to opendir(). If the pointer is null or the object has been used as an argument of a prior call to closedir(), the behavior is undefined.
After a successful return from this function, the object pointed to by dp may not be used for further directory searching operations (and cannot be assumed to still exist as a valid memory object).
If successful, the object pointed to by dp is destroyed (in an implementation-defined way) and the function returns a non-negative value.
Any pointers to dirent structures returned by prior calls to readdir() for the same directory search context are rendered invalid.
On failure, the function returns a negative value after modifying the value of errno.
    #include <stddir.h>
    const struct dirent * readdir(DIR *dp);
This function reads the next entry within a directory being searched.
[Note]
This function is modeled after the POSIX readdir() function. See the Prior Art section for further discussion.
Argument dp points to an object containing directory search context information that was created by a prior call to opendir(). If the pointer is null or if the object has been used as an argument in a call to closedir(), the behavior is undefined.
If successful, a pointer to a structure is returned, where the structure contains information about the next directory entry within the directory search context. The position of the search is advanced to the next directory entry.
The means by which the structure is allocated is unspecified. Subsequent calls to this function invalidate any previously returned pointer values, as does a subsequent call to rewinddir() or closedir().
[Note]
Presumably, the returned structure is allocated by either the readdir() or the opendir() function. A call to readdir() renders any previously returned dirent structure invalid, which implies that pointers to such structures should not be kept beyond one call to readdir().
A subsequent call to rewinddir() or closedir() also renders any previously returned dirent structure pointer invalid.
An earlier version of this proposal defined this function as accepting a second argument, a pointer to a dirent structure, which was filled with information about the next entry found within the directory search.
The problem with that earlier approach is that it requires the dirent structure to be as large as possible in order to contain the largest entry name (filename) allowed by the implementation. The latter alternative approach of returning a pointer to a dirent object allows the object to be only as large as necessary to hold the found entry name.
Of course, this latter approach requires stricter behavior with regard to what happens to the dirent objects once another call is made to readdir(), rewinddir(), or closedir(). The most reasonable approach is to mandate that the storage for all such previously returned dirent objects is no longer valid after subsequent calls to these functions. This gives implementations some latitude, allowing them to allocate such objects as part of the DIR object storage, or to manage them as some other kind of separately allocated and deallocated objects. Presumably, calls to readdir() et al. manage the allocations and deallocations appropriately, and a final call to closedir() deallocates all such objects for a given directory search context.
On failure, or if there are no more entries to be found in the directory search, the function returns a null pointer after modifying the value of errno, and the position of the search is left in an indeterminate state.
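Two practical consequences of these rules are that an entry name must be copied before the next call to readdir(), and that a null return needs to be interpreted. The sketch below shows both, using the POSIX <dirent.h> functions as a stand-in for the proposed ones. Clearing errno before each call to tell "no more entries" apart from an error is the POSIX convention; whether the proposed function allows the same distinction is not stated here. The helper name longest_entry_name() is hypothetical.

```c
#include <dirent.h>     /* POSIX, standing in for the proposed <stddir.h> */
#include <errno.h>
#include <string.h>

/* Hypothetical helper: copies the longest entry name found in dir into
   buf (of size max) and returns its length, or -1 on failure.  A name
   that does not fit in buf is skipped. */
long longest_entry_name(const char *dir, char *buf, size_t max)
{
    DIR *dp;
    const struct dirent *ent;
    size_t best = 0;
    int err;

    if (max == 0)
        return -1;
    buf[0] = '\0';

    dp = opendir(dir);
    if (dp == NULL)
        return -1;

    for (;;)
    {
        size_t len;

        /* POSIX convention: a null return with errno still zero means
           "no more entries" rather than an error */
        errno = 0;
        ent = readdir(dp);
        if (ent == NULL)
            break;

        /* Copy the name now; ent is invalid after the next readdir() */
        len = strlen(ent->d_name);
        if (len > best && len < max)
        {
            memcpy(buf, ent->d_name, len + 1);
            best = len;
        }
    }
    err = errno;

    closedir(dp);
    return err == 0 ? (long) best : -1;
}
```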
    #include <stddir.h>
    int rewinddir(DIR *dp);
This function resets the search position of a given directory search context object to its initial position.
[Note]
This function operates on directory searches in a fashion analogous to the way that rewind() operates on I/O streams. This function is modeled after the POSIX rewinddir() function.
However, whereas the POSIX function does not return a value, this function returns a value indicating success or failure. This is a minor difference, though. See the Prior Art section for further discussion.
[Note]
What constitutes the "initial position" of a directory is implementation-defined.
Argument dp points to an object containing directory search context information that was created by a prior call to opendir(). If the pointer is null or the object has been the subject of a prior call to closedir(), the behavior is undefined.
If successful, the directory search context object is modified and the function returns a non-negative value.
Any dirent structure pointers returned by prior calls to readdir() for the same directory search context are rendered invalid.
On failure, the function returns a negative value after modifying the value of errno.
6. Examples |
The following examples illustrate the use of the functions described in this proposal.
The following function searches a given directory, printing the names of the entries contained within it.
    #include <stddir.h>
    #include <stdio.h>

    int print_dir(const char *dir)
    {
        DIR *                   dp;
        const struct dirent *   ent;
        int                     cnt;

        // Open the given directory
        dp = opendir(dir);
        if (dp == NULL)
            return -1;

        // Search the directory
        printf("%s:\n", dir);
        cnt = 0;
        while (ent = readdir(dp), ent != NULL)
        {
            cnt++;
            printf("%3d %s\n", cnt, ent->d_name);
        }

        // Terminate the search
        closedir(dp);
        dp = NULL;

        printf("Entries: %d\n", cnt);
        return cnt;
    }
The following function creates a subdirectory named "logs" within the program's current directory, then creates and opens a file named "mylog" within the new subdirectory.
    #include <stddir.h>
    #include <stdio.h>

    FILE * open_logfile(const char *dir)
    {
        char    fname[FILENAME_MAX];

        // Create a subdirectory
        if (createdir("logs") < 0)
            return NULL;

        // Create a logfile in the new subdirectory
        if (mkfilename(fname, sizeof(fname), "logs", "mylog") < 0)
            return NULL;
        return fopen(fname, "a");
    }
The following function removes all of the entries within a given directory. (Note that this code uses functions, types, and constants defined in another related proposal [5].)
    #include <stdbool.h>
    #include <stddir.h>
    #include <stdio.h>

    bool clean_dir(const char *dir)
    {
        DIR *                   dp;
        const struct dirent *   ent;

        // Do a directory search
        dp = opendir(dir);
        if (dp == NULL)
            goto fail;

        // Find and remove all entries from the directory
        while (ent = readdir(dp), ent != NULL)
        {
            struct _fileinfo    info;
            char                entname[FILENAME_MAX];

            // Remove the next directory entry
            if (mkfilename(entname, sizeof(entname), dir, ent->d_name) < 0)
                goto fail;

            if (_getfileinfo(entname, &info) < 0)
                goto fail;

            switch (info.fi_type)
            {
            case _FILE_TYPE_DIR:
                // Remove a subdirectory entry
                if (! clean_dir(entname))
                    goto fail;
                if (remove(entname) != 0)
                    goto fail;
                break;

            case _FILE_TYPE_FILE:
                // Remove a regular file entry
                if (remove(entname) != 0)
                    goto fail;
                break;

            default:
                // Nonstandard directory entry type
                if (remove(entname) != 0)
                    goto fail;
                break;
            }
        }

        // Done
        closedir(dp);
        return true;

    fail:
        // Failure
        if (dp != NULL)
            closedir(dp);
        return false;
    }
The following function copies all of the files within a given directory to another directory. (Note that this code uses functions, types, and constants defined in a related proposal [5].)
    #include <stdbool.h>
    #include <stddir.h>
    #include <stdio.h>

    extern void copy_file(const char *srcfname, const char *dstfname);

    bool copy_dir(const char *src_dir, const char *dst_dir)
    {
        DIR *                   srcp;
        const struct dirent *   ent;

        // Do a directory search
        srcp = opendir(src_dir);
        if (srcp == NULL)
            goto fail;

        // Find and copy all file entries in the directory
        while (ent = readdir(srcp), ent != NULL)
        {
            struct _fileinfo    info;
            char                srcfname[FILENAME_MAX];
            char                dstfname[FILENAME_MAX];

            // Copy the next directory entry
            if (mkfilename(srcfname, sizeof(srcfname), src_dir, ent->d_name) < 0)
                goto fail;

            if (_getfileinfo(srcfname, &info) < 0)
                goto fail;

            switch (info.fi_type)
            {
            case _FILE_TYPE_FILE:
                // Copy a regular file entry
                if (mkfilename(dstfname, sizeof(dstfname), dst_dir, ent->d_name) < 0)
                    break;
                copy_file(srcfname, dstfname);
                break;

            case _FILE_TYPE_DIR:
            default:
                // Not a file entry, ignore it
                break;
            }
        }

        // Done
        closedir(srcp);
        return true;

    fail:
        // Failure
        if (srcp != NULL)
            closedir(srcp);
        return false;
    }
The following functions extract and print all of the components of the current directory name.
    #include <stddir.h>
    #include <stdio.h>

    void print_dir_names(const char *path)
    {
        char    name[FILENAME_MAX];

        // Display the parent directory component names first
        if (getdirname(name, sizeof(name), path) > 0)
            print_dir_names(name);
        else
            printf("[%s]", path);

        // Display the last component name
        if (getfilename(name, sizeof(name), path) > 0)
            printf("[%s]", name);
    }

    void get_dir_names(void)
    {
        char    path[FILENAME_MAX];

        // Get the current directory
        if (getcurrdir(path, sizeof(path)) < 0)
        {
            printf("unknown\n");
            return;
        }
        else
            print_dir_names(path);
    }
The following function prints all the filename entries within the current directory that match a given filename pattern.
    #include <stddir.h>
    #include <stdio.h>

    void list_matching_files(const char *pat)
    {
        char                    cwd[FILENAME_MAX];
        DIR *                   dp;
        const struct dirent *   ent;

        // Get the current directory
        if (getcurrdir(cwd, sizeof(cwd)) < 0)
        {
            printf("Can't get the current directory\n");
            return;
        }

        // Search the directory for matching filenames
        dp = opendir(cwd);
        if (dp == NULL)
        {
            printf("Can't search the current directory: %s\n", cwd);
            return;
        }

        // Find matching filenames in the directory
        while (ent = readdir(dp), ent != NULL)
        {
            // A positive result indicates a match
            if (matchfilename(ent->d_name, pat) > 0)
                printf("%s\n", ent->d_name);
        }

        // Stop searching
        closedir(dp);
    }
7. Prior Art |
The following items describe various existing implementations that provide directory searching capabilities and the various differences and problems with them.
/dir1/dir2/file.ext
Each directory and file name component is separated by a slash (/). Path names can contain almost any character, including spaces and unprintable control characters. Both file and directory names follow the same naming syntax, and cannot be distinguished by form alone.
If the path name starts with a leading slash, it is known as an absolute path name, and specifies a unique file or directory name. If the path name does not start with a leading slash, it is known as a relative path name, and specifies a file or directory name relative to the current directory.
POSIX (and Unix) also allows other I/O device types to be named in this fashion, including:
d:\dir1\dir2\file.ext
Path names may have a leading drive letter (C:) that specifies a local disk drive or device.
Each directory and file name component is separated by a backslash (\), but the operating system also supports slashes (/) as separators (which are recognized by the fopen() function). Path names can contain almost any character, including spaces and unprintable control characters, but cannot contain certain special characters (< > | ? * ") used by the command shell. Filenames are stored as 16-bit Unicode names internally within the file system. Both file and directory names follow the same naming syntax, and cannot be distinguished by form alone.
If the path name starts with a leading slash, it is known as an absolute path name, and specifies a unique file or directory name. If the path name does not start with a leading slash, it is known as a relative path name, and specifies a file or directory name relative to the current directory.
Path names for network directories and devices are prefixed with two backslashes and the node name:
\\node\d:\dir\dir\file.ext
The file extension (suffix) generally indicates the type and format of the file, and the Win32 operating system associates specific applications to each registered file type. For example, foo.txt specifies a text file, and foo.exe specifies an executable binary program file.
node::device:[directory]file.ext;vs
which are composed of one or more of the following components:
Some example VMS file names:
ACS001::SYSLIB:[STD.RUNTIME.LIBS]CRT.LIB;12 USR:[USERS.HOME.DRT]FOO.C [-.INCLUDE]FOO.H PHONE.DAT;-1 HOSTS.TXT
VMS provides the capability for searching for entry names within directories. This functionality is provided as system library functions, and also as built-in procedures within the command line interface (CLI).
Datasets have names like USERS.SMITHG.R2V4.SOURCE.C(FOO). Such a name can be construed as designating the file named "FOO" residing within the directory named "USERS.SMITHG.R2V4.SOURCE.C".
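The interpretation of a dataset name as a directory plus a member name can be sketched in standard C as follows. The function name split_dataset_name() and its return convention are hypothetical, and the sketch handles only the simple "DIRECTORY(MEMBER)" form shown above.

```c
#include <string.h>

/* Hypothetical helper: splits a dataset name of the form "DIR(MEMBER)"
   into its directory and member parts.  Returns 1 on success, or 0 if
   the name lacks a "(MEMBER)" suffix or a part does not fit. */
int split_dataset_name(const char *name,
                       char *dir, size_t dirmax,
                       char *member, size_t memmax)
{
    const char *lparen = strchr(name, '(');
    const char *rparen;
    size_t dirlen, memlen;

    if (lparen == NULL)
        return 0;
    rparen = strchr(lparen, ')');
    if (rparen == NULL || rparen[1] != '\0')
        return 0;

    dirlen = (size_t) (lparen - name);
    memlen = (size_t) (rparen - lparen - 1);
    if (dirlen >= dirmax || memlen >= memmax)
        return 0;

    /* Copy each part and terminate it */
    memcpy(dir, name, dirlen);
    dir[dirlen] = '\0';
    memcpy(member, lparen + 1, memlen);
    member[memlen] = '\0';
    return 1;
}
```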
POSIX provides an opaque DIR type for directory searches, analogous to the FILE type for I/O streams, and a set of functions for searching directories.
The contents of a typical Unix <dirent.h> header file look something like the following (although features marked with an (E) are not defined by POSIX 1003.1-1998 and might not be present in some implementations):
    // <dirent.h>

    typedef void *  DIR;                // Opaque type

    struct dirent
    {
        unsigned long   d_fileno;       // Entry file number (E)
        unsigned short  d_reclen;       // Length of this struct (E)
        unsigned short  d_namlen;       // Length of d_name (E)
        char            d_name[MAXNAMLEN+1];    // Entry name
    };

    extern DIR *            opendir(const char *path);
    extern struct dirent *  readdir(DIR *dirp);
    extern long int         telldir(DIR *dirp);             // (E)
    extern void             seekdir(DIR *dirp, long loc);   // (E)
    extern void             rewinddir(DIR *dirp);
    extern int              closedir(DIR *dirp);
    extern int              dirfd(DIR *dirp);               // (E)
Some Unix implementations (e.g., BSD 4.4) used the structure tag direct instead of dirent prior to the standardization of POSIX.
The opendir() function initiates a directory search, creating a DIR object.
The DIR object is to directories what the FILE object is to files. This object contains directory search context information, and is passed to subsequent calls to the other directory searching functions, finally being destroyed by a call to closedir().
Section §B.5.1.2 of POSIX 1003.1-1988 describes the rationale for not providing the seekdir() and telldir() functions. It was felt that some existing file systems were not amenable to supporting such functions (i.e., where an integer position indicator did not fit some file system models), and that these functions were not really all that useful. Only the rewinddir() function was provided in POSIX, on the assumption that it was useful.
The POSIX function (which is typically implemented as a macro defined using the seekdir() function) does not return a value. However, it seems like a more orthogonal design to allow it to return a value indicating success or failure, in a fashion similar to fseek().
This is a minor change to POSIX, and should not affect any existing code.
For these reasons, this proposal suggests some, but not all, of the same functions defined by POSIX.
This approach requires more complicated programming logic in order to achieve a typical directory searching loop, because the first directory entry is handled separately from subsequent entries.
The simpler POSIX approach (initialize, then find) was chosen for this proposal. The proposed functions, however, can be written in terms of system-dependent functions that implement this other scheme (initialize and find first, then find next) fairly easily.
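The adaptation from one scheme to the other can be sketched as follows. To keep the example self-contained, the "find first, find next" interface is simulated by a toy backend over a fixed list of names; all of the names used here (toy_find_first(), toy_find_next(), xdir_open(), xdir_read()) are hypothetical and belong to neither Win32 nor this proposal. The point is only that the opening call performs the find-first operation and holds on to its result until the first read.

```c
#include <stddef.h>

/* Toy "find first / find next" backend over a fixed list of names.
   All names here are hypothetical stand-ins for a system interface. */
static const char *const toy_names[] = { "a.txt", "b.txt", "c.txt" };
enum { TOY_COUNT = 3 };

struct toy_find { size_t next; };

/* Starts a search and returns the first name, or NULL if there is none */
static const char *toy_find_first(struct toy_find *f)
{
    f->next = 0;
    return f->next < TOY_COUNT ? toy_names[f->next++] : NULL;
}

/* Returns the next name, or NULL when the search is exhausted */
static const char *toy_find_next(struct toy_find *f)
{
    return f->next < TOY_COUNT ? toy_names[f->next++] : NULL;
}

/* Search context in the "initialize, then find" style */
struct xdir
{
    struct toy_find find;
    const char *pending;    /* First entry, found while opening */
    int started;            /* Nonzero once pending has been handed out */
};

/* Initialize: performs the find-first call and holds on to its result */
void xdir_open(struct xdir *d)
{
    d->pending = toy_find_first(&d->find);
    d->started = 0;
}

/* Find: the first call returns the held entry, later calls find next */
const char *xdir_read(struct xdir *d)
{
    if (!d->started)
    {
        d->started = 1;
        return d->pending;
    }
    if (d->pending == NULL)
        return NULL;        /* The search was empty from the start */
    return toy_find_next(&d->find);
}
```

With this arrangement the caller's loop treats every entry alike, and the special handling of the first entry is confined to the adapter.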
    // <winbase.h>

    struct _WIN32_FIND_DATA
    {
        unsigned int        dwFileAttributes;
        struct _FILETIME    ftCreationTime;
        struct _FILETIME    ftLastAccessTime;
        struct _FILETIME    ftLastWriteTime;
        unsigned int        nFileSizeHigh;
        unsigned int        nFileSizeLow;
        char                cFileName[MAX_PATH];
        char                cAlternateFileName[14];
    };

    extern HANDLE   FindFirstFile(
                        const char *lpFileName,
                        struct _WIN32_FIND_DATA *lpFindFileData);
    extern bool     FindNextFile(
                        HANDLE hFindFile,
                        struct _WIN32_FIND_DATA *lpFindFileData);
The searching functions require the use of a wildcard filename, which specifies the subset of directory entry names to match.
Win32 also provides a second set of library functions for searching directories, presumably for backward compatibility with MS-DOS programs:
    // <io.h>

    struct _finddata_t
    {
        unsigned        attrib;
        time_t          time_create;    // -1 for FAT file systems
        time_t          time_access;    // -1 for FAT file systems
        time_t          time_write;
        unsigned long   size;
        char            name[260];
    };

    extern long     _findfirst(const char *name, struct _finddata_t *ctx);
    extern int      _findnext(long hdl, struct _finddata_t *ctx);
    extern int      _findclose(long hdl);
    // <direct.h>

    extern int      chdir(const char *dir);
    extern char *   getcwd(char *buf, int max);
    extern int      mkdir(const char *dir);     // Incompatible
    extern int      rmdir(const char *dir);
Also provided are wide character (wchar_t) versions of these functions:
    // <direct.h>

    extern int          _wchdir(const wchar_t *dir);
    extern wchar_t *    _wgetcwd(wchar_t *dir, int max);
    extern wchar_t *    _wgetdcwd(int dev, wchar_t *dir, int max);
    extern int          _wmkdir(const wchar_t *dir);
    extern int          _wrmdir(const wchar_t *dir);
Win32 also provides functions for retrieving and setting the current directory path and the current disk drive:
    // <direct.h>

    extern int              _chdrive(int dev);
    extern char *           _getdcwd(int dev, char *dir, int max);
    extern int              _getdrive(void);
    extern unsigned long    _getdrives(void);
    extern unsigned         _getdiskfree(unsigned dev, struct _diskfree_t *info);
The POSIX version of mkdir() takes a second argument in addition to the name of the directory to be created, specifying a permissions (or mode) bitmask. This bitmask, along with the executing program's user permissions mask (umask), defines the resulting access permissions for the newly created directory. Accepting such an argument is somewhat POSIX-centric, and there is no concept of access permissions defined in ISO C.
Other implementations support variants of the mkdir() function. For instance, Microsoft Visual C/C++ provides a function taking a single argument, and hence is incompatible with the POSIX version. This makes it difficult to design a single mkdir() function that is compatible across most implementations.
It is for these reasons that standardizing the POSIX mkdir() function is not a good idea, and that a more reasonable approach is to invent a new function (createdir()) that is simple and generic enough for all implementations.
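To illustrate how a single-argument function can hide the difference between the two mkdir() variants, the following is a minimal sketch of a wrapper. The name createdir_sketch() is hypothetical, as is the choice of return values (zero on success, negative on failure); the mode value 0777 on POSIX is an assumption that simply leaves the final permissions to the process umask.

```c
#ifdef _WIN32
#include <direct.h>         /* _mkdir(), one argument */
#else
#include <sys/types.h>
#include <sys/stat.h>       /* mkdir(), two arguments */
#endif

/* Hypothetical sketch of a createdir()-style wrapper.  Returns zero on
   success and a negative value on failure, leaving errno as set by the
   underlying call. */
int createdir_sketch(const char *dir)
{
#ifdef _WIN32
    return _mkdir(dir) == 0 ? 0 : -1;
#else
    /* Request all permission bits; the process umask trims them */
    return mkdir(dir, 0777) == 0 ? 0 : -1;
#endif
}
```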
Most implementations allow a directory to be removed only if it is empty, i.e., if it contains no file or subdirectory entries. The process attempting to remove a directory must also have the appropriate access permissions to do so.
BSD Unix (4.4) allows only the superuser (root) to delete a directory. This means that the standard remove() function does not delete an argument designating a directory for arbitrary users, even if they own the directory. This appears to be a problem with the BSD implementation, rather than a flaw in the specification of remove().
It might be desirable to add a function to delete a directory, to provide a complement to the operation of deleting a file.
[Note]
A possible definition for such a function could look like:
    #include <stddir.h>
    int removedir(const char *dir);
where the dir argument contains the name of a directory in some implementation-defined form.
On success, the directory is removed from the system and the function returns a positive value, otherwise the function modifies errno and returns a negative value.
It is implementation-defined whether or not the directory must be empty in order to be removed.
If such a function were added to ISO C, the semantics of the existing remove() function would probably need to be amended.
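On systems that already provide a directory removal function, the function suggested in this note could be a thin wrapper, roughly as sketched below. The name removedir_sketch() is hypothetical; the return values follow the description in the note (positive on success, negative on failure), and errno is whatever the underlying call leaves behind. Both POSIX rmdir() and the Microsoft _rmdir() require the directory to be empty.

```c
#ifdef _WIN32
#include <direct.h>         /* _rmdir() */
#else
#include <unistd.h>         /* rmdir() */
#endif

/* Hypothetical sketch of a removedir()-style wrapper.  Returns a
   positive value on success and a negative value on failure. */
int removedir_sketch(const char *dir)
{
#ifdef _WIN32
    return _rmdir(dir) == 0 ? 1 : -1;
#else
    return rmdir(dir) == 0 ? 1 : -1;
#endif
}
```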
Win32 provides the ability to specify a directory search for filenames with wildcards, acting as a filter to select a subset of directory entries whose names match the wildcard filename.
VMS provides the ability to specify a directory search for filenames using wildcards. This functionality is provided as system library calls as well as built-in procedures in the command line interface (CLI).
These common capabilities are reflected in the proposed matchfilename() function, which is defined in a very generic fashion.
However, there do not appear to be any implementations that provide a function for accessing this information. Indeed, it is very common for programs to assume some kind of explicitly hard-coded value to serve this purpose (e.g., POSIX uses "/").
A possible definition for such a function could look like:
int getrootdir(char *dir, size_t max);
It is not clear how useful such a function would be, or more importantly, how universal the function would be across implementations.
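For a POSIX-like system, where the root directory is always named "/", such a function would amount to little more than copying a constant, as the following sketch suggests. The name getrootdir_sketch() and the return convention (length of the name on success, negative on failure) are assumptions of this example; systems with several roots, such as one per drive or device, would need something more elaborate.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch for a POSIX-like system, where the root directory
   is always named "/".  Returns the length of the name on success, or a
   negative value if dir is null or the name does not fit in max bytes. */
int getrootdir_sketch(char *dir, size_t max)
{
    static const char root[] = "/";
    size_t len = strlen(root);

    if (dir == NULL || len >= max)
        return -1;
    memcpy(dir, root, len + 1);     /* Include the terminating null */
    return (int) len;
}
```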
Source Code |
Proof-of-concept source code is contained in these files:
Acknowledgements |
The author wishes to express his gratitude to those who provided comments, suggestions, and criticism on this proposal.
Further discussion can be found on the comp.std.c newsgroup, under the subject of "C0X: Directory access funcs".
References |
Revision History |
This document is in the public domain. Permission is granted to freely redistribute, copy, or reference this document.
This document: http://david.tribble.com/text/c0xdir.html.
Author's email:
david@tribble.com.
Author's home page:
http://david.tribble.com.