to: "Clive D.W. Feather"
cc: "ANSI/ISO C Committee"

======================= Cover sheet starts here ============
Document Number: WG14 N___/X3J11 __-___

C9X Revision Proposal
=====================

Title: 'const' String Literals
Author: David R. Tribble
Author Affiliation: (Self)
Postal Address: ********************
E-mail Address: dtribble@technologist.com, david.tribble@beasys.com, dtribble@flash.net
Telephone Number: +1 972 738 6125
Fax Number: +1 972 738 6111
Sponsor: ________________________________________
Date: 1997-07-23
Proposal Category:
  __ Editorial change/non-normative contribution
  __ Correction
  X_ New feature
  __ Addition to obsolescent feature list
  __ Addition to Future Directions
  __ Other (please specify) _____________________________
Area of Standard Affected:
  __ Environment
  X_ Language
  __ Preprocessor
  __ Library
     __ Macro/typedef/tag name
     __ Function
     __ Header
  __ Other (please specify) _____________________________
Prior Art: Several compilers already do this; some actually place string literals in read-only data segments. C++ also defines string literals as 'const' arrays.
Target Audience: All C programmers.
Related Documents (if any): (None)
Proposal Attached: X_ Yes __ No, but what's your interest?
Abstract: Add the constraint that string literals are 'const' objects.
======================= Cover sheet ends here ==============

Proposal:
Under the current standard, string literals are defined as having type 'char[N+1]', where N is the number of characters specified in the string literal. This proposal modifies that definition so that string literals are of type 'const char[N+1]'.

The same would also hold for wide string literals, i.e., a wide string literal would be defined as having type 'const wchar_t[N+1]'.

Section [6.1.4] of the standard would be amended to read:
...
    In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals.[24] The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type 'const char', and are initialized with the individual bytes of the multibyte character sequence; for wide string literals, the array elements have type 'const wchar_t', and are initialized with the sequence of wide characters corresponding to the multibyte character sequence.

    Identical string literals of either form need not be distinct. If the program attempts to modify a string literal of either form, the behavior is undefined. [This paragraph is unchanged.]

Rationale:
String literals ought to be considered non-modifiable objects; treating them as non-const (mutable) objects invites disaster. Compilers that enforce the const-ness of string literals would help catch subtle bugs that doubtless exist in programs today.

Library Changes:
No changes are necessary to the standard library, since all of the standard functions that can be passed a string literal as an unalterable parameter are defined as taking a 'const char *' or 'const void *' parameter.

(It is true, however, that some existing programs play fast and loose by passing string literals as the *modifiable* arguments of standard library functions. See the next section for more discussion.)

Effects on Existing Code:
The new 'const' string constraint cannot be considered a "silent change". The constraint has the potential to break existing programs. Specifically, existing programs may be violating the new constraint in these ways:

1) String literals being assigned to non-const pointers.

   This will be detected by quality compilers as an attempt to assign a pointer to 'const char' to a pointer to non-'const' 'char'.
   Attempts to modify the contents of such literals through the non-'const' pointers are already regarded as "undefined behavior". This can be fixed by:

   A) changing the pointer variables to be 'const' pointers (which may bring other erroneous const/non-const assignments to light in a cascade of errors), or

   B) adding a typecast to the string literal to remove its const-ness (which should, at worst, result in only a warning from the compiler).

2) String literals being passed to functions taking a 'char *' or 'char[]' parameter rather than a 'const char *' or 'const char[]' parameter.

   This will not be detected by the compiler if the function has no prototype in scope at the time of the call. It will be detected, though, if a function prototype is in scope, causing a warning or error to be issued by quality compilers.

   Functions that actually modify string literals passed through their non-'const' pointer parameters already exhibit "undefined behavior". Since the arguments to a prototyped function are converted as if by assignment, this problem can be fixed in the same ways that (1) is fixed.

Programs that pass literals to the standard library functions as modifiable parameters are already in the territory of "undefined behavior". Unfortunately, nothing in the existing standard requires such practices to result in a compiler diagnostic.

Sections [6.3.16.1] and [6.5.3] of the Standard cover assignment compatibility between 'const' and non-'const' types. Section [6.3.2.2] covers function call expressions.

Language Compatibility:
The C++ language already imposes the constraint that string literals are 'const' objects. (See the C++ committee draft, section [2.13.4].)

Some existing C compilers already treat string literals as 'const' data. Some even place string literal data in read-only memory areas; attempts to modify the contents of string literals result in memory access exceptions (a.k.a. segmentation violations or bus errors).
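The two fixes described under "Effects on Existing Code" can be sketched in C as follows. This is an illustration only, not part of the proposal: 'greeting', 'count_vowels', 'legacy_length' and 'call_legacy' are hypothetical names, and the code compiles under the current rules (where a literal has type 'char[N+1]'), so under this proposal the cast in 'call_legacy' is what would remove the 'const' qualifier. The 'sizeof' line shows the N+1 array length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A literal of N characters is an array of N+1 elements: the extra
   element is the terminating zero byte appended in phase 7. */
enum { HELLO_SIZE = sizeof "hello" };   /* 5 characters + 1 = 6 */

/* Fix (A): point at the literal through a pointer to const char.
   Writing through 'greeting' is then a constraint violation that the
   compiler must diagnose, rather than silent undefined behavior. */
static const char *greeting = "hello";

/* A function that only reads its argument should take 'const char *',
   so passing a literal stays valid if literals become const.
   (count_vowels is a hypothetical example, not a library function.) */
static size_t count_vowels(const char *s)
{
    size_t n = 0;
    assert(s != NULL);                  /* precondition: valid string */
    for (; *s != '\0'; s++)
        if (strchr("aeiouAEIOU", *s) != NULL)
            n++;
    return n;
}

/* A legacy interface declared with 'char *' that never writes through
   its parameter (hypothetical). */
static size_t legacy_length(char *s)
{
    return strlen(s);
}

/* Fix (B): cast the literal to 'char *' at the call site. Under the
   current standard the cast is a no-op; under this proposal it would
   remove the const qualifier. It is safe only because legacy_length
   does not modify the characters. */
static size_t call_legacy(void)
{
    return legacy_length((char *)"hello");
}
```

For example, 'count_vowels(greeting)' returns 2 and 'call_legacy()' returns 5, while an assignment to greeting[0] would be rejected because 'greeting' points to 'const char'. Fix (A) is preferable where it is practical; fix (B) only silences the diagnostic and leaves the caller responsible for never writing to the literal.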
Comments:
Since this proposal would break some existing programs, it may be considered too drastic a change and thus unacceptable at this point in time.

An alternative is to add the 'const' constraint as a "Future Directions" item in the Standard (specifically, the practice of treating string literals as modifiable objects would become an obsolescent language feature); programs could then be migrated toward the safer 'const' practice over time. String literals would eventually be defined as truly 'const' objects in a later edition of the Standard.

On the other hand, the programs that this proposal would break are currently considered as exhibiting "undefined behavior". Tightening up this one loophole in the language would push those programs over the fence and into "constraint violation" territory.

This proposal also has some weight behind it in that some compiler vendors already enforce it.

Future Considerations:
(See the previous section.)

References:
ANSI/ISO C Standard, 9899-1990, section [6.1.4].
---, section [6.3.16.1].
---, section [6.5.3].
---, section [6.3.2.2].
ANSI/ISO C++ committee draft 2 (CD2, 1997-01-09), section [2.13.4].

====================== End of Proposal =====================