Subject: Bug Report for ANSI C
Date: Mon, 20 Jan 1997 20:34:01 -0600
From: David R Tribble
To: rex@aussie.com
CC: dtribble@flash.net

Defect Report, International Standard ISO/IEC 9899:1990,
Programming Language C.
---
Author:  David R. Tribble
Subject: Plain char as signed
Phone:   +1-972-964-1720, home, 23:00-05:00 GMT
         +1-972-738-6125, work, 15:00-00:00 GMT
email:   david.tribble@beasys.central.com, or dtribble@flash.net

There appears to be a problem when an implementation is allowed to
treat 'plain char' as 'signed char'.

Consider the following program fragment:

    char    c;
    int     i;

    setlocale(LC_CTYPE, "ISO-8859-1");  /* Latin-1 ASCII, 8-bit */

    c = '\xFF';                 /* Latin-1 lowercase 'y' with umlaut */
    if (isprint(c)) ...         /* [1], Should be true */

    i = c;                      /* Convert 'c' to int */
    if (isprint(i)) ...         /* [2], Should be true */

    if (isprint('\xFF')) ...    /* [3], Should be true */

    i = EOF;
    if (isprint(i)) ...         /* [4], Should be false */

Assume that the implementation supports the locale named "ISO-8859-1"
and that it is the ISO-8859-1 character set (also known as 8-bit
Latin-1 ASCII). Also assume that the implementation considers 'plain
char' type to be 'signed char'. Also assume that 'EOF' is implemented
as a value equal to -1.

The isprint() expression in statement [1], if properly implemented,
should return a value of 'true', since the Latin-1 character code
'\xFF' is a printable character. Note that 'c' is sign-extended into
the value -1 (at least on two's-complement machines). Similarly,
expression [2] should return 'true'.

The isprint() expression in statement [3] should also return a value
of 'true'. Note that the integer constant '\xFF' will also be
sign-extended into the value -1.

However, the isprint() expression in statement [4] should return a
value of 'false' since EOF is not a printable character (in any
locale).

This appears to pose a contradiction, specifically, that
isprint('\xFF') (i.e., isprint(-1)) is true in the appropriate locale
while at the same time isprint(EOF) (i.e., isprint(-1)) is false.

Suggested Remedy:

This scenario seems to imply that it would be more correct to define
'plain char' as implicitly 'unsigned char', rather than leaving it as
an implementation choice between 'signed' and 'unsigned'. Thus, signed
characters would have to be explicitly declared as 'signed char'.

Is this a valid argument, or is there some work-around to the problem?
(Note that casting the argument to unsigned char is not acceptable,
since it causes the value EOF to appear as a printable character.)

References:

[6.1.2.5]
    ... An object declared as type char is large enough to store any
    member of the basic execution character set. If a member of the
    required source character set enumerated in 5.2.1 is stored in a
    char object, its value is guaranteed to be positive. If other
    quantities are stored in a char object, the behavior is
    implementation-defined; the values are treated as either signed or
    nonnegative integers.

This allows an implementation to treat 'plain char' as 'signed char'.
Characters outside the standard English character set, such as
'extended' Latin-1 ASCII characters, are explicitly
'implementation-defined' values.

[6.1.3.4]
    ... If an integer character constant contains a single character
    or escape sequence, its value is the one that results when an
    object with type char whose value is that of the single character
    or escape sequence is converted to type int.

I take this to mean that an implementation that treats 'char' as
'signed char' will sign-extend the constant '\xFF' into the int value
-1 (at least on two's-complement CPUs).

[7.3]
    ... In all cases the argument is an int, the value of which shall
    be representable as an unsigned char or shall be equal to the
    value of the macro EOF. If the argument has any other value, the
    behavior is undefined.

I take this to mean that only the values '\x00' through '\xFF'
(assuming an 8-bit implementation of char) and EOF can be passed as an
argument to isprint() et al, all other values being undefined.

[7.3]
    ... The behavior of these functions is affected by the current
    locale.

I take this to mean that isprint() et al will operate "as expected" if
the implementation properly implements the locale "ISO-8859-1" (or
some other suitable name) following an appropriate call to
setlocale().

[7.3]
    ... The term 'printing character' refers to a member of an
    implementation-defined set of characters, each of which occupies
    one printing position on a display device; ...

[7.4.1.1]
    The setlocale() function ...

---