To: Deborah Donovan
From: David R Tribble on Wed, Jan 21, 1998 12:22 PM
Subject: Comments on ISO/IEC 9899 (C9X) draft

Message-Id: <2.2.32.19980121170940.00f01fac@central.beasys.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Wed, 21 Jan 1998 11:09:40 -0600
To: ddonovan@itic.nw.dc.us
From: David R Tribble
Subject: Comments on ISO/IEC 9899 (C9X) draft

Public Comment Number(s)  PC-____ to PC-____

ISO/IEC CD 9899 (SC22N2620) Public Comment
===========================================

Date:                1998-01-21
Author:              David R. Tribble
Author Affiliation:  Self
Postal Address:      6004 Cave River Dr.
                     Plano, TX 75093-6951
                     USA
E-mail Address:      dtribble@technologist.com
                     david.tribble@central.beasys.com
                     dtribble@flash.net
Telephone Number:    +1 972 738 6125, 16:00-00:00 UTC
                     +1 972 964 1729, 01:00-05:00 UTC
Fax Number:          +1 972 738 6111

Number of individual comments:  2

------------------------------------------------------------------------

Comment 2.  Category: Request for clarification
Committee Draft subsection: 5.1.1.2, 5.2.1, 6.1.2
Title: Source characters not allowed as UCNs

Detailed description:

Section 5.1.1.2 states that UCN codes representing characters in the
source character set are not allowed within the source text.  For
example, the following fragment is illegal:

    int func(int i)
    {
        return \u0030;          // \u0030 is '0'
    }

    int bar(int \u006A)         // \u006A is 'j'
    {
        return \u006A + 1;
    }

But this fragment is legal:

    int foo(int \u00E1)         // \u00E1 is 'a'+accent
    {
        return \u00E1 * 2;
    }

There is little difference between these fragments.  What is the
reason for the limitation on valid UCN codes?

Conceivably, a Unicode text editor might store all of the characters
in a file as UCN sequences for maximum portability.  Allowing most
characters to be written as UCNs, but requiring a few characters to be
written strictly as 7-bit ISO-646 characters, seems like an artificial
restriction.

A C compiler implementation could choose to convert all source
characters into 16-bit (or even 32-bit) codes internally, preferring
to convert UCNs into single internal codes as they are read.  Why
should it be prevented from accepting every alphanumeric ISO-10646
character, instead of every alphanumeric character /except/ 'a'-'z'
et al.?
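For illustration only, such an implementation's low-level input
routine might fold UCNs into single codes as it reads them.  The
following is just a sketch; the function name read_source_char and
its error convention are invented here, not taken from the draft:

    #include <stdio.h>

    /* Sketch: read one source character, folding \uXXXX and
     * \UXXXXXXXX UCNs into single 32-bit internal codes.
     */
    static long read_source_char(FILE *src)
    {
        int     ch;
        int     ndigits;
        long    code;

        ch = getc(src);
        if (ch != '\\')
            return ch;              // Ordinary character, or EOF

        ch = getc(src);
        if (ch == 'u')
            ndigits = 4;            // \uXXXX
        else if (ch == 'U')
            ndigits = 8;            // \UXXXXXXXX
        else
        {
            ungetc(ch, src);        // Not a UCN; keep the backslash
            return '\\';
        }

        code = 0;
        while (ndigits-- > 0)
        {
            int     hex;

            hex = getc(src);        // Accumulate one hex digit
            if (hex >= '0' && hex <= '9')
                code = code*16 + (hex - '0');
            else if (hex >= 'a' && hex <= 'f')
                code = code*16 + (hex - 'a' + 10);
            else if (hex >= 'A' && hex <= 'F')
                code = code*16 + (hex - 'A' + 10);
            else
                return -1;          // Malformed UCN
        }
        return code;                // A single ISO-10646 code
    }

Under such a scheme, '\u0030' and '0' decode to the same internal
code, so the two spellings would be indistinguishable to the rest of
the translator.

------------------------------------------------------------------------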