From: dewar@merv.cs.nyu.edu (Robert Dewar)
Subject: Re: UNICODE - non-Asian
Date: 1998/05/23
References: <35625647.1E85@erols.com>
Organization: New York University
Newsgroups: comp.lang.ada

Robert Eachus quotes from the standard:

  ISO/IEC 8859-1:1998   Latin 1 (Yes, that is 1998!)
  ISO/IEC 8859-2:1987   Latin 2
  ISO/IEC 8859-3:1988   Latin 3
  ISO/IEC 8859-4:1988   Latin 4
  ISO/IEC 8859-5:1988   Latin/Cyrillic
  ISO/IEC 8859-6:1987   Latin/Arabic
  ISO/IEC 8859-7:1987   Latin/Greek
  ISO/IEC 8859-8:1988   Latin/Hebrew
  ISO/IEC 8859-9:1989   Latin 5
  ISO/IEC 8859-10:1992  Latin 6

Note that in practice an Ada compiler that supports Latin-1 can be used perfectly well with any of these parts of the standard. If, in response to some input request, you type Latin/Arabic text as 8-bit codes, it gets stored internally as Latin-1 gobbledygook. But since you write your character and string literals with the same 8-bit codes, everything matches and works fine.

There are only two problems in practice:

1. The package Ada.Characters.Latin_1 is of limited use; for example, its idea of what a letter is is not useful. Of course you can write your own, or perhaps your vendor will supply an analogous package.

2. You can't use everything you think of as a letter in identifiers, and upper/lower case equivalence may be peculiar (for example, two "letters" that are quite distinct to you may be treated as the same in identifiers).
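A minimal sketch of the two points above, written in Python for brevity rather than Ada and using Python's standard iso8859_5, iso8859_6 and latin-1 codecs. It assumes a program that stores each input byte as the Latin-1 character with the same code, which is in effect what a Latin-1 compiler and runtime do. The first half shows Latin/Arabic text surviving the round trip and comparing equal to a literal written with the same codes; the second half shows two distinct Cyrillic letters (ISO 8859-5 codes 0xD1 and 0xF1) that Latin-1 case equivalence treats as the same letter, because in Latin-1 those codes are the case pair N-tilde.

```python
import unicodedata

# Arabic "salam" (U+0633 U+0644 U+0627 U+0645) typed on an ISO 8859-6 terminal.
arabic_text = "\u0633\u0644\u0627\u0645"
typed_bytes = arabic_text.encode("iso8859_6")

# The Latin-1 program stores each byte as the Latin-1 character with that code.
stored = typed_bytes.decode("latin-1")

# A literal written in the source with the same 8-bit codes is the same
# "gobbledygook", so comparisons succeed and output round-trips exactly.
literal = "\u0633\u0644\u0627\u0645".encode("iso8859_6").decode("latin-1")
print(stored == literal)                                            # True
print(stored.encode("latin-1").decode("iso8859_6") == arabic_text)  # True

# Case equivalence is where it goes wrong. In ISO 8859-5, 0xD1 and 0xF1 are two
# different Cyrillic letters, but in Latin-1 they are upper/lower N-tilde.
be, io = bytes([0xD1]), bytes([0xF1])
print(unicodedata.name(be.decode("iso8859_5")))  # CYRILLIC SMALL LETTER BE
print(unicodedata.name(io.decode("iso8859_5")))  # CYRILLIC SMALL LETTER IO
print(be.decode("latin-1").upper() == io.decode("latin-1").upper())  # True
```

So an identifier containing Cyrillic "be" and one containing "io" in the same position would be considered the same identifier by a compiler applying Latin-1 case rules.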
It may be that the vendor supplies non-standard modes in which codes other than Latin-1 are recognized for identifiers, in which case you can write (potentially non-portable) code taking advantage of this. In the absence of such non-standard modes, or if you are concerned about writing portable code, you can simply stick to the lower half of the ISO definition, which is plain ASCII and identical in all parts of ISO 8859.

In GNAT, we have not bothered to provide alternatives to the Latin_1 packages in the runtime. No one, not even a user of the public version, has ever suggested that they wanted this, so the demand is close to zero. We do provide non-standard modes for identifiers:

  1  Latin-1 identifiers
  2  Latin-2 letters allowed in identifiers
  3  Latin-3 letters allowed in identifiers
  4  Latin-4 letters allowed in identifiers
  p  IBM PC letters (code page 437) allowed in identifiers
  8  IBM PC letters (code page 850) allowed in identifiers
  f  Full upper-half codes allowed in identifiers
  n  No upper-half codes allowed in identifiers
  w  Wide-character codes allowed in identifiers

I put in the Latin-1/2/3/4 modes one day when I had nothing else I felt like doing. I doubt that anything other than Latin-1 has ever been used. I also put in the code page 437 PC tables. A user commented that code page 850 would be useful in Europe and supplied the tables, so I put that in too, but I don't know whether either has been used. The full upper-half option is useful in China and has been used there at least once. The no-upper-half option is useful for ensuring portability. The wide-character option is useful in Japan and has been used at least a little there.

If anyone wants to supply additional tables for identifiers (see csets.adb in the GNAT compiler sources), or additional alternative packages for Ada.Characters.Latin_1, we could certainly include them.
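The identifier modes listed earlier boil down to a table per character set saying which upper-half codes count as letters and how each one folds to upper case for the case-insensitive comparison of identifiers. The following Python sketch illustrates that idea for three of the modes (n, 1 and f) only. Everything in it is hypothetical: the function names, building the Latin-1 table from Unicode data, and the choice not to case-fold upper-half codes in the full upper-half mode are assumptions for illustration, not a description of csets.adb.

```python
import unicodedata

def latin1_upper_table():
    """Map each Latin-1 upper-half identifier letter to its upper-case code.

    Hypothetical: built from Unicode data, not copied from GNAT's csets.adb.
    Covers codes 0xC0..0xFF that are letters (all except 0xD7 and 0xF7).
    """
    table = {}
    for code in range(0xC0, 0x100):
        ch = chr(code)
        if unicodedata.category(ch).startswith("L"):
            up = ch.upper()
            # Sharp s and y-diaeresis have no single upper-case form in Latin-1.
            fits = len(up) == 1 and ord(up) < 0x100
            table[code] = ord(up) if fits else code
    return table

LATIN1_UPPER = latin1_upper_table()

def is_identifier_byte(code, mode):
    """May byte `code` appear in an identifier under `mode` ("n", "1" or "f")?"""
    if code < 0x80:
        return chr(code).isalnum() or code == ord("_")
    if mode == "n":          # no upper-half codes: the portable subset
        return False
    if mode == "1":          # Latin-1 letters only
        return code in LATIN1_UPPER
    if mode == "f":          # any upper-half code is accepted
        return True
    raise ValueError("mode not covered by this sketch: " + mode)

def upper_byte(code, mode):
    """Case-fold one identifier byte (assumed: no folding in mode "f")."""
    if code < 0x80:
        return ord(chr(code).upper())
    if mode == "1":
        return LATIN1_UPPER.get(code, code)
    return code

def same_identifier(a, b, mode):
    """Case-insensitive identifier equality on raw bytes."""
    return len(a) == len(b) and all(
        upper_byte(x, mode) == upper_byte(y, mode) for x, y in zip(a, b))

name = "Gr\u00f6\u00dfe".encode("latin-1")              # German "Groesse"
upper_name = "GR\u00d6\u00dfE".encode("latin-1")
print(all(is_identifier_byte(c, "1") for c in name))  # True
print(all(is_identifier_byte(c, "n") for c in name))  # False
print(same_identifier(name, upper_name, "1"))         # True
```

Supporting another character set in this scheme would just mean supplying another table, which is why extra modes are cheap to add once someone provides the data.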
I don't think such additions are the most urgent missing feature in GNAT :-)

By the way, I want to report that Markus Kuhn supplied the information and a start on the code for recognizing UTF-8 in GNAT, and I have just completed that code, so GNAT will now fully support UTF-8. Thanks, Markus, for this contribution!

Robert Dewar
Ada Core Technologies