From: dewar@merv.cs.nyu.edu (Robert Dewar)
Subject: Re: Ada and UNICODE?
Date: 1998/05/20
References: <355CA32B.7B77@erols.com> <35622857.77912B4@cl.cam.ac.uk>
Organization: New York University
Newsgroups: comp.lang.ada

Markus said

<<... JIS conversion tables on ftp.unicode.org in order to provide a conforming implementation. UTF-8 instead of EUC and Shift-JIS is clearly the right encoding to use here.
>>

A common misconception is that the reference manual has something to say about the representation of source programs. That is ENTIRELY wrong: the standard has nothing whatsoever to say about the representation of source programs. So the claim that *any* program representation method violates the standard is simply wrong-at-the-start.

When I chaired the CRG (which is the group attached to ISO WG9 that decided on these matters for Ada 9X), we found constant confusion on this issue.

There is a requirement that any Ada 95 compiler have *some* representation for all possible programs. Incomplete representations like EUC and Shift-JIS, though exactly what a lot of users want, clearly do not meet this requirement. So a compiler that had ONLY these methods would be non-compliant. However, GNAT supports a number of different encoding methods, and in particular the "brackets" notation (which is used, for example, in the distribution format of the ACVC tests) is complete and is supported.
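
To make "complete" concrete, here is a rough sketch of a brackets-style encoding in Python. It assumes the four-hex-digit form ["hhhh"] for 16-bit wide characters; the function names are made up for illustration and this is not GNAT's actual code. The point is only that every 16-bit character gets a pure-ASCII spelling, which a double-byte encoding like Shift-JIS cannot offer for characters outside its repertoire.

```python
import re

# Assumed shape of the notation: ["hhhh"] with exactly four hex digits.
_BRACKETS = re.compile(r'\["([0-9A-Fa-f]{4})"\]')


def encode_brackets(text: str) -> str:
    """Spell every non-ASCII 16-bit character as ["hhhh"] (hypothetical helper)."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            out.append(ch)  # plain ASCII passes through unchanged
        elif code <= 0xFFFF:
            out.append('["%04X"]' % code)
        else:
            raise ValueError("outside the 16-bit range covered by this sketch")
    return "".join(out)


def decode_brackets(text: str) -> str:
    """Turn each ["hhhh"] sequence back into the character it names."""
    return _BRACKETS.sub(lambda m: chr(int(m.group(1), 16)), text)
```

With these definitions, encode_brackets("\u03c0 := 3") gives '["03C0"] := 3', and decode_brackets reverses it.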

Just to emphasize how little the standard specifies here, an implementation that used B to represent the character A, and A to represent B, would be highly annoying, but would not violate the standard. In fact this freedom is completely intentional. For example, it is expected that a compiler for Ada 95 on an IBM mainframe might accept *only* EBCDIC input, since such a decision would make perfect sense in that environment.
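
The same point can be shown in a few lines of Python: the external representation is just a reversible mapping onto the abstract characters of the program. The sketch below uses Python's cp037 codec as a stand-in for "EBCDIC" (one code page among several, chosen here as an assumption), and the A/B swap table is purely illustrative.

```python
# Same abstract program text, two external representations.
source = "BEGIN"

ascii_bytes = source.encode("ascii")   # what an ASCII-based system stores
ebcdic_bytes = source.encode("cp037")  # what an EBCDIC system (cp037 assumed) stores

# The bytes differ, but each decodes to the identical character sequence.
same_program = ascii_bytes.decode("ascii") == ebcdic_bytes.decode("cp037")

# The annoying-but-legal representation: B stands for A and A stands for B.
SWAP = str.maketrans("AB", "BA")


def to_external(text: str) -> str:
    """Write program text in the swapped representation (illustrative only)."""
    return text.translate(SWAP)


def from_external(text: str) -> str:
    """Read the swapped representation back; the mapping is its own inverse."""
    return text.translate(SWAP)
```

Here to_external("ADA BODY") yields "BDB AODY", and from_external recovers the original text, so nothing about the program itself has changed.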