* UNICODE - non-Asian
From: William A Whitaker @ 1998-05-20 0:00 UTC

If I can re-pulse the group on the original question.

My main problem is Greek and Hebrew, not Japanese. I agree that the Japanese are not likely to favor UNICODE.

I get the impression that there is no applicable experience here. If I am wrong, please tell me.

Whitaker

* Re: UNICODE - non-Asian
From: Robert Dewar @ 1998-05-20 0:00 UTC

Bill, you asked:

<<If I can re-pulse the group on the original question. My main problem is Greek and Hebrew, not Japanese. I agree that the Japanese are not likely to favor UNICODE. I get the impression that there is no applicable experience here. If I am wrong, please tell me.
>>

Greek uses an 8-bit code. It is one of the family of 8-bit codes of which Latin-1 is an example. Generally any compiler will support use in a Greek environment without much fiddling.

In addition, GNAT provides the option of using Latin-1/Latin-2/Latin-3/Latin-4 as well as the IBM PC set (both code pages 437 and 850) for identifiers. This is a non-standard feature (although this kind of non-standard capability is very much anticipated by 3.5.2(4):

                        Implementation Permissions

4   In a nonstandard mode, an implementation may provide other interpretations for the predefined types Character and Wide_Character, to conform to local conventions.

)

The main effect of selecting one of these options in GNAT (they are fully documented in the GNAT documentation) is that you get proper recognition of the full set of "letters" with proper upper/lower case equivalence.

I am not sure what encodings are standard for Hebrew, someone here should know. I am a little surprised if we don't support it already, since we have a number of customers in Israel, and this subject has not come up :-)

* Re: UNICODE - non-Asian
From: Robert I. Eachus @ 1998-05-22 0:00 UTC

In article <dewar.895708219@merv> dewar@merv.cs.nyu.edu (Robert Dewar) writes:

> I am not sure what encodings are standard for Hebrew, someone here should
> know. I am a little surprised if we don't support it already, since we
> have a number of customers in Israel, and this subject has not come up :-)

Latin/Hebrew ISO/IEC 8859-8 of course. There are ten defined sub-parts of 8859. Latin-1 is the first, covering almost all of Western Europe, while Latin-2 is used by some central European countries, etc. There are four sets which cover other scripts: Latin/Cyrillic, Latin/Greek, Latin/Arabic, and Latin/Hebrew.

The "parts" of 8859 and their names:

    ISO/IEC 8859-1:1998    Latin 1 (Yes, that is 1998!)
    ISO/IEC 8859-2:1987    Latin 2
    ISO/IEC 8859-3:1988    Latin 3
    ISO/IEC 8859-4:1988    Latin 4
    ISO/IEC 8859-5:1988    Latin/Cyrillic
    ISO/IEC 8859-6:1987    Latin/Arabic
    ISO/IEC 8859-7:1987    Latin/Greek
    ISO/IEC 8859-8:1988    Latin/Hebrew
    ISO/IEC 8859-9:1989    Latin 5
    ISO/IEC 8859-10:1992   Latin 6

Currently, there are three new parts under DIS balloting: 13, 14, and 15. They are Latin 7, Latin 8 (Celtic), and Latin 0, respectively. Parts 2 through 10 are also being revised, but I think that these revisions, and the recent revision to Latin-1, were to bring the documents up to date without significant changes.

Incidentally, ISO 10646-1:1993 (Basic Multilingual Plane/Unicode) now has two corrigenda and 19 amendments. Aren't standards wonderful!

Trivia question: What English letters were removed from Latin-1 in the original 1987 version?

Real Trivia question: What English letter is not in Unicode?

-- 
Robert I. Eachus

with Standard_Disclaimer;
use  Standard_Disclaimer;
function Message (Text: in Clever_Ideas) return Better_Ideas is...

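As an illustration of how the parts of 8859 listed above relate to one another, here is a minimal Python sketch. It is not from the thread: it assumes Python's standard codecs `latin-1`, `iso8859_7` and `iso8859_8` as stand-ins for parts 1, 7 and 8, and the names `PARTS` and `decode_in_parts` are made up for the example. It decodes one upper-half byte under each part and checks that the printable lower half agrees across them.

```python
import unicodedata

# Illustrative sketch: one byte value, three parts of ISO/IEC 8859.
# The codec names are the ones shipped with Python's standard library;
# PARTS and decode_in_parts are hypothetical names for this example.
PARTS = {
    "8859-1 Latin 1": "latin-1",
    "8859-7 Latin/Greek": "iso8859_7",
    "8859-8 Latin/Hebrew": "iso8859_8",
}


def decode_in_parts(raw: bytes) -> dict:
    """Decode the same bytes under each part listed in PARTS."""
    return {name: raw.decode(codec) for name, codec in PARTS.items()}


if __name__ == "__main__":
    # An upper-half byte means something different in every part.
    for name, text in decode_in_parts(b"\xe1").items():
        print(f"{name:20} 0xE1 -> {unicodedata.name(text)}")

    # The printable lower half (0x20..0x7E) is plain ASCII in all three.
    lower_half = bytes(range(0x20, 0x7F))
    print(len(set(decode_in_parts(lower_half).values())) == 1)
```

The character names are printed rather than the characters themselves so that the output does not depend on the terminal's own encoding.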
* Re: UNICODE - non-Asian
From: Markus Kuhn @ 1998-05-22 0:00 UTC

Robert I. Eachus wrote:
> Trivia question: What English letters were removed from Latin-1 in
> the original 1987 version?

The oe and OE ligature (where we have now × and ÷)?

> Real Trivia question: What English letter is not in Unicode?

The copyleft sign?

Markus

-- 
Markus G. Kuhn, Security Group, Computer Lab, Cambridge University, UK
email: mkuhn at acm.org, home page: <http://www.cl.cam.ac.uk/~mgk25/>

* Re: UNICODE - non-Asian
From: Samuel Tardieu @ 1998-05-25 0:00 UTC

>>>>> "Markus" == Markus Kuhn <Markus.Kuhn@cl.cam.ac.uk> writes:

Markus> The oe and OE ligature (where we have now × and ÷)?

Fortunately, these ligatures (well, not everyone agrees that they are ligatures in French) will be included in the forthcoming Latin-0.

  Sam
-- 
Samuel Tardieu -- sam@ada.eu.org

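The point about the ligatures can be checked with a short Python sketch. This is an illustration only, not something posted in the thread: it assumes Python's `iso8859_15` codec corresponds to the part 15 that the thread calls Latin 0, and `can_encode` is a hypothetical helper name.

```python
# Illustrative sketch: which 8-bit sets can encode the oe/OE ligatures.
# "iso8859_15" is assumed to be the codec for part 15 (Latin 0 in the
# thread); can_encode is a hypothetical helper name.
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    ligatures = "\u0152\u0153"  # LATIN CAPITAL/SMALL LIGATURE OE
    print(can_encode(ligatures, "latin-1"))     # False
    print(can_encode(ligatures, "iso8859_15"))  # True
```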
* Re: UNICODE - non-Asian
From: Robert I. Eachus @ 1998-05-26 0:00 UTC

In article <356603AC.79DB5014@cl.cam.ac.uk> Markus Kuhn <Markus.Kuhn@cl.cam.ac.uk> writes:

> The oe and OE ligature (where we have now × and ÷)?

Yep!

> Real Trivia question: What English letter is not in Unicode?
> The copyleft sign?

No, oe with diaeresis (two dots) over the o. It appears very rarely, but in one case, a village in Brittany, it appears with the O capitalized and the e lower case. (When AE or OE appear as the first letter in a capitalized English word, it is always the case that both are capitalized.)

-- 
Robert I. Eachus

with Standard_Disclaimer;
use  Standard_Disclaimer;
function Message (Text: in Clever_Ideas) return Better_Ideas is...

* Re: UNICODE - non-Asian
From: Robert Dewar @ 1998-05-23 0:00 UTC

Robert Eachus quotes from the standard

    ISO/IEC 8859-1:1998    Latin 1 (Yes, that is 1998!)
    ISO/IEC 8859-2:1987    Latin 2
    ISO/IEC 8859-3:1988    Latin 3
    ISO/IEC 8859-4:1988    Latin 4
    ISO/IEC 8859-5:1988    Latin/Cyrillic
    ISO/IEC 8859-6:1987    Latin/Arabic
    ISO/IEC 8859-7:1987    Latin/Greek
    ISO/IEC 8859-8:1988    Latin/Hebrew
    ISO/IEC 8859-9:1989    Latin 5
    ISO/IEC 8859-10:1992   Latin 6

Note that in practice an Ada compiler that supports Latin-1 can be used perfectly well for any of these subparts of the standard. In response to some input command you type in Latin/Arabic as 8-bit codes, and it gets stored internally as some gobbledygook Latin-1 stuff. But since you write your character and string literals with the same translation, everything is fine.

There are only two problems in practice:

  The package Ada.Characters.Latin_1 is of limited use, e.g. its idea of what counts as a letter is not useful. Of course you can write your own, or perhaps your vendor will supply an analogous package.

  You can't use everything you think are letters in identifiers, and upper/lower case equivalence may be peculiar (for example, it may make two "letters" that are quite distinct to you be treated as the same in identifiers).

It may be that the vendor supplies non-standard modes in which codes other than Latin-1 are recognized for identifiers, in which case you can write (potentially non-portable) code taking advantage of this. In the absence of such special non-standard modes, or if you are concerned about writing portable code, you can simply stick to the lower half of the ISO definition, which is the same in most parts.

In GNAT, we have not bothered to provide alternatives to the Latin_1 packages in the runtime. No one, not even a user of the public version, has ever suggested that they wanted this, so the demand is close to zero.

We do provide non-standard modes for identifiers:

    1    Latin-1 identifiers
    2    Latin-2 letters allowed in identifiers
    3    Latin-3 letters allowed in identifiers
    4    Latin-4 letters allowed in identifiers
    p    IBM PC letters (code page 437) allowed in identifiers
    8    IBM PC letters (code page 850) allowed in identifiers
    f    Full upper-half codes allowed in identifiers
    n    No upper-half codes allowed in identifiers
    w    Wide-character codes allowed in identifiers

I put in the Latin-1/2/3/4 modes one day when I had nothing else I felt like doing. I doubt that any other than Latin-1 has ever been used. I also put in the page 437 PC stuff. A user commented that page 850 would be useful in Europe and supplied the tables, so I put that in. But I don't know if either has been used.

The full upper-half option is useful in China, and has been used at least once there. The no-upper-half option is useful for ensuring portability. The wide characters option is useful in Japan and has been used at least a little bit there.

If anyone wants to supply additional tables for identifiers (see csets.adb in the GNAT compiler sources), or additional alternative packages for Ada.Characters.Latin_1, we could certainly include them. I don't think this is the most urgent missing feature in GNAT :-)

By the way, I want to report that Markus Kuhn supplied the information and a start towards the coding for recognizing UTF-8 in GNAT, and I have just completed that coding, so GNAT will now fully support UTF-8. Thanks, Markus, for this contribution!

Robert Dewar
Ada Core Technologies

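The two observations above (bytes from another part of 8859 survive a Latin-1-only tool unchanged, but Latin-1 case equivalence can merge letters that are distinct to the user) can be sketched in Python. This is an illustration under stated assumptions, not GNAT's actual behaviour: it uses Python's `latin-1` and `iso8859_7` codecs, and the helper names `through_latin1` and `latin1_fold` are hypothetical.

```python
# Illustrative sketch of a tool that only knows Latin-1 handling Greek
# (ISO/IEC 8859-7) bytes. Helper names are hypothetical, and this is
# not a model of any particular compiler.
def through_latin1(raw: bytes) -> bytes:
    """Store bytes as Latin-1 text internally, then write them back."""
    internal = raw.decode("latin-1")   # looks like gibberish to a Greek reader
    return internal.encode("latin-1")  # but the bytes come back unchanged


def latin1_fold(raw: bytes) -> bytes:
    """Lower-case bytes the way a Latin-1-only tool would."""
    return raw.decode("latin-1").lower().encode("latin-1")


if __name__ == "__main__":
    greek = "\u03bb\u03cc\u03b3\u03bf\u03c2".encode("iso8859_7")  # "logos"
    print(through_latin1(greek) == greek)

    # Alpha with tonos and omicron with tonos are distinct Greek letters.
    # In 8859-7 they sit at 0xDC and 0xFC, which Latin-1 treats as the
    # upper-case and lower-case forms of U with diaeresis.
    alpha_tonos = "\u03ac".encode("iso8859_7")
    omicron_tonos = "\u03cc".encode("iso8859_7")
    print(latin1_fold(alpha_tonos) == latin1_fold(omicron_tonos))
```

Both checks print `True`: the round trip is lossless, and the Latin-1 fold maps the two different Greek letters to the same byte.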
* Re: UNICODE - non-Asian
From: Ronald Cole @ 1998-05-24 0:00 UTC

dewar@merv.cs.nyu.edu (Robert Dewar) writes:
> The full upper-half option is useful in China, and has been used at
> least once there.

Would that be in the code controlling the guidance systems on the thirteen nukes they've purportedly pointed at the US? ;)

-- 
Forte International, P.O. Box 1412, Ridgecrest, CA 93556-1412
Ronald Cole <ronald@forte-intl.com>      Phone: (760) 499-9142
President, CEO                             Fax: (760) 499-9152
My PGP fingerprint: E9 A8 E3 68 61 88 EF 43 56 2B CE 3E E9 8F 3F 2B

* Re: UNICODE - non-Asian
From: Robert Dewar @ 1998-05-25 0:00 UTC

Ronald Cole asks

<<> The full upper-half option is useful in China, and has been used at
> least once there.

Would that be in the code controlling the guidance systems on the thirteen nukes they've purportedly pointed at the US? ;)
>>

Not unless these systems are run on PCs using Windows 95, which seems unlikely despite Microsoft's interest in being the supplier of everyone's operating system.

* Re: UNICODE - non-Asian
From: Norman H. Cohen @ 1998-06-01 0:00 UTC

Robert I. Eachus wrote:
>
> In article <dewar.895708219@merv> dewar@merv.cs.nyu.edu (Robert Dewar) writes:
>
> > I am not sure what encodings are standard for Hebrew, someone here should
> > know. I am a little surprised if we don't support it already, since we
> > have a number of customers in Israel, and this subject has not come up :-)
>
> Latin/Hebrew ISO/IEC 8859-8 of course.

That is correct. It is in common use, for example, in Hebrew web pages.

(I must say, however, that I find Bill Whitaker's characterization of Hebrew as a "non-Asian" language most curious! :-) )

-- 
Norman H. Cohen
mailto:ncohen@watson.ibm.com
http://www.research.ibm.com/people/n/ncohen