From: "Robert C. Leif"
Newsgroups: comp.lang.ada
Subject: RE: Character Sets (plain text police report)
Date: Sun, 1 Dec 2002 22:44:24 -0800

The problem is that wide character versions of Ada.Strings.Bounded and other string packages were not included in Ada 95.
Although I have nothing against extending Text_Io, I strongly believe that many future applications will be based on XML. The simplest solution is to create an Ada API to the XML languages. These XML languages constitute a very rich GUI environment. The packages for this API can also be used to extend Text_Io. In principle, the equivalent of XML could be created in Ada, and it would probably be better. Unfortunately, this is, at present, not economically feasible. However, Ada could drive all or part of an XML-based windowing system. Since one can create XML schemas with semantics very close to Ada's, the Ada community should take advantage of this. I will talk on this subject at SIGAda 2002.

Parenthetically, an Ada.Strings.Bounded with the character size as a generic parameter could permit the creation of 4-bit characters, Char_4. These Char_4s would be an elegant coding for DNA and RNA base sequences.

Bob Leif

-----Original Message-----
From: comp.lang.ada-admin@ada.eu.org [mailto:comp.lang.ada-admin@ada.eu.org] On Behalf Of Marin David Condic
Sent: Sunday, December 01, 2002 6:38 AM
To: comp.lang.ada@ada.eu.org
Subject: Re: Character Sets (plain text police report)

Jacob Sparre Andersen wrote in message news:3DE9F24E.3010002@nbi.dk...
> > effort? (Is there much use out there for 32-bit characters?)
>
> Maybe not directly (except for in the far east), but there
> is a rather large and growing indirect need for full support
> for ISO-10646.
>

My understanding was that the 16-bit characters covered most of the practical uses one would find in modern languages. The reason for the 32-bit characters was to provide for things that might be truly obscure (Egyptian hieroglyphics and such) or other special character sets that would not be that big a deal if Ada didn't support them.

> In Europe people are starting to switch from ISO-8859
> encodings to the UTF-8 encoding of ISO-10646.
> This means that although people in practice seldom will use
> more than the 470-something European characters, they will
> start to expect to have access to use all of ISO-10646.
>

So possibly if there were some kind of variant of Text_IO that dealt with UTF-8 files, it might be useful. You'd need special data types and operations, but that wouldn't be insurmountable. Some set of packages that would be wrapped around UTF-8 as an extension to Ada or part of a standard Ada library might make sense.

>
> Agreed. One needs some kind of information about which
> encoding is used - but that is already the case. The best
> solution I can think of is to demand that the operating
> system keeps track of the file type (including encoding for
> text files). The second best solution is (IMHO) to
> introduce a sensible common standard encoding. I don't know
> if it should be UTF-8 or raw 32-bit ISO-10646. And I can
> certainly not advise people to use the current procedure on
> Unix systems, where each user chooses his/her assumed
> encoding of text files.
>

You'd almost certainly want some indication from the OS that a file was a UTF-8 file. The "Form" parameter in the Text_IO.Open procedure would be the natural place to specify it, I'd think. Or if it was a set of new packages, the underlying implementation would want a means of checking that the file was of the appropriate type. The alternative is to dump it on the user's head - as one generally must with Unix OSes, since files there tend to be viewed as a stream of bytes. "I ask you for a UTF-8 input file and if you give me a relational database file, well, that's your tough luck..."

>
> No. But it would be nice, if one could demand that
> compilers can handle UTF-8 or raw 32-bit ISO-10646 encoded
> source files.
>

That sounds like an implementation issue. (You're talking about the Ada compiler eating Ada source that is in UTF-8? No reason that can't be done without a language revision.)
Otherwise, I'd think you could provide all the tools by creating a Wide_Wide_Character and Wide_Wide_String type and providing all the customary packages that would involve. From there, additional utility probably should come from a standard Ada library so that it could be enhanced and extended without formal language revision.

MDC
--
======================================================================
Marin David Condic
I work for: http://www.belcan.com/
My project is: http://www.jast.mil/

Send Replies To: m c o n d i c @ a c m . o r g

    "I'd trade it all for just a little more"
        -- Charles Montgomery Burns, [4F10]
======================================================================
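
As a rough illustration of the 16-bit versus 32-bit point discussed in this thread, here is a minimal sketch in Python rather than Ada. It is not taken from either poster, and the function names (utf8_length, fits_in_16_bits, is_valid_utf8) are hypothetical helpers chosen for this example. It shows how many bytes UTF-8 needs for a few code points, which of them would fit a 16-bit character type, and a best-effort check that a byte stream decodes as UTF-8. U+13000 is used as a sample because later Unicode versions assign it to an Egyptian hieroglyph.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses to encode one character."""
    return len(ch.encode("utf-8"))


def fits_in_16_bits(ch: str) -> bool:
    """Return True if the code point fits a 16-bit character type (<= U+FFFF)."""
    return ord(ch) <= 0xFFFF


def is_valid_utf8(data: bytes) -> bool:
    """Best-effort check: do these bytes decode as well-formed UTF-8?"""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # Sample code points (an assumption for illustration): ASCII, Latin-1,
    # the euro sign, and one code point beyond the 16-bit range.
    samples = ["A", "\u00e9", "\u20ac", "\U00013000"]
    for ch in samples:
        print(
            f"U+{ord(ch):04X}: {utf8_length(ch)} UTF-8 byte(s), "
            f"fits in 16 bits: {fits_in_16_bits(ch)}"
        )
```

A successful decode only shows that the bytes are well-formed UTF-8. Plain ASCII passes too, so a check like this cannot prove which encoding the author of a file intended, which is why the thread asks for help from the operating system or from the Form parameter.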