From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Marin David Condic"
Newsgroups: comp.lang.ada
Subject: Re: Character Sets (plain text police report)
Date: Sun, 1 Dec 2002 09:38:18 -0500
Organization: MindSpring Enterprises
Message-ID:
References: <3DE9F24E.3010002@nbi.dk>

Jacob Sparre Andersen wrote in message news:3DE9F24E.3010002@nbi.dk...
> > effort? (Is there much use out there for 32-bit characters?)
>
> Maybe not directly (except for in the far east), but there
> is a rather large and growing indirect need for full support
> for ISO-10646.
>

My understanding was that the 16-bit characters covered most of the practical uses one would find in modern languages. The reason for the 32-bit characters was to provide for things that might be truly obscure (Egyptian hieroglyphics and such) or other special character sets that may not be that big a deal if Ada didn't support them.

> In Europe people are starting to switch from ISO-8859
> encodings to the UTF-8 encoding of ISO-10646.
> This means that although people in practice seldom will use
> more than the 470-something European characters, they will
> start to expect to have access to use all of ISO-10646.
>

So possibly, if there were some kind of variant of Text_IO that dealt with UTF-8 files, it might be useful. You'd need special data types and operations, but that wouldn't be insurmountable. A set of packages wrapped around UTF-8, either as an extension to Ada or as part of a standard Ada library, might make sense.

>
> Agreed. One needs some kind of information about which
> encoding is used - but that is already the case. The best
> solution I can think of is to demand that the operating
> system keeps track of the file type (including encoding for
> text files). The second best solution is (IMHO) to
> introduce a sensible common standard encoding. I don't know
> if it should be UTF-8 or raw 32-bit ISO-10646. And I can
> certainly not advice people to use the current procedure on
> Unix systems, where each user chooses his/her assumed
> encoding of text files.
>

You'd almost certainly want some indication from the OS that a file was a UTF-8 file. The "Form" parameter of the Text_IO.Open procedure would be the natural place to specify it, I'd think. Or, if it were a set of new packages, the underlying implementation would want a means of checking that the file was of the appropriate type. The alternative is to dump the problem on the user's head, as one generally must with Unix OSes, since files there tend to be viewed as a stream of bytes: "I ask you for a UTF-8 input file and if you give me a relational database file, well, that's your tough luck..."

>
> No. But it would be nice, if one could demand that
> compilers can handle UTF-8 or raw 32-bit ISO-10646 encoded
> source files.
>

That sounds like an implementation issue. (You're talking about the Ada compiler eating Ada source that is in UTF-8? No reason that can't be done without a language revision.)
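Whether it lives in a UTF-8 variant of Text_IO, in a new set of packages, or in a compiler reading UTF-8 source, the core job is the same: turn UTF-8 bytes into ISO-10646 code points and reject anything that is not well-formed UTF-8. Here is a minimal sketch of that job, written in Python rather than Ada purely as an illustration. The names `decode_utf8` and `is_utf8` are hypothetical (not part of Text_IO or any Ada library), and the sketch assumes the modern UTF-8 rules (at most four bytes per character, no surrogates, nothing above U+10FFFF) rather than the older six-byte form.

```python
def decode_utf8(data: bytes) -> list:
    """Decode UTF-8 bytes into a list of ISO-10646 code points.

    Hypothetical helper (an assumption, not an existing API). Raises
    ValueError on malformed input: bad lead bytes, truncated sequences,
    bad continuation bytes, overlong forms, UTF-16 surrogates, and
    values above U+10FFFF.
    """
    code_points = []
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        # The lead byte gives the sequence length, its payload bits, and the
        # smallest code point that may legally use that length.
        if lead < 0x80:              # 0xxxxxxx: 1 byte (ASCII)
            length, value, minimum = 1, lead, 0x0
        elif 0xC0 <= lead < 0xE0:    # 110xxxxx: 2 bytes
            length, value, minimum = 2, lead & 0x1F, 0x80
        elif 0xE0 <= lead < 0xF0:    # 1110xxxx: 3 bytes
            length, value, minimum = 3, lead & 0x0F, 0x800
        elif 0xF0 <= lead < 0xF8:    # 11110xxx: 4 bytes
            length, value, minimum = 4, lead & 0x07, 0x10000
        else:                        # stray continuation, old 5/6-byte lead, 0xFE/0xFF
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, length):
            cont = data[i + k]
            if (cont & 0xC0) != 0x80:  # continuation bytes look like 10xxxxxx
                raise ValueError(f"bad continuation byte at offset {i + k}")
            value = (value << 6) | (cont & 0x3F)
        if value < minimum:
            raise ValueError(f"overlong encoding at offset {i}")
        if 0xD800 <= value <= 0xDFFF or value > 0x10FFFF:
            raise ValueError(f"invalid code point U+{value:X} at offset {i}")
        code_points.append(value)
        i += length
    return code_points


def is_utf8(data: bytes) -> bool:
    """Content check only: True if the whole buffer is well-formed UTF-8."""
    try:
        decode_utf8(data)
    except ValueError:
        return False
    return True


# "A" (1 byte), "é" (2 bytes), "€" (3 bytes), U+13000 (4 bytes).
sample = "A\u00e9\u20ac\U00013000".encode("utf-8")
print([hex(cp) for cp in decode_utf8(sample)])  # ['0x41', '0xe9', '0x20ac', '0x13000']
print(is_utf8(b"caf\xe9"))                      # False: Latin-1 "café"
```

The last value, U+13000 (an Egyptian hieroglyph), is above U+FFFF, so it cannot be held in a 16-bit Wide_Character; that is the kind of character a 32-bit type would exist for. Note also that `is_utf8` can only say whether the bytes decode. It cannot tell whether the file was meant to be text at all, which is why an indication from the OS or the "Form" string would still be needed.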
Compiler support for UTF-8 source aside, I'd think you could provide all the tools by creating Wide_Wide_Character and Wide_Wide_String types and providing all the customary packages that would entail. From there, additional utility should probably come from a standard Ada library so that it could be enhanced and extended without a formal language revision.

MDC
--
======================================================================
Marin David Condic
I work for: http://www.belcan.com/
My project is: http://www.jast.mil/
Send Replies To: m c o n d i c @ a c m . o r g

"I'd trade it all for just a little more"
-- Charles Montgomery Burns, [4F10]
======================================================================