From: "Dmitry A. Kazakov"
Subject: Re: System.WCh_Cnv
Newsgroups: comp.lang.ada
Organization: cbb software GmbH
References: <1nbqjel4blzuj$.obwkz78gfdph$.dlg@40tude.net>
Date: Tue, 25 Jul 2006 15:36:42 +0200

On Tue, 25 Jul 2006 14:03:21 +0100, Marius Amado-Alves wrote:

>> So I'm quite happy with UTF-8 and plain strings.
>
> I am more or less happy with this too [1], but I think we can do
> better. With UTF-8 in strings the two abstractions (codepoints,
> encodings) are too entangled for my taste. In rigour you cannot use
> the standard string operations.

Yes, not all of them.

> I mean you can but must fiddle with
> the encodings i.e. you are not searching for a codepoint but for a
> particular encoding.
> Instead I want to be able to write things like
>
>    for I in Str'Range loop
>       if Str (I) = Euro_Sign then ...
>    end loop;
>
> I cannot do that with UTF-8 in strings.

I do it this way:

   declare
      Index : Integer := Str'First;
      Value : UTF8_Code_Point;
   begin
      while Index <= Str'Last loop
         Get (Str, Index, Value);  --  decodes one code point, advances Index
         if Value = Euro_Sign then
            ...
         end if;
      end loop;
   end;

Actually, if Ada had abstract array interfaces and inheritance, we could
have it in exactly the form you wrote. Alas.

Note that the pattern you refer to goes beyond just Unicode issues. Exactly
the same problem exists in pattern matching:

   while Index <= Str'Last loop
      if Match (Str, Index, Pattern) then
         ...
      end if;
   end loop;

Basically it is a stream interface to strings with the ability to roll it
back or, equivalently, to look ahead.

> Note that Wide_Wide_String is
> of little help here, because of the endianess issue. But it might be
> a good idea to base Unico on Wide_Wide_String for closeness to the
> standard.

I prefer general solutions, like array interfaces. You have an opaque
object. Add an array interface to it, which would return code points or
Wide_x_100_Character or whatever you want. There you are.

> [1] What makes me happy about UTF-8 is that it seems to have become a
> de facto default, common denominator encoding. Long live Linux!

(:-))

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
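
For readers who want to try the Get-based loop above, here is a minimal self-contained sketch. Everything in it is an assumption made for illustration: the Code_Point and Raw types, the body of Get, and the test string are hypothetical stand-ins, not the actual declarations behind UTF8_Code_Point and Get in the post. The decoder checks lead and continuation bytes but does not reject overlong forms or surrogates.

```ada
with Ada.Text_IO;

procedure UTF8_Scan is

   --  Hypothetical stand-in for the UTF8_Code_Point type used in the post.
   type Code_Point is range 0 .. 16#10FFFF#;

   --  Wide enough for the raw payload of a 4-byte sequence (21 bits).
   type Raw is range 0 .. 16#1FFFFF#;

   Euro_Sign : constant Code_Point := 16#20AC#;

   --  Decode the code point that starts at Source (Pointer) and advance
   --  Pointer past it. Constraint_Error is raised on an invalid lead
   --  byte, a bad continuation byte, truncated input, or a value above
   --  16#10FFFF#. Pointer is left unchanged in those cases.
   procedure Get
     (Source  : String;
      Pointer : in out Integer;
      Value   : out Code_Point)
   is
      Lead   : constant Raw := Character'Pos (Source (Pointer));
      Length : Positive;
      Result : Raw;
   begin
      if Lead < 16#80# then
         Length := 1;
         Result := Lead;
      elsif Lead in 16#C0# .. 16#DF# then
         Length := 2;
         Result := Lead mod 16#20#;
      elsif Lead in 16#E0# .. 16#EF# then
         Length := 3;
         Result := Lead mod 16#10#;
      elsif Lead in 16#F0# .. 16#F7# then
         Length := 4;
         Result := Lead mod 16#08#;
      else
         raise Constraint_Error;  --  stray continuation or invalid lead
      end if;
      for Offset in 1 .. Length - 1 loop
         declare
            Next : constant Raw :=
              Character'Pos (Source (Pointer + Offset));
         begin
            if Next not in 16#80# .. 16#BF# then
               raise Constraint_Error;
            end if;
            Result := Result * 16#40# + Next mod 16#40#;
         end;
      end loop;
      Value   := Code_Point (Result);
      Pointer := Pointer + Length;
   end Get;

   --  '5', ' ', then the three bytes E2 82 AC that encode U+20AC.
   Str : constant String :=
     "5 " & Character'Val (16#E2#) & Character'Val (16#82#)
          & Character'Val (16#AC#);

   Index : Integer := Str'First;
   Value : Code_Point;
   Found : Natural := 0;

begin
   while Index <= Str'Last loop
      Get (Str, Index, Value);
      if Value = Euro_Sign then
         Found := Found + 1;
      end if;
   end loop;
   Ada.Text_IO.Put_Line ("Euro signs found:" & Natural'Image (Found));
end UTF8_Scan;
```

The point of the in-out Pointer parameter is the one made above: the caller sees a forward-only stream of code points, and saving the old Index before a call is enough to roll back or look ahead.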