comp.lang.ada
From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: System.WCh_Cnv
Date: Tue, 25 Jul 2006 15:36:42 +0200
Message-ID: <f9r2o22pm0ot$.184kj5ela2gcb.dlg@40tude.net>
In-Reply-To: mailman.48.1153832611.30988.comp.lang.ada@ada-france.org

On Tue, 25 Jul 2006 14:03:21 +0100, Marius Amado-Alves wrote:

>> So I'm quite happy with UTF-8 and plain strings.
> 
> I am more or less happy with this too [1], but I think we can do  
> better. With UTF-8 in strings the two abstractions (codepoints,  
> encodings) are too entangled for my taste. In rigour you cannot use  
> the standard string operations.

Yes, not all of them.

> I mean you can but must fiddle with  
> the encodings i.e. you are not searching for a codepoint but for a  
> particular encoding. Instead I want to be able to write things like
> 
> for I in Str'Range loop
>     if Str (I) = Euro_Sign then ...
> end loop;
>
> I cannot do that with UTF-8 in strings.

I do it this way:

declare
   Index : Integer := Str'First;
   Value : UTF8_Code_Point;
begin
   while Index <= Str'Last loop
      Get (Str, Index, Value);  --  decodes one code point, advances Index
      if Value = Euro_Sign then ...
   end loop;
end;
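
For illustration, here is a minimal self-contained sketch of such a loop.
Code_Point, Euro_Sign and this Get are hypothetical stand-ins written for
the example, not the declarations of any particular library. Get assumes
well-formed UTF-8 and does no validation.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Find_Euro is

   --  Hypothetical stand-ins, not the API of any particular library.
   type Code_Point is range 0 .. 16#10FFFF#;
   Euro_Sign : constant Code_Point := 16#20AC#;

   --  Decode the code point starting at Source (Index) and advance
   --  Index past it. Assumes well-formed UTF-8; no validation is done.
   procedure Get
     (Source : String;
      Index  : in out Integer;
      Value  : out Code_Point)
   is
      Lead  : constant Natural := Character'Pos (Source (Index));
      Extra : Natural;  --  number of continuation octets
   begin
      if Lead < 16#80# then
         Value := Code_Point (Lead);
         Extra := 0;
      elsif Lead < 16#E0# then
         Value := Code_Point (Lead mod 16#20#);
         Extra := 1;
      elsif Lead < 16#F0# then
         Value := Code_Point (Lead mod 16#10#);
         Extra := 2;
      else
         Value := Code_Point (Lead mod 16#08#);
         Extra := 3;
      end if;
      --  Each continuation octet contributes its low six bits.
      for Offset in 1 .. Extra loop
         Value := Value * 64
           + Code_Point (Character'Pos (Source (Index + Offset)) mod 64);
      end loop;
      Index := Index + Extra + 1;
   end Get;

   --  'a', Euro sign, 'b' spelled out as octets, so the encoding of
   --  the source file does not matter.
   Str : constant String :=
     'a' & Character'Val (16#E2#) & Character'Val (16#82#)
         & Character'Val (16#AC#) & 'b';

   Index : Integer := Str'First;
   Start : Integer;
   Value : Code_Point;
begin
   while Index <= Str'Last loop
      Start := Index;           --  remember where this code point began
      Get (Str, Index, Value);
      if Value = Euro_Sign then
         Put_Line ("Euro sign at octet" & Integer'Image (Start));
      end if;
   end loop;
end Find_Euro;
```

The loop variable is an octet position, not a code point count: the Euro
sign is reported at octet 2 and occupies octets 2 to 4.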

Actually if Ada had abstract array interfaces and inheritance we could have
it in exactly the form you wrote it. Alas.

Note that the pattern you refer to goes beyond just Unicode issues. Exactly
the same problem exists in pattern matching:

while Index <= Str'Last loop
    if Match (Str, Index, Pattern) then ...
end loop;

Basically it is a stream interface to strings with an ability to roll it
back or, equivalently, to look ahead.
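
A rough sketch of that idea, assuming a hypothetical Match that only
handles literal patterns (a real matcher would be far more general). The
point is the contract on Index: it moves past the match on success and
stays put on failure, which is the roll-back. Note that a function with
an "in out" parameter needs Ada 2012 or later; in earlier Ada it would
have to be a procedure with a Boolean out parameter.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Scan_Demo is

   --  Hypothetical literal matcher. On success Index is moved past the
   --  match; on failure Index is left where it was (the "roll back").
   function Match
     (Source  : String;
      Index   : in out Integer;
      Pattern : String) return Boolean
   is
      Last : constant Integer := Index + Pattern'Length - 1;
   begin
      if Last <= Source'Last and then Source (Index .. Last) = Pattern then
         Index := Last + 1;
         return True;
      else
         return False;
      end if;
   end Match;

   Str   : constant String := "foo bar foo";
   Index : Integer := Str'First;
   Start : Integer;
begin
   while Index <= Str'Last loop
      Start := Index;
      if Match (Str, Index, "foo") then
         Put_Line ("match at" & Integer'Image (Start));
      else
         Index := Index + 1;  --  no match here: skip one character
      end if;
   end loop;
end Scan_Demo;
```

With the string above this reports matches at positions 1 and 9.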

> Note that Wide_Wide_String is  
> of little help here, because of the endianness issue. But it might be  
> a good idea to base Unico on Wide_Wide_String for closeness to the  
> standard.

I prefer general solutions, like array interfaces. You have an opaque
object. Add an array interface to it, which would return code points or
Wide_x_100_Character or whatever you want. Here you are.

> [1] What makes me happy about UTF-8 is that it seems to have become a  
> de facto default, common denominator encoding.

Long live Linux! (:-))

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


