comp.lang.ada
* IBM 437 encoded String to UTF-16 Wide_String
@ 2012-11-27 21:02 gautier_niouzes
  2012-11-27 21:38 ` J-P. Rosen
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: gautier_niouzes @ 2012-11-27 21:02 UTC


Hello!

I'm looking for a conversion from an IBM 437 encoded String [1] to a UTF-16 Wide_String.

Is there one in the Ada 2012 standard (where I found the very useful UTF-8 <-> UTF-16 conversions)?
In GNAT?
In some open-source package?

TIA

[1]: http://en.wikipedia.org/wiki/Code_page_437
_________________________ 
Gautier's Ada programming 
http://gautiersblog.blogspot.com/search/label/Ada 
NB: follow the above link for a valid e-mail address 




* Re: IBM 437 encoded String to UTF-16 Wide_String
From: J-P. Rosen @ 2012-11-27 21:38 UTC


On 27/11/2012 22:02, gautier_niouzes@hotmail.com wrote:
> I'm looking for a IBM 437 encoded String [1] to UTF-16 Wide_String conversion.
That doesn't look possible, since IBM 437 is a character set (a mapping
from code points to glyphs) while UTF-16 is an encoding scheme (a way to
fit all ISO 10646 code points into 16-bit, or sometimes 32-bit, values).

For an explanation of these strange terms, refer to the discussion
section of AI05-0137-2/03.
-- 
J-P. Rosen
Adalog
2 rue du Docteur Lombard, 92441 Issy-les-Moulineaux CEDEX
Tel: +33 1 45 29 21 52, Fax: +33 1 45 29 25 00
http://www.adalog.fr




* Re: IBM 437 encoded String to UTF-16 Wide_String
From: gautier_niouzes @ 2012-11-27 22:12 UTC


> I'm looking for a IBM 437 encoded String [1] to UTF-16 Wide_String conversion.

Found! It was just a question of reading the Wikipedia page carefully:

> [1]: http://en.wikipedia.org/wiki/Code_page_437

More specifically, reference #8:
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP437.TXT




* Re: IBM 437 encoded String to UTF-16 Wide_String
From: Dmitry A. Kazakov @ 2012-11-27 22:14 UTC


On Tue, 27 Nov 2012 13:02:05 -0800 (PST), gautier_niouzes@hotmail.com
wrote:

> I'm looking for a IBM 437 encoded String [1] to UTF-16 Wide_String conversion.
> 
> In the Ada 2012 standard (I found the very useful UTF-8 <-> UTF-16 conversions) ?
> In GNAT ?
> In some open-source package ?
> TIA
> 
> [1]: http://en.wikipedia.org/wiki/Code_page_437

What about

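   type Map is array (Character) of Wide_Character;
   --  A type declaration cannot carry an initial value, so the table
   --  itself is a constant of that type.
   CP437_To_Wide : constant Map :=
      (  Wide_Character'Val (0), Wide_Character'Val (1),
         <other values from the wiki page> );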

It is not a big deal to type 256 values. Half of them (0..127) are literals
corresponding to 7-bit ASCII.

That would give you UCS-2. I presume that UTF-16 is not needed in this
case.
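
A minimal sketch of the lookup-table idea above, not the code that ended up in any real project. The names CP437_Demo, High_Half, High_Map and To_Wide are made up for illustration. Only three table entries are filled in; the other 125 are left as a placeholder (U+FFFD) and would have to be copied from a CP437 mapping table. The sketch assumes the lower half (0..127) keeps its code positions, as said above, and relies on every CP437 character fitting in a single Wide_Character, so no surrogate pairs are needed.

```ada
with Ada.Wide_Text_IO;

procedure CP437_Demo is

   --  Upper half of the code page: positions 128 .. 255.
   subtype High_Half is Character
     range Character'Val (16#80#) .. Character'Val (16#FF#);

   --  PARTIAL table, for illustration only. The entries mapped to
   --  16#FFFD# are placeholders, not real CP437 mappings.
   High_Map : constant array (High_Half) of Wide_Character :=
     (Character'Val (16#80#) => Wide_Character'Val (16#00C7#),  --  C cedilla
      Character'Val (16#81#) => Wide_Character'Val (16#00FC#),  --  u diaeresis
      Character'Val (16#82#) => Wide_Character'Val (16#00E9#),  --  e acute
      others                 => Wide_Character'Val (16#FFFD#));

   function To_Wide (S : String) return Wide_String is
      Result : Wide_String (S'Range);
   begin
      for I in S'Range loop
         if S (I) in High_Half then
            --  Upper half: look the character up in the table.
            Result (I) := High_Map (S (I));
         else
            --  Lower half: assumed to keep the same code position.
            Result (I) := Wide_Character'Val (Character'Pos (S (I)));
         end if;
      end loop;
      return Result;
   end To_Wide;

begin
   --  "caf" followed by CP437 position 16#82# (e acute).
   Ada.Wide_Text_IO.Put_Line (To_Wide ("caf" & Character'Val (16#82#)));
end CP437_Demo;
```

Splitting the table at position 128 keeps the typing down to 128 entries; a single 256-entry array indexed by Character, as suggested above, works just as well.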

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: IBM 437 encoded String to UTF-16 Wide_String
From: gautier_niouzes @ 2012-11-27 23:13 UTC

On Tuesday, 27 November 2012 23:14:22 UTC+1, Dmitry A. Kazakov wrote:

> It is not a big deal to type 256 values. Half of them (0..127) are literals
> corresponding to 7-bit ASCII.

Type? I'm too lazy for that :-).
I've put this:
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP437.TXT
(from the Wiki page) into Excel, and abracadabra, it became that:

http://sf.net/p/azip/code/69/tree//trunk/gui_common/azip_common.adb




* Re: IBM 437 encoded String to UTF-16 Wide_String
From: Vadim Godunko @ 2012-11-27 23:41 UTC


On Wednesday, November 28, 2012 1:02:05 AM UTC+4, gautier...@hotmail.com wrote:
> 
> I'm looking for a IBM 437 encoded String [1] to UTF-16 Wide_String conversion.
> 
> In some open-source package ?
> 
Matreshka includes text codecs to convert text data between different encodings; see

http://forge.ada-ru.org/matreshka/wiki/League/TextCodec




* Re: IBM 437 encoded String to UTF-16 Wide_String
From: briot.emmanuel @ 2012-11-28  8:34 UTC



XML/Ada also has a few conversion packages, but none for IBM 437.
I think the most convenient here would be to create a small binding to the
iconv library. I believe it exists on most systems, although with slightly
different interfaces. And it supports a huge number of encodings.
You basically need to bind three functions ("iconv_open", "iconv" and
"iconv_close").
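
A hedged sketch of what such a thin binding could look like, using Ada 2012 aspects. The package name Iconv_Binding and the Ada parameter names are invented here; the three C prototypes quoted in the comments are the POSIX ones. The external names are an assumption too: on systems that use GNU libiconv rather than the C library's own iconv, the symbols may be called libiconv_open, libiconv and libiconv_close, which is one of the "slightly different interfaces" mentioned above.

```ada
with Interfaces.C;
with System;

package Iconv_Binding is

   --  Opaque conversion descriptor (iconv_t in C, a pointer-sized handle).
   type Iconv_T is new System.Address;

   --  iconv_t iconv_open (const char *tocode, const char *fromcode);
   function Iconv_Open
     (To_Code   : Interfaces.C.char_array;
      From_Code : Interfaces.C.char_array) return Iconv_T
     with Import, Convention => C, External_Name => "iconv_open";

   --  size_t iconv (iconv_t cd,
   --                char **inbuf,  size_t *inbytesleft,
   --                char **outbuf, size_t *outbytesleft);
   --  The buffer pointers and byte counts are updated by the call.
   function Iconv
     (CD             : Iconv_T;
      In_Buf         : access System.Address;
      In_Bytes_Left  : access Interfaces.C.size_t;
      Out_Buf        : access System.Address;
      Out_Bytes_Left : access Interfaces.C.size_t)
      return Interfaces.C.size_t
     with Import, Convention => C, External_Name => "iconv";

   --  int iconv_close (iconv_t cd);
   function Iconv_Close (CD : Iconv_T) return Interfaces.C.int
     with Import, Convention => C, External_Name => "iconv_close";

end Iconv_Binding;
```

A caller would pass NUL-terminated names built with Interfaces.C.To_C, for instance Iconv_Open (To_C ("UTF-16LE"), To_C ("CP437")). Whether a given iconv accepts exactly those encoding names is also an assumption to check on the target system.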




* Re: IBM 437 encoded String to UTF-16 Wide_String
From: Dmitry A. Kazakov @ 2012-11-28  8:52 UTC


On Wed, 28 Nov 2012 00:34:53 -0800 (PST), briot.emmanuel@gmail.com wrote:

> I think the most convenient here would be to create a small binding to the
> iconv library. I believe it exists on most systems, although with slightly
> different interfaces. And it supports a huge number of encodings.

No. IMO the most convenient way would be to fix the language in order to
have Wide_Wide_String'Class of which String, Wide_String, Wide_Wide_String,
UTF8_String etc. were members.

An encoding is nothing but an instance of Wide_Wide_String'Class implementing
the interface of an array of code units. In the case of IBM 437 it is
something like:

   type IBM_437_String is
      new Wide_Wide_String  -- Logical view, string of code points
      and array (Positive range <>) of Byte; -- Presentation view

Conversions, if ever needed, would be type/view conversions.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: IBM 437 encoded String to UTF-16 Wide_String
From: Georg Bauhaus @ 2012-11-28  9:43 UTC


On 28.11.12 09:52, Dmitry A. Kazakov wrote:
> No. IMO the most convenient way would be to fix the language in order to
> have Wide_Wide_String'Class of which String, Wide_String, Wide_Wide_String,
> UTF8_String etc were members.

If there isn't anything special about {Wide_}Character,
a Vector of Character might be an alternative, though hated
by haters of generics, I should think.

As a practical alternative, why not add a generalized
std::valarray<type T = (<>)> to the language instead
of fixing it?





* Re: IBM 437 encoded String to UTF-16 Wide_String
From: Dmitry A. Kazakov @ 2012-11-28  9:58 UTC


On Wed, 28 Nov 2012 10:43:52 +0100, Georg Bauhaus wrote:

> On 28.11.12 09:52, Dmitry A. Kazakov wrote:
>> No. IMO the most convenient way would be to fix the language in order to
>> have Wide_Wide_String'Class of which String, Wide_String, Wide_Wide_String,
>> UTF8_String etc were members.
> 
> If there isn't anything special about {Wide_}Character,
> a Vector of Character might be an alternative,

No. When I mentioned Wide_Wide_String I meant an array of code points. The
logical view of *any* string type is an array of code points. The only
difference between different string types is in the constraints put on the
code points. E.g. String has code points 0 to 255. IBM_437_String would
have a non-contiguous set of code points, etc.

> though hated by haters of generics, I should think.

I don't see how generics are relevant here. All strings are arrays of code
points. They all belong to this class. No explicit conversions should be
needed between them.

> As a practical alternative, why not add a generalized
> std::valarray<type T = (<>)> to the language instead
> of fixing it?

No idea what this is supposed to mean.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: IBM 437 encoded String to UTF-16 Wide_String
From: Georg Bauhaus @ 2012-11-28 11:31 UTC


On 28.11.12 10:58, Dmitry A. Kazakov wrote:
>  When I mentioned Wide_Wide_String I meant an array of code points. The
> logical view of *any* string type is array of code points. The only
> difference between different string types is in the constraints put on the
> code points. E.g. String has code points 0 to 255. IBM_437_String would
> have a non-contiguous set of code points etc.

What is a non-contiguous set?

In case of differentiation by sets of code points, I'd rather
have an honest type Unicode_String and---if we are already
fixing the language---put everything that has {Wide_}String
in its name in Annex J.

But then, consider

    type Index is range 1 .. 12;

    type R is ('I', 'V', 'X', 'L', 'C', 'D', 'M');

    type N is array (Index range <>) of R;

A string of R, named N here, is just fine. In fact,

    Year : constant N := "MCMLXXXIII";

has a valid literal, and the year so written is not of any of
the standard string types. The definition of type R actually
implies a codespace, and, for example, Character'('V') or
Wide_Character'('V') have no role in it, irrespective of
any accidental overlap in encoding or representation or
position.

So, which by force should type N be in Whatever_String'Class?
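
The declarations above are legal in current Ada, which a small complete program can illustrate. This is only a sketch: Roman_Demo, Value_Of and To_Number are names invented for the illustration, and the subtractive-notation rule in To_Number is the usual textbook one, not something taken from this thread.

```ada
with Ada.Text_IO;

procedure Roman_Demo is

   type Index is range 1 .. 12;

   --  An enumeration type whose literals are character literals is a
   --  character type, so arrays of it accept string literals.
   type R is ('I', 'V', 'X', 'L', 'C', 'D', 'M');

   type N is array (Index range <>) of R;

   Value_Of : constant array (R) of Positive :=
     ('I' => 1,   'V' => 5,   'X' => 10, 'L' => 50,
      'C' => 100, 'D' => 500, 'M' => 1000);

   function To_Number (Numeral : N) return Integer is
      Total : Integer := 0;
   begin
      for I in Numeral'Range loop
         if I < Numeral'Last
           and then Value_Of (Numeral (I)) < Value_Of (Numeral (I + 1))
         then
            --  A smaller digit before a larger one is subtracted.
            Total := Total - Value_Of (Numeral (I));
         else
            Total := Total + Value_Of (Numeral (I));
         end if;
      end loop;
      return Total;
   end To_Number;

   Year : constant N := "MCMLXXXIII";

begin
   Ada.Text_IO.Put_Line (Integer'Image (To_Number (Year)));
end Roman_Demo;
```

Note that Year cannot be passed to Ada.Text_IO.Put_Line directly: N is a string type in the language sense, but it is unrelated to String.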

>> As a practical alternative, why not add a generalized
>> std::valarray<type T = (<>)> to the language instead
>> of fixing it?
>
> No idea what this is supposed to mean.

Call it

generic
    type Element_Type is ...
    type Index_Type is ...
package Ada.Containers.Tuples is
    ...

and make it have standard container operations, extended as needed.
The idea is that if Element_Type is an ordered scalar type, and if container
operations provide for writing algorithms efficiently, then
that's a more practical way of having strings of anything
than, say, finally removing "tagged" from the language and making
every type be in some 'Class.





* Re: IBM 437 encoded String to UTF-16 Wide_String
From: Dmitry A. Kazakov @ 2012-11-28 13:36 UTC


On Wed, 28 Nov 2012 12:31:35 +0100, Georg Bauhaus wrote:

> On 28.11.12 10:58, Dmitry A. Kazakov wrote:
>>  When I mentioned Wide_Wide_String I meant an array of code points. The
>> logical view of *any* string type is array of code points. The only
>> difference between different string types is in the constraints put on the
>> code points. E.g. String has code points 0 to 255. IBM_437_String would
>> have a non-contiguous set of code points etc.
> 
> What is a non-contiguous set?

A set that is not convex, where a set S is convex in this case when:

for code points x,y,z, such that x<y<z if x,z in S then y in S

> In case of differentiation by sets of code points, I'd rather
> have an honest type Unicode_String and---if we are already
> fixing the language---put everything that has {Wide_}String
> in its name in Annex J.
> 
> But then, consider
> 
>     type Index is range 1 .. 12;
> 
>     type R is ('I', 'V', 'X', 'L', 'C', 'D', 'M');
> 
>     type N is array (Index range <>) of R;
> 
> A string of R, named N here, is just fine. In fact,
> 
>     Year : constant N := "MCMLXXXIII";
> 
> has a valid literal, and the year so written is not of any of
> the standard string types. The definition of type R actually
> implies a codespace, and, for example, Character'('V') or
> Wide_Character'('V') have no role in it, irrespective of
> any accidental overlap in encoding or representation or
> position.
> 
> So, which by force should type N be in Whatever_String'Class?

Per inheritance:

   type N is
      new Wide_Wide_String
      and array (...) of R;

>>> As a practical alternative, why not add a generalized
>>> std::valarray<type T = (<>)> to the language instead
>>> of fixing it?
>>
>> No idea what this is supposed to mean.
> 
> Call it
> 
> generic
>     type Element_Type is ...
>     type Index_Type is ...
> package Ada.Containers.Tuples is
>     ...
> 
> and make it have standard container operations, extended as needed.

It does not solve anything. The problem is not construction of a container
type. It is the relation of the obtained type to the string interface. The
string interface is an array of code points. The container must implement
this interface in order to be a string. All strings must implement this
interface, this is why they are called "strings."

> The idea is that if Element_Type is ordered scalars, and if container
> operations provide for writing algorithms efficiently, then
> that's a more practical way of having strings of anything
> than, say, finally removing "tagged" from the language and make
> every type be in some 'Class.

Every type is in more than just one class, trivially.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: IBM 437 encoded String to UTF-16 Wide_String
From: Georg Bauhaus @ 2012-11-28 13:47 UTC


On 28.11.12 14:36, Dmitry A. Kazakov wrote:
> The problem is not construction of a container
> type. It is the relation of the obtained type to the string interface. The
> string interface is an array of code points. The container must implement
> this interface in order to be a string. All strings must implement this
> interface, this is why they are called "strings."

This says that a string interface consists of operations
that allow us to use string objects like one uses arrays.

Is this set of array ops not included in a Vector's interface,
or in a Tuple's interface, provided the formal Element_Type
requires the properties of "code points"?

Which algorithms require a String_Interface that excludes
other array/vector operations?





* Re: IBM 437 encoded String to UTF-16 Wide_String
From: gautier_niouzes @ 2012-11-28 13:51 UTC


On Wednesday, 28 November 2012 09:34:53 UTC+1, briot.e...@gmail.com wrote:
> XML/Ada also has a few conversion packages, but not IBM 437.
> 
> I think the most convenient here would be to create a small binding to the iconv library. I believe it exists on most systems, although with slightly different interfaces. And it supports a huge number of encodings.
> 
> You basically need to bind three functions ("open", "iconv" and "close")

Thanks for that, it could be useful in another project.
In this case (Zip archives) there are two entry name encodings: UTF-8 and IBM 437. Now the latter is covered too, with a simple constant array (Character) of Wide_Character. No need to look further...
G.




* Re: IBM 437 encoded String to UTF-16 Wide_String
From: Dmitry A. Kazakov @ 2012-11-28 14:23 UTC


On Wed, 28 Nov 2012 14:47:10 +0100, Georg Bauhaus wrote:

> On 28.11.12 14:36, Dmitry A. Kazakov wrote:
>> The problem is not construction of a container
>> type. It is the relation of the obtained type to the string interface. The
>> string interface is an array of code points. The container must implement
>> this interface in order to be a string. All strings must implement this
>> interface, this is why they are called "strings."
> 
> This says that a string interface consists of operations
> that allow us to use string objects like one uses arrays.

It says that instances implementing the interface are substitutable where a
string is expected. You should be able to pass IBM_437_String to Put_Line,
Trim, To_Lower etc.

> Is this set of array ops not included in a Vector's interface,

By what means does the language or the reader know whether it is?

Ada has had a manifest type system, so far...

> Which algorithms require a String_Interface that excludes
> other array/vector operations?

?

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: IBM 437 encoded String to UTF-16 Wide_String
From: Georg Bauhaus @ 2012-11-28 17:35 UTC


On 28.11.12 15:23, Dmitry A. Kazakov wrote:
> On Wed, 28 Nov 2012 14:47:10 +0100, Georg Bauhaus wrote:
> 
>> On 28.11.12 14:36, Dmitry A. Kazakov wrote:
>>> The problem is not construction of a container
>>> type. It is the relation of the obtained type to the string interface. The
>>> string interface is an array of code points. The container must implement
>>> this interface in order to be a string. All strings must implement this
>>> interface, this is why they are called "strings."
>>
>> This says that a string interface consists of operations
>> that allow us to use string objects like one uses arrays.
> 
> It say that instance implementing the interface are substitutable where a
> string is expected. You should be able to pass IBM_437_String to Put_Line,
> Trim, To_Lower etc.
> 
>> Is this set of array ops not included in a Vector's interface,
> 
> By which means the language or the reader knows if it is?

This is why I mentioned generics. It is not nice, but lets the
reader see the expected interface:

   with Ada.Containers.Vectors;  -- needed for the formal package below

   generic
      type Apple is (<>);

      with package V is
        new Ada.Containers.Vectors (Element_Type => Apple,
                                    others => <>);
   package String_Ops is

      type String is new V.Vector with null record;

      Pattern_Error : exception;

      procedure Put_Line (Source : String);  -- not normally here

      function Slice (Source : String;
                      Low    : V.Index_Type;
                      High   : V.Extended_Index) return String;

      function Index (Source, Pattern : String) return V.Extended_Index;

   end String_Ops;

Or even

   generic
      type Element_Type is ...
   package Ada.Containers.Strings is ...

I don't prefer these types of generics to an improved type system,
one that is less complex and less full of historical reasons.
But since Ada is not going to get this sort of type system...

> Ada had manifesting type system, so far...
> 
>> Which algorithms require a String_Interface that excludes
>> other array/vector operations?
> 
> ?

That is, is there a set difference between string operations
and "vector" operations such that, from a user's perspective,
nothing could turn vectors into objects of type String, or
Wide_String, or Wide_Wide_String?





* Re: IBM 437 encoded String to UTF-16 Wide_String
From: Dmitry A. Kazakov @ 2012-11-28 18:00 UTC


On Wed, 28 Nov 2012 18:35:06 +0100, Georg Bauhaus wrote:

> On 28.11.12 15:23, Dmitry A. Kazakov wrote:
>> On Wed, 28 Nov 2012 14:47:10 +0100, Georg Bauhaus wrote:
>> 
>>> On 28.11.12 14:36, Dmitry A. Kazakov wrote:
>>>> The problem is not construction of a container
>>>> type. It is the relation of the obtained type to the string interface. The
>>>> string interface is an array of code points. The container must implement
>>>> this interface in order to be a string. All strings must implement this
>>>> interface, this is why they are called "strings."
>>>
>>> This says that a string interface consists of operations
>>> that allow us to use string objects like one uses arrays.
>> 
>> It say that instance implementing the interface are substitutable where a
>> string is expected. You should be able to pass IBM_437_String to Put_Line,
>> Trim, To_Lower etc.
>> 
>>> Is this set of array ops not included in a Vector's interface,
>> 
>> By which means the language or the reader knows if it is?
> 
> This is why I mentioned generics. It is not nice,

Nice is the wrong word here. It just does not solve the problem, which is
making string types strings.

> but lets the
> reader see the expected interface:

He could see that from hexadecimal code as well, couldn't he?
 
> But since Ada is not going to get this sort of type system...

... then it should be untyped and the reader should deduce types from observed
program behavior, right?

>> Ada had manifesting type system, so far...
>> 
>>> Which algorithms require a String_Interface that excludes
>>> other array/vector operations?
>> 
>> ?
> 
> That is, is there a set difference between string operations
> and "vector" operations such that, from a user's perspective,
> nothing could turn vectors into objects of type String, or
> Wide_String, or Wide_Wide_String?

Irrelevant; so far Ada's type system has been nominal. T1 /= T2 even if there is
no difference between them in the terms you described.

You are thinking in terms of implementations while talking about contracts
(interfaces). This is utterly wrong. A string can have any implementation.
The set of operations is determined by the contract, not the other way around.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: IBM 437 encoded String to UTF-16 Wide_String
From: Randy Brukardt @ 2012-11-29  3:18 UTC


"Georg Bauhaus" <rm.dash-bauhaus@futureapps.de> wrote in message 
news:50b5f60e$0$9524$9b4e6d93@newsspool1.arcor-online.net...
...
> In case of differentiation by sets of code points, I'd rather
> have an honest type Unicode_String and---if we are already
> fixing the language---put everything that has {Wide_}String
> in its name in Annex J.

That's rather what I would like to do, especially as trying to support 
Wide_Wide_String file names makes things into a hash. (Do we really want to 
have Wide_Wide_Open in Text_IO??).

The language already has almost everything needed to support 
Root_String'Class. Most of the missing capabilities center around getting 
string literals for such a type. We'd probably need to keep Wide_Wide_String 
around in order to provide a common interconversion format.

> But then, consider
>
>    type Index is range 1 .. 12;
>
>    type R is ('I', 'V', 'X', 'L', 'C', 'D', 'M');
>
>    type N is array (Index range <>) of R;
>
> A string of R, named N here, is just fine. In fact,
>
>    Year : constant N := "MCMLXXXIII";
>
> has a valid literal, and the year so written is not of any of
> the standard string types. The definition of type R actually
> implies a codespace, and, for example, Character'('V') or
> Wide_Character'('V') have no role in it, irrespective of
> any accidental overlap in encoding or representation or
> position.
>
> So, which by force should type N be in Whatever_String'Class?

N is not derived from Root_String'Class, and as such it couldn't be used 
with Put_Line (for one example). If you derived it from that type (possibly 
using a generic to fill in the operations), then of course you could. In 
that case, you'd have to provide (or let the generic provide) conversions to 
and from Unicode.

                                         Randy.






* Re: IBM 437 encoded String to UTF-16 Wide_String
From: Georg Bauhaus @ 2012-11-29  9:51 UTC


On 28.11.12 19:00, Dmitry A. Kazakov wrote:

>> That is, is there a set difference between string operations
>> and "vector" operations such that, from a user's perspective,
>> nothing could turn vectors into objects of type String, or
>> Wide_String, or Wide_Wide_String?
>
> Irrelevant, so far Ada's type system was nominal. T1 /= T2 even if there is
> no difference between them in terms you described.

I have not wanted to suggest structural equivalence or duck typing.
Neither will generic formals allow circumventing name equivalence.
Matching works (at the not so recent Ada level at least, with GNAT).

It's just that

- if Ada does not get a complete apparatus for handling
   all string types in some Root_String'Class, and

- if I want my subprograms to work with different
   types of strings,

this particular kind of problem can be solved with the help of a
generic formal package that a user of my programs has instantiated,
substituting his or her types for its formals.  The formal contract
of the formal generic package describes a string type whatsoever and
this description is sufficient for the subprograms of my generic,
and is good enough for the compiler as well.

It is not nice, it works in one direction, and it does not cover
the same cases that Root_String'Class could cover, but it does
solve the problem of making one algorithm work with objects of
different types.
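
A minimal sketch of that approach, under stated assumptions: Generic_String_Demo, Item_Type, Count and the instance names are invented for the illustration, and Ada.Containers.Vectors stands in for "a string type whatsoever". One algorithm is written once against a generic formal package and then instantiated for two element types.

```ada
with Ada.Containers.Vectors;
with Ada.Text_IO;

procedure Generic_String_Demo is

   --  One algorithm, written once against a formal vectors package.
   generic
      type Item_Type is (<>);
      with package V is new Ada.Containers.Vectors
        (Element_Type => Item_Type, others => <>);
   function Count (Source : V.Vector; Item : Item_Type) return Natural;

   function Count (Source : V.Vector; Item : Item_Type) return Natural is
      Result : Natural := 0;
   begin
      for I in Source.First_Index .. Source.Last_Index loop
         if Source.Element (I) = Item then
            Result := Result + 1;
         end if;
      end loop;
      return Result;
   end Count;

   --  The user of the generic supplies his or her own instances.
   package Char_Vectors is
     new Ada.Containers.Vectors (Positive, Character);
   package Wide_Vectors is
     new Ada.Containers.Vectors (Positive, Wide_Character);

   function Count_Chars is new Count (Character, Char_Vectors);
   function Count_Wide  is new Count (Wide_Character, Wide_Vectors);

   Text      : constant String      := "banana";
   Wide_Text : constant Wide_String := "banana";

   Narrow : Char_Vectors.Vector;
   Wide   : Wide_Vectors.Vector;

begin
   for I in Text'Range loop
      Narrow.Append (Text (I));
   end loop;
   for I in Wide_Text'Range loop
      Wide.Append (Wide_Text (I));
   end loop;

   Ada.Text_IO.Put_Line (Natural'Image (Count_Chars (Narrow, 'a')));
   Ada.Text_IO.Put_Line (Natural'Image (Count_Wide (Wide, 'a')));
end Generic_String_Demo;
```

The limits named above show up here as well: Count_Chars and Count_Wide are two separate subprograms fixed at compile time, and neither accepts a plain String or Wide_String without first copying it into a vector.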

Propose something better, have it replace current language,
and I will switch.





* Re: IBM 437 encoded String to UTF-16 Wide_String
From: Dmitry A. Kazakov @ 2012-11-29 10:52 UTC


On Thu, 29 Nov 2012 10:51:27 +0100, Georg Bauhaus wrote:

> It's just that
> 
> - if Ada does not get a complete apparatus for handling
>    all string types in some Root_String'Class, and

Bad
 
> - if I want my subprograms to work with different
>    types of strings,

type A is new B; -- Ada 83

> this particular kind of problem can be solved with the help of a
> generic formal package that a user of my programs has instantiated,

Egh, which problem? You say that if I don't want a car, a tin opener may
serve me instead. This kind of logic? First, I do want a car. Second, I don't
eat canned food anyway.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: IBM 437 encoded String to UTF-16 Wide_String
From: Georg Bauhaus @ 2012-11-29 14:05 UTC


On 29.11.12 11:52, Dmitry A. Kazakov wrote:
> On Thu, 29 Nov 2012 10:51:27 +0100, Georg Bauhaus wrote:
> 
>> It's just that
>>
>> - if Ada does not get a complete apparatus for handling
>>    all string types in some Root_String'Class, and
                                                  ^^^

> Bad
>  
>> - if I want my subprograms to work with different
>>    types of strings,
> 
> type A is new B; -- Ada 83

type A is new Root_String'Class;  -- not Ada

>> this particular kind of problem can be solved with the help of a
>> generic formal package that a user of my programs has instantiated,
> 
> Egh, which problem?

That of making subprograms work with any type of
string in current Ada.





* Re: IBM 437 encoded String to UTF-16 Wide_String
From: Dmitry A. Kazakov @ 2012-11-29 21:03 UTC


On Thu, 29 Nov 2012 15:05:19 +0100, Georg Bauhaus wrote:

> On 29.11.12 11:52, Dmitry A. Kazakov wrote:
>> On Thu, 29 Nov 2012 10:51:27 +0100, Georg Bauhaus wrote:
>> 
>>> - if I want my subprograms to work with different
>>>    types of strings,
>> 
>> type A is new B; -- Ada 83
> 
> type A is new Root_String'Class;  -- not Ada

Rather:

   type My_Incompatible_String is new String; -- Ada 83

Yes, Ada 95 broke that for tagged types. Presently, the workaround is this:

   type Base is abstract tagged ...;
   -- Define operations to clone here

   type A is new Base with null record;
   type B is new Base with null record;

Now B has the semantics of a cloned A, as if it had been declared:

   type B is new A;

In any case, you cannot clone classes; you should clone their root types
instead. That will have the effect of cloning the class.
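
For the untagged Ada 83 case mentioned at the start of this message, a small sketch of what "incompatible" means in practice. Clone_Demo and the object S are invented for the illustration.

```ada
with Ada.Text_IO;

procedure Clone_Demo is

   --  A derived type: a distinct clone of String with the same
   --  operations, but not interchangeable with it.
   type My_Incompatible_String is new String;

   S : constant My_Incompatible_String := "hello";

begin
   --  Ada.Text_IO.Put_Line (S);       --  illegal: S is not of type String
   Ada.Text_IO.Put_Line (String (S));  --  an explicit conversion is needed
end Clone_Demo;
```

The string literal is accepted for the clone because the derived type is itself a string type; only the call to a subprogram declared for String needs the conversion.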

>>> this particular kind of problem can be solved with the help of a
>>> generic formal package that a user of my programs has instantiated,
>> 
>> Egh, which problem?
> 
> That of making a subprograms work with any type of
> string in current Ada.

And how generic container library may help? Generic instances do not
comprise a run-time class, which is their major flaw.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



