comp.lang.ada
From: "Robert C. Leif" <rleif@rleif.com>
Subject: RE: Character Sets (plain text police report)
Date: Fri, 29 Nov 2002 12:37:26 -0800
Message-ID: <mailman.1038602282.10532.comp.lang.ada@ada.eu.org>
In-Reply-To: <3DE65BB7.5010505@cogeco.ca>

Oops. My apologies.
Bob Leif
The correct text version is below. 
Addendum: The solution is the creation of versions of
Ada.Strings.Bounded for 16-bit and 32-bit characters. The 32-bit
Unicode characters allow direct comparison of characters based on
their position in Unicode.
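
A minimal sketch of the 16-bit case, assuming the standard Ada 95
package Ada.Strings.Wide_Bounded and the predefined type
Wide_Character. The procedure name, the bound of 80 and the sample
string are hypothetical. It only illustrates comparing characters by
their position in Unicode; a 32-bit version would need a
correspondingly wider character type.

```ada
with Ada.Text_IO;
with Ada.Strings.Wide_Bounded;

procedure Wide_Bounded_Sketch is
   --  Hypothetical instantiation: bounded strings of at most 80
   --  16-bit characters.
   package Wide_80 is
      new Ada.Strings.Wide_Bounded.Generic_Bounded_Length (Max => 80);

   --  Unicode positions: U+20AC EURO SIGN, U+00A4 CURRENCY SIGN.
   Euro     : constant Wide_Character := Wide_Character'Val (16#20AC#);
   Currency : constant Wide_Character := Wide_Character'Val (16#00A4#);

   Price : constant Wide_80.Bounded_Wide_String :=
     Wide_80.To_Bounded_Wide_String (Euro & "100");
begin
   Ada.Text_IO.Put_Line
     ("Euro position     ="
      & Integer'Image (Wide_Character'Pos (Euro)));
   Ada.Text_IO.Put_Line
     ("Currency position ="
      & Integer'Image (Wide_Character'Pos (Currency)));

   --  With one position per character, the two signs are distinct
   --  values and compare by position.
   Ada.Text_IO.Put_Line
     ("Euro /= Currency  : " & Boolean'Image (Euro /= Currency));
   Ada.Text_IO.Put_Line
     ("Euro >  Currency  : " & Boolean'Image (Euro > Currency));

   Ada.Text_IO.Put_Line
     ("Length of Price   =" & Natural'Image (Wide_80.Length (Price)));
end Wide_Bounded_Sketch;
```

In an 8-bit Character, by contrast, both signs have to share
position 164, which is the situation discussed below.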

-----Original Message-----
From: comp.lang.ada-admin@ada.eu.org [mailto:comp.lang.ada-admin@ada.eu.org] On Behalf Of Warren W. Gay VE3WWG
Sent: Thursday, November 28, 2002 10:09 AM
To: comp.lang.ada@ada.eu.org
Subject: Re: Character Sets (plain text police report)

Hmmm... I guess since Robert Dewar is avoiding this group these
days, we also lost our "plain text" police force ;-)

In case you were not aware of it, you are posting HTML to this
news group. This is generally discouraged so that others who
are not using HTML capable news readers, are still able to make
sense of your posting.
--------------------------------------------------------
Christoph Grein responded to my inquiry by stating that
"Latin_9.Euro_Sign is a name for a character. The same character
in Latin_1 has a different name, it is the Currency_Sign." "So why
do you expect this character not to be in the set only because you
use a different name for it?"
The Euro_Sign and the Currency_Sign have different representations
according to The ISO 8859 Alphabet Soup
http://czyborra.com/charsets/iso8859.html
------------------------------------------------
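The quoted reply can be checked directly. This is a minimal sketch,
assuming the GNAT-provided package Ada.Characters.Latin_9 used in the
program at the end of this note; the procedure name is hypothetical.
Both names are constants of the same 8-bit type Character, so the
comparison is on position only and says nothing about the glyph.

```ada
with Ada.Text_IO;
with Ada.Characters.Latin_1;
with Ada.Characters.Latin_9;  --  GNAT-provided, not in the Ada 95 standard

procedure Same_Position_Sketch is
   package Latin_1 renames Ada.Characters.Latin_1;
   package Latin_9 renames Ada.Characters.Latin_9;
begin
   Ada.Text_IO.Put_Line
     ("Latin_1.Currency_Sign position:"
      & Integer'Image (Character'Pos (Latin_1.Currency_Sign)));
   Ada.Text_IO.Put_Line
     ("Latin_9.Euro_Sign position:    "
      & Integer'Image (Character'Pos (Latin_9.Euro_Sign)));

   --  Two names for one value of type Character.
   Ada.Text_IO.Put_Line
     ("Same Character value: "
      & Boolean'Image (Latin_1.Currency_Sign = Latin_9.Euro_Sign));
end Same_Position_Sketch;
```

Which glyph appears for position 164 is decided by the font or code
page of the display, not by the Ada name used in the source.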
GNAT Latin_9 (ISO-8859-15) includes the following:
   -- Summary of Changes from Latin-1 => Latin-9 --
   ------------------------------------------------

   --   164     Currency                => Euro_Sign
   --   166     Broken_Bar              => UC_S_Caron
   --   168     Diaeresis               => LC_S_Caron
   --   180     Acute                   => UC_Z_Caron
   --   184     Cedilla                 => LC_Z_Caron
   --   188     Fraction_One_Quarter    => UC_Ligature_OE
   --   189     Fraction_One_Half       => LC_Ligature_OE
   --   190     Fraction_Three_Quarters => UC_Y_Diaeresis
Since these are changes, they should not be the same character.
Below are the results of an extension of my original program that
now tests the characters of Latin_9 from character number 164
through 190 and prints them out. I understand that the choice of
Windows font will change their representation. The correct glyphs
can be found at The ISO 8859 Alphabet Soup. For anyone interested,
I have put my program at the end of this note.

I suspect that the best solution would be to introduce Unicode,
ISO/IEC 10646, into the Ada standard. The arguments for this are
contained in W3C Character Model for the World Wide Web 1.0, W3C
Working Draft 30 April 2002
http://www.w3.org/TR/charmod/

"The choice of Unicode was motivated by the fact that Unicode: is
the only universal character repertoire available, covers the
widest possible range, provides a way of referencing characters
independent of the encoding of a resource, is being
updated/completed carefully, is widely accepted and implemented by
industry."

"W3C adopted Unicode as the document character set for HTML in
[HTML 4.0]. The same approach was later used for specifications
such as XML 1.0 [XML 1.0] and CSS2 [CSS2]. Unicode now serves as a
common reference for W3C specifications and applications."

"The IETF has adopted some policies on the use of character sets on
the Internet (see [RFC 2277])."

Bob Leif

------------------------Starting Test-----------------------
Latin_9_Diff is ñѪº¿⌐¬½¼¡«»░▒▓│┤╡╢╖╕╣║╗╝╜╛

The Character ñ is in Latin_1 is TRUE. Its position is  164
The Character Ñ is in Latin_1 is TRUE. Its position is  165
The Character ª is in Latin_1 is TRUE. Its position is  166
The Character º is in Latin_1 is TRUE. Its position is  167
The Character ¿ is in Latin_1 is TRUE. Its position is  168
The Character ⌐ is in Latin_1 is TRUE. Its position is  169
The Character ¬ is in Latin_1 is TRUE. Its position is  170
The Character ½ is in Latin_1 is TRUE. Its position is  171
The Character ¼ is in Latin_1 is TRUE. Its position is  172
The Character ¡ is in Latin_1 is TRUE. Its position is  173
The Character « is in Latin_1 is TRUE. Its position is  174
The Character » is in Latin_1 is TRUE. Its position is  175
The Character ░ is in Latin_1 is TRUE. Its position is  176
The Character ▒ is in Latin_1 is TRUE. Its position is  177
The Character ▓ is in Latin_1 is TRUE. Its position is  178
The Character │ is in Latin_1 is TRUE. Its position is  179
The Character ┤ is in Latin_1 is TRUE. Its position is  180
The Character ╡ is in Latin_1 is TRUE. Its position is  181
The Character ╢ is in Latin_1 is TRUE. Its position is  182
The Character ╖ is in Latin_1 is TRUE. Its position is  183
The Character ╕ is in Latin_1 is TRUE. Its position is  184
The Character ╣ is in Latin_1 is TRUE. Its position is  185
The Character ║ is in Latin_1 is TRUE. Its position is  186
The Character ╗ is in Latin_1 is TRUE. Its position is  187
The Character ╝ is in Latin_1 is TRUE. Its position is  188
The Character ╜ is in Latin_1 is TRUE. Its position is  189
The Character ╛ is in Latin_1 is TRUE. Its position is  190
------------------------Ending Test-----------------------

--Robert C. Leif, Ph.D & Ada_Med Copyright all rights reserved.
--Main Procedure
--Created 27 November 2002
with Ada.Text_Io;
with Ada.Io_Exceptions;
with Ada.Exceptions;
with Ada.Strings;
with Ada.Strings.Maps;
with  Ada.Characters.Latin_1;
with  Ada.Characters.Latin_9;
procedure Char_Sets_Test is 
   ------------------Table of Contents------------- 
   package T_Io renames Ada.Text_Io;
   package Str_Maps renames Ada.Strings.Maps;
   package Latin_1 renames Ada.Characters.Latin_1;
   package Latin_9 renames Ada.Characters.Latin_9;
   subtype Character_Set_Type is Str_Maps.Character_Set;
   subtype Character_Sequence_Type is Str_Maps.Character_Sequence;

   -----------------End Table of Contents-------------
   Latin_1_Range    : constant Str_Maps.Character_Range
      := (Low => Latin_1.Nul, High => Latin_1.Lc_Y_Diaeresis);  
   Latin_1_Char_Set :          Character_Set_Type      
      := Str_Maps.To_Set (Span => Latin_1_Range);  
   --Standard for Ada '95
   -- Latin_9 Differences: Euro_Sign, Uc_S_Caron, Lc_S_Caron, Uc_Z_Caron, 
   -- Lc_Z_Caron, Uc_Ligature_Oe, Lc_Ligature_Oe, Uc_Y_Diaeresis.
   Latin_9_Diff_Latin_1_Super_Range  : constant Str_Maps.Character_Range
      := (Low => Latin_9.Euro_Sign, High => Latin_9.Uc_Y_Diaeresis);  
   Latin_9_Diff_Latin_1_Super_Set    :          Character_Set_Type      
      := Str_Maps.To_Set (Span => Latin_9_Diff_Latin_1_Super_Range);  
   Latin_9_Diff_Latin_1_Super_String :          Character_Sequence_Type 
      := Str_Maps.To_Sequence (Latin_9_Diff_Latin_1_Super_Set);  
   Character_Set_Name                :          String                 
      := "Latin_1";  
   ---------------------------------------------   
   procedure Test_Character_Sets (
         Character_Sequence_Var : in     Character_Sequence_Type; 
         Set                    : in     Character_Set_Type       ) is 
      Is_In_Character_Set : Boolean   := False;  
      Char                : Character := 'X';  
      Character_Set_Position : Positive := 164; -- Euro_Sign   
   begin--Test_Character_Sets
      T_Io.Put_Line("Latin_9_Diff is " & Latin_9_Diff_Latin_1_Super_String);
      T_Io.Put_Line("");
      Test_Chars:
         for I in Character_Sequence_Var'range loop
         Char:= Character_Sequence_Var(I);
         -- Test against the Set parameter rather than the global set.
         Is_In_Character_Set:= Str_Maps.Is_In(
            Element => Char,            
            Set     => Set);
         T_Io.Put_Line("The Character " & Char & " is in " & Character_Set_Name
            &  " is " & Boolean'Image (
               Is_In_Character_Set) & ". Its position is "
                  & Positive'Image(Character_Set_Position));
         Character_Set_Position:= Character_Set_Position + 1;
      end loop Test_Chars;
   end Test_Character_Sets;
   ---------------------------------------------     
begin--Char_Sets_Test
   T_Io.Put_Line("----------------------Starting Test---------------------");
   Test_Character_Sets (
      Character_Sequence_Var => Latin_9_Diff_Latin_1_Super_String, 
      Set                    => Latin_1_Char_Set);
   ---------------------------------------------
   T_Io.Put_Line("------------------------Ending Test---------------------");

exception
   when A: Ada.Io_Exceptions.Status_Error =>
      T_Io.Put_Line("Status_Error in Char_Sets_Test.");
      T_Io.Put_Line(Ada.Exceptions.Exception_Information(A));
   when O: others =>
      T_Io.Put_Line("Others_Error in Char_Sets_Test.");
      T_Io.Put_Line(Ada.Exceptions.Exception_Information(O));

end Char_Sets_Test;
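
A possible reading of the TRUE results above, as a minimal sketch
with a hypothetical procedure name: the range Nul .. Lc_Y_Diaeresis
used for Latin_1_Char_Set runs from position 0 to position 255, so
the set contains every value of type Character, and Is_In cannot
return FALSE for any Character.

```ada
with Ada.Text_IO;
with Ada.Strings.Maps;
with Ada.Characters.Latin_1;

procedure Full_Range_Sketch is
   package Maps renames Ada.Strings.Maps;
   use type Maps.Character_Set;  --  makes "=" on sets directly visible

   --  Same range as Latin_1_Range in the program above.
   Latin_1_Set : constant Maps.Character_Set :=
     Maps.To_Set
       (Maps.Character_Range'
          (Low  => Ada.Characters.Latin_1.NUL,
           High => Ada.Characters.Latin_1.LC_Y_Diaeresis));

   --  Every value of the 8-bit type Character.
   All_Characters : constant Maps.Character_Set :=
     Maps.To_Set
       (Maps.Character_Range'
          (Low => Character'First, High => Character'Last));
begin
   Ada.Text_IO.Put_Line
     ("Latin_1 set covers every Character: "
      & Boolean'Image (Latin_1_Set = All_Characters));
end Full_Range_Sketch;
```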



