From: "Robert C. Leif" <rleif@rleif.com>
Subject: RE: Character Sets (plain text police report)
Date: Fri, 29 Nov 2002 12:37:26 -0800
Message-ID: <mailman.1038602282.10532.comp.lang.ada@ada.eu.org>
In-Reply-To: <3DE65BB7.5010505@cogeco.ca>
Oops. My apologies.
Bob Leif
The correct text version is below.
Addendum: The solution is the creation of versions of Ada.Strings.Bounded for 16-bit and 32-bit characters. 32-bit Unicode characters allow characters to be compared directly by their position (code point) in Unicode.
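For the 16-bit case, Ada 95 already provides Ada.Strings.Wide_Bounded, and the positions of its Wide_Character type correspond to the ISO 10646 Basic Multilingual Plane. The minimal sketch below uses it; the package and procedure names are only illustrative. It shows that in this representation the Euro sign (U+20AC) and the currency sign (U+00A4) are distinct values that can be compared by position.

```ada
--  Sketch only: the names Wide_B and Wide_Bounded_Demo are illustrative.
with Ada.Text_IO;
with Ada.Strings.Wide_Bounded;
procedure Wide_Bounded_Demo is
   --  Bounded strings of up to 40 16-bit Wide_Characters.
   package Wide_B is
      new Ada.Strings.Wide_Bounded.Generic_Bounded_Length (Max => 40);
   use Wide_B;

   --  Wide_Character positions are ISO 10646 BMP code points.
   Euro_Sign     : constant Wide_Character := Wide_Character'Val (16#20AC#);
   Currency_Sign : constant Wide_Character := Wide_Character'Val (16#00A4#);

   Price : Bounded_Wide_String := To_Bounded_Wide_String ("Price: ");
begin
   Append (Price, Euro_Sign);
   Ada.Text_IO.Put_Line ("Length:" & Integer'Image (Length (Price)));
   --  Report the code point of the character just appended.
   Ada.Text_IO.Put_Line
     ("Last position:"
      & Integer'Image (Wide_Character'Pos (Element (Price, Length (Price)))));
   Ada.Text_IO.Put_Line
     ("Distinct from Currency_Sign: "
      & Boolean'Image (Euro_Sign /= Currency_Sign));
end Wide_Bounded_Demo;
```

A 32-bit version would apply the same idea to code points beyond the BMP.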
-----Original Message-----
From: comp.lang.ada-admin@ada.eu.org [mailto:comp.lang.ada-admin@ada.eu.org] On Behalf Of Warren W. Gay VE3WWG
Sent: Thursday, November 28, 2002 10:09 AM
To: comp.lang.ada@ada.eu.org
Subject: Re: Character Sets (plain text police report)
Hmmm... I guess since Robert Dewar is avoiding this group these
days, we also lost our "plain text" police force ;-)
In case you were not aware of it, you are posting HTML to this
newsgroup. This is generally discouraged, so that others who
are not using HTML-capable news readers are still able to make
sense of your posting.
--------------------------------------------------------
Christoph Grein responded to my inquiry by stating:
"Latin_9.Euro_Sign is a name for a character. The same character in Latin_1 has a different name, it is the Currency_Sign." "So why do you expect this character not to be in the set only because you use a different name for it?"
According to The ISO 8859 Alphabet Soup, http://czyborra.com/charsets/iso8859.html, the Euro_Sign and the Currency_Sign have different representations.
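Grein's point can be checked directly. Ada.Characters.Latin_9 is supplied by GNAT and is not part of the Ada 95 standard, and both names denote a value of the same 256-value type Character. The minimal sketch below assumes a GNAT compiler; the procedure name is illustrative.

```ada
--  Sketch assuming GNAT, which supplies Ada.Characters.Latin_9.
with Ada.Text_IO;
with Ada.Characters.Latin_1;
with Ada.Characters.Latin_9;
procedure Same_Character_Demo is
   package Latin_1 renames Ada.Characters.Latin_1;
   package Latin_9 renames Ada.Characters.Latin_9;
begin
   --  Both constants are the Character whose position is 164.
   Ada.Text_IO.Put_Line
     ("Euro_Sign position:"
      & Integer'Image (Character'Pos (Latin_9.Euro_Sign)));
   Ada.Text_IO.Put_Line
     ("Currency_Sign position:"
      & Integer'Image (Character'Pos (Latin_1.Currency_Sign)));
   Ada.Text_IO.Put_Line
     ("Equal: " & Boolean'Image (Latin_9.Euro_Sign = Latin_1.Currency_Sign));
end Same_Character_Demo;
```

Because both constants are Character'Val (164), any membership test against a set of Characters treats them as one value. The distinction exists only at the level of glyphs or Unicode code points.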
GNAT's Latin_9 (ISO-8859-15) package includes the following:
------------------------------------------------
-- Summary of Changes from Latin-1 => Latin-9 --
------------------------------------------------
-- 164 Currency => Euro_Sign
-- 166 Broken_Bar => UC_S_Caron
-- 168 Diaeresis => LC_S_Caron
-- 180 Acute => UC_Z_Caron
-- 184 Cedilla => LC_Z_Caron
-- 188 Fraction_One_Quarter => UC_Ligature_OE
-- 189 Fraction_One_Half => LC_Ligature_OE
-- 190 Fraction_Three_Quarters => UC_Y_Diaeresis
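Under Unicode, these eight Latin-9 positions map to code points that differ from their Latin-1 counterparts, and that difference is what makes them distinct characters. The minimal sketch below assumes a hypothetical Code_Point type wide enough for all of Unicode (Ada 95 has no such type) and uses the published ISO-8859-15 to Unicode mapping. The procedure name is illustrative.

```ada
with Ada.Text_IO;
procedure Latin_9_To_Unicode is
   --  Hypothetical code-point type covering all of Unicode; not Ada 95.
   type Code_Point is range 0 .. 16#10FFFF#;
   package CP_IO is new Ada.Text_IO.Integer_IO (Code_Point);

   --  Map a Latin-9 (ISO-8859-15) octet to its Unicode code point.
   --  Latin-1 octets equal their code points, so only the eight
   --  positions in the GNAT summary need special cases.
   function To_Unicode (C : Character) return Code_Point is
      Pos : constant Natural := Character'Pos (C);
   begin
      case Pos is
         when 164    => return 16#20AC#;  -- Euro_Sign
         when 166    => return 16#0160#;  -- UC_S_Caron
         when 168    => return 16#0161#;  -- LC_S_Caron
         when 180    => return 16#017D#;  -- UC_Z_Caron
         when 184    => return 16#017E#;  -- LC_Z_Caron
         when 188    => return 16#0152#;  -- UC_Ligature_OE
         when 189    => return 16#0153#;  -- LC_Ligature_OE
         when 190    => return 16#0178#;  -- UC_Y_Diaeresis
         when others => return Code_Point (Pos);
      end case;
   end To_Unicode;
begin
   --  Print only the positions whose code point differs from Latin-1.
   for Pos in 164 .. 190 loop
      if To_Unicode (Character'Val (Pos)) /= Code_Point (Pos) then
         Ada.Text_IO.Put ("Position" & Integer'Image (Pos) & " => ");
         CP_IO.Put (To_Unicode (Character'Val (Pos)), Width => 0, Base => 16);
         Ada.Text_IO.New_Line;
      end if;
   end loop;
end Latin_9_To_Unicode;
```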
Since these are changes, they should not be the same character. Below are the results of an extension of my original program, which now tests the Latin_9 characters at positions 164 through 190 and prints them out. I understand that the Windows console font changes how they are displayed: the glyphs below are the code page 437 renderings of those positions, not the Latin-1 or Latin-9 ones. The correct glyphs can be found at The ISO 8859 Alphabet Soup. For anyone interested, I have put my program at the end of this note.

I suspect that the best solution would be to introduce Unicode (ISO/IEC 10646) into the Ada standard. The arguments for this are given in W3C Character Model for the World Wide Web 1.0, W3C Working Draft 30 April 2002, http://www.w3.org/TR/charmod/:

"The choice of Unicode was motivated by the fact that Unicode: is the only universal character repertoire available, covers the widest possible range, provides a way of referencing characters independent of the encoding of a resource, is being updated/completed carefully, is widely accepted and implemented by industry."

"W3C adopted Unicode as the document character set for HTML in [HTML 4.0]. The same approach was later used for specifications such as XML 1.0 [XML 1.0] and CSS2 [CSS2]. Unicode now serves as a common reference for W3C specifications and applications."

"The IETF has adopted some policies on the use of character sets on the Internet (see [RFC 2277])."

Bob Leif

------------------------Starting Test-----------------------
Latin_9_Diff is ñÑªº¿⌐¬½¼¡«»░▒▓│┤╡╢╖╕╣║╗╝╜╛
The Character ñ is in Latin_1 is TRUE. Its position is 164
The Character Ñ is in Latin_1 is TRUE. Its position is 165
The Character ª is in Latin_1 is TRUE. Its position is 166
The Character º is in Latin_1 is TRUE. Its position is 167
The Character ¿ is in Latin_1 is TRUE. Its position is 168
The Character ⌐ is in Latin_1 is TRUE. Its position is 169
The Character ¬ is in Latin_1 is TRUE. Its position is 170
The Character ½ is in Latin_1 is TRUE. Its position is 171
The Character ¼ is in Latin_1 is TRUE. Its position is 172
The Character ¡ is in Latin_1 is TRUE. Its position is 173
The Character « is in Latin_1 is TRUE. Its position is 174
The Character » is in Latin_1 is TRUE. Its position is 175
The Character ░ is in Latin_1 is TRUE. Its position is 176
The Character ▒ is in Latin_1 is TRUE. Its position is 177
The Character ▓ is in Latin_1 is TRUE. Its position is 178
The Character │ is in Latin_1 is TRUE. Its position is 179
The Character ┤ is in Latin_1 is TRUE. Its position is 180
The Character ╡ is in Latin_1 is TRUE. Its position is 181
The Character ╢ is in Latin_1 is TRUE. Its position is 182
The Character ╖ is in Latin_1 is TRUE. Its position is 183
The Character ╕ is in Latin_1 is TRUE. Its position is 184
The Character ╣ is in Latin_1 is TRUE. Its position is 185
The Character ║ is in Latin_1 is TRUE. Its position is 186
The Character ╗ is in Latin_1 is TRUE. Its position is 187
The Character ╝ is in Latin_1 is TRUE. Its position is 188
The Character ╜ is in Latin_1 is TRUE. Its position is 189
The Character ╛ is in Latin_1 is TRUE. Its position is 190
------------------------Ending Test-----------------------

--Robert C. Leif, Ph.D & Ada_Med Copyright all rights reserved.
--Main Procedure
--Created 27 November 2002
with Ada.Text_Io;
with Ada.Io_Exceptions;
with Ada.Exceptions;
with Ada.Strings.Maps;
with Ada.Characters.Latin_1;
with Ada.Characters.Latin_9; -- GNAT-supplied; not part of the Ada 95 standard
procedure Char_Sets_Test is
   ------------------Table of Contents-------------
   package T_Io     renames Ada.Text_Io;
   package Str_Maps renames Ada.Strings.Maps;
   package Latin_1  renames Ada.Characters.Latin_1;
   package Latin_9  renames Ada.Characters.Latin_9;
   subtype Character_Set_Type      is Str_Maps.Character_Set;
   subtype Character_Sequence_Type is Str_Maps.Character_Sequence;
   -----------------End Table of Contents-------------

   -- Ada.Characters.Latin_1 is standard Ada 95. This range runs from Nul
   -- (position 0) to Lc_Y_Diaeresis (position 255), so it covers every
   -- value of type Character.
   Latin_1_Range : constant Str_Maps.Character_Range
      := (Low => Latin_1.Nul, High => Latin_1.Lc_Y_Diaeresis);
   Latin_1_Char_Set : constant Character_Set_Type
      := Str_Maps.To_Set (Span => Latin_1_Range);

   -- Latin_9 differences: Euro_Sign, Uc_S_Caron, Lc_S_Caron, Uc_Z_Caron,
   -- Lc_Z_Caron, Uc_Ligature_Oe, Lc_Ligature_Oe, Uc_Y_Diaeresis.
   -- The range below (positions 164 .. 190) is a superset of those eight.
   Latin_9_Diff_Latin_1_Super_Range : constant Str_Maps.Character_Range
      := (Low => Latin_9.Euro_Sign, High => Latin_9.Uc_Y_Diaeresis);
   Latin_9_Diff_Latin_1_Super_Set : constant Character_Set_Type
      := Str_Maps.To_Set (Span => Latin_9_Diff_Latin_1_Super_Range);
   Latin_9_Diff_Latin_1_Super_String : constant Character_Sequence_Type
      := Str_Maps.To_Sequence (Latin_9_Diff_Latin_1_Super_Set);
   Character_Set_Name : constant String := "Latin_1";
   ---------------------------------------------
   -- Print the sequence, then report for each character whether it is in
   -- Set and what its position (Character'Pos) is.
   procedure Test_Character_Sets (
         Character_Sequence_Var : in Character_Sequence_Type;
         Set                    : in Character_Set_Type ) is
      Is_In_Character_Set : Boolean   := False;
      Char                : Character := 'X';
   begin -- Test_Character_Sets
      T_Io.Put_Line ("Latin_9_Diff is " & Character_Sequence_Var);
      T_Io.Put_Line ("");
      Test_Chars:
      for I in Character_Sequence_Var'Range loop
         Char := Character_Sequence_Var (I);
         -- Test membership in the set passed by the caller.
         Is_In_Character_Set := Str_Maps.Is_In (
               Element => Char,
               Set     => Set);
         T_Io.Put_Line ("The Character " & Char & " is in " & Character_Set_Name
            & " is " & Boolean'Image (Is_In_Character_Set)
            & ". Its position is " & Natural'Image (Character'Pos (Char)));
      end loop Test_Chars;
   end Test_Character_Sets;
   ---------------------------------------------
begin -- Char_Sets_Test
   T_Io.Put_Line ("----------------------Starting Test---------------------");
   Test_Character_Sets (
      Character_Sequence_Var => Latin_9_Diff_Latin_1_Super_String,
      Set                    => Latin_1_Char_Set);
   ---------------------------------------------
   T_Io.Put_Line ("------------------------Ending Test---------------------");
exception
   when A : Ada.Io_Exceptions.Status_Error =>
      T_Io.Put_Line ("Status_Error in Char_Sets_Test.");
      T_Io.Put_Line (Ada.Exceptions.Exception_Information (A));
   when O : others =>
      T_Io.Put_Line ("Others_Error in Char_Sets_Test.");
      T_Io.Put_Line (Ada.Exceptions.Exception_Information (O));
end Char_Sets_Test;
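The program above tests membership in Latin_1_Char_Set. That range spans all 256 values of Character, so every Is_In call returns TRUE. To flag only the positions where Latin-9 actually reassigns a Latin-1 glyph, the set must contain just those positions. The minimal sketch below assumes GNAT's Ada.Characters.Latin_9 and uses the same Ada.Strings.Maps calls as the program. The procedure name is illustrative.

```ada
--  Sketch assuming GNAT, which supplies Ada.Characters.Latin_9.
with Ada.Text_IO;
with Ada.Strings.Maps;
with Ada.Characters.Latin_9;
procedure Changed_Positions_Demo is
   package Str_Maps renames Ada.Strings.Maps;
   package Latin_9  renames Ada.Characters.Latin_9;

   --  Exactly the eight positions where Latin-9 reassigns a Latin-1 glyph.
   Changed : constant Str_Maps.Character_Set :=
     Str_Maps.To_Set
       (Str_Maps.Character_Sequence'
          (Latin_9.Euro_Sign, Latin_9.UC_S_Caron, Latin_9.LC_S_Caron,
           Latin_9.UC_Z_Caron, Latin_9.LC_Z_Caron, Latin_9.UC_Ligature_OE,
           Latin_9.LC_Ligature_OE, Latin_9.UC_Y_Diaeresis));
begin
   --  Walk the same 164 .. 190 range as Char_Sets_Test.
   for Pos in 164 .. 190 loop
      if Str_Maps.Is_In (Character'Val (Pos), Changed) then
         Ada.Text_IO.Put_Line ("Position" & Integer'Image (Pos) & " differs");
      end if;
   end loop;
end Changed_Positions_Demo;
```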
Thread overview: 27+ messages
2002-11-28 17:53 Character Sets Robert C. Leif
2002-11-28 18:08 ` Character Sets (plain text police report) Warren W. Gay VE3WWG
2002-11-28 18:11 ` Warren W. Gay VE3WWG
2002-11-29 11:12 ` Lutz Donnerhacke
2002-11-29 14:58 ` Frank J. Lhota
2002-11-29 20:37 ` Robert C. Leif [this message]
2002-11-30 14:49 ` Marin David Condic
2002-12-01 11:28 ` Jacob Sparre Andersen
2002-12-01 14:38 ` Marin David Condic
2002-12-01 20:25 ` Jacob Sparre Andersen
2002-12-02 9:43 ` Preben Randhol
2002-12-02 13:26 ` Marin David Condic
2002-12-02 6:44 ` Robert C. Leif
2002-12-02 9:41 ` Preben Randhol
2002-12-02 16:58 ` Charles Lindsey
2002-12-02 19:29 ` A suggestion, completely unrelated to the original topic Wes Groleau
2002-12-02 23:21 ` David C. Hoos, Sr.
2002-11-29 12:28 ` Character Sets Georg Bauhaus
2002-12-02 18:28 ` Stephen Leake
2002-12-03 2:45 ` Robert C. Leif
2002-12-03 13:33 ` Robert A Duff
2002-12-03 15:32 ` Juanma Barranquero
2002-12-04 0:49 ` Robert C. Leif
2002-12-14 3:27 ` David Starner
2002-12-14 22:53 ` Vadim Godunko
2002-12-15 3:46 ` David Starner
2002-12-15 23:26 ` Robert C. Leif