comp.lang.ada
* Re: windows-1251 to utf-8
  @ 2018-10-31 20:58  5%     ` Randy Brukardt
  0 siblings, 0 replies; 16+ results
From: Randy Brukardt @ 2018-10-31 20:58 UTC (permalink / raw)


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message 
news:prcn4v$d30$1@gioia.aioe.org...
> On 2018-10-31 16:28, eduardsapotski@gmail.com wrote:
>> Let's make it easier. For example:
>>
>> ------------------------------------------------------------------
>>
>> with Ada.Strings.Unbounded;     use Ada.Strings.Unbounded;
>> with Ada.Text_IO.Unbounded_IO;  use Ada.Text_IO.Unbounded_IO;
>>
>> with AWS.Client;            use AWS.Client;
>> with AWS.Messages;          use AWS.Messages;
>> with AWS.Response;          use AWS.Response;
>>
>> procedure Main is
>>
>>     HTML_Result   : Unbounded_String;
>>     Request_Header_List : Header_List;
>>
>> begin
>>
>>     Request_Header_List.Add(Name => "User-Agent", Value => "Mozilla/5.0 
>> (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0");
>>
>>     HTML_Result := Message_Body(Get(URL => "http://www.sql.ru/", Headers 
>> => Request_Header_List));
>>
>>     Put_Line(HTML_Result);
>>
>> end Main;
>>
>> ------------------------------------------------------------------
>>
>> My linux terminal (default UTF-8) show: 
>> https://photos.app.goo.gl/EPgwKoiFSuwkJvgSA
>>
>> If set encoding in terminal Windows-1251 - all is well: 
>> https://photos.app.goo.gl/goN5g7uofD8rYLP79
>>
>> Are there standard ways to solve this problem?
>
> What problem? The page uses the content charset=windows-1251. It is legal.
>
> Your program is illegal as it prints the body using Put_Line. Ada standard 
> requires Character be Latin-1. The only case when your program would be 
> correct is when charset=ISO-8859-1.
>
> You must convert the page body according to the encoding specified by the 
> charset key into a string containing UTF-8 octets and use 
> Streams.Stream_IO to write these octets as-is. The conversion for the case 
> of windows-1251 I described earlier. Create a table Character'Pos 
> 0..255 -> Code_Point and use it for each "character" of HTML_Result.
>
> P.S. GNAT Text_IO ignores Latin-1, but that is between GNAT and the 
> underlying OS.
>
> P.P.S. Technically AWS also ignores Ada standard. But that is an 
> established practice. Since there is no better way.

Right. Probably the easiest way to do this (using just Ada functions) would 
be to:

 (A) Use Ada.Characters.Conversions.To_Wide_String to convert the To_String 
of the unbounded string to a Wide_String, and then store that in an 
Unbounded_Wide_String (from Ada.Strings.Wide_Unbounded);
 (B) Use Ada.Strings.Wide_Maps to create a character conversion map (the 
conversions were described by another reply);
 (C) Use Ada.Strings.Wide_Unbounded.Translate to apply the mapping from (B) 
to your Unbounded_Wide_String;
 (D) Use Ada.Strings.UTF_Encoding.Wide_Strings.Encode to convert the 
To_Wide_String of your translated Unbounded_Wide_String to UTF-8, presumably 
storing the result into an Unbounded_String.
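
The four steps above can be sketched roughly as follows. This is only a 
minimal illustration, not a complete solution: the names Recode_Sketch, 
CP1251_Mapping and Raw are made up for the example, the table covers only 
the 64 letters that windows-1251 places at 16#C0# .. 16#FF# (U+0410 .. 
U+044F), and the final Put_Line relies on the compiler passing the UTF-8 
octets through unchanged, which is the GNAT behaviour mentioned in the 
quoted P.S. but is not guaranteed by the standard.

```ada
with Ada.Characters.Conversions;
with Ada.Strings.Wide_Maps;
with Ada.Strings.Wide_Unbounded;
with Ada.Strings.UTF_Encoding.Wide_Strings;
with Ada.Text_IO;

procedure Recode_Sketch is
   use Ada.Strings.Wide_Maps;
   use Ada.Strings.Wide_Unbounded;

   --  (B) Partial conversion map (an assumption for this sketch):
   --  windows-1251 16#C0# .. 16#FF# correspond to U+0410 .. U+044F.
   --  A complete table must also cover 16#80# .. 16#BF#.
   function CP1251_Mapping return Wide_Character_Mapping is
      Source_Chars : Wide_String (1 .. 64);
      Target_Chars : Wide_String (1 .. 64);
   begin
      for I in Source_Chars'Range loop
         Source_Chars (I) := Wide_Character'Val (16#C0# - 1 + I);
         Target_Chars (I) := Wide_Character'Val (16#0410# - 1 + I);
      end loop;
      return To_Mapping (Source_Chars, Target_Chars);
   end CP1251_Mapping;

   --  Two windows-1251 octets standing in for the fetched page body.
   Raw : constant String :=
     Character'Val (16#CF#) & Character'Val (16#F0#);

   --  (A) Widen each octet to the Wide_Character at the same position.
   Text : Unbounded_Wide_String :=
     To_Unbounded_Wide_String
       (Ada.Characters.Conversions.To_Wide_String (Raw));
begin
   --  (C) Replace every mapped octet by its Unicode code point.
   Translate (Text, CP1251_Mapping);

   --  (D) Encode as UTF-8 and write the result.
   Ada.Text_IO.Put_Line
     (Ada.Strings.UTF_Encoding.Wide_Strings.Encode (To_Wide_String (Text)));
end Recode_Sketch;
```

Characters that are not in the map are left unchanged by Translate, so 
plain ASCIItext in the page body survives this step as it is.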

You potentially could skip (D) if Wide_Text_IO works when sent to 
Standard_Output (I'd expect that on Windows, no idea on Linux). In that 
case, use Wide_Text_IO.Put to send your result.

In any case, this shows why Unicode exists, and why anything these days that 
uses non-standard encodings is evil. There's really no short-cut to recoding 
such things, and that makes them maddening.

                                  Randy.





^ permalink raw reply	[relevance 5%]

* Re: Ada.Strings.Unbounded vs Ada.Containers.Indefinite_Holders
  @ 2017-09-23  9:16  5%           ` Jeffrey R. Carter
  0 siblings, 0 replies; 16+ results
From: Jeffrey R. Carter @ 2017-09-23  9:16 UTC (permalink / raw)


On 09/23/2017 10:09 AM, Dmitry A. Kazakov wrote:
> On 2017-09-23 00:15, Victor Porton wrote:
>>
>> In my opinion, it would be better to change RM phrasing from "null string"
>> to "empty string", because in some other languages (notably C) NULL means
>> something other. It is just confusing.
> 
> The adjective null and the noun null are distinct parts of speech. C's noun null 
> is an abbreviation of null pointer. If pointers can be null so strings can.

Another way to look at it: Ada has the formal concepts of:

* null access value ARM 4.2(9)
* null array 3.6.1(7)
* null constraint 3.2(7/2)
* null_exclusion 3.10(5.1/2)
* null extension 3.9.1(4.1/2)
* null procedure 6.7(3/3)
* null range 3.5(4)
* null record 3.8(15)
* null slice 4.1.2(7)
* null string literal 2.6(6)
* null value (of an access type) 3.10(13/2)
* null_statement 5.1(6)

not to mention the language-defined identifiers

Null_Address
    in System   13.7(12)
Null_Bounded_String
    in Ada.Strings.Bounded   A.4.4(7)
Null_Id
    in Ada.Exceptions   11.4.1(2/2)
Null_Occurrence
    in Ada.Exceptions   11.4.1(3/2)
Null_Ptr
    in Interfaces.C.Strings   B.3.1(7)
Null_Set
    in Ada.Strings.Maps   A.4.2(5)
    in Ada.Strings.Wide_Maps   A.4.7(5)
    in Ada.Strings.Wide_Wide_Maps   A.4.8(5/2)
Null_Task_Id
    in Ada.Task_Identification   C.7.1(2/2)
Null_Unbounded_String
    in Ada.Strings.Unbounded   A.4.5(5)

(Just look under N in the index.)

It's called overloading. Many of these cases refer to things that can have 
components and mean one with zero components: a null record has no components, a 
null array has no components ('Length = 0), a null string literal has no 
characters, a null set has no members, ... It should not be confusing.
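
A few of these "zero components" cases can be put side by side in a small 
sketch. The type and object names (Int_Array, Empty, Nothing, Blank) are 
invented for the illustration; only the language constructs themselves 
come from the list above.

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;           use Ada.Text_IO;

procedure Null_Sketch is
   type Int_Array is array (Positive range <>) of Integer;

   --  null record: a record type with no components
   type Empty is null record;

   --  null array: the index range 1 .. 0 contains no values
   Nothing : constant Int_Array (1 .. 0) := (others => 0);

   --  null string literal: a string with no characters
   Blank : constant String := "";

   Unused : constant Empty := (null record);
begin
   Put_Line (Integer'Image (Nothing'Length));
   Put_Line (Integer'Image (Blank'Length));
   Put_Line (Integer'Image (Length (Null_Unbounded_String)));

   --  null range: the loop body is never executed
   for I in 1 .. 0 loop
      Put_Line ("not reached");
   end loop;

   --  A null record still has a 'Size attribute; its value is left to
   --  the implementation.
   Put_Line (Integer'Image (Unused'Size));
end Null_Sketch;
```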

-- 
Jeff Carter
"You cheesy lot of second-hand electric donkey-bottom biters."
Monty Python & the Holy Grail
14

^ permalink raw reply	[relevance 5%]

* Re: Implementing character sets for Wide_Character
  2015-03-06 21:06  5%   ` Martin Trenkmann
@ 2015-03-06 22:21  0%     ` Bob Duff
  0 siblings, 0 replies; 16+ results
From: Bob Duff @ 2015-03-06 22:21 UTC (permalink / raw)


Martin Trenkmann <martin.trenkmann@posteo.de> writes:

> Thanks for pointing out that both methods may consume the same amount of
> memory. So at the end it's up to the compiler and caching effects which version
> runs faster. 

Compilers can do all sorts of things, which is one reason why it's wise
to measure.

But I doubt any Ada compiler would generate a jump table for the case
statement you showed in your OP.  That would be a 2**16-entry table with
almost all of the entries duplicated.  I'd expect something more like a
subtract, a compare, and a conditional jump.  And the function should be
inlined.

> ...At the meantime, however, I was made aware of package
> Ada.Strings.Wide_Maps that I will use now.

And keep using it, unless you discover efficiency problems.  ;-)

- Bob

^ permalink raw reply	[relevance 0%]

* Re: Implementing character sets for Wide_Character
  @ 2015-03-06 21:06  5%   ` Martin Trenkmann
  2015-03-06 22:21  0%     ` Bob Duff
  0 siblings, 1 reply; 16+ results
From: Martin Trenkmann @ 2015-03-06 21:06 UTC (permalink / raw)


>> I know that using an array will consume more memory, but with the Pack aspect it should only be 8 KB - please correct me if I am wrong. The function approach is more memory friendly, but might be a bit slower as an array lookup.
> 
> The function's switch may well be translated into a table,
> so memory consumption may be the same, albeit in a different
> section of the executable. It might be even more then, since
> the packing does not occur. If so, it may be possible to
> prevent this by using IF.
> 
> Like the translation of the switch, speed of lookup
> requires experimentation, I think, that's usually been
> the only reliable predictor given an implementation.

Thanks for pointing out that both methods may consume the same amount of memory. So in the end it's up to the compiler and caching effects which version runs faster. In the meantime, however, I was made aware of package Ada.Strings.Wide_Maps, which I will use now.
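
For comparison, the packed-array variant discussed above might look like 
the sketch below. The set contents (U+3040 .. U+309F) and the names 
Bit_Map, Members and Contains are placeholders chosen for the example, 
and whether Pack really yields one bit per element (2**16 bits = 8 KB) is 
up to the compiler, so the program prints the size instead of assuming it.

```ada
with Ada.Text_IO;

procedure Bit_Map_Sketch is
   --  One Boolean per Wide_Character value; Pack requests one bit each.
   type Bit_Map is array (Wide_Character) of Boolean with Pack;

   --  Placeholder membership table (hypothetical set contents).
   Members : constant Bit_Map :=
     (Wide_Character'Val (16#3040#) .. Wide_Character'Val (16#309F#) => True,
      others => False);

   --  Containment check is a single array lookup.
   function Contains (C : Wide_Character) return Boolean is (Members (C));
begin
   Ada.Text_IO.Put_Line ("Bits used:" & Integer'Image (Bit_Map'Size));
   Ada.Text_IO.Put_Line
     (Boolean'Image (Contains (Wide_Character'Val (16#3042#))));
   Ada.Text_IO.Put_Line (Boolean'Image (Contains ('A')));
end Bit_Map_Sketch;
```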

- Martin



^ permalink raw reply	[relevance 5%]

* Re: Implementing character sets for Wide_Character
  2015-03-06 18:15  5% ` Bob Duff
@ 2015-03-06 21:02  0%   ` Martin Trenkmann
  0 siblings, 0 replies; 16+ results
From: Martin Trenkmann @ 2015-03-06 21:02 UTC (permalink / raw)


>> I need to implement a containment check for Wide_Character in a wide
>> character set. I have two approaches in mind.
> 
> If you really care about speed, then implement both and measure the
> speed.  It's not clear that the bit map will be faster -- using more
> memory harms cache behavior.

Yes that's true.

> Also take a look at package Ada.Strings.Wide_Maps.
> It contains type Wide_Character_Set, represented as a sorted
> sequence of ranges.  Measure that one, too.

Don't know why I overlooked that package. I don't intend to reinvent the wheel - thanks for the hint.

> Why Wide_Character rather than Wide_Wide_Character?

Because the pretty old character set I have to deal with assigns Japanese characters to 16-bit values.

- Martin

^ permalink raw reply	[relevance 0%]

* Re: Implementing character sets for Wide_Character
  @ 2015-03-06 18:15  5% ` Bob Duff
  2015-03-06 21:02  0%   ` Martin Trenkmann
    1 sibling, 1 reply; 16+ results
From: Bob Duff @ 2015-03-06 18:15 UTC (permalink / raw)


Martin Trenkmann <martin.trenkmann@posteo.de> writes:

> I need to implement a containment check for Wide_Character in a wide
> character set. I have two approaches in mind.

If you really care about speed, then implement both and measure the
speed.  It's not clear that the bit map will be faster -- using more
memory harms cache behavior.

Also take a look at package Ada.Strings.Wide_Maps.
It contains type Wide_Character_Set, represented as a sorted
sequence of ranges.  Measure that one, too.
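
A minimal sketch of the Wide_Maps variant, assuming for illustration that 
the set consists of the two Unicode kana blocks (the original poster's 
character set is not specified, so these ranges and the name Kana are 
hypothetical):

```ada
with Ada.Strings.Wide_Maps; use Ada.Strings.Wide_Maps;
with Ada.Text_IO;

procedure Set_Sketch is
   --  Build the set from ranges; the values are placeholders.
   Kana : constant Wide_Character_Set :=
     To_Set
       (Wide_Character_Ranges'
          ((Low  => Wide_Character'Val (16#3040#),
            High => Wide_Character'Val (16#309F#)),
           (Low  => Wide_Character'Val (16#30A0#),
            High => Wide_Character'Val (16#30FF#))));
begin
   --  Is_In is the containment check provided by the package.
   Ada.Text_IO.Put_Line
     (Boolean'Image (Is_In (Wide_Character'Val (16#3042#), Kana)));
   Ada.Text_IO.Put_Line (Boolean'Image (Is_In ('A', Kana)));
end Set_Sketch;
```

Wrapping Is_In in a function with the same profile as the hand-written 
versions would make it easy to time all three against each other.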

Another possibility is an array of pointers to bit maps,
with "null" in place of "all-False".

It all depends on what kinds of sets you're going to create,
and what operations you need to do.

Why Wide_Character rather than Wide_Wide_Character?

- Bob


^ permalink raw reply	[relevance 5%]

* Re: Q: Localizing type and package references
  2014-01-06  1:29  0% ` Jeffrey Carter
@ 2014-01-06  8:05  0%   ` Simon Wright
  0 siblings, 0 replies; 16+ results
From: Simon Wright @ 2014-01-06  8:05 UTC (permalink / raw)


Jeffrey Carter <spam.jrcarter.not@spam.not.acm.org> writes:

> On 01/05/2014 04:55 PM, b.mcguinness747@gmail.com wrote:
>>
>> --------------------------------------------------------------------------------
>> -- Types - Declarations of data types and related packages
>> --------------------------------------------------------------------------------
>> with Ada.Characters;
>> with Ada.Characters.Wide_Latin_1;
>> with Ada.Strings;
>> with Ada.Strings.Wide_Maps;
>> with Ada.Strings.Wide_Unbounded;
>> with Ada.Wide_Characters;
>> with Ada.Wide_Characters.Handling;
>> with Ada.Wide_Text_IO;
>> with Ada.Wide_Text_IO.Text_Streams;
>>
>> package Types is
>>    package Chars         renames Ada.Characters.Wide_Latin_1;
>>    package Char_Handling renames Ada.Wide_Characters.Handling;
>>    package Char_IO       renames Ada.Wide_Text_IO;
>>    package Char_Maps     renames Ada.Strings.Wide_Maps;
>>    package Char_Streams  renames Ada.Wide_Text_IO.Text_Streams;
>>    package Char_Strings  renames Ada.Strings.Wide_Unbounded;
>>
>>    subtype Char        is Wide_Character;
>>    subtype Char_String is Ada.Strings.Wide_Unbounded.Unbounded_Wide_String;
>> end Types;
>>
>> and then tried referencing this from the main program file with:
>>
>> with Types;
>> use  Types;
>>
>> with Char_Strings;
>>
>> but the compiler (Gnat 4.6) complains that there is no file called
>> char_strings.ads.  I am not sure if I have made a simple mistake that
>> can be easily corrected to make this work, or if there is a different
>> approach that I should be trying.
>
> You can only with a library-level package, one not declared in
> anything else. Char_Strings is declared in package types, so it's not
> library level and can't be withed.

So, you can either redeclare package Chars etc at library level: for
example, in char_strings.ads,

   with Ada.Strings.Wide_Unbounded;
   package Char_Strings  renames Ada.Strings.Wide_Unbounded;

or you can change your main program file to say

   with Types;
   use Types;
   use Types.Chars;
   use Types.Char_Handling;
   ...

or perhaps

   with Types;
   use Types;
   procedure Main is
      use Chars;
      use Char_Handling;
      ...

^ permalink raw reply	[relevance 0%]

* Re: Q: Localizing type and package references
  2014-01-05 23:55  6% Q: Localizing type and package references b.mcguinness747
@ 2014-01-06  1:29  0% ` Jeffrey Carter
  2014-01-06  8:05  0%   ` Simon Wright
  0 siblings, 1 reply; 16+ results
From: Jeffrey Carter @ 2014-01-06  1:29 UTC (permalink / raw)


On 01/05/2014 04:55 PM, b.mcguinness747@gmail.com wrote:
>
> --------------------------------------------------------------------------------
> -- Types - Declarations of data types and related packages
> --------------------------------------------------------------------------------
> with Ada.Characters;
> with Ada.Characters.Wide_Latin_1;
> with Ada.Strings;
> with Ada.Strings.Wide_Maps;
> with Ada.Strings.Wide_Unbounded;
> with Ada.Wide_Characters;
> with Ada.Wide_Characters.Handling;
> with Ada.Wide_Text_IO;
> with Ada.Wide_Text_IO.Text_Streams;
>
> package Types is
>    package Chars         renames Ada.Characters.Wide_Latin_1;
>    package Char_Handling renames Ada.Wide_Characters.Handling;
>    package Char_IO       renames Ada.Wide_Text_IO;
>    package Char_Maps     renames Ada.Strings.Wide_Maps;
>    package Char_Streams  renames Ada.Wide_Text_IO.Text_Streams;
>    package Char_Strings  renames Ada.Strings.Wide_Unbounded;
>
>    subtype Char        is Wide_Character;
>    subtype Char_String is Ada.Strings.Wide_Unbounded.Unbounded_Wide_String;
> end Types;
>
> and then tried referencing this from the main program file with:
>
> with Types;
> use  Types;
>
> with Char_Strings;
>
> but the compiler (Gnat 4.6) complains that there is no file called
> char_strings.ads.  I am not sure if I have made a simple mistake that
> can be easily corrected to make this work, or if there is a different
> approach that I should be trying.

You can only with a library-level package, one not declared in anything else. 
Char_Strings is declared in package types, so it's not library level and can't 
be withed.

-- 
Jeff Carter
"Spam! Spam! Spam! Spam! Spam! Spam! Spam! Spam!"
Monty Python's Flying Circus
53


^ permalink raw reply	[relevance 0%]

* Q: Localizing type and package references
@ 2014-01-05 23:55  6% b.mcguinness747
  2014-01-06  1:29  0% ` Jeffrey Carter
  0 siblings, 1 reply; 16+ results
From: b.mcguinness747 @ 2014-01-05 23:55 UTC (permalink / raw)


I want to write an Ada program using the Wide_Character type, but I might
want to move to Wide_Wide_Character later on.  So I want to localize all
references to Wide_Character and the associated standard Ada packages to
a single file that I can easily update.  If I was working in C++, I would
use typedefs to create pseudonyms and put these in a header file that I could
#include from various source files.  So I have tried to do something similar
in Ada.  I created the file types.ads:

--------------------------------------------------------------------------------
-- Types - Declarations of data types and related packages
--------------------------------------------------------------------------------
with Ada.Characters;
with Ada.Characters.Wide_Latin_1;
with Ada.Strings;
with Ada.Strings.Wide_Maps;
with Ada.Strings.Wide_Unbounded;
with Ada.Wide_Characters;
with Ada.Wide_Characters.Handling;
with Ada.Wide_Text_IO;
with Ada.Wide_Text_IO.Text_Streams;

package Types is
  package Chars         renames Ada.Characters.Wide_Latin_1;
  package Char_Handling renames Ada.Wide_Characters.Handling;
  package Char_IO       renames Ada.Wide_Text_IO;
  package Char_Maps     renames Ada.Strings.Wide_Maps;
  package Char_Streams  renames Ada.Wide_Text_IO.Text_Streams;
  package Char_Strings  renames Ada.Strings.Wide_Unbounded;

  subtype Char        is Wide_Character;
  subtype Char_String is Ada.Strings.Wide_Unbounded.Unbounded_Wide_String;
end Types;


and then tried referencing this from the main program file with:


with Types;
use  Types;

with Char_Strings;


but the compiler (Gnat 4.6) complains that there is no file called
char_strings.ads.  I am not sure if I have made a simple mistake that
can be easily corrected to make this work, or if there is a different
approach that I should be trying.

Help would be appreciated.

Thanks.

--- Brian


^ permalink raw reply	[relevance 6%]

* Re: Range types
  2007-10-23 23:52  0%         ` anon
@ 2007-10-24 12:57  0%           ` Christos Chryssochoidis
  0 siblings, 0 replies; 16+ results
From: Christos Chryssochoidis @ 2007-10-24 12:57 UTC (permalink / raw)


Thanks very much, for giving me in such detail the solution, anon!

Regards,

Christos Chryssochoidis

anon wrote:
> --
> -- Package example:
> --
> -- For non Greek keyboards use Wide_Character 
> --  { ["<xxxx>"]["<yyyy>"] } 
> --  where <xxxx> and <yyyy> are 4-hex-digits to represents
> --  the two Wide_Character values 
> --
> --  Example is 
> --    ["03d6"]["1eee"]  -- valid Greek string
> --    ["03d6"]h -- is not valid because character "h" is not
> --                 a valid Greek character.
> --
> 
> 
> with Ada.Wide_Text_IO ;
> use  Ada.Wide_Text_IO ;
> 
> procedure tst is
> 
>   --
>   -- Internal Packages:
>   --
>   package Greek is
> 
> 
>     function Is_Greek_Character ( GC : Wide_Character ) 
>                                 return Boolean ;
> 
> 
>     function Is_Greek_Character_2 ( GC : Wide_Character ) 
>                                 return Boolean ;
> 
>     function Is_Greek_Character ( GS : Wide_String ) 
>                                 return Boolean ;
> 
>   end Greek ;
> 
>   --
>   -- Internal Body Package 
>   --
>   package body Greek is
> 
>     -- --------------------------- --
>     -- Use for Is_Greek_Character  --
>     -- --------------------------- --
> 
>     --
>     -- creates a greek constraint type
>     --
>     subtype Greek_Base is Wide_Character 
>                                 range Wide_Character'Val ( 16#370# )
>                                    .. Wide_Character'Val ( 16#1FFF# ) ;
> 
>     --
>     -- creates an excluded type
>     --
>     subtype Greek_Exclude_Subtype is Greek_Base
>                                 range Greek_Base'Val ( 16#03D8# )
>                                    .. Greek_Base'Val ( 16#0FFF# ) ;
> 
>     -- ----------------------------- --
>     -- Use for Is_Greek_Character_2  --
>     -- ----------------------------- --
> 
>     --
>     -- create lower greek characters type
>     --
>     subtype Lower_Greek_Character is Wide_Character 
>                                 range Wide_Character'Val ( 16#0370# )
>                                    .. Wide_Character'Val ( 16#03D7# ) ;
>     --
>     -- create upper greek characters type
>     --
>     subtype Upper_Greek_Character is Wide_Character 
>                                range Wide_Character 'Val ( 16#1000# )  
>                                   .. Wide_Character 'Val ( 16#1FFF# ) ;
> 
> 
> 
> 
>   --
>   -- Is_Greek_Character 
>   --
>   function Is_Greek_Character ( GC : Wide_Character ) 
>                               return Boolean is
> 
>     begin
>       --
>       -- Is character within the Greek base
>       --
>       if GC in Greek_Base then
>         --
>         -- Is character apart of the the non-Greek sub type
>         --
>         if GC in Greek_Exclude_Subtype then    
>           return False ;
>         else
>           return True ;
>         end if ;
>       else
>         return False ;
>       end if ;
>     end ;
> 
> 
>   --
>   -- Is_Greek_Character version number 2
>   --
>   function Is_Greek_Character_2 ( GC : Wide_Character ) 
>                               return Boolean is
> 
>     begin
>       --
>       -- Could use:
>       -- 
>       -- when Lower_Greek_Character | Upper_Greek_Character =>
>       --    return True ;
>       --
>       case GC is 
>          when Lower_Greek_Character =>
>             return True ;
>          when Upper_Greek_Character =>
>             return True ;
>          when others =>
>             return False ;
>       end case ;
>     end ;
> 
> 
>   function Is_Greek_Character ( GS : Wide_String ) 
>                               return Boolean is
> 
>     begin
>       --
>       -- Could use:
>       -- 
>       --
>       for Index in 1 .. GS'Length loop
>         --
>         -- if index-character of a string is not a Greek character
>         --
>         if not Is_Greek_Character_2 ( GS ( Index ) ) then 
>           return False ;
>         end if ;
>       end loop ;
>       --
>       -- String contains all Greek characters
>       --
>       return True ;
>     end ;
> 
>   end Greek ;
> 
> 
>   stz : wide_string ( 1..2 ) ;
> 
>   use Greek ;
> 
> begin
> --
>   put ( "Enter (2 character Greek string) => " ) ;
>   get ( stz ) ;
> --
>   put ( "Testing => " ) ;
>   put ( stz ) ;
>   new_line ;  
> --
>   if Is_Greek_Character ( stz ) then
>     put_line ( "Greek String ? => Yes" ) ;
>   else
>     put_line ( "Greek String ? => No" ) ;
>     --
>     -- Char 1 ?
>     --
>     if Is_Greek_Character ( stz ( 1 ) ) then
>       put_line ( "Character (1) Greek ? => Yes" ) ;
>     else
>       put_line ( "Character (1) Greek ? => No" ) ;
>     end if ;    
> 
>     --
>     -- Char 2 ?
>     --
>     if Is_Greek_Character_2 ( stz ( 2 ) ) then
>       put_line ( "Character (2) Greek ? => Yes" ) ;
>     else
>       put_line ( "Character (1) Greek ? => No" ) ;
>     end if ;
>   end if ;    
> --
> end tst ;
> 
> 
> 
> In <1193051690.350063@athprx04>, Christos Chryssochoidis <C.Chryssochoidis@gmail.com> writes:
>> Jacob Sparre Andersen wrote:
>>> Christos Chryssochoidis wrote:
>>>
>>>> I would like to define a subtype of Wide_Character for a program
>>>> that processes (unicode) text. This type would represent the Greek
>>>> letters.
>>> This sounds like what enumerated types are for.  You could do it like
>>> this:
>>>
>>>    type Faroese_Letter is ('a', 'A', 'b', 'B', 'd', 'D', 'ð', 'Ð',
>>>                            'e', 'E', [...],
>>>                            'y', 'Y', 'ý', 'Ý', 'æ', 'Æ', 'ø', 'Ø');
>>>    -- optional representation clause
>>>
>>>    function To_Wide_Wide_Character (Item : in Faroese_Letter)
>>>      return Wide_Wide_Character;
>>>
>>>    function To_Faroese_Letter (Item : in Wide_Wide_Character)
>>>      return Faroese_Letter;
>>>
>>> The conversion functions could make use of representation clauses,
>>> "Image" and "Value" functions, or tables.
>>>
>>>> Greek letters in Unicode, with all their diacritics, are
>>>> located in two separate ranges: 0370 - 03D7 and 1F00 - 1FFF. That's
>>>> 360 characters to write in an enumeration... Since gaps are not
>>>> allowed in ranges, I 'm thinking instead of defining such a type, to
>>>> define a function that would accept a Wide_Character as argument and
>>>> return a boolean value indicating whether the given Wide_Character
>>>> falls in the ranges of the Greek characters.
>>> This could be done very simply using Ada.Strings.Maps.
>>>
>>> How you should do it depends strongly on what you actually need your
>>> Greek_Letter type for.
>>>
>>> Greetings,
>>>
>>> Jacob
>> Thanks! Ada.Strings.Wide_Maps seems very helpful for what I want to do. 
>> Basically, what I would like to do is to write a program that given a 
>> text file in utf8 encoding, which would contain ancient greek text, 
>> which is written with all the diacritic marks on the letters, this 
>> program would load the contents of the file in memory, strip the 
>> in-memory text contents from all the diacritics except those used in 
>> today's "modern" Greek, and write the modified contents to a new file of 
>> the user's choosing. For this it would be nice if there were some 
>> package for regular expressions for Ada. Then if I succeeded in the 
>> mentioned task,  I 'd like to do some natural language processing (NLP, 
>> that is linguistics processing) with my program, but I don't know if Ada 
>> would be an appropriate language for such a task (NLP). I've seen on the 
>> web references to NLP applications with functional languages or logic 
>> programming languages, but not many implemented with imperative 
>> languages... (Sorry for getting of topic...)
>>
>> Thanks very much,
>> Christos
> 



^ permalink raw reply	[relevance 0%]

* Re: Range types
  2007-10-22 11:14  5%       ` Christos Chryssochoidis
@ 2007-10-23 23:52  0%         ` anon
  2007-10-24 12:57  0%           ` Christos Chryssochoidis
  0 siblings, 1 reply; 16+ results
From: anon @ 2007-10-23 23:52 UTC (permalink / raw)



--
-- Package example:
--
-- For non Greek keyboards use Wide_Character 
--  { ["<xxxx>"]["<yyyy>"] } 
--  where <xxxx> and <yyyy> are 4-hex-digits to represents
--  the two Wide_Character values 
--
--  Example is 
--    ["03d6"]["1f00"]  -- valid Greek string
--    ["03d6"]h -- is not valid because character "h" is not
--                 a valid Greek character.
--


with Ada.Wide_Text_IO ;
use  Ada.Wide_Text_IO ;

procedure tst is

  --
  -- Internal Packages:
  --
  package Greek is


    function Is_Greek_Character ( GC : Wide_Character ) 
                                return Boolean ;


    function Is_Greek_Character_2 ( GC : Wide_Character ) 
                                return Boolean ;

    function Is_Greek_Character ( GS : Wide_String ) 
                                return Boolean ;

  end Greek ;

  --
  -- Internal Body Package 
  --
  package body Greek is

    -- --------------------------- --
    -- Use for Is_Greek_Character  --
    -- --------------------------- --

    --
    -- creates a greek constraint type
    --
    subtype Greek_Base is Wide_Character 
                                range Wide_Character'Val ( 16#370# )
                                   .. Wide_Character'Val ( 16#1FFF# ) ;

    --
    -- creates an excluded type: the gap between the two Greek
    -- ranges 0370 .. 03D7 and 1F00 .. 1FFF
    --
    subtype Greek_Exclude_Subtype is Greek_Base
                                range Greek_Base'Val ( 16#03D8# )
                                   .. Greek_Base'Val ( 16#1EFF# ) ;

    -- ----------------------------- --
    -- Use for Is_Greek_Character_2  --
    -- ----------------------------- --

    --
    -- create lower greek characters type (0370 .. 03D7)
    --
    subtype Lower_Greek_Character is Wide_Character 
                                range Wide_Character'Val ( 16#0370# )
                                   .. Wide_Character'Val ( 16#03D7# ) ;
    --
    -- create upper greek characters type (1F00 .. 1FFF)
    --
    subtype Upper_Greek_Character is Wide_Character 
                               range Wide_Character 'Val ( 16#1F00# )  
                                  .. Wide_Character 'Val ( 16#1FFF# ) ;




  --
  -- Is_Greek_Character 
  --
  function Is_Greek_Character ( GC : Wide_Character ) 
                              return Boolean is

    begin
      --
      -- Is character within the Greek base
      --
      if GC in Greek_Base then
        --
        -- Is character part of the excluded (non-Greek) subtype
        --
        if GC in Greek_Exclude_Subtype then    
          return False ;
        else
          return True ;
        end if ;
      else
        return False ;
      end if ;
    end ;


  --
  -- Is_Greek_Character version number 2
  --
  function Is_Greek_Character_2 ( GC : Wide_Character ) 
                              return Boolean is

    begin
      --
      -- Could use:
      -- 
      -- when Lower_Greek_Character | Upper_Greek_Character =>
      --    return True ;
      --
      case GC is 
         when Lower_Greek_Character =>
            return True ;
         when Upper_Greek_Character =>
            return True ;
         when others =>
            return False ;
      end case ;
    end ;


  function Is_Greek_Character ( GS : Wide_String ) 
                              return Boolean is

    begin
      --
      -- Iterate over GS'Range rather than 1 .. GS'Length so that
      -- strings whose first index is not 1 (e.g. slices) also work
      --
      for Index in GS'Range loop
        --
        -- if index-character of a string is not a Greek character
        --
        if not Is_Greek_Character_2 ( GS ( Index ) ) then 
          return False ;
        end if ;
      end loop ;
      --
      -- String contains all Greek characters
      --
      return True ;
    end ;

  end Greek ;


  stz : wide_string ( 1..2 ) ;

  use Greek ;

begin
--
  put ( "Enter (2 character Greek string) => " ) ;
  get ( stz ) ;
--
  put ( "Testing => " ) ;
  put ( stz ) ;
  new_line ;  
--
  if Is_Greek_Character ( stz ) then
    put_line ( "Greek String ? => Yes" ) ;
  else
    put_line ( "Greek String ? => No" ) ;
    --
    -- Char 1 ?
    --
    if Is_Greek_Character ( stz ( 1 ) ) then
      put_line ( "Character (1) Greek ? => Yes" ) ;
    else
      put_line ( "Character (1) Greek ? => No" ) ;
    end if ;    

    --
    -- Char 2 ?
    --
    if Is_Greek_Character_2 ( stz ( 2 ) ) then
      put_line ( "Character (2) Greek ? => Yes" ) ;
    else
      put_line ( "Character (2) Greek ? => No" ) ;
    end if ;
  end if ;    
--
end tst ;
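
As an alternative to the subtype tests, the same check could probably be 
written with Ada.Strings.Wide_Maps, the package mentioned in the quoted 
reply below. This is only a sketch: the names Greek_Set_Sketch, Greek, 
Is_Greek and Sample are made up, and the two ranges are the ones stated 
in the original question (0370 - 03D7 and 1F00 - 1FFF).

```ada
with Ada.Strings.Wide_Maps; use Ada.Strings.Wide_Maps;
with Ada.Text_IO;

procedure Greek_Set_Sketch is
   --  The set is described by its two ranges; no excluded gap is needed.
   Greek : constant Wide_Character_Set :=
     To_Set
       (Wide_Character_Ranges'
          ((Low  => Wide_Character'Val (16#0370#),
            High => Wide_Character'Val (16#03D7#)),
           (Low  => Wide_Character'Val (16#1F00#),
            High => Wide_Character'Val (16#1FFF#))));

   --  A string is Greek if the set of its characters is a subset of Greek.
   function Is_Greek (S : Wide_String) return Boolean is
     (Is_Subset (Elements => To_Set (S), Set => Greek));

   Sample : constant Wide_String :=
     (1 => Wide_Character'Val (16#03B1#),
      2 => Wide_Character'Val (16#1F00#));
begin
   Ada.Text_IO.Put_Line (Boolean'Image (Is_Greek (Sample)));
   Ada.Text_IO.Put_Line (Boolean'Image (Is_Greek ("h")));
end Greek_Set_Sketch;
```

Note that with this formulation an empty string counts as Greek, because 
the empty set is a subset of every set.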



In <1193051690.350063@athprx04>, Christos Chryssochoidis <C.Chryssochoidis@gmail.com> writes:
>Jacob Sparre Andersen wrote:
>> Christos Chryssochoidis wrote:
>> 
>>> I would like to define a subtype of Wide_Character for a program
>>> that processes (unicode) text. This type would represent the Greek
>>> letters.
>> 
>> This sounds like what enumerated types are for.  You could do it like
>> this:
>> 
>>    type Faroese_Letter is ('a', 'A', 'b', 'B', 'd', 'D', 'ð', 'Ð',
>>                            'e', 'E', [...],
>>                            'y', 'Y', 'ý', 'Ý', 'æ', 'Æ', 'ø', 'Ø');
>>    -- optional representation clause
>> 
>>    function To_Wide_Wide_Character (Item : in Faroese_Letter)
>>      return Wide_Wide_Character;
>> 
>>    function To_Faroese_Letter (Item : in Wide_Wide_Character)
>>      return Faroese_Letter;
>> 
>> The conversion functions could make use of representation clauses,
>> "Image" and "Value" functions, or tables.
>> 
>>> Greek letters in Unicode, with all their diacritics, are
>>> located in two separate ranges: 0370 - 03D7 and 1F00 - 1FFF. That's
>>> 360 characters to write in an enumeration... Since gaps are not
>>> allowed in ranges, I 'm thinking instead of defining such a type, to
>>> define a function that would accept a Wide_Character as argument and
>>> return a boolean value indicating whether the given Wide_Character
>>> falls in the ranges of the Greek characters.
>> 
>> This could be done very simply using Ada.Strings.Maps.
>> 
>> How you should do it depends strongly on what you actually need your
>> Greek_Letter type for.
>> 
>> Greetings,
>> 
>> Jacob
>
>Thanks! Ada.Strings.Wide_Maps seems very helpful for what I want to do. 
>Basically, what I would like to do is to write a program that given a 
>text file in utf8 encoding, which would contain ancient greek text, 
>which is written with all the diacritic marks on the letters, this 
>program would load the contents of the file in memory, strip the 
>in-memory text contents from all the diacritics except those used in 
>today's "modern" Greek, and write the modified contents to a new file of 
>the user's choosing. For this it would be nice if there were some 
>package for regular expressions for Ada. Then if I succeeded in the 
>mentioned task,  I 'd like to do some natural language processing (NLP, 
>that is linguistics processing) with my program, but I don't know if Ada 
>would be an appropriate language for such a task (NLP). I've seen on the 
>web references to NLP applications with functional languages or logic 
>programming languages, but not many implemented with imperative 
>languages... (Sorry for getting off topic...)
>
>Thanks very much,
>Christos





* Re: Range types
  @ 2007-10-22 11:14  5%       ` Christos Chryssochoidis
  2007-10-23 23:52  0%         ` anon
  0 siblings, 1 reply; 16+ results
From: Christos Chryssochoidis @ 2007-10-22 11:14 UTC (permalink / raw)


Jacob Sparre Andersen wrote:
> Christos Chryssochoidis wrote:
> 
>> I would like to define a subtype of Wide_Character for a program
>> that processes (unicode) text. This type would represent the Greek
>> letters.
> 
> This sounds like what enumerated types are for.  You could do it like
> this:
> 
>    type Faroese_Letter is ('a', 'A', 'b', 'B', 'd', 'D', 'ð', 'Ð',
>                            'e', 'E', [...],
>                            'y', 'Y', '�', '�', '�', '�', '�', '�');
>    -- optional representation clause
> 
>    function To_Wide_Wide_Character (Item : in Faroese_Letter)
>      return Wide_Wide_Character;
> 
>    function To_Faroese_Letter (Item : in Wide_Wide_Character)
>      return Faroese_Letter;
> 
> The conversion functions could make use of representation clauses,
> "Image" and "Value" functions, or tables.
> 
>> Greek letters in Unicode, with all their diacritics, are
>> located in two separate ranges: 0370 - 03D7 and 1F00 - 1FFF. That's
>> 360 characters to write in an enumeration... Since gaps are not
>> allowed in ranges, I'm thinking instead of defining such a type, to
>> define a function that would accept a Wide_Character as argument and
>> return a boolean value indicating whether the given Wide_Character
>> falls in the ranges of the Greek characters.
> 
> This could be done very simply using Ada.Strings.Maps.
> 
> How you should do it depends strongly on what you actually need your
> Greek_Letter type for.
> 
> Greetings,
> 
> Jacob

Thanks! Ada.Strings.Wide_Maps seems very helpful for what I want to do. 
Basically, what I would like to do is to write a program that given a 
text file in utf8 encoding, which would contain ancient greek text, 
which is written with all the diacritic marks on the letters, this 
program would load the contents of the file in memory, strip the 
in-memory text contents from all the diacritics except those used in 
today's "modern" Greek, and write the modified contents to a new file of 
the user's choosing. For this it would be nice if there were some 
package for regular expressions for Ada. Then if I succeeded in the 
mentioned task, I'd like to do some natural language processing (NLP, 
that is linguistics processing) with my program, but I don't know if Ada 
would be an appropriate language for such a task (NLP). I've seen on the 
web references to NLP applications with functional languages or logic 
programming languages, but not many implemented with imperative 
languages... (Sorry for getting off topic...)

Thanks very much,
Christos
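
The membership test discussed above can be sketched with
Ada.Strings.Wide_Maps. This is only an illustration, not code from the
thread: the names Greek_Demo, Greek_Set and Is_Greek are invented here,
and the two ranges are simply the ones quoted in the message
(0370 - 03D7 and 1F00 - 1FFF).

```ada
with Ada.Strings.Wide_Maps; use Ada.Strings.Wide_Maps;
with Ada.Text_IO;           use Ada.Text_IO;

procedure Greek_Demo is

   --  Hypothetical set built from the two ranges quoted in the thread.
   Greek_Set : constant Wide_Character_Set :=
     To_Set (Wide_Character_Ranges'
               ((Low  => Wide_Character'Val (16#0370#),
                 High => Wide_Character'Val (16#03D7#)),
                (Low  => Wide_Character'Val (16#1F00#),
                 High => Wide_Character'Val (16#1FFF#))));

   --  True when C falls in either of the two ranges.
   function Is_Greek (C : Wide_Character) return Boolean is
   begin
      return Is_In (C, Greek_Set);
   end Is_Greek;

begin
   --  U+03B1 (small alpha) lies inside the first range.
   Put_Line (Boolean'Image (Is_Greek (Wide_Character'Val (16#03B1#))));
   --  Latin 'a' lies outside both ranges.
   Put_Line (Boolean'Image (Is_Greek ('a')));
end Greek_Demo;
```

A set keeps the gap between the two ranges out of the type system, so
no enumeration of 360 literals is needed. Stripping diacritics would
additionally need a Wide_Character_Mapping (or a table) from accented
to plain letters, which is not shown here.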




* Re: config files proposal
  2002-06-04 19:55  7%         ` Ted Dennison
@ 2002-06-09 20:43  0%           ` Stephen Leake
  0 siblings, 0 replies; 16+ results
From: Stephen Leake @ 2002-06-09 20:43 UTC (permalink / raw)


dennison@telepath.com (Ted Dennison) writes:

> Darren New <dnew@san.rr.com> wrote in message
> news:<3CFB94A7.A455B8DD@san.rr.com>... 
> > I think they should be case sensitive unless there's a standard way of
> > converting uppercase unicode to lowercase unicode in Ada's libraries.
> 
> We need to match up well with what's in the Ada standard I think. 
> 
> The Ada standard has an Ada.Strings.Maps.Constants.Lower_Case_Map and
> an Ada.Strings.Wide_Maps.Wide_Constants.Lower_Case_Map. There is no
> reason why we can't define things to say that keys will be fed through
> the appropriate Lower_Case_Map before being matched.

Good point.

> If the language itself has some kind of problem with
> Ada.Strings.Wide_Maps.Wide_Constants.Lower_Case_Map, I say that is
> the language's problem, and outside the scope of a configuration
> item facility to try to worry about fixing. As long as we define
> what we are doing carefully this way, it is still explicit. If we
> stick to using the standard library, things should at least behave
> for people the way they have come to expect them to behave when
> using Ada.

According to my interpretation of recent posts by Robert Dewar (is
that qualified enough :), there is very little use of Wide_String. So
it's not clear whether Lower_Case_Map is "what people expect". But I
agree it would be a good place to start. 

Perhaps case sensitivity should be an option in the config file API.

-- 
-- Stephe




* Re: config files proposal
  @ 2002-06-04 19:55  7%         ` Ted Dennison
  2002-06-09 20:43  0%           ` Stephen Leake
  0 siblings, 1 reply; 16+ results
From: Ted Dennison @ 2002-06-04 19:55 UTC (permalink / raw)


Darren New <dnew@san.rr.com> wrote in message news:<3CFB94A7.A455B8DD@san.rr.com>...
> I think they should be case sensitive unless there's a standard way of
> converting uppercase unicode to lowercase unicode in Ada's libraries.

We need to match up well with what's in the Ada standard I think. 

The Ada standard has an Ada.Strings.Maps.Constants.Lower_Case_Map and
an Ada.Strings.Wide_Maps.Wide_Constants.Lower_Case_Map. There is no
reason why we can't define things to say that keys will be fed through
the appropriate Lower_Case_Map before being matched.

If the language itself has some kind of problem with
Ada.Strings.Wide_Maps.Wide_Constants.Lower_Case_Map, I say that is the
language's problem, and outside the scope of a configuration item
facility to try to worry about fixing. As long as we define what we
are doing carefully this way, it is still explicit. If we stick to
using the standard library, things should at least behave for people
the way they have come to expect them to behave when using Ada.
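
A minimal sketch of that idea, assuming keys are held as Wide_String.
The names Key_Demo, Normalize and Same_Key are made up for this
illustration and are not part of any proposed config file API. Note
that Ada 95 only requires the Wide_Constants mappings to act on the
Latin-1 portion of Wide_Character, so keys outside it may pass through
unchanged.

```ada
with Ada.Strings.Wide_Fixed;
with Ada.Strings.Wide_Maps.Wide_Constants;
with Ada.Text_IO; use Ada.Text_IO;

procedure Key_Demo is

   --  Feed a key through the standard lower-case mapping.
   function Normalize (Key : Wide_String) return Wide_String is
   begin
      return Ada.Strings.Wide_Fixed.Translate
        (Key, Ada.Strings.Wide_Maps.Wide_Constants.Lower_Case_Map);
   end Normalize;

   --  Two keys match when their normalized forms are equal.
   function Same_Key (Left, Right : Wide_String) return Boolean is
   begin
      return Normalize (Left) = Normalize (Right);
   end Same_Key;

begin
   Put_Line (Boolean'Image (Same_Key ("Log_Level", "LOG_LEVEL")));
   Put_Line (Boolean'Image (Same_Key ("Log_Level", "Log_File")));
end Key_Demo;
```

This prints TRUE and then FALSE. Normalizing once when a key is stored
and once when it is looked up keeps the matching rule explicit and in
one place.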

-- 
T.E.D. 
Home     -  mailto:dennison@telepath.com (Yahoo: Ted_Dennison)
Homepage -  http://www.telepath.com/dennison/Ted/TED.html




* Re: String type conversions
  @ 1999-02-09  0:00  6% ` Stephen Leake
  0 siblings, 0 replies; 16+ results
From: Stephen Leake @ 1999-02-09  0:00 UTC (permalink / raw)


David Botton <David@Botton.com> writes:

> What is the best way to convert back forth between:
> 
> Wide_String to/from String

I'd define a map (Ada.Strings.Wide_Maps) to map non-ASCII characters
to ASCII.Nul or whatever might be appropriate, then write a loop to do
the actual copy from the map output character by character. Let the
compiler optimize the loop.

Or you can use Ada.Characters.Handling.To_Character in the loop, if you
don't need a full map.
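
A rough sketch of the loop variant, assuming '?' is an acceptable
substitute for characters that have no Character equivalent. The names
Narrow_Demo and To_Narrow are invented for this example. In Ada 2005
and later the same conversion functions are also provided by
Ada.Characters.Conversions, and the ones in Ada.Characters.Handling
are listed as obsolescent.

```ada
with Ada.Characters.Handling; use Ada.Characters.Handling;
with Ada.Text_IO;             use Ada.Text_IO;

procedure Narrow_Demo is

   --  Copy character by character, substituting '?' for anything
   --  outside the Character range.
   function To_Narrow (Source : Wide_String) return String is
      Result : String (Source'Range);
   begin
      for I in Source'Range loop
         Result (I) := To_Character (Source (I), Substitute => '?');
      end loop;
      return Result;
   end To_Narrow;

   Greek_Alpha : constant Wide_Character := Wide_Character'Val (16#03B1#);

begin
   Put_Line (To_Narrow ("ab" & Greek_Alpha & "c"));
end Narrow_Demo;
```

This prints ab?c. Ada.Characters.Handling.To_String does the same
whole-string conversion in a single call when no custom per-character
logic is needed.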

> wchar_array to/from char_array

Convert to the Ada type, then proceed as above.

-- Stephe





* Re: Equality operator overloading in ADA 83
  @ 1997-04-22  0:00  4%     ` Robert A Duff
  0 siblings, 0 replies; 16+ results
From: Robert A Duff @ 1997-04-22  0:00 UTC (permalink / raw)



In article <335CAEFE.35DC@elca-matrix.ch>,
Mats Weber  <Mats.Weber@elca-matrix.ch> wrote:
>The same holds for Ada.Strings.Unbounded, and there was some discussion
>on this a year ago or so here in c.l.a. Is anything being done so that
>an AI is issued to ensure this (if AIs still exist) ?

This is AI95-123, which has been approved by the ARG, but not (yet) by
WG9.  I've included it below.  In general, I believe that ACVC tests do
not get written based on AIs until WG9 has approved.

In case anyone's interested, the AI's are stored on
sw-eng.falls-church.va.us, in /public/adaic/standards/95com/ada-issues.

- Bob

!standard 04.05.02 (24)                               97-03-19  AI95-00123/05
!class binding interpretation 96-07-23
!status ARG approved 10-0-2 (subject to editorial review)  96-10-07
!status work item (letter ballot was 6-6-0) 96-10-03
!status ARG approved 8-0-0 (subject to letter ballot) 96-06-17
!status work item 96-04-04
!status received 96-04-04
!priority High
!difficulty Medium
!subject Equality for Composite Types

!summary 96-11-19

The primitive equality operators of language defined types compose
properly, when the type is used as a component type, or a generic actual
type.

For any composite type, the order in which "=" is called for components
is not defined by the language.  Furthermore, if the result is
determined before calling "=" on some components, the language does not
define whether "=" is called on those components.

!question 96-07-23

The following language-defined types are private, and have an explicitly
defined primitive "=" operator:

    System.Address
    Ada.Strings.Maps.Character_Set
    Ada.Strings.Bounded.Generic_Bounded_Length.Bounded_String
    Ada.Strings.Unbounded.Unbounded_String
    Ada.Strings.Wide_Maps.Wide_Character_Set
    Ada.Task_Identification.Task_ID

This would seem to imply that the composability of these "=" operators
depends on whether the implementation chooses to implement them as
tagged types, by 4.5.2(14-24):

  14   For a type extension, predefined equality is defined in terms of the
  primitive (possibly user-defined) equals operator of the parent type and of
  any tagged components of the extension part, and predefined equality for any
  other components not inherited from the parent type.

  15   For a private type, if its full type is tagged, predefined equality is
  defined in terms of the primitive equals operator of the full type; if the
  full type is untagged, predefined equality for the private type is that of
  its full type.
  ...
  21   Given the above definition of matching components, the result of the
  predefined equals operator for composite types (other than for those
  composite types covered earlier) is defined as follows:

     22  If there are no components, the result is defined to be True;

     23  If there are unmatched components, the result is defined to be
         False;

     24  Otherwise, the result is defined in terms of the primitive equals
         operator for any matching tagged components, and the predefined
         equals for any matching untagged components.

This would cause portability problems.

Also, in the above definition, what does "in terms of" mean?  For a
composite type, if some parts have an "=" with side effects, does the
language define whether all of these side effects happen, and in what
order?

!recommendation 96-11-16

(See summary.)

!wording 96-07-23

!discussion 97-03-19

Composability of equality means three things:

    1. If a composite type has a component of type T with a user-defined
       equality operator, then the predefined equality of the composite
       type calls the user-defined equality operator of type T (for that
       component).

    2. If an actual type T for a generic formal type has a user-defined
       equality operator, then the predefined equality on the generic
       formal type calls the user-defined equality operator of type T.

    3. If a parent type T has a user-defined equality operator, then the
       predefined equality of a type extension of T calls the
       user-defined equality on T (for the parent part), in addition to
       comparing the extension parts.

Non-composability means that the predefined equality is called for T,
despite the fact that T has a user-defined equality operator.  Of
course, if there is no user-defined equality, then equality always
composes properly.
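
As an illustration of point 1 (not part of the AI text), the sketch
below uses a record with an Unbounded_String component. The names
Compose_Demo and Setting are invented for the example. Under the
ruling summarized above, the predefined "=" of the record is expected
to compare the component by its string contents, however each value
was built.

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;           use Ada.Text_IO;

procedure Compose_Demo is

   --  Composite type with a component whose "=" is language defined.
   type Setting is record
      Name  : Unbounded_String;
      Value : Integer;
   end record;

   A : constant Setting :=
     (Name => To_Unbounded_String ("time"), Value => 1);
   B : Setting := (Name => To_Unbounded_String ("ti"), Value => 1);

begin
   --  Build the same contents by a different route.
   Append (B.Name, "me");

   --  Predefined "=" on Setting; it composes the component equality.
   Put_Line (Boolean'Image (A = B));
end Compose_Demo;
```

This prints TRUE: the two Name components hold the same characters
even though they were constructed differently.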

Number 3 is irrelevant here, since none of the types in question is
(visibly) tagged.

For a private type, if the underlying type is tagged, or if there is no
user-defined equality, then equality composes.  Otherwise, it does not.
(Here, "underlying type" means the full type, or if that comes from a
private type, then the underlying type of *that* type, and so on.)

However, for the private types mentioned in the question, the RM does
not specify whether the underlying type is tagged, nor whether the
equality operator is truly user-defined (as opposed to just being the
normal bit-wise equality).

It is important that the composability of "=" for these types be defined
by the language.  We choose to make them composable.  An implementation
can achieve this by making the full type tagged.  Alternatively, the
implementation could simply use the predefined "=" for these types.
(Alternatively, an implementation could treat these types specially,
making them untagged, but with composable equality.  However, this would
add some complexity to the compiler.)

Here is an analysis of implementation concerns for each type in
question:

    - System.Address: The intent is for this type to directly represent 
      a hardware address.  Therefore, it is probably not feasible to
      implement it as a tagged type.  The simplest implementation of
      equality of Addresses is thus the normal bit-wise equality.  This
      is what most users would expect, anyway.

      On certain segmented architectures, it is possible for two
      different addresses to point to the same location.  The same thing
      can happen due to memory mapping, on many machines.  Such
      addresses will typically compare unequal, despite the fact that
      they point to the same location.

    - Ada.Strings.Maps.Character_Set: A typical implementation will use
      an array of Booleans, so bit-wise equality will be used, so it
      will compose.

    - Ada.Strings.Bounded.Generic_Bounded_Length.Bounded_String: Two
      reasonable implementations are: (1) Nul-out the unused
      characters, and use bit-wise equality, and (2) use a tagged type
      with a user-defined equality.  Either way, equality will compose.
      This is, admittedly, a slight implementation burden, because it
      rules out an untagged record with user-defined equality.

    - Ada.Strings.Unbounded.Unbounded_String: A tagged (controlled) type
      will normally be necessary anyway, for storage reclamation.  In a
      garbage-collected implementation, a tagged type is not strictly
      necessary, but we choose to require composability anyway.

    - Ada.Strings.Wide_Maps.Wide_Character_Set: Some sort of data
      structure built out of access types is necessary anyway, so the
      extra overhead of composability is not a serious problem; the
      implementation can simply make the full type tagged.

    - Ada.Task_Identification.Task_ID: This will typically be a
      pointer-to-TCB of some sort (access-to-TCB, or
      index-into-table-of-TCB's).  In any case, bit-wise equality will
      work, so equality will compose.

As to the second question, the RM clearly does not define any order of
calling "=" on components, nor does it say whether the results are
combined with "and" or "and then".  Equality operators with side effects
are questionable in any case, so we allow implementations freedom to do
what is most convenient and/or most efficient.  Consider equality of a
variant record: The implementation might first check that the
discriminants are equal, and if not, skip the component-by-component
comparison.  Alternatively, the implementation might first compare the
common elements, and *then* check the discriminants.  A third
possibility is to first compare some portions with a bit-wise equality,
and then (conditionally) call user-defined equality operators on the
other components.  All of these implementations are valid.

!appendix 97-03-19

...





1997-04-21  0:00     Equality operator overloading in ADA 83 Manuel Wenger
1997-04-22  0:00     ` Matthew Heaney
1997-04-22  0:00       ` Mats Weber
1997-04-22  0:00  4%     ` Robert A Duff
1999-02-09  0:00     String type conversions David Botton
1999-02-09  0:00  6% ` Stephen Leake
2002-06-02 16:07     config files proposal Stephen Leake
2002-06-02 21:29     ` Darren New
2002-06-02 22:16       ` Stephen Leake
2002-06-03 14:56         ` Ted Dennison
2002-06-03 16:08           ` Darren New
2002-06-04 19:55  7%         ` Ted Dennison
2002-06-09 20:43  0%           ` Stephen Leake
2007-10-21 19:15     Range types Christos Chryssochoidis
2007-10-21 20:23     ` Niklas Holsti
2007-10-21 21:28       ` Christos Chryssochoidis
2007-10-22  7:23         ` Jacob Sparre Andersen
2007-10-22 11:14  5%       ` Christos Chryssochoidis
2007-10-23 23:52  0%         ` anon
2007-10-24 12:57  0%           ` Christos Chryssochoidis
2014-01-05 23:55  6% Q: Localizing type and package references b.mcguinness747
2014-01-06  1:29  0% ` Jeffrey Carter
2014-01-06  8:05  0%   ` Simon Wright
2015-03-06 18:01     Implementing character sets for Wide_Character Martin Trenkmann
2015-03-06 18:15  5% ` Bob Duff
2015-03-06 21:02  0%   ` Martin Trenkmann
2015-03-06 18:18     ` G.B.
2015-03-06 21:06  5%   ` Martin Trenkmann
2015-03-06 22:21  0%     ` Bob Duff
2017-09-21 18:14     Ada.Strings.Unbounded vs Ada.Containers.Indefinite_Holders Victor Porton
2017-09-21 21:30     ` AdaMagica
2017-09-22 12:16       ` Victor Porton
2017-09-22 19:25         ` Simon Wright
2017-09-22 22:15           ` Victor Porton
2017-09-23  8:09             ` Dmitry A. Kazakov
2017-09-23  9:16  5%           ` Jeffrey R. Carter
2018-10-31  2:57     windows-1251 to utf-8 eduardsapotski
2018-10-31 15:28     ` eduardsapotski
2018-10-31 17:01       ` Dmitry A. Kazakov
2018-10-31 20:58  5%     ` Randy Brukardt
