comp.lang.ada
* String filtering
@ 2005-09-27  6:27 David Trudgett
  2005-09-27  7:38 ` Jacob Sparre Andersen
                   ` (2 more replies)
  0 siblings, 3 replies; 71+ messages in thread
From: David Trudgett @ 2005-09-27  6:27 UTC



Hi all,

I've been puzzling for a little bit over a good way to filter out
unwanted characters from a string. In particular, I have an unbounded
string and want to filter out of it all characters not in 'a'..'z',
'A'..'Z', '0'..'9'. So far I've only thought of tedious ways to do
it. Is there an easy way to do it using the string handling facilities
in Ada? I think I almost got there with the idea of using
Maps.Character_Set, and so on, but I haven't quite pieced it together
yet.

Thanks.

David

-- 

David Trudgett
http://www.zeta.org.au/~wpower/

We must learn to live together as brothers or perish together as
fools. 

    -- Martin Luther King, Jr.




* Re: String filtering
From: Jacob Sparre Andersen @ 2005-09-27  7:38 UTC


David Trudgett wrote:

> I've been puzzling for a little bit over a good way to filter out
> unwanted characters from a string. In particular, I have an
> unbounded string and want to filter out of it all characters not in
> 'a'..'z', 'A'..'Z', '0'..'9'. So far I've only thought of tedious
> ways to do it. Is there an easy way to do it using the string
> handling facilities in Ada? I think I almost got there with the idea
> of using Maps.Character_Set, and so on, but I haven't quite pieced
> it together yet.

I would probably simply iterate over the elements in the string and
copy to the target string those for which a call to "function Is_In
(Element : in Character; Set : in Character_Set) return Boolean;"
returns True.

You could use "function Count (Source : in Unbounded_String; Set : in
Maps.Character_Set) return Natural;" to preallocate the target string,
if you're afraid appending to an unbounded string is too slow for your
purpose.

Greetings,

Jacob
-- 
Atheism is a non-prophet organisation.



* Re: String filtering
From: tmoran @ 2005-09-27  7:41 UTC


> I've been puzzling for a little bit over a good way to filter out
What do you mean by "filter out"?  Replace by blanks?  Shorten string?



* Re: String filtering
From: David Trudgett @ 2005-09-27  9:13 UTC


Jacob Sparre Andersen <sparre@nbi.dk> writes:

> David Trudgett wrote:
>
>> I've been puzzling for a little bit over a good way to filter out
>> unwanted characters from a string. In particular, I have an
>> unbounded string and want to filter out of it all characters not in
>> 'a'..'z', 'A'..'Z', '0'..'9'. So far I've only thought of tedious
>> ways to do it. Is there an easy way to do it using the string
>> handling facilities in Ada? I think I almost got there with the idea
>> of using Maps.Character_Set, and so on, but I haven't quite pieced
>> it together yet.
>
> I would probably simply iterate over the elements in the string and
> copy to the target string those for which a call to "function Is_In
> (Element : in Character; Set : in Character_Set) return Boolean;"
> returns True.
>
> You could use "function Count (Source : in Unbounded_String; Set : in
> Maps.Character_Set) return Natural;" to preallocate the target string,
> if you're afraid appending to an unbounded string is too slow for your
> purpose.

OK, thanks for those hints. I've come up with the following, which
seems to do the job:

    with Ada.Strings.Maps, Ada.Strings.Unbounded;
    use  Ada.Strings.Maps, Ada.Strings.Unbounded;

    Lower_Chars : constant Character_Range := ('a', 'z');
    Upper_Chars : constant Character_Range := ('A', 'Z');
    Numer_Chars : constant Character_Range := ('0', '9');
    Alphanumeric : constant Character_Ranges
      := (Lower_Chars, Upper_Chars, Numer_Chars);
    Alphanumeric_Set : constant Character_Set := To_Set(Alphanumeric);

    function Strip_Non_Alphanumeric
      (Str : in Unbounded_String) return Unbounded_String
    is
       New_Str : Unbounded_String
         := To_Unbounded_String(Count(Str, Alphanumeric_Set));
    begin
       New_Str := To_Unbounded_String("");
       for Char in 1 .. Length(Str) loop
          if Is_In(Element(Str, Char), Alphanumeric_Set) then
             Append(New_Str, Element(Str, Char));
          end if;
       end loop;
       return New_Str;
    end Strip_Non_Alphanumeric;


Is something like that what y'all do in situations like this?


Cheers,

David



-- 

David Trudgett
http://www.zeta.org.au/~wpower/

Every war, even the most humanely conducted, with all its ordinary
consequences, the destruction of harvests, robberies, the license and
debauchery, and the murder with the justifications of its necessity
and justice, the exaltation and glorification of military exploits,
the worship of the flag, the patriotic sentiments, the feigned
solicitude for the wounded, and so on, does more in one year to
pervert men's minds than thousands of robberies, murders, and arsons
perpetrated during hundreds of years by individual men under the
influence of passion.

    -- Leo Tolstoy, "The Kingdom of God is Within You"




* Re: String filtering
From: David Trudgett @ 2005-09-27  9:17 UTC


tmoran@acm.org writes:

>> I've been puzzling for a little bit over a good way to filter out
> What do you mean by "filter out"?  Replace by blanks?  Shorten string?

See my earlier reply. Answer: drop them, shorten string. Sorry if I
wasn't clear.

David

-- 

David Trudgett  ... who doesn't really hate parenthesiseses...
http://www.zeta.org.au/~wpower/

"Nasty, tricksy parenthesises. We hates them!"

    -- Sampo Smolander




* Re: String filtering
From: Dmitry A. Kazakov @ 2005-09-27  9:49 UTC


On Tue, 27 Sep 2005 19:13:17 +1000, David Trudgett wrote:

>     with Ada.Strings.Maps, Ada.Strings.Unbounded;
>     use  Ada.Strings.Maps, Ada.Strings.Unbounded;
> 
>     Lower_Chars : constant Character_Range := ('a', 'z');
>     Upper_Chars : constant Character_Range := ('A', 'Z');
>     Numer_Chars : constant Character_Range := ('0', '9');
>     Alphanumeric : constant Character_Ranges
>       := (Lower_Chars, Upper_Chars, Numer_Chars);
>     Alphanumeric_Set : constant Character_Set := To_Set(Alphanumeric);

with Ada.Strings.Maps.Constants;
use  Ada.Strings.Maps.Constants;

-- use the Alphanumeric_Set defined there

>     function Strip_Non_Alphanumeric
>       (Str : in Unbounded_String) return Unbounded_String
>     is
>        New_Str : Unbounded_String
>          := To_Unbounded_String(Count(Str, Alphanumeric_Set));

If you do this, then use String (1..Count (...));

>     begin
>        New_Str := To_Unbounded_String("");

No need for that, it is initially an empty string.

>        for Char in 1 .. Length(Str) loop
>           if Is_In(Element(Str, Char), Alphanumeric_Set) then
>              Append(New_Str, Element(Str, Char));
>           end if;
>        end loop;
>        return New_Str;
>     end Strip_Non_Alphanumeric;
> 
> Is something like that what y'all do in situations like this?

I don't.

Firstly it is not clear why characters need to be filtered out. Or better
to say, how did it happen, that you get garbage in a string? Either, you
need a character *stream* filtering, long before you get a string token out
of it, or, more realistically an error message (exception), should have
happened, for example if you take some text from a GUI widget.

Secondly, unbounded strings are rarely needed. Especially in text parsing
etc. It is quite uncommon to change a string content there. In your example
you don't do it either. You create a new string. Also both the source and
the result strings have *known* length. So you don't need unbounded strings
here. Usually, after making some trivial analysis like that you'll find out
that only 2% or so really need to be unbounded.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



* Re: String filtering
From: Martin Dowie @ 2005-09-27 11:01 UTC


Dmitry A. Kazakov wrote:
[snip]
> with Ada.Strings.Maps.Constants;
> use  Ada.Strings.Maps.Constants;
>
> -- use the Alphanumeric_Set defined there

Nope - that's not what the OP has defined - he wants the ASCII subset of
Alphanumeric_Set (no accents, etc).
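
To make the difference concrete, here is a minimal sketch (not from the
original posts). The procedure name Compare_Sets, the constant names,
and the choice of Latin-1 code 233 (lower-case e with acute accent) as
the test character are illustrative assumptions. The predefined set
should report TRUE for that character and the ASCII-only set FALSE.

```ada
with Ada.Strings.Maps;           use Ada.Strings.Maps;
with Ada.Strings.Maps.Constants;
with Ada.Text_IO;

procedure Compare_Sets is
   --  The ASCII-only set described in the original post.
   ASCII_Alphanumeric : constant Character_Set :=
     To_Set (Character_Ranges'(('a', 'z'), ('A', 'Z'), ('0', '9')));

   --  Latin-1 position 233: a letter, but outside the ASCII ranges.
   E_Acute : constant Character := Character'Val (233);
begin
   Ada.Text_IO.Put_Line
     ("Predefined Alphanumeric_Set: " &
      Boolean'Image
        (Is_In (E_Acute, Ada.Strings.Maps.Constants.Alphanumeric_Set)));
   Ada.Text_IO.Put_Line
     ("ASCII-only set:              " &
      Boolean'Image (Is_In (E_Acute, ASCII_Alphanumeric)));
end Compare_Sets;
```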



[snip]
>>        New_Str := To_Unbounded_String("");
>
> No need for that, it is initially an empty string.

Nope - it's an uninitialized string. But I agree that if you've already
determined the size then there is no need for this as each character in
New_Str will be certain to be filled.

Cheers

-- Martin





* Re: String filtering
From: Martin Dowie @ 2005-09-27 11:12 UTC


Martin Dowie wrote:
> Dmitry A. Kazakov wrote:
> [snip]
>>>        New_Str := To_Unbounded_String("");
>>
>> No need for that, it is initially an empty string.
>
> Nope - it's an uninitialized string. But I agree that if you've
> already determined the size then there is no need for this as each
> character in New_Str will be certain to be filled.

Actually, the OP has it right. Because it's uninitialised, it is filled
with rubbish and has the size requested, and the OP is going to /append/
items to it, so it needs to be empty before the loop starts.

Cheers

-- Martin





* Re: String filtering
From: David Trudgett @ 2005-09-27 11:15 UTC


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:

> On Tue, 27 Sep 2005 19:13:17 +1000, David Trudgett wrote:
>
>>     with Ada.Strings.Maps, Ada.Strings.Unbounded;
>>     use  Ada.Strings.Maps, Ada.Strings.Unbounded;
>> 
>>     Lower_Chars : constant Character_Range := ('a', 'z');
>>     Upper_Chars : constant Character_Range := ('A', 'Z');
>>     Numer_Chars : constant Character_Range := ('0', '9');
>>     Alphanumeric : constant Character_Ranges
>>       := (Lower_Chars, Upper_Chars, Numer_Chars);
>>     Alphanumeric_Set : constant Character_Set := To_Set(Alphanumeric);
>
> with Ada.Strings.Maps.Constants;
> use  Ada.Strings.Maps.Constants;
>
> -- use the Alphanumeric_Set defined there

OK, now I have:

   Alpha_Num_Space_Set : constant Character_Set
     := Alphanumeric_Set or To_Set(' ');

since I realised I also need space.


>
>>     function Strip_Non_Alphanumeric
>>       (Str : in Unbounded_String) return Unbounded_String
>>     is
>>        New_Str : Unbounded_String
>>          := To_Unbounded_String(Count(Str, Alphanumeric_Set));
>
> If you do this, then use String (1..Count (...));

If I did that then I would need to convert back to unbounded_string
when I return the function result. Would that be significantly faster
than working on a pre-allocated unbounded string?


>
>>     begin
>>        New_Str := To_Unbounded_String("");
>
> No need for that, it is initially an empty string.

I at first thought so myself, until I discovered that New_Str was
uninitialised, as it says in the ARM. Hence, I added that line.


>
>>        for Char in 1 .. Length(Str) loop
>>           if Is_In(Element(Str, Char), Alphanumeric_Set) then
>>              Append(New_Str, Element(Str, Char));
>>           end if;
>>        end loop;
>>        return New_Str;
>>     end Strip_Non_Alphanumeric;
>> 
>> Is something like that what y'all do in situations like this?
>
> I don't.
>

> Firstly it is not clear why characters need to be filtered out. Or
> better to say, how did it happen, that you get garbage in a string?

I am sanitising data received over a socket, which may be of any
length. Hence my use of unbounded_string, and my desire to strip out
non-alphanumeric characters.


> Either, you need a character *stream* filtering, 

Possibly, but I'm not using a socket stream interface at the current
time. The socket library I'm using right now doesn't do streams.


> long before you get a string token out of it, or, more realistically
> an error message (exception), should have happened, for example if
> you take some text from a GUI widget.
>
> Secondly, unbounded strings are rarely needed. 

For some definition of 'rarely', I suppose. :-) I'm sure some people
must use them all the time, so it wouldn't be rare for them.

Ada does make it a pain to use unbounded_strings, so it can seem like
a virtue to avoid them, but other languages use them by default, with
no ill-effects to show for it ;-)

Still, in Ada, I do try to use plain fixed strings where they are
sufficient for the purpose.


> Especially in text parsing etc. It is quite uncommon to change a
> string content there. In your example you don't do it either. You
> create a new string. 

Yes, well, functions work that way in Ada (fortunately, or
unfortunately, I don't know). I could have made it a procedure with an
"in out" parameter, but I like functional programming better.
Unfortunately, I haven't been able to do proper functional style
programming in Ada so far, having been thwarted by strong typing and
lack of "out" parameters in functions.


> Also both the source and the result strings have *known* length.

Known but variable, with no particular bounds.


> So you don't need unbounded strings here. Usually, after making some
> trivial analysis like that you'll find out that only 2% or so really
> need to be unbounded.

It seems to me that to use fixed strings here, I would have to convert
the source to a fixed string, do my working on fixed string, then
convert the result to an unbounded string. It sounds like unnecessary
work to me... ;-)  

Thanks for your tips, though, Dmitry, and I'll definitely keep an eye
out for abuse of unbounded_strings.

Cheers,

David



-- 

David Trudgett
http://www.zeta.org.au/~wpower/

Equally, our immoral person must get away with any crimes he
undertakes in the proper fashion, if he is to be outstandingly
immoral; getting caught must be taken to be a sign of incompetence,
since the acme of immorality is to give an impression of morality
while actually being immoral. So we must attribute consummate
immorality to our consummate criminal, and if we are to leave it
intact, we should have him equipped with a colossal reputation for
morality even though he is a colossal criminal. He should be capable
of correcting any mistakes he makes. He must have the ability to argue
plausibly, in case any of his crimes are ever found out, and to use
force wherever necessary, by making use of his courage and strength and
by drawing on his fund of friends and his financial resources.

  -- Plato, in "Republic", 361a-361b, the words of Glaucon.
  



* Re: String filtering
From: David Trudgett @ 2005-09-27 11:22 UTC


"Martin Dowie" <martin.dowie@baesystems.com> writes:

> Dmitry A. Kazakov wrote:
> [snip]
>> with Ada.Strings.Maps.Constants;
>> use  Ada.Strings.Maps.Constants;
>>
>> -- use the Alphanumeric_Set defined there
>
> Nope - that's not what the OP has defined - he wants the ASCII subset of
> Alphanumeric_Set (no accents, etc).

You're right! I couldn't find what Ada's definition of "alphanumeric"
actually was, so I hoped it was the same as mine! Looks like it's back
to the original solution.

>
>
>
> [snip]
>>>        New_Str := To_Unbounded_String("");
>>
>> No need for that, it is initially an empty string.
>
> Nope - it's an uninitialized string. But I agree that if you've already
> determined the size then there is no need for this as each character in
> New_Str will be certain to be filled.

Yeah, except I couldn't use "append" then. What would I use instead?

Thanks

David


-- 

David Trudgett
http://www.zeta.org.au/~wpower/

Capitalism is about sharing things in the same way that
Tug-Of-War is about sharing the rope.




* Re: String filtering
From: Dmitry A. Kazakov @ 2005-09-27 12:54 UTC


On Tue, 27 Sep 2005 12:12:32 +0100, Martin Dowie wrote:

> Martin Dowie wrote:
>> Dmitry A. Kazakov wrote:
>> [snip]
>>>>        New_Str := To_Unbounded_String("");
>>>
>>> No need for that, it is initially an empty string.
>>
>> Nope - it's an uninitialized string. But I agree that if you've
>> already determined the size then there is no need for this as each
>> character in New_Str will be certain to be filled.
> 
> Actually, the OP has it right. Because it's uninitialised, it is filled
> with rubbish and has the size requested, and the OP is going to /append/
> items to it, so it needs to be empty before the loop starts.

No. ARM 95 A.4.5(49) reads:

"...If an object of type Unbounded_String is not otherwise initialized, it
will be initialized to the same value as Null_Unbounded_String."
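
For a declaration with no initializer at all, that rule can be checked
with a small sketch like the following (the procedure name Default_Init
and the variable S are illustrative assumptions, not from the thread).
It does not cover an object that is explicitly initialized with
To_Unbounded_String (Length).

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Default_Init is
   S : Unbounded_String;  --  no explicit initialization
begin
   --  Both checks rely only on the default-initialization rule.
   Ada.Text_IO.Put_Line ("Length:" & Natural'Image (Length (S)));
   Ada.Text_IO.Put_Line
     ("Equals Null_Unbounded_String: " &
      Boolean'Image (S = Null_Unbounded_String));

   --  Appending to the default value therefore starts from empty.
   Append (S, 'x');
   Ada.Text_IO.Put_Line ("After Append: " & To_String (S));
end Default_Init;
```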

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



* Re: String filtering
From: Dmitry A. Kazakov @ 2005-09-27 13:21 UTC


On Tue, 27 Sep 2005 21:15:15 +1000, David Trudgett wrote:

> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
> 
> OK, now I have:
> 
>    Alpha_Num_Space_Set : constant Character_Set
>      := Alphanumeric_Set or To_Set(' ');
> 
> since I realised I also need space.

And HT? And VT? LF, CR,... (:-))

>>>     function Strip_Non_Alphanumeric
>>>       (Str : in Unbounded_String) return Unbounded_String
>>>     is
>>>        New_Str : Unbounded_String
>>>          := To_Unbounded_String(Count(Str, Alphanumeric_Set));
>>
>> If you do this, then use String (1..Count (...));
> 
> If I did that then I would need to convert back to unbounded_string
> when I return the function result. Would that be significantly faster
> than working on a pre-allocated unbounded string?

It cannot be slower, because you already have one To_Unbounded_String to
initialize it. 

>>
>>>     begin
>>>        New_Str := To_Unbounded_String("");
>>
>> No need for that, it is initially an empty string.
> 
> I at first thought so myself, until I discovered that New_Str was
> uninitialised, as it says in the ARM. Hence, I added that line.

It is, with Null_Unbounded_String.

>> Firstly it is not clear why characters need to be filtered out. Or
>> better to say, how did it happen, that you get garbage in a string?
> 
> I am sanitising data received over a socket, which may be of any
> length. Hence my use of unbounded_string, and my desire to strip out
> non-alphanumeric characters.

But sockets normally work either as a stream or with a Storage_Element's
array. Thus you don't have Unbounded_String, you make it later. Do a String
instead.

>> Either, you need a character *stream* filtering, 
> 
> Possibly, but I'm not using a socket stream interface at the current
> time. The socket library I'm using right now doesn't do streams.

Anyway, you have some protocol, and non-alpha characters seem to violate
it. So, what your filter does, is inventing some meaning out of meaningless
rubbish. Usually it is rather a bad idea, see PL/1 and HTML. Errors should
be reported as early as possible.

>> long before you get a string token out of it, or, more realistically
>> an error message (exception), should have happened, for example if
>> you take some text from a GUI widget.
>>
>> Secondly, unbounded strings are rarely needed. 
> 
> For some definition of 'rarely', I suppose. :-) I'm sure some people
> must use them all the time, so it wouldn't be rare for them.

You must use it only under certain conditions. Which are: mutability and
"sufficiently" unknown in advance length. In your case they aren't
satisfied, so you don't have to, if you don't want to... (:-))

> Ada does make it a pain to use unbounded_strings, so it can seem like
> a virtue to avoid them, but other languages use them by default, with
> no ill-effects to show for it ;-)

It is a different story. Unbounded_String is a nasty kludge. But that does
not mean that if they were designed properly, they would be more needed!
(:-))

>> Especially in text parsing etc. It is quite uncommon to change a
>> string content there. In your example you don't do it either. You
>> create a new string. 
> 
> Yes, well, functions work that way in Ada (fortunately, or
> unfortunately, I don't know). I could have made it a procedure with an
> "in out" parameter, but I like functional programming better.
> Unfortunately, I haven't been able to do proper functional style
> programming in Ada so far, having been thwarted by strong typing and
> lack of "out" parameters in functions.

Well, out parameters in functions are much desired by almost anybody,
except the ARG members. (:-)) But that won't help. Try access parameters
instead and you will see. The problem is that an out parameter cannot
"return" constraints as the proper result can. So functional style is only
possible through the result. And you perfectly can create a local string of
needed length and return it as the result.

>> Also both the source and the result strings have *known* length.
> 
> Known but variable, with no particular bounds.

That's no matter. All string operations can be implemented this way:

function Op (...) return String is
   Result_Length : Natural;
begin
   -- evaluate Result_Length
   declare
      Result : String (1..Result_Length);
   begin
      -- Fill Result
      return Result;
   end;
end Op;

Such operations can always be used as:

declare
   X : String renames Op (...);
begin
   -- Using X
end;
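
As one concrete instance of this pattern, here is a sketch that assumes
the ASCII alphanumeric filter from earlier in the thread. The names
Filter_Demo, Strip, and Keep are illustrative, and the result length is
computed with Ada.Strings.Fixed.Count rather than anything specific to
the original poster's code.

```ada
with Ada.Strings.Fixed;
with Ada.Strings.Maps; use Ada.Strings.Maps;
with Ada.Text_IO;

procedure Filter_Demo is
   Keep : constant Character_Set :=
     To_Set (Character_Ranges'(('a', 'z'), ('A', 'Z'), ('0', '9')));

   function Strip (Source : String) return String is
      --  Evaluate Result_Length: the number of characters to keep.
      Result_Length : constant Natural :=
        Ada.Strings.Fixed.Count (Source, Keep);
   begin
      declare
         Result : String (1 .. Result_Length);
         Last   : Natural := 0;
      begin
         --  Fill Result with the kept characters, in order.
         for I in Source'Range loop
            if Is_In (Source (I), Keep) then
               Last := Last + 1;
               Result (Last) := Source (I);
            end if;
         end loop;
         return Result;
      end;
   end Strip;
begin
   declare
      X : String renames Strip ("abc-123 ;DEF!");
   begin
      Ada.Text_IO.Put_Line (X);  --  prints abc123DEF
   end;
end Filter_Demo;
```

No unbounded string is involved: the caller receives a String whose
bounds are fixed at the point of the call.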

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



* Re: String filtering
From: Martin Dowie @ 2005-09-27 13:42 UTC


Dmitry A. Kazakov wrote:
> No. ARM 95 A.4.5(49) reads:
>
> "...If an object of type Unbounded_String is not otherwise
> initialized, it will be initialized to the same value as
> Null_Unbounded_String."

But it was "otherwise initialized":

         := To_Unbounded_String(Count(Str, Alphanumeric_Set));

2nd sentence of ARM 95 A.4.5 (76) reads:

  "The function To_Unbounded_String(Length : in Natural)
    returns an Unbounded_String that represents an uninitialized
    String whose length is Length."

Cheers

-- Martin





* Re: String filtering
From: Martin Dowie @ 2005-09-27 13:43 UTC


Dmitry A. Kazakov wrote:
>>>>     begin
>>>>        New_Str := To_Unbounded_String("");
>>>
>>> No need for that, it is initially an empty string.
>>
>> I at first thought so myself, until I discovered that New_Str was
>> uninitialised, as it says in the ARM. Hence, I added that line.
>
> It is, with Null_Unbounded_String.

Where does it say that in the RM?!

Cheers

-- Martin





* Re: String filtering
From: Jacob Sparre Andersen @ 2005-09-27 13:52 UTC


David Trudgett wrote:
> Dmitry A. Kazakov wrote:
>> On Tue, 27 Sep 2005 19:13:17 +1000, David Trudgett wrote:

>>>        New_Str : Unbounded_String
>>>          := To_Unbounded_String(Count(Str, Alphanumeric_Set));
>>
>> If you do this, then use String (1..Count (...));
>
> If I did that then I would need to convert back to unbounded_string
> when I return the function result. Would that be significantly
> faster than working on a pre-allocated unbounded string?

I think it would, but it clearly depends on the implementation of
Ada.Strings.Unbounded.

>> Either, you need a character *stream* filtering, 
>
> Possibly, but I'm not using a socket stream interface at the current
> time. The socket library I'm using right now doesn't do streams.

It definitely looks like an ideal example for the use of streams.  Is
there something that makes it a bad idea to switch to a stream-capable
socket package?

>> Especially in text parsing etc. It is quite uncommon to change a
>> string content there.

I do it quite often, but I've never pretended to be common.

>> In your example you don't do it either. You create a new string.
>
> Yes, well, functions work that way in Ada (fortunately, or
> unfortunately, I don't know). I could have made it a procedure with
> an "in out" parameter, but I like functional programming better.
> Unfortunately, I haven't been able to do proper functional style
> programming in Ada so far, having been thwarted by strong typing and
> lack of "out" parameters in functions.

You should use the programming styles supported by Ada, when you're
programming in Ada.  Otherwise it may be very frustrating to program
in Ada.  For functional programming you should use SML, Erlang, OCaml
or another "proper" functional programming language.

>> Also both the source and the result strings have *known* length.
>
> Known but variable, with no particular bounds.

That's not enough to force you to use unbounded strings.  If you were
doing in-place modifications of the string, you could argue for
unbounded strings, but what you've shown us so far does not support
the use of unbounded strings for your problem.

Greetings,

Jacob
-- 
xsnow | xshovel > /dev/null



* Re: String filtering
From: Georg Bauhaus @ 2005-09-27 14:08 UTC


David Trudgett wrote:

> Yes, well, functions work that way in Ada (fortunately, or
> unfortunately, I don't know). I could have made it a procedure with an
> "in out" parameter, but I like functional programming better.
> Unfortunately, I haven't been able to do proper functional style
> programming in Ada so far, having been thwarted by strong typing and
> lack of "out" parameters in functions.


Here's a filter then.

with Ada.Containers.Vectors;

package Character_Vectors is
   new Ada.Containers.Vectors(Element_Type => Character,
                              Index_Type => Positive);

with Character_Vectors;
with Ada.Strings.Maps.Constants;

procedure filter is

   use Character_Vectors;

   input: Vector;
   output: Vector;


   procedure save_good_ones(c: Cursor) is
      use Ada.Strings.Maps;

      Alpha_Num_Space_Set : constant Character_Set
        := Constants.Alphanumeric_Set or To_Set(' ');
   begin
      if Is_In(Element(c), Alpha_Num_Space_Set) then
         append(output, Element(c));
      end if;
   end save_good_ones;

begin
   Iterate(input, save_good_ones'access);
end filter;



* Re: String filtering
From: Marc A. Criley @ 2005-09-27 14:09 UTC


David Trudgett wrote:
> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
>>Either, you need a character *stream* filtering, 
> 
> Possibly, but I'm not using a socket stream interface at the current
> time. The socket library I'm using right now doesn't do streams.

Just as an FYI, I wrote an article a few years ago on how to put an Ada 
stream interface onto a socket.  It's at 
http://portal.acm.org/ft_gateway.cfm?id=568950&type=pdf.  And I believe 
Samuel Tardieu enhanced his AdaSockets implementation 
(http://www.rfc1149.net/devel/adasockets) around that time to do the 
same thing.

-- Marc A. Criley
-- McKae Technologies
-- www.mckae.com
-- DTraq - XPath In Ada - XML EZ Out



* Re: String filtering
From: Dmitry A. Kazakov @ 2005-09-27 14:24 UTC


On Tue, 27 Sep 2005 14:42:02 +0100, Martin Dowie wrote:

> Dmitry A. Kazakov wrote:
>> No. ARM 95 A.4.5(49) reads:
>>
>> "...If an object of type Unbounded_String is not otherwise
>> initialized, it will be initialized to the same value as
>> Null_Unbounded_String."
> 
> But it was "otherwise initialized":
> 
>          := To_Unbounded_String(Count(Str, Alphanumeric_Set));
> 
> 2nd sentence of ARM 95 A.4.5 (76) reads:
> 
>   "The function To_Unbounded_String(Length : in Natural)
>     returns an Unbounded_String that represents an uninitialized
>     String whose length is Length."

Ah, now I see what you meant!

Right, though my point was that one should use either

   New_Str : String (1 .. Count (...));

or

   New_Str : Unbounded_String; -- "Uninitialized"

So my comment about empty string concerned the latter case.

If To_Unbounded_String (Count) is ever used, then of course Replace_Element
should be used in place of assignment + Append, because assignment might
reclaim the memory allocated by To_Unbounded_String (Count).

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: String filtering
  2005-09-27  9:13   ` David Trudgett
  2005-09-27  9:49     ` Dmitry A. Kazakov
@ 2005-09-27 17:47     ` Jeffrey R. Carter
  2005-09-28  1:29       ` David Trudgett
  1 sibling, 1 reply; 71+ messages in thread
From: Jeffrey R. Carter @ 2005-09-27 17:47 UTC (permalink / raw)


David Trudgett wrote:

It seems odd to do this:

>        New_Str : Unbounded_String
>          := To_Unbounded_String(Count(Str, Alphanumeric_Set));

if you're also going to do this:

>        New_Str := To_Unbounded_String("");

>              Append(New_Str, Element(Str, Char));

If you allocate the expected length (and don't overwrite that with a null
string), then you can use Replace_Element.

Is there some reason you're not doing this in place, using Delete? Is there some 
reason you're not doing this when you create the original string?

-- 
Jeff Carter
"You tiny-brained wipers of other people's bottoms!"
Monty Python & the Holy Grail
18




* Re: String filtering
  2005-09-27 11:15       ` David Trudgett
                           ` (3 preceding siblings ...)
  2005-09-27 14:09         ` Marc A. Criley
@ 2005-09-27 17:59         ` tmoran
  2005-09-28  1:20           ` David Trudgett
  4 siblings, 1 reply; 71+ messages in thread
From: tmoran @ 2005-09-27 17:59 UTC (permalink / raw)


>I am sanitising data received over a socket, which may be of any
>length. Hence my use of unbounded_string, ...
   Yes indeed.  For instance some bad program might send you an unlimited
series of characters and your program would work, then crawl as it
thrashed in virtual memory, then eventually crash as everything was full.
   Surely there's some number above which "this can't be right!"
and that can be your Fixed or Bounded string size.




* Re: String filtering
  2005-09-27 14:24               ` Dmitry A. Kazakov
@ 2005-09-28  0:06                 ` David Trudgett
  2005-09-28  8:15                   ` Dmitry A. Kazakov
                                     ` (3 more replies)
  0 siblings, 4 replies; 71+ messages in thread
From: David Trudgett @ 2005-09-28  0:06 UTC (permalink / raw)


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:

> On Tue, 27 Sep 2005 14:42:02 +0100, Martin Dowie wrote:
>> 2nd sentence of ARM 95 A.4.5 (76) reads:
>> 
>>   "The function To_Unbounded_String(Length : in Natural)
>>     returns an Unbounded_String that represents an uninitialized
>>     String whose length is Length."
>
> Ah, now I see what you meant!

Yep, that's what I meant, too.


>
> Right, though my point was that one should use either
>
>    New_Str : String (1 .. Count (...));
>
> or
>
>    New_Str : Unbounded_String; -- "Uninitialized"


I've made some revisions based on various comments, and this is what I
have at the moment (incorporating both a string and unbounded_string
version):

    Space_Char  : constant Character_Range := (' ', ' ');
    Lower_Chars : constant Character_Range := ('a', 'z');
    Upper_Chars : constant Character_Range := ('A', 'Z');
    Numer_Chars : constant Character_Range := ('0', '9');
    Alpha_Num_Space : constant Character_Ranges
      := (Space_Char, Lower_Chars, Upper_Chars, Numer_Chars);
    Alpha_Num_Space_Set : constant Character_Set
      := To_Set(Alpha_Num_Space);


   function Strip_Non_Alphanumeric
     (Str : in Unbounded_String) return Unbounded_String
   is
      Dest_Size : Natural := Count(Str, Alpha_Num_Space_Set);
      New_Str : Unbounded_String := Null_Unbounded_String;
      Dest_Char : Natural := 0;
   begin
      if Dest_Size > 0 then
         New_Str := To_Unbounded_String(Dest_Size);
         for Src_Char in 1 .. Length(Str) loop
            if Is_In(Element(Str, Src_Char), Alpha_Num_Space_Set) then
               Dest_Char := Dest_Char + 1;
               Replace_Element
                 (New_Str, Dest_Char, Element(Str, Src_Char));
            end if;
         end loop;
      end if;
      return New_Str;
   end Strip_Non_Alphanumeric;


   function Strip_Non_Alphanumeric
     (Str : in String) return String
   is
      New_Str : String(1 .. Count(Str, Alpha_Num_Space_Set));
      Dest_Char : Natural := 0;
   begin
      if New_Str'Last > 0 then
         for Src_Char in Str'Range loop
            if Is_In(Str(Src_Char), Alpha_Num_Space_Set) then
               Dest_Char := Dest_Char + 1;
               New_Str(Dest_Char) := Str(Src_Char);
            end if;
         end loop;
      else
         New_Str := "";
      end if;
      return New_Str;
   end Strip_Non_Alphanumeric;



In the unbounded version, I decided to use replace_element instead of
append (with its assignment to "", which might perhaps unallocate
memory, depending on implementation??, thus potentially undoing the
purpose of the preallocation). This version might gain a bit in
efficiency, but code-wise, it is a bit more complex.

The string version also seems to work under testing. 

Profiling would show no difference in performance between the two for
my current purposes, but in a different situation, involving large
amounts of data, for instance, the fixed string version would no doubt
out-perform speed-wise. Space-wise, the unbounded strings would
probably win out in many situations.



>
> So my comment about empty string concerned the latter case.
>
> If To_Unbounded_String (Count) is ever used then of course
> Replace_Element should be in place of assignment + Append. 

This is what I have done. Have I mucked up anything else while doing so?


> Because, assignment might reclaim the memory allocated by
> To_Unbounded_String (Count).

I assume this is left up to the compiler implementation?


David


-- 

David Trudgett
http://www.zeta.org.au/~wpower/

We come here upon what, in a large proportion of cases, forms the
source of the grossest errors of mankind. Men on a lower level of
understanding, when brought into contact with phenomena of a higher
order, instead of making efforts to understand them, to raise
themselves up to the point of view from which they must look at the
subject, judge it from their lower standpoint, and the less they
understand what they are talking about, the more confidently and
unhesitatingly they pass judgment on it.

    -- Leo Tolstoy, "The Kingdom of God is Within You"





* Re: String filtering
  2005-09-27 13:21         ` Dmitry A. Kazakov
  2005-09-27 13:43           ` Martin Dowie
@ 2005-09-28  0:51           ` David Trudgett
  2005-09-28 12:02             ` Dmitry A. Kazakov
  2005-09-28 13:25             ` Marc A. Criley
  2005-09-29 22:42           ` Randy Brukardt
  2 siblings, 2 replies; 71+ messages in thread
From: David Trudgett @ 2005-09-28  0:51 UTC (permalink / raw)


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:

> On Tue, 27 Sep 2005 21:15:15 +1000, David Trudgett wrote:
>
>> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
>> 
>> OK, now I have:
>> 
>>    Alpha_Num_Space_Set : constant Character_Set
>>      := Alphanumeric_Set or To_Set(' ');
>> 
>> since I realised I also need space.
>
> And HT? And VT? LF, CR,... (:-))

;-) Uh, nope.


>>>
>>>>     begin
>>>>        New_Str := To_Unbounded_String("");
>>>
>>> No need for that, it is initially an empty string.
>> 
>> I at first thought so myself, until I discovered that New_Str was
>> uninitialised, as it says in the ARM. Hence, I added that line.
>
> It is, with Null_Unbounded_String.

Why would I say, "it says in the ARM" if it doesn't? ;-) But someone
else has already pointed out the bit to you. (I also said that I
*discovered* that it was uninitialised. You may, however, have
interpreted that to mean that I discovered it in the ARM, which was
not the case.)


>
>>> Firstly it is not clear why characters need to be filtered out. Or
>>> better to say, how did it happen, that you get garbage in a string?
>> 
>> I am sanitising data received over a socket, which may be of any
>> length. Hence my use of unbounded_string, and my desire to strip out
>> non-alphanumeric characters.
>
> But sockets normally work either as a stream or with a
> Storage_Element's array. Thus you don't have Unbounded_String, you
> make it later. 

I get it as a string from the adasockets (0.1.6) library Get_Line
function. Perhaps this is an old version, I don't know. I noticed a
couple of people in this group have contributed to it (such as Pascal
Obry), so they might be able to say.

However, you are right that I *could* use string instead, by changing
the way I do things. But I'm happy with unbounded strings for the time
being and for my present purposes. (I'm just learning Ada, you know! :-))



> Do a String instead.
>
>>> Either, you need a character *stream* filtering, 
>> 
>> Possibly, but I'm not using a socket stream interface at the current
>> time. The socket library I'm using right now doesn't do streams.
>
> Anyway, you have some protocol, and non-alpha characters seem to
> violate it. So, what your filter does, is inventing some meaning out
> of meaningless rubbish. Usually it is rather a bad idea, see PL/1
> and HTML. Errors should be reported as early as possible.

Your points are theoretically valid, but... :-) I'm working on a toy
program, Dmitry, and being forgiving with protocol is one thing that I
do in toy programs. :-) If the remote player (it's a battleship game,
oh, the violence, the violence ;-)) wants to send ANSI escape
sequences in his name, well he can do so, but they're not going to
mess up my ANSI terminal screen! ;-) I could, of course, do better
than just screening out non-alphanumeric characters, but I'm lazy :-)
and... it's a toy program (did I mention that :-)).

I've also designed the whole thing in easily maintained modules, so
added robustness is a simple add-on in future.


>
>>> long before you get a string token out of it, or, more realistically
>>> an error message (exception), should have happened, for example if
>>> you take some text from a GUI widget.

A GUI is a future enhancement! :-) (But one that will be easy to add,
by design.)



>>>
>>> Secondly, unbounded strings are rarely needed. 
>> 
>> For some definition of 'rarely', I suppose. :-) I'm sure some people
>> must use them all the time, so it wouldn't be rare for them.
>
> You must use it only under certain conditions. Which are: mutability and
> "sufficiently" unknown in advance length. In your case they aren't
> satisfied, so you don't have to, if you don't want to... (:-))

But I want to, Dmitry, I want to! :-) 


>
>> Ada does make it a pain to use unbounded_strings, so it can seem like
>> a virtue to avoid them, but other languages use them by default, with
>> no ill-effects to show for it ;-)
>
> It is a different story. Unbounded_String is a nasty kludge. But
> that does not mean that if they were designed properly, they would
> be more needed!  (:-))

I'm almost afraid to ask... :-) What is it about Unbounded_String that
makes it a kludge, in your opinion? Is there something unnecessarily
inefficient (space/time) about the way they are specified in the Ada95
standard? Or is it that existing implementations of it are a kludge?



>> Yes, well, functions work that way in Ada (fortunately, or
>> unfortunately, I don't know). I could have made it a procedure with an
>> "in out" parameter, but I like functional programming better.
>> Unfortunately, I haven't been able to do proper functional style
>> programming in Ada so far, having been thwarted by strong typing and
>> lack of "out" parameters in functions.
>
> Well, out parameters in functions are much desired by almost
> anybody, except the ARG members. (:-)) But that won't help. Try
> access parameters instead and you will see. 

Alright... that sounds like an invitation to dance in a mine field. ;-)


> The problem is that an out parameter cannot "return" constraints as
> the proper result can. 

[I suppose I can guess what you mean by this.] It's true that out
parameters don't cure all ills.


> function Op (...) return String is
>    Result_Length : Natural;
> begin
>    -- evaluate Result_Length
>    declare
>       Result : String (1..Result_Length);
>    begin
>       -- Fill Result
>       return Result;
>    end;
> end Op;
>
> Such operations can always be used as:
>
> declare
>    X : String renames Op (...);
> begin
>    -- Using X
> end;

That's handy, isn't it? Very nice. I must remember to experiment with
it.
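
A minimal sketch of that renaming idiom, written as a complete program.
Repeat is a hypothetical helper invented for the illustration; the only
point is that X picks up its bounds from whatever the call returns:

```ada
with Ada.Text_IO;

procedure Renames_Demo is

   --  Hypothetical helper, for illustration only: the length of the
   --  result is not known until the call is evaluated.
   function Repeat (C : Character; N : Natural) return String is
      Result : constant String (1 .. N) := (others => C);
   begin
      return Result;
   end Repeat;

begin
   declare
      --  X takes its bounds from the function result, so no pre-sized
      --  variable has to be declared beforehand.
      X : String renames Repeat ('*', 5);
   begin
      Ada.Text_IO.Put_Line (X & Natural'Image (X'Length));
   end;
end Renames_Demo;
```

The renamed object is only visible inside the declare block, so the
rest of the work with X has to happen there.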

David




-- 

David Trudgett
http://www.zeta.org.au/~wpower/

On another level there is a principle laid down, much in line with
common sense and with the original American ideal, that governments
should never do what small bodies can accomplish: unions, credit
unions, cooperatives, St. Vincent de Paul Societies. Peter Maurin's
anarchism was on one level based on this principle of subsidiarity,
and on a higher level on that scene at the Last Supper where Christ
washed the feet of His Apostles. He came to serve, to show the new
Way, the way of the powerless. In the face of Empire, the Way of
Love.

    -- Dorothy Day, The Catholic Worker, May 1972. 
       
       (Dorothy Day Library on the Web at
        http://www.catholicworker.org/dorothyday/)




* Re: String filtering
  2005-09-27 13:52         ` Jacob Sparre Andersen
@ 2005-09-28  1:01           ` David Trudgett
  2005-09-28  1:50             ` David Trudgett
  0 siblings, 1 reply; 71+ messages in thread
From: David Trudgett @ 2005-09-28  1:01 UTC (permalink / raw)


Jacob Sparre Andersen <sparre@nbi.dk> writes:

> David Trudgett wrote:
>> Dmitry A. Kazakov wrote:
>>> On Tue, 27 Sep 2005 19:13:17 +1000, David Trudgett wrote:
>
>>>>        New_Str : Unbounded_String
>>>>          := To_Unbounded_String(Count(Str, Alphanumeric_Set));
>>>
>>> If you do this, then use String (1..Count (...));
>>
>> If I did that then I would need to convert back to unbounded_string
>> when I return the function result. Would that be significantly
>> faster than working on a pre-allocated unbounded string?
>
> I think it would, but it clearly depends on the implementation of
> Ada.Strings.Unbounded.

You may be right about that, but there is probably irreducible
complexity in unbounded strings. For my particular current purposes,
there is nothing between them performance-wise.


>
>>> Either, you need a character *stream* filtering, 
>>
>> Possibly, but I'm not using a socket stream interface at the current
>> time. The socket library I'm using right now doesn't do streams.
>
> It definitely looks like an ideal example for the use of streams.  Is
> there something that makes it a bad idea to switch to a stream-capable
> socket package?

Only the fact that I'm just toying with a game program, and the socket
library already works (adasockets 0.1.6 IIRC), and the comms layer I
wrote works on top of it. Changing it in future won't be any big deal,
however, since it's abstracted away behind my comms layer.


> You should use the programming styles supported by Ada, when you're
> programming in Ada.  

Yes, indeed.


> Otherwise it may be very frustrating to program in Ada.  For
> functional programming you should use SML, Erlang, OCaml or another
> "proper" functional programming language.

Well, I don't do pure "functional" as some people define it. I only
tend in that direction where I find it helpful. Ada is very much
procedural, however.

David


-- 

David Trudgett
http://www.zeta.org.au/~wpower/

No one is given a map to their dreams
All we can do is to trace it.
See where we go to, know where we've been
Build up the courage to face it.

    -- Sandy Denny
    




* Re: String filtering
  2005-09-27 14:09         ` Marc A. Criley
@ 2005-09-28  1:09           ` David Trudgett
  2005-09-28 21:09           ` Simon Wright
  1 sibling, 0 replies; 71+ messages in thread
From: David Trudgett @ 2005-09-28  1:09 UTC (permalink / raw)


"Marc A. Criley" <mcNOSPAM@mckae.com> writes:

> David Trudgett wrote:
>> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
>>> Either, you need a character *stream* filtering,
>> Possibly, but I'm not using a socket stream interface at the current
>> time. The socket library I'm using right now doesn't do streams.
>
> Just as an FYI, I wrote an article a few years ago on how to put an
> Ada stream interface onto a socket.  It's at
> http://portal.acm.org/ft_gateway.cfm?id=568950&type=pdf.  

Seems to require a user name and password, unfortunately.


> And I believe Samuel Tardieu enhanced his AdaSockets implementation
> (http://www.rfc1149.net/devel/adasockets) around that time to do the
> same thing.

Thanks. I'm actually using that one. I thought it was version 0.1.6
(by looking at the release version in sockets.ads), but it is actually
version 1.8.4.7. I didn't see any stream interface (looking through
the documentation). Perhaps it's there in the source and I didn't
notice it.

David



-- 

David Trudgett
http://www.zeta.org.au/~wpower/

As for those who profit by the privileges gained by previous acts of
violence, they often forget and like to forget how these privileges
were obtained. But one need only recall the facts of history, not the
history of the exploits of different dynasties of rulers, but real
history, the history of the oppression of the majority by a small
number of men, to see that all the advantages the rich have over the
poor are based on nothing but flogging, imprisonment, and murder.

    -- Leo Tolstoy, "The Kingdom of God is Within You"




* Re: String filtering
  2005-09-27 17:59         ` tmoran
@ 2005-09-28  1:20           ` David Trudgett
  0 siblings, 0 replies; 71+ messages in thread
From: David Trudgett @ 2005-09-28  1:20 UTC (permalink / raw)


tmoran@acm.org writes:

>>I am sanitising data received over a socket, which may be of any
>>length. Hence my use of unbounded_string, ...
>    Yes indeed.  For instance some bad program might send you an
> unlimited series of characters and your program would work, then
> crawl as it thrashed in virtual memory, then eventually crash as
> everything was full.  Surely there's some number above which "this
> can't be right!"  and that can be your Fixed or Bounded string size.

Yes, I had thought of that... ;-) As it happens, according to the
documentation, Get_Line will truncate to the buffer size set with
Set_Buffer (or a default of 1500 IIRC).

So, I could use String(1 .. 1500) (or whatever), wasting a bit of
space, or I could just use... unbounded strings.
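
For comparison, a minimal sketch of the bounded alternative. It assumes
the 1500-character limit recalled above (IIRC, not checked against the
adasockets documentation); Lines and Bounded_Demo are made-up names:

```ada
with Ada.Text_IO;
with Ada.Strings;
with Ada.Strings.Bounded;

procedure Bounded_Demo is

   --  Max => 1500 is an assumption taken from the buffer size
   --  mentioned above, not a documented constant.
   package Lines is
     new Ada.Strings.Bounded.Generic_Bounded_Length (Max => 1500);
   use Lines;

   Line : Bounded_String;

begin
   --  Drop => Right silently truncates input longer than Max instead
   --  of raising Ada.Strings.Length_Error.
   Line := To_Bounded_String ("player one", Drop => Ada.Strings.Right);
   Ada.Text_IO.Put_Line
     (To_String (Line) & Natural'Image (Length (Line)));
end Bounded_Demo;
```

A bounded string reserves Max characters per object, which is the
"wasting a bit of space" trade-off in another form.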

Cheers,

David

-- 

David Trudgett
http://www.zeta.org.au/~wpower/

As a computer, I find your faith in technology amusing.





* Re: String filtering
  2005-09-27 17:47     ` Jeffrey R. Carter
@ 2005-09-28  1:29       ` David Trudgett
  2005-09-28 18:32         ` Jeffrey R. Carter
  0 siblings, 1 reply; 71+ messages in thread
From: David Trudgett @ 2005-09-28  1:29 UTC (permalink / raw)


"Jeffrey R. Carter" <spam@spam.com> writes:

> David Trudgett wrote:
>
> It seems odd to do this:
>
>>        New_Str : Unbounded_String
>>          := To_Unbounded_String(Count(Str, Alphanumeric_Set));
>
> if you're also going to do this:
>
>>        New_Str := To_Unbounded_String("");
>
>>              Append(New_Str, Element(Str, Char));
>
> If you allocate the expected length (and not overwrite that with a
> null string), then you can use Replace_Element.

Yes, it does seem odd. As it happens, the code is easier and shorter
that way, but <<New_Str := To_Unbounded_String("");>> might free
storage in some implementations, so I created a new version using
Replace_Element instead.


>
> Is there some reason you're not doing this in place, using Delete? 

Repeated deletes would cause a lot of copying (though not a problem
for my particular purposes at present).


Now, go away or I shall taunt you a second time! ;-)

David


P.S. Thanks everyone for your comments on this one. Stay tuned for the
next exciting function call! :-)


-- 

David Trudgett
http://www.zeta.org.au/~wpower/

"While the popular understanding of anarchism is of a violent,
anti-State movement, anarchism is a much more subtle and nuanced
tradition than a simple opposition to government power. Anarchists
oppose the idea that power and domination are necessary for society,
and instead advocate more co-operative, anti-hierarchical forms of
social, political and economic organisation." 

    -- The Politics of Individualism, p. 106




* Re: String filtering
  2005-09-28  1:01           ` David Trudgett
@ 2005-09-28  1:50             ` David Trudgett
  0 siblings, 0 replies; 71+ messages in thread
From: David Trudgett @ 2005-09-28  1:50 UTC (permalink / raw)


David Trudgett <wpower@zeta.org.au.nospamplease> writes:

>> It definitely looks like an ideal example for the use of streams.
>> Is there something that makes it a bad idea to switch to a
>> stream-capable socket package?
>
> Only the fact that I'm just toying with a game program, and the
> socket library already works (adasockets 0.1.6 IIRC), and the comms
> layer I wrote works on top of it. Changing it in future won't be any
> big deal, however, since it's abstracted away behind my comms layer.

I've looked through the adasockets docs again and I see that there is
a lower level Receive function/procedure that uses streams. So, I have
to correct myself when I said it doesn't do streams.

So, when I get around to it, I might have a look at using streams, but
for the present, the Get_Line interface works just fine and is nice
and easy.

Sorry for the confusion. Sometimes the memory plays tricks. I should
have gone back and double checked.

David



-- 

David Trudgett
http://www.zeta.org.au/~wpower/

You need only reflect that one of the best ways to get yourself a
reputation as a dangerous citizen these days is to go about repeating
the very phrases which our founding fathers used in the great struggle
for independence.

    -- Charles Austin Beard (American historian/educator 1874-1948) 





* Re: String filtering
  2005-09-27  6:27 String filtering David Trudgett
  2005-09-27  7:38 ` Jacob Sparre Andersen
  2005-09-27  7:41 ` tmoran
@ 2005-09-28  1:54 ` Steve
  2005-09-28  2:20   ` David Trudgett
  2 siblings, 1 reply; 71+ messages in thread
From: Steve @ 2005-09-28  1:54 UTC (permalink / raw)


"David Trudgett" <wpower@zeta.org.au.nospamplease> wrote in message 
news:m31x3b6n5v.fsf@rr.trudgett...
>
> Hi all,
>
> I've been puzzling for a little bit over a good way to filter out
> unwanted characters from a string. In particular, I have an unbounded
> string and want to filter out of it all characters not in 'a'..'z',
> 'A'..'Z', '0'..'9'. So far I've only thought of tedious ways to do
> it. Is there an easy way to do it using the string handling facilities
> in Ada? I think I almost got there with the idea of using
> Maps.Character_Set, and so on, but I haven't quite pieced it together
> yet.
>
> Thanks.
>
> David
>

If you're just looking for simple code, I would suggest using the Index and 
Delete functions in Ada.Strings.Unbounded.  Something along the lines of:

  loop
      deleteIndex := Index( source, charSet, test => outside );
      exit when deleteIndex = 0;
      Delete( source, deleteIndex, deleteIndex );
  end loop;

It's not very efficient, but I think it is about as simple as you can get. 
You can probably use Find_Token to get rid of consecutive chunks of 
characters you want to remove, but it would be a little bit more messy (and 
efficient).
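
A rough sketch of the Find_Token variant, as a complete program. The
names Keep, Strip and Find_Token_Demo are made up for the example, and
the character set is just the one from the original question:

```ada
with Ada.Text_IO;
with Ada.Strings;           use Ada.Strings;
with Ada.Strings.Maps;      use Ada.Strings.Maps;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Find_Token_Demo is

   --  Characters to keep; everything else is deleted.
   Keep : constant Character_Set :=
     To_Set (Character_Ranges'(('a', 'z'), ('A', 'Z'), ('0', '9')));

   procedure Strip (Source : in out Unbounded_String) is
      First : Positive;
      Last  : Natural;
   begin
      loop
         --  Find the first run of characters that are NOT in Keep.
         Find_Token (Source, Keep, Outside, First, Last);
         exit when Last = 0;            --  no such run is left
         Delete (Source, First, Last);  --  remove the whole run at once
      end loop;
   end Strip;

   S : Unbounded_String := To_Unbounded_String ("ab--c!! 12;3");

begin
   Strip (S);
   Ada.Text_IO.Put_Line (To_String (S));
end Find_Token_Demo;
```

Each Delete still shifts the tail of the string, so this only helps
when the unwanted characters tend to come in runs.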

Steve
(The Duck)


> -- 
>
> David Trudgett
> http://www.zeta.org.au/~wpower/
>
> We must learn to live together as brothers or perish together as
> fools.
>
>    -- Martin Luther King, Jr.
> 






* Re: String filtering
  2005-09-28  1:54 ` Steve
@ 2005-09-28  2:20   ` David Trudgett
  0 siblings, 0 replies; 71+ messages in thread
From: David Trudgett @ 2005-09-28  2:20 UTC (permalink / raw)


"Steve" <nospam_steved94@comcast.net> writes:

> If you're just looking for simple code, I would suggest using the
> Index and Delete functions in Ada.Strings.Unbounded.  Something
> along the lines of:
>
>   loop
>       deleteIndex := Index( source, charSet, test => outside );
>       exit when deleteIndex = 0;
>       Delete( source, deleteIndex, deleteIndex );
>   end loop;
>
> It's not very efficient, but I think it is about as simple as you
> can get.  You can probably use Find_Token to get rid of consecutive
> chunks of characters you want to remove, but it would be a little
> bit more messy (and efficient).

Yes, that's interesting, Steve. I wasn't sure what the use of outside
and inside was until now. (It's obvious once you see it, though!)

You'll probably see in subsequent messages that I eventually came up
with a solution that is at least somewhat efficient (thanks to
everyone's input), reasonably straightforward, and does the job for
me.

Cheers,

David


-- 

David Trudgett
http://www.zeta.org.au/~wpower/

The most basic processes of living things are accomplished by
molecular engines as complex as man's greatest inventions.

    -- Jeremy L. Walter, B.S., M.S., Ph.D., Mechanical Engineering.




* Re: String filtering
  2005-09-28  0:06                 ` David Trudgett
@ 2005-09-28  8:15                   ` Dmitry A. Kazakov
  2005-09-28 10:39                     ` David Trudgett
  2005-09-28  9:08                   ` Jacob Sparre Andersen
                                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 71+ messages in thread
From: Dmitry A. Kazakov @ 2005-09-28  8:15 UTC (permalink / raw)


On Wed, 28 Sep 2005 10:06:44 +1000, David Trudgett wrote:

> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
> 
>> On Tue, 27 Sep 2005 14:42:02 +0100, Martin Dowie wrote:
>>> 2nd sentence of ARM 95 A.4.5 (76) reads:
>>> 
>>>   "The function To_Unbounded_String(Length : in Natural)
>>>     returns an Unbounded_String that represents an uninitialized
>>>     String whose length is Length."
>>
>> Ah, now I see what you meant!
> 
> Yep, that's what I meant, too.

No. What Martin meant is that:

   X : Unbounded_String := To_Unbounded_String (Count);

is filled with rubbish.

What I meant is that:

   X : Unbounded_String;

is an empty string, being formally uninitialized. You can imagine it as
Unbounded_Strings having a default constructor setting them empty.

> I've made some revisions based on various comments, and this is what I
> have at the moment (incorporating both a string and unbounded_string
> version):
> 
>     Space_Char  : constant Character_Range := (' ', ' ');
>     Lower_Chars : constant Character_Range := ('a', 'z');
>     Upper_Chars : constant Character_Range := ('A', 'Z');
>     Numer_Chars : constant Character_Range := ('0', '9');
>     Alpha_Num_Space : constant Character_Ranges
>       := (Space_Char, Lower_Chars, Upper_Chars, Numer_Chars);
>     Alpha_Num_Space_Set : constant Character_Set
>       := To_Set(Alpha_Num_Space);
> 
> 
>    function Strip_Non_Alphanumeric
>      (Str : in Unbounded_String) return Unbounded_String
>    is
>       Dest_Size : Natural := Count(Str, Alpha_Num_Space_Set);
>       New_Str : Unbounded_String := Null_Unbounded_String;

You don't need the initialization here. Alternatively, initialize New_Str
with To_Unbounded_String (Dest_Size) directly; its parameter is a Natural.

>       Dest_Char : Natural := 0;
>    begin
>       if Dest_Size > 0 then

You don't need this if. Ada's for loops simply do nothing on an empty range.

>          New_Str := To_Unbounded_String(Dest_Size);
>          for Src_Char in 1 .. Length(Str) loop
>             if Is_In(Element(Str, Src_Char), Alpha_Num_Space_Set) then
>                Dest_Char := Dest_Char + 1;
>                Replace_Element
>                  (New_Str, Dest_Char, Element(Str, Src_Char));
>             end if;
>          end loop;
>       end if;
>       return New_Str;
>    end Strip_Non_Alphanumeric;
> 
> 
>    function Strip_Non_Alphanumeric
>      (Str : in String) return String
>    is
>       New_Str : String(1 .. Count(Str, Alpha_Num_Space_Set));

You can also do this instead:

   New_Str : String(1 .. Str'Length);

and save one extra scan of the string by Count. Memory is cheap, and in the
worst-case scenario you will allocate that amount anyway.

>       Dest_Char : Natural := 0;
>    begin
>       if New_Str'Last > 0 then

No need for this if.

>          for Src_Char in Str'Range loop
>             if Is_In(Str(Src_Char), Alpha_Num_Space_Set) then
>                Dest_Char := Dest_Char + 1;
>                New_Str(Dest_Char) := Str(Src_Char);
>             end if;
>          end loop;

Here you do:

   return New_Str (1..Dest_Char);

Ada strings have slices!

>       else
>          New_Str := "";
>       end if;
>       return New_Str;
>    end Strip_Non_Alphanumeric;
> 
>> Because, assignment might reclaim the memory allocated by
>> To_Unbounded_String (Count).
> 
> I assume this is left up to the compiler implementation?

Yes
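
Put together, the String version might look roughly like this. It is
only a sketch assembled from the remarks above (Str'Length instead of
the Count pre-scan, no if, and a slice as the result), wrapped in a
hypothetical Slice_Demo procedure so that it compiles on its own:

```ada
with Ada.Text_IO;
with Ada.Strings.Maps; use Ada.Strings.Maps;

procedure Slice_Demo is

   Alpha_Num_Space_Set : constant Character_Set :=
     To_Set (Character_Ranges'
               ((' ', ' '), ('0', '9'), ('A', 'Z'), ('a', 'z')));

   function Strip_Non_Alphanumeric (Str : in String) return String is
      --  Worst case: every character is kept.
      New_Str   : String (1 .. Str'Length);
      Dest_Char : Natural := 0;
   begin
      --  The loop body never runs for an empty Str.
      for Src_Char in Str'Range loop
         if Is_In (Str (Src_Char), Alpha_Num_Space_Set) then
            Dest_Char := Dest_Char + 1;
            New_Str (Dest_Char) := Str (Src_Char);
         end if;
      end loop;
      --  Return only the part that was actually filled.
      return New_Str (1 .. Dest_Char);
   end Strip_Non_Alphanumeric;

begin
   Ada.Text_IO.Put_Line (Strip_Non_Alphanumeric ("Ship #7: ""HMS Ada""!"));
   Ada.Text_IO.Put_Line ("[" & Strip_Non_Alphanumeric ("") & "]");
end Slice_Demo;
```

The second call is there to show that the empty string needs no
special case.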

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: String filtering
  2005-09-28  0:06                 ` David Trudgett
  2005-09-28  8:15                   ` Dmitry A. Kazakov
@ 2005-09-28  9:08                   ` Jacob Sparre Andersen
  2005-09-28  9:54                     ` David Trudgett
  2005-09-28 18:21                   ` Jeffrey R. Carter
  2005-09-28 21:00                   ` Simon Wright
  3 siblings, 1 reply; 71+ messages in thread
From: Jacob Sparre Andersen @ 2005-09-28  9:08 UTC (permalink / raw)


David Trudgett wrote:

> I've made some revisions based on various comments, and this is what
> I have at the moment (incorporating both a string and
> unbounded_string version):
>
>     Space_Char  : constant Character_Range := (' ', ' ');
>     Lower_Chars : constant Character_Range := ('a', 'z');
>     Upper_Chars : constant Character_Range := ('A', 'Z');
>     Numer_Chars : constant Character_Range := ('0', '9');
>     Alpha_Num_Space : constant Character_Ranges
>       := (Space_Char, Lower_Chars, Upper_Chars, Numer_Chars);
>     Alpha_Num_Space_Set : constant Character_Set
>       := To_Set(Alpha_Num_Space);
>
>
>    function Strip_Non_Alphanumeric
>      (Str : in Unbounded_String) return Unbounded_String
>    is
>       Dest_Size : Natural := Count(Str, Alpha_Num_Space_Set);
>       New_Str : Unbounded_String := Null_Unbounded_String;
>       Dest_Char : Natural := 0;
>    begin
>       if Dest_Size > 0 then
>          New_Str := To_Unbounded_String(Dest_Size);
>          for Src_Char in 1 .. Length(Str) loop
>             if Is_In(Element(Str, Src_Char), Alpha_Num_Space_Set) then
>                Dest_Char := Dest_Char + 1;
>                Replace_Element
>                  (New_Str, Dest_Char, Element(Str, Src_Char));
>             end if;
>          end loop;
>       end if;
>       return New_Str;
>    end Strip_Non_Alphanumeric;

   procedure Strip_Non_Alphanumeric (Str : in out Unbounded_String) is
      --  This version does in-place modification of the unbounded
      --  string, and is thus actually making use of Str being an
      --  unbounded string and not a fixed length string.
      Position : Natural := 1;
   begin
      while Position =< Length (Str) loop
         if Is_In (Element (Str, Position), Alpha_Num_Space_Set) then
            Position := Position + 1;
         else
            Delete (Source  => Str,
                    From    => Position,
                    Through => Position);
         end if;
      end loop;
   end Strip_Non_Alphanumeric;

>    function Strip_Non_Alphanumeric
>      (Str : in String) return String
>    is
>       New_Str : String(1 .. Count(Str, Alpha_Num_Space_Set));
>       Dest_Char : Natural := 0;
>    begin
>       if New_Str'Last > 0 then
>          for Src_Char in Str'Range loop
>             if Is_In(Str(Src_Char), Alpha_Num_Space_Set) then
>                Dest_Char := Dest_Char + 1;
>                New_Str(Dest_Char) := Str(Src_Char);
>             end if;
>          end loop;
>       else
>          New_Str := "";
>       end if;
>       return New_Str;
>    end Strip_Non_Alphanumeric;
>
> In the unbounded version, I decided to use replace_element instead of
> append (with its assignment to "", which might perhaps unallocate
> memory, depending on implementation??, thus potentially undoing the
> purpose of the preallocation).

Good.  It makes the String and Unbounded_String versions practically
equivalent - probably both in CPU and memory use.

> Profiling would show no difference in performance between the two
> for my current purposes, but in a different situation, involving
> large amounts of data, for instance, the fixed string version would
> no doubt out-perform speed-wise. Space-wise, the unbounded strings
> would probably win out in many situations.

I don't expect your two functions to show significant differences in
space or CPU use, however long the strings you throw at them.

>> Because, assignment might reclaim the memory allocated by
>> To_Unbounded_String (Count).
>
> I assume this is left up to the compiler implementation?

Exactly.

Jacob
-- 
"There is nothing worse than having only one drunk head."




* Re: String filtering
  2005-09-28  9:08                   ` Jacob Sparre Andersen
@ 2005-09-28  9:54                     ` David Trudgett
  2005-09-29 14:05                       ` Georg Bauhaus
  0 siblings, 1 reply; 71+ messages in thread
From: David Trudgett @ 2005-09-28  9:54 UTC (permalink / raw)


Jacob Sparre Andersen <sparre@nbi.dk> writes:

>    procedure Strip_Non_Alphanumeric (Str : in out Unbounded_String) is
>       --  This version does in-place modification of the unbounded
>       --  string, and is thus actually making use of Str being an
>       --  unbounded string and not a fixed length string.
>       Position : Natural := 1;
>    begin
>       while Position =< Length (Str) loop
>          if Is_In (Element (Str, Position), Alpha_Num_Space_Set) then
>             Position := Position + 1;
>          else
>             Delete (Source  => Str,
>                     From    => Position,
>                     Through => Position);
>          end if;
>       end loop;
>    end Strip_Non_Alphanumeric;
>

Yes, that works, too, and I've added it in just for completeness.
Thanks. (Your =< should have been <= though. There's always something
to fix, isn't there? ;-))


>
> Good.  It makes the String and Unbounded_String versions practically
> equivalent - probably both in CPU and memory use.

Much of a muchness, I would guess. Profiling particular applications
on particular compilers is the only way to tell for sure, though.


Thanks for your help, Jacob.

David


-- 

David Trudgett
http://www.zeta.org.au/~wpower/

There is a theory which states that if ever anyone discovers exactly
what the Universe is for and why it is here, it will instantly
disappear and be replaced by something even more bizarre and
inexplicable. There is another theory which states that this has
already happened.
      
      -- Douglas Adams, "The Hitchhiker's Guide to the Galaxy"





* Re: String filtering
  2005-09-28  8:15                   ` Dmitry A. Kazakov
@ 2005-09-28 10:39                     ` David Trudgett
  2005-09-28 20:55                       ` Simon Wright
  0 siblings, 1 reply; 71+ messages in thread
From: David Trudgett @ 2005-09-28 10:39 UTC (permalink / raw)


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:

> On Wed, 28 Sep 2005 10:06:44 +1000, David Trudgett wrote:
>
>> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
>> 
>>> On Tue, 27 Sep 2005 14:42:02 +0100, Martin Dowie wrote:
>>>> 2nd sentence of ARM 95 A.4.5 (76) reads:
>>>> 
>>>>   "The function To_Unbounded_String(Length : in Natural)
>>>>     returns an Unbounded_String that represents an uninitialized
>>>>     String whose length is Length."
>>>
>>> Ah, now I see what you meant!
>> 
>> Yep, that's what I meant, too.
>
> No. What Martin meant is that:
>
>    X : Unbounded_String := To_Unbounded_String (Count);
>
> is filled with rubbish

Yes, that's what I meant, too. There was obviously a language problem
happening there, so let's go on to something else, hey? :-)


>
> What I meant is that:
>
>    X : Unbounded_String;
>
> is an empty string, being formally uninitialized. 

Looking back at your original message, it is possible to see how you
meant your words to be taken. However, the way you said it was far
from clear, and at least two people (myself included) understood your
statement differently. You could have better said something like: "And
this line would then become unnecessary."


> You can imagine it as Unbounded_Strings having a default constructor
> setting them empty.

Thanks for pointing that out, by the way, because I wasn't sure
initially. I'm loath, however, to depend upon default initialisation,
even when it's specified in a standard.


>>    function Strip_Non_Alphanumeric
>>      (Str : in Unbounded_String) return Unbounded_String
>>    is
>>       Dest_Size : Natural := Count(Str, Alpha_Num_Space_Set);
>>       New_Str : Unbounded_String := Null_Unbounded_String;
>
> You don't need initialization here. 

I don't need it, that's true, but I prefer to be explicit about it. I
might change my mind about that when I'm more experienced in Ada.


> Or you can do with Dest_Size. The
> parameter of To_Unbounded_String is a Natural.
>
>>       Dest_Char : Natural := 0;
>>    begin
>>       if Dest_Size > 0 then
>
> You don't need this if. Ada's loops are safe for zero-run.

Its purpose is not to avoid a zero run, though. The test allows an
entire repeat scan of the source string to be avoided in the case that
none of the source characters is alphanumeric.

The initialisation of New_Str to a null string means that it is ready
to return immediately without code execution passing through the
loop. This avoids the situation of returning an uninitialised
unbounded_string when all source string characters are
non-alphanumeric.


>>    function Strip_Non_Alphanumeric
>>      (Str : in String) return String
>>    is
>>       New_Str : String(1 .. Count(Str, Alpha_Num_Space_Set));
>
> You also can do instead:
>
>    New_Str : String(1 .. Str'Length);


Yes, an interesting idea, combined with the sliced return.


>
> here you do:
>
>    return New_Str (1..Dest_Char);
>
> Ada strings have slices!

Interesting. I hadn't considered that!

David




-- 

David Trudgett
http://www.zeta.org.au/~wpower/

    
Whoever publicly profanes the Reich or one of the states incorporated
into it, its constitution, colors or flag or the German armed forces,
or maliciously and with premeditation exposes them to contempt, shall 
be punished by imprisonment.

    -- Statutory Criminal Law of Germany
       19 December 1932, RGB 1-1

The Congress and the States shall have the power to prohibit the act
of desecration of the flag of the United States and to set criminal
penalties for that act.

    -- Proposed Amendment to Constitution
       22 June 1989, H.J. Res. 305





* Re: String filtering
  2005-09-28  0:51           ` David Trudgett
@ 2005-09-28 12:02             ` Dmitry A. Kazakov
  2005-09-28 13:25             ` Marc A. Criley
  1 sibling, 0 replies; 71+ messages in thread
From: Dmitry A. Kazakov @ 2005-09-28 12:02 UTC (permalink / raw)


On Wed, 28 Sep 2005 10:51:35 +1000, David Trudgett wrote:

> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
> 
>> It is a different story. Unbounded_String is a nasty kludge. But
>> that does not mean that if they were designed properly, they would
>> be more needed!  (:-))
> 
> I'm almost afraid to ask... :-) What is it about Unbounded_String that
> makes it a kludge, in your opinion? Is there something unnecessarily
> inefficient (space/time) about the way they are specified in the Ada95
> standard? Or is it that existing implementations of it are a kludge?

It is much simpler, they are *not* strings: ARM 95 3.6.3 (1):

"A one-dimensional array type whose component type is a character type is
called a string type."

Unbounded_String design was a compromise, to make them right would have
required too many changes in the language. And there already were many.
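
A minimal sketch of the practical difference, for illustration only. The
procedure name Not_A_String is made up for the example; everything else
comes from Ada.Strings.Unbounded and Ada.Text_IO. String gets indexing,
slicing and attributes from being an array type, while the same
operations on an Unbounded_String are ordinary function calls on a
private type.

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Not_A_String is
   S : constant String           := "hello";
   U : constant Unbounded_String := To_Unbounded_String ("hello");
begin
   --  String is an array type: slicing, indexing and attributes
   --  are part of the language.
   Ada.Text_IO.Put_Line (S (2 .. 4));
   Ada.Text_IO.Put (S (S'First));
   Ada.Text_IO.New_Line;

   --  Unbounded_String is a private type: the equivalent operations
   --  are functions declared in Ada.Strings.Unbounded.
   Ada.Text_IO.Put_Line (Slice (U, 2, 4));
   Ada.Text_IO.Put (Element (U, 1));
   Ada.Text_IO.New_Line;

   --  Array notation such as U (2 .. 4) or U'First is not available
   --  for U, because U is not an array.
end Not_A_String;
```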

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: String filtering
  2005-09-28  0:51           ` David Trudgett
  2005-09-28 12:02             ` Dmitry A. Kazakov
@ 2005-09-28 13:25             ` Marc A. Criley
  1 sibling, 0 replies; 71+ messages in thread
From: Marc A. Criley @ 2005-09-28 13:25 UTC (permalink / raw)


David Trudgett wrote:

> I get it as a string from the adasockets (0.1.6) library Get_Line
> function. Perhaps this is an old version, I don't know. I noticed a
> couple of people in this group have contributed to it (such as Pascal
> Obry), so they might be able to say.

Good grief, yes that's an OLD version!!  The current stable version of 
AdaSockets at http://www.rfc1149.net/devel/adasockets is 1.8.4.7.

-- Marc A. Criley
-- McKae Technologies
-- www.mckae.com
-- DTraq - XPath In Ada - XML EZ Out




* Re: String filtering
  2005-09-28  0:06                 ` David Trudgett
  2005-09-28  8:15                   ` Dmitry A. Kazakov
  2005-09-28  9:08                   ` Jacob Sparre Andersen
@ 2005-09-28 18:21                   ` Jeffrey R. Carter
  2005-09-28 21:00                   ` Simon Wright
  3 siblings, 0 replies; 71+ messages in thread
From: Jeffrey R. Carter @ 2005-09-28 18:21 UTC (permalink / raw)


David Trudgett wrote:

>          New_Str := To_Unbounded_String(Dest_Size);

Why not do this in the declaration of New_Str, and avoid the double 
initialization? In the rare (I presume) case where you will return a null 
string, this will initialize New_Str with a null string [Length (New_Str) = 0], 
and in all other cases, it avoids the double initialization. Not a big 
efficiency concern, but an aesthetic one.
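
A sketch of what that might look like, assuming the rest of the function
stays as posted. Decl_Init_Demo, Keep_Set, Strip and Last are names
chosen for the example, and Keep_Set is built from the predefined
Alphanumeric_Set rather than from the ranges used earlier in the thread.

```ada
with Ada.Strings.Maps;           use Ada.Strings.Maps;
with Ada.Strings.Maps.Constants;
with Ada.Strings.Unbounded;      use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Decl_Init_Demo is

   --  Letters, digits and the space character.
   Keep_Set : constant Character_Set :=
     Ada.Strings.Maps.Constants.Alphanumeric_Set or To_Set (' ');

   function Strip (Str : Unbounded_String) return Unbounded_String is
      Dest_Size : constant Natural := Count (Str, Keep_Set);
      --  Sized once, in the declaration; a zero Dest_Size simply
      --  gives a string of length 0.
      New_Str   : Unbounded_String := To_Unbounded_String (Dest_Size);
      Last      : Natural := 0;   --  index of the last character written
   begin
      for I in 1 .. Length (Str) loop
         if Is_In (Element (Str, I), Keep_Set) then
            Last := Last + 1;
            Replace_Element (New_Str, Last, Element (Str, I));
         end if;
      end loop;
      return New_Str;
   end Strip;

begin
   Ada.Text_IO.Put_Line
     (To_String (Strip (To_Unbounded_String ("a-b c!1"))));
end Decl_Init_Demo;
```

Note that this drops the early exit for an all-rejected source string;
whether that matters depends on how often the case arises.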

-- 
Jeff Carter
"You tiny-brained wipers of other people's bottoms!"
Monty Python & the Holy Grail
18




* Re: String filtering
  2005-09-28  1:29       ` David Trudgett
@ 2005-09-28 18:32         ` Jeffrey R. Carter
  0 siblings, 0 replies; 71+ messages in thread
From: Jeffrey R. Carter @ 2005-09-28 18:32 UTC (permalink / raw)


David Trudgett wrote:

> Yes, it does seem odd. As it happens, the code is easier and shorter
> that way, but <<New_Str := To_Unbounded_String("");>> might free
> storage in some implementations, so I created a new version using
> Replace_Element instead.

Well, no (if I interpret your use of "that way" the same way you meant it). It's 
shorter and easier to use the default initial value of null string, and then 
Append to it. Initializing it to an uninitialized string of the known result 
length, then replacing that by a null string, and then appending, is longer, 
harder, and makes the reader stop and say "Huh?"
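
For comparison, a sketch of the shorter form: default initial value, then
Append. Append_Demo and Strip are names made up for the example, and the
predefined Alphanumeric_Set stands in for the set used earlier in the
thread.

```ada
with Ada.Strings.Maps;           use Ada.Strings.Maps;
with Ada.Strings.Maps.Constants; use Ada.Strings.Maps.Constants;
with Ada.Strings.Unbounded;      use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Append_Demo is

   function Strip (Str : Unbounded_String) return Unbounded_String is
      Result : Unbounded_String;   --  starts out as the null string
   begin
      for I in 1 .. Length (Str) loop
         if Is_In (Element (Str, I), Alphanumeric_Set) then
            Append (Result, Element (Str, I));
         end if;
      end loop;
      return Result;
   end Strip;

begin
   Ada.Text_IO.Put_Line
     (To_String (Strip (To_Unbounded_String ("a-b_c!1"))));
end Append_Demo;
```

How the storage grows during the Append calls is up to the
implementation, which is the trade-off against preallocating.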

Your version using Replace_Element is more along the lines that I was suggesting.

> Repeated deletes would cause a lot of copying (though not a problem
> for my particular purposes at present).

Probably (a version based on a linked list of Characters would be optimized for 
Delete, but less than optimal for other operations :). The question is which 
version is clearest for your system. If the clean up in place is the clearest, 
then you have the questions of what the timing requirements are for your system, 
and whether using Delete causes the system to fail to meet those requirements.

> Now, go away or I shall taunt you a second time! ;-)

Taunt away! I'm not planning on going anywhere, obnoxious French chevalier!

-- 
Jeff Carter
"You tiny-brained wipers of other people's bottoms!"
Monty Python & the Holy Grail
18




* Re: String filtering
  2005-09-28 10:39                     ` David Trudgett
@ 2005-09-28 20:55                       ` Simon Wright
  2005-09-28 21:53                         ` Martin Dowie
  0 siblings, 1 reply; 71+ messages in thread
From: Simon Wright @ 2005-09-28 20:55 UTC (permalink / raw)


David Trudgett <wpower@zeta.org.au.nospamplease> writes:

> Thanks for pointing that out, by the way, because I wasn't sure
> initially. I'm loath, however, to depend upon default
> initialisation, even when it's specified in a standard.

Where the standard specifies initialization (eg for access types, or
here) you probably won't get a warning for 'used before written
to'. But in other cases, if you don't know what to set the variable to
it is better _not_ to initialize it; that way the compiler has at
least a chance to tell you if you haven't set the value before using
it.
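
A small sketch of that style, with made-up names (Init_Demo, Sign,
Result). The variable gets no dummy initial value and is assigned on
every path instead. Whether a missing assignment is actually reported
depends on the compiler and its warning settings.

```ada
with Ada.Text_IO;

procedure Init_Demo is

   function Sign (N : Integer) return Integer is
      Result : Integer;   --  deliberately no dummy initial value
   begin
      if N > 0 then
         Result := 1;
      elsif N < 0 then
         Result := -1;
      else
         --  Remove this branch and a compiler with flow warnings has
         --  a chance to flag the return below. With a dummy initial
         --  value on Result, the mistake would go unnoticed.
         Result := 0;
      end if;
      return Result;
   end Sign;

begin
   Ada.Text_IO.Put_Line (Integer'Image (Sign (-7)));
   Ada.Text_IO.Put_Line (Integer'Image (Sign (0)));
end Init_Demo;
```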




* Re: String filtering
  2005-09-28  0:06                 ` David Trudgett
                                     ` (2 preceding siblings ...)
  2005-09-28 18:21                   ` Jeffrey R. Carter
@ 2005-09-28 21:00                   ` Simon Wright
  3 siblings, 0 replies; 71+ messages in thread
From: Simon Wright @ 2005-09-28 21:00 UTC (permalink / raw)


David Trudgett <wpower@zeta.org.au.nospamplease> writes:

>       Dest_Size : Natural := Count(Str, Alpha_Num_Space_Set);

        Dest_Size : constant Natural := Count(Str, Alpha_Num_Space_Set);

>       New_Str : Unbounded_String := Null_Unbounded_String;
>       Dest_Char : Natural := 0;

It would be more like the uses in the standard IO libraries to call
Dest_Char 'Last'. (I for one thought it would hold a character rather
than an index to a character).




* Re: String filtering
  2005-09-27 14:09         ` Marc A. Criley
  2005-09-28  1:09           ` David Trudgett
@ 2005-09-28 21:09           ` Simon Wright
  1 sibling, 0 replies; 71+ messages in thread
From: Simon Wright @ 2005-09-28 21:09 UTC (permalink / raw)


"Marc A. Criley" <mcNOSPAM@mckae.com> writes:

> Just as an FYI, I wrote an article a few years ago on how to put an
> Ada stream interface onto a socket.  It's at
> http://portal.acm.org/ft_gateway.cfm?id=568950&type=pdf.  And I
> believe Samuel Tardieu enhanced his AdaSockets implementation
> (http://www.rfc1149.net/devel/adasockets) around that time to do the
> same thing.

I don't know when GNAT.Sockets appeared in GNAT (it's been in the
supported versions for some time now, with various degrees of
satisfactoriness). So far as I know it has always had streams.

I know it's in 3.16, not much help to you; it's in FSF 4.0.0 and is
bound to be in 2005 GPL.




* Re: String filtering
  2005-09-28 20:55                       ` Simon Wright
@ 2005-09-28 21:53                         ` Martin Dowie
  0 siblings, 0 replies; 71+ messages in thread
From: Martin Dowie @ 2005-09-28 21:53 UTC (permalink / raw)


Simon Wright wrote:
> David Trudgett <wpower@zeta.org.au.nospamplease> writes:
> 
> 
>>Thanks for pointing that out, by the way, because I wasn't sure
>>initially. I'm loath, however, to depend upon default
>>initialisation, even when it's specified in a standard.
> 
> 
> Where the standard specifies initialization (eg for access types, or
> here) you probably won't get a warning for 'used before written
> to'. But in other cases, if you don't know what to set the variable to
> it is better _not_ to initialize it; that way the compiler has at
> least a chance to tell you if you haven't set the value before using
> it.

...and use "pragma Normalize_Scalars;". :-)

Cheers

-- Martin




* Re: String filtering
  2005-09-28  9:54                     ` David Trudgett
@ 2005-09-29 14:05                       ` Georg Bauhaus
  2005-10-01 19:02                         ` tmoran
  0 siblings, 1 reply; 71+ messages in thread
From: Georg Bauhaus @ 2005-09-29 14:05 UTC (permalink / raw)


David Trudgett wrote:

> 
>>Good.  It makes the String and Unbounded_String versions practically
>>equivalent - probably both in CPU and memory use.
> 
> 
> Much of a muchness, I would guess. Profiling particular applications
> on particular compilers is the only way to tell for sure, though.

Got some figures. As expected, String is always faster than Unbounded_String.
Maybe surprisingly, Vector is somewhat faster than Unbounded_String
in all cases, provided inlining is used. Heap means the String objects
have been allocated using new.
Compiler is GCC 4.1 on GNU/Linux x86.

-O2 -gnatn -gnato:

 1. iteration, 10 chars, 1000000 runs.
Fixed:      2.084710000
Heap:       1.498224000
Unbounded:  7.608056000
Vector:     5.686385000
 2. iteration, 10000 chars, 1000 runs.
Fixed:      0.421747000
Heap:       0.477814000
Unbounded:  0.787875000
Vector:     0.515643000
 3. iteration, 1000000 chars, 10 runs.
Fixed:      0.560290000
Heap:       0.622039000
Unbounded:  1.137758000
Vector:     0.917281000

-O2 -gnato:

 1. iteration, 10 chars, 1000000 runs.
Fixed:      1.730108000
Heap:       1.604875000
Unbounded:  7.659804000
Vector:     6.483596000
 2. iteration, 10000 chars, 1000 runs.
Fixed:      0.510872000
Heap:       0.566339000
Unbounded:  0.872703000
Vector:     1.044757000
 3. iteration, 1000000 chars, 10 runs.
Fixed:      0.650525000
Heap:       0.710203000
Unbounded:  1.213516000
Vector:     1.437887000


The Vector function uses Vec_String in place of Unbounded_String,
where subtype Vec_String is Character_Vectors.Vector:


   function Strip_Non_Alphanumeric
     (Str: in Vec_String) return Vec_String
   is
      use Character_Vectors, Ada.Containers;

      Dest_Char: Index_Subtype'Base := 0;
      New_Str: Vec_String;
      Dest_Size: constant Count_Type := Length(Str);

   begin
      if Dest_Size > 0 then
         New_Str := To_Vector(Dest_Size);
         for Src_Char in 1 .. Last_Index(Str) loop
            if Is_In(Element(Str, Src_Char), Alpha_Num_Space_Set) then
               Dest_Char := Dest_Char + 1;
               Replace_Element
                 (New_Str, Dest_Char, Element(Str, Src_Char));
            end if;
         end loop;
      else
         null;
      end if;
      return New_Str;
   end Strip_Non_Alphanumeric;
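
For anyone wanting to repeat the measurement, a minimal sketch of a
timing loop follows. It is an assumption about how such figures could be
collected, not the harness used for the numbers above. Timing_Sketch,
Strip, Runs and Total are made-up names, Strip is a stand-in for the
"Fixed" variant, and the predefined Alphanumeric_Set replaces
Alpha_Num_Space_Set.

```ada
with Ada.Calendar;               use Ada.Calendar;
with Ada.Strings.Maps;           use Ada.Strings.Maps;
with Ada.Strings.Maps.Constants; use Ada.Strings.Maps.Constants;
with Ada.Text_IO;

procedure Timing_Sketch is

   Runs  : constant := 1_000;
   Input : constant String (1 .. 10_000) := (others => 'x');

   --  Stand-in for the "Fixed" variant: copy the kept characters
   --  into a worst-case buffer and return the filled slice.
   function Strip (Str : String) return String is
      Result : String (1 .. Str'Length);
      Last   : Natural := 0;
   begin
      for I in Str'Range loop
         if Is_In (Str (I), Alphanumeric_Set) then
            Last := Last + 1;
            Result (Last) := Str (I);
         end if;
      end loop;
      return Result (1 .. Last);
   end Strip;

   Start   : Time;
   Elapsed : Duration;
   Total   : Natural := 0;   --  uses the results so the calls are not dead code
begin
   Start := Clock;
   for R in 1 .. Runs loop
      declare
         S : constant String := Strip (Input);
      begin
         Total := Total + S'Length;
      end;
   end loop;
   Elapsed := Clock - Start;

   Ada.Text_IO.Put_Line ("Fixed:     " & Duration'Image (Elapsed));
   Ada.Text_IO.Put_Line ("Kept chars:" & Natural'Image (Total));
end Timing_Sketch;
```

Ada.Calendar measures wall-clock time, so results will vary with
machine load; Ada.Real_Time would be an alternative where it is
supported.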




* Re: String filtering
  2005-09-27 13:21         ` Dmitry A. Kazakov
  2005-09-27 13:43           ` Martin Dowie
  2005-09-28  0:51           ` David Trudgett
@ 2005-09-29 22:42           ` Randy Brukardt
  2005-09-30 17:54             ` Robert A Duff
  2 siblings, 1 reply; 71+ messages in thread
From: Randy Brukardt @ 2005-09-29 22:42 UTC (permalink / raw)


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message
news:1b54lwg8s1gk8.1t3jp1cmc2x32$.dlg@40tude.net...
...
> Well, out parameters in functions are much desired by almost anybody,
> except the ARG members. (:-))

You mean about 50% of ARG members. (*This* ARG member is strongly in favor
of such parameters, to the point that I put an AI on the agenda to look at
the issue again (AI-323). But no joy in Mudville, er, Adaland.) Access
parameters are a very incomplete and aggravating substitute.

                     Randy.








* Re: String filtering
  2005-09-29 22:42           ` Randy Brukardt
@ 2005-09-30 17:54             ` Robert A Duff
  2005-10-02  6:57               ` Steve Whalen
  0 siblings, 1 reply; 71+ messages in thread
From: Robert A Duff @ 2005-09-30 17:54 UTC (permalink / raw)


"Randy Brukardt" <randy@rrsoftware.com> writes:

> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message
> news:1b54lwg8s1gk8.1t3jp1cmc2x32$.dlg@40tude.net...
> ...
> > Well, out parameters in functions are much desired by almost anybody,
> > except the ARG members. (:-))
> 
> You mean about 50% of ARG members. (*This* ARG member is strongly in favor
> of such parameters, to the point that I put an AI on the agenda to look at
> the issue again (AI-323). But no joy in Mudville, er, Adaland.) Access
> parameters are a very incomplete and aggravating substitute.

I'm an ARG member, and I agree with Randy here.  I don't know if the
ratio is more or less than 50%.  I have a feeling that more than 50%
would allow 'out' parameters on functions, but the minority who disagree
feel much more strongly about it.  By the way, Tucker disagrees with
Randy and me on this point.

- Bob




* Re: String filtering
  2005-09-29 14:05                       ` Georg Bauhaus
@ 2005-10-01 19:02                         ` tmoran
  2005-10-02  6:38                           ` David Trudgett
  2005-10-03 10:33                           ` Georg Bauhaus
  0 siblings, 2 replies; 71+ messages in thread
From: tmoran @ 2005-10-01 19:02 UTC (permalink / raw)


I've missed part of this thread.  Was "Fixed" something like:
   function Strip_Non_Alphanumeric (Str: in String) return String is
     Result : String(Str'range);
     Last : Natural := Result'first-1;
   begin
     for i in Str'range loop
       if Is_In(Str(i), Alphanumeric_Set) then
         Last := Last+1;
         Result(Last) := Str(i);
       end if;
     end loop;
     return Result(Result'first .. Last);
   end Strip_Non_Alphanumeric;




* Re: String filtering
  2005-10-01 19:02                         ` tmoran
@ 2005-10-02  6:38                           ` David Trudgett
  2005-10-02 14:11                             ` Martin Dowie
  2005-10-03 10:33                           ` Georg Bauhaus
  1 sibling, 1 reply; 71+ messages in thread
From: David Trudgett @ 2005-10-02  6:38 UTC (permalink / raw)


tmoran@acm.org writes:

> I've missed part of this thread.  Was "Fixed" something like:
>    function Strip_Non_Alphanumeric (Str: in String) return String is
>      Result : String(Str'range);
>      Last : Natural := Result'first-1;
>    begin
>      for i in Str'range loop
>        if Is_In(Str(i), Alphanumeric_Set) then
>          Last := Last+1;
>          Result(Last) := Str(i);
>        end if;
>      end loop;
>      return Result(Result'first .. Last);
>    end Strip_Non_Alphanumeric;

Your version looks very neat and concise. It looks correct, though I
haven't actually compiled and tested it. The version I actually have
now is:

   function Strip_Non_Alphanumeric
     (Str : in String) return String
   is
      New_Str :
        String(1 .. Ada.Strings.Fixed.Count(Str, Alpha_Num_Space_Set));
      Dest_Char : Natural := 0;
   begin
      if New_Str'Last > 0 then
         for Src_Char in Str'Range loop
            if Is_In(Str(Src_Char), Alpha_Num_Space_Set) then
               Dest_Char := Dest_Char + 1;
               New_Str(Dest_Char) := Str(Src_Char);
            end if;
         end loop;
      else
         New_Str := "";
      end if;
      return New_Str;
   end Strip_Non_Alphanumeric;

Your version is probably better (lower complexity, greater efficiency).


David

-- 

David Trudgett
http://www.zeta.org.au/~wpower/

The governor delivers an address in which he demands submission. The
excited crowd, generally deluded by their leaders, don't understand a
word of what the representative of authority is saying in the pompous
official language, and their excitement continues. Then the governor
announces that if they do not submit and disperse, he will be obliged
to have recourse to force. If the crowd does not disperse even on
this, the governor gives the order to fire over the heads of the
crowd. If the crowd does not even then disperse, the governor gives
the order to fire straight into the crowd; the soldiers fire and the
killed and wounded fall about the street.

    -- Leo Tolstoy, "The Kingdom of God is Within You"






* Re: String filtering
  2005-09-30 17:54             ` Robert A Duff
@ 2005-10-02  6:57               ` Steve Whalen
  2005-10-02 14:14                 ` Martin Dowie
  2005-10-03  1:21                 ` Robert A Duff
  0 siblings, 2 replies; 71+ messages in thread
From: Steve Whalen @ 2005-10-02  6:57 UTC (permalink / raw)


> I'm an ARG member, and I agree with Randy here ...
> I have a feeling that more than 50% would allow 'out' parameters on functions,
> but the minority who disagree feel much more strongly about it.
> By the way, Tucker disagrees with Randy and me on this point.

I'm curious why you and Randy and others want "out" parameters to be
allowed in functions.  Since I have great respect for you and Randy
(and Tucker), I'd like to hear more about why you all want this.

I seem to have been brainwashed into thinking that functions with side
effects of any kind were a "bad thing", especially in a language like
Ada.

Is there a class of problems that would be significantly clearer when
expressed in Ada if "out" parameters were allowed? 

Steve





* Re: String filtering
  2005-10-02  6:38                           ` David Trudgett
@ 2005-10-02 14:11                             ` Martin Dowie
  2005-10-02 22:40                               ` David Trudgett
  0 siblings, 1 reply; 71+ messages in thread
From: Martin Dowie @ 2005-10-02 14:11 UTC (permalink / raw)


David Trudgett wrote:
>       if New_Str'Last > 0 then

Why do you do this test? If New_Str'Range = (1 .. 0) then the loop simply
won't be 'run' and you should still be returning "".

-- Martin




* Re: String filtering
  2005-10-02  6:57               ` Steve Whalen
@ 2005-10-02 14:14                 ` Martin Dowie
  2005-10-03  1:21                 ` Robert A Duff
  1 sibling, 0 replies; 71+ messages in thread
From: Martin Dowie @ 2005-10-02 14:14 UTC (permalink / raw)


Steve Whalen wrote:
>>I'm an ARG member, and I agree with Randy here ...
>>I have a feeling that more than 50% would allow 'out' parameters on functions,
>>but the minority who disagree feel much more strongly about it.
>>By the way, Tucker disagrees with Randy and me on this point.
> 
> 
> I'm curious why you and Randy and others want "out" parameters to be
> allowed in functions.  Since I have great respect for you and Randy
> (and Tucker), I'd like to hear more about why you all want this.
> 
> I seem to have been brainwashed into thinking that functions with side
> effects of any kind were a "bad thing", especially in a language like
> Ada.

I think that's part of the problem - functions are allowed to have side
effects in Ada already, so why not just allow 'in out' and allow side 
effects on parameters?

If functions having side effects were /that/ bad why allow them at all?

-- Martin




* Re: String filtering
  2005-10-02 14:11                             ` Martin Dowie
@ 2005-10-02 22:40                               ` David Trudgett
  2005-10-03  5:56                                 ` Martin Dowie
  0 siblings, 1 reply; 71+ messages in thread
From: David Trudgett @ 2005-10-02 22:40 UTC (permalink / raw)


Martin Dowie <martin.dowie@btopenworld.com> writes:

> David Trudgett wrote:
>>       if New_Str'Last > 0 then
>
> Why do you do this test? If New_Str'Range = (1 .. 0) then the loop simply
> won't be 'run' and you should still be returning "".

Hi Martin,

Where do I say "New_Str'Range" in my loop? :-) Therein lies the answer
to your question.

Cheers,

David




-- 

David Trudgett
http://www.zeta.org.au/~wpower/

Our lives begin to end the day we become silent about things that
matter.

    -- Martin Luther King




* Re: String filtering
  2005-10-02  6:57               ` Steve Whalen
  2005-10-02 14:14                 ` Martin Dowie
@ 2005-10-03  1:21                 ` Robert A Duff
  2005-10-03  7:44                   ` Jacob Sparre Andersen
                                     ` (2 more replies)
  1 sibling, 3 replies; 71+ messages in thread
From: Robert A Duff @ 2005-10-03  1:21 UTC (permalink / raw)


"Steve Whalen" <SteveWhalen001@hotmail.com> writes:

> > I'm an ARG member, and I agree with Randy here ...
> > I have a feeling that more than 50% would allow 'out' parameters on functions,
> > but the minority who disagree feel much more strongly about it.
> > By the way, Tucker disagrees with Randy and me on this point.
> 
> I'm curious why you and Randy and others want "out" parameters to be
> allowed in functions.  Since I have great respect for you and Randy
> (and Tucker), I'd like to hear more about why you all want this.

Well, thanks for saying so.  :-)

> I seem to have been brainwashed into thinking that functions with side
> effects of any kind were a "bad thing", especially in a language like
> Ada.

I think that side effects in functions are usually a bad idea.
But not always.

I think the programmer, not the language designer, should make such
decisions.  I like restrictive rules that prevent me (as a programmer)
from doing bad things by accident.  But I don't like restrictive rules
that try to prevent me from doing things I deliberately choose to do.

Ada allows side effects in functions.  Functions can write upon global
variables.  And you can pass pointers to functions, and they can write
upon the referenced data.  And you can pass objects (perhaps of a
private type) containing pointers, and do the same thing.  A limited
private object can contain a pointer to itself (the "Rosen trick").
These side effects are hidden, and therefore (often) worse than an 'in
out' parameter.  That is, it makes no sense to me to tell programmers,
"You can have side effects, but you're not allowed to make it clear in
the code."  If side effects on parameters are evil, side effects on
globals are worse.

Some abstractions have side effects on the implementation level, which
are not visible to clients.  For example, consider something like Lisp
"symbols", implemented in Ada:

    function Intern(Table: in out Symbol_Table; -- illegal!
                    X: String) return Symbol;

Symbol is represented as an index into some table.
The Intern function looks up X in a hash table,
and returns the value found, if it's there.
If it's not there, it adds X to the table.
Thus, Intern(Table, "hello") will always return the same value as
Intern(Table, "hello") called later.  The "side effect" of adding
X to the table the first time "hello" is interned is not really
a side effect from the client's point of view.

If Table is global, it works fine.  But passing it as a parameter as
above is illegal.  I don't want language designers forcing such
choices.

We can solve this in Ada by adding various levels of indirection,
which is rather a pain.  For example, the "Rosen trick".
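
For readers who have not met it, a minimal sketch of the "Rosen trick" follows. The package Counters and every name in it are made up for illustration; the only point is that a limited object can carry an access discriminant designating itself, so a function taking it in mode 'in' can still update it.

```ada
with Ada.Text_IO;

procedure Rosen_Demo is

   package Counters is
      type Counter is limited private;
      --  Looks side-effect free from the profile, but is not.
      function Next (C : Counter) return Natural;
   private
      --  Helper whose access discriminant will designate the enclosing object.
      type Self_Ref (Self : access Counter) is limited null record;
      type Counter is limited record
         Ref   : Self_Ref (Counter'Access);  --  the self-reference
         Value : Natural := 0;
      end record;
   end Counters;

   package body Counters is
      function Next (C : Counter) return Natural is
      begin
         --  C is a constant view, but C.Ref.Self designates a variable view.
         C.Ref.Self.Value := C.Ref.Self.Value + 1;
         return C.Ref.Self.Value;
      end Next;
   end Counters;

   C : Counters.Counter;
begin
   Ada.Text_IO.Put_Line (Natural'Image (Counters.Next (C)));  --  first call
   Ada.Text_IO.Put_Line (Natural'Image (Counters.Next (C)));  --  second call
end Rosen_Demo;
```

Nothing in the declaration of Next tells the caller that C changes, which is the complaint being made about hidden effects.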

Another example of side effects that I think are OK is when you're
getting items from a stream.  E.g. in a recursive-descent parser, you
call a Get_Token routine that returns the next token from the stream.
It has the side effect of consuming one item from the stream.  It seems
reasonable to me for each parsing routine to be a function that returns
a syntax tree, and consumes the input represented by that tree.  But
these side effects are uniform throughout the parser, and fairly easy to
understand.

Initializing variables on their declaration is a Good Thing:

    Tok: Token := Get_Token(Stream);

is better than:

    Tok: Token;
    ...
    Get_Token(Stream, Tok);

because it's easier to understand that Tok is properly initialized,
and because it's more concise.

Constants are a good thing:

    Tok: constant Token := Get_Token(Stream);

because you don't have to read all the code to find out where Tok is
modified.

There are cases where Ada forces use of functions instead of
procedures.  For example, a return type String means something
completely different than an 'out' parameter of type String (perhaps
another bad language design decision).  So if you want to get the next
line from a stream of characters, and you don't know how long it might
be, you want:

    function Get_Line(S: in out Stream) return String; -- illegal!
    ...
    Line: constant String := Get_Line(...);

If you try to turn Get_Line into a procedure, because you hate
side-effecting functions, you will end up unnecessarily using the heap,
or arbitrarily limiting line lengths, both of which are bad.

All of these Get_Token and Get_Line sorts of functions work just fine if
the stream is a global variable.  But global variables are (usually) a
bad idea -- parameters are (usually) better.
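
As a rough illustration of the global-variable version that already works, here is a sketch in which the "stream" is just a string constant plus a cursor. Source, Pos and this Get_Line are hypothetical stand-ins, not anything from a real library.

```ada
with Ada.Text_IO;

procedure Line_Demo is

   --  Hypothetical input; the "stream" is a global string plus a cursor.
   Source : constant String := "short" & ASCII.LF & "a rather longer line";
   Pos    : Positive := Source'First;

   --  Returns the next line, whatever its length, and advances Pos.
   function Get_Line return String is
      Start : constant Positive := Pos;
   begin
      while Pos <= Source'Last and then Source (Pos) /= ASCII.LF loop
         Pos := Pos + 1;
      end loop;
      declare
         Line : constant String := Source (Start .. Pos - 1);
      begin
         if Pos <= Source'Last then
            Pos := Pos + 1;  --  step over the line terminator
         end if;
         return Line;
      end;
   end Get_Line;

begin
   declare
      --  Each constant takes its bounds from the returned value:
      --  no heap allocation and no arbitrary maximum line length.
      First  : constant String := Get_Line;
      Second : constant String := Get_Line;
   begin
      Ada.Text_IO.Put_Line (First);
      Ada.Text_IO.Put_Line (Second);
   end;
end Line_Demo;
```

The effect on Pos is invisible at the call site, and making the cursor a parameter instead would need the 'in out' mode that functions do not allow.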

Take a look at Ada's predefined random number generator.  Would you
prefer it to be a procedure?  It's a function, and it requires either
the Rosen trick, or gratuitous heap usage.  It has a side effect, and I
think it would be preferable to make that side effect clear in the code,
by declaring the generator 'in out'.
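
A small sketch of what that looks like to a client of Ada.Numerics.Float_Random. Only the procedure name Random_Demo and the seed value are invented here; Generator, Reset and Random are the predefined ones.

```ada
with Ada.Numerics.Float_Random;
with Ada.Text_IO;

procedure Random_Demo is
   use Ada.Numerics.Float_Random;

   Gen    : Generator;
   First  : Uniformly_Distributed;
   Second : Uniformly_Distributed;
begin
   Reset (Gen, Initiator => 42);  --  fixed seed so runs are repeatable

   --  Random is declared as
   --     function Random (Gen : Generator) return Uniformly_Distributed;
   --  so Gen has mode 'in', yet each call advances the generator's state.
   First  := Random (Gen);
   Second := Random (Gen);

   Ada.Text_IO.Put_Line (Float'Image (First));
   Ada.Text_IO.Put_Line (Float'Image (Second));
end Random_Demo;
```

The two values normally differ even though nothing in the profile says that Gen is modified.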

> Is there a class of problems that would be significantly clearer when
> expressed in Ada if "out" parameters were allowed? 

Both 'out' and 'in out' should be allowed.  If I ran the circus,
I would also require some sort of syntactic indication on the _call_ for
'[in] out' parameters.  For both procedures and functions.

- Bob

P.S. "Side effect" is a pejorative term.  (Some drug cures your disease,
but makes you nauseous.)  If the effect is intended by the programmer,
and clear in the code, perhaps we should call it an "effect", without
the "side".




* Re: String filtering
  2005-10-02 22:40                               ` David Trudgett
@ 2005-10-03  5:56                                 ` Martin Dowie
  0 siblings, 0 replies; 71+ messages in thread
From: Martin Dowie @ 2005-10-03  5:56 UTC (permalink / raw)


David Trudgett wrote:
> Martin Dowie <martin.dowie@btopenworld.com> writes:
> 
> 
>>David Trudgett wrote:
>>
>>>      if New_Str'Last > 0 then
>>
>>Why do you do this test? If New_Str'Range = (1 .. 0) then loop simply
>>won't be 'run' and you should still be returning "".
> 
> 
> Hi Martin,
> 
> Where do I say "New_Str'Range" in my loop? :-) Therein lies the answer
> to your question.

Sorry! I think I saw what I expected not what was there! ;-)

Cheers

-- Martin




* Re: String filtering
  2005-10-03  1:21                 ` Robert A Duff
@ 2005-10-03  7:44                   ` Jacob Sparre Andersen
  2005-10-03  8:56                     ` Dmitry A. Kazakov
                                       ` (2 more replies)
  2005-10-03 10:06                   ` Steve Whalen
  2005-10-03 17:43                   ` tmoran
  2 siblings, 3 replies; 71+ messages in thread
From: Jacob Sparre Andersen @ 2005-10-03  7:44 UTC (permalink / raw)


Robert A Duff wrote:
> Steve Whalen wrote:

>> I seem to have been brainwashed into thinking that functions with
>> side effects of any kind were a "bad thing", especially in a
>> language like Ada.
>
> I think that side effects in functions are usually a bad idea.
> But not always.

My whole understanding of the concept "a function" is that it doesn't
have side effects. - But I'm a mathematician (and physicist), not a
computer scientist.

> I think the programmer, not the language designer, should make such
> decisions.  I like restrictive rules that prevent me (as a
> programmer) from doing bad things by accident.  But I don't like
> restrictive rules that try to prevent me from doing things I
> deliberately choose to do.

The problem is then to introduce a way to make it clear if you want a
function to have side effects or not.  Just coding a function with
side effects isn't the most clear way to do it (although using an "in
out" parameter is pretty close).  Would it be to go too far to have to
specify functions as either:

   function Name (...) return Subtype;

or:

   function Name with side effects (...) Subtype;

I know that new keywords aren't exactly popular, but I hope the basic
idea still is clear.

It might be possible to do something similar with a pragma, but I
would prefer to have it more explicit in the declaration of the
function.

Introducing a clear separation of functions in those which can have
side effects and those which can't have side effects, might also allow
us to tighten the current, somewhat lax definition of functions
without side effects.

> We can solve this in Ada by adding various levels of indirection,
> which is rather a pain.  For example, the "Rosen trick".

Would it be possible to tighten the rules for functions without side
effects to prevent the "Rosen trick"?  Or will that workaround
effectively always be possible?

> Another example of side effects that I think are OK is when you're
> getting items from a stream.

Agreed.  But I wouldn't mind being forced to document that the
function had side effects.

> Both 'out' and 'in out' should be allowed.  If I ran the circus, I
> would also require some sort of syntactic indication on the _call_
> for '[in] out' parameters.  For both procedures and functions.

That might make more sense than just indicating it in the declaration.

> P.S. "Side effect" is a pejorative term.  (Some drug cures your
> disease, but makes you nauseous.)  If the effect is intended by the
> programmer, and clear in the code, perhaps we should call it an
> "effect", without the "side".

"Effect" is not quite clear enough.  "Global effect" might be to
exaggerate a bit.  "Modifying function"?

Greetings,

Jacob
-- 
"Three can keep a secret if two of them are dead."





* Re: String filtering
  2005-10-03  7:44                   ` Jacob Sparre Andersen
@ 2005-10-03  8:56                     ` Dmitry A. Kazakov
  2005-10-03  9:25                       ` Jean-Pierre Rosen
  2005-10-04 17:00                     ` String filtering Robert A Duff
  2005-10-04 19:47                     ` Björn Persson
  2 siblings, 1 reply; 71+ messages in thread
From: Dmitry A. Kazakov @ 2005-10-03  8:56 UTC (permalink / raw)


On Mon, 03 Oct 2005 09:44:10 +0200, Jacob Sparre Andersen wrote:

> Robert A Duff wrote:
>> Steve Whalen wrote:
> 
>>> I seem to have been brainwashed into thinking that functions with
>>> side effects of any kind were a "bad thing", especially in a
>>> language like Ada.
>>
>> I think that side effects in functions are usually a bad idea.
>> But not always.
> 
> My whole understanding of the concept "a function" is that it doesn't
> have side effects.

What is a side effect? Reading a hardware register out is one? Consuming
CPU cycles? Cache manipulation? Heating the mainboard? There is nothing
without side effects. The whole program execution is just one big side
effect. You should specify what you wish to abstract away as unimportant
for the program. If the language allows you to do this in a consistent way,
then there is no reason to care about side effects.

>> I think the programmer, not the language designer, should make such
>> decisions.  I like restrictive rules that prevent me (as a
>> programmer) from doing bad things by accident.  But I don't like
>> restrictive rules that try to prevent me from doing things I
>> deliberately choose to do.
> 
> The problem is then to introduce a way to make it clear if you want a
> function to have side effects or not.

IMO the source of confusion is in mixing three independent concepts:

1. Parameter modes (in, in out, out)
2. Distinguished parameters
3. Purity

> Just coding a function with
> side effects isn't the most clear way to do it (although using an "in
> out" parameter is pretty close).  Would it be to go too far to have to
> specify functions as either:
> 
>    function Name (...) return Subtype;
> 
> or:
> 
>    function Name with side effects (...) Subtype;
> 
> I know that new keywords aren't exactly popular, but I hope the basic
> idea still is clear.

No. There could be pure procedures as well. Clearly it should be the pragma
Pure allowed for subprograms. Purity requirement makes sense as an
implementation detail. I'm not sure if it does as a part of the contract,
which a keyword would imply.

Perhaps, Pure should have a second parameter to specify the purity context,
which might be useful for recursive subprograms and other cases of relative
purity.

>> We can solve this in Ada by adding various levels of indirection,
>> which is rather a pain.  For example, the "Rosen trick".
> 
> Would it be possible to tighten the rules for functions without side
> effects to prevent the "Rosen trick"?  Or will that workaround
> effectively always be possible?

But the Rosen trick applies to objects! You can either:

1. Prohibit passing such objects in in-mode - a very bad idea, IMO.

2. Change the access type to a constant access type if the object is passed
as in. This probably could have some sense. One could use "renames" instead
of "access":

type Rosen is ... record
   Unsafe_Self : access Rosen'Class := Rosen'Access;
      -- This is in-out [new Ada 2005 syntax]
   Safe_Self : Rosen'Class renames Rosen'Class;
      -- This becomes constant in in-mode
   ...
end record;

>> Another example of side effects that I think are OK is when you're
>> getting items from a stream.
> 
> Agreed.  But I wouldn't mind being forced to document that the
> function had side effects.

What for? The contract is - you call it and it does the job. How it does
this is up to implementation. Much more important is how to deal with:

Get_Token (Stream) & Get_Token (Stream)
   -- What would be the result?

It would be nice if the compiler would reject the above when Get_Token were
impure and & did not specify any evaluation order. But that again brings
back the issue of what is the contract and what is an implementation
detail. Clearly a balance needs to be found.
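
To make the hazard concrete, here is a sketch with a hypothetical side-effecting function Next standing in for Get_Token. The language leaves the evaluation order of the two operands of "&" unspecified, so the first Put_Line may legitimately print either ordering; the declare block pins the order down.

```ada
with Ada.Text_IO;

procedure Order_Demo is

   Count : Natural := 0;

   --  Hypothetical stand-in for Get_Token: every call bumps a global counter.
   function Next return String is
   begin
      Count := Count + 1;
      return Natural'Image (Count);
   end Next;

begin
   --  Operand evaluation order is not specified by the language,
   --  so this may print " 1 2" or " 2 1".
   Ada.Text_IO.Put_Line (Next & Next);

   --  Declarations are elaborated in order, so this is deterministic.
   declare
      First  : constant String := Next;
      Second : constant String := Next;
   begin
      Ada.Text_IO.Put_Line (First & Second);
   end;
end Order_Demo;
```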

>> Both 'out' and 'in out' should be allowed.  If I ran the circus, I
>> would also require some sort of syntactic indication on the _call_
>> for '[in] out' parameters.  For both procedures and functions.
> 
> That might make more sense than just indicating it in the declaration.

I see no reason why. Consider changing a constant to variable somewhere in
the program. That could require massive changes throughout the sources.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: String filtering
  2005-10-03  8:56                     ` Dmitry A. Kazakov
@ 2005-10-03  9:25                       ` Jean-Pierre Rosen
  2005-10-03 20:17                         ` Ada Notation Jeffrey R. Carter
  0 siblings, 1 reply; 71+ messages in thread
From: Jean-Pierre Rosen @ 2005-10-03  9:25 UTC (permalink / raw)


Dmitry A. Kazakov wrote:
> Get_Token (Stream) & Get_Token (Stream)
>    -- What would be the result?
> 
> It would be nice if the compiler would reject the above when Get_Token were
> impure and & did not specify any evaluation order. But that again brings
> back the issue of what is the contract and what is an implementation
> detail. Clearly a balance needs to be found.
> 
<shameless plug>
This is one of the things (dependence on evaluation order) that can be 
checked with AdaControl.

Rule: Side_Effect_Parameters
</shameless plug>



-- 
---------------------------------------------------------
            J-P. Rosen (rosen@adalog.fr)
Visit Adalog's web site at http://www.adalog.fr




* Re: String filtering
  2005-10-03  1:21                 ` Robert A Duff
  2005-10-03  7:44                   ` Jacob Sparre Andersen
@ 2005-10-03 10:06                   ` Steve Whalen
  2005-10-03 17:43                   ` tmoran
  2 siblings, 0 replies; 71+ messages in thread
From: Steve Whalen @ 2005-10-03 10:06 UTC (permalink / raw)


Robert A Duff wrote:

> ... Ada allows side effects in functions.   ...

...

[Good examples of idioms where permitting "out" parameters in functions
would aid clarity of Ada code snipped out]

[examples included recursive descent parsers, "get token" or "get line"
type functions, and function parameter initializations]

...

Thanks for the explanation.

You've sold me. I was leaning toward Tucker's position that "out"
parameters had no place in Ada, but now I think they probably should be
allowed (especially since the other kinds of side effects functions are
allowed to have are probably generally worse and have to be found using
other tools or code inspections anyway).

If I'm managing a project and want to see where programmers used "out"
parameters in functions (if they were allowed), I could run an ASIS
based tool to find them and review them to be sure the programmer was
making sensible use of the (proposed) feature (as in your examples) and
not getting "cute".

...

> - Bob
>
> P.S. "Side effect" is a pejorative term.  (Some drug cures your disease,
> but makes you nauseous.)  If the effect is intended by the programmer,
> and clear in the code, perhaps we should call it an "effect", without
> the "side".

You're right. I guess "side effect" as I was using it is really a
contraction of "unintended side effect" which is a bad thing, but as
you point out, not all side effects are bad or unintended.  And since
the language doesn't prevent ALL side effects for functions,
eliminating the "out" parameters adds limitations that really don't add
much counterbalancing safety.

Steve





* Re: String filtering
  2005-10-01 19:02                         ` tmoran
  2005-10-02  6:38                           ` David Trudgett
@ 2005-10-03 10:33                           ` Georg Bauhaus
  1 sibling, 0 replies; 71+ messages in thread
From: Georg Bauhaus @ 2005-10-03 10:33 UTC (permalink / raw)


tmoran@acm.org wrote:
> I've missed part of this thread.

For the comparison, I had taken the functions from
<m38xxi6oob.fsf@rr.trudgett>, Sep. 28.




* Re: String filtering
  2005-10-03  1:21                 ` Robert A Duff
  2005-10-03  7:44                   ` Jacob Sparre Andersen
  2005-10-03 10:06                   ` Steve Whalen
@ 2005-10-03 17:43                   ` tmoran
  2005-10-03 17:59                     ` Robert A Duff
  2 siblings, 1 reply; 71+ messages in thread
From: tmoran @ 2005-10-03 17:43 UTC (permalink / raw)


>   function Intern(Table: in out Symbol_Table; -- illegal!
>                   X: String) return Symbol;
>...
>   Tok: Token := Get_Token(Stream);  [-- illegal]
>...
>   function Get_Line(S: in out Stream) return String; -- illegal!
Isn't that why
    function S'Input(Stream : access Ada.Streams.Root_Stream_Type'Class)
    return T;
takes an access parameter?
Is it very difficult to do the same with Intern or Get_Token or Get_Line?




* Re: String filtering
  2005-10-03 17:43                   ` tmoran
@ 2005-10-03 17:59                     ` Robert A Duff
  2005-10-05 23:04                       ` Randy Brukardt
  0 siblings, 1 reply; 71+ messages in thread
From: Robert A Duff @ 2005-10-03 17:59 UTC (permalink / raw)


tmoran@acm.org writes:

> >   function Intern(Table: in out Symbol_Table; -- illegal!
> >                   X: String) return Symbol;
> >...
> >   Tok: Token := Get_Token(Stream);  [-- illegal]
> >...
> >   function Get_Line(S: in out Stream) return String; -- illegal!
> Isn't that why
>     function S'Input(Stream : access Ada.Streams.Root_Stream_Type'Class)
>     return T;
> takes an access parameter?

Yes.

> Is it very difficult to do the same with Intern or Get_Token or Get_Line?

It's a pain to have to declare things 'aliased' all over the place.
It's a "cry wolf" thing -- 'aliased' should mean I'm making
possibly-permanent pointers to that thing.  But here, we don't want
a pointer (except as a temporary by-reference parameter).

"aliased Blah'Class" means "Warning Will Robinson: this procedure might
save the pointer in a global data structure."  But that's not what
S'Input is doing.

And there's some run-time overhead for access parameters -- they carry
run-time accessibility-level info with them.

Furthermore, there's (annoyingly) no way to declare a formal parameter
aliased.  So you use 'access' where 'in out' should suffice, or you
declare tagged types that have no need for a tag.

Yes, there are workarounds for the lack of [in] out params on functions
-- but they have global consequences on your code.

- Bob




* Ada Notation
  2005-10-03  9:25                       ` Jean-Pierre Rosen
@ 2005-10-03 20:17                         ` Jeffrey R. Carter
  2005-10-03 20:41                           ` Georg Bauhaus
  2005-10-04 15:13                           ` brian.b.mcguinness
  0 siblings, 2 replies; 71+ messages in thread
From: Jeffrey R. Carter @ 2005-10-03 20:17 UTC (permalink / raw)


Jean-Pierre Rosen wrote:

Innocent_Mode : begin

> <shameless plug>
...
> </shameless plug>

I have no idea what these <...> things mean. This is c.l.Ada. Ada uses begin and 
end. Therefore, we should use begin and end. QED.

end Innocent_Mode;

-- 
Jeff Carter
"English bed-wetting types."
Monty Python & the Holy Grail
15




* Re: Ada Notation
  2005-10-03 20:17                         ` Ada Notation Jeffrey R. Carter
@ 2005-10-03 20:41                           ` Georg Bauhaus
  2005-10-05 17:16                             ` Andre
  2005-10-04 15:13                           ` brian.b.mcguinness
  1 sibling, 1 reply; 71+ messages in thread
From: Georg Bauhaus @ 2005-10-03 20:41 UTC (permalink / raw)


Jeffrey R. Carter wrote:

> This is c.l.Ada. Ada uses 
> begin and end. Therefore, we should use begin and end. QED.
 
I beg to differ. Since this is c.l.Ada but not Ada, we should
use some text markup, not Ada brackets around some confusingly
illegal non-Ada text. ;-)





* Re: Ada Notation
  2005-10-03 20:17                         ` Ada Notation Jeffrey R. Carter
  2005-10-03 20:41                           ` Georg Bauhaus
@ 2005-10-04 15:13                           ` brian.b.mcguinness
  1 sibling, 0 replies; 71+ messages in thread
From: brian.b.mcguinness @ 2005-10-04 15:13 UTC (permalink / raw)


The <...> tags are a spoof of the HTML (hypertext markup language) tags
used in web pages.  For example, in a web page the tags <table> and
</table> begin and end a table, <ul> and </ul> begin and end a bullet
list, and so on.  So someone who wants to mark a section of his usenet
posting as a flame might surround it with <flame> and </flame> tags.
This sort of thing is not uncommon in postings these days, so you are
likely to see it again in the future.





* Re: String filtering
  2005-10-03  7:44                   ` Jacob Sparre Andersen
  2005-10-03  8:56                     ` Dmitry A. Kazakov
@ 2005-10-04 17:00                     ` Robert A Duff
  2005-10-05  8:19                       ` Jean-Pierre Rosen
  2005-10-04 19:47                     ` Björn Persson
  2 siblings, 1 reply; 71+ messages in thread
From: Robert A Duff @ 2005-10-04 17:00 UTC (permalink / raw)


Jacob Sparre Andersen <sparre@nbi.dk> writes:

> Robert A Duff wrote:
> > Steve Whalen wrote:
> 
> >> I seem to have been brainwashed into thinking that functions with
> >> side effects of any kind were a "bad thing", especially in a
> >> language like Ada.
> >
> > I think that side effects in functions are usually a bad idea.
> > But not always.
> 
> My whole understanding of the concept "a function" is that it doesn't
> have side effects. - But I'm a mathematician (and physicist), not a
> computer scientist.

Yes, of course.  But what Ada calls a "function" is not (necessarily) a
maths function.  (And what Ada calls "Integer" is not what a
mathematician calls "integer".  Ada is not the first language to misuse
maths terminology.)

Anyway, if you want to calculate (say) the Nth Fibonacci number, you can
say:

    function Fib(N: ...) return ...;

or:

    procedure Fib(N: ...; Result: out ...);

Programmer's choice.  They're just two different notations for doing the
same thing.  And they both calculate a result that is a _function_ (in
the maths sense) of N.
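
Spelled out as a complete program, with the elided parameter types filled in as Natural purely for illustration (an assumption, as is the iterative body):

```ada
with Ada.Text_IO;

procedure Fib_Demo is

   --  Value-returning notation.
   function Fib (N : Natural) return Natural is
      Prev : Natural := 0;  --  Fib (0)
      Curr : Natural := 1;  --  Fib (1)
      Next : Natural;
   begin
      for I in 1 .. N loop
         Next := Prev + Curr;
         Prev := Curr;
         Curr := Next;
      end loop;
      return Prev;
   end Fib;

   --  Same computation, delivered through an 'out' parameter.
   procedure Fib (N : Natural; Result : out Natural) is
   begin
      Result := Fib (N);
   end Fib;

   From_Function  : constant Natural := Fib (10);  --  can be a constant
   From_Procedure : Natural;                       --  must be a variable
begin
   Fib (10, From_Procedure);
   Ada.Text_IO.Put_Line (Natural'Image (From_Function));   --  55
   Ada.Text_IO.Put_Line (Natural'Image (From_Procedure));  --  55
end Fib_Demo;
```

Only the function form lets the result initialize a constant on its declaration.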

If I ran the circus, I would call both kinds of subprograms "procedures"
and use the reserved word "procedure" for both syntaxes.  What Ada calls
"function" I would call "value-returning procedure".

In many cases, there are advantages to using the value-returning
notation.  As I said in my previous note, these advantages _usually_
apply in side-effect-free cases, but not always (IMHO of course).

Now that I think about it: one of my examples of OK side effects was
reading tokens from a stream.  From one point of view, that's
side-effectful -- input is consumed.  But from another point of view,
the sequence of tokens produced is a (maths) function of the input
sequence.

> > I think the programmer, not the language designer, should make such
> > decisions.  I like restrictive rules that prevent me (as a
> > programmer) from doing bad things by accident.  But I don't like
> > restrictive rules that try to prevent me from doing things I
> > deliberately choose to do.
> 
> The problem is then to introduce a way to make it clear if you want a
> function to have side effects or not.  Just coding a function with
> side effects isn't the most clear way to do it (although using an "in
> out" parameter is pretty close).  Would it be to go too far to have to
> specify functions as either:
> 
>    function Name (...) return Subtype;
> 
> or:
> 
>    function Name with side effects (...) Subtype;

Maybe, but that would require massive changes to Ada.
You would need to make global variable access part of the contract
of each subprogram.  SPARK does that, but it has no pointers,
which makes the job a lot simpler.

Also, you would want a way to "cheat".  My example of "Intern" should be
considered a pure function, but internally, it has side effects (which
are invisible to the client).  Memoizing functions are another example.

> > We can solve this in Ada by adding various levels of indirection,
> > which is rather a pain.  For example, the "Rosen trick".
> 
> Would it be possible to tighten the rules for functions without side
> effects to prevent the "Rosen trick"?  Or will that workaround
> effectively always be possible?

Well, Dmitry gave some ideas of how to go about outlawing the Rosen
trick in cases where that's desirable.  Something like that might work.

- Bob

P.S. Ada functions take time to evaluate.  Maths functions do not -- they
just "exist".  There's no way you're going to eliminate _that_ side
effect!  And it matters, at least in real-time systems.




* Re: String filtering
  2005-10-03  7:44                   ` Jacob Sparre Andersen
  2005-10-03  8:56                     ` Dmitry A. Kazakov
  2005-10-04 17:00                     ` String filtering Robert A Duff
@ 2005-10-04 19:47                     ` Björn Persson
  2005-10-05 14:14                       ` Dmitry A. Kazakov
  2 siblings, 1 reply; 71+ messages in thread
From: Björn Persson @ 2005-10-04 19:47 UTC (permalink / raw)


Jacob Sparre Andersen wrote:
>    function Name with side effects (...) Subtype;

Perhaps it's time to invent the funcedure, a function/procedure hybrid 
that has both a return value and out parameters?

funcedure Name(Object : in out Type) return Subtype

-- 
Björn Persson                              PGP key A88682FD
                    omb jor ers @sv ge.
                    r o.b n.p son eri nu




* Re: String filtering
  2005-10-04 17:00                     ` String filtering Robert A Duff
@ 2005-10-05  8:19                       ` Jean-Pierre Rosen
  2005-10-05 11:25                         ` Robert A Duff
  0 siblings, 1 reply; 71+ messages in thread
From: Jean-Pierre Rosen @ 2005-10-05  8:19 UTC (permalink / raw)


Robert A Duff wrote:
> If I ran the circus, I would call both kinds of subprograms "procedures"
> and use the reserved word "procedure" for both syntaxes.  What Ada calls
> "function" I would call "value-returning procedure".
> 
Hmmm... A bit reminiscent of Ada 1980, isn't it?
-- 
---------------------------------------------------------
            J-P. Rosen (rosen@adalog.fr)
Visit Adalog's web site at http://www.adalog.fr




* Re: String filtering
  2005-10-05  8:19                       ` Jean-Pierre Rosen
@ 2005-10-05 11:25                         ` Robert A Duff
  0 siblings, 0 replies; 71+ messages in thread
From: Robert A Duff @ 2005-10-05 11:25 UTC (permalink / raw)


Jean-Pierre Rosen <rosen@adalog.fr> writes:

> Robert A Duff wrote:
> > If I ran the circus, I would call both kinds of subprograms "procedures"
> > and use the reserved word "procedure" for both syntaxes.  What Ada calls
> > "function" I would call "value-returning procedure".
> >
> Hmmm... A bit reminiscent of Ada 1980, isn't it?

Yes, exactly, except that Ada 80 had these in addition to functions;
I would prefer instead-of.

- Bob




* Re: String filtering
  2005-10-04 19:47                     ` Björn Persson
@ 2005-10-05 14:14                       ` Dmitry A. Kazakov
  0 siblings, 0 replies; 71+ messages in thread
From: Dmitry A. Kazakov @ 2005-10-05 14:14 UTC (permalink / raw)


On Tue, 04 Oct 2005 19:47:25 GMT, Björn Persson wrote:

> Jacob Sparre Andersen wrote:
>>    function Name with side effects (...) Subtype;
> 
> Perhaps it's time to invent the funcedure, a function/procedure hybrid 
> that has both a return value and out parameters?
> 
> funcedure Name(Object : in out Type) return Subtype

procedure Name (Object : in out Type) return Subtype;

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Ada Notation
  2005-10-03 20:41                           ` Georg Bauhaus
@ 2005-10-05 17:16                             ` Andre
  2005-10-05 18:23                               ` Ludovic Brenta
  2005-10-05 18:24                               ` Jeffrey R. Carter
  0 siblings, 2 replies; 71+ messages in thread
From: Andre @ 2005-10-05 17:16 UTC (permalink / raw)



Georg Bauhaus wrote:
> Jeffrey R. Carter wrote:
> 
>> This is c.l.Ada. Ada uses begin and end. Therefore, we should use 
>> begin and end. QED.
> 
> 
> I beg to differ. Since this is c.l.Ada but not Ada, we should
> use some text markup, not Ada brackets around some confusingly
> illegal non-Ada text. ;-)
> 

So, Jeffrey will in future use the following notation:

--  <shameless plug>
-- 
--  ...
-- 
--  </shameless plug>
--
--  This is c.l.Ada. Ada uses begin and end. Therefore, we should use
--  begin and end. QED.


Then he can compile the c.l.ada.

Andre




* Re: Ada Notation
  2005-10-05 17:16                             ` Andre
@ 2005-10-05 18:23                               ` Ludovic Brenta
  2005-10-05 18:24                               ` Jeffrey R. Carter
  1 sibling, 0 replies; 71+ messages in thread
From: Ludovic Brenta @ 2005-10-05 18:23 UTC (permalink / raw)


Andre <avsaway@hotmail.com> writes:

> Georg Bauhaus wrote:
>> Jeffrey R. Carter wrote:
>>
>>> This is c.l.Ada. Ada uses begin and end. Therefore, we should use
>>> begin and end. QED.
>> I beg to differ. Since this is c.l.Ada but not Ada, we should
>> use some text markup, not Ada brackets around some confusingly
>> illegal non-Ada text. ;-)
>>
>
> So, Jeffrey will in future use the following notation:
>
> --  <shameless plug>
> -- 
> --  ...
> -- 
> --  </shameless plug>

This is reminiscent of markup intended for consumption by AdaBrowse :)

-- 
Ludovic Brenta.




* Re: Ada Notation
  2005-10-05 17:16                             ` Andre
  2005-10-05 18:23                               ` Ludovic Brenta
@ 2005-10-05 18:24                               ` Jeffrey R. Carter
  1 sibling, 0 replies; 71+ messages in thread
From: Jeffrey R. Carter @ 2005-10-05 18:24 UTC (permalink / raw)


Andre wrote:

> So, Jeffrey will in future use the following notation:
> 
> --  <shameless plug>
>
> Then he can compile the c.l.ada.

No. I will continue to use notation that I know people familiar with Ada know, 
without assuming familiarity with notations from other sources. For example, I 
won't use

{Beginning of my statement of intentions}
...
{End of my statement of intentions}

even though I'm sure everyone on c.l.ada must be familiar with the notation.

-- 
Jeff Carter
"We'll make Rock Ridge think it's a chicken
that got caught in a tractor's nuts!"
Blazing Saddles
87




* Re: String filtering
  2005-10-03 17:59                     ` Robert A Duff
@ 2005-10-05 23:04                       ` Randy Brukardt
  0 siblings, 0 replies; 71+ messages in thread
From: Randy Brukardt @ 2005-10-05 23:04 UTC (permalink / raw)


"Robert A Duff" <bobduff@shell01.TheWorld.com> wrote in message
news:wccu0fya3c3.fsf@shell01.TheWorld.com...
...
> And there's some run-time overhead for access parameters -- they carry
> run-time accessibility-level info with them.

That's not just an overhead issue, but also potentially a safety issue, as
the accessibility checks are done at run-time. Sometimes you need that, but
the result is that the location of the call determines whether or not
Program_Error is raised -- which means that only testing (rather than the
Ada compiler) will ferret out problems. Whereas the same code with an in out
parameter would be illegal (and have no run-time overhead). Much better from
a safety perspective.

To give an example (using procedures so it's legal Ada code):

     type T is tagged ...;
     type Acc_T is access all T;

     procedure P1 (O : in out T) is
        P : Acc_T := O'Access; -- Illegal, accessibility fails.
     begin
        ....
     end P1;

     procedure P2 (O : access T) is
        P : Acc_T := Acc_T (O); -- Might raise Program_Error, depending on
                                -- the point of the call.
     begin
        ...
     end P2;

     Obj : T;
     Obj2 : aliased T; -- Extra aliased needed here, even though it has
                       -- no effect.

     P1 (Obj);
     P2 (Obj2'Access); -- Extra 'Access here, even though the code is
                       -- essentially the same.
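
To make the "depends on the point of the call" part concrete, here is a
minimal self-contained sketch. Every name in it (Access_Check_Demo, T_Ptr,
Saved, Keep, Nested_Call) is made up for illustration, and the record
component is just a placeholder. It assumes the checks are not suppressed.
The same procedure is called twice; by the language rules only the call
that passes a more deeply nested object should fail the run-time
accessibility check.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Access_Check_Demo is

   type T is tagged record
      Value : Integer := 0;   --  placeholder component
   end record;

   --  Named general access type, declared at the level of the main
   --  procedure (hypothetical name).
   type T_Ptr is access all T;

   Saved : T_Ptr;
   Outer : aliased T;         --  same accessibility level as T_Ptr

   procedure Keep (O : access T) is
   begin
      --  The conversion is where the run-time accessibility check
      --  happens: it compares the level passed along with O against
      --  the level of T_Ptr.
      Saved := T_Ptr (O);
   end Keep;

   procedure Nested_Call is
      Inner : aliased T;      --  deeper than T_Ptr
   begin
      Keep (Inner'Access);    --  compiles, but the check in Keep fails
   end Nested_Call;

begin
   Keep (Outer'Access);
   Put_Line ("Outer call: no exception");

   begin
      Nested_Call;
      Put_Line ("Nested call: no exception");
   exception
      when Program_Error =>
         Put_Line ("Nested call: Program_Error");
   end;
end Access_Check_Demo;
```

Nothing at the call in Nested_Call is rejected at compile time, which is
the safety problem: only running that path shows the failure.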

Using access parameters clutters both the objects and the call site, adds
run-time overhead, and in Ada 200Y, might even add null checks (unless you
add even more clutter with "not null" in the parameter declaration).

And what's the difference? For a tagged type, the code generated (and the
evaluation order issues) for "in out" and "access" are essentially
identical, but one is allowed and the other isn't. Makes lots of sense.
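
As a rough sketch of the null-check point, assuming the rules as they ended
up in Ada 2005 (a non-controlling access parameter may be null unless it is
declared "not null"), and needing an Ada 2005 or later compiler mode. The
names Null_Check_Demo, Maybe_Null and Never_Null are hypothetical, and an
untagged record is used only to keep it short:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Null_Check_Demo is

   type T is record
      Value : Integer := 0;
   end record;

   --  Plain access parameter: a null argument is accepted, so the
   --  body has to test for it before dereferencing.
   procedure Maybe_Null (O : access T) is
   begin
      if O = null then
         Put_Line ("Maybe_Null: got null");
      else
         O.Value := O.Value + 1;
      end if;
   end Maybe_Null;

   --  With "not null" the check moves to the call; the body can
   --  dereference O without testing it.
   procedure Never_Null (O : not null access T) is
   begin
      O.Value := O.Value + 1;
   end Never_Null;

   Obj : aliased T;

begin
   Maybe_Null (null);
   Maybe_Null (Obj'Access);
   Never_Null (Obj'Access);
   Put_Line ("Value =" & Integer'Image (Obj.Value));

   begin
      Never_Null (null);   --  a compiler may warn here; fails at run time
   exception
      when Constraint_Error =>
         Put_Line ("Never_Null: Constraint_Error");
   end;
end Null_Check_Demo;
```

With an in out parameter neither variant would be needed, since there is
no null to pass in the first place.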

> Furthermore, there's (annoyingly) no way to declare a formal parameter
> aliased.  So you use 'access' where 'in out' should suffice, or you
> declared tagged types that have no need for a tag.

There are no such types. :-) [OK, I'm kidding, but there certainly aren't
many such types.]

> Yes, there are workarounds for the lack of [in] out params on functions
> -- but they have global consequences on your code.

Yes,  indeed.

                        Randy.







^ permalink raw reply	[flat|nested] 71+ messages in thread

end of thread (last message: 2005-10-05 23:04 UTC)

Thread overview: 71+ messages
2005-09-27  6:27 String filtering David Trudgett
2005-09-27  7:38 ` Jacob Sparre Andersen
2005-09-27  9:13   ` David Trudgett
2005-09-27  9:49     ` Dmitry A. Kazakov
2005-09-27 11:01       ` Martin Dowie
2005-09-27 11:12         ` Martin Dowie
2005-09-27 12:54           ` Dmitry A. Kazakov
2005-09-27 13:42             ` Martin Dowie
2005-09-27 14:24               ` Dmitry A. Kazakov
2005-09-28  0:06                 ` David Trudgett
2005-09-28  8:15                   ` Dmitry A. Kazakov
2005-09-28 10:39                     ` David Trudgett
2005-09-28 20:55                       ` Simon Wright
2005-09-28 21:53                         ` Martin Dowie
2005-09-28  9:08                   ` Jacob Sparre Andersen
2005-09-28  9:54                     ` David Trudgett
2005-09-29 14:05                       ` Georg Bauhaus
2005-10-01 19:02                         ` tmoran
2005-10-02  6:38                           ` David Trudgett
2005-10-02 14:11                             ` Martin Dowie
2005-10-02 22:40                               ` David Trudgett
2005-10-03  5:56                                 ` Martin Dowie
2005-10-03 10:33                           ` Georg Bauhaus
2005-09-28 18:21                   ` Jeffrey R. Carter
2005-09-28 21:00                   ` Simon Wright
2005-09-27 11:22         ` David Trudgett
2005-09-27 11:15       ` David Trudgett
2005-09-27 13:21         ` Dmitry A. Kazakov
2005-09-27 13:43           ` Martin Dowie
2005-09-28  0:51           ` David Trudgett
2005-09-28 12:02             ` Dmitry A. Kazakov
2005-09-28 13:25             ` Marc A. Criley
2005-09-29 22:42           ` Randy Brukardt
2005-09-30 17:54             ` Robert A Duff
2005-10-02  6:57               ` Steve Whalen
2005-10-02 14:14                 ` Martin Dowie
2005-10-03  1:21                 ` Robert A Duff
2005-10-03  7:44                   ` Jacob Sparre Andersen
2005-10-03  8:56                     ` Dmitry A. Kazakov
2005-10-03  9:25                       ` Jean-Pierre Rosen
2005-10-03 20:17                         ` Ada Notation Jeffrey R. Carter
2005-10-03 20:41                           ` Georg Bauhaus
2005-10-05 17:16                             ` Andre
2005-10-05 18:23                               ` Ludovic Brenta
2005-10-05 18:24                               ` Jeffrey R. Carter
2005-10-04 15:13                           ` brian.b.mcguinness
2005-10-04 17:00                     ` String filtering Robert A Duff
2005-10-05  8:19                       ` Jean-Pierre Rosen
2005-10-05 11:25                         ` Robert A Duff
2005-10-04 19:47                     ` Björn Persson
2005-10-05 14:14                       ` Dmitry A. Kazakov
2005-10-03 10:06                   ` Steve Whalen
2005-10-03 17:43                   ` tmoran
2005-10-03 17:59                     ` Robert A Duff
2005-10-05 23:04                       ` Randy Brukardt
2005-09-27 13:52         ` Jacob Sparre Andersen
2005-09-28  1:01           ` David Trudgett
2005-09-28  1:50             ` David Trudgett
2005-09-27 14:08         ` Georg Bauhaus
2005-09-27 14:09         ` Marc A. Criley
2005-09-28  1:09           ` David Trudgett
2005-09-28 21:09           ` Simon Wright
2005-09-27 17:59         ` tmoran
2005-09-28  1:20           ` David Trudgett
2005-09-27 17:47     ` Jeffrey R. Carter
2005-09-28  1:29       ` David Trudgett
2005-09-28 18:32         ` Jeffrey R. Carter
2005-09-27  7:41 ` tmoran
2005-09-27  9:17   ` David Trudgett
2005-09-28  1:54 ` Steve
2005-09-28  2:20   ` David Trudgett
