Data_Error and Enumeration_IO

comp.lang.ada

* Data_Error and Enumeration_IO
From: John McCormick @ 2012-01-12 22:41 UTC


I need some assistance in understanding the consumption of input
characters when a Data_Error is raised during the input of an
enumeration value.  My experiments show that there is a different
behavior depending on whether the illegal input consists of alphabetic
characters or digits.

Here is a short example to illustrate my ignorance.  The following
code fragment reads in an enumeration value using a Data_Error
exception to catch invalid entries.


with Ada.Text_IO;
with Ada.IO_Exceptions;

-- Wrapper procedure added so the fragment compiles; the name is illustrative.
procedure Get_Choice is
   type    Choice_Type is (Encode, Decode);
   package Choice_IO   is new Ada.Text_IO.Enumeration_IO (Enum => Choice_Type);
   Choice : Choice_Type;
begin
   loop
      Ada.Text_IO.Put_Line ("Would you like to encode or decode a message");
      begin
         Choice_IO.Get (Choice);
         exit; -- the data validation loop
      exception
         when Ada.IO_Exceptions.Data_Error =>
            Ada.Text_IO.Put_Line ("Invalid entry. Please enter Encode or Decode.");
            Ada.Text_IO.Skip_Line;  -- Skip over the offending data
      end;
   end loop;
end Get_Choice;


This code is my usual pattern for data validation and works for
whatever invalid input I enter.

However, when my students leave out the call to Skip_Line in the
Data_Error exception handler the behavior depends on the actual value
entered.  It still works as desired when the input is alphabetic
characters such as “abc”.  But when the input consists of digits such
as “123” the loop becomes infinite, repeating the prompt and error
message.  It appears to me that the call to Choice_IO.Get consumes the
alphabetic input but does not consume the digit input leaving it
available for the next call to Get.  I could not find anything in the
language reference manual about how Enumeration_IO consumes input
characters when Data_Error is raised.  Can anyone give me an
explanation?  Is it implementation defined?  I am running GNAT GPL
2011.




* Re: Data_Error and Enumeration_IO
From: Jeffrey Carter @ 2012-01-12 23:02 UTC


On 01/12/2012 03:41 PM, John McCormick wrote:
>
> However, when my students leave out the call to Skip_Line in the
> Data_Error exception handler the behavior depends on the actual value
> entered.  It still works as desired when the input is alphabetic
> characters such as “abc”.  But when the input consists of digits such
> as “123” the loop becomes infinite, repeating the prompt and error
> message.  It appears to me that the call to Choice_IO.Get consumes the
> alphabetic input but does not consume the digit input leaving it
> available for the next call to Get.  I could not find anything in the
> language reference manual about how Enumeration_IO consumes input
> characters when Data_Error is raised.  Can anyone give me an
> explanation?  Is it implementation defined?  I am running GNAT GPL
> 2011.

ARM A.10.10 says of Get, "After skipping any leading blanks, line terminators, 
or page terminators, reads an identifier according to the syntax of this lexical 
element (lower and upper case being considered equivalent), or a character 
literal according to the syntax of this lexical element (including the 
apostrophes)."

Ignoring the part about character literals, it reads characters as long as they 
could be an identifier ("abc", for example; "a123" could also be an identifier). 
If the identifier so read is not a value of the type, you get Data_Error, but 
the characters have been skipped in the input. Input that begins with something 
that could not be the 1st character of an identifier is left to be read again, 
so "123" gives you the infinite loop.
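To see the difference in practice, here is a minimal sketch, assuming GNAT and the Choice_Type from the original post; Consume_Demo and the helper Failures are hypothetical names. It writes the test input to a temporary file and calls Choice_IO.Get twice with no Skip_Line in between, counting how many of the calls raise Data_Error.

```ada
with Ada.Text_IO;       use Ada.Text_IO;
with Ada.IO_Exceptions;

procedure Consume_Demo is
   type Choice_Type is (Encode, Decode);
   package Choice_IO is new Enumeration_IO (Enum => Choice_Type);

   --  Hypothetical helper: writes Input to a temporary file, calls
   --  Choice_IO.Get twice WITHOUT Skip_Line in between, and returns how
   --  many of the two calls raised Data_Error.
   function Failures (Input : String) return Natural is
      F      : File_Type;
      Choice : Choice_Type;
      Count  : Natural := 0;
   begin
      Create (F, Out_File);   --  empty name: a temporary file
      Put_Line (F, Input);
      Reset (F, In_File);     --  switch the same file to reading
      for Attempt in 1 .. 2 loop
         begin
            Choice_IO.Get (F, Choice);
         exception
            when Ada.IO_Exceptions.Data_Error =>
               Count := Count + 1;
         end;
      end loop;
      Close (F);
      return Count;
   end Failures;

begin
   --  "abc" is consumed before Data_Error, so the second Get sees Encode.
   Put_Line ("abc Encode:" & Natural'Image (Failures ("abc Encode")));
   --  '1' cannot start an identifier and is left in place, so both fail.
   Put_Line ("123 Encode:" & Natural'Image (Failures ("123 Encode")));
end Consume_Demo;
```

If input is consumed as described above, "abc Encode" gives one failure and "123 Encode" gives two, because the '1' is still waiting for the second Get.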

-- 
Jeff Carter
"My legs are gray, my ears are gnarled, my eyes are old and bent."
Monty Python's Life of Brian
81

--- Posted via news://freenews.netfront.net/ - Complaints to news@netfront.net ---




* Re: Data_Error and Enumeration_IO
From: Georg Bauhaus @ 2012-01-12 23:28 UTC

On 1/12/12 11:41 PM, John McCormick wrote:

> However, when my students leave out the call to Skip_Line in the
> Data_Error exception handler the behavior depends on the actual value
> entered.  It still works as desired when the input is alphabetic
> characters such as “abc”.  But when the input consists of digits such
> as “123” the loop becomes infinite, repeating the prompt and error
> message.  It appears to me that the call to Choice_IO.Get consumes the
> alphabetic input but does not consume the digit input leaving it
> available for the next call to Get.  I could not find anything in the
> language reference manual about how Enumeration_IO consumes input
> characters when Data_Error is raised.  Can anyone give me an
> explanation?  Is it implementation defined?  I am running GNAT GPL
> 2011.

This scenario looks almost exactly like that of subsection 23.3.8,
"Invalid Data", in section 23.3, "Text Input and Output", of [1],
if my understanding is correct. The subsection explains the infinite
loop and also suggests Skip_Line. (These books are smarter than I am,
which is most helpful, but OTOH this means I might be wrong
in applying the text to your example.)  The difference is that
the book's example uses an instance of Float_IO and its example input
text is "123.45E*".

"Only those characters conforming to the syntax of the specified
type are input.  When the call to Get encounters the asterisk,
Data_Error is raised since an asterisk does not conform to the
syntax of a real number.  The asterisk is never input by the call
to Get, even though it has been examined by Get.  The exception
is handled, and "[some message/prompt]" is output.  Upon each iteration,
Get encounters the same invalid asterisk character, raises Data_Error,
and so on ad infinitum.
  "(...)
"In general, the Get routines of the four generic packages in Text_Io
begin by skipping preceding blanks, tabs, line, and page terminators.
A sequence of characters is then read until the syntax of values
of the target type is no longer satisfied or a line terminator is
encountered. The character sequence read is then checked to see
that it is a syntactically legal value and a value of the target subtype.
If either check fails, then Data_Error is raised.
  "(...)
"A third solution is to use only Get_Line. The string read can then
be analysed using the Get subprograms that read from strings."
  "(...)
"[Ada83]  14.3.5(1 .. 2, 5 .. 6, 10); 14.3.8(8 .. 11, 17 .. 19); 14.4(8)"

HTH
__
[1] (Author =>
         (1 => "Mendal, Geoffrey O.",
         2 => "Bryan, Douglas L."),
      Year => 1992,
      Title => "Exploring Ada",
      Volume => 2,
      Publisher => "Prentice-Hall, Inc.",
      ISBN => 0-13-297227-1)


* Re: Data_Error and Enumeration_IO
From: Randy Brukardt @ 2012-01-13  0:10 UTC


"John McCormick" <mccormick@cs.uni.edu> wrote in message 
news:3f3d626a-1b8c-49af-aa85-9e586029a817@z12g2000yqm.googlegroups.com...
>I need some assistance in understanding the consumption of input
>characters when a Data_Error is raised during the input of an
>enumeration value.  My experiments show that there is a different
>behavior depending on whether the illegal input consists of alphabetic
>characters or digits.

Jeff and Georg already explained what is going on. But I have to admit I'm 
surprised that you aren't aware of this, since it has been a problem with 
Ada.Text_IO since the beginning of time (1980 in Ada's case).

My recommendation is always to read the input into a string and then process 
it there (using the string Gets that the language provides). The reason for 
this is that it always gives a better error message in the failure case, because 
you still have the string in hand. That way, you can avoid a generic message 
that puzzles the user.
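A minimal sketch of this recommendation, reusing the Choice_Type from the original post. Read_Choice and Parse_Choice are hypothetical names, and the function form of Get_Line assumes Ada 2005 or later. Because the whole line is read first, a bad entry can never be re-read, and the error message can quote exactly what the user typed.

```ada
with Ada.Text_IO;
with Ada.IO_Exceptions;

procedure Read_Choice is
   type Choice_Type is (Encode, Decode);
   package Choice_IO is new Ada.Text_IO.Enumeration_IO (Enum => Choice_Type);

   --  Hypothetical helper: parses Line as a Choice_Type using the string
   --  form of Get. Valid is False if no literal could be read. Any text
   --  after the literal is ignored in this sketch.
   procedure Parse_Choice
     (Line : String; Choice : out Choice_Type; Valid : out Boolean)
   is
      Last : Positive;  --  index of the last character of the literal
   begin
      Choice_IO.Get (From => Line, Item => Choice, Last => Last);
      Valid := True;
   exception
      --  Data_Error: not a literal of the type; End_Error: blank line.
      when Ada.IO_Exceptions.Data_Error | Ada.IO_Exceptions.End_Error =>
         Choice := Choice_Type'First;  --  give the out parameter a value
         Valid  := False;
   end Parse_Choice;

   Choice : Choice_Type;
   Valid  : Boolean;
begin
   loop
      Ada.Text_IO.Put_Line ("Would you like to encode or decode a message?");
      declare
         --  Get_Line consumes the whole line, so no Skip_Line is needed.
         Line : constant String := Ada.Text_IO.Get_Line;
      begin
         Parse_Choice (Line, Choice, Valid);
         exit when Valid;
         --  The offending text is still in hand, so the message can quote it.
         Ada.Text_IO.Put_Line
           ("""" & Line & """ is not Encode or Decode; please try again.");
      end;
   end loop;
   Ada.Text_IO.Put_Line ("You chose " & Choice_Type'Image (Choice));
end Read_Choice;
```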

[Aside: For some reason, this reminds me of the first C compiler I used, way 
back at the University of Wisconsin in 1978. It was a PDP-11 compiler for an 
early version of Unix, and it essentially had two error messages: "lvalue 
expected" for any compile-time mistake, and "bus error - core dumped" for 
any run-time mistake. Debugging programs using that compiler was almost 
completely trial-and-error - you would guess what the error might have been, 
and try something else to see if it fixed it. The compiler, and the fact 
that early PC compilers were very much like it, had a lot to do with our 
creating Janus/Ada a couple of years later. And that is why we always had 
runtime trace backs and verbose runtime messages from the very beginning...]

                                               Randy.







* Re: Data_Error and Enumeration_IO
From: Dmitry A. Kazakov @ 2012-01-13  8:33 UTC


On Thu, 12 Jan 2012 18:10:37 -0600, Randy Brukardt wrote:

> My recommendation is always to read the input into a string and then process 
> it there (using the string Gets that the language provides). The reason for 
> this is that it always gives a better error message in the failure case, because 
> you still have the string in hand. That way, you can avoid a generic message 
> that puzzles the user.

Another reason is that you can always go back in the string and
re-parse improperly matched parts of it.
 
> [Aside: For some reason, this reminds me of the first C compiler I used, way 
> back at the University of Wisconsin in 1978. It was a PDP-11 compiler for an 
> early version of Unix, and it essentially had two error messages: "lvalue 
> expected" for any compile-time mistake, and "bus error - core dumped" for 
> any run-time mistake. Debugging programs using that compiler were almost 
> completely trial-and-error - you would guess what the error might have been, 
> and try something else to see if it fixed it. The compiler, and the fact 
> that early PC compilers were very much like it, had a lot to do with our 
> creating Janus/Ada a couple of years later. And that is why we always had 
> runtime trace backs and verbose runtime messages from the very beginning...]

Early Turbo Pascal compilers had only one thing to say: "error in expression."

But it is still sometimes a problem in C++ that the error message is
incomprehensible. Then I use the same technique I used 25 years ago:
comment everything out until it compiles and then uncomment line by line
compiling it each time.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Data_Error and Enumeration_IO
From: John McCormick @ 2012-01-13 21:30 UTC

On Jan 12, 6:10 pm, "Randy Brukardt" <ra...@rrsoftware.com> wrote:
>
> Jeff and Georg already explained what is going on. But I have to admit I'm
> surprised that you aren't aware of this, since it has been a problem with
> Ada.Text_IO since the beginning of time (1980 in Ada's case).

Never be surprised about another's ignorance.  I wasn't in the Ada
world during the discussions in the 1980s.  The only thing in Jeff's
and Georg's notes of which I had not known was the reference to the
material on Data_Error and real types.  From that it seems that the
parsing of a potential floating point number uses knowledge of
previously entered characters.  For example, if you enter "1.2E." it
recognizes that the second decimal point is an error.  Yet when you
enter "abc" in my original example for enumeration IO, it cannot
recognize that no valid enumeration literal starts with an a.  It has
to process the entire identifier before it can see that. So the crux
of my question is why doesn't the consumption of characters for
enumeration input behave like that of real input. It doesn't really
matter other than to satisfy my curiosity.


* Re: Data_Error and Enumeration_IO
From: Jeffrey Carter @ 2012-01-13 22:00 UTC


On 01/13/2012 02:30 PM, John McCormick wrote:
>
> Never be surprised about another's ignorance.  I wasn't in the Ada
> world during the discussions in the 1980s.  The only thing in Jeff's
> and Georg's notes of which I had not known was the reference to the
> material on Data_Error and real types.  From that it seems that the
> parsing of a potential floating point number uses knowledge of
> previously entered characters.  For example, if you enter "1.2E." it
> recognizes that the second decimal point is an error.  Yet when you
> enter "abc" in my original example for enumeration IO, it cannot
> recognize that no valid enumeration literal starts with an a.  It has
> to process the entire identifier before it can see that. So the crux
> of my question is why doesn't the consumption of characters for
> enumeration input behave like that of real input. It doesn't really
> matter other than to satisfy my curiosity.

The consumption of characters is the same in both cases: they consume characters 
as long as the characters follow the syntax of a literal for the type class (the 
class of floating-point types in one case, of enumeration types in the other). 
For floating-point types, that's anything that's a valid real literal, even if 
it's outside the range of the type. For an enumeration it's anything that's a 
valid identifier or character literal. Equivalent examples to "1.2E." for an 
enumeration type are "'ab" or "a_*". Note that enumeration input will consume 
all of "'a'" even if the enumeration type has no character literals.
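The identifier part of this rule can be sketched as a function that counts how many leading characters a scanner would consume before stopping. This is a simplified illustration, not any compiler's actual code: the name Identifier_Prefix_Length is hypothetical, and leading blanks and character literals are not handled.

```ada
with Ada.Characters.Handling; use Ada.Characters.Handling;
with Ada.Text_IO;

procedure Prefix_Length_Demo is

   --  Hypothetical helper: how many leading characters of S would be
   --  consumed under the identifier rule (simplified sketch).
   function Identifier_Prefix_Length (S : String) return Natural is
      Len : Natural;
   begin
      --  Nothing is consumed unless the first character is a letter.
      if S'Length = 0 or else not Is_Letter (S (S'First)) then
         return 0;
      end if;
      Len := 1;
      for I in S'First + 1 .. S'Last loop
         --  Stop at the first character that cannot continue an
         --  identifier; a second consecutive underscore also stops it.
         exit when not (Is_Alphanumeric (S (I)) or else S (I) = '_');
         exit when S (I) = '_' and then S (I - 1) = '_';
         Len := Len + 1;
      end loop;
      return Len;
   end Identifier_Prefix_Length;

begin
   Ada.Text_IO.Put_Line (Natural'Image (Identifier_Prefix_Length ("abc")));
   Ada.Text_IO.Put_Line (Natural'Image (Identifier_Prefix_Length ("123")));
   Ada.Text_IO.Put_Line (Natural'Image (Identifier_Prefix_Length ("a_*")));
end Prefix_Length_Demo;
```

For "a_*" the sketch stops after "a_", which is not a legal identifier, so Data_Error follows even though two characters were consumed, matching the "1.2E." case for reals.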

-- 
Jeff Carter
"Sir Robin the not-quite-so-brave-as-Sir-Lancelot,
who had nearly fought the Dragon of Angnor,
who nearly stood up to the vicious Chicken of Bristol,
and who had personally wet himself at the
Battle of Badon Hill."
Monty Python & the Holy Grail
68





* Re: Data_Error and Enumeration_IO
From: Randy Brukardt @ 2012-01-14  0:09 UTC

"Jeffrey Carter" <spam.jrcarter.not@spam.not.acm.org> wrote in message 
news:jeq9ib$1ouu$1@adenine.netfront.net...
> On 01/13/2012 02:30 PM, John McCormick wrote:
>>
>> Never be surprised about another's ignorance.  I wasn't in the Ada
>> world during the discussions in the 1980s.  The only thing in Jeff's
>> and Georg's notes of which I had not known was the reference to the
>> material on Data_Error and real types.  From that it seems that the
>> parsing of a potential floating point number uses knowledge of
>> previously entered characters.  For example, if you enter "1.2E." it
>> recognizes that the second decimal point is an error.  Yet when you
>> enter "abc" in my original example for enumeration IO, it cannot
>> recognize that no valid enumeration literal starts with an a.  It has
>> to process the entire identifier before it can see that. So the crux
>> of my question is why doesn't the consumption of characters for
>> enumeration input behave like that of real input. It doesn't really
>> matter other than to satisfy my curiosity.
>
> The consumption of characters is the same in both cases: they consume 
> characters as long as the characters follow the syntax of a literal for 
> the type class (the class of floating-point types in one case, of 
> enumeration types in the other). For floating-point types, that's anything 
> that's a valid real literal, even if it's outside the range of the type. 
> For an enumeration it's anything that's a valid identifier or character 
> literal. Equivalent examples to "1.2E." for an enumeration type are "'ab" 
> or "a_*". Note that enumeration input will consume all of "'a'" even if 
> the enumeration type has no character literals.

Right. The important point here is that the consumption of characters is not 
related in any way to the actual subtype being read; it only depends on the 
syntax of the appropriate literals.

When you ask why "abc" is read even if there is no literal that begins with 
'a', imagine a similar case for a real number:

     subtype Nines is Float range 9.0 .. 9.9;

If a Get for Nines is given "1.2", this will be completely read even though 
no legal value of subtype Nines could start with '1'.
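The Nines example can be checked the same way as the enumeration case. This sketch (hypothetical names Nines_Demo and Value_After_Error; it assumes a temporary file can be created) instantiates Float_IO with the subtype itself and shows that after Data_Error for "1.2" the whole literal is gone, so a second Get reads the following number.

```ada
with Ada.Text_IO;       use Ada.Text_IO;
with Ada.IO_Exceptions;

procedure Nines_Demo is
   subtype Nines is Float range 9.0 .. 9.9;
   --  Get checks both the real-literal syntax and membership in Nines.
   package Nines_IO is new Float_IO (Num => Nines);

   --  Hypothetical helper: writes Input to a temporary file, calls Get once
   --  (expected to raise Data_Error), then returns the result of a second Get.
   function Value_After_Error (Input : String) return Nines is
      F : File_Type;
      X : Nines;
   begin
      Create (F, Out_File);
      Put_Line (F, Input);
      Reset (F, In_File);
      begin
         --  Reads all of "1.2" (a valid real literal), then finds that
         --  1.2 is not a value of subtype Nines.
         Nines_IO.Get (F, X);
      exception
         when Ada.IO_Exceptions.Data_Error =>
            Put_Line ("Data_Error on the first Get");
      end;
      --  "1.2" has already been consumed, so this reads the next number.
      Nines_IO.Get (F, X);
      Close (F);
      return X;
   end Value_After_Error;

begin
   Put_Line ("Second Get returned" & Float'Image (Value_After_Error ("1.2 9.5")));
end Nines_Demo;
```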

As to why it is done this way, it's hard to imagine how else it could be 
done. To reject the "abc" example at the 'a', for instance, we would have to 
do a brute force search in a table of potentially hundreds of enumeration 
literals to see if any start with 'a', then repeat that to see if any start 
with "ab", and so on. If the literals are long and the number of literals is 
high, this is going to be a N**2 algorithm -- and I don't think I want my 
input to do that (there is a built-in denial of service possibility).
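For comparison, a sketch of the brute-force check described here, using the literal images of the type. Could_Start_Literal is a hypothetical name; a scanner taking this approach would call it after every character read, and each call walks every literal of the type.

```ada
with Ada.Characters.Handling; use Ada.Characters.Handling;
with Ada.Text_IO;

procedure Brute_Force_Demo is
   type Choice_Type is (Encode, Decode);

   --  Hypothetical helper: True if Prefix (case-insensitively) is the
   --  beginning of at least one literal. Linear in the number of literals.
   function Could_Start_Literal (Prefix : String) return Boolean is
      U : constant String := To_Upper (Prefix);
   begin
      for E in Choice_Type loop
         declare
            --  'Image of an enumeration identifier is in upper case.
            Img : constant String := Choice_Type'Image (E);
         begin
            if U'Length <= Img'Length
              and then Img (Img'First .. Img'First + U'Length - 1) = U
            then
               return True;
            end if;
         end;
      end loop;
      return False;
   end Could_Start_Literal;

begin
   Ada.Text_IO.Put_Line (Boolean'Image (Could_Start_Literal ("enc")));
   Ada.Text_IO.Put_Line (Boolean'Image (Could_Start_Literal ("a")));
end Brute_Force_Demo;
```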

                                                     Randy.


* Re: Data_Error and Enumeration_IO
From: Dmitry A. Kazakov @ 2012-01-14  9:51 UTC


On Fri, 13 Jan 2012 18:09:48 -0600, Randy Brukardt wrote:

> As to why it is done this way, it's hard to imagine how else it could be 
> done. To reject the "abc" example at the 'a', for instance, we would have to 
> do a brute force search in a table of potentially hundreds of enumeration 
> literals to see if any start with 'a', then repeat that to see if any start 
> with "ab", and so on. If the literals are long and the number of literals is 
> high, this is going to be a N**2 algorithm -- and I don't think I want my 
> input to do that (there is a built-in denial of service possibility).

You have the enumeration type when you are reading its literal. The actual
problem is looking ahead or returning an unbounded number of characters
back. The advantage of using strings is that you can easily maintain the
policy that on any error the cursor is not moved.
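This policy can be sketched with the string form of Get: the index Pos is advanced only when a literal is read successfully, so on failure the caller can re-parse the same text some other way. Try_Get and Cursor_Demo are hypothetical names.

```ada
with Ada.Text_IO;
with Ada.IO_Exceptions;

procedure Cursor_Demo is
   type Choice_Type is (Encode, Decode);
   package Choice_IO is new Ada.Text_IO.Enumeration_IO (Enum => Choice_Type);

   --  Hypothetical helper: tries to read a Choice_Type starting at
   --  Line (Pos). On success Pos moves just past the literal; on any
   --  error Pos is left unchanged.
   procedure Try_Get
     (Line : String;
      Pos  : in out Positive;
      Item : out Choice_Type;
      OK   : out Boolean)
   is
      Last : Positive;
   begin
      --  A slice keeps the original indices, so Last indexes Line.
      Choice_IO.Get (From => Line (Pos .. Line'Last), Item => Item, Last => Last);
      Pos := Last + 1;
      OK  := True;
   exception
      when Ada.IO_Exceptions.Data_Error | Ada.IO_Exceptions.End_Error =>
         Item := Choice_Type'First;
         OK   := False;
   end Try_Get;

   Line   : constant String := "Decode abc";
   Pos    : Positive := Line'First;
   Choice : Choice_Type;
   OK     : Boolean;
begin
   Try_Get (Line, Pos, Choice, OK);  --  succeeds; Pos moves past "Decode"
   Ada.Text_IO.Put_Line (Boolean'Image (OK) & Positive'Image (Pos));
   Try_Get (Line, Pos, Choice, OK);  --  "abc" fails; Pos is unchanged
   Ada.Text_IO.Put_Line (Boolean'Image (OK) & Positive'Image (Pos));
end Cursor_Demo;
```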

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Data_Error and Enumeration_IO
From: Niklas Holsti @ 2012-01-14 12:25 UTC


On 12-01-14 02:09 , Randy Brukardt wrote:
>> On 01/13/2012 02:30 PM, John McCormick wrote:
>>>
...
>>>  Yet when you
>>> enter "abc" in my original example for enumeration IO, it cannot
>>> recognize that no valid enumeration literal starts with an a.  It has
>>> to process the entire identifier before it can see that. So the crux
>>> of my question is why doesn't the consumption of characters for
>>> enumeration input behave like that of real input. It doesn't really
>>> matter other than to satisfy my curiosity.

> Right. The important point here is that the consumption of characters is not
> related in any way to the actual subtype being read; it only depends on the
> syntax of the appropriate literals.
...
> As to why it is done this way, it's hard to imagine how else it could be
> done. To reject the "abc" example at the 'a', for instance, we would have to
> do a brute force search in a table of potentially hundreds of enumeration
> literals to see if any start with 'a', then repeat that to see if any start
> with "ab", and so on. If the literals are long and the number of literals is
> high, this is going to be a N**2 algorithm -- and I don't think I want my
> input to do that (there is a built-in denial of service possibility).

While I am satisfied with the way that Ada does enumeration input now, 
the literals in an enumerated type could be arranged in a data structure 
(a trie) that would allow a (nearly) constant-time, 
character-by-character check that the input read so far can be a prefix 
of a literal of the given type or subtype. The complexity order of input 
would still be (nearly) linear in the number of characters.

In fact, while the input scanning phase would no doubt be a bit slower, 
the final conversion from the text string into an enumeration value 
would need no additional time, so the whole input function might become 
faster, and the 'Value function might also become faster.
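A minimal sketch of such a trie for the two-literal Choice_Type. All names are hypothetical, and nodes are never freed. Each input character costs one array lookup, and the node reached at the end of the scan already holds the enumeration value, so no separate string-to-value conversion is needed.

```ada
with Ada.Characters.Handling; use Ada.Characters.Handling;
with Ada.Text_IO;

procedure Trie_Demo is
   type Choice_Type is (Encode, Decode);

   --  One trie node per distinct prefix of the literals' upper-case images.
   type Node;
   type Node_Access is access Node;
   type Child_Array is array (Character) of Node_Access;
   type Node is record
      Children : Child_Array := (others => null);
      Is_End   : Boolean     := False;  --  a complete literal ends here
      Value    : Choice_Type := Choice_Type'First;
   end record;

   --  Built once: insert the image of every literal of the type.
   function Build_Trie return Node_Access is
      Top     : constant Node_Access := new Node;
      Current : Node_Access;
   begin
      for E in Choice_Type loop
         declare
            Img : constant String := Choice_Type'Image (E);
         begin
            Current := Top;
            for I in Img'Range loop
               if Current.Children (Img (I)) = null then
                  Current.Children (Img (I)) := new Node;
               end if;
               Current := Current.Children (Img (I));
            end loop;
            Current.Is_End := True;
            Current.Value  := E;
         end;
      end loop;
      return Top;
   end Build_Trie;

   Root : constant Node_Access := Build_Trie;

   --  How many leading characters of Input are still a prefix of some
   --  literal: the point where a scanner could stop and raise Data_Error.
   function Viable_Prefix_Length (Input : String) return Natural is
      Current : Node_Access := Root;
      Count   : Natural     := 0;
   begin
      for I in Input'Range loop
         Current := Current.Children (To_Upper (Input (I)));
         exit when Current = null;
         Count := Count + 1;
      end loop;
      return Count;
   end Viable_Prefix_Length;

   --  Full lookup: the final node carries the value directly.
   procedure Lookup
     (Input : String; Found : out Boolean; Value : out Choice_Type)
   is
      Current : Node_Access := Root;
   begin
      Found := False;
      Value := Choice_Type'First;
      for I in Input'Range loop
         Current := Current.Children (To_Upper (Input (I)));
         if Current = null then
            return;
         end if;
      end loop;
      if Current.Is_End then
         Found := True;
         Value := Current.Value;
      end if;
   end Lookup;

   Found : Boolean;
   Value : Choice_Type;
begin
   Ada.Text_IO.Put_Line (Natural'Image (Viable_Prefix_Length ("abc")));
   Lookup ("decode", Found, Value);
   Ada.Text_IO.Put_Line (Boolean'Image (Found) & " " & Choice_Type'Image (Value));
end Trie_Demo;
```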

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .





Thread overview: 10+ messages
2012-01-12 22:41 Data_Error and Enumeration_IO John McCormick
2012-01-12 23:02 ` Jeffrey Carter
2012-01-12 23:28 ` Georg Bauhaus
2012-01-13  0:10 ` Randy Brukardt
2012-01-13  8:33   ` Dmitry A. Kazakov
2012-01-13 21:30   ` John McCormick
2012-01-13 22:00     ` Jeffrey Carter
2012-01-14  0:09       ` Randy Brukardt
2012-01-14  9:51         ` Dmitry A. Kazakov
2012-01-14 12:25         ` Niklas Holsti
