* Data_Error and Enumeration_IO
From: John McCormick @ 2012-01-12 22:41 UTC

I need some assistance in understanding the consumption of input characters when a Data_Error is raised during the input of an enumeration value. My experiments show different behavior depending on whether the illegal input consists of alphabetic characters or digits. Here is a short example to illustrate my ignorance.

The following code fragment reads in an enumeration value, using a Data_Error exception handler to catch invalid entries.

   type Choice_Type is (Encode, Decode);
   package Choice_IO is new Ada.Text_IO.Enumeration_IO (Enum => Choice_Type);

   loop
      Ada.Text_IO.Put_Line ("Would you like to encode or decode a message");
      begin
         Choice_IO.Get (Choice);
         exit; -- the data validation loop
      exception
         when Ada.IO_Exceptions.Data_Error =>
            Ada.Text_IO.Put_Line ("Invalid entry. Please enter Encode or Decode.");
            Ada.Text_IO.Skip_Line; -- Skip over the offending data
      end;
   end loop;

This code is my usual pattern for data validation and works for whatever invalid input I enter. However, when my students leave out the call to Skip_Line in the Data_Error exception handler, the behavior depends on the actual value entered. It still works as desired when the input is alphabetic characters such as “abc”. But when the input consists of digits such as “123”, the loop becomes infinite, repeating the prompt and error message. It appears to me that the call to Choice_IO.Get consumes the alphabetic input but does not consume the digit input, leaving it available for the next call to Get.

I could not find anything in the language reference manual about how Enumeration_IO consumes input characters when Data_Error is raised. Can anyone give me an explanation? Is it implementation defined? I am running GNAT GPL 2011.
* Re: Data_Error and Enumeration_IO
From: Jeffrey Carter @ 2012-01-12 23:02 UTC

On 01/12/2012 03:41 PM, John McCormick wrote:
>
> However, when my students leave out the call to Skip_Line in the
> Data_Error exception handler the behavior depends on the actual value
> entered. It still works as desired when the input is alphabetic
> characters such as “abc”. But when the input consists of digits such
> as “123” the loop becomes infinite, repeating the prompt and error
> message. It appears to me that the call to Choice_IO.Get consumes the
> alphabetic input but does not consume the digit input leaving it
> available for the next call to Get. I could not find anything in the
> language reference manual about how Enumeration_IO consumes input
> characters when Data_Error is raised. Can anyone give me an
> explanation? Is it implementation defined? I am running GNAT GPL
> 2011.

ARM A.10.10 says of Get, "After skipping any leading blanks, line terminators, or page terminators, reads an identifier according to the syntax of this lexical element (lower and upper case being considered equivalent), or a character literal according to the syntax of this lexical element (including the apostrophes)."

Ignoring the part about character literals, it reads characters as long as they could be an identifier ("abc", for example; "a123" could also be an identifier). If the identifier so read is not a value of the type, you get Data_Error, but the characters have been skipped in the input. Input that begins with something that could not be the 1st character of an identifier is left to be read again, so "123" gives you the infinite loop.

--
Jeff Carter
"My legs are gray, my ears are gnarled, my eyes are old and bent."
Monty Python's Life of Brian
81

--- Posted via news://freenews.netfront.net/ - Complaints to news@netfront.net ---
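The difference described above can be observed without typing at a terminal. What follows is only a minimal sketch, not anything posted in the thread: the procedure names Consumption_Demo and Try are made up for illustration, and it assumes an Ada 2005 compiler (for the function form of Get_Line) and that Create with no file name yields a temporary file that can be reset to In_File. Each test line is written to the file, read back with Choice_IO.Get, and whatever is still left on the line after a Data_Error is printed.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Consumption_Demo is
   type Choice_Type is (Encode, Decode);
   package Choice_IO is new Enumeration_IO (Enum => Choice_Type);

   --  Hypothetical helper: feed one line of text to Choice_IO.Get and
   --  report what remains on that line if Data_Error is raised.
   procedure Try (Input : String) is
      File   : File_Type;
      Choice : Choice_Type;
   begin
      Create (File, Out_File);     --  unnamed (temporary) file
      Put_Line (File, Input);
      Reset (File, In_File);       --  reread it from the start
      begin
         Choice_IO.Get (File, Choice);
         Put_Line ("""" & Input & """ accepted as "
                   & Choice_Type'Image (Choice));
      exception
         when Data_Error =>
            Put_Line ("""" & Input & """ raised Data_Error; left on line: """
                      & Get_Line (File) & """");
      end;
      Close (File);
   end Try;
begin
   Try ("decode");
   Try ("abc");
   Try ("123");
end Consumption_Demo;
```

If the implementation behaves as reported in this thread, nothing should remain on the line after "abc", while "123" should still be there in full, which is exactly what makes the loop without Skip_Line spin forever.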
* Re: Data_Error and Enumeration_IO
From: Georg Bauhaus @ 2012-01-12 23:28 UTC

On 1/12/12 11:41 PM, John McCormick wrote:
> However, when my students leave out the call to Skip_Line in the
> Data_Error exception handler the behavior depends on the actual value
> entered. It still works as desired when the input is alphabetic
> characters such as “abc”. But when the input consists of digits such
> as “123” the loop becomes infinite, repeating the prompt and error
> message. It appears to me that the call to Choice_IO.Get consumes the
> alphabetic input but does not consume the digit input leaving it
> available for the next call to Get. I could not find anything in the
> language reference manual about how Enumeration_IO consumes input
> characters when Data_Error is raised. Can anyone give me an
> explanation? Is it implementation defined? I am running GNAT GPL
> 2011.

This scenario looks almost exactly like that of subsection 23.3.8, "Invalid Data", in section 23.3, "Text Input and Output", of [1], if my understanding is correct. The subsection explains the infinite loop and also suggests Skip_Line. (These books are smarter than I am, which is most helpful, but OTOH this means I might be wrong in applying the text to your example.)

The difference is that the book's example uses an instance of Float_IO and its example input text is "123.45E*".

"Only those characters conforming to the syntax of the specified type are input. When the call to Get encounters the asterisk, Data_Error is raised since an asterisk does not conform to the syntax of a real number. The asterisk is never input by the call to Get, even though it has been examined by Get. The exception is handled, and "[some message/prompt]" is output. Upon each iteration, Get encounters the same invalid asterisk character, raises Data_Error, and so on ad infinitum.

"(...)

"In general, the Get routines of the four generic packages in Text_Io begin by skipping preceding blanks, tabs, line, and page terminators. A sequence of characters is then read until the syntax of values of the target type is no longer satisfied or a line terminator is encountered. The character sequence read is then checked to see that it is a syntactically legal value and a value of the target subtype. If either check fails, then Data_Error is raised.

"(...)

"A third solution is to use only Get_Line. The string read can then be analysed using the Get subprograms that read from strings."

"(...)

"[Ada83] 14.3.5(1 .. 2, 5 .. 6, 10); 14.3.8(8 .. 11, 17 .. 19); 14.4(8)"

HTH

__
[1] (Author => (1 => "Mendal, Geoffrey O.", 2 => "Bryan, Douglas L."),
     Year => 1992,
     Title => "Exploring Ada",
     Volume => 2,
     Publisher => "Prentice-Hall, Inc.",
     ISBN => 0-13-297227-1)
* Re: Data_Error and Enumeration_IO
From: Randy Brukardt @ 2012-01-13 0:10 UTC

"John McCormick" <mccormick@cs.uni.edu> wrote in message news:3f3d626a-1b8c-49af-aa85-9e586029a817@z12g2000yqm.googlegroups.com...
>I need some assistance in understanding the consumption of input
>characters when a Data_Error is raised during the input of an
>enumeration value. My experiments show that there is a different
>behavior depending on whether the illegal input consists of alphabetic
>characters or digits.

Jeff and Georg already explained what is going on. But I have to admit I'm surprised that you aren't aware of this, since it has been a problem with Ada.Text_IO since the beginning of time (1980 in Ada's case).

My recommendation is always to read the input into a string and then process it there (using the string Gets that the language provides). The reason for this is that it always allows a better error message in the failure case, because you still have the string in hand. That way, you can avoid a generic message that puzzles the user.

[Aside: For some reason, this reminds me of the first C compiler I used, way back at the University of Wisconsin in 1978. It was a PDP-11 compiler for an early version of Unix, and it essentially had two error messages: "lvalue expected" for any compile-time mistake, and "bus error - core dumped" for any run-time mistake. Debugging programs using that compiler was almost completely trial-and-error - you would guess what the error might have been, and try something else to see if it fixed it. The compiler, and the fact that early PC compilers were very much like it, had a lot to do with our creating Janus/Ada a couple of years later. And that is why we always had runtime trace backs and verbose runtime messages from the very beginning...]

Randy.
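The read-a-line-then-parse recommendation could look roughly like the sketch below. It is only an illustration under assumptions: the procedure name Read_Choice and the message wording are invented here, and it relies on the Ada 2005 function form of Get_Line. The line is read whole, then handed to the string version of Choice_IO.Get; because the text is still in hand, the error message can quote exactly what was typed, and nothing stays behind in the input to cause an endless loop.

```ada
with Ada.Text_IO;

procedure Read_Choice is
   type Choice_Type is (Encode, Decode);
   package Choice_IO is new Ada.Text_IO.Enumeration_IO (Enum => Choice_Type);

   Choice : Choice_Type;
begin
   loop
      Ada.Text_IO.Put_Line ("Would you like to encode or decode a message?");
      declare
         --  Consumes the whole line, valid or not.  End_Error raised
         --  here (end of input) propagates out of the loop on purpose.
         Line : constant String := Ada.Text_IO.Get_Line;
         Stop : Positive;  --  index of the last character Get used
      begin
         Choice_IO.Get (From => Line, Item => Choice, Last => Stop);
         exit;  --  the data validation loop
      exception
         --  Data_Error: not a literal of Choice_Type.
         --  End_Error:  the line was empty or all blanks.
         when Ada.Text_IO.Data_Error | Ada.Text_IO.End_Error =>
            Ada.Text_IO.Put_Line
              ("""" & Line & """ is not valid."
               & " Please enter Encode or Decode.");
      end;
   end loop;
   Ada.Text_IO.Put_Line ("You chose " & Choice_Type'Image (Choice));
end Read_Choice;
```

Note that this sketch accepts trailing text after a valid literal (for example "encode now"); a stricter version could compare Stop with Line'Last and reject anything non-blank after it.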
* Re: Data_Error and Enumeration_IO
From: Dmitry A. Kazakov @ 2012-01-13 8:33 UTC

On Thu, 12 Jan 2012 18:10:37 -0600, Randy Brukardt wrote:
> My recommendation is always to read the input into a string and then process
> it there (using the string Gets that the language provides). The reason for
> this is that it always allows a better error message in the failure case, because
> you still have the string in hand. That way, you can avoid a generic message
> that puzzles the user.

Another reason is that you can always go back in the string and re-parse improperly matched parts of it.

> [Aside: For some reason, this reminds me of the first C compiler I used, way
> back at the University of Wisconsin in 1978. It was a PDP-11 compiler for an
> early version of Unix, and it essentially had two error messages: "lvalue
> expected" for any compile-time mistake, and "bus error - core dumped" for
> any run-time mistake. Debugging programs using that compiler was almost
> completely trial-and-error - you would guess what the error might have been,
> and try something else to see if it fixed it. The compiler, and the fact
> that early PC compilers were very much like it, had a lot to do with our
> creating Janus/Ada a couple of years later. And that is why we always had
> runtime trace backs and verbose runtime messages from the very beginning...]

Early Turbo Pascal compilers had only one thing to say: "error in expression."

But it is still sometimes a problem in C++ that the error message is incomprehensible. Then I use the same technique I used 25 years ago: comment everything out until it compiles, and then uncomment line by line, compiling each time.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
* Re: Data_Error and Enumeration_IO
From: John McCormick @ 2012-01-13 21:30 UTC

On Jan 12, 6:10 pm, "Randy Brukardt" <ra...@rrsoftware.com> wrote:
>
> Jeff and Georg already explained what is going on. But I have to admit I'm
> surprised that you aren't aware of this, since it has been a problem with
> Ada.Text_IO since the beginning of time (1980 in Ada's case).

Never be surprised about another's ignorance. I wasn't in the Ada world during the discussions in the 1980s. The only thing in Jeff's and Georg's notes of which I had not known was the reference to the material on Data_Error and real types. From that it seems that the parsing of a potential floating point number uses knowledge of previously entered characters. For example, if you enter "1.2E." it recognizes that the second decimal point is an error. Yet when you enter "abc" in my original example for enumeration IO, it cannot recognize that no valid enumeration literal starts with an a. It has to process the entire identifier before it can see that.

So the crux of my question is why doesn't the consumption of characters for enumeration input behave like that of real input. It doesn't really matter other than to satisfy my curiosity.
* Re: Data_Error and Enumeration_IO
From: Jeffrey Carter @ 2012-01-13 22:00 UTC

On 01/13/2012 02:30 PM, John McCormick wrote:
>
> Never be surprised about another's ignorance. I wasn't in the Ada
> world during the discussions in the 1980s. The only thing in Jeff's
> and Georg's notes of which I had not known was the reference to the
> material on Data_Error and real types. From that it seems that the
> parsing of a potential floating point number uses knowledge of
> previously entered characters. For example, if you enter "1.2E." it
> recognizes that the second decimal point is an error. Yet when you
> enter "abc" in my original example for enumeration IO, it cannot
> recognize that no valid enumeration literal starts with an a. It has
> to process the entire identifier before it can see that. So the crux
> of my question is why doesn't the consumption of characters for
> enumeration input behave like that of real input. It doesn't really
> matter other than to satisfy my curiosity.

The consumption of characters is the same in both cases: they consume characters as long as the characters follow the syntax of a literal for the type class (the class of floating-point types in one case, of enumeration types in the other). For floating-point types, that's anything that's a valid real literal, even if it's outside the range of the type. For an enumeration it's anything that's a valid identifier or character literal. Equivalent examples to "1.2E." for an enumeration type are "'ab" or "a_*".

Note that enumeration input will consume all of "'a'" even if the enumeration type has no character literals.

--
Jeff Carter
"Sir Robin the not-quite-so-brave-as-Sir-Lancelot, who had nearly fought the Dragon of Angnor, who nearly stood up to the vicious Chicken of Bristol, and who had personally wet himself at the Battle of Badon Hill."
Monty Python & the Holy Grail
68
* Re: Data_Error and Enumeration_IO
From: Randy Brukardt @ 2012-01-14 0:09 UTC

"Jeffrey Carter" <spam.jrcarter.not@spam.not.acm.org> wrote in message news:jeq9ib$1ouu$1@adenine.netfront.net...
> On 01/13/2012 02:30 PM, John McCormick wrote:
>>
>> Never be surprised about another's ignorance. I wasn't in the Ada
>> world during the discussions in the 1980s. The only thing in Jeff's
>> and Georg's notes of which I had not known was the reference to the
>> material on Data_Error and real types. From that it seems that the
>> parsing of a potential floating point number uses knowledge of
>> previously entered characters. For example, if you enter "1.2E." it
>> recognizes that the second decimal point is an error. Yet when you
>> enter "abc" in my original example for enumeration IO, it cannot
>> recognize that no valid enumeration literal starts with an a. It has
>> to process the entire identifier before it can see that. So the crux
>> of my question is why doesn't the consumption of characters for
>> enumeration input behave like that of real input. It doesn't really
>> matter other than to satisfy my curiosity.
>
> The consumption of characters is the same in both cases: they consume
> characters as long as the characters follow the syntax of a literal for
> the type class (the class of floating-point types in one case, of
> enumeration types in the other). For floating-point types, that's anything
> that's a valid real literal, even if it's outside the range of the type.
> For an enumeration it's anything that's a valid identifier or character
> literal. Equivalent examples to "1.2E." for an enumeration type are "'ab"
> or "a_*". Note that enumeration input will consume all of "'a'" even if
> the enumeration type has no character literals.

Right. The important point here is that the consumption of characters is not related in any way to the actual subtype being read; it only depends on the syntax of the appropriate literals. When you ask why "abc" is read even if there is no literal that begins with 'a', imagine a similar case for a real number:

   subtype Nines is Float range 9.0 .. 9.9;

If a Get for Nines is given "1.2", this will be completely read even though no legal value of subtype Nines could start with '1'.

As to why it is done this way, it's hard to imagine how else it could be done. To reject the "abc" example at the 'a', for instance, we would have to do a brute force search in a table of potentially hundreds of enumeration literals to see if any start with 'a', then repeat that to see if any start with "ab", and so on. If the literals are long and the number of literals is high, this is going to be an N**2 algorithm -- and I don't think I want my input to do that (there is a built-in denial of service possibility).

Randy.
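The Nines case can be tried out in the same spirit. This is a minimal sketch under assumptions, not code from the thread: the name Nines_Demo and the test text "1.2 tail" are made up, an Ada 2005 compiler is assumed for the function form of Get_Line, and the unnamed file from Create is assumed to be a resettable temporary file. It instantiates Float_IO for the subtype and, if Data_Error is raised, prints what is still left on the line.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Nines_Demo is
   subtype Nines is Float range 9.0 .. 9.9;
   package Nines_IO is new Float_IO (Num => Nines);

   File  : File_Type;
   Value : Nines;
begin
   Create (File, Out_File);        --  unnamed (temporary) file
   Put_Line (File, "1.2 tail");    --  syntactically a real, but not a Nines
   Reset (File, In_File);

   Nines_IO.Get (File, Value);     --  expected to raise Data_Error
   Put_Line ("Read" & Float'Image (Value));
   Close (File);
exception
   when Data_Error =>
      --  Show whatever Get did not consume before the exception.
      Put_Line ("Data_Error; rest of line: """ & Get_Line (File) & """");
      Close (File);
end Nines_Demo;
```

If the description above holds for the implementation at hand, the "1.2" should already be gone when the handler runs, since it was a syntactically valid real literal and only failed the subtype check.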
* Re: Data_Error and Enumeration_IO
From: Dmitry A. Kazakov @ 2012-01-14 9:51 UTC

On Fri, 13 Jan 2012 18:09:48 -0600, Randy Brukardt wrote:
> As to why it is done this way, it's hard to imagine how else it could be
> done. To reject the "abc" example at the 'a', for instance, we would have to
> do a brute force search in a table of potentially hundreds of enumeration
> literals to see if any start with 'a', then repeat that to see if any start
> with "ab", and so on. If the literals are long and the number of literals is
> high, this is going to be an N**2 algorithm -- and I don't think I want my
> input to do that (there is a built-in denial of service possibility).

You have the enumeration type when you are reading its literal. The actual problem is looking ahead or returning an unbounded number of characters back. The advantage of using strings is that you can easily maintain the policy that on any error the cursor is not moved.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
* Re: Data_Error and Enumeration_IO
From: Niklas Holsti @ 2012-01-14 12:25 UTC

On 12-01-14 02:09 , Randy Brukardt wrote:
>> On 01/13/2012 02:30 PM, John McCormick wrote:
>>> ...
>>> Yet when you
>>> enter "abc" in my original example for enumeration IO, it cannot
>>> recognize that no valid enumeration literal starts with an a. It has
>>> to process the entire identifier before it can see that. So the crux
>>> of my question is why doesn't the consumption of characters for
>>> enumeration input behave like that of real input. It doesn't really
>>> matter other than to satisfy my curiosity.
> Right. The important point here is that the consumption of characters is not
> related in any way to the actual subtype being read; it only depends on the
> syntax of the appropriate literals. ...
> As to why it is done this way, it's hard to imagine how else it could be
> done. To reject the "abc" example at the 'a', for instance, we would have to
> do a brute force search in a table of potentially hundreds of enumeration
> literals to see if any start with 'a', then repeat that to see if any start
> with "ab", and so on. If the literals are long and the number of literals is
> high, this is going to be an N**2 algorithm -- and I don't think I want my
> input to do that (there is a built-in denial of service possibility).

While I am satisfied with the way that Ada does enumeration input now, the literals in an enumerated type could be arranged in a data structure (a trie) that would allow a (nearly) constant-time, character-by-character check that the input read so far can be a prefix of a literal of the given type or subtype. The complexity order of input would still be (nearly) linear in the number of characters.

In fact, while the input scanning phase would no doubt be a bit slower, the final conversion from the text string into an enumeration value would need no additional time, so the whole input function might become faster, and the 'Value function might also become faster.

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
      .      @       .
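To make the trie idea a little more tangible, here is a minimal sketch of a prefix check over the literals of an enumeration type. It is an illustration only: nothing here reflects how any Ada runtime actually implements Enumeration_IO, and the names Trie_Demo, Key, Node, Insert, Is_Prefix and Show are all invented for the example. Each node has one child slot per character in the range '0' .. '_', which covers digits, upper-case letters and the underscore; the trie is filled from the 'Image of every literal.

```ada
with Ada.Text_IO;
with Ada.Characters.Handling;

procedure Trie_Demo is
   type Choice_Type is (Encode, Decode);

   --  Digits, upper-case letters and '_' all fall inside this range.
   subtype Key is Character range '0' .. '_';

   type Node;
   type Node_Access is access Node;
   type Child_Array is array (Key) of Node_Access;
   type Node is record
      Children : Child_Array := (others => null);
   end record;

   Root : constant Node_Access := new Node;

   --  Add one literal; 'Image of an identifier literal is upper case.
   procedure Insert (Value : Choice_Type) is
      Name    : constant String := Choice_Type'Image (Value);
      Current : Node_Access := Root;
   begin
      for I in Name'Range loop
         if Current.Children (Name (I)) = null then
            Current.Children (Name (I)) := new Node;
         end if;
         Current := Current.Children (Name (I));
      end loop;
   end Insert;

   --  One array lookup per input character, independent of the
   --  number of literals in the type.
   function Is_Prefix (Text : String) return Boolean is
      Current : Node_Access := Root;
      C       : Character;
   begin
      for I in Text'Range loop
         C := Ada.Characters.Handling.To_Upper (Text (I));
         if C not in Key then
            return False;
         end if;
         Current := Current.Children (C);
         if Current = null then
            return False;
         end if;
      end loop;
      return True;
   end Is_Prefix;

   procedure Show (Text : String) is
   begin
      Ada.Text_IO.Put_Line
        (Text & " -> " & Boolean'Image (Is_Prefix (Text)));
   end Show;
begin
   for V in Choice_Type loop
      Insert (V);
   end loop;
   Show ("a");    --  FALSE: no literal starts with A
   Show ("de");   --  TRUE:  prefix of DECODE
   Show ("dex");  --  FALSE: DE is never followed by X
   Show ("ENC");  --  TRUE:  prefix of ENCODE
end Trie_Demo;
```

A reader built on such a structure could stop at the first character that cannot extend any literal, at the cost of one node array per prefix; whether that trade-off is worthwhile is exactly the question debated above.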