comp.lang.ada
* character matching
@ 2004-08-13  5:23 John J
  2004-08-13 10:33 ` David C. Hoos
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: John J @ 2004-08-13  5:23 UTC


To develop a program that counts words and sentences, I'm trying to write
some code that will recognise a word as a series of alphanumeric
characters terminated by one or more of a space, comma, full stop, colon,
exclamation mark or question mark. I'm going to use a case statement to
match the condition and then increment a word counter.

I also need to include a case match that will recognise a sentence as a
sequence of words terminated by one or more of full stop, exclamation mark,
question mark or colon. The code also needs to accept that there may be
spaces between the last word of the sentence and the terminating stop,
e.g. "hope this works  !". I'm not sure how to match these conditions and
would greatly appreciate some assistance. My skeleton code is as follows:

case input(Character) is
    when ???????       -- find word
        Word_Count:= Word_Count + 1.0;
    when ??????        -- find sentence
        Sentence_Count:= Sentence_Count + 1.0;
end case;

Thanks for any help






* Re: character matching
From: David C. Hoos @ 2004-08-13 10:33 UTC


Look into the GNAT.Regpat package which does pattern matching
of the sort you describe, and more.
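
As a rough illustration of that suggestion, here is a minimal sketch that counts words and sentences with GNAT.Regpat (a GNAT-specific package, so it assumes the GNAT compiler). The names Regpat_Count and Count_Matches, the sample text and the two patterns are assumptions made for this example, not part of the original question.

```ada
with Ada.Text_IO;
with GNAT.Regpat;

procedure Regpat_Count is
   use GNAT.Regpat;

   --  Count the non-overlapping matches of Pattern in Text.
   function Count_Matches (Pattern, Text : String) return Natural is
      Matcher : constant Pattern_Matcher := Compile (Pattern);
      Matches : Match_Array (0 .. 0);  --  slot 0 holds the whole match
      Start   : Positive := Text'First;
      Count   : Natural  := 0;
   begin
      while Start <= Text'Last loop
         Match (Matcher, Text, Matches, Data_First => Start);
         exit when Matches (0) = No_Match;
         Count := Count + 1;
         Start := Matches (0).Last + 1;  --  resume just after this match
      end loop;
      return Count;
   end Count_Matches;

   --  Assumed definitions: a word is a run of letters or digits; a
   --  sentence starts at a letter or digit and runs up to one or more
   --  terminators, so spaces before the terminator are accepted.
   Word_Pattern     : constant String := "[A-Za-z0-9]+";
   Sentence_Pattern : constant String := "[A-Za-z0-9][^.:!?]*[.:!?]+";

   Sample : constant String := "Help me!!! hope this works  !";
begin
   Ada.Text_IO.Put_Line
     ("Words:" & Natural'Image (Count_Matches (Word_Pattern, Sample)));
   Ada.Text_IO.Put_Line
     ("Sentences:"
      & Natural'Image (Count_Matches (Sentence_Pattern, Sample)));
end Regpat_Count;
```

Because the sentence pattern consumes every consecutive terminator, a run such as "!!!" ends a single sentence. Text left over after the last terminator is not counted as a sentence in this sketch.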

"John J" <g_001@hotmail.com> wrote in message
news:gcYSc.818$Y24.419@cyclops.nntpserver.com...
> To develop a program that counts words and sentences, I'm trying to write
> some code that will recognise a word as a series of alphanumeric
> characters terminated by one or more of a space, comma, full stop, colon,
> exclamation mark or question mark. I'm going to use a case statement to
> match the condition and then increment a word counter.
>
> I also need to include a case match that will recognise a sentence as a
> sequence of words terminated by one or more of full stop, exclamation mark,
> question mark or colon. The code also needs to accept that there may be
> spaces between the last word of the sentence and the terminating stop,
> e.g. "hope this works  !". I'm not sure how to match these conditions and
> would greatly appreciate some assistance. My skeleton code is as follows:
>
> case input(Character) is
>     when ???????       -- find word
>         Word_Count:= Word_Count + 1.0;
>     when ??????        -- find sentence
>         Sentence_Count:= Sentence_Count + 1.0;
> end case;
>
> Thanks for any help





* Re: character matching
From: Nick Roberts @ 2004-08-13 11:12 UTC


On Fri, 13 Aug 2004 14:53:07 +0930, John J <g_001@hotmail.com> wrote:

> To develop a program that counts words and sentences I'm trying to
> write some code that will recognise a word due to it being a series
> of alphanumeric characters terminated by one or more of a space,
> comma, full stop, colon, exclamation mark or question mark. I'm going
> to use a case to match the condition then perform an advance on a
> wordcounter.

Possibly a slightly easier definition of a word would be any
contiguous sequence of characters each of which is a letter, digit,
or hyphen (or apostrophe)?

Is this a teaching exercise (e.g. at university)?

In any event, your problem is not really Ada-related, but one of
'algorithm synthesis'. Frankly I've always had trouble with algorithm
synthesis myself, so you have a lot of sympathy from me.

However, I'm afraid there is only one possible way to create a new
algorithm that works: invent an algorithm in your head (or on paper,
in any notation that suits you); 'dry run' it, in your head or on
paper, using what you think are representative test data; code it up
into Ada; run it on some test data!

Sometimes it helps to write out some test data on paper, and then
just think about how you would intuitively set about counting the
words by hand. This can give you some clues as to how a program
should do it.

A couple of hints: try to keep your algorithm/program as simple as
possible. If it starts getting complex, that may be a sign that you
need to make a big change to your approach; don't forget the 'corner
cases', such as the beginning and end of a line and the file.

A technique I use a lot is to initially write the program so that it
outputs lots of messages at each point detailing what the values of
relevant variables are and so on. You can suffer from 'spewage' (one
trick is to capture output into a file and then view the file), but
I find this idea can be invaluable in showing how a program is
working (or failing).

In the end, there's only one way to learn how to do this stuff, and
that's the hard way, by trial and error.

Assuming it's a formal exercise, you'll impress your professor by
including some (brief) documentation about: what approaches you
tried and rejected; how (you think) the approach you decided to use
works; any unusual details about how to (compile and) use your
program; (very briefly) any limitations you know it has,
improvements you think could be made, other comments.

-- 
HTH, Nick Roberts




* Re: character matching
From: Jeffrey Carter @ 2004-08-13 18:41 UTC


John J wrote:

> To develop a program that counts words and sentences I'm trying to
> write some code that will recognise a word due to it being a series
> of alphanumeric characters terminated by one or more of a space,
> comma, full stop, colon, exclamation mark or question mark. I'm going
> to use a case to match the condition then perform an advance on a
> wordcounter.
> 
> I also need to include a case match that will recognise sentences by
> being a sequence of words terminated by one or more of full stop,
> exclamation mark, question mark or colon. The code also needs to be
> able to accept that there may be spaces between the last word of the
> sentence and the terminating stop, e.g. "hope this works  !". I'm not
> sure how to match these conditions and would greatly appreciate
> some assistance. My skeleton code is as follows:

This certainly sounds like a homework assignment.

A good approach to this is to consider it a state machine with 2 states: 
you're either in a word or not in a word; initially you're not in a 
word. When you're not in a word, characters that terminate a word are 
junk and leave you in the same state; characters that can be a word put 
you into the in-a-word state. When you're in a word, characters that can 
be a word leave you in the same state; characters that terminate a word 
put you into the not-in-a-word state and terminators that terminate a 
sentence can also increment the sentence count. This kind of approach 
can help you consider "Help me!!!" as one sentence rather than 3.
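
The two-state machine described above might be sketched as follows. This is only one possible reading of it: the names State_Machine_Count, Totals, Count and Sentence_Open are invented for the example, a word is counted when it starts, and the Sentence_Open flag is an added assumption so that a terminator separated from the last word by spaces (as in "hope this works  !") still ends the sentence.

```ada
with Ada.Text_IO;
with Ada.Characters.Handling;

procedure State_Machine_Count is

   type State is (Outside_Word, Inside_Word);

   type Totals is record
      Words     : Natural := 0;
      Sentences : Natural := 0;
   end record;

   function Count (Text : String) return Totals is
      Current       : State   := Outside_Word;
      --  True once a word has been seen since the last sentence ended.
      Sentence_Open : Boolean := False;
      Result        : Totals;
      C             : Character;
   begin
      for I in Text'Range loop
         C := Text (I);
         if Ada.Characters.Handling.Is_Alphanumeric (C) then
            --  Entering a word counts it; staying in one changes nothing.
            if Current = Outside_Word then
               Result.Words  := Result.Words + 1;
               Sentence_Open := True;
               Current       := Inside_Word;
            end if;
         else
            --  Any other character terminates the current word, if any.
            Current := Outside_Word;
            case C is
               when '.' | '!' | '?' | ':' =>
                  --  Only the first terminator closes the sentence.
                  if Sentence_Open then
                     Result.Sentences := Result.Sentences + 1;
                     Sentence_Open    := False;
                  end if;
               when others =>
                  null;  --  junk between words
            end case;
         end if;
      end loop;
      return Result;
   end Count;

   Sample : constant Totals := Count ("Help me!!! hope this works  !");
begin
   Ada.Text_IO.Put_Line ("Words:" & Natural'Image (Sample.Words));
   Ada.Text_IO.Put_Line ("Sentences:" & Natural'Image (Sample.Sentences));
end State_Machine_Count;
```

Note that with this definition an apostrophe splits a word, so "don't" counts as two words; widening the test in the first branch would change that.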

-- 
Jeff Carter
"Ada has made you lazy and careless. You can write programs in C that
are just as safe by the simple application of super-human diligence."
E. Robert Tisdale
72





* Re: character matching
From: John J @ 2004-08-15 12:36 UTC


Thanks for the suggestions; however, I'm trying to learn a bit about the
syntax and capabilities of Ada. Would someone be kind enough to give me some
examples of how I can use Ada to match characters, i.e. different ways I can
use '*' and '&' to successfully recognise words and sentences?

Thanks






* Re: character matching
From: Ludovic Brenta @ 2004-08-15 14:52 UTC


"John J" writes:
> Thanks for the suggestions; however, I'm trying to learn a bit about
> the syntax and capabilities of Ada. Would someone be kind enough to
> give me some examples of how I can use Ada to match characters, i.e.
> different ways I can use '*' and '&' to successfully recognise words
> and sentences?
>
> Thanks

type Category is (Whitespace, Punctuation, Letter, Digit, Other);

function Category_Of (C : in Character) return Category is
begin
   case C is
      --  The horizontal tab is ASCII.HT; package ASCII has no TAB.
      when ' ' | ASCII.HT =>                      return Whitespace;
      when ',' | '.' | '!' | ';' | ':' | '?' =>   return Punctuation;
      when 'a' .. 'z' | 'A' .. 'Z' =>             return Letter;
      when '0' .. '9' =>                          return Digit;
      when others =>                              return Other;
   end case;
end Category_Of;

I hope this helps you move forward.  Is this a homework assignment?

(note that in Ada, a "case" statement is required to process all
possible values of the case_expression (here, C); the compiler will
tell you if you forgot some values, unless as above you use "when
others").

-- 
Ludovic Brenta.




* Re: character matching
From: Steve @ 2004-08-15 17:21 UTC


First have a look at the standard library package Ada.Characters.Handling.
You'll find goodies such as:

    function Is_Alphanumeric (Item : in Character) return Boolean;

Then have a look at Ada.Strings.Maps.  There you'll find:

    function Is_In (Element : in Character;
                    Set     : in Character_Set)
       return Boolean;

I always recommend perusing the standard Ada library package specifications
described in Annex A of the Ada 95 reference manual.  You'll find lots of
tools that do the grunt work for you.
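
To make that concrete, here is a small sketch that uses both functions to classify characters. The procedure name Annex_A_Demo, the Describe helper and the two character sets are assumptions chosen for this example; To_Set and the "or" operator on Character_Set come from Ada.Strings.Maps.

```ada
with Ada.Text_IO;
with Ada.Characters.Handling;
with Ada.Strings.Maps;

procedure Annex_A_Demo is
   use Ada.Strings.Maps;

   --  Assumed terminator sets, taken from the original question.
   Sentence_Enders : constant Character_Set := To_Set (".!?:");
   Word_Enders     : constant Character_Set :=
     Sentence_Enders or To_Set (" ,");

   --  Say what role a character plays in word and sentence counting.
   function Describe (C : Character) return String is
   begin
      if Ada.Characters.Handling.Is_Alphanumeric (C) then
         return "word character";
      elsif Is_In (C, Sentence_Enders) then
         return "ends a word and a sentence";
      elsif Is_In (C, Word_Enders) then
         return "ends a word";
      else
         return "something else";
      end if;
   end Describe;

   Sample : constant String := "Hi, you!";
begin
   for I in Sample'Range loop
      Ada.Text_IO.Put_Line
        ("'" & Sample (I) & "' " & Describe (Sample (I)));
   end loop;
end Annex_A_Demo;
```

Keeping the terminators in named sets means the definition of a word or sentence can be changed in one place.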

Steve
(The Duck)


"John J" <g_001@hotmail.com> wrote in message
news:uNITc.3402$BA5.883@hydra.nntpserver.com...
> Thanks for the suggestions; however, I'm trying to learn a bit about the
> syntax and capabilities of ADA. Would someone be kind enough to give me
> some examples of how I can use ADA to character match. ie, different ways
> I can use '*', '&' to successfully recognise words and sentences.
>
> Thanks






* Re: character matching
From: Adrian Knoth @ 2004-08-15 21:02 UTC


Ludovic Brenta <ludovic.brenta@insalien.org> wrote:

> type Category is (Whitespace, Punctuation, Letter, Digit, Other);

I support Steve's suggestion. Yours has the advantage of showing
how simple things can be done. I guess this is useful for learning
purposes but may contain more mistakes than the Annex-A-solution ;)


>       when others =>                              return Other;

> (note that in Ada, a "case" statement is required to process all
> possible values of the case_expression (here, C); the compiler will
> tell you if you forgot some values, unless as above you use "when
> others").

Which is considered BAD: when you change the range of a type, your case
statement still compiles but may misbehave for the new values. Without
the "when others" line, the compiler forces you to adapt your routines
to the new range.

I know you know that, it's just for the original poster (John?).
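
A small sketch of that point, reusing the Category type from the quoted example; the names Case_Coverage and Is_Word_Part are made up for the illustration. Every literal is listed explicitly, so there is no "when others" to hide a missing choice.

```ada
with Ada.Text_IO;

procedure Case_Coverage is

   type Category is (Whitespace, Punctuation, Letter, Digit, Other);

   --  No "when others": each Category literal appears in some choice.
   function Is_Word_Part (Kind : Category) return Boolean is
   begin
      case Kind is
         when Letter | Digit =>
            return True;
         when Whitespace | Punctuation | Other =>
            return False;
      end case;
   end Is_Word_Part;

begin
   for Kind in Category loop
      Ada.Text_IO.Put_Line
        (Category'Image (Kind) & " => "
         & Boolean'Image (Is_Word_Part (Kind)));
   end loop;
end Case_Coverage;
```

If a new literal, say Symbol, were later added to Category, this case statement would no longer compile until a choice for Symbol was written, whereas a "when others" branch would silently absorb it.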


-- 
mail: adi@thur.de  	http://adi.thur.de	PGP: v2-key via keyserver

How come there's still so much month left at the end of the money?




* Re: character matching
From: Ludovic Brenta @ 2004-08-16 22:26 UTC


Adrian Knoth writes:
> Ludovic Brenta wrote:
>
>> type Category is (Whitespace, Punctuation, Letter, Digit, Other);
>
> I support Steve's suggestion. Yours has the advantage of showing how
> simple things can be done. I guess this is useful for learning
> purposes but may contain more mistakes than the Annex-A-solution ;)

Thanks; that was exactly my intention.  Of course I also support
Steve's suggestion.  In fact, my quick-and-dirty example did have
shortcomings; for one thing I did not even attempt to get the list of
punctuation characters complete.

-- 
Ludovic Brenta.




Thread overview: 9+ messages
2004-08-13  5:23 character matching John J
2004-08-13 10:33 ` David C. Hoos
2004-08-13 11:12 ` Nick Roberts
2004-08-13 18:41 ` Jeffrey Carter
2004-08-15 12:36 ` John J
2004-08-15 14:52   ` Ludovic Brenta
2004-08-15 21:02     ` Adrian Knoth
2004-08-16 22:26       ` Ludovic Brenta
2004-08-15 17:21   ` Steve
