comp.lang.ada
* GNAT.Regpat problem.
@ 2011-03-22 18:35 Peter C. Chapin
  2011-03-22 19:01 ` Georg Bauhaus
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Peter C. Chapin @ 2011-03-22 18:35 UTC


I'm trying to use regular expressions in a program via the package 
GNAT.Regpat. Here is a reduced example:

with Ada.Text_IO; use Ada.Text_IO;
with GNAT.Regpat;

procedure Check is
    Matcher : GNAT.Regpat.Pattern_Matcher(10);

begin
    GNAT.Regpat.Compile(Matcher, "^[:space:]*xyzzy$");

exception
    when Storage_Error =>
       Put_Line("Insufficient space in Pattern_Matcher");

end Check;

The documentation for package GNAT.Regpat says this about the Compile 
procedure I think I'm using: "This function [sic] raises Storage_Error if 
Matcher is too small to hold the resulting code (i.e. Matcher.Size has too 
small a value)."

In the example above the size is intentionally set too small. My program 
fails with a PROGRAM_ERROR exception (EXCEPTION_ACCESS_VIOLATION). If I make 
the size of the Matcher just 1, the program hangs. If I make the size 
something reasonable, the program runs fine.

So is this an issue with GNAT.Regpat? It seems like it does not honor its 
documentation. Perhaps I'm doing something wrong.

Peter





* Re: GNAT.Regpat problem.
  2011-03-22 18:35 GNAT.Regpat problem Peter C. Chapin
@ 2011-03-22 19:01 ` Georg Bauhaus
  2011-03-22 19:17 ` Florian Weimer
  2011-03-22 19:21 ` Adam Beneschan
  2 siblings, 0 replies; 14+ messages in thread
From: Georg Bauhaus @ 2011-03-22 19:01 UTC


On 3/22/11 7:35 PM, Peter C. Chapin wrote:
> with Ada.Text_IO; use Ada.Text_IO;
> with GNAT.Regpat;
>
> procedure Check is
>     Matcher : GNAT.Regpat.Pattern_Matcher(10);
>
> begin
>     GNAT.Regpat.Compile(Matcher, "^[:space:]*xyzzy$");
>
> exception
>     when Storage_Error =>
>        Put_Line("Insufficient space in Pattern_Matcher");
>
> end Check;

Does -fstack-check (-fno-stack-check) have an effect?

FWIW, I get

$ ./check
raised SYSTEM.REGPAT.EXPRESSION_ERROR : Pattern_Matcher is too small

with GNAT GPL 2010 on Mac OS X.  This agrees with the source of
Compile in s-regpat.adb, which has

       if Size > Matcher.Size then
          raise Expression_Error with "Pattern_Matcher is too small";
       end if;

Looks like a documentation bug in any case.
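
A minimal sketch of how a caller could test for this, assuming a GNAT
release whose Compile performs the size check quoted above (older
releases may overrun the matcher instead). Fits is a hypothetical helper
name, not part of GNAT.Regpat. Expression_Error is also raised for a
malformed expression, so a False result can mean either problem.

```ada
with GNAT.Regpat;

--  Hypothetical helper: True if Expression compiles into a matcher of
--  the given size. Assumes Compile raises Expression_Error when the
--  matcher is too small, as in the s-regpat.adb excerpt above.
function Fits
  (Expression : String;
   Size       : GNAT.Regpat.Program_Size) return Boolean
is
   Matcher : GNAT.Regpat.Pattern_Matcher (Size);
begin
   GNAT.Regpat.Compile (Matcher, Expression);
   return True;
exception
   when GNAT.Regpat.Expression_Error =>
      --  Too small, or the expression itself is invalid.
      return False;
end Fits;
```

With the original test case, Fits ("^[:space:]*xyzzy$", 10) would be
expected to return False on such a release.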





* Re: GNAT.Regpat problem.
  2011-03-22 18:35 GNAT.Regpat problem Peter C. Chapin
  2011-03-22 19:01 ` Georg Bauhaus
@ 2011-03-22 19:17 ` Florian Weimer
  2011-03-22 19:21 ` Adam Beneschan
  2 siblings, 0 replies; 14+ messages in thread
From: Florian Weimer @ 2011-03-22 19:17 UTC


* Peter C. Chapin:

> So is this an issue with GNAT.Regpat?

There used to be a buffer overflow (yes, the same thing that plagues C
programs) in GNAT.Regpat.  Which GNAT version do you use?




* Re: GNAT.Regpat problem.
  2011-03-22 18:35 GNAT.Regpat problem Peter C. Chapin
  2011-03-22 19:01 ` Georg Bauhaus
  2011-03-22 19:17 ` Florian Weimer
@ 2011-03-22 19:21 ` Adam Beneschan
  2011-03-22 20:31   ` Simon Wright
  2 siblings, 1 reply; 14+ messages in thread
From: Adam Beneschan @ 2011-03-22 19:21 UTC


On Mar 22, 11:35 am, "Peter C. Chapin" <PCha...@vtc.vsc.edu> wrote:
> I'm trying to use regular expressions in a program via the package
> GNAT.Regpat. Here is a reduced example:
>
> with Ada.Text_IO; use Ada.Text_IO;
> with GNAT.Regpat;
>
> procedure Check is
>     Matcher : GNAT.Regpat.Pattern_Matcher(10);
>
> begin
>     GNAT.Regpat.Compile(Matcher, "^[:space:]*xyzzy$");
>
> exception
>     when Storage_Error =>
>        Put_Line("Insufficient space in Pattern_Matcher");
>
> end Check;
>
> The documentation for package GNAT.Regpat says this about the Compile
> procedure I think I'm using: "This function [sic] raises Storage_Error if
> Matcher is too small to hold the resulting code (i.e. Matcher.Size has too
> small a value)."
>
> In the example above the size is intentionally set too small. My program
> fails with a PROGRAM_ERROR exception (EXCEPTION_ACCESS_VIOLATION). If I make
> the size of the Matcher just 1, the program hangs. If I make the size
> something reasonable, the program runs fine.
>
> So is this an issue with GNAT.Regpat? It seems like it does not honor its
> documentation. Perhaps I'm doing something wrong.

I'm not sure whether I'm working with the latest version.  However,
when I took the sources for this package from GNAT 4.5.2 and compiled
and ran them (with your test) using a different compiler, a slice
assignment in Emit_Class failed due to out-of-range bounds.  If that
range check were turned off, I can imagine that the result would be
havoc.

                                 -- Adam




* Re: GNAT.Regpat problem.
  2011-03-22 19:21 ` Adam Beneschan
@ 2011-03-22 20:31   ` Simon Wright
  2011-03-24 10:23     ` Peter C. Chapin
  0 siblings, 1 reply; 14+ messages in thread
From: Simon Wright @ 2011-03-22 20:31 UTC


Adam Beneschan <adam@irvine.com> writes:

> I'm not sure whether I'm working with the latest version.  However,
> when I took the sources for this package from GNAT 4.5.2 and compiled
> and ran them (with your test) using a different compiler, a slice
> assignment in Emit_Class failed due to out-of-range bounds.  If that
> range check were turned off, I can imagine that the result would be
> havoc.

There's certainly something fishy with 4.5.2: it appears not to notice
when it goes off the end, and heaven knows where the extra elements of
the matcher go.

Looking at the GCC SVN log for s-regpat.adb, I see

2010-06-21  Emmanuel Briot  <briot@adacore.com>

	* s-regpat.adb (Link_Tail): Fix error when size of the pattern matcher
	is too small.





* Re: GNAT.Regpat problem.
  2011-03-22 20:31   ` Simon Wright
@ 2011-03-24 10:23     ` Peter C. Chapin
  2011-03-24 10:43       ` Dmitry A. Kazakov
  0 siblings, 1 reply; 14+ messages in thread
From: Peter C. Chapin @ 2011-03-24 10:23 UTC


On Tue, 22 Mar 2011, Simon Wright wrote:

> Adam Beneschan <adam@irvine.com> writes:
>
> There's certainly something fishy with 4.5.2: it appears not to notice
> when it goes off the end, and heaven knows where the extra elements of
> the matcher go.
>
> Looking at the GCC SVN log for s-regpat.adb, I see
>
> 2010-06-21  Emmanuel Briot  <briot@adacore.com>
>
> 	* s-regpat.adb (Link_Tail): Fix error when size of the pattern matcher
> 	is too small.

Thanks to all who replied to my original post. It seems like there is a 
known bug in GNAT.Regpat that affects certain GNAT versions. I can deal with 
that.

Thanks again.

Peter





* Re: GNAT.Regpat problem.
  2011-03-24 10:23     ` Peter C. Chapin
@ 2011-03-24 10:43       ` Dmitry A. Kazakov
  2011-03-24 14:04         ` Peter C. Chapin
  0 siblings, 1 reply; 14+ messages in thread
From: Dmitry A. Kazakov @ 2011-03-24 10:43 UTC


On Thu, 24 Mar 2011 06:23:55 -0400, Peter C. Chapin wrote:

> It seems like there is a 
> known bug in GNAT.Regpat that affects certain GNAT versions. I can deal with 
> that.

Just curious, why are you using regular expressions when GNAT offers
Spitbol patterns?

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: GNAT.Regpat problem.
  2011-03-24 10:43       ` Dmitry A. Kazakov
@ 2011-03-24 14:04         ` Peter C. Chapin
  2011-03-24 14:34           ` Dmitry A. Kazakov
  2011-03-24 16:41           ` Georg Bauhaus
  0 siblings, 2 replies; 14+ messages in thread
From: Peter C. Chapin @ 2011-03-24 14:04 UTC


On Thu, 24 Mar 2011, Dmitry A. Kazakov wrote:

> Just curious, why are you using regular expressions when GNAT offers 
> Spitbol patterns?

Spitbol patterns seemed like overkill. Are you saying that there is no valid 
use case for the regular expression packages in GNAT's library (aside, 
perhaps, from legacy support)?

Peter




* Re: GNAT.Regpat problem.
  2011-03-24 14:04         ` Peter C. Chapin
@ 2011-03-24 14:34           ` Dmitry A. Kazakov
  2011-03-24 16:20             ` Georg Bauhaus
  2011-03-24 21:12             ` Peter C. Chapin
  2011-03-24 16:41           ` Georg Bauhaus
  1 sibling, 2 replies; 14+ messages in thread
From: Dmitry A. Kazakov @ 2011-03-24 14:34 UTC


On Thu, 24 Mar 2011 10:04:43 -0400, Peter C. Chapin wrote:

> On Thu, 24 Mar 2011, Dmitry A. Kazakov wrote:
> 
>> Just curious, why are you using regular expressions when GNAT offers 
>> Spitbol patterns?
> 
> Spitbol patterns seemed like overkill.

In that case I would suggest wildcard patterns.

> Are you saying that there is no valid 
> use case for the regular expression packages in GNAT's library (aside, 
> perhaps, from legacy support)?

IMO regular expressions, as a class of languages, are far too weak for
anything more complex than the trivial, yet utterly unnatural for tasks
like matching *.adb. The syntax of REs is just horrific.

So, yes, it is difficult to find a case where REs have a place. And in
general, patterns of any kind are unusable for syntax analyzers, for many
reasons I don't want to go into. A manually written scanner is simpler
and safer.

For user-defined filters (e.g. for file search), lists of wildcard
patterns or some reduced form of BNF are IMO the best choice. The reason
is that the user should instantly recognize what gets matched and what
does not. Any patterns more complex than brain-dead wildcards fail here.
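
For the *.adb case, one way to get plain wildcards in GNAT is
GNAT.Regexp, whose Compile takes Glob => True for shell-style patterns.
A minimal sketch; Is_Body is a hypothetical name chosen for
illustration.

```ada
with GNAT.Regexp;

--  Hypothetical helper: True if File_Name matches the wildcard *.adb.
function Is_Body (File_Name : String) return Boolean is
   use GNAT.Regexp;
   --  Glob => True selects shell-style wildcard syntax rather than
   --  regular expression syntax.
   Filter : constant Regexp := Compile ("*.adb", Glob => True);
begin
   return Match (File_Name, Filter);
end Is_Body;
```

A list of such filters is then just a loop over several compiled Regexp
values, stopping at the first that matches.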

BTW, there recently was an article criticizing overuse of REs in UNIX,
suggesting SNOBOL-like patterns instead. I don't remember where. It does
not say anything one would not have known 25 years ago, though.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: GNAT.Regpat problem.
  2011-03-24 14:34           ` Dmitry A. Kazakov
@ 2011-03-24 16:20             ` Georg Bauhaus
  2011-03-24 17:37               ` Dmitry A. Kazakov
  2011-03-24 21:12             ` Peter C. Chapin
  1 sibling, 1 reply; 14+ messages in thread
From: Georg Bauhaus @ 2011-03-24 16:20 UTC


On 24.03.11 15:34, Dmitry A. Kazakov wrote:

> BTW,  there recently was an article criticizing overuse of REs in UNIX,
> suggesting SNOBOL-like patterns instead.

That's odd.  The creators of SNOBOL-4 and SPITBOL have
always said, "Don't use patterns!"

(Debugging clever uses of patterns isn't fun, all
the more when the patterns are powerful.)




* Re: GNAT.Regpat problem.
  2011-03-24 14:04         ` Peter C. Chapin
  2011-03-24 14:34           ` Dmitry A. Kazakov
@ 2011-03-24 16:41           ` Georg Bauhaus
  1 sibling, 0 replies; 14+ messages in thread
From: Georg Bauhaus @ 2011-03-24 16:41 UTC


On 24.03.11 15:04, Peter C. Chapin wrote:
> On Thu, 24 Mar 2011, Dmitry A. Kazakov wrote:
> 
>> Just curious, why are you using regular expressions when GNAT offers Spitbol
>> patterns?
> 
> Spitbol patterns seemed like overkill.

Just yesterday I wrote me a SPITBOL program to
rummage in log files.  That was quickly done,
performed as expected, and nothing got killed
except the bugs that got traced in the log.

The trick is to use simple patterns. (They are typed,
and can be debugged incrementally as needed. That's
unlike UNIX V7 RE strings, which either work or don't
work; how do you inject tracing at the point of
failure?)

FTR, in SNOBOL-4 syntax,

        Space = " " CHAR(9)
        Pattern = POS(0) (SPAN(Space) | NULL) "xyzzy" RPOS(0)
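
For comparison, a rough Ada rendering of the same pattern using
GNAT.Spitbol.Patterns. This is a sketch only: Is_Xyzzy_Line is a
hypothetical name, and the empty string literal stands in for NULL.

```ada
with GNAT.Spitbol.Patterns;

--  Hypothetical helper: True if Line is optional blanks/tabs followed
--  by "xyzzy" and nothing else.
function Is_Xyzzy_Line (Line : String) return Boolean is
   use GNAT.Spitbol.Patterns;
   Space : constant String := " " & ASCII.HT;
   --  Pos (0) and Rpos (0) anchor the match at both ends of the subject.
   Pat   : constant Pattern :=
     Pos (0) & (Span (Space) or "") & "xyzzy" & Rpos (0);
begin
   return Match (Line, Pat);
end Is_Xyzzy_Line;
```

Each sub-pattern is an ordinary typed value, so it can be pulled out into
its own constant and tested separately.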




* Re: GNAT.Regpat problem.
  2011-03-24 16:20             ` Georg Bauhaus
@ 2011-03-24 17:37               ` Dmitry A. Kazakov
  0 siblings, 0 replies; 14+ messages in thread
From: Dmitry A. Kazakov @ 2011-03-24 17:37 UTC


On Thu, 24 Mar 2011 17:20:14 +0100, Georg Bauhaus wrote:

> On 24.03.11 15:34, Dmitry A. Kazakov wrote:
> 
>> BTW,  there recently was an article criticizing overuse of REs in UNIX,
>> suggesting SNOBOL-like patterns instead.
> 
> That's odd.  The creators of SNOBOL-4 and SPITBOL have
> always said, "Don't use patterns!"

But if you decided not to follow this wise advice, better use the SNOBOL
ones than REs.

> (Debugging clever uses of patterns isn't fun, all
> the more when the patterns are powerful.)

True. Any declarative approach to programming suffers this problem.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: GNAT.Regpat problem.
  2011-03-24 14:34           ` Dmitry A. Kazakov
  2011-03-24 16:20             ` Georg Bauhaus
@ 2011-03-24 21:12             ` Peter C. Chapin
  2011-03-25  9:02               ` Dmitry A. Kazakov
  1 sibling, 1 reply; 14+ messages in thread
From: Peter C. Chapin @ 2011-03-24 21:12 UTC


On Thu, 24 Mar 2011, Dmitry A. Kazakov wrote:

> So, yes, it is difficult to find a case where REs have a place. And in 
> general, patterns of any kind are unusable for syntax analyzers, for many 
> reasons I don't want to go into. A manually written scanner is simpler 
> and safer.

I did notice that it was harder, I thought, to get high quality error 
reporting when using packaged REs. I also had a case that used a hand 
written finite state machine. The code was longer but the error reporting 
was very specific. I'm not saying it would be impossible to get the same 
error reporting using REs but it seemed unnatural. Right now my code using 
REs just reports "Unrecognized input" (or something to that effect) and 
doesn't bother trying to figure out what's unrecognized and why. It was no 
big deal to get the hand written FSM to do that, however.
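
To illustrate the kind of reporting I mean, here is a minimal sketch of a
hand-written check for the pattern from my original example (optional
blanks or tabs, then "xyzzy", then end of line). Diagnose is a
hypothetical name, and this is not the code from my program.

```ada
--  Hypothetical helper: returns "" if Line is accepted, otherwise a
--  message saying what was expected and where.
function Diagnose (Line : String) return String is
   Keyword : constant String := "xyzzy";
   Pos     : Positive := Line'First;
begin
   --  State 1: skip leading blanks and tabs.
   while Pos <= Line'Last
     and then (Line (Pos) = ' ' or else Line (Pos) = ASCII.HT)
   loop
      Pos := Pos + 1;
   end loop;

   --  State 2: the keyword must follow, character by character.
   for I in Keyword'Range loop
      if Pos > Line'Last then
         return "unexpected end of line, expected '" & Keyword (I) & "'";
      elsif Line (Pos) /= Keyword (I) then
         return "column" & Integer'Image (Pos - Line'First + 1)
           & ": expected '" & Keyword (I)
           & "', found '" & Line (Pos) & "'";
      end if;
      Pos := Pos + 1;
   end loop;

   --  State 3: nothing may follow the keyword.
   if Pos <= Line'Last then
      return "column" & Integer'Image (Pos - Line'First + 1)
        & ": trailing text after keyword";
   end if;

   return "";
end Diagnose;
```

Each failure exit knows exactly which state it was in, which is what the
packaged RE match cannot tell me.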

Peter




* Re: GNAT.Regpat problem.
  2011-03-24 21:12             ` Peter C. Chapin
@ 2011-03-25  9:02               ` Dmitry A. Kazakov
  0 siblings, 0 replies; 14+ messages in thread
From: Dmitry A. Kazakov @ 2011-03-25  9:02 UTC


On Thu, 24 Mar 2011 17:12:04 -0400, Peter C. Chapin wrote:

> On Thu, 24 Mar 2011, Dmitry A. Kazakov wrote:
> 
>> So, yes, it is difficult to find a case where REs have a place. And in 
>> general, patterns of any kind are unusable for syntax analyzers, for many 
>> reasons I don't want to go into. A manually written scanner is simpler 
>> and safer.
> 
> I did notice that it was harder, I thought, to get high quality error 
> reporting when using packaged REs. I also had a case that used a hand 
> written finite state machine.

I prefer to split it into smaller elements (e.g. "comment", "literal") for
a recursive descent parser and then code them without an FSM. I hate FSMs.
I have to use FSMs when dealing with layered protocols, which makes me
hate them more and more.

> The code was longer but the error reporting 
> was very specific. I'm not saying it would be impossible to get the same 
> error reporting using REs but it seemed unnatural. Right now my code using 
> REs just reports "Unrecognized input" (or something to that effect) and 
> doesn't bother trying to figure out what's unrecognized and why. It was no 
> big deal to get the hand written FSM to do that, however.

That reminds me of one case for patterns that I forgot. When you get
something incomprehensible you might wish to jump forward to a place
where the parser could recover, e.g. to the end of a comment or to the
n-th closing bracket (standard REs cannot count brackets). For this kind
of superficial, error-tolerant analysis patterns might indeed be usable.
A similar case would be syntax coloring.
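
The bracket-counting part needs nothing more than a depth counter. A
minimal hand-written sketch, with Closing_Bracket as a hypothetical name
and round brackets assumed:

```ada
--  Hypothetical helper: returns the index of the ')' that closes the
--  '(' at Text (Open_At), or 0 if the text ends first.
--  Assumes Text (Open_At) = '('.
function Closing_Bracket
  (Text    : String;
   Open_At : Positive) return Natural
is
   Depth : Natural := 0;
begin
   for I in Open_At .. Text'Last loop
      if Text (I) = '(' then
         Depth := Depth + 1;
      elsif Text (I) = ')' then
         Depth := Depth - 1;
         if Depth = 0 then
            return I;  --  the parser can resume just past this index
         end if;
      end if;
   end loop;
   return 0;  --  unbalanced: no recovery point found
end Closing_Bracket;
```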

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



