comp.lang.ada
* Easiest way to use regular expressions?
@ 2020-12-27  8:20 reinert
  2020-12-27  8:36 ` J-P. Rosen
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: reinert @ 2020-12-27  8:20 UTC


Hello,

I made the following hack to match a string with a regular expression
(using a named pipe and grep under linux):

procedure to_os (str : String) is
   package c renames Interfaces.C;
   procedure system_rk (source : in c.char_array);
   pragma Import (c, system_rk, "system");
begin
   system_rk (Interfaces.C.To_C (str));
end to_os;

function match1(S,P : String) return boolean is
   cfile1 : constant String := "regexp_pipe0";
   file1 : File_Type;
   str1 : constant String := "echo " & S & "| grep -ic " & P;
begin
   to_os(str1 & " > regexp_pipe0 &" );
   Open(file1,In_File,cfile1);
   return b : constant boolean := Natural'Value(get_line(file1)) > 0 do
      Close(file1);
   end return;
end match1;
-----------------------------------------
OK, I assume it somehow breaks the philosophy on Ada and security/reliability.  Could someone therefore show a better and more simple way to do this? gnat.expect?

reinert


* Re: Easiest way to use regular expressions?
  2020-12-27  8:20 Easiest way to use regular expressions? reinert
@ 2020-12-27  8:36 ` J-P. Rosen
  2020-12-27 11:14   ` Emmanuel Briot
  2020-12-28 21:07 ` Jeffrey R. Carter
  2021-01-05  1:31 ` Shark8
  2 siblings, 1 reply; 12+ messages in thread
From: J-P. Rosen @ 2020-12-27  8:36 UTC


On 27/12/2020 at 09:20, reinert wrote:
> OK, I assume it somehow breaks the philosophy on Ada and
> security/reliability.  Could someone therefore show a better and more
> simple way to do this? gnat.expect?

AdaControl uses Gnat.Regpat, and is quite happy with it...
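
For illustration, here is a minimal sketch (not taken from AdaControl) of how the match1 hack could look with GNAT.Regpat. The names Regpat_Demo and Match1 are only placeholders, and the Case_Insensitive flag is assumed to stand in for grep's -i. Note that GNAT.Regpat uses a Perl-like syntax, which is not identical to grep's default dialect.

```ada
with Ada.Text_IO;
with GNAT.Regpat;

procedure Regpat_Demo is

   use GNAT.Regpat;

   --  Hypothetical replacement for match1: True if pattern P matches
   --  anywhere in S, ignoring case.  No shell or pipe is involved.
   function Match1 (S, P : String) return Boolean is
      Matcher : constant Pattern_Matcher := Compile (P, Case_Insensitive);
   begin
      return Match (Matcher, S);
   end Match1;

begin
   Ada.Text_IO.Put_Line (Boolean'Image (Match1 ("Hello World", "w[a-z]+d")));
   Ada.Text_IO.Put_Line (Boolean'Image (Match1 ("Hello World", "^world")));
end Regpat_Demo;
```

Since the string and the pattern never pass through a shell, the quoting problems of the echo/grep version do not arise. An invalid pattern makes Compile raise Expression_Error, which a caller may want to handle.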

-- 
J-P. Rosen
Adalog
2 rue du Docteur Lombard, 92441 Issy-les-Moulineaux CEDEX
Tel: +33 1 45 29 21 52, Fax: +33 1 45 29 25 00
http://www.adalog.fr


* Re: Easiest way to use regular expressions?
  2020-12-27  8:36 ` J-P. Rosen
@ 2020-12-27 11:14   ` Emmanuel Briot
  2020-12-27 20:31     ` oliverm...@gmail.com
  0 siblings, 1 reply; 12+ messages in thread
From: Emmanuel Briot @ 2020-12-27 11:14 UTC


On Sunday, December 27, 2020 at 9:36:51 AM UTC+1, J-P. Rosen wrote:
> AdaControl uses Gnat.Regpat, and is quite happy with it... 

GNAT.Regpat is a package I wrote 18 years ago or so (time flies..), basically manually translating C code from the Perl implementation of regular expressions.
Nowadays, I think it would be better to write a small binding to the pcre library (which has quite a simple API, so the binding should not be too hard). This will
provide much better performance, support for unicode, and a host of regexp features that are not supported by GNAT.Regpat.

I never did that while I was working for AdaCore because we would have ended up with too many regexp packages (there is also GNAT.Regexp, which is very
efficient but limited in features because it is based on a deterministic finite-state machine).
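
To illustrate the GNAT.Regexp package mentioned above, a minimal sketch follows, assuming a GNAT installation. Regexp_Demo and the pattern are made up for the example. Unlike grep, Match here succeeds only if the whole string matches the pattern.

```ada
with Ada.Text_IO;
with GNAT.Regexp;

procedure Regexp_Demo is

   use GNAT.Regexp;

   --  The pattern is compiled once into an automaton and can then be
   --  reused for any number of Match calls.
   Re : constant Regexp := Compile ("a(b|c)*d");

begin
   --  The whole string must be accepted by the automaton
   Ada.Text_IO.Put_Line (Boolean'Image (Match ("abcbd", Re)));
   --  A trailing 'x' is not covered by the pattern
   Ada.Text_IO.Put_Line (Boolean'Image (Match ("abcbdx", Re)));
end Regexp_Demo;
```

There are no capture groups in this package; it only answers whether a string matches, which is part of the feature limitation described above.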

I think libpcre might even be distributed with gcc nowadays, although I did not double-check, so I might be wrong.

This binding would be a nice small project for someone who wants to get started with writing Ada bindings.

Emmanuel


* Re: Easiest way to use regular expressions?
  2020-12-27 11:14   ` Emmanuel Briot
@ 2020-12-27 20:31     ` oliverm...@gmail.com
  2020-12-28  8:01       ` Per Sandberg
  0 siblings, 1 reply; 12+ messages in thread
From: oliverm...@gmail.com @ 2020-12-27 20:31 UTC


On Sunday, December 27, 2020 at 12:14:48 PM UTC+1, briot.e...@gmail.com wrote:
> [...]
> Nowadays, I think it would be better to write a small binding to the pcre library (which has quite a
> simple API, so the binding should not be too hard).

About "should not be too hard", if you mean a low level binding then I agree - given the wonders of `gcc -fdump-ada-spec`.
However, IMO the low level binding is quite ugly and cumbersome to use.
The tough part is to make a thick binding that is comfortable for Ada programmers.

> This binding would be a nice small project for someone who wants to get started with writing
> Ada bindings 

There are some questions attached:
- Target pcre or pcre2 ?
- Support all three character sizes UTF-8/16/32 ? In separate packages?

Oliver


* Re: Easiest way to use regular expressions?
  2020-12-27 20:31     ` oliverm...@gmail.com
@ 2020-12-28  8:01       ` Per Sandberg
  2020-12-28 13:58         ` Maxim Reznik
  0 siblings, 1 reply; 12+ messages in thread
From: Per Sandberg @ 2020-12-28  8:01 UTC


I had a quick play with this, and I would say:
* pcre2
* three separate specs (Character, Wide_Character and 
Wide_Wide_Character); since the interface is the same, I would do the 
Character one first and then use it as a template to generate the other two.
I have some very early experiments at:
   https://github.com/Ada-bindings-project/a-pcre

/P


* Re: Easiest way to use regular expressions?
  2020-12-28  8:01       ` Per Sandberg
@ 2020-12-28 13:58         ` Maxim Reznik
  0 siblings, 0 replies; 12+ messages in thread
From: Maxim Reznik @ 2020-12-28 13:58 UTC


The Matreshka library has a rather advanced regexp engine with full Unicode support:

https://forge.ada-ru.org/matreshka/wiki/League/Regexp


* Re: Easiest way to use regular expressions?
  2020-12-27  8:20 Easiest way to use regular expressions? reinert
  2020-12-27  8:36 ` J-P. Rosen
@ 2020-12-28 21:07 ` Jeffrey R. Carter
  2020-12-31 10:29   ` reinert
  2021-01-05  1:31 ` Shark8
  2 siblings, 1 reply; 12+ messages in thread
From: Jeffrey R. Carter @ 2020-12-28 21:07 UTC


On 12/27/20 9:20 AM, reinert wrote:
> OK, I assume it somehow breaks the philosophy on Ada and security/reliability.  Could someone therefore show a better and more simple way to do this? gnat.expect?

You can use PragmARC.Matching.Regular_Expression or its instantiation for 
Character and String, PragmARC.Matching.Character_Regular_Expression

https://github.com/jrcarter/PragmARC/tree/Ada-12

-- 
Jeff Carter
"I was in love with a beautiful blonde once, dear.
She drove me to drink. That's the one thing I'm
indebted to her for."
Never Give a Sucker an Even Break
109


* Re: Easiest way to use regular expressions?
  2020-12-28 21:07 ` Jeffrey R. Carter
@ 2020-12-31 10:29   ` reinert
  0 siblings, 0 replies; 12+ messages in thread
From: reinert @ 2020-12-31 10:29 UTC


Thanks for the responses and hints.
reinert


* Re: Easiest way to use regular expressions?
  2020-12-27  8:20 Easiest way to use regular expressions? reinert
  2020-12-27  8:36 ` J-P. Rosen
  2020-12-28 21:07 ` Jeffrey R. Carter
@ 2021-01-05  1:31 ` Shark8
  2021-01-05  9:27   ` Dmitry A. Kazakov
  2 siblings, 1 reply; 12+ messages in thread
From: Shark8 @ 2021-01-05  1:31 UTC


> OK, I assume it somehow breaks the philosophy on Ada and security/reliability. Could someone therefore show a better and more simple way to do this? gnat.expect? 
> 
> reinert
In my career, about 90% of the paid programming has been maintenance.
As such, any time RegEx comes up, I am almost filled with dread: RegEx is terrible, overly limited, often part of a system that easily evolves beyond the constraints that are implied with RegEx (that of being a "regular language"). -- I would go so far as to even advise things like "compiler recognizers" *not* use RegEx.

I've found that an actual parsing system is far preferable to RegEx. In Byron, parsing is trivial:
  I : Integer renames Integer'Wide_Wide_Value( This_Value );
Where "This_Value" is the string in question.
Is it cheating to use the 'Value Attribute? Possibly, but this also provides for consistency between 'Value and the associated integer-value.
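
As a small illustration of the 'Value idea (this is not code from Byron), the sketch below lets the language's own literal rules decide whether a string is an integer, instead of a hand-written pattern. Value_Demo and Is_Integer are hypothetical names, and pragma Unreferenced is GNAT-specific.

```ada
with Ada.Text_IO;

procedure Value_Demo is

   --  True if Integer'Value accepts Text, i.e. Text follows the Ada
   --  rules for an integer literal and the value fits in Integer.
   function Is_Integer (Text : String) return Boolean is
   begin
      declare
         Unused : constant Integer := Integer'Value (Text);
         pragma Unreferenced (Unused);
      begin
         return True;
      end;
   exception
      when Constraint_Error =>
         --  Raised by 'Value for malformed or out-of-range input
         return False;
   end Is_Integer;

begin
   Ada.Text_IO.Put_Line (Boolean'Image (Is_Integer ("42")));
   Ada.Text_IO.Put_Line (Boolean'Image (Is_Integer ("1_000")));
   Ada.Text_IO.Put_Line (Boolean'Image (Is_Integer ("16#FF#")));
   Ada.Text_IO.Put_Line (Boolean'Image (Is_Integer ("4 2")));
end Value_Demo;
```

The point of the design is that underscores, based literals and range checks come for free and stay consistent with what 'Value will later return, which a separately maintained regular expression would have to duplicate.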


* Re: Easiest way to use regular expressions?
  2021-01-05  1:31 ` Shark8
@ 2021-01-05  9:27   ` Dmitry A. Kazakov
  2021-01-05 10:46     ` Paul Rubin
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry A. Kazakov @ 2021-01-05  9:27 UTC


On 2021-01-05 02:31, Shark8 wrote:
>> OK, I assume it somehow breaks the philosophy on Ada and security/reliability. Could someone therefore show a better and more simple way to do this? gnat.expect?
>>
>> reinert
> In my career, about 90% of the paid programming has been maintenance.
> As such, any time RegEx comes up, I am almost filled with dread: RegEx is terrible, overly limited, often part of a system that easily evolves beyond the constraints that are implied with RegEx (that of being a "regular language"). -- I would go so far as to even advise things like "compiler recognizers" *not* use RegEx.

I agree. Regular expressions are almost always the wrong choice. In general, 
all patterns beyond wildcards (*) are.

If somebody is adamant to use patterns, then SNOBOL would be a better 
choice. It is cleaner, intuitive and more powerful than regular 
expressions. (There is a SPITBOL implementation in GNAT libraries. I 
also have an implementation with immediate assignment support, though in 
C with Ada bindings).
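
For readers who have not seen the SPITBOL package in the GNAT libraries, here is a minimal sketch, assuming GNAT.Spitbol.Patterns is available. Spitbol_Demo and the two pattern names are made up for the example. The second pattern uses Bal to match balanced parentheses, something classical regular expressions cannot express.

```ada
with Ada.Text_IO;
with GNAT.Spitbol.Patterns;

procedure Spitbol_Demo is

   use GNAT.Spitbol.Patterns;

   --  A non-empty run of digits spanning the whole subject:
   --  Pos (0) anchors at the start, Rpos (0) at the end.
   Number : constant Pattern := Pos (0) & Span ("0123456789") & Rpos (0);

   --  A whole subject that is balanced with respect to ( and )
   Balanced : constant Pattern := Pos (0) & Bal & Rpos (0);

begin
   Ada.Text_IO.Put_Line (Boolean'Image (Match ("2021", Number)));
   Ada.Text_IO.Put_Line (Boolean'Image (Match ("20x1", Number)));
   Ada.Text_IO.Put_Line (Boolean'Image (Match ("(a(b)c)", Balanced)));
   Ada.Text_IO.Put_Line (Boolean'Image (Match ("(a(b", Balanced)));
end Spitbol_Demo;
```

Patterns are ordinary Ada values built with function calls and the & operator, so they can be named, combined and reused like any other object.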

> I've found that an actual parsing system is far preferable to RegEx.

Yes, but it is the upfront effort that people shy away from. It pays off, but 
later. When parsing/matching things, failures (errors) are more 
important than successful matches. Patterns are weak and tedious when 
dealing with failures: unbalanced brackets, misused underscores in 
literals, and such stuff. In some cases they cannot do it at all; in others 
they require a lot of recursion and backtracking, becoming very 
inefficient. In short, using patterns is a quick start and an endless 
headache later.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


* Re: Easiest way to use regular expressions?
  2021-01-05  9:27   ` Dmitry A. Kazakov
@ 2021-01-05 10:46     ` Paul Rubin
  2021-01-05 11:20       ` Dmitry A. Kazakov
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Rubin @ 2021-01-05 10:46 UTC


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
> If somebody is adamant to use patterns, then SNOBOL would be a better
> choice. It is cleaner, intuitive and more powerful than regular
> expressions. 

OMG, SNOBOL was fun, but these days, look at Parsec-style parser
combinators.  Parsec is a Haskell library but there are similar things
in other languages.

> (There is a SPITBOL implementation in GNAT libraries.

It is fun to look at, written in an abstract assembly language with a
SPITBOL program that translates it to various real assembly languages.
It is fun to read the source code, which is extremely well commented.

SPITBOL was originally written by Robert Dewar (of AdaCore) and Ken
Belcher, fwiw.


* Re: Easiest way to use regular expressions?
  2021-01-05 10:46     ` Paul Rubin
@ 2021-01-05 11:20       ` Dmitry A. Kazakov
  0 siblings, 0 replies; 12+ messages in thread
From: Dmitry A. Kazakov @ 2021-01-05 11:20 UTC


On 2021-01-05 11:46, Paul Rubin wrote:
> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
>> If somebody is adamant to use patterns, then SNOBOL would be a better
>> choice. It is cleaner, intuitive and more powerful than regular
>> expressions.
> 
> OMG, SNOBOL was fun, but these days, look at Parsec-style parser
> combinators.

The question was about pattern matching vs parsing. Proper parsing is 
clearly preferable to pattern fly-over.

> Parsec is a Haskell library but there are similar things
> in other languages.

Huh, a table-driven recursive descent parser with expression inserts is 
all anybody ever needed. I have been using this approach for decades, 
implementing dozens of crazy domain-specific languages and other 
idiocies like JSON. Nothing is better, IMO.

And the *best*: recursive descent is not declarative, not FP, not 
juggling standing on the head, it is as imperative and procedural as it 
goes.
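
To make the recursive descent idea concrete, here is a minimal hand-written sketch for integer expressions with + - * / and parentheses. It is plain recursive descent, not the table-driven variant described above, and all names (Descent_Demo, Evaluate, Syntax_Error and so on) are invented for the example. Each grammar rule is one function, and errors are reported with the column where parsing stopped.

```ada
with Ada.Exceptions;
with Ada.Text_IO;

procedure Descent_Demo is

   Syntax_Error : exception;

   function Evaluate (Source : String) return Integer is
      Pos   : Positive := Source'First;  --  next character to read
      Value : Integer;

      function Expression return Integer;  --  needed by Factor

      procedure Skip_Blanks is
      begin
         while Pos <= Source'Last and then Source (Pos) = ' ' loop
            Pos := Pos + 1;
         end loop;
      end Skip_Blanks;

      --  Skips blanks, then consumes C if it is the next character
      function Accept_Char (C : Character) return Boolean is
      begin
         Skip_Blanks;
         if Pos <= Source'Last and then Source (Pos) = C then
            Pos := Pos + 1;
            return True;
         end if;
         return False;
      end Accept_Char;

      procedure Fail (Message : String) is
      begin
         raise Syntax_Error with
           Message & " at column" & Integer'Image (Pos - Source'First + 1);
      end Fail;

      --  factor ::= number | '(' expression ')'
      function Factor return Integer is
         Start  : Positive;
         Result : Integer;
      begin
         if Accept_Char ('(') then
            Result := Expression;
            if not Accept_Char (')') then
               Fail ("missing ')'");
            end if;
            return Result;
         end if;
         Start := Pos;  --  blanks were already skipped by Accept_Char
         while Pos <= Source'Last and then Source (Pos) in '0' .. '9' loop
            Pos := Pos + 1;
         end loop;
         if Pos = Start then
            Fail ("number or '(' expected");
         end if;
         return Integer'Value (Source (Start .. Pos - 1));
      end Factor;

      --  term ::= factor { ('*' | '/') factor }
      function Term return Integer is
         Result : Integer := Factor;
      begin
         loop
            if Accept_Char ('*') then
               Result := Result * Factor;
            elsif Accept_Char ('/') then
               Result := Result / Factor;
            else
               return Result;
            end if;
         end loop;
      end Term;

      --  expression ::= term { ('+' | '-') term }
      function Expression return Integer is
         Result : Integer := Term;
      begin
         loop
            if Accept_Char ('+') then
               Result := Result + Term;
            elsif Accept_Char ('-') then
               Result := Result - Term;
            else
               return Result;
            end if;
         end loop;
      end Expression;

   begin
      Value := Expression;
      Skip_Blanks;
      if Pos <= Source'Last then
         Fail ("unexpected '" & Source (Pos) & "'");
      end if;
      return Value;
   end Evaluate;

begin
   Ada.Text_IO.Put_Line (Integer'Image (Evaluate ("2 * (3 + 4) - 5")));
   Ada.Text_IO.Put_Line (Integer'Image (Evaluate ("2 * (3 + 4")));
exception
   when E : Syntax_Error =>
      Ada.Text_IO.Put_Line (Ada.Exceptions.Exception_Message (E));
end Descent_Demo;
```

The first call evaluates a well-formed expression; the second has an unbalanced bracket and ends up in the exception handler, which prints the message built by Fail. That kind of precise failure report is the part that is hard to get out of a pattern.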

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
