comp.lang.ada
* Stripping html from a string
From: wave @ 2004-02-22  0:13 UTC


Hello, I was wondering if anybody knew of a function lying around that
would return a given string with any html tags in it stripped.

I've had a look at GNAT.Regexp, but for some reason it's not liking
my regular expressions at all, which 'should' strip the HTML.

Here is some of my example code:


with Ada.Text_Io, Gnat.Regexp;
use Ada.Text_Io, Gnat.Regexp;

procedure Regex is 

   procedure Testmatch (
         Re : Regexp; 
         S  : String  ) is 
   begin
      if Match( S, Re ) then
         Put_Line( S & " matches the expression" );
      else
         Put_Line( S & " doesn't match the expression" );
      end if;
   end Testmatch;

   Criteria : Regexp;  

begin
   Put_Line( "This program demonstrates GNAT's regular expression" );
   Put_Line( "capabilities. These are used to find text that match" );
   Put_Line( "a certain pattern." );
   New_Line;
  
   Criteria := Compile("<([A-Z][A-Z0-9]*)[^>]*></\1>", False, True);
   
   Testmatch( Criteria, "hello world" );
   Testmatch( Criteria,
      "<a href=""http://www.helloworld.org/"">hello</a>" );
   Testmatch( Criteria, "<b>hello, world</b>" );
   Testmatch( Criteria, "some random text" );


end Regex;


Any input in this matter would be greatly appreciated.


Mut.




* Re: Stripping html from a string
From: Georg Bauhaus @ 2004-02-22  0:45 UTC


wave <mutilation@bonbon.net> wrote:
:  
:   Criteria := Compile("<([A-Z][A-Z0-9]*)[^>]*></\1>", False, True);

What "output" do you expect? AFAICS, the input will have to be rather
mute, so to speak, in order to match.


-- Georg




* Re: Stripping html from a string
From: wave @ 2004-02-22 11:07 UTC


Georg Bauhaus <sb463ba@l1-hrz.uni-duisburg.de> wrote in message news:<c18u2p$gls$1@a1-hrz.uni-duisburg.de>...
> wave <mutilation@bonbon.net> wrote:
> :  
> :   Criteria := Compile("<([A-Z][A-Z0-9]*)[^>]*></\1>", False, True);
> 
> What "output" do you expect? AFAICS, the input will have to be rather
> mute, so to speak, in order to match.
> 
> 
> -- Georg

Oh, sorry, the code I gave was just to test the regular expression
pattern. GNAT is just throwing back an error with the pattern; if I
could get it working correctly, then I could arrange the stripping of
the string.

Cheers,
Mut.




* Re: Stripping html from a string
From: Georg Bauhaus @ 2004-02-22 16:17 UTC


wave <mutilation@bonbon.net> wrote:
:  GNAT is just throwing back an error with the pattern; if I
: could get it working correctly, then I could arrange the stripping of
: the string.

Other than the line break that the news reader shows in your
source, I only noticed the backreference \1. I think this is
not provided in the simple regex matching package.
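
If a regular expression is not strictly required, a small hand-written
scanner may be enough to strip tags. The following is only a minimal
sketch under stated assumptions: Strip_Tags and Strip_Demo are
hypothetical names (not part of GNAT or of the code posted above), and
the function simply drops everything from a '<' up to the next '>'. It
does not handle comments, scripts, character entities, or a '>' inside
an attribute value.

```ada
with Ada.Text_IO;            use Ada.Text_IO;
with Ada.Strings.Unbounded;  use Ada.Strings.Unbounded;

procedure Strip_Demo is

   --  Hypothetical helper, for illustration only.
   --  Returns S with every "<...>" span removed.
   function Strip_Tags (S : String) return String is
      Result : Unbounded_String;
      In_Tag : Boolean := False;
   begin
      for I in S'Range loop
         if In_Tag then
            --  Inside a tag: skip characters until it closes.
            if S (I) = '>' then
               In_Tag := False;
            end if;
         elsif S (I) = '<' then
            --  A tag starts here; the '<' itself is dropped too.
            In_Tag := True;
         else
            --  Ordinary text outside any tag is kept.
            Append (Result, S (I));
         end if;
      end loop;
      return To_String (Result);
   end Strip_Tags;

begin
   --  Same sample inputs as in the original test program.
   Put_Line (Strip_Tags ("<a href=""http://www.helloworld.org/"">hello</a>"));
   Put_Line (Strip_Tags ("<b>hello, world</b>"));
   Put_Line (Strip_Tags ("some random text"));
end Strip_Demo;
```

A single Boolean flag is sufficient here because the scanner only needs
to know whether it is currently inside a tag; no backreference or
capture group is involved, so the limitation of the simple matching
package does not come into play.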


-- Georg



