From mboxrd@z Thu Jan 1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level:
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM autolearn=ham autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,f93edcfe54a071cf,start
X-Google-Attributes: gid103376,public
X-Google-ArrivalTime: 2004-02-21 16:14:00 PST
Path: archiver1.google.com!postnews1.google.com!not-for-mail
From: mutilation@bonbon.net (wave)
Newsgroups: comp.lang.ada
Subject: Stripping html from a string
Date: 21 Feb 2004 16:13:59 -0800
Organization: http://groups.google.com
Message-ID: <4d01ad29.0402211613.34d2ebcd@posting.google.com>
NNTP-Posting-Host: 62.252.128.11
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Trace: posting.google.com 1077408840 24383 127.0.0.1 (22 Feb 2004 00:14:00 GMT)
X-Complaints-To: groups-abuse@google.com
NNTP-Posting-Date: Sun, 22 Feb 2004 00:14:00 +0000 (UTC)
Xref: archiver1.google.com comp.lang.ada:5715
Date: 2004-02-21T16:13:59-08:00
List-Id:

Hello,

I was wondering if anybody knew of a function lying around that would
return a given string with any HTML tags in it stripped. I've had a
look at GNAT.Regexp, but for some reason it's not liking my regular
expressions at all, even though they 'should' strip the HTML.

Here is some of my example code:

with Ada.Text_Io, Gnat.Regexp;
use Ada.Text_Io, Gnat.Regexp;

procedure Regex is

   procedure Testmatch ( Re : Regexp; S : String ) is
   begin
      if Match( S, Re ) then
         Put_Line( S & " matches the expression" );
      else
         Put_Line( S & " doesn't match the expression" );
      end if;
   end Testmatch;

   Criteria : Regexp;

begin
   Put_Line( "This program demonstrates GNAT's regular expression" );
   Put_Line( "capabilities. These are used to find text that match" );
   Put_Line( "a certain pattern." );
   New_Line;
   Criteria := Compile("<([A-Z][A-Z0-9]*)[^>]*>", False, True);
   Testmatch( Criteria, "hello world" );
   Testmatch( Criteria, "hello" );
   Testmatch( Criteria, "hello, world" );
   Testmatch( Criteria, "some random text" );
end Regex;

Any input in this matter would be greatly appreciated.

Mut.

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Georg Bauhaus
Newsgroups: comp.lang.ada
Subject: Re: Stripping html from a string
Date: Sun, 22 Feb 2004 00:45:14 +0000 (UTC)
Organization: GMUGHDU
Message-ID:
References: <4d01ad29.0402211613.34d2ebcd@posting.google.com>
User-Agent: tin/1.5.8-20010221 ("Blue Water") (UNIX) (HP-UX/B.11.00 (9000/800))

wave wrote:
:
: Criteria := Compile("<([A-Z][A-Z0-9]*)[^>]*>", False, True);

What "output" do you expect? AFAIKS, input will have to be rather
mute, so to speak, in order to match.

-- Georg

From mboxrd@z Thu Jan 1 00:00:00 1970
From: mutilation@bonbon.net (wave)
Newsgroups: comp.lang.ada
Subject: Re: Stripping html from a string
Date: 22 Feb 2004 03:07:58 -0800
Organization: http://groups.google.com
Message-ID: <4d01ad29.0402220307.1168a8ae@posting.google.com>
References: <4d01ad29.0402211613.34d2ebcd@posting.google.com>

Georg Bauhaus wrote in message news:...
> wave wrote:
> :
> : Criteria := Compile("<([A-Z][A-Z0-9]*)[^>]*>", False, True);
>
> What "output" do you expect? AFAIKS, input will have to be rather
> mute, so to speak, in order to match.
>
>
> -- Georg

Oh, sorry, the code I gave was just to test the regular expression
pattern. GNAT is just throwing back an error with the pattern; if I
could get it working correctly, then I could arrange the stripping of
the string.

Cheers,
Mut.

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Georg Bauhaus
Newsgroups: comp.lang.ada
Subject: Re: Stripping html from a string
Date: Sun, 22 Feb 2004 16:17:33 +0000 (UTC)
Organization: GMUGHDU
Message-ID:
References: <4d01ad29.0402211613.34d2ebcd@posting.google.com> <4d01ad29.0402220307.1168a8ae@posting.google.com>
User-Agent: tin/1.5.8-20010221 ("Blue Water") (UNIX) (HP-UX/B.11.00 (9000/800))

wave wrote:
: Gnat is just throwing back an error with the pattern, if I
: could get it working correctly then I could arrange the stripping of
: the string.

Other than the line break that the news reader shows in your source,
I only noticed the backreference \1. I think this is not provided
in the simple regex matching package.

-- Georg
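
Following up on the point above: if backreferences are not available in the simple matching package, one way to get the stripping itself done is to skip regular expressions and scan the string character by character. The following is only a minimal sketch under that assumption, not something proposed in the thread. The names Strip_Demo and Strip_Tags and the sample input are made up for illustration, and the scan treats everything from a '<' up to the next '>' as a tag.

```ada
with Ada.Text_IO;           use Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Strip_Demo is

   --  Hypothetical helper (not from the thread): returns a copy of S
   --  without anything between a '<' and the next '>', inclusive.
   function Strip_Tags (S : String) return String is
      Result : Unbounded_String;
      In_Tag : Boolean := False;
   begin
      for I in S'Range loop
         if In_Tag then
            --  Inside a tag: drop characters until the closing '>'.
            if S (I) = '>' then
               In_Tag := False;
            end if;
         elsif S (I) = '<' then
            --  A '<' opens a tag; the '<' itself is dropped too.
            In_Tag := True;
         else
            --  Ordinary text is copied through unchanged.
            Append (Result, S (I));
         end if;
      end loop;
      return To_String (Result);
   end Strip_Tags;

begin
   --  Illustrative input only; prints "hello, world".
   Put_Line (Strip_Tags ("<b>hello</b>, <i>world</i>"));
end Strip_Demo;
```

This deliberately ignores the harder cases: a '>' inside a quoted attribute value, comments, character entities such as &amp;, and an unterminated '<' (which makes the rest of the string disappear). For real-world HTML a proper parser would be the safer choice.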