From: Georg Bauhaus
Newsgroups: comp.lang.ada
Subject: Re: HTML parser in Ada ?
Date: Mon, 18 Nov 2002 14:17:46 +0000 (UTC)
Organization: GMUGHDU
References: <17cd177c.0211161143.7f8d5842@posting.google.com>

Gautier wrote:
: OK - I'll take a look at proposed solutions:
: XML/Ada and OpenToken.

Be aware, though, that you are on the edge of natural language
processing when dealing with real-world web pages. The best one can
hope for is that you don't need more than full SGML with tag
minimization features. Second best will likely require
context-sensitive parsing and some heuristics.

-- georg