From mboxrd@z Thu Jan 1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level:
X-Spam-Status: No, score=0.1 required=5.0 tests=BAYES_00,LONGWORDS autolearn=no autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,561ac97c34d8f8ef
X-Google-Attributes: gid103376,public
X-Google-ArrivalTime: 2004-01-21 11:28:57 PST
Path: archiver1.google.com!news1.google.com!sn-xit-02!sn-xit-01!sn-xit-06!sn-post-02!sn-post-01!supernews.com!corp.supernews.com!not-for-mail
From: "Randy Brukardt"
Newsgroups: comp.lang.ada
Subject: Re: OT-spam detection, was Re: WM, and lindens rustled
Date: Wed, 21 Jan 2004 13:28:04 -0600
Organization: Posted via Supernews, http://www.supernews.com
Message-ID: <100tknotk84nfe0@corp.supernews.com>
References:
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 5.50.4807.1700
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4910.0300
X-Complaints-To: abuse@supernews.com
Xref: archiver1.google.com comp.lang.ada:4614
Date: 2004-01-21T13:28:04-06:00
List-Id:

wrote in message news:mbzPb.97939$sv6.404211@attbi_s52...
> >correlate biography death peltry clan locutor booty revolution gossip convention natural compressor consummate fecund conrail pathfind transcript gila great slung
> Surely any spam detector would trivially note that this word sequence is,
> statistically, a very unusual English sentence.

Certainly mine does. (It uses a dictionary of known common e-mail words. Most of those aren't in it.) But anything that only works on words (and not on the relationships between them) will have trouble with such things, if the words are valid. Which is why I don't think Bayesian filters work because of the words (even though that's how they are usually described), but rather because of the HTML markup (which often hides word lists like that).

Randy.
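
As a rough illustration of the dictionary check described above, here is a minimal Python sketch. It is not the filter from the post: the word list COMMON_WORDS, the function names, and the 0.5 threshold are all assumptions made up for the example. It only measures what fraction of a message's words fall outside a small list of common e-mail words, and it knows nothing about the relationships between words.

```python
import re

# Hypothetical stand-in for a "dictionary of known common e-mail words".
# A real list would be far larger; this one exists only for the example.
COMMON_WORDS = {
    "the", "a", "is", "this", "that", "and", "to", "of", "in", "on",
    "for", "your", "we", "you", "i", "it", "at", "will", "be",
    "thanks", "message", "meeting", "please",
}


def unknown_fraction(text, dictionary=COMMON_WORDS):
    """Return the fraction of words in text that are not in dictionary."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0  # nothing to judge in an empty message
    unknown = sum(1 for word in words if word not in dictionary)
    return unknown / len(words)


def looks_like_word_salad(text, threshold=0.5, dictionary=COMMON_WORDS):
    """Flag text when more than threshold of its words are uncommon.

    The threshold is an assumed value, not one taken from the post.
    """
    return unknown_fraction(text, dictionary) > threshold
```

A check like this flags the quoted word list because almost none of its words are common e-mail vocabulary. It would miss a random list built entirely from common words, which is the weakness of word-only filters noted above.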