comp.lang.ada
From: "Randy Brukardt" <randy@rrsoftware.com>
Subject: Re: OT-spam detection, was Re: WM, and lindens rustled
Date: Wed, 21 Jan 2004 13:28:04 -0600
Message-ID: <100tknotk84nfe0@corp.supernews.com>
In-Reply-To: mbzPb.97939$sv6.404211@attbi_s52

<tmoran@acm.org> wrote in message news:mbzPb.97939$sv6.404211@attbi_s52...
> >correlate biography death peltry clan locutor booty revolution gossip
> >convention natural compressor consummate fecund conrail pathfind transcript
> >gila great slung <BR>
>   Surely any spam detector would trivially note that this word sequence is,
> statistically, a very unusual English sentence.

Certainly mine does. (It uses a dictionary of known common e-mail words.
Most of those aren't in it.) But anything that only works on words (and not
on the relationships) will have trouble with such things, if the words are
valid. Which is why I don't think Bayesian filters work because of the words
(even though that's how they are usually described), but rather because of the
HTML markup (which often hides word lists like that).

              Randy.
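
For illustration only, here is a minimal sketch of the kind of dictionary
check described above. It is not the actual detector from the post, whose
implementation is not shown. The word set, the function names, and the 0.5
threshold are all hypothetical placeholders. It also shows the limitation
mentioned above: the check looks only at individual words, not at the
relationships between them.

```python
import re

# Hypothetical stand-in for a dictionary of common e-mail words.
# A real list would be far larger; this one is for illustration only.
COMMON_EMAIL_WORDS = {
    "the", "a", "is", "to", "and", "of", "for", "you", "i", "we",
    "please", "thanks", "meeting", "tomorrow", "at", "see", "attached",
    "file", "message", "this", "that", "in", "on", "will", "be", "have", "it",
}


def unknown_word_ratio(text, dictionary=COMMON_EMAIL_WORDS):
    """Return the fraction of words in text that are not in the dictionary."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    unknown = sum(1 for word in words if word not in dictionary)
    return unknown / len(words)


def looks_like_word_salad(text, threshold=0.5):
    """Flag text whose share of unknown words exceeds the (assumed) threshold."""
    return unknown_word_ratio(text) > threshold
```

A message built entirely from common words would pass this check no matter
how the words are arranged, which is the weakness of any word-only approach.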

Thread overview: 6+ messages
     [not found] <mailman.9.1074679047.281.comp.lang.ada@ada-france.org>
2004-01-21 18:05 ` OT-spam detection, was Re: WM, and lindens rustled tmoran
2004-01-21 19:28   ` Randy Brukardt [this message]
2004-01-22  7:56     ` Preben Randhol
2004-01-22 10:41       ` Larry Kilgallen
2004-01-22 13:06         ` Preben Randhol
2004-01-22 15:47           ` Larry Kilgallen