comp.lang.ada
From: Wesley Groleau <wesgroleau@despammed.com>
Subject: Re: If anybody wants to make something in Ada but do not know what
Date: Wed, 16 Apr 2003 18:21:17 -0500
Message-ID: <VZ2dndUZ8_77eACjXTWcpA@gbronline.com>
In-Reply-To: <v9rdn4e8tej0d2@corp.supernews.com>


> First of all, Bayesian filters are most effective the closer to the
> client that they are. On the server, they have to filter everyone's mail,
> and that necessarily means that they have to let more stuff through. For

Not necessarily.  The one I proposed does the filtering on the server
based on feedback from the addressee.  In other words, each user would
have his or her own statistical database.
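
A minimal sketch of that arrangement, in Python for brevity.  Every
name here (PerUserStats, feedback, spam_probability) is hypothetical;
the only point is that the counts are keyed by addressee, so each
user's verdicts train only that user's filter.  The probability
estimate is a simplified version of Paul Graham's, without his
doubling of non-spam counts or his minimum-occurrence threshold; the
0.4 for never-seen tokens and the 0.01/0.99 clamp are his values.

```python
from collections import Counter, defaultdict


class PerUserStats:
    """Hypothetical server-side store with one statistical DB per addressee."""

    def __init__(self):
        self.spam = defaultdict(Counter)  # user -> token counts in mail they called spam
        self.ham = defaultdict(Counter)   # user -> token counts in mail they kept
        self.nspam = Counter()            # user -> number of messages marked spam
        self.nham = Counter()             # user -> number of messages marked good

    def feedback(self, user, tokens, is_spam):
        """Record the addressee's verdict on one message."""
        if is_spam:
            self.spam[user].update(tokens)
            self.nspam[user] += 1
        else:
            self.ham[user].update(tokens)
            self.nham[user] += 1

    def spam_probability(self, user, token, unknown=0.4):
        """Per-user P(spam | token), a simplified Graham-style estimate."""
        b = self.spam[user][token] / max(self.nspam[user], 1)
        g = self.ham[user][token] / max(self.nham[user], 1)
        if b + g == 0:
            return unknown  # this user has never seen the token
        return min(0.99, max(0.01, b / (b + g)))


db = PerUserStats()
db.feedback("alice", ["mortgage", "rate", "free"], is_spam=True)
db.feedback("alice", ["ada", "compiler"], is_spam=False)
db.feedback("bob", ["mortgage", "rate"], is_spam=False)  # bob works at a bank

# The same word scores differently in each user's DB.
print(db.spam_probability("alice", "mortgage"))  # 0.99
print(db.spam_probability("bob", "mortgage"))    # 0.01
```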

> Secondly, most of the effectiveness of Bayesian filters has come from
> the fact that they include the HTML markup in the text stored. Spammers
> have figured that out, and are now sending a lot more plain text

> Thirdly, spammers have started sending random strings of junk (usually
> placed so it won't display) as part of messages. Depending on the
> filter, that can make a lot of messages look "OK" to a Bayesian filter,
> because they often treat unknown words as unlikely to be spam. Even if
> they don't do that, they tend to clog up the database with lots of junk
> 'words'.

With the way some implementations work, these tricks fail.
For example, Paul Graham's algorithm only looks at the strongest
indicators at both ends of the scale: the tokens whose spam
probability is closest to 0 or to 1.  If a spammer puts in a lot
of random words, they won't be consistent and will not have much
weight.  If the spammer puts in the same words all the time, and
these words are common in non-spam, they will not be strong
indicators and won't be used.  If they are not common in non-spam,
they will catch the spam.
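
The selection step can be sketched like this, as a minimal Python
illustration assuming the token probabilities are already trained and
clamped away from 0 and 1.  The cut-off of 15 tokens and the 0.4 for
never-seen tokens are the values in Graham's essay; the function name
and the test data are hypothetical.

```python
from math import prod


def combined_spam_probability(token_probs, tokens, n_interesting=15, unknown=0.4):
    """Graham-style verdict from only the most 'interesting' tokens.

    token_probs maps token -> P(spam | token), assumed trained and clamped
    to [0.01, 0.99].  Never-seen tokens get `unknown`.  Of the distinct
    tokens in the message, only the n_interesting whose probability lies
    farthest from a neutral 0.5 are combined; all others are ignored.
    """
    probs = [token_probs.get(t, unknown) for t in set(tokens)]
    strongest = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:n_interesting]
    spam = prod(strongest)
    ham = prod(1 - p for p in strongest)
    return spam / (spam + ham)


# Hypothetical trained table: twenty words that are strong spam indicators.
table = {f"spamword{i}": 0.95 for i in range(20)}
message = [f"spamword{i}" for i in range(20)]
padded = message + [f"junk{i}" for i in range(50)]  # random filler, never seen

# The filler only scores 0.4, so it never displaces the 0.95 tokens
# from the top 15, and the two verdicts are identical.
print(combined_spam_probability(table, message))
print(combined_spam_probability(table, padded))
```

If a message has fewer strong tokens than the cut-off, the remaining
slots are filled by whatever is left, including unknown words, which
is the case the quoted text worries about.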

> Lastly, a Bayesian filter can never be accurate enough to entrust with
> discarding of messages, at least for me. I'll only trust a pinpoint
> filter for that, such as discarding names that include a particular URL.
> Even so, I'm discarding 70% of the incoming spam here.

My _limited_ tests with a Bayesian filter had no false negatives or false 
positives.  And the 'net being what it is, an occasional message gets
lost somehow anyway.  Besides, no filter is required to discard anything.

I would just like the presumed spam messages stored on the server
until I say trash them (or until I ignore them for some length
of time).  Ideally, have the filter put the subject lines in an
e-mail to me, containing a CGI form with two choices:
  - trash all of them
  - send an individual-choice message
The individual choice message would let me select specific messages
to be kept.
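
The holding area might look something like this, as a sketch only.
The class and method names and the two-week default expiry are
assumptions, and generating the CGI form itself is left out; digest()
just produces the subject lines that would go into the notification
e-mail.

```python
import time
from dataclasses import dataclass, field


@dataclass
class HeldMessage:
    msg_id: str
    subject: str
    received: float  # seconds since the epoch


@dataclass
class Quarantine:
    """Hypothetical per-user holding area for presumed spam on the server."""
    max_age: float = 14 * 24 * 3600  # assumed: auto-trash after two weeks ignored
    held: dict = field(default_factory=dict)  # msg_id -> HeldMessage

    def hold(self, msg):
        self.held[msg.msg_id] = msg

    def digest(self):
        """Subject lines for the notification e-mail, oldest first."""
        return [m.subject for m in sorted(self.held.values(), key=lambda m: m.received)]

    def trash_all(self):
        """The 'trash all of them' choice."""
        self.held.clear()

    def keep(self, msg_ids):
        """The individual choice: release the selected messages, keep holding the rest."""
        return [self.held.pop(i) for i in msg_ids if i in self.held]

    def expire(self, now=None):
        """Trash anything ignored for longer than max_age; return how many."""
        now = time.time() if now is None else now
        old = [i for i, m in self.held.items() if now - m.received > self.max_age]
        for i in old:
            del self.held[i]
        return len(old)


q = Quarantine()
q.hold(HeldMessage("m1", "Lowest mortgage rates", time.time()))
q.hold(HeldMessage("m2", "Re: Ada tasking question", time.time()))
print(q.digest())          # subject lines for the notification e-mail
released = q.keep(["m2"])  # the user ticked one false positive
q.trash_all()              # and chose "trash all" for the rest
```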

> My preference is to filter on the URLs (and in some cases, phone numbers
> and snail mail addresses) that the spammers use for contacts.

But a Bayesian filter can do stats on that as well.
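
For instance, the tokenizer can turn contact points into tokens of
their own, so the filter learns how spammy a particular host or phone
number is alongside ordinary words.  This is a hedged Python sketch:
the url:/phone: prefixes and the deliberately loose regular
expressions (North American phone format only) are assumptions, not
part of any existing filter.

```python
import re

# Assumed, deliberately loose patterns; real mail would need more care.
URL_HOST = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)
PHONE = re.compile(r"\(?\b(\d{3})\)?[-. ]?(\d{3})[-. ](\d{4})\b")
WORD = re.compile(r"[A-Za-z0-9$'-]+")


def tokenize(text):
    """Split a message into tokens, adding contact details as tokens of their own.

    Hosts from URLs become 'url:<host>' and phone numbers become
    'phone:<digits>', so the Bayesian statistics cover the spammer's
    contact points just as they cover ordinary words.
    """
    tokens = ["url:" + host.lower() for host in URL_HOST.findall(text)]
    tokens += ["phone:" + "".join(groups) for groups in PHONE.findall(text)]
    tokens += [w.lower() for w in WORD.findall(text)]
    return tokens


print(tokenize("Call (800) 555-1234 or visit http://www.Cheap-Pills.example/buy now"))
```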



