comp.lang.ada
From: Dmitry A. Kazakov <mailbox@dmitry-kazakov.de>
Subject: Re: Current "Swen" worm attack - a tip
Date: Wed, 24 Sep 2003 12:08:01 +0200
Message-ID: <lgq2nvkklls9n8jd5ps661h8osd5qkrgia@4ax.com>
In-Reply-To: WD%bb.785$RW4.309@newsread4.news.pas.earthlink.net

On Tue, 23 Sep 2003 17:44:22 GMT, Jeffrey Carter <spam@spam.com>
wrote:

>Preben Randhol wrote:
>> 
>> I have found that Bayesian filtering is very good once you have
>> taught it what is spam and what is not. It takes a bit of effort in the
>> beginning, but now I get about 40-50 spams a day and I have some 5-7
>> mailing lists and it filters everything into the correct folders for me.
>> Sometimes a spam ends up in the wrong place, but then it is simple (for
>> me) to press a key and it is relearnt as spam and moved into that folder.
>> 
>> I have heard talk that the naive Bayesian statistical methods used could
>> be improved and other statistical methods might do better, however there
>> has not been an implementation yet. So if anybody here knows statistics
>> it is a nice chance to make a killer spam filter :-)
>
>I've long felt that a neural network should be able to learn to 
>distinguish spam from real mail very accurately.

It won't.

>The problem is figuring 
>out a good way to represent a mail message to the network.

Right. It is a well-known problem of machine learning. To apply any
learning technique, you have to have features. These features have to
be good, very good. For example, the feature "number of repetitions
of a given word in a text" is a very bad feature if the spammer generates
messages randomly from a big dictionary.
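
To make the word-repetition example concrete, here is a minimal Python
sketch. Everything in it is an assumption made up for illustration: the
toy 50,000-word dictionary, the helper names word_counts and random_spam,
and the trigger word "free". It only shows that the count of a given word
is high in a fixed spam template but close to zero in a message whose
words are drawn at random from a large dictionary.

```python
import random
from collections import Counter

def word_counts(text):
    """Return how many times each lowercase word occurs in text."""
    return Counter(text.lower().split())

def random_spam(dictionary, length, seed):
    """Hypothetical spammer: draws each word uniformly from a big dictionary."""
    rng = random.Random(seed)
    return " ".join(rng.choice(dictionary) for _ in range(length))

# Assumed toy dictionary: 50,000 distinct made-up words, one of them "free".
dictionary = [f"w{i}" for i in range(50_000)]
dictionary[0] = "free"

# A fixed template repeats its trigger word; the feature sees it easily.
fixed_spam = "get it free now free offer totally free"
# A randomly generated message of 200 words rarely repeats any given word.
generated_spam = random_spam(dictionary, length=200, seed=1)

fixed_count = word_counts(fixed_spam)["free"]
generated_count = word_counts(generated_spam)["free"]
print(fixed_count, generated_count)
```

With 200 draws from 50,000 words, the expected count of any single word
is 200/50000 = 0.004, so the feature carries almost no information about
the generated message.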

But features that appear good to us humans may be bad for the chosen
method. For example, most statistical methods require
statistically independent features. It is easy to build a feature
space where well-distinguishable classes will never be separated by a
neural network, etc.
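
A small illustration of such a feature space, as a sketch only: it uses a
single-layer perceptron (the simplest neural network, swapped in here for
brevity, not a claim about networks in general) on the classic XOR
problem. The function train_perceptron and the added product feature are
assumptions chosen for this example. The two classes are trivially
distinguishable to a human, yet the perceptron cannot separate them on
the raw features, while one extra feature makes the same method work.

```python
def train_perceptron(samples, epochs=1000):
    """Train a single-layer perceptron with the classic update rule.

    Returns True if some epoch classifies every sample correctly,
    False if that never happens within the given number of epochs.
    """
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for features, label in samples:
            activation = bias + sum(w * x for w, x in zip(weights, features))
            predicted = 1 if activation > 0 else 0
            delta = label - predicted  # -1, 0 or +1
            if delta != 0:
                errors += 1
                bias += delta
                weights = [w + delta * x for w, x in zip(weights, features)]
        if errors == 0:
            return True
    return False

# XOR on two binary features: not linearly separable.
xor_raw = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
# Same classes, plus the product a*b as a third feature: linearly separable.
xor_with_product = [((a, b, a * b), y) for (a, b), y in xor_raw]

raw_separable = train_perceptron(xor_raw)
product_separable = train_perceptron(xor_with_product)
print(raw_separable, product_separable)
```

The data and the learning method are identical in both runs; only the
choice of features decides whether learning succeeds.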

>I haven't had 
>much success on that, but once you have that, training the network is 
>simple.

Once you have good features. Surely.

BTW, it looks like it is over. Since yesterday I have received no more
spam (of this kind). Is that because MS is closing those chats?

---
Regards,
Dmitry Kazakov
www.dmitry-kazakov.de




Thread overview: 31+ messages
2003-09-22  3:05 Current "Swen" worm attack Alexander Kopilovitch
2003-09-22 10:27 ` Stephane Richard
2003-09-22 11:45   ` chris
2003-09-23  3:49     ` Wes Groleau
2003-09-22 11:49   ` Preben Randhol
2003-09-22 21:42     ` Randy Brukardt
2003-09-23  7:10       ` Preben Randhol
2003-09-23  7:35       ` Vinzent Hoefler
2003-09-23  0:39     ` Alexander Kopilovitch
2003-09-23  4:11       ` David Marceau
2003-09-23 11:08         ` Jeff C,
2003-09-23 15:41           ` Ludovic Brenta
2003-09-24  1:14             ` Jeff C,
2003-09-24  8:20             ` Martin Krischik
2003-09-25 10:10               ` Ludovic Brenta
2003-09-25 11:01                 ` Martin Krischik
2003-09-25 11:32                 ` Preben Randhol
2003-09-25 12:07                   ` Ludovic Brenta
2003-09-25 13:47                 ` Stephen Leake
2003-09-23 18:47         ` Randy Brukardt
2003-09-23 20:56         ` Berend de Boer
     [not found]       ` <3F6FA78D.3070708@myob.com>
2003-10-03 13:41         ` sk
2003-10-03 14:17           ` Preben Randhol
2003-09-23  3:44   ` Current "Swen" worm attack - a tip Wes Groleau
2003-09-23  7:33     ` Preben Randhol
2003-09-23 17:44       ` Jeffrey Carter
2003-09-23 18:00         ` Brian Catlin
2003-09-23 19:14           ` tmoran
2003-09-23 20:55         ` Berend de Boer
2003-09-24 10:08         ` Dmitry A. Kazakov [this message]
2003-09-24 21:50           ` Wes Groleau