From: Dmitry A. Kazakov
Newsgroups: comp.lang.ada
Subject: Re: Current "Swen" worm attack - a tip
Date: Wed, 24 Sep 2003 12:08:01 +0200

On Tue, 23 Sep 2003 17:44:22 GMT, Jeffrey Carter wrote:

>Preben Randhol wrote:
>>
>> I have found that Bayesian filtering is very good when you have
>> taught it what is spam and what is not. It takes a bit of effort in the
>> beginning, but now I get about 40-50 spams a day and I have some 5-7
>> mailing lists and it filters all for me into correct folders. Sometimes a
>> spam ends in the wrong place, but then it is simple (for me) to press a
>> key and it is relearnt as spam and moved into that folder.
>>
>> I have heard talk that the naive Bayesian statistical methods used could
>> be improved and other statistical methods might do better, however there
>> has not been an implementation yet. So if anybody here knows statistics
>> it is a nice chance to make a killer spam filter :-)
>
>I've long felt that a neural network should be able to learn to
>distinguish spam from real mail very accurately.

It won't.
>The problem is figuring
>out a good way to represent a mail message to the network.

Right. It is a well-known problem of machine learning. To apply any
learning technique, you have to have features. These features have to
be good, very good. For example, the feature "number of repetitions of
a given word in a text" is a very bad feature if the spammer generates
messages randomly from a big dictionary. But features that appear good
to us humans may be bad for the chosen method. For example, most
statistical methods require statistically independent features. It is
easy to build a feature space where well-distinguishable classes will
never be separated by a neural network, etc.

>I haven't had
>much success on that, but once you have that, training the network is
>simple.

Once you have good features. Surely.

BTW, it looks like it is over. Since yesterday I have been receiving no
more spam (of this kind). Is that because MS is closing those chats?

---
Regards,
Dmitry Kazakov
www.dmitry-kazakov.de
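
As a rough illustration of the independence remark above, here is a minimal Python sketch (not from the thread) of how a naive Bayes score becomes overconfident when the same evidence is counted twice. The function name and all probabilities are made up for the example.

```python
def naive_bayes_posterior(prior_spam, likelihoods_spam, likelihoods_ham):
    """Return P(spam | features), treating every feature as independent.

    likelihoods_spam[i] is the assumed P(feature_i | spam),
    likelihoods_ham[i] is the assumed P(feature_i | ham).
    """
    spam_score = prior_spam
    ham_score = 1.0 - prior_spam
    for p_spam, p_ham in zip(likelihoods_spam, likelihoods_ham):
        spam_score *= p_spam
        ham_score *= p_ham
    # Normalise so the two class scores sum to one.
    return spam_score / (spam_score + ham_score)


# One word seen in the message: hypothetical likelihoods 0.8 (spam), 0.4 (ham).
single = naive_bayes_posterior(0.5, [0.8], [0.4])

# The same word entered as two "features" (perfectly dependent).
# No new information was added, yet the score moves further towards spam.
duplicated = naive_bayes_posterior(0.5, [0.8, 0.8], [0.4, 0.4])

print(round(single, 3), round(duplicated, 3))
```

With these numbers the posterior rises from about 0.667 to 0.8, although the second feature carries nothing the first did not. Real feature sets are rarely this extreme, but strongly correlated features push the score in the same direction.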