From: "Robert C. Leif"
Newsgroups: comp.lang.ada
Subject: RE: If anybody wants to make something in Ada but do not know what
Date: Wed, 23 Apr 2003 12:32:03 -0700

Unfortunately, the utility of a spam filter depends to a very large extent on the domain knowledge of its creators. Thus, it is a poor choice for an Ada software engineering project. For instance, in pattern recognition there is a well-known balance between false positives and false negatives. As one goes up, the other goes down.
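A minimal sketch of that balance, in Python rather than Ada for brevity. It assumes each message already carries a spam probability from some classifier. The function name `error_counts`, the `SAMPLE` scores, and the thresholds are all hypothetical, invented only to show how moving the cut-off trades one kind of error for the other.

```python
def error_counts(scored, threshold):
    """Return (false_positives, false_negatives) for one cut-off.

    scored: list of (spam_probability, is_spam) pairs.
    A message is flagged as spam when its probability >= threshold.
    """
    # Legitimate mail that was flagged as spam.
    false_pos = sum(1 for p, is_spam in scored if p >= threshold and not is_spam)
    # Spam that slipped through unflagged.
    false_neg = sum(1 for p, is_spam in scored if p < threshold and is_spam)
    return false_pos, false_neg


# Hypothetical scores, invented for illustration only.
SAMPLE = [
    (0.05, False), (0.20, False), (0.45, False), (0.70, False),
    (0.30, True), (0.60, True), (0.85, True), (0.95, True),
]

if __name__ == "__main__":
    for t in (0.25, 0.50, 0.75):
        fp, fn = error_counts(SAMPLE, t)
        print(f"threshold={t:.2f}  false positives={fp}  false negatives={fn}")
```

On this made-up data, raising the threshold lowers the false positives and raises the false negatives, so no single cut-off removes both.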
One approach that has worked for other projects is to order the messages by the probability that they are spam, with high-probability spam at the bottom of the list.

If anyone is serious about making a commercial product, look into XForms development and the use of XForms with Ada. The present Ada screen generators could be reused for this purpose. Since Microsoft appears NOT to be following the W3C XForms standard, this offers a chance to compete.

Bob Leif

-----Original Message-----
From: Tarjei T. Jensen [mailto:tarjei@online.no]
Sent: Friday, April 18, 2003 11:29 PM
To: comp.lang.ada@ada.eu.org

Randy Brukardt wrote:
> That might prevent passing spam, but it does nothing to avoid the
> overhead. The problem is in order to find out the strongest indicator,
> you have to score every 'word' in the message. When a lot of trash words
> are in the message, you have to allocate new words and new counters for
> them; and when there are a lot of such messages, the size of the DB
> grows rapidly. (We saw this happen in the search engine when we
> accidentally indexed some Unix .lib files.) That adds overhead; a lot
> of overhead for a filter like mine which gets invoked on each message
> individually. (Writing out the word list each time is expensive.)

Why not do it another way: check all URLs in the message. If they point to a known porn/spam server, mark the message as suspect. Then do some processing on what is left of the text.

One could also obtain some sort of unique signature from the mail, then compare that to other messages received. If a lot of messages have the same signature, then they are likely to be spam. Known mailing lists will of course be excluded. The only problem is to generate a signature that is not trivial to evade.
Preferably there should be a number of signature algorithms to choose from, so that it becomes difficult to optimize the mail for all of them, since the spammer can't know which algorithm is used on any given day or hour. The algorithm would of course be chosen arbitrarily at each site.

greetings,
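
The signature idea in the message above could be sketched roughly as follows, in Python rather than Ada for brevity. Everything here is an assumption made for illustration: the names `signature` and `suspects`, the `min_copies` cut-off, and above all the normalization (lower-casing and dropping everything but letters), which is exactly the kind of signature that is trivial to evade. Excluding known mailing lists and rotating between several algorithms are left out.

```python
import hashlib
import re
from collections import Counter


def signature(body: str) -> str:
    """Hash a normalized copy of the message body.

    Assumed normalization: lower-case the text and collapse every run of
    non-letters (digits, punctuation, whitespace) into a single space.
    """
    text = re.sub(r"[^a-z]+", " ", body.lower()).strip()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def suspects(bodies, min_copies=3):
    """Return the bodies whose signature occurs at least min_copies times."""
    counts = Counter(signature(b) for b in bodies)
    return [b for b in bodies if counts[signature(b)] >= min_copies]


if __name__ == "__main__":
    # Hypothetical messages, invented for illustration only.
    mail = [
        "Buy now!!! Offer 1234",
        "buy NOW... offer 9876",
        "Buy now - offer 5",
        "Meeting moved to 3 pm",
    ]
    for body in suspects(mail):
        print("suspect:", body)
```

A site could keep several such `signature` functions and pick one arbitrarily, which is the point of the last paragraph of the message.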