From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-2.9 required=5.0 tests=BAYES_00,MAILING_LIST_MULTI
	autolearn=unavailable autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,e5c972d04da95d51
X-Google-Attributes: gid103376,public
X-Google-ArrivalTime: 2003-04-23 12:33:34 PST
Path: 
 archiver1.google.com!news1.google.com!newsfeed.stanford.edu!nntp.cs.ubc.ca!freenix!enst.fr!not-for-mail
From: "Robert C. Leif" <rleif@rleif.com>
Newsgroups: comp.lang.ada
Subject: RE: If anybody wants to make something in Ada but do not know what
Date: Wed, 23 Apr 2003 12:32:03 -0700
Organization: ENST, France
Message-ID: <mailman.3.1051126366.13478.comp.lang.ada@ada.eu.org>
Reply-To: "comp.lang.ada mail to news gateway" <comp.lang.ada@ada.eu.org>
NNTP-Posting-Host: marvin.enst.fr
Mime-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
X-Trace: avanie.enst.fr 1051126375 22367 137.194.161.2 (23 Apr 2003 19:32:55
 GMT)
X-Complaints-To: usenet@enst.fr
NNTP-Posting-Date: Wed, 23 Apr 2003 19:32:55 +0000 (UTC)
To: "'comp.lang.ada mail to news gateway'" <comp.lang.ada@ada.eu.org>
Return-Path: <rleif@rleif.com>
X-Envelope-From: rleif@rleif.com
X-Envelope-To: <comp.lang.ada@ada.eu.org>
X-Mailer: Microsoft Outlook, Build 11.0.4920
In-Reply-To: <O06oa.5030$8g5.77428@news2.e.nsc.no>
Thread-Index: AcMJojBNootxKCBDRGePgd9PMsceBAADnD7w
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106
X-BeenThere: comp.lang.ada@ada.eu.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: comp.lang.ada mail to news gateway <comp.lang.ada.ada.eu.org>
List-Unsubscribe: <http://ada.eu.org/mailman/listinfo/comp.lang.ada>,
	<mailto:comp.lang.ada-request@ada.eu.org?subject=unsubscribe>
List-Post: <mailto:comp.lang.ada@ada.eu.org>
List-Help: <mailto:comp.lang.ada-request@ada.eu.org?subject=help>
List-Subscribe: <http://ada.eu.org/mailman/listinfo/comp.lang.ada>,
	<mailto:comp.lang.ada-request@ada.eu.org?subject=subscribe>
Xref: archiver1.google.com comp.lang.ada:36436
Date: 2003-04-23T12:32:03-07:00

Unfortunately, the utility of a spam filter depends to a very large
extent on the domain knowledge of its creators. Thus, it is a poor
choice for an Ada software engineering project. For instance, in
pattern recognition there is a well-known balance between false
positives and false negatives. As one goes up, the other goes down. One
approach that has worked for other projects is to order the messages by
the probability that they are spam, with high-probability spam at the
bottom of the list.
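
A minimal sketch of the ordering idea, in Python purely for
illustration. It assumes each message already carries a spam probability
from some classifier; the Message type, its field names, and the sample
values are hypothetical and not taken from any existing filter.

```python
from dataclasses import dataclass


@dataclass
class Message:
    subject: str
    # Assumed to come from some classifier, in the range 0.0 to 1.0.
    spam_probability: float


def order_by_spam_probability(messages):
    """Return the messages sorted so the most likely spam comes last."""
    return sorted(messages, key=lambda m: m.spam_probability)


# Hypothetical sample data.
inbox = [
    Message("Cheap pills", 0.97),
    Message("Re: Ada tasking question", 0.02),
    Message("Newsletter", 0.40),
]

ranked = order_by_spam_probability(inbox)
for m in ranked:
    print(f"{m.spam_probability:.2f}  {m.subject}")
```

Ordering instead of deleting sidesteps the choice of a single cut-off:
a false positive is merely pushed down the list, where the reader can
still find it.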

If anyone is serious about making a commercial product, look into XForms
development and the use of XForms with Ada. The present Ada screen
generators could be reused for this purpose. Since Microsoft appears to
NOT be following the W3C XForms standard, this offers a chance to
compete.
Bob Leif

-----Original Message-----
From: Tarjei T. Jensen [mailto:tarjei@online.no]
Sent: Friday, April 18, 2003 11:29 PM
To: comp.lang.ada@ada.eu.org

Randy Brukardt wrote:
> That might prevent passing spam, but it does nothing to avoid the
> overhead. The problem is in order to find out the strongest indicator,
> you have to score every 'word' in the message. When a lot of trash
> words are in the message, you have to allocate new words and new
> counters for them; and when there are a lot of such messages, the size
> of the DB grows rapidly. (We saw this happen in the search engine when
> we accidentally indexed some Unix .lib files.) That adds overhead; a
> lot of overhead for a filter like mine which gets invoked on each
> message individually. (Writing out the word list each time is
> expensive.)

Why not do it another way: check all URLs in the message. If any of them
point to a known porn/spam server, mark the message as suspect. Then do
some processing on what is left of the text.
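
The URL check can be sketched as follows, in Python for illustration
only. The host names in KNOWN_SPAM_HOSTS are made-up placeholders, and
the regular expression is a deliberately simple assumption about what
counts as a URL; a real filter would need a maintained blocklist and
more careful parsing.

```python
import re
from urllib.parse import urlsplit

# Hypothetical blocklist; a real one would be maintained externally.
KNOWN_SPAM_HOSTS = {"spam.example.com", "pills.example.net"}

# Simplified assumption: a URL starts with http:// or https:// and runs
# until whitespace, a quote, or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_hosts(body):
    """Collect the host name of every URL found in the message body."""
    hosts = set()
    for url in URL_PATTERN.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # Malformed URL; ignore it in this sketch.
        if host:
            hosts.add(host.rstrip(".").lower())
    return hosts


def is_suspect(body, blocklist=KNOWN_SPAM_HOSTS):
    """True if any URL in the body points at a blocklisted host."""
    return not extract_hosts(body).isdisjoint(blocklist)


def strip_urls(body):
    """Return the body with URLs removed, ready for further processing."""
    return URL_PATTERN.sub("", body)
```

Because the lookup is a set membership test per URL, it costs far less
than scoring every word, so it can run before any heavier processing.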

One could also obtain some sort of unique signature from the mail, then
compare that to other messages received. If a lot of messages have the
same signature, then they are likely to be spam. Known mailing lists
will of course be excluded. The only problem is to generate a signature
that is not trivial to evade. Preferably there should be a number of
signature algorithms to choose from, so that it becomes difficult to
optimize the mail for all of them, since the spammer can't know which
algorithm is used on any given day or hour. The algorithm would of
course be chosen arbitrarily at each site.
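
A rough sketch of the signature idea, in Python for illustration only.
The two signature functions, the BulkDetector class, and the threshold
of 3 are all hypothetical. Plain hashes like these are exactly the kind
of signature that is easy to evade; they only stand in for whatever
sturdier algorithms a site would actually use.

```python
import hashlib
import random
import re
from collections import Counter


def sig_whole_body(body):
    """Hash of the body with case and whitespace differences removed."""
    text = re.sub(r"\s+", " ", body.lower()).strip()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sig_sorted_words(body):
    """Hash of the sorted set of distinct words, ignoring word order."""
    words = sorted(set(re.findall(r"[a-z0-9]+", body.lower())))
    return hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()


# The pool of algorithms a site can pick from.
SIGNATURE_ALGORITHMS = [sig_whole_body, sig_sorted_words]


class BulkDetector:
    """Counts signatures and flags those seen at least `threshold` times."""

    def __init__(self, algorithm, threshold=3, trusted_senders=()):
        self.algorithm = algorithm
        self.threshold = threshold
        # Known mailing lists are excluded from counting.
        self.trusted_senders = set(trusted_senders)
        self.counts = Counter()

    def is_probably_bulk(self, sender, body):
        if sender in self.trusted_senders:
            return False
        sig = self.algorithm(body)
        self.counts[sig] += 1
        return self.counts[sig] >= self.threshold


# Each site picks its algorithm arbitrarily, so a sender cannot know
# which signature a given message will be matched against.
detector = BulkDetector(
    random.choice(SIGNATURE_ALGORITHMS),
    trusted_senders={"comp.lang.ada@ada.eu.org"},
)
```

Re-creating the detector on a schedule with a fresh random.choice would
give the per-day or per-hour rotation described above, at the cost of
discarding the counts gathered under the previous algorithm.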

greetings,