comp.lang.ada
From: "Robert C. Leif" <rleif@rleif.com>
To: "'comp.lang.ada mail to news gateway'" <comp.lang.ada@ada.eu.org>
Subject: RE: If anybody wants to make something in Ada but do not know what
Date: Wed, 23 Apr 2003 12:32:03 -0700
Message-ID: <mailman.3.1051126366.13478.comp.lang.ada@ada.eu.org>
In-Reply-To: <O06oa.5030$8g5.77428@news2.e.nsc.no>

Unfortunately, the usefulness of a spam filter depends to a very large extent
on the domain knowledge of its creators. Thus, it is a poor choice for an
Ada software engineering project. For instance, in pattern recognition there
is a well-known trade-off between false positives and false negatives: as one
goes up, the other goes down. One approach that has worked for other
projects is to order the messages by the probability that they are spam,
with the high-probability spam at the bottom of the list.
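
A minimal sketch of that ordering, assuming each message already carries a spam probability from some upstream classifier. The `Message` type, the `order_for_display` helper, and the sample scores are hypothetical and only illustrate the idea; Python is used purely for brevity.

```python
from dataclasses import dataclass


@dataclass
class Message:
    subject: str
    spam_probability: float  # assumed to come from some upstream classifier


def order_for_display(messages):
    """Return messages with the least likely spam first and the most likely last."""
    return sorted(messages, key=lambda m: m.spam_probability)


# Made-up sample data for illustration only.
inbox = [
    Message("Cheap pills", 0.97),
    Message("Ada tasking question", 0.02),
    Message("Conference announcement", 0.40),
]

ordered = order_for_display(inbox)
for m in ordered:
    print(f"{m.spam_probability:.2f}  {m.subject}")
```

Nothing is discarded here, so a false positive costs only a lower position in the list rather than a lost message.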

If anyone is serious about making a commercial product, look into XForms
development and the use of XForms with Ada. The present Ada screen
generators could be reused for this purpose. Since Microsoft appears NOT to
be following the W3C XForms standard, this offers a chance to compete.
Bob Leif

-----Original Message-----
From: Tarjei T. Jensen [mailto:tarjei@online.no] 
Sent: Friday, April 18, 2003 11:29 PM
To: comp.lang.ada@ada.eu.org

Randy Brukardt wrote:
> That might prevent passing spam, but it does nothing to avoid the
> overhead. The problem is that, in order to find the strongest indicator,
> you have to score every 'word' in the message. When a lot of trash words
> are in the message, you have to allocate new words and new counters for
> them; and when there are a lot of such messages, the size of the DB
> grows rapidly. (We saw this happen in the search engine when we
> accidentally indexed some Unix .lib files.) That adds overhead; a lot
> of overhead for a filter like mine which gets invoked on each message
> individually. (Writing out the word list each time is expensive.)

Why not do it another way: check all URLs in the message. If any of them
point to a known porn/spam server, mark the message as suspect. Then do some
processing on what is left of the text.
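
A rough sketch of the URL check, assuming a locally maintained set of known spam hosts. The host names, the regular expression, and the function names are all hypothetical; a real filter would also need to handle obfuscated links, redirects, and HTML markup.

```python
import re
from urllib.parse import urlparse

# Hypothetical blocklist; in practice this would be maintained separately.
KNOWN_SPAM_HOSTS = {"spam.example.com", "pills.example.net"}

# Deliberately simple pattern: http(s) URLs up to the next space, quote or bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_hosts(body):
    """Return the set of lower-cased host names found in URLs in the body."""
    hosts = set()
    for url in URL_PATTERN.findall(body):
        try:
            host = urlparse(url).hostname
        except ValueError:
            continue  # malformed URL, ignore it in this sketch
        if host:
            hosts.add(host.rstrip("."))
    return hosts


def is_suspect(body):
    """True if any URL in the body points at a known spam host."""
    return not extract_hosts(body).isdisjoint(KNOWN_SPAM_HOSTS)
```

Only messages that pass this cheap check would then go on to the more expensive text processing.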

One could also derive some sort of unique signature from the mail and
compare it to those of other messages received. If a lot of messages have
the same signature, then they are likely to be spam. Known mailing lists
would of course be excluded. The only problem is to generate a signature
that is not trivial to evade. Preferably there should be a number of
signature algorithms to choose from, so that it becomes difficult to
optimize the mail for all of them, since the spammer can't know which
algorithm is used on any given day or hour. The algorithm would of course be
chosen arbitrarily at each site.
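
The signature idea might be sketched as follows. The two signature functions, the `BulkDetector` class, and the threshold are illustrative assumptions, not a proposal from this thread, and both hashes shown are easy to evade, which is exactly the difficulty noted above.

```python
import hashlib
import re
from collections import Counter


def sig_normalized_text(body):
    """Hash of the body with case and whitespace differences removed."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def sig_sorted_words(body):
    """Hash of the sorted set of distinct words, ignoring order and repeats."""
    words = sorted(set(re.findall(r"[a-z0-9]+", body.lower())))
    return hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()


# Each site picks one of these; the spammer cannot know which.
ALGORITHMS = {
    "normalized_text": sig_normalized_text,
    "sorted_words": sig_sorted_words,
}


class BulkDetector:
    def __init__(self, algorithm, threshold, excluded_senders=()):
        self.signature = ALGORITHMS[algorithm]
        self.threshold = threshold
        self.excluded = set(excluded_senders)  # e.g. known mailing lists
        self.counts = Counter()

    def check(self, sender, body):
        """Record the message; True once its signature count reaches the threshold."""
        if sender in self.excluded:
            return False
        sig = self.signature(body)
        self.counts[sig] += 1
        return self.counts[sig] >= self.threshold
```

A site would construct one detector, for example `BulkDetector("sorted_words", threshold=20)`, and could switch the algorithm name from time to time.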

greetings,