Spamcalc TODO list

Last modified on 1 April 2002.

The todo list is long, as this is only version 0.5 of the script.
For now it contains just the items, without detailed descriptions, so that
the script can be released asap :)

* More, and more accurate, penalty values are needed.
This item will never be removed from the todo list.
The lists of words, regexps and domains will always need perfecting.
So don't hesitate to comment on the penalty values and/or send me additional
lists of spamwords (in any language). Check the file 'feedback' for extra
information on this.

* Error reporting and stuff like that
This is my first Perl script ever. Therefore, I did not use OO or any
special functions, simply because I don't know them. There is probably an
amazing function for parsing the command line or for reporting errors, but I
just have no clue about that :)
Tell me if you know of any improvements.
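
One possible direction, as a minimal sketch only: the core module Getopt::Long
can parse the command line, and die/warn can report errors. The option names
(--verbose, --threshold) and the parse_args helper are hypothetical and are
not taken from the current script.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse a list of arguments into a hash of options.
# The options shown here are assumptions made for illustration.
sub parse_args {
    my @args = @_;
    my %opt = (verbose => 0, threshold => 100);

    # GetOptionsFromArray removes the options it recognises from @args
    # and returns false (after printing a warning) on a bad option.
    GetOptionsFromArray(
        \@args,
        'verbose!'    => \$opt{verbose},
        'threshold=i' => \$opt{threshold},
    ) or die "usage: [--verbose] [--threshold N] hostname ...\n";

    # Whatever is left over is treated as the hostnames to score.
    die "error: no hostname given\n" unless @args;
    $opt{hosts} = [@args];
    return \%opt;
}

my $opt = parse_args('--threshold', 50, 'cr.yp.to');
print "threshold: $opt->{threshold}\n";
print "hosts: @{ $opt->{hosts} }\n";
```

In the real script the call would be parse_args(@ARGV). Wrapping the parsing
in a function keeps it easy to test without touching the real command line.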

* The algorithms need improvement and fine-tuning.
At the moment, only 3 algorithms are used to determine the spam score. There
are several more that could be (and will be) implemented:
 * Consider .co.uk and .com.au as 1 field, not as 2
 * Check for sequential words (i.am, is.my)
 * Check for rAnDom CaPS
 * Check for certain words at the beginning (i, you, he, what)
 * Check for multiple-words-in.one-word-of-the.host
 * Check for l33t1sms
 * Add penalty for total length of hostname (strlen())
 * Calculate word penalties using trigrams
 * Check for spam like lemona.de or cr.yp.to
 * Check for repetition of fields (h4r.h4r.h4r.net)
 * Support negative spamvalues that span more than 1 field
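
A few of the checks listed above could be sketched roughly as follows. This
is not the script's current code: all function names, the short suffix list
and the thresholds are assumptions made for illustration, and the penalty
values would still need tuning.

```perl
use strict;
use warnings;

# Assumed two-part suffixes; a real list would be much longer.
my %TWO_PART_TLD = map { $_ => 1 } qw(co.uk com.au);

# Split a hostname into fields, keeping e.g. "co.uk" as 1 field.
sub split_fields {
    my ($host) = @_;
    my @f = split /\./, $host;
    if (@f >= 2 && $TWO_PART_TLD{ lc "$f[-2].$f[-1]" }) {
        my $tld = join '.', splice(@f, -2);
        push @f, $tld;
    }
    return @f;
}

# Count fields that repeat an earlier field.
sub repeated_fields {
    my %seen;
    my $repeats = 0;
    for my $field (@_) {
        $repeats++ if $seen{ lc $field }++;
    }
    return $repeats;
}

# Heuristic for rAnDom CaPS: at least two lower-to-upper transitions.
sub has_random_caps {
    my ($field) = @_;
    my $flips = () = $field =~ /[a-z][A-Z]/g;
    return $flips >= 2;
}

# Map common l33t digits back to letters before looking up words.
sub unleet {
    my ($field) = @_;
    (my $plain = lc $field) =~ tr/013457/oieast/;
    return $plain;
}

# Penalty for every character beyond a free length (values are guesses).
sub length_penalty {
    my ($host, $free, $per_char) = @_;
    my $extra = length($host) - $free;
    return $extra > 0 ? $extra * $per_char : 0;
}

my @fields = split_fields('h4r.h4r.h4r.co.uk');
print "fields: @fields\n";                         # h4r h4r h4r co.uk
print "repeats: ", repeated_fields(@fields), "\n"; # 2
print "plain: ", unleet('l33t1sms'), "\n";         # leetisms
```

Each check is a small function that returns a count or a true/false value, so
the penalty attached to it can be tuned separately from the detection itself.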

* The Perl script needs improvement
 * Merge the Read{Word,Regexp,Domains}File(s) from 6 functions into 2
 * Think about what to do when a spamword is seen twice in the words files
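
Both items could be handled by one generic reader plus a thin wrapper for
multiple files, roughly as sketched here. The file format ("entry penalty"
per line, with '#' comment lines) is an assumption, as are the function names
read_list and read_list_files. Keeping the higher penalty for a duplicate is
just one possible policy.

```perl
use strict;
use warnings;

# Read one list from an open filehandle into a hash of entry => penalty.
# Assumed format: "entry penalty" per line; lines starting with '#' and
# blank lines are skipped. The last integer on the line is the penalty.
sub read_list {
    my ($fh, $name, $list) = @_;
    $list ||= {};
    while (my $line = <$fh>) {
        next if $line =~ /^\s*(?:#|$)/;
        my ($entry, $penalty) = $line =~ /^\s*(\S.*?)\s+(-?\d+)\s*$/;
        unless (defined $entry) {
            warn "$name line $.: cannot parse line, skipped\n";
            next;
        }
        if (exists $list->{$entry}) {
            # One possible policy for duplicates: keep the higher penalty.
            warn "$name line $.: duplicate entry $entry\n";
            $penalty = $list->{$entry} if $list->{$entry} > $penalty;
        }
        $list->{$entry} = $penalty;
    }
    return $list;
}

# Read any number of files of the same kind into one combined list.
sub read_list_files {
    my @paths = @_;
    my %list;
    for my $path (@paths) {
        open my $fh, '<', $path or die "Cannot open $path: $!\n";
        read_list($fh, $path, \%list);
        close $fh;
    }
    return \%list;
}

# Demo on an in-memory file, so no real list file is needed.
my $sample = "# spamwords\nfree 10\nwarez 40\nfree 25\n";
open my $sample_fh, '<', \$sample or die "Cannot open sample: $!\n";
my $words = read_list($sample_fh, 'sample');
print "$_ => $words->{$_}\n" for sort keys %$words;
```

The same two functions would serve words, regexps and domains alike, as long
as the three kinds of file share one line format.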

If you have any additions, or have coded some Perl for one of the items on
the TODO list, don't hesitate to contact me at joost@carnique.nl.