After building up a catalog of 951 good messages and 388 spam messages (1339 in total, all classified by me, by hand), Bayes is getting quite good at categorizing my mail.
Here is how many of the tracked messages hit each BAYES_nn rule, first for spam and then for good mail:
neko-base> foreach j (BAYES_{01,10,20,30,40,50,60,70,80,90})
foreach? echo -n $j
foreach? egrep $j mail/track-spam|wc -l
foreach? end
BAYES_01 2
BAYES_10 0
BAYES_20 2
BAYES_30 6
BAYES_40 0
BAYES_50 0
BAYES_60 22
BAYES_70 32
BAYES_80 31
BAYES_90 88
neko-base> foreach j ( BAYES_{01,10,20,30,40,50,60,70,80,90} )
foreach? echo -n $j ; egrep $j mail/track-good|wc -l
foreach? end
BAYES_01 64
BAYES_10 12
BAYES_20 11
BAYES_30 8
BAYES_40 0
BAYES_50 0
BAYES_60 0
BAYES_70 0
BAYES_80 0
BAYES_90 0
neko-base> grep -c BAYES mail/track-*
mail/track-good:101
mail/track-spam:183
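The loops above scan the tracking file once per rule. The same tally can be done in a single pass. What follows is only a minimal POSIX sh sketch, not what was actually run here. The function name `bayes_tally` is made up for illustration, and it assumes, like the egrep above, that a line counts for a rule whenever the rule name appears anywhere on it.

```sh
# Hypothetical helper: count lines mentioning each BAYES_nn rule in one pass.
# Reads the named files, or standard input if none are given.
bayes_tally() {
    awk '
        BEGIN {
            # Same rule suffixes as the foreach loops in the transcript.
            n = split("01 10 20 30 40 50 60 70 80 90", b, " ")
        }
        {
            # A line is counted at most once per rule, like egrep | wc -l.
            for (i = 1; i <= n; i++)
                if (index($0, "BAYES_" b[i]))
                    count[i]++
        }
        END {
            # Print every rule, including those with no hits.
            for (i = 1; i <= n; i++)
                printf "BAYES_%s %d\n", b[i], count[i] + 0
        }
    ' "$@"
}
```

Since the prompt above is a csh-style shell, this would need to live in a small sh script or be run under sh, for example as `bayes_tally mail/track-spam`.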
Overall, here is the distribution of actual SpamAssassin scores, first for good mail and then for spam. (SpamAssassin was not run on all 1339 messages.)
neko-base> egrep "^X-Spam-Level:" mail/track-good | sort | uniq -dc ; egrep "^X-Spam-Level:" mail/track-good | wc -l
108 X-Spam-Level:
4 X-Spam-Level: *
6 X-Spam-Level: **
3 X-Spam-Level: ***
2 X-Spam-Level: ****
123
neko-base> egrep "^X-Spam-Level:" mail/track-spam | sort | uniq -dc ; egrep "^X-Spam-Level:" mail/track-spam | wc -l
3 X-Spam-Level:
3 X-Spam-Level: *
6 X-Spam-Level: **
9 X-Spam-Level: ***
6 X-Spam-Level: ****
13 X-Spam-Level: *****
5 X-Spam-Level: ******
17 X-Spam-Level: *******
7 X-Spam-Level: ********
16 X-Spam-Level: *********
8 X-Spam-Level: **********
7 X-Spam-Level: ***********
9 X-Spam-Level: ************
7 X-Spam-Level: *************
8 X-Spam-Level: **************
9 X-Spam-Level: ***************
8 X-Spam-Level: ****************
5 X-Spam-Level: *****************
7 X-Spam-Level: ******************
6 X-Spam-Level: *******************
3 X-Spam-Level: ********************
8 X-Spam-Level: **********************
2 X-Spam-Level: ***********************
2 X-Spam-Level: ************************
178
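One detail worth noting: `uniq -d` prints only lines that occur more than once, so any score level seen exactly once is left out of the listings above. That is presumably why the spam counts listed add up to 174 while `wc -l` reports 178. As a rough sketch of an alternative (again POSIX sh with awk, not what was run here, and with `spam_level_hist` being a made-up name), the histogram can be keyed on the number of stars so that single occurrences are kept:

```sh
# Hypothetical helper: histogram of X-Spam-Level headers by star count.
# Reads the named files, or standard input if none are given.
spam_level_hist() {
    awk '
        /^X-Spam-Level:/ {
            line = $0
            # gsub returns the number of replacements, i.e. the star count.
            n = gsub(/\*/, "", line)
            count[n]++
            if (n > max) max = n
            total++
        }
        END {
            # One line per level actually seen: "<stars> <messages>".
            for (i = 0; i <= max; i++)
                if (i in count)
                    printf "%d %d\n", i, count[i]
            printf "total %d\n", total + 0
        }
    ' "$@"
}
```

Run as `spam_level_hist mail/track-spam`, this would print one line per level that actually occurs, followed by the total number of headers seen.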