Posted for willowisp and callicrates but others might be interested too.
Here is how I have SpamAssassin set up at work and at home. I wanted a system where SpamAssassin runs mostly by itself, but where I can correct it when it learns something wrong.
1. Edit .procmailrc to contain the following:
    # UNALTERED BACKUP JUST IN CASE
    :0c:
    Incoming.backup.$MMYY

    # SPAMASSASSIN FILTERING (only if size < 256 k)
    :0fw: spamassassin.lock
    * < 256000
    | /usr/bin/spamassassin

    # Second copy to track the autolearn
    # Classified as learned-spam, learned-not-spam, unlearned
    :0c:
    * ^X-Spam-Status:.*autolearn=spam
    Spam/track-learned-spam

    :0c:
    * ^X-Spam-Status:.*autolearn=ham
    Spam/track-learned-not-spam

    :0c:
    * ^X-Spam-Status:.*autolearn=no
    Spam/track-unlearned

    # Tag not found means unlearned
    :0c:
    * ! ^X-Spam-Status:.*autolearn
    Spam/track-unlearned

    # Decide on the spam level you want to auto-reject
    # Leave commented until you are sure spam tagging is working right
    # (the asterisks must be escaped, or the pattern matches every message)
    #:0
    #* ^X-Spam-Level: \*\*\*\*\*\*\*
    #/dev/null
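To check the tracking recipes before trusting them with real mail, it can help to see which folder a saved message would land in. The following is only a rough sketch in Bourne shell, not part of the original setup; the function name track_folder is made up for illustration. It reads one message on stdin, joins a folded X-Spam-Status header the way procmail does, and prints the folder name. Unlike procmail it matches case-sensitively, and it treats every value other than autolearn=spam and autolearn=ham as unlearned.

```shell
# track_folder: hypothetical helper, reads one mail message on stdin and
# prints the tracking folder the autolearn recipes would pick for it.
track_folder() {
    awk '
        /^$/      { exit }                    # blank line ends the headers
        /^[ \t]/  { if (in_status) status = status $0; next }  # folded line
                  { in_status = 0 }
        /^X-Spam-Status:/ { in_status = 1; status = $0 }
        END {
            if (status ~ /autolearn=spam/)
                print "Spam/track-learned-spam"
            else if (status ~ /autolearn=ham/)
                print "Spam/track-learned-not-spam"
            else
                print "Spam/track-unlearned"  # autolearn=no or no tag at all
        }'
}
```

Feeding it a message saved from your mail client (track_folder < saved-message) should print the same folder that procmail copied the message into.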
2. Enable auto learning in ~/.spamassassin/user_prefs
    # Enable the Bayes system
    use_bayes 1

    # Enable Bayes auto-learning
    auto_learn 1

    # Alter the thresholds for auto-learning a bit
    # (site default is 0.1-12.0 unlearned)
    bayes_auto_learn_threshold_nonspam 2.0
    bayes_auto_learn_threshold_spam 10.0

    # Just so that the BAYES_ header will appear in every message
    # default score of 0 means the header doesn't appear at all
    score BAYES_40 0.001
    score BAYES_44 0.001
    score BAYES_50 0.001
    score BAYES_56 0.001

    # Downgrade anything that is not English
    ok_languages en
    ok_locales en
3. Set up a crontab (mine, not root's) to learn the missed messages once a day. After learning, the script tacks the messages onto the appropriate learned folder and empties the missed folders. (The folders must be plain unix mailboxes for this to work, not .mbx.)
    > vi /udir/gconnor/crontab.gconnor
    05 03 * * * /udir/gconnor/bin/spamlearner
    :wq
    > crontab /udir/gconnor/crontab.gconnor

    > vi /udir/gconnor/bin/spamlearner
    #!/bin/csh
    sa-learn --ham --mbox mail/Spam/missed-not-spam
    cat mail/Spam/missed-not-spam >> mail/Spam/track-learned-not-spam && cp /dev/null mail/Spam/missed-not-spam
    sa-learn --spam --mbox mail/Spam/missed-spam
    cat mail/Spam/missed-spam >> mail/Spam/track-learned-spam && cp /dev/null mail/Spam/missed-spam
    sa-learn --rebuild
    :wq
    > chmod +x /udir/gconnor/bin/spamlearner

    > /udir/gconnor/bin/spamlearner
    Learned from 0 message(s) (0 message(s) examined).
    Learned from 0 message(s) (0 message(s) examined).
After this runs correctly for a few days, you can change the crontab entry to send stdout to /dev/null (append "> /dev/null" to the command).
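One thing the spamlearner script does not guard against is sa-learn failing partway: the missed folder would still be archived and emptied. Here is a possible variation, written as a Bourne shell function rather than csh, that only archives when learning succeeded and skips empty folders. Treat it as a sketch: the function name learn_and_archive and the SA_LEARN variable are my own inventions (SA_LEARN just lets you swap in a dummy command for a dry run); only the sa-learn options already used in the script are assumed.

```shell
# learn_and_archive KIND MISSED LEARNED   (hypothetical helper)
#   KIND     "ham" or "spam", passed to sa-learn as --ham / --spam
#   MISSED   mbox of messages waiting to be learned
#   LEARNED  mbox the messages are appended to afterwards
learn_and_archive() {
    kind=$1
    missed=$2
    learned=$3

    # Nothing to do if the missed folder is absent or empty.
    [ -s "$missed" ] || return 0

    # If learning fails, stop and leave the missed folder untouched.
    # SA_LEARN is an assumed override for testing; default is sa-learn.
    "${SA_LEARN:-sa-learn}" "--$kind" --mbox "$missed" || return 1

    # Append to the learned folder, then truncate the missed folder.
    cat "$missed" >> "$learned" && : > "$missed"
}
```

A cron script built on this would call it once with ham, mail/Spam/missed-not-spam and mail/Spam/track-learned-not-spam, once with spam, mail/Spam/missed-spam and mail/Spam/track-learned-spam, and then run sa-learn --rebuild as before.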
4. If spamassassin learns something incorrectly, remove it from the “learned” folder and place it in the “missed” folder.
Incorrectly learned as good:
Spam/track-learned-not-spam -> Spam/missed-spam
Incorrectly learned as spam:
Spam/track-learned-spam -> Spam/missed-not-spam
5. Periodically look at the “unlearned” folder and place stuff in missed-spam or missed-not-spam to be learned.
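To decide when the folders need attention, a quick message count per folder is handy. This is a small sketch of my own, not part of the setup above; the function names mbox_count and spam_folder_summary are made up. It assumes plain unix mailboxes and simply counts the "From " separator lines, so an unescaped "From " at the start of a body line would inflate the count.

```shell
# mbox_count FILE: count messages by counting "From " separator lines.
# Prints 0 for a missing or empty folder.
mbox_count() {
    if [ -f "$1" ]; then
        grep -c '^From ' "$1" || true   # grep exits 1 on zero matches
    else
        echo 0
    fi
}

# spam_folder_summary DIR: print "count folder" for each folder used above.
spam_folder_summary() {
    dir=$1
    for name in track-unlearned track-learned-spam track-learned-not-spam \
                missed-spam missed-not-spam; do
        printf '%s %s\n' "$(mbox_count "$dir/$name")" "$name"
    done
}
```

Running spam_folder_summary mail/Spam from the home directory lists the five folders with their message counts.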
This system is designed for one person. The Bayes system seems to work better when it has a good sample of both good mail and spam. Also, if it is used for multiple people, each of them will be able to view everyone else’s good mail (unless you only provide access to the missed folders). You can make the folders appear for multiple people by symlinking them into their mail directories and making sure the original is writable by the correct group.
I also have a recipe for creating “imap shared folders” in case the symlinks don’t work right. Ask via comment if you want it :)