mydnsbl moving from investigation to testing

This is the latest draft of “mydnsbl,” which is a personal project I’ve been working on. It’s been about half work time and half personal time. As of now, the software seems to work OK, and the docs are pretty complete (for testing purposes anyway). I will soon be moving on to the more daunting task of trying to test my new DNSBL with actual user mail, if I can convince management that it’s safe.

“mydnsbl” is a script that reads syslog activity from a mail server, and creates a DNSBL based on the “bad” activity. The idea is that I want to keep track of the last 10 transactions from each IP, and if 9 of the last 10 transactions were user unknown, then that IP should go on a local DNSBL for something like 2 hours.

“Bad” activity in this case is considered to be user unknown, mailing to internal-only recipients, known spam, known virus, and basically anything that results in a failed transaction at your mailer for reasons not your fault :) This bad activity is offset by “good or neutral” activity, such as delivered OK, possible spam but not sure, and anything that results in an actual delivery.

Currently there are two pieces of the puzzle somewhat working.

The “myscanner” script tails a logfile where my mailservers send their syslog output. It takes multiple lines with the same transaction ID and puts the pieces back together, so that the output contains one line per transaction, telling the IP and the result. This is good for mailservers that output activity “as it happens” rather than one line per transaction. If you can convince your mailer to output (IP,result) on one line, you probably don’t need myscanner.

Currently “myscanner” only understands Barracuda output, but a similar framework could be used to make sendmail logs into transactional output. It is currently highly dependent on our specific output though. (There are similar programs or perl modules out there that produce summarized output for Sendmail and maybe others.)

Transaction ids are kept in a hash of hashes, which is pruned of old entries to keep it to a certain memory size. Statistics output at the end (or more frequently if you like) show how many log lines were from an “unknown” transaction and therefore could not be tracked to a specific IP.

myscanner has a lot of different keywords for different types of transactions to show or hide, different levels of detail, and whether to tail the end of
the log (x bytes in and forward) or to start at the beginning and run through to the end. I decided not to create a named pipe or do anything fancy with syslog itself. Myscanner detects when the file we are reading is no longer at the same name, and switches over to the newly created file, also rotating its own output file at the same time.

The default mode is “tally” which produces output suitable for “mycollect” (see below). Sample output looks like this:

var> ~gconnor/bin/myscanner
time=Jan 16 13:49:26, ip=, result=Recipient address rejected: Blocked
time=Jan 16 13:49:26, ip=, result=No such user:
time=Jan 16 13:49:26, ip=, result=Sender address rejected: Domain not found
time=Jan 16 13:49:26, ip=, result=Blocked: using;
time=Jan 16 13:49:28, ip=, result=Blocked: using;
time=Jan 16 13:49:28, ip=, result=Quarantined
time=Jan 16 13:49:28, ip=, result=Delivered
time=Jan 16 13:49:28, ip=, result=Blocked: spam
time=Jan 16 13:49:28, ip=, result=Blocked: spam
time=Jan 16 13:49:28, ip=, result=No such user:
From: Jan  3 12:27:22
  To: Jan  3 12:27:27
lines=1206 orphaned=281(23%) unmatched=0
transcount=242 cache=49 completed=193 incomplete=0(0%)

RBL activity is tracked so we can compare our BL to others in a dry run. RBL results don’t count as good or bad, they only affect stats output. After we go live we will probably stop logging RBL to our input file anyway (just “myscanner” instead of “myscanner show rbl”). If RBL output is omitted, our stats will be based on “activity not already blocked by an RBL” and will appear higher — though we probably can use other graphs to compare our live BL to the others.

Note that if your mail program already reports the result (Sent OK, Unknown user, spam, unknown domain, etc) on the same line as the IP, you probably don’t need myscanner; just alter mycollect to interpret the log format of your mailer.

mycollect keeps track of every IP seen so far in a hash, and with each IP it keeps the result of the most recent 10 transactions, where “result” is either bad, ok or wtf. If 9 of the last 10 transactions are “bad” then the IP is added to the internal “blocked list”. The current blocked list and its expire times are kept in memory, and dumped out to a disk file every 5 minutes. The output is just a list of IP addresses, so it will work with rbldnsd but I will eventually add a preamble and some formatting.

mycollect also keeps detailed statistics, which was its main job during the investigation phase. I wanted to get detailed info about how many IPs would be blocked, and how many messages that made it through would have been blocked.

Statistics are reported to STDERR at every 100,000 transactions (if you like) or when the program receives USR1 signal. Output looks like:

# kill -USR1 %1
From: Jan 16 00:46:07  To: Jan 16 02:48:48
total = 300000 (100%) (rbl = 65, ok = 1, bad = 32)
would block = 40027 (13%) (rbl = 0, ok = 0, bad = 12)
cache size = 18887, blocks size = 2609
usertime=329.18, systime=10.11, SZ:RSS=4940:4487
... (later) ...
From: Jan 16 00:46:07  To: Jan 16 06:53:16
total = 1000000 (100%) (rbl = 63, ok = 1, bad = 35)
would block = 176401 (17%) (rbl = 0, ok = 0, bad = 16)
cache size = 19346, blocks size = 2535
usertime=1374.61, systime=33.5, SZ:RSS=5344:4898

This indicates that after 300,000 transactions (about 2 hrs), 2609 IPs would be added to the blocked list, and 40,000 of those transactions would have been avoided, if the DNSBL had been really used. Later, after 1M transactions, we have 2535 entries on the BL, and would have blocked 176,000 transactions (17%).

It looks like most of the mail that would have been blocked would have resulted in “User unknown” or would have been caught by other tests anyway, but the real test will be to compare the messages that would have gotten through (ok) but are now stopped, to see if the “ok” number drops significantly.

Any feedback is appreciated. Right now it is pretty customized to my environment, but if you feel like playing with it and running your own output through it, please feel free. I would be interested to see what kind of numbers you come up with :)

Thanks for taking the time to read!

Leave a Reply