I have been thinking about ways to alter email programs (readers, servers, protocols) so that sending spam would be more difficult, and hopefully more expensive and less feasible for spammers. Yes, this is a tremendously difficult problem, but it is also a very important one, so much so that I am surprised it hasn’t been solved already.

A friend posted this link, which is a filtering system based on this idea. It’s a very interesting way to judge whether something is spam, based on parsing out all the words and tokens from two large collections of mail: one with spam, and one with your legitimate mail. It’s a cool idea because it is adaptive, and the criteria are set by each user (or each group that chooses to share info about good and bad messages).
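To make the two-collections idea concrete, here is a minimal sketch of token-based scoring. It is not the linked system's actual algorithm; it is a plain naive-Bayes-style score that assumes equal prior odds of spam and legitimate mail, and the function names (`tokenize`, `train`, `spam_probability`) and the sample messages are made up for illustration.

```python
import math
import re
from collections import Counter

# Assumption: a "token" is a run of letters, digits, and a few symbols.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")

def tokenize(text):
    """Split a message into lowercase tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]

def train(messages):
    """Count how many messages in a collection contain each token."""
    counts = Counter()
    for msg in messages:
        counts.update(set(tokenize(msg)))
    return counts

def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Combine per-token evidence into one probability (equal priors assumed)."""
    log_odds = 0.0
    for token in set(tokenize(text)):
        # Add-one smoothing keeps unseen tokens from producing zero probabilities.
        p_spam = (spam_counts[token] + 1) / (n_spam + 2)
        p_ham = (ham_counts[token] + 1) / (n_ham + 2)
        log_odds += math.log(p_spam / p_ham)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical training collections: one of spam, one of legitimate mail.
spam_mail = ["buy cheap pills now", "cheap loans click now"]
good_mail = ["lunch meeting at noon", "project notes attached"]
spam_counts, ham_counts = train(spam_mail), train(good_mail)

score = spam_probability("cheap pills", spam_counts, ham_counts,
                         len(spam_mail), len(good_mail))
```

Because the counts come from each user's own mail, two users can end up with different scores for the same message, which is the adaptive part.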

However, I still think that we need to start thinking about the next generation of mail servers and how they are going to behave. Email is such a great productivity tool, even with the spam, but it is quickly becoming harder to use (and more expensive) due to spammers.

How to improve the email infrastructure

I believe there are a number of problems with the current email infrastructure and that those flaws are being exploited by spammers. A lot of these factors are historical and may not even be required parts of the system anymore. All the factors add up to “it’s harder to catch spammers than to send spam”. More importantly, sending spam is profitable and cheap; catching spammers and filtering out spam is expensive and unrewarding.

Since this is such a difficult problem, I don’t believe that any single solution will solve it. But there may be several small solutions that are easier to implement than one difficult one.

Anyway, my emphasis here is on methods that are cheap and simple to implement, and which make it more expensive and infeasible for spammers to do business as usual.

1. Forgery is way too easy. Reporting spam back to the real admin is hard.

This has to do with the “trusting” nature of the early email servers and protocols. The idea was that any server could “relay” mail for any other server, on to its ultimate destination, and there was no mechanism for verifying the From: address and correlating it to the actual node that originated the transaction. You need a password to get your mail, but you don’t need a password to send mail, so you can freely send mail claiming to be from someone else. Also, most mail servers accept mail from any other node on the Internet, whether or not the machine actually starting the transaction is a bona fide representative of the domain represented in the From: address.

How do we fix this? Is it even possible to teach mail servers to only accept mail from other servers if the From: address correlates with the sending server?

Authentication of a message requires two parts: (1) is the message really from the sender named on the envelope, and (2) is the sender’s account still valid? There are complicated ways of issuing certificates and using public-key encryption to accomplish this; PGP is one such mechanism. But maybe there is a simpler way to get what we want. Really, what we want to know is: is this From: address really a valid address that we could send mail back to, and did this message really come from that address?

One way to do this might be to set a simple rule that says that servers that send mail out from a certain domain must be affiliated with that domain in a way that is hard to fake, such as publishing the server name in DNS. One brute-force way to do this is to make sure that the MX records (listing the servers authorized to receive mail sent TO my domain) also include all possible sending servers for that domain, so that mail FROM my domain also comes from one of the MX machines. Even if a server doesn’t accept email on port 25, it is probably harmless to add it to the MX list. (Most servers already do a DNS lookup to verify that the domain really exists.) A slightly more elegant variation on this would be to publish some other type of record (similar to MX) that identifies authorized senders. In either case, sending would have to be authorized by the actual owner of the domain. There has to be a trust relationship between the domain owner and the server owner, and if that trust is broken by either, no messages can be sent from that domain using that server.
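Here is a rough sketch of the check a receiving server might run under this rule. To keep it self-contained it does no real DNS lookups: the two dictionaries are hypothetical stand-ins for what the domain owner would publish (MX records in the brute-force variant, or a new record type), and the domain, hostnames, and addresses are reserved example values, not real servers.

```python
# Hypothetical stand-in for DNS: domain -> hostnames the owner has published
# as authorized senders (assumption; a real server would query DNS here).
PUBLISHED_SENDERS = {
    "example.com": {"mail1.example.com", "mail2.example.com"},
}

# Hypothetical stand-in for address lookups: hostname -> IP addresses.
HOST_ADDRESSES = {
    "mail1.example.com": {"192.0.2.10"},
    "mail2.example.com": {"192.0.2.11"},
}

def domain_of(address):
    """Return the lowercase domain part of an email address."""
    return address.rsplit("@", 1)[-1].lower()

def is_authorized_sender(from_address, connecting_ip,
                         senders=PUBLISHED_SENDERS, addresses=HOST_ADDRESSES):
    """True if the connecting IP belongs to a host published for the From: domain."""
    domain = domain_of(from_address)
    for host in senders.get(domain, ()):
        if connecting_ip in addresses.get(host, ()):
            return True
    # Unknown domain, or a machine the domain owner never listed.
    return False

accepted = is_authorized_sender("alice@example.com", "192.0.2.10")
rejected = is_authorized_sender("alice@example.com", "198.51.100.7")
```

Note that this only checks the domain part of the address. If the domain owner removes a server from the published list, mail from that server stops passing the check, which is the trust relationship described above.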

This is sort of a “first pass” - you have authenticated just the domain part, which at least proves that the domain is for real and that someone affiliated with that domain probably sent the message. But we haven’t authenticated the full address, and any AOL.COM user could still send mail out claiming to be some other AOL.COM user. This would be up to the server administrator to solve. In a perfect world, the sending server would require the actual account holder to authenticate himself with his password before accepting the message. We haven’t solved this problem, but by reducing the number of servers authorized to send “from aol.com” mail, we have turned an impossible problem into a somewhat tractable one.

But maybe you don’t need to authenticate the whole email address. Even if the sender’s system doesn’t authenticate the account in each message, you can still easily report misbehavior back to the person responsible for the domain. It would then be up to the domain holder or the ISP powering the server to track it back to the real sender some other way (IP, dialup logs, etc.). Most mail servers already check the originating IP against a list of “known good local” IPs, so if you kick the spammer off your network, they will not be able to send mail from that domain anymore. It is to the ISP’s advantage to authenticate the message to an account when sending, but even if they don’t do this, they are still held responsible for the behavior of the server and the users behind it.

In a more perfect world, we would also be able to check the user credentials sometime later to see if the user account was cancelled after the message was sent. This is more difficult and actually less important anyway, since you can easily revoke someone’s sending credentials at the sending server, and if this has been done, it becomes much less important to flush out already-sent messages. Besides, if you are able to verify whether a sending address is valid, spammers would then use this against you, to clean up their lists, and to guess and try to find more valid addresses. So, you don’t really want to serve out information like “Is address xyz@xyz.com valid?” What you really want to serve out is the answer to “Message 12345 claims to be from xyz@xyz.com, is this correct?” If you have this ability, you can check messages after the fact and maybe purge spam from the system after it has arrived, but before it is read by the receiver. But again, this is less important and more difficult, so it is probably not a “core” solution - we should save this one for use after we have gotten rid of 99% of our spam by other easier means.
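As a sketch of the difference between the two questions, a sending server could keep a log of what it actually sent and answer only the per-message question. Everything here is hypothetical: the class name `SentMessageLog`, its methods, and the idea that messages carry an ID the receiver can quote back are assumptions for illustration, not an existing protocol.

```python
class SentMessageLog:
    """Hypothetical record a sending server keeps of its outgoing messages."""

    def __init__(self):
        self._sent = {}        # message ID -> address that actually sent it
        self._revoked = set()  # addresses whose sending credentials were pulled

    def record(self, message_id, from_address):
        """Called when the server accepts a message for delivery."""
        self._sent[message_id] = from_address.lower()

    def revoke(self, from_address):
        """Called when an account is cancelled for abuse."""
        self._revoked.add(from_address.lower())

    def confirm(self, message_id, claimed_from):
        """Answer: 'message X claims to be from Y, is this correct?'"""
        sender = self._sent.get(message_id)
        if sender is None or sender != claimed_from.lower():
            # Same answer for "no such message" and "wrong sender", so a
            # spammer cannot use this query to test whether an address exists.
            return False
        return sender not in self._revoked

log = SentMessageLog()
log.record("12345", "xyz@xyz.com")
genuine = log.confirm("12345", "xyz@xyz.com")
guessed = log.confirm("99999", "xyz@xyz.com")
```

After `log.revoke("xyz@xyz.com")`, the same query returns False, which is what a receiver would need in order to purge already-delivered spam before it is read.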

2. Getting a dialup account is easy. Tracing back to a real person is hard.

There are thousands of new users getting on the Internet every day. How can a busy ISP keep up with them all, and weed out the spammers, and still expect to make some kind of profit? Spammers sign up for new accounts all the time, and the ISP cancels them as soon as they are abused, but they can just turn around and sign up again. You just need a credit card, and the ISP can’t afford to spend time verifying your mailing address, phone number, other ID, etc. Even worse, once you are identified as an abuser, you can come back to the same ISP again and again with a new credit card (or even the same card?) and make up phony names or whatever. If the ISP gets wise and learns your credit card number, you can go to the next ISP on the list.

ISPs are prevented from sharing information with each other, to protect the privacy of their users. Unfortunately this also protects the spammers, because ISPs can’t compare notes on who the spammers actually are.

What if ISPs could report back to a central authority (such as a credit reporting agency, or the bank itself) when an account purchased with a certain card is abused? Using a credit reporting agency is extra expense for the ISP accepting the signups, but this would be more thorough, since any abuse on any card tracked back to a user (or his SSN) could be flagged and reported even if they use a new card. In an ideal world, your online behavior could be tracked and correlated to you just like your credit and financial history - perhaps someone with a low rating would only be allowed to send mail messages to other people who had emailed him first, or be limited to 5 or 10 messages per day, or something.

This problem is more expensive to solve, so perhaps there is another, simpler idea that makes this less important. The next section might be a better starting point.

3. Sending in bulk is easy.

Once you have a dialup account, you can send any number of messages (possibly forged, but see #1). You are limited only by the bandwidth of your connection. Any node on the Internet that can be used to browse the web (port 80) can also be used to send mail to other mail servers (port 25). At first glance, the ISP can’t really tell that you’re sending out spam and not browsing the web. Setting up alarms and triggers to analyze who is sending mail messages versus who is just clicking on web pages is expensive, time-consuming, and not very accurate or helpful.

One thing ISPs can do (and I believe some already do) is to prevent users from connecting to other servers on port 25. This is an easy router change. It forces any messages going out from that ISP to go through the legitimate email servers for that ISP. This makes it harder for legitimate users to run their own mail servers (such as DSL users who have their own Unix machines and want to run their own domains), but they can still do this if they set up their sending server to always relay to the ISP’s server (or just ask for an exception to the port-25 policy, if they are already a trusted customer).

Forcing mail to go through certain servers makes it easier to enforce other rules, such as limiting their sending rate to 10 messages a day, or something appropriate like that. This is another thing that some ISPs will do and others won’t, but it is in their best interests to do so. The idea is to keep the spammers from coming back, and the only way to do that is to make it expensive on a per-message basis to send spam through their network. If you can pay $19.95 for the account and send 1 million messages, and get 100 to 1,000 sales, that might be worth it, but if that ISP limits you to only sending 100 or 1,000 messages before you get caught, you will move on to the next ISP. ISPs that do this will spend less on abuse complaints, and those that don’t will get picked on more and more. If this keeps up, it becomes infeasible for the ISP to do business the old-fashioned way, and it becomes infeasible for spammers to spam compared to the cost of other legit advertising.
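Once all outgoing mail passes through the ISP's own servers, a per-account daily limit is simple to enforce. The following is a minimal sketch, assuming the server can identify the sending account for each message; the class name `DailyQuota`, the account name, and the date are hypothetical, and the default of 10 is just the figure used above.

```python
from collections import defaultdict
from datetime import date

class DailyQuota:
    """Hypothetical per-account cap on messages accepted per calendar day."""

    def __init__(self, limit=10):
        self.limit = limit
        self._counts = defaultdict(int)  # (account, day) -> messages accepted

    def allow(self, account, today=None):
        """Return True and count the message if the account is under its limit."""
        day = today or date.today()
        key = (account, day)
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True

quota = DailyQuota(limit=2)
day_one = date(2003, 1, 1)
results = [quota.allow("dialup-user", day_one) for _ in range(3)]
```

With the numbers above, a $19.95 account that sends 1 million messages costs the spammer about $0.00002 per message; capped at 1,000 messages, the same account costs about $0.02 per message. A trusted bulk sender could simply be given a larger `limit`.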

There are some senders who will need to send bulk mail for legitimate reasons. For example, amazon.com sends thousands of messages saying “Thank you for your order”. But they are not dialup users, and we assume that their ISP has an arrangement with them to behave in an ethical fashion. If they are not hiding their identity, and if they are willing to take complaints themselves and resolve them without effort by the ISP, they can be allowed a greater margin of freedom.

I had some other ideas here but I am not going to write about them right now. Perhaps later. Here are some notes.

4. Establishing trust is hard.

Compare email sending for a domain to registering SSL certs. Discuss whether we could use the exact same type of certs to verify a domain or a server as a valid sender. Is this an issue or a useful tool? Anyone can register a new domain, and many domains have phony contact info. ISPs should require the contact info to be real before serving up the domain. Registrars should make it harder for someone to register a new domain for the purposes of spamming.

5. There is always a bad apple. How do you define “spam”?

What to do if someone successfully authenticates and their mail is not forged, but it is still offensive to the recipients? Each ISP or each bandwidth provider should be able to enforce sensible guidelines. Even if forging a From: address is difficult, this still doesn’t prevent someone from sending offensive messages from that domain, it just means that it’s easier to track back to them when they do. The industry should agree on what are appropriate guidelines for mail usage. How do we do this? Discuss the role of spam reporting services in publishing “bad apples” which many users have complained about.

I’m interested in anyone’s thoughts, if you are also interested in email server administration and reducing spam. Please give me feedback. Also, are there any communities, mailing lists, or other forums that people are already using to discuss topics like these? I would appreciate any references.