I have been thinking about ways to alter email programs (readers, servers, protocols) so that sending spam would be more difficult, and hopefully more expensive and less feasible for spammers. Yes, this is a tremendously difficult problem, but it is also a very important one, so much so that I am surprised it hasn’t been solved already.
A friend posted this link, which describes a filtering system based on this idea. It’s a very interesting way to judge whether something is spam by parsing out all the words and tokens from two large collections of mail: one of spam, and one of your legitimate mail. It’s a cool idea because it is adaptive, and the criteria are set by each user (or each group that chooses to share info about good and bad messages).
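To make the two-collection idea concrete, here is a minimal sketch of a token-based scorer. It is not the linked system's actual algorithm; it is a plain naive-Bayes-style calculation with add-one smoothing, and the function names, the tokenizer pattern, and the tiny sample messages are all made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: words, digits, and a few characters spam likes.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'!-]+")

def tokenize(text):
    return [t.lower() for t in TOKEN_RE.findall(text)]

def count_tokens(messages):
    """Count how many messages in a collection contain each token."""
    counts = Counter()
    for msg in messages:
        counts.update(set(tokenize(msg)))
    return counts

def spam_score(message, spam_counts, good_counts, n_spam, n_good):
    """Return a score between 0 and 1; higher means more spam-like."""
    log_odds = math.log(n_spam / n_good)  # prior odds from collection sizes
    for token in set(tokenize(message)):
        # Add-one smoothing so an unseen token does not zero out the score.
        p_spam = (spam_counts[token] + 1) / (n_spam + 2)
        p_good = (good_counts[token] + 1) / (n_good + 2)
        log_odds += math.log(p_spam / p_good)
    log_odds = max(-50.0, min(50.0, log_odds))  # keep exp() in range
    return 1 / (1 + math.exp(-log_odds))

# Toy collections; a real filter would need thousands of messages each.
spam = ["Buy cheap pills now", "cheap loans, act now"]
good = ["Lunch on Friday?", "Notes from the sysadmin meeting"]
spam_counts, good_counts = count_tokens(spam), count_tokens(good)

spammy = spam_score("cheap pills", spam_counts, good_counts, len(spam), len(good))
normal = spam_score("meeting on Friday", spam_counts, good_counts, len(spam), len(good))
```

Because the counts come from the user's own mail, the same code adapts to each user simply by being fed different collections.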
However, I still think that we need to start thinking about the next generation of mail servers and how they are going to behave. Email is such a great productivity tool, even with the spam, but it is quickly becoming harder to use (and more expensive) due to spammers.
How to improve the email infrastructure
I believe there are a number of problems with the current email infrastructure and that those flaws are being exploited by spammers. A lot of these factors are historical and may not even be required parts of the system anymore. All the factors add up to “it’s harder to catch spammers than to send spam”. More importantly, sending spam is profitable and cheap; catching spammers and filtering out spam is expensive and unrewarding.
Since this is such a difficult problem, I don’t believe that any single solution will solve it. But, there may be several small solutions that are easier to implement than one difficult one.
Anyway, my emphasis here is on methods that are cheap and simple to implement, and which make it more expensive and infeasible for spammers to do business as usual.
1. Forgery is way too easy. Reporting spam back to the real admin is hard.
This has to do with the “trusting” nature of the early email servers and protocols. The idea was that any server could “relay” mail for any other server, on to its ultimate destination, and there was no mechanism for verifying the From: address and correlating it to the actual node which originated the transaction. You need a password to get your mail, but you don’t need a password to send mail, so you can freely send mail claiming to be from someone else. Also, most mail servers accept mail from any other node on the Internet, whether or not the machine actually starting the transaction is a bona fide representative of the domain represented in the From: address.
How do we fix this? Is it even possible to teach mail servers to only accept mail from other servers if the From: address correlates with the sending server?
Authentication of a message requires two parts: 1. is the message really from the sender named on the envelope, and 2. is the sender’s account still valid. There are complicated ways of issuing certificates and using split-key encryption to accomplish this; PGP is one such mechanism. But maybe there is a simpler way to get what we want. Really, what we want to know is, is this From: address really a valid address that we could send mail back to, and did this message really come from that address.
One way to do this might be to set a simple rule that servers that send mail out from a certain domain must be affiliated with that domain in a way that is hard to fake, such as publishing the server name in DNS. One brute-force way to do this is to make sure that the MX records (listing the servers authorized to receive mail sent TO my domain) also include all possible sending servers for that domain, so that mail FROM my domain always comes from one of the MX machines. Even if a server doesn’t accept email on port 25, it is probably harmless to add it to the MX list. (Most servers already do a DNS lookup to verify that the domain really exists.) A slightly more elegant variation on this would be to publish some other type of record (similar to MX) that identifies authorized senders. In either case, sending would have to be authorized by the actual owner of the domain. There has to be a trust relationship between the domain owner and the server owner, and if that trust is broken by either, no messages can be sent from that domain using that server.
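The receiving side of this rule could be sketched roughly as follows. To keep the example self-contained, a plain dictionary stands in for the DNS lookup; the table contents, host names, and function names are hypothetical. It also assumes the receiving server already knows the real name of the connecting host (for example from its IP address), which this sketch does not verify.

```python
# Stand-in for DNS: domain -> hosts the domain owner has published as
# authorized to send mail for it. In practice this would be a lookup of
# the MX list or of a new MX-like record; this table is hypothetical.
AUTHORIZED_SENDERS = {
    "example.org": {"mail1.example.org", "mail2.example.org"},
}

def domain_of(address):
    """Return the lowercased domain part of an email address."""
    return address.rsplit("@", 1)[-1].lower()

def sender_is_authorized(from_address, sending_host, published=AUTHORIZED_SENDERS):
    """True if sending_host is published by the owner of the From: domain."""
    allowed = published.get(domain_of(from_address), set())
    return sending_host.lower() in allowed

# A server listed by the domain owner passes; a random dialup host does not.
ok = sender_is_authorized("alice@example.org", "mail1.example.org")
forged = sender_is_authorized("alice@example.org", "dialup-123.example.net")
```

Note that a domain that publishes nothing fails the check in this sketch; a real rollout would need some policy for domains that have not adopted the scheme yet.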
This is sort of a “first pass” – you have authenticated just the domain part, which at least proves that the domain is for real and that someone affiliated with that domain probably sent the message. But we haven’t authenticated the full address, and any AOL.COM user could still send mail out claiming to be some other AOL.COM user. This would be up to the server administrator to solve. In a perfect world, the sending server would require the actual account holder to authenticate himself with his password before accepting the message. We haven’t solved this problem, but by reducing the number of servers authorized to send “from aol.com” mail, we have turned an impossible problem into a somewhat tractable one.
But maybe you don’t need to authenticate the whole email address. Even if the sender’s system doesn’t authenticate the account in each message, you can still easily report misbehavior back to the person responsible for the domain. It would then be up to the domain holder or the ISP powering the server to track it back to the real sender some other way (IP, dialup logs, etc.). Most mail servers already check the originating IP against a list of “known good local” IPs, so if you kick the spammer off your network, they will not be able to send mail from that domain anymore. It is to the ISP’s advantage to authenticate the message to an account when sending, but even if they don’t do this, they are still held responsible for the behavior of the server and the users behind it.
In a more perfect world, we would also be able to check the user credentials sometime later to see if the user account was cancelled after the message was sent. This is more difficult and actually less important anyway, since you can easily revoke someone’s sending credentials at the sending server, and if this has been done, it becomes much less important to flush out already-sent messages. Besides, if you are able to verify whether a sending address is valid, spammers would then use this against you, to clean up their lists, and to guess and try to find more valid addresses. So, you don’t really want to serve out information like “Is address firstname.lastname@example.org valid?” What you really want to serve out is the answer to “Message 12345 claims to be from email@example.com, is this correct?” If you have this ability, you can check messages after the fact and maybe purge spam from the system after it has arrived, but before it is read by the receiver. But again, this is less important and more difficult, so it is probably not a “core” solution – we should save this one for use after we have gotten rid of 99% of our spam by other easier means.
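The difference between the two questions can be shown with a small sketch. The sending server keeps a log of which address sent which message and answers only yes or no about a specific message, so a "no" does not reveal whether the address exists. The class and method names are invented for illustration, and a real service would also need to authenticate who is asking and limit how fast they can ask.

```python
class SentLog:
    """Record kept by a sending server of which address sent which message."""

    def __init__(self):
        self._sender_by_id = {}   # message id -> address that sent it
        self._revoked = set()     # addresses whose accounts were cancelled

    def record(self, message_id, from_address):
        self._sender_by_id[message_id] = from_address.lower()

    def revoke(self, from_address):
        """Mark an account as cancelled, for example after abuse."""
        self._revoked.add(from_address.lower())

    def confirm(self, message_id, claimed_from):
        """Answer 'did this message come from this address?' with yes or no."""
        claimed = claimed_from.lower()
        sender = self._sender_by_id.get(message_id)
        # Unknown message, wrong address, and cancelled account all give the
        # same answer, so the reply cannot be used to test addresses.
        return sender == claimed and claimed not in self._revoked

log = SentLog()
log.record("12345", "alice@example.org")
genuine = log.confirm("12345", "alice@example.org")
forged = log.confirm("12345", "bob@example.org")
```

Calling `log.revoke("alice@example.org")` later makes `confirm` return False for message 12345 as well, which is what would allow purging already-delivered spam.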
2. Getting a dialup account is easy. Tracing back to a real person is hard.
There are thousands of new users getting on the Internet every day. How can a busy ISP keep up with them all, and weed out the spammers, and still expect to make some kind of profit? Spammers sign up for new accounts all the time, and the ISP cancels them as soon as they are abused, but they can just turn around and sign up again. You just need a credit card, and the ISP can’t afford to spend time verifying your mailing address, phone number, other ID, etc. Even worse, once you are identified as an abuser, you can come back to the same ISP again and again with a new credit card (or even the same card?) and make up phony names or whatever. If the ISP gets wise and learns your credit card number, you can go to the next ISP on the list.
ISPs are prevented from sharing information with each other, to protect the privacy of their users. Unfortunately this also protects the spammers, because the ISP can’t compare notes on who the spammers actually are.
What if ISPs could report back to a central authority (such as a credit reporting agency, or the bank itself) when an account purchased with a certain card is abused? Using a credit reporting agency is extra expense for the ISP accepting the signups, but this would be more thorough, since any abuse on any card tracked back to a user (or his SSN) could be flagged and reported even if they use a new card. In an ideal world, your online behavior could be tracked and correlated to you just like your credit and financial history – perhaps someone with a low rating would only be allowed to send mail messages to other people who had emailed him first, or be limited to 5 or 10 messages per day, or something.
This problem is more expensive to solve, so perhaps there is another, simpler idea that makes this less important. The next section might be a better starting point.
3. Sending in bulk is easy.
Once you have a dialup account, you can send any number of messages (possibly forged, but see #1). You are limited only by the bandwidth of your connection. Any node on the Internet that can be used to browse the web (port 80) can also be used to send mail to other mail servers (port 25). At first glance, the ISP can’t really tell that you’re sending out spam and not browsing the web. Setting up alarms and triggers to analyze who is sending mail messages versus who is just clicking on web pages is expensive, time-consuming, and not quite accurate or helpful.
One thing ISPs can do (and I believe some already do) is to prevent users from connecting to other servers on their port 25. This is an easy router change. This forces any messages going out from that ISP to go through the legitimate email servers for that ISP. This makes it harder for legitimate users to run their own mail servers (such as DSL users who have their own Unix machines and want to run their own domains) but they can still do this if they set up their sending server to always relay to the ISP’s server (or just ask for an exception to the port-25 policy, if they are already a trusted customer).
Forcing mail to go through certain servers makes it easier to enforce other rules, such as limiting their sending rate to 10 messages a day, or something appropriate like that. This is another thing that some ISPs will do and others won’t, but it is in their best interests to do so. The idea is to keep the spammers from coming back, and the only way to do that is to make it expensive on a per-message basis to send spam through their network. If you can pay $19.95 for the account and send 1 million messages, and get 100 to 1,000 sales, that might be worth it, but if that ISP limits you to only sending 100 or 1,000 messages before you get caught, you will move on to the next ISP. ISPs that do this will spend less on abuse complaints, and those that don’t will get picked on more and more. If this keeps up, it becomes infeasible for the ISP to do business the old-fashioned way, and it becomes infeasible for spammers to spam compared to the cost of other legit advertising.
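The arithmetic behind this argument, using the figures above ($19.95 per account, 1 million messages, 100 sales per million as the low end), works out as in the sketch below. The function names are made up, and the assumption that the response rate stays the same under a cap is mine.

```python
def cost_per_message(account_fee, messages_sent):
    """Spammer's cost per message for one throwaway account."""
    return account_fee / messages_sent

def expected_sales(messages_sent, response_rate):
    """Sales expected before the account is shut down."""
    return messages_sent * response_rate

fee = 19.95
rate = 100 / 1_000_000            # low end: 100 sales per million messages

unlimited = cost_per_message(fee, 1_000_000)   # about $0.00002 per message
capped = cost_per_message(fee, 1_000)          # about $0.02 per message

sales_unlimited = expected_sales(1_000_000, rate)  # about 100 sales per account
sales_capped = expected_sales(1_000, rate)         # about 0.1 sales per account
```

With a cap of 1,000 messages the per-message cost rises by a factor of 1,000, and at the low-end response rate the spammer would need roughly ten accounts (about $200) to make a single sale.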
There are some senders who will need to send bulk mail for legitimate reasons. For example, amazon.com sends thousands of messages saying “Thank you for your order”. But, they are not dialup, and we assume that their ISP has an arrangement with them to behave in an ethical fashion. If they are not hiding their identity, and if they are willing to take complaints themselves and resolve them without effort by the ISP, they can be allowed a greater margin of freedom.
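A per-account daily limit with exemptions for known bulk senders could be sketched like this, assuming all outgoing mail already passes through the ISP's own servers. The class name, the default of 10 per day, and the account names are illustrative only.

```python
from collections import defaultdict
from datetime import date

class DailyQuota:
    """Per-account daily sending limit enforced at the ISP's mail server."""

    def __init__(self, default_limit=10, exemptions=None):
        self.default_limit = default_limit
        # account -> higher limit, for senders with an arrangement
        self.exemptions = dict(exemptions or {})
        self._sent = defaultdict(int)   # (account, day) -> messages accepted

    def allow(self, account, today=None):
        """Return True and count the message if the account is under its limit."""
        today = today or date.today()
        limit = self.exemptions.get(account, self.default_limit)
        key = (account, today)
        if self._sent[key] >= limit:
            return False
        self._sent[key] += 1
        return True

quota = DailyQuota(default_limit=3, exemptions={"orders@bulk.example": 1000})
day = date(2002, 9, 1)
dialup = [quota.allow("newuser@isp.example", day) for _ in range(4)]
bulk = all(quota.allow("orders@bulk.example", day) for _ in range(500))
```

Here the fourth message from the ordinary account is refused, while the exempted sender's 500 messages all go through.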
I had some other ideas here but I am not going to write about them right now. Perhaps later. Here are some notes.
4. Establishing trust is hard.
Compare email sending for a domain to registering SSL certs.
Discuss whether we could use the exact same type of certs to verify a domain or a server as a valid sender.
Is this an issue or a useful tool?
Anyone can register a new domain, and many domains have phony contact info.
ISPs should require the contact info to be real before serving up the domain.
Registrars should make it harder for someone to register a new domain for the purposes of spamming.
5. There is always a bad apple. How do you define “spam”?
What to do if someone successfully authenticates and their mail is not forged, but it is still offensive to the recipients?
Each ISP or each bandwidth provider should be able to enforce sensible guidelines.
Even if forging a From: address is difficult, this still doesn’t prevent someone from sending offensive messages from that domain, it just means that it’s easier to track back to them when they do.
The industry should agree on what are appropriate guidelines for mail usage. How do we do this?
Discuss the role of spam reporting services in publishing “bad apples” which many users have complained about.
I’m interested in anyone’s thoughts, if you are also interested in email server administration and reducing spam. Please give me feedback. Also, are there any communities, mailing lists, or other forums that people are already using to discuss topics like these? I’m interested in any references to forums that host discussions on topics like this one.
The idea of using a neural net or whatever AI technique is whizzy this month to filter out spam has been around for quite a while; I’m not sure why it hasn’t gotten more usage. (Possibly because the inventiveness of spammers is endless, so the filter would always need more training, and people want something that just makes spam disappear without them having to do anything?)
The problem with making everyone behave and not do anything that makes life easy for spammers is that it requires everyone to spend money (following procedures properly is always more expensive than following them in a half-assed way or not at all), which businesses are loath to do, so you end up trying to legislate what constitutes aiding and abetting spammers and then you have the Office of Homeland E-Mail and it’s just all bad. (Admittedly, I am innately distrustful of all proposals that boil down to “Privacy and anonymity on the part of any citizen should be abolished in the public interest” but I don’t think I am unreasonable for being so.)
Observe that paper mail is pretty spammy, and has been for decades, but is still a useful tool, mostly because people are good at discarding junk mail. I suspect that once people become used to email (as opposed to the current situation where 90% of all users have just bought the Internet from AOL and have no idea what’s going on) spam will become less of a problem.
Hm. Before people realized that the problem with the Internet was going to be all the losers with $19.95 a month to spend on annoying everyone else, there was much discussion for how you’d inevitably need your own personal AI agents to filter the huge piles of content for what you really wanted. But we don’t have personal AI agents, even though the MIT Media Lab keeps talking about them; we have search engines.
And are there not spam-filtering services available for pretty much anyone to use? Admittedly, you have to pay, but that’s because a service that took all the spam out of someone’s incoming email stream and then added in advertisements to pay for the server farm would get a lot of the hairy eyeball.
I’m not sure I’m reaching any useful conclusion here, except that I don’t think we can conclude that the spam issue is as dire as it’s often made out to be, because the situation is still very much in flux.
Privacy vs. impunity
I agree that preserving privacy and personal freedom is important. However, I don’t think anything I proposed is contrary to that. Perhaps the topic touched on something else you were already thinking about, because I don’t think your point about privacy and anonymity and new legislation is directly related to what I said. However, I will take the opportunity to ramble about privacy and to what extent spammers are entitled to it…
Take example 1, where I decide to sign up for an email account with the name “firstname.lastname@example.org” and hide my real identity. That’s between me and my ISP, and they will probably not reveal my real identity, so as long as I don’t break any laws I should have some level of anonymity.
Now take example 2, where I sign up for an Internet account just to abuse their resources and I want to keep sending as long as possible before they find me and close me down. The abusive messages I send out are still anonymous, in that they don’t point back to my real identity, and even if I abuse the service and get kicked off, the ISP will probably protect my privacy anyway since they don’t want to get sued. But, if I want to send out forged messages as well, I have another layer of anonymity applied, not to protect my real identity, but to mask my online presence as well, so that the complaints about my abuse will go to email@example.com instead of firstname.lastname@example.org and I get to keep spamming for a few hours longer. I don’t think you can argue that I have legal rights to the additional layer of anonymity, any more than you can argue that it’s an intrusion on my right to privacy to ask me to leave the store after harassing other customers. In most cases, the spammers abuse the hell out of their contract with the ISP, and the ISP kicks them out, but they still haven’t broken any laws, so the ISP is obliged to protect their privacy. (If they are suspects in a real criminal investigation, the judge may warrant a search of their email, but I assume many of the same rules apply as for a search of a home.)
Anyway, I don’t think we’re really at odds here… I wasn’t really proposing that we should impose strict Moral Codes on people or ISPs, or that we should require ISPs to spend large amounts of money, etc. I’m arguing quite the opposite. I’m saying that 1. passing new laws doesn’t help, that just requires more people to work harder at tracking spammers, and 2. imposing new requirements on ISPs doesn’t work either, unless we make it valuable in terms of savings elsewhere (like less work for the complaint department).
Bottom line for me is, we really need to focus on “simple” solutions, that don’t require added expense for ISPs and users, but make it more expensive for spammers to operate. Things are weighted totally in the other direction right now, so spammers continue to win. I’m looking for the economic angle…
Re: Privacy vs. impunity
Yah, I started reading Whole Wide World by Paul J McAuley the other day, but you can probably achieve the same effect by reading the stuff on Slashdot about the EU requiring all ISPs to keep all email for a year just in case law enforcement decides to want it. (Why yes, McAuley is a British writer and WWW is set in a near-future UK, why do you ask?)
So yes, I was already primed to be very dubious about any proposal other than “filter your own damn incoming mail”, because WWW is pretty dystopic in that British “we haven’t had a functional economy since WWII and we’re kind of used to it” way.
I’d say that anonymizing or forging return headers on email should be treated equivalently to putting no or false return addresses on paper mail, except that I don’t actually know how that’s legally regarded. I would guess that omitting a return address is your own problem but using a false one is within hailing distance of mail fraud, but I’m speaking entirely ex burro.
I’m not sure what can be done to make spamming more expensive without also making real email more expensive. Write some Open Source tools to automatically discard email from an account that sends more than 3/minute or whatever, I guess, with the exemptions for eg Amazon you mentioned. (But what does newamazon.com do in its first month or quarter or whatever of operation, before it’s satisfied the ISP that it’s legit?) You could make email cost $0.01 or other nominal fee per message, but that has to be implemented at the level of the spammer’s ISP; if you try to do accounting for all the email att.com users send over sprint.net to people all over the world, your head explodes. Deposit on your ISP account, like when you rent a car or something, so if you spam you lose the money? If it’s a big deposit, though, noncompliant ISPs get more customers and you’re back to square zero…
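For what it's worth, the "more than 3/minute" tool could be sketched as a sliding-window counter like the one below. The class name, the decision to count discarded messages toward the window, and the exemption list are assumptions for illustration, and the sketch trusts the sender address it is given, which is the forgery problem again.

```python
from collections import defaultdict, deque

class RateFilter:
    """Discard mail from any sender exceeding max_messages per window."""

    def __init__(self, max_messages=3, window_seconds=60, exempt=()):
        self.max_messages = max_messages
        self.window = window_seconds
        self.exempt = set(exempt)            # e.g. known bulk senders
        self._recent = defaultdict(deque)    # sender -> arrival timestamps

    def accept(self, sender, timestamp):
        """Return True to deliver the message, False to discard it."""
        if sender in self.exempt:
            return True
        recent = self._recent[sender]
        # Forget arrivals that have fallen out of the window.
        while recent and timestamp - recent[0] >= self.window:
            recent.popleft()
        # Count every arrival, delivered or not, so a sustained burst
        # stays blocked until the sender slows down.
        recent.append(timestamp)
        return len(recent) <= self.max_messages

f = RateFilter(exempt={"orders@bulk.example"})
burst = [f.accept("someone@isp.example", t) for t in (0, 10, 20, 30)]
later = f.accept("someone@isp.example", 100)
```

In this run the fourth message inside the same minute is discarded, and the sender is accepted again once the window has emptied. It does nothing for the newamazon.com case beyond whatever goes on the exemption list.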
Basically, I’m not seeing any feasible way to impose restrictions at the ISP level without making it mandatory; spammers can do the Business-at-Internet-Speed thing to switch to noncompliant ISPs faster than society-at-large-and-slow can determine that those ISPs are really evil. So, recipient-level filtering it is.
I like the idea of stochastically (probabilistically) checking for spam. There is a risk that as spammers become aware of the anti-spam techniques, they’ll be able to spoof them.
As an aside, as far as implementations go, is this something you would want to run yourself (say, in your mail client) or want your ISP to run on your behalf? It depends on whether this is a technique designed to reduce the amount of spam a user has to process (read) or the amount that an ISP has to pass on to users. If the techniques could be run on the ISP’s mail queues, it could reduce the amount of spam that gets into the user’s inbox, but it might also prevent some legitimate mail from getting through. There are users who want to receive net coupons and such so they need to be able to continue to do that, and they also need tools that will enable them (or their ISPs) to configure their spam filtering easily.
I probably would be interested in using the Bayes-dictionary type spam detection system, once it becomes more refined. The techniques described in the link are very interesting, for much the same reason that searching the web by keywords is interesting. I like the idea of using spam that other people report to classify my own mail. The dictionary method requires a huge library of spam, and a fair amount of “good” messages as well. I would probably use my own saved mail as the “good” sample and I would want to share with others to build a library of known-bad messages.
The link isn’t really what I wanted to talk about in my post… It just got me to thinking about some other spam-related topics I have already been kicking around in my head, in discussions with friends, etc. I’m not a mathematician or rocket engineer, so I can’t save the world from spam by coming up with new theories and artificial intelligence. I am an expert system administrator though, so perhaps I can come up with new ideas for system administration and influence other sysadmins.
The main point I was trying to get at is that we have to make it more expensive for spammers to do business, or they will keep on doing it. The response rate for spam is abysmally low, but since it is virtually free to send, they do so anyway. (Unlike direct mail, where there are other constraints like cost of materials and postage.) I am looking for an economic angle to change the balance.
Do you think these ideas will increase the cost of Internet access for end users? If they increase the cost substantially, end users may complain that either spammers should be outlawed, or that they should have to pay extra.
I haven’t spent a lot of time thinking about this, but my guess is that the strongest solution will probably involve some sort of charging for email, which would probably have to be applied to all