What’s the big idea here?
Problem:
Any spam blocking list is either too specific or too small/ineffective to be noticed, or it is effective enough that spammers attack, threaten, or sue its owners and DDoS its servers into the stone age.
Proposal:
Make a blocking system that:
- is fed by raw data from its members
  - so that there’s not one person or group “making decisions”
- allows members to show their policies and see others
  - so that users can see what blocks other people actually use
- can sort “policies” such as blocks according to how many use/support them
  - so that effective policies can be quickly adopted by many
- can customize a list for each user according to his criteria
  - so it’s not “all or nothing” – the database is not the one “blocking you”
- is based on signatures and a “web of trust”
  - so you can quickly see policies from people you trust and whom they trust
- is massively decentralized, using a distributed storage/transport like NNTP
  - so that anyone can download the source, run it, and bam, a copy of the DB and web site, even if the primary site gets bombed into the stone age.
Basic data points:
Complaints. A single spam is a complaint. Track who submitted it (signed, of course), and file it according to the info from their trusted Received: line (from-IP, HELO name, rDNS name, UTC time, From: address, Return-Path, headers, body, any comment from the receiver).
Blocks. Not just “hmm, that might be suspicious” – a Block is a statement that says “I am blocking this IP range, and my users haven’t complained loudly enough to make me unblock it.” (Alternatively, this could be a range setting: block on sight, accept with prejudice, unknown, accept relaxed, accept unconditionally.)
Best policies, best practices. (Most policies would be “block this range” as above, but other policies/practices are cataloged too)
Members, and their public keys.
Endorsements, of a block, policy, practice, or trusted person.
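As a rough sketch of how the first two data points might look as records, here is one possible shape in Python. The field names follow the Received: line details and the five-step setting listed above; the class names, the key-id fields, and the numeric values attached to each level are assumptions made for illustration, not part of the proposal.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import IntEnum
from typing import Optional


class BlockLevel(IntEnum):
    """The five-step setting from the Blocks item.

    The numeric values are an assumption; only the ordering matters.
    """
    BLOCK_ON_SIGHT = -2
    ACCEPT_WITH_PREJUDICE = -1
    UNKNOWN = 0
    ACCEPT_RELAXED = 1
    ACCEPT_UNCONDITIONALLY = 2


@dataclass(frozen=True)
class Complaint:
    """One submitted spam, filed by the submitter's trusted Received: line."""
    submitter_key_id: str          # hypothetical: ID of the key that signed it
    from_ip: str
    helo_name: str
    rdns_name: str
    received_utc: datetime
    from_address: str
    return_path: str
    headers: str
    body: str
    comment: Optional[str] = None  # any comment from the receiver


@dataclass(frozen=True)
class Block:
    """A member's statement that they currently apply `level` to `ip_range`."""
    member_key_id: str             # hypothetical: ID of the member's key
    ip_range: str                  # CIDR notation, e.g. "192.0.2.0/24"
    level: BlockLevel = BlockLevel.BLOCK_ON_SIGHT
```

Keeping the records immutable (`frozen=True`) fits the idea that each one is a signed statement: changing your mind means publishing a new statement rather than editing an old one.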
More thoughts….
Quote from message 1:
What I think *most* of us would like is a block list that compiles the “best common practice” of hard-working dedicated spam-fighters like ourselves… otherwise it’s sort of “every admin for herself”.
The data points I would like to be able to compile and correlate might include:
Evidence of spam: “I received this. I consider it abuse. I can vouch for the correctness of the server/client info in the top Received: line.”
Rules people are actually using: “I am blocking 1.2.3.0/20 based on the following criteria: (x) Received a high ratio of spam to non-spam (x) Complained about abuse, abuse continues (x) owned by known spammer organization”
Statistics: “(AS12345,last_week) = spam=85798 good=134 unknown=3875”
Policies people are actually using: “Blocking external clients who give my own domain as HELO blocked X% of my spam with Y% false positive”
White list or spf-style server info: “We send only from IP 3.4.5.0/20. Please accept mail from outMX.mydomain.xx and accept with filtering from proxyMX.mydomain.xx and deny other mail claiming to be from us”
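To make the Statistics item a little more concrete, here is a small sketch that parses a line in exactly the shape quoted above and computes a spam ratio from it. The function names are hypothetical, and leaving "unknown" out of the ratio is an assumption; the original does not say how unknowns should be counted.

```python
import re

# Matches lines shaped like: (AS12345,last_week) = spam=85798 good=134 unknown=3875
STAT_RE = re.compile(r"\((?P<key>[^,]+),(?P<period>[^)]+)\)\s*=\s*(?P<counts>.*)")


def parse_stat(line):
    """Return (key, period, counts) for one statistics line."""
    match = STAT_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a statistics line: {line!r}")
    counts = {}
    for pair in match.group("counts").split():
        name, _, value = pair.partition("=")
        counts[name] = int(value)
    return match.group("key"), match.group("period"), counts


def spam_ratio(counts):
    """Spam as a fraction of classified mail (assumption: 'unknown' is ignored)."""
    spam = counts.get("spam", 0)
    classified = spam + counts.get("good", 0)
    return spam / classified if classified else 0.0


key, period, counts = parse_stat(
    "(AS12345,last_week) = spam=85798 good=134 unknown=3875"
)
print(key, period, round(spam_ratio(counts), 4))
```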
It would be cool if information like this could be collected all in one place and summarized. It would also be cool if other users of the site could sign (a) stuff they submit (b) stuff they agree with and will try to follow as well, and (c) other users who they trust. I would be *very* interested in seeing a list like that.
Perhaps it doesn’t have to be a “block list” per se? At least, not at first? I mean, if it is a block list, then some organization has to set “their” standard and “they” may be sued, targeted, or whatever. If the web site and database were just a forum for people to post their own data and see what others are doing, the owners of the web site wouldn’t be making decisions or setting policies themselves.
Some other ideas. How would you create a “totally decentralized” data stream and a tool for gathering/displaying it? Something like a newsgroup where signed messages appear as “input data” and various mirror sites (running some open source software) would take articles from NNTP and present summaries? A “summary” could be a list of rules or policies signed by at least 20 people, or by >20 people who are each trusted by >20 others, served as an spfilter-style rsync or HTTP document that can be fed to rbldns or something? Also, users who decide to view/use the data could set their own level of participation… top 10? top 20? top 100?
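Here is a minimal sketch of what a mirror site's “summary” step might look like, assuming the signatures have already been verified and reduced to plain member IDs. All names here (`summarize`, `trusted_by_counts`, the parameters) are made up for illustration, and the thresholds are left as parameters because the numbers above are only suggestions.

```python
from collections import defaultdict


def trusted_by_counts(trust_edges):
    """Count how many distinct other members trust each member.

    trust_edges is an iterable of (truster, trusted) pairs.
    """
    trusters = defaultdict(set)
    for truster, trusted in trust_edges:
        if truster != trusted:  # trusting yourself does not count
            trusters[trusted].add(truster)
    return {member: len(who) for member, who in trusters.items()}


def summarize(rule_signers, trust_edges=(), min_signers=20,
              min_trusters=None, top_n=None):
    """List (rule, signer_count) pairs that meet the thresholds, best first.

    rule_signers maps a rule to the set of members who signed it.
    If min_trusters is set, only signers trusted by at least that many
    other members are counted.
    """
    counts = trusted_by_counts(trust_edges)
    summary = []
    for rule, signers in rule_signers.items():
        qualified = set(signers)
        if min_trusters is not None:
            qualified = {s for s in qualified if counts.get(s, 0) >= min_trusters}
        if len(qualified) >= min_signers:
            summary.append((rule, len(qualified)))
    summary.sort(key=lambda item: (-item[1], item[0]))
    return summary if top_n is None else summary[:top_n]
```

The `top_n` argument corresponds to the “top 10? top 20? top 100?” choice: each user of the data picks how far down the list to go.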
Quote from message 2:
>How high a ratio should be sufficient for listing? What kind of grace period should we allow after reporting abuse, before deciding that the abuse department isn’t doing anything about the problem? If these are to be listing criteria, we need to set hard numbers. As for the “owned by known spammer organization”, whose list are we basing this on? Spamhaus/ROKSO? “Known” by whom, and on what evidence? We’ve got to have some standards there as well, or else we open ourselves to claims of accepting hearsay evidence.
Actually, what I would most like to know is “How many other admins like me are already blocking these guys?” I was imagining a scenario in which each entry might be “supported” by multiple administrators, and the more “votes” it gets, the higher the block goes on the list. But in this scoring system a “vote” wouldn’t just be “Yeah, sounds good” – it would only count as “support” if we actually do block them.
Most of us would probably block based on our own judgement, and evidence we can verify with our own eyes (or logs). If someone posts a range, it might get my attention, but if I don’t know who is posting I may not block based only on that. But, if someone else’s logs agree with mine, I am more likely to adopt the block and log my “vote” for it. (The web site could also remember all the blocks I have personally “voted” for and give me a feed of only the ones I have selected, to encourage me to only vote for the ones I will actually use myself, and to get me to actually use them right away)
The initial submission “reason” along with any other “replies” can be listed as details of the record. The spammer or ISP themselves could post a reply too, if they want to go to the trouble of signing up for an account and using their key or password or whatever. The “reason” and any “replies” can be viewed by people, but the “score” in this system would only be based on one thing: how many other sites are already blocking it.
(I clipped “sightings” in the above summary, but it’s still an important part… The “sightings” or “submitted spam” would be the “detail records” and the “policies” or “blocks” would tie together multiple submitted spams into a summary.)
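The scoring rule described above (a “vote” only counts if the voter actually blocks the range) can be sketched in a few lines. This assumes each member's currently active blocks are available as a set of ranges; the function and variable names are hypothetical.

```python
from collections import Counter


def score_blocks(active_blocks):
    """Score each range by how many members are actually blocking it.

    active_blocks maps a member to the set of ranges they block right now.
    Reasons and replies play no part in the score.
    """
    scores = Counter()
    for ranges in active_blocks.values():
        scores.update(set(ranges))  # one vote per member per range
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def my_feed(member, active_blocks):
    """The personal feed: only the blocks this member has voted for."""
    return sorted(active_blocks.get(member, ()))


active = {
    "alice": {"192.0.2.0/24", "198.51.100.0/24"},
    "bob": {"192.0.2.0/24"},
    "carol": set(),
}
print(score_blocks(active))
print(my_feed("bob", active))
```

Because the vote and the feed come from the same set, there is no way to vote for a block without also receiving it, which is the incentive the scheme relies on.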
>>> It would be cool if information like this could be collected all in one place and summarized. It would also be cool if other users of the site could sign (a) stuff they submit (b) stuff they agree with and will try to follow as well, and (c) other users who they trust. I would be *very* interested in seeing a list like that.
> This hits on another discussion point–how should spam reports by alliance members be handled? The suggestion here is some sort of “web of trust”, but without some openness to the process there’s a risk of this becoming a cliquish cabal.
A web of trust is one way to go if there is no organization at all, no agreed leadership, etc. At the time I wrote this, I was thinking of something “massively decentralized” where people would submit data onto NNTP and some software would be available to read the incoming stream and sort it into policies with votes and signatures. That would be the “ultra-paranoid” system… if your #1 goal is not to have a centralized presence that might be attacked.
Once there are a couple of gorillas on the team, this might be a moot point. You can probably build a much nicer database/web site if you don’t have to worry about massive decentralization.
Anyway, the system I was imagining would give members some “clout” based on other members voting for them… In the case of voting for another member, it’s kind of like saying “I would probably do what this person suggests, absent other evidence, even if I hadn’t verified it”. Perhaps voting for someone could give you an individualized list of block-suggestions of the day for you to examine and start using, based on those scoring high “among those you trust”.
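One way the “block-suggestions of the day” could be computed is sketched below. It only looks at members you trust directly (not “whom they trust”), and it skips ranges you already block; both choices, along with every name in the code, are assumptions for the sake of the example.

```python
from collections import Counter


def suggestions_for(member, trusts, active_blocks, limit=10):
    """Rank ranges blocked by the members `member` trusts, most popular first.

    trusts maps a member to the set of members they trust.
    active_blocks maps a member to the set of ranges they currently block.
    """
    trusted = set(trusts.get(member, ())) - {member}
    already = set(active_blocks.get(member, ()))
    tally = Counter()
    for friend in trusted:
        for ip_range in set(active_blocks.get(friend, ())):
            if ip_range not in already:
                tally[ip_range] += 1
    ranked = sorted(tally.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


trusts = {"me": {"alice", "bob"}}
active_blocks = {
    "me": {"203.0.113.0/24"},
    "alice": {"192.0.2.0/24", "203.0.113.0/24"},
    "bob": {"192.0.2.0/24", "198.51.100.0/24"},
    "mallory": {"10.0.0.0/8"},
}
print(suggestions_for("me", trusts, active_blocks))
```

Note that mallory's block never shows up for “me”, however many accounts mallory controls, because nobody “me” trusts has adopted it.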
Interestingly, this also kind of side-steps the “How dare you? Who gave you the right?” factor, because each admin would be following the advice of others they trust, and each person might get a different feed of the policies… though there is a very large chance that the “top 5” spam contributors would soon get noticed and blocked by all, it’s really lots of little decisions that are making up your own site’s block policy. The “best practices clearinghouse” would have each person in control of her own system, but it’s easier and more collaborative.
A web-of-trust type of system is one way to allow entries/grade entries/etc but it’s not the only way. (I bet someone who is not as tired as I am right now could think of a couple more…)
>At the same time, should any member–no matter how new or inexperienced–be able to submit spam reports with the same degree of credibility? Perhaps a Razor-style “reputation”-based system might work, rating members based on their track record as submitters, and assigning a greater weight to their reports as their reputation scores increase. Reports with scores below some threshold might be treated as “pending” until verified and approved manually by someone with an established reputation.
Right. I wouldn’t want anything to be automatic. I also wouldn’t want one person to be voted into prominence because he made up 100 fake accounts and voted for himself with all of them… Length of time on the membership list might be a factor, as well as blocks or sightings submitted that others have later confirmed.
Data types
Members (Keys*, email*, agreement, email-verified, user’s info, preferences)
Keys (ID, data, owner)
Friends (member, friend-member) (group, friend-member)
Groups/Associations (group-ID, members*, owners*)
Trust/Proxy (member, trusted-member) (group-ID, owner)
Policy/practice (author, description, linked-cases*, linked-IP, linked-email, linked-domain*, obsoleted-by, related-to)
Votes (policy, member, Yes/No)
Complaints/Cases: (Submitted-by, signed-by*, IP, from-addr, reply-addr, URL*)
IP Ranges (IP/mask, owner-info, route-info, …)
Email addresses (email, domain, linked-cases)
Domains (linked-email, linked-URLs, linked-cases)
URLs (url, domain, linked-cases)
Search agents (can search cases, policies, domains, IP ranges, etc)
Peer servers (hostname, key-id, sync interval)
Note/Message (can be attached to just about any other data item, like blog comments)
Signatures (just about anything can be signed: membership agreement, keys, friendship, proxy auth, policy, votes, complaints, server/peer info, message)
Activities
Member sign up
Update keys
Post complaints/cases
Post policies
Post follow-up message
Search/view complaints/cases
Search/view members: add-friend, add-proxy
Search/view groups: join
On behalf of group: add/remove owner/authorizer
Search/view ip ranges
Search/view domains
Add/remove friend/association
Add/remove trust/proxy
Search/view policies: Most-popular, popular-with-friends, by category, by IP range, by domain
Approve/vote: (Known good, Probably OK, Neutral/Unknown, Probably bad, block on sight)
Download “my block list” (sendmail hash, bind, rbldns)
Query “my block list” ( host d.c.b.a.gconnor.custom-dnsbl.association.org )
(result: known-good, probably good, no info, probably bad, known bad)
(OR, query each list separately)
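The query name in the example above follows the usual DNS block list convention of reversing the octets of the IP address being looked up. A small sketch of building that name for a per-user list follows; the function name is hypothetical, the zone is the example one from the text, and how the five result values would be encoded in the DNS answer is left open because the text does not specify it.

```python
import ipaddress


def dnsbl_query_name(ip, user, zone="custom-dnsbl.association.org"):
    """Build the per-user lookup name for an IPv4 address.

    For a.b.c.d this yields d.c.b.a.<user>.<zone>.
    """
    address = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(address).split(".")))
    return f"{reversed_octets}.{user}.{zone}"


print(dnsbl_query_name("192.0.2.99", "gconnor"))
```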