Maintaining a Publicly Available Blacklist - Mechanisms and Principles 89
badger.foo writes "When you publicly assert that somebody sent spam, you need to ensure that your data is accurate. Your process needs to be simple and verifiable, and to compensate for any errors, you want your process to be transparent to the public with clear points of contact and line of responsibility. Here are some pointers from the operator of the bsdly.net greytrap-based blacklist."
Using a blacklist ... (Score:5, Interesting)
And while we're at it, some hints on using a public blacklist with regards spam. The correct way is not to trust the blacklist 100%. Instead, you use it as one part of a comprehensive scheme (part of this complete breakfast). So, you may use a dictionary, and for every word in the dictionary you add 10 points (viagra, v1agra, v14gr4, etc.). You can use SPF [wikipedia.org] and if it doesn't match, then that's worth 50 points, and if it's not there, maybe 20 points. And if the domain or IP address is on a blacklist, maybe 40 points. You assign the points as you like. Then, if you hit 100 points, you mark the email as "probably spam".
But you never reject or mark an email spam just because it's on some blacklist. That's just stupid. Now I'm off to RTFA.
----
OK if you have your own blacklist (perhaps a list of domains or IP addresses that have sent email to a catch-all, or that have fallen into a honeytrap), then you do what you want. But you probably should date entries and remove old ones (if they do not misbehave again), in case a legitimate user is now at that location.
We run DNS-based lists (Score:5, Interesting)
... though they are not publicly-accessible; only accessible to our customers. Here's how they work:
Using our reputation-collection protocol [mimedefang.org], we receive a constant stream of events from our customers. An "event" is something like "IPv4 address x.y.z.w sent to a nonexistent recipient" or "IPv6 address abcd::1234 sent something that a human voted as spam"
Currently, we have a database of just under two billion events. Once an hour, we go through our database and categorize IP addresses as:
The whole system is 99.99% automated. The only manual intervention is when some requests delisting. If it seems that someone was the victim of a compromise and has now cleaned up his/her machine, we delist it for 45 days which is long enough for all events from that IP to expire. Then it goes back into consideration for automatic listing.
This system works really well. We have about 3.75 million IPv4 and 3300 IPv6 addresses on our lists; those are machines for which we have confidence that there's enough data to categorize them.