
Increasing Marginal Cost on Spam (the mail, not the meat)

Lee Tien of the Electronic Frontier Foundation and of the PUB (our local hangout) posed a challenge to me after I asked him for an un-paid internship at EFF. The challenge is to think of a way to increase marginal costs for people who send spam emails. Here's the situation: since there is essentially no marginal cost to sending spam (each additional email costs the sender next to nothing in cash or time), spammers can overwhelm the carrying capacity of the internet, slowing down legitimate information exchange by crowding the network with pseudo-Viagra ads and pathetic porn come-ons (I mean, can't they at least be funny or creative?). Recently, AOL said they had to filter A BILLION pieces of spam in a single day!

Spam is a tough problem to solve. Some have suggested government involvement in the form of anti-spam legislation. A recent proposal in California would require spammers to put "ADV:" in the subject field of all email advertisements. Another solution is to have people use spam filters at the receiving end to sort out the junk. There is a major problem with both solutions: neither solves the "crowding out the network" problem caused by spam's zero marginal cost, because the mail has already crossed the network by the time it gets labeled or filtered.

My idea is simple. It includes a tiny bit of consumer activism and a centrally located database. Recipients of what they consider to be spam would forward messages to this database. The database would use a Bayesian filter to decide whether the message being forwarded was indeed spam or legitimate mail. At the centrally located database, volunteers would feed the system samples of legitimate mail and samples of spam mail to build up the initial confidence levels of the Bayesian filter. For more information on Bayesian filters, see Arnold Kling's recent explanation.
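To make the filtering step a bit more concrete, here is a minimal sketch of a naive Bayes classifier of the kind described above. It is only an illustration, not a design for the actual database: the class name `NaiveBayesFilter`, the whitespace tokenizer, the add-one smoothing, and the "spam"/"ham" labels are all my own assumptions.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Toy two-class naive Bayes text classifier (hypothetical sketch)."""

    LABELS = ("spam", "ham")  # "ham" = legitimate mail (assumed label name)

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = {label: 0 for label in self.LABELS}

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase, whitespace-separated words are good enough.
        return text.lower().split()

    def train(self, text, label):
        """Volunteers feed labeled samples in through this method."""
        self.word_counts[label].update(self.tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Return the estimated P(spam | words in text)."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in self.LABELS:
            # Smoothed prior: how common this class is among training samples.
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for word in self.tokenize(text):
                # Add-one smoothing so unseen words never give probability 0.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocabulary) + 1))
            log_scores[label] = score
        # Normalize the two log scores into a probability.
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])
```

In use, the volunteers would call `train(sample, "spam")` or `train(sample, "ham")` on their samples, and each forwarded message would then be scored with `spam_probability(message)`. A real filter would need far more training data and a smarter tokenizer than this.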

The system would also take into account the number of reported messages originating from any particular email address. If the filter decides that the messages are indeed spam and enough recipients forward them to statistically reject the null hypothesis (null: not spam), the system will publicize the email address as a "spammer". Now, this database would not deal with enforcement. It would simply be a database of spammer email addresses. It would be up to individual ISPs to take action against spammers. This would work because the cost of spam crowding the networks falls directly on ISPs, so they have the incentive to enforce. So in effect, you have a centrally located database that ISPs would use to displace spammers. There are three major advantages to such a system:

1. The more spam mail you send out, the more likely you'll get on the spammers list, thus increasing the marginal cost of each additional piece of spam email you send out. Caveat: consumers bear some reporting cost that is balanced by the reduction of spam in their mailbox. It will take time to see if that cost/benefit creates a workable system.

2. Bayesian filtering and recipient reporting reduce the dictatorial nature of the current MAPS system. They replace an autocratic system with a democratic one. They also get the government out of the process (which many people would view as a good thing).

3. The system of enforcement would be robust and distributed. It would be in the interest of each ISP to remove people on the "spammers list" and therefore the costs and benefits are well assigned. ISPs cleaning up spammers is analogous to restaurants cleaning up after dirty customers: it's a cost that they should be willing to shoulder.

Okay, there's a fourth important point:

4. If a non-profit like the EFF created and maintained the system, it would add some balance to the electronic battle between the dark forces of Industry and the naive Consumer.
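The "reject the null hypothesis" step in the listing rule could be sketched as a one-sided binomial test. This is just one way to read the idea, under assumptions of my own: the null hypothesis is that the sender is not a spammer, so the filter flags each forwarded message only by mistake, with some false-positive rate. The function names, the 5% false-positive rate, and the 0.001 significance level are hypothetical placeholders, not measured values.

```python
from math import comb


def binomial_tail(k_flagged, n_reports, p_false_positive):
    """P(X >= k_flagged) for X ~ Binomial(n_reports, p_false_positive)."""
    return sum(
        comb(n_reports, i)
        * p_false_positive ** i
        * (1 - p_false_positive) ** (n_reports - i)
        for i in range(k_flagged, n_reports + 1)
    )


def should_list_sender(k_flagged, n_reports, p_false_positive=0.05, alpha=0.001):
    """Decide whether an address goes on the spammers list.

    n_reports: messages from this address that recipients forwarded.
    k_flagged: how many of those the Bayesian filter scored as spam.
    p_false_positive, alpha: assumed values, chosen only for illustration.
    """
    if n_reports == 0:
        return False  # nobody has complained, so there is nothing to test
    # Reject "not spam" only if this many flags would be very unlikely
    # to happen through filter mistakes alone.
    return binomial_tail(k_flagged, n_reports, p_false_positive) < alpha
```

With these placeholder numbers, a single flagged report is not enough to list anyone, while three flagged reports out of three are. That is the "more spam, more risk" effect from point 1: every extra message is another chance to be reported, and the reports add up.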

Now the harder problem... how do I go about getting that un-paid internship?

Comments (1)

Ethan Lewis:

Hey, man, I'm spamming your blog!

This is a potentially good idea for identifying spam email addresses. I identify what I consider spam and put it on my "auto-delete" list, and your suggestion, if I understand it correctly, is to record this for everyone and then, based on some frequency test, have a certain email address declared a "spammer." (My method, it should be noted, does not reduce internet traffic, it just reduces my personal annoyance level.)

My suggestion is that you elaborate more on how this information could be used to increase the costs of sending mass emails. You say it would be up to the ISPs, but I think it would be helpful to suggest what they could do with the information that would have the desired effect of reducing wasteful internet traffic.

Also, a harder problem: you probably know more about spam than I do, but one way unscrupulous people seem to get around being blocked is by constantly generating new free email accounts with absurd addresses like x7734xy2@yahoo.com. I have no idea what percentage of "spam" is of this nature, but it'd be great if someone could figure out a way to address that.


About

This page contains a single entry from the blog posted on March 7, 2003 1:25 AM.
