Talk:Spam blacklist

From IcehouseOrg

Flash! We're preparing to move to a whitelist instead. See the Whitelisting section at the bottom of this page for details and how you can help.

The spam blacklist is a tool for blocking certain kinds of outbound links from the wiki.

What's on the Spam blacklist?

The spam blacklist consists of the common parts of URLs that spammers have tried adding to wikis. If any submitted page contains an external link (URL) that one of these patterns matches, that page is discarded. For example, http://imaspammer.p2l.info is a forbidden link, since it matches the pattern of a spammer who attacked this site starting around 23 May 2005. You can check what happens when a link is forbidden in the spam test sandbox on this page.
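The check described above can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: it assumes the blacklist is a list of regular-expression fragments, that external links are found with a simple `https?://` scan, and that each pattern is tested only against the host part of the link. The two entries in `BLACKLIST` are assumptions chosen to match the examples on this page.

```python
import re

# Hypothetical blacklist entries: regex fragments for spam domains.
BLACKLIST = [r"p2l\.info", r"azzacash\.com"]

# Rough scan for external links: capture the host after the scheme.
LINK_RE = re.compile(r"https?://([A-Za-z0-9._-]+)", re.IGNORECASE)

def blocked_links(page_text, patterns=BLACKLIST):
    """Return the hosts in page_text that match any blacklist pattern."""
    # Anchor each fragment to the end of the host so subdomains match
    # (e.g. imaspammer.p2l.info) but hosts that merely contain the text
    # (e.g. notp2l.info) do not.
    combined = re.compile(
        r"(?:^|\.)(?:" + "|".join(patterns) + r")$", re.IGNORECASE
    )
    return [host for host in LINK_RE.findall(page_text) if combined.search(host)]

def accept_edit(page_text):
    """An edit is discarded if any of its external links is blacklisted."""
    return not blocked_links(page_text)
```

For example, `accept_edit("http://buy-fioricet.1.azzacash.com/")` is `False`, while a link to an unlisted site is accepted.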

Spam test sandbox
The sandbox allows any text, but it forbids links to blacklisted sites.

There are two situations in which the blacklist needs to be changed. The first is when a new spammer appears. In this case, it is important to put a note on this page documenting the allegation, and to notify an Administrator, who can add the necessary patterns to the list.

The second is when a blacklist entry is overbroad. This should rarely happen here, since nearly all of our outbound links lead to just a few closely related sites. The same basic procedure applies in reverse: put a note on this page documenting the problem, and notify an Administrator, who can disable the overbroad pattern on the list.

This blacklist extension is shared by many other wikis, so instead of making only the change you request, an Administrator may download a newer blacklist (or a list of updates) from another site.

Maintaining this list has very high overhead, so it is only for hard cases that can't be dealt with using routine blocking methods. If IP blocks don't work and the spam is regular, it's a good candidate for listing here. Administrators will use their discretion in adding and removing entries, following the guidelines laid out on this page.

If you are having problems with the spam list and aren't a spammer, please include a link to the article you are having trouble saving, and say which URL (without the leading http:// part) you are told is blocked. Any administrator can edit the spam blacklist.

If a URL you need is blocked because it matches the pattern for a known significant spammer, please list it in the whitelisting desired section, so that it can be specifically whitelisted later while the generic block stays in place.
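Keeping a generic block while letting a few specific URLs through amounts to checking an exception list before the blacklist. The sketch below is hypothetical: `example-spam.com` and the exception prefix are made-up entries, and comparing URLs by plain prefix is an assumption for illustration, not necessarily how the extension handles whitelisting.

```python
import re

# Hypothetical generic block: any host under example-spam.com.
BLACKLIST = [re.compile(r"(?:^|\.)example-spam\.com$", re.IGNORECASE)]

# Hypothetical exceptions: URL prefixes (without http://) that stay
# allowed even though their host matches the generic block.
EXCEPTIONS = ["help.example-spam.com/rules"]

def is_allowed(url):
    """Allow a URL if it matches an exception; otherwise block it
    if its host matches the generic blacklist."""
    bare = re.sub(r"^https?://", "", url, flags=re.IGNORECASE)
    # Exceptions are checked first, so they override the generic block.
    # Note: a plain prefix test also accepts e.g. ".../rulesXYZ".
    if any(bare.startswith(prefix) for prefix in EXCEPTIONS):
        return True
    host = bare.split("/", 1)[0]
    return not any(p.search(host) for p in BLACKLIST)
```

With these entries, `http://help.example-spam.com/rules/page` is allowed, but `http://www.example-spam.com/buy` is still blocked.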

Requests for removal

Add new items to the end of this section. For each one, include:

  • The URL the error message mentions, without the http:// prefix.
  • Links to the article or articles you were editing.

See /completed removals for removals which have been processed (either removed or with reasons given for choosing not to remove). If a URL is not removed, you can still use a plain-text form of the link.

Many of these requests should probably go into the whitelisting desired section: if it's just a single URL, that's probably where it belongs.

Requests for addition

Add new items to the end of the list. See /completed additions for additions which have been processed (either added or reasons given for choosing not to add).

If you do not provide enough information, nothing will happen.

For each request, please include:

  • Links to one or more page diffs which show the spam being added.
  • Include the URLs being promoted - they won't be added without a link to them. We need to document why we added something.
  • Do not include anything where you can't point to a diff showing the spam being added - we also have to ensure that false requests aren't made for addition or deletion.
  • Do not include http://, even inside nowiki tags; otherwise, if your entry is added, the spam filter will block us when we move it to the completed list.

Show restraint in requesting additions for:

  • Spammers who spam once.
  • Spammers who target only one article.
  • Spammers who can be effectively blocked with IP blocks - use those instead.

azzacash.com

(This one's on the list already, but we leave this entry here as an example of form)

Diffs: [1] [2] and [3]

URLs: Various subdomains of azzacash.com, such as buy-fioricet.1.azzacash.com

It looks like this is the only domain the new round of spam is linking to.

Seconded. This domain is already listed on the master spam blacklist. -- Rootbeer 16:39, 26 May 2005 (GMT)

Just reverted more azzacash.com spam a few mins ago. Stupid botnets. -- Jeremiah 17:40, 26 May 2005 (GMT)

Various domains with names of drugs

Diffs: [4] [5] [6] [7] [8]

URLs: These use a variety of URLs, most of which contain the name of a drug, so maybe we could just filter on the drug names. Viagra, cialis, phentermine, vicodin, ambien, hydrocodone, tramadol, and xanax seem to be the most popular ones. Subdomains of bzh.bz, weboficial.com, and warp0.com are also frequently used.

It seems that they are only targeting Help:Contents, but they're getting to be a real nuisance.

Matching against stuff other than the domain

It seems that the spam blacklist only matches against the domain-name part of the URL, so you can still link to URLs like http://example.com/viagra. We've gotten a bunch of spam with blacklisted words in the URL but not in the domain. Would it be possible to fix the regexp matching so that it matches the whole URL instead of just the domain? It might even be good to match against the link text too, since it frequently contains the words the spammers are trying to raise their PageRank for, and those may be more consistent than the domains.
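The proposal above can be sketched as a keyword check over the full URL and the link text instead of the host alone. This is a rough illustration under stated assumptions: the keywords are the drug names reported in the request above, only bare URLs and bracketed `[url text]` links are recognised, and the plain regex search is a simplification rather than the extension's behaviour.

```python
import re

# Keywords from the "Various domains with names of drugs" request.
KEYWORDS = ["viagra", "cialis", "phentermine", "vicodin",
            "ambien", "hydrocodone", "tramadol", "xanax"]
KEYWORD_RE = re.compile("|".join(map(re.escape, KEYWORDS)), re.IGNORECASE)

# Bracketed wiki links "[http://url optional text]" and bare URLs.
BRACKETED_RE = re.compile(r"\[(https?://[^\s\]]+)(?:\s+([^\]]*))?\]", re.IGNORECASE)
BARE_RE = re.compile(r"https?://[^\s\]\[<>\"]+", re.IGNORECASE)

def spam_hits(page_text):
    """Return (url, link_text) pairs whose full URL or link text
    contains a blacklisted keyword."""
    hits = []
    seen = set()
    for url, text in BRACKETED_RE.findall(page_text):
        seen.add(url)  # don't re-check this URL as a bare link below
        if KEYWORD_RE.search(url) or KEYWORD_RE.search(text):
            hits.append((url, text))
    for url in BARE_RE.findall(page_text):
        if url not in seen and KEYWORD_RE.search(url):
            hits.append((url, ""))
    return hits
```

Matching substrings anywhere in the URL catches `http://example.com/viagra`, but it also raises the risk of false positives (for example, "ambien" inside "ambient"), so word boundaries or a review step might be worth considering.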

Ripway.com

Diff: [9]

Again, we have spamming that could have been blocked by filters on the entire link, instead of just the domain. I think we need to either implement that or the whitelist, because this is just getting ridiculous. You know, it seems funny to me that they would write bots that target MediaWiki, when MediaWiki has rel="nofollow" on by default, meaning that their spam won't actually improve their rankings in the least.

Whitelisting

I dunno about y'all, but I am tired of adding things to the blacklist, especially since the process is so manual. I am considering tweaking the spam-blacklist plugin to behave as a whitelist instead: that is, make it such that external links are only allowed if they match one of a short list of allowed domains. This is technically trickier, but doable in my view.

Although I would normally object to whitelisting as being too heavy-handed, I feel like I've spent far too much of my recent life reverting edits by stupid spambots, so I'm a lot more amenable to the idea right now. The main issue is that it might be frustrating for someone trying to post their new game that's hosted on their own website to find their edit blocked because their site isn't whitelisted. At this point, we may want to just say "fine," since dealing with the spam is a much bigger pain in the ass. However, if you're going to have to get involved in the code either way, it may be possible to implement my suggestion above of blacklisting based on the entire URL and the link text, instead of just the domain as is already done. Many of the spams that we've gotten recently would have been filtered by a blacklist that matched various keywords in the entire URL and link. Just a thought for something to try before resorting to a whitelist. -- Lambda 22:40, 13 Dec 2005 (GMT)

It would be great if, in preparation for this, we could compile a list of legitimate outlink domains here. Here's a start:

  • arch-geek.net
  • archive.org
  • att.net
  • boardgamegeek.com
  • chushogi.org
  • creativecommons.org
  • crystalcaste.com
  • eblong.com
  • ee0r.com
  • geocities.com
  • icehousegames.com
  • icepackgames.com
  • invisible-city.com
  • looneylabs.com
  • msn.com
  • piecepack.org
  • playagaingames.com
  • superdupergames.org
  • tinyurl.com
  • wikipedia.org
  • willowpeterson.com
  • wunderland.com
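Once the list above is settled, the whitelist check itself could be short. A minimal sketch, assuming a link is allowed when its host equals a listed domain or is a subdomain of one (so en.wikipedia.org is covered by wikipedia.org); only a subset of the list is reproduced, and extracting the host with `urllib.parse.urlsplit` is one choice among several.

```python
from urllib.parse import urlsplit

# A subset of the legitimate outlink domains listed above.
ALLOWED_DOMAINS = {"boardgamegeek.com", "icehousegames.com",
                   "looneylabs.com", "wikipedia.org", "piecepack.org"}

def is_whitelisted(url, allowed=ALLOWED_DOMAINS):
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    # Check the host and each parent domain: a.b.c -> a.b.c, b.c, c
    parts = host.split(".")
    return any(".".join(parts[i:]) in allowed for i in range(len(parts)))

def rejected_links(urls, allowed=ALLOWED_DOMAINS):
    """Return the URLs that would block an edit under a whitelist policy."""
    return [u for u in urls if not is_whitelisted(u, allowed)]
```

Comparing whole dot-separated labels, rather than testing whether the host ends with the domain string, keeps a look-alike such as notwikipedia.org from slipping through.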

An even better idea

As of Jan 05 2006, I've disabled edits for non-logged-in users. This is sad, but it's also very, very convenient, which is happy. - misuba 01:19, 6 Jan 2006 (GMT)