Flash! We're preparing to move to a whitelist instead. See the bottom of this page for details and how you can help.
The spam blacklist is a tool for blocking certain kinds of outbound links from the wiki.
What's on the Spam blacklist?
The spam blacklist is a list of the common parts of URLs that spammers have tried adding to wikis. If a submitted page contains an external link (URL) that matches one of these patterns, that page is discarded. For example,
http://imaspammer.p2l.info is a forbidden link, since it matches the pattern of a spammer who attacked this site starting around 23 May 2005. You may check what happens when a link is forbidden in the spam test sandbox, on this page.
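The check described above can be sketched roughly as follows. This is only an illustration in Python, not the extension's actual code: the names `BLACKLIST_FRAGMENTS`, `forbidden_links` and `accept_edit` are made up for this example, and the two patterns are taken from domains mentioned on this page.

```python
import re
from urllib.parse import urlsplit

# Hypothetical blacklist fragments (assumption: one regex fragment per entry).
BLACKLIST_FRAGMENTS = [r"p2l\.info", r"azzacash\.com"]
BLACKLIST_RE = re.compile(
    "|".join(f"(?:{p})" for p in BLACKLIST_FRAGMENTS), re.IGNORECASE
)

# Rough pattern for finding external links in submitted text.
URL_RE = re.compile(r'https?://[^\s\[\]<>"]+', re.IGNORECASE)


def forbidden_links(page_text):
    """Return every external link whose host matches a blacklist fragment."""
    hits = []
    for url in URL_RE.findall(page_text):
        host = urlsplit(url).hostname or ""
        if BLACKLIST_RE.search(host):
            hits.append(url)
    return hits


def accept_edit(page_text):
    """An edit is accepted only if it contains no forbidden link."""
    return not forbidden_links(page_text)
```

With these assumed patterns, `accept_edit("See http://imaspammer.p2l.info")` is false, while a page that links only to other domains is accepted.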
The blacklist needs to be changed in two situations. The first is when a new spammer appears. In this case, it is important to put a note on this page to document the allegation, and notify an Administrator who can add the necessary patterns to the list.
The second is when a blacklist entry is overbroad and blocks legitimate links. This should rarely happen here, since nearly all of our outbound links lead to just a few closely related sites. The same basic procedure applies, in reverse: put a note on this page to document the problem, and notify an Administrator who can disable the overbroad pattern on the list.
This blacklist extension is shared by many other wikis, so instead of making only the change you request, an Administrator may download a newer blacklist (or a list of updates) from another site.
Requests for removal
Requests for addition
(This one's on the list already, but we leave this entry here as an example of the form.)
URLs: Various subdomains of azzacash.com, such as buy-fioricet.1.azzacash.com
It looks like this is the only domain the new round of spam is linking to.
Just reverted more azzacash.com spam a few mins ago. Stupid botnets. -- Jeremiah 17:40, 26 May 2005 (GMT)
Various domains with names of drugs
URLs: These use a variety of different URLs. They mostly contain the names of various drugs, so maybe we could just filter on those names. viagra, cialis, phentermine, vicodin, ambien, hydrocodone, tramadol, xanax seem to be the most popular ones. Also, subdomains of bzh.bz, weboficial.com, warp0.com are frequently used.
It seems that they are only targeting Help:Contents, but they're getting to be a real nuisance.
Matching against stuff other than the domain
It seems that the spam blacklist only matches against the domain name part of the URL. You can still link to URLs like http://example.com/viagra . There are a bunch of spams we've gotten that have blacklisted words in the URL but not in the domain. Would it be possible to fix the regexp matching so it matches the whole URL instead of just the domain? It might even be good to include matching against the link text, since that will frequently contain the words that people are trying to improve their PageRank on, which may be more consistent than the domains.
Again, we have spamming that could have been blocked by filters on the entire link, instead of just the domain. I think we need to either implement that or the whitelist, because this is just getting ridiculous. You know, it seems funny to me that they would write bots that target MediaWiki, when MediaWiki has rel="nofollow" on by default, meaning that their spam won't actually improve their rankings in the least.
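To make the difference concrete, here is a small Python sketch comparing domain-only matching with matching against the whole URL and the link text. It is not the extension's code; the function names are invented for this example, and the keyword list is the one from the drug-name request above.

```python
import re
from urllib.parse import urlsplit

# Keywords from the request above; treating them as plain substrings is an assumption.
KEYWORDS = ["viagra", "cialis", "phentermine", "vicodin",
            "ambien", "hydrocodone", "tramadol", "xanax"]
KEYWORD_RE = re.compile("|".join(map(re.escape, KEYWORDS)), re.IGNORECASE)


def blocked_by_domain(url):
    """Domain-only matching: look at the host name and nothing else."""
    host = urlsplit(url).hostname or ""
    return bool(KEYWORD_RE.search(host))


def blocked_by_full_link(url, link_text=""):
    """Proposed matching: look at the whole URL and at the link text."""
    return bool(KEYWORD_RE.search(url) or KEYWORD_RE.search(link_text))
```

In this sketch `http://example.com/viagra` passes the domain-only check but is caught by the full-link check. The price is more false positives: a plain substring match on "cialis" would also block a URL containing the word "specialist", so patterns applied to the whole link probably need to be more careful than ones applied to the domain alone.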
I dunno about y'all, but I am tired of adding things to the blacklist, especially since the process is so manual. I am considering tweaking the spam-blacklist plugin to behave as a whitelist instead: that is, make it such that external links are only allowed if they match one of a short list of allowed domains. This is technically trickier, but doable in my view.
- Although I would normally object to whitelisting as being too heavy-handed, I feel like I've spent far too much of my recent life reverting edits by stupid spambots, so I'm a lot more amenable to the idea right now. The main issue is that it might be frustrating for someone trying to post their new game that's hosted on their own website to find their edit blocked because their site isn't whitelisted. At this point, we may want to just say "fine," since dealing with the spam is a much bigger pain in the ass. However, if you're going to have to get involved in the code either way, it may be possible to implement my suggestion above of blacklisting based on the entire URL and the link text, instead of just the domain as is done now. Many of the spams that we've gotten recently would have been filtered by a blacklist that matched various keywords in the entire URL and link. Just a thought for something to try before resorting to a whitelist. -- Lambda 22:40, 13 Dec 2005 (GMT)
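For reference, the whitelist behaviour proposed above could look something like this Python sketch. It is not a patch for the plugin: `ALLOWED_DOMAINS` holds placeholder domains, and the function names are invented for this example.

```python
from urllib.parse import urlsplit

# Placeholder entries (assumption); the real list would be compiled on this page.
ALLOWED_DOMAINS = {"example.org", "example.net"}


def domain_allowed(host):
    """True if host is an allowed domain or a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)


def link_allowed(url):
    """External links pass only when their host is on the whitelist."""
    host = urlsplit(url).hostname
    return host is not None and domain_allowed(host)
```

Comparing against the end of the host name, with a leading dot, keeps look-alike hosts out: in this sketch `http://example.org.spam.biz/` and `http://notexample.org/` are both rejected, while `http://www.example.org/game` is allowed.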
It would be great if, in preparation for this, we could compile a list of legitimate outlink domains here. Here's a start:
An even better idea
As of Jan 05 2006, I've disabled edits for non-logged-in users. This is sad, but it's also very, very convenient, which is happy. - misuba 01:19, 6 Jan 2006 (GMT)