Leveraging Your Content for Website Publishing & Collaboration
Pissed off users. Stopped spammers for 2 weeks.
Reject multiple sequential 2-byte characters on an English-language wiki. Stopped spammers for 2 weeks.
Problem was the rejection notice: it tells the spammer that things don't work, so they try something else.
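A minimal sketch of the sequential-character check mentioned above. The function name `has_multibyte_run` and the threshold `min_run` are hypothetical, and "2-byte character" is approximated here as any non-ASCII character, which is an assumption, not necessarily what the original filter tested.

```python
def has_multibyte_run(text: str, min_run: int = 3) -> bool:
    """Return True if text contains min_run or more consecutive non-ASCII characters.

    Assumption: non-ASCII stands in for "2-byte character"; the threshold is illustrative.
    """
    run = 0
    for ch in text:
        if ord(ch) > 127:
            run += 1
            if run >= min_run:
                return True
        else:
            run = 0  # an ASCII character breaks the run
    return False
```

A single accented letter in otherwise English text does not trip the check; only a run of at least `min_run` such characters does.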
Identify messages as spam, but put them in a tarpit. Spammers get shown their spam; we and Google see normal content.
Need to detect spammers. Based on IP addresses... (1) no reverse DNS entry (90% of spam sources have not set one up); (2) comment poster not logged in.
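The tarpit idea can be sketched roughly as follows. The `Post` class, its fields, and matching the viewer to the author by IP address are all assumptions made for illustration; the talk did not say how the viewer was recognised.

```python
from dataclasses import dataclass


@dataclass
class Post:
    author_ip: str           # assumed way of recognising the author
    body: str
    tarpitted: bool = False  # set when the post is judged to be spam


def visible_posts(posts, viewer_ip):
    """Return the posts a given viewer should see.

    Tarpitted posts are shown only to the address that wrote them,
    so the spammer believes the post succeeded while everyone else
    (including search engines) sees the clean page.
    """
    return [p for p in posts if not p.tarpitted or p.author_ip == viewer_ip]
```

The point of the design is that no rejection notice is ever sent, so the spammer gets no signal to change tactics.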
Worked well for some time.
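A rough sketch of the two detection signals above (no reverse DNS, poster not logged in). The function names and the choice to require both signals together are assumptions. `socket.gethostbyaddr` is the standard-library reverse lookup; it is passed in as a parameter so the logic can be tried without network access.

```python
import socket


def has_reverse_dns(ip, lookup=socket.gethostbyaddr):
    """Return True if a reverse DNS lookup for ip succeeds."""
    try:
        lookup(ip)
        return True
    except OSError:  # socket.herror and socket.gaierror are subclasses of OSError
        return False


def is_suspect(ip, logged_in, lookup=socket.gethostbyaddr):
    """Flag a post as suspect when the poster is anonymous AND has no reverse DNS.

    Assumption: combining the two signals with AND; the notes do not say how
    they were combined.
    """
    return (not logged_in) and not has_reverse_dns(ip, lookup)
```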
Poker, mortgage etc. posts.
Obvious spammers.
Hidden spam - hidden div, height of text = 1px, etc. Need to disallow styles in wiki.
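One way to disallow styles is to strip `style` attributes before rendering. This is a minimal regex sketch with a hypothetical function name `strip_styles`; it also removes matching text that happens to appear outside tags, and a real wiki would more likely use an HTML parser or sanitizer.

```python
import re

# Matches a style attribute with a double-quoted, single-quoted, or unquoted value.
_STYLE_ATTR = re.compile(
    r"""\sstyle\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""",
    re.IGNORECASE,
)


def strip_styles(html: str) -> str:
    """Remove inline style attributes so text cannot be hidden with CSS."""
    return _STYLE_ATTR.sub("", html)
```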
...various amusing examples of spammers....
Spammers may put secret hashes on their posts to be able to check success through Google.
Cross-site scripting (e.g. via an attachment on another site). May bypass a JavaScript filter by spelling JaVaScript with mixed case.
Nofollow links will stop spammers? Does deny Google juice to the spammer, but they will try anyway.
CAPTCHA? PWNtcha defeats these 80-100% of the time, 40% for good ones. Or human defeat: people do this for $1/hour.
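The mixed-case bypass works against filters that compare the scheme literally. A minimal sketch of a case-insensitive check follows; the name `is_script_url` is hypothetical, and removing all whitespace and control characters before comparing is a deliberately strict assumption.

```python
import re

# ASCII control characters and spaces, which can be used to disguise the scheme.
_IGNORABLE = re.compile(r"[\x00-\x20]+")


def is_script_url(url: str) -> bool:
    """Return True if url uses the javascript: scheme, however it is spelled."""
    cleaned = _IGNORABLE.sub("", url).lower()
    return cleaned.startswith("javascript:")
```

An allowlist of permitted schemes (for example only `http` and `https`) is generally safer than blocking known bad ones.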
Spam is created by SpamBots? No, there are a lot of humans doing this.
Blocking IP addresses? No more: botnets. Botnets rent out time. Botnets use multiple IP addresses, e.g. a 7-hour attack changing 300 pages from 120 IP addresses (likely infected PCs).
Spam Resistant Wiki
Login with valid email will keep spam off, but... not as inviting.
Integrated tar pit - individual pages can be marked as spam.
RSS Feeds to detect changes to the site.
Approval of posts by users promotes or demotes author.
IP filtering - anonymous users can be marked as spam by IP.
Content filtering of URLs.
"No minor changes" - minor changes are not listed in history: very effective.
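Content filtering of URLs, from the list above, could look roughly like this. The function name `blocked_urls`, the blocklist format, and the subdomain matching are assumptions for illustration; the talk did not describe the actual filter.

```python
import re
from urllib.parse import urlparse

# Loose pattern for http(s) links in wiki text.
_URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def blocked_urls(text, blocked_domains):
    """Return the URLs in text whose host is a blocked domain or a subdomain of one."""
    hits = []
    for url in _URL_RE.findall(text):
        try:
            host = (urlparse(url).hostname or "").lower()
        except ValueError:  # malformed URL, e.g. a broken IPv6 literal
            continue
        if any(host == d or host.endswith("." + d) for d in blocked_domains):
            hits.append(url)
    return hits
```

An edit with a non-empty result could be rejected or, following the earlier approach, sent to the tarpit.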