Ticket #765 (new enhancement)

Opened 14 years ago

Last modified 14 years ago

Should a protection against wiki spamming be integrated in CPSWiki and how?

Reported by: madarche Owned by: tziade
Priority: P3 Milestone: unspecified
Component: CPSWiki Version: TRUNK
Severity: normal Keywords: spam wiki keywords
Cc: laurent.pelecq@…

Description

The best protection against wiki spamming is to forbid anonymous access/contribution. But most of the time it is far more useful to keep anonymous access/contribution enabled. So it would be interesting to think about how CPSWiki could provide protection against spam.

Please drop your ideas in this ticket.

Some solutions:

  1. Use a vocabulary of spam words and do not commit changes containing those words, or words close to them (see the sketch below this list)
  2. Provide moderation of commits
  3. Provide post-moderation of commits
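A minimal sketch of option 1 in Python, assuming a hypothetical SPAM_WORDS vocabulary and a save_page callback; neither name is part of CPSWiki:

{{{
#!python
import re

# Illustrative spam vocabulary; a real deployment would load its own list.
SPAM_WORDS = {"viagra", "casino", "mortgage"}

def contains_spam_words(text):
    """True if the change contains any word from the spam vocabulary."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return not words.isdisjoint(SPAM_WORDS)

def commit_change(save_page, text):
    """Refuse to commit changes that match the vocabulary."""
    if contains_spam_words(text):
        raise ValueError("change rejected: matches spam vocabulary")
    save_page(text)
}}}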

Change History

comment:1 Changed 14 years ago by lregebro

Post-moderation is easily done if you have a change log. Does this exist? In Wikipedia you just go to the revision history and re-save the last revision you like.

A spam-word vocabulary could be combined with not letting anonymous people commit these changes, but logged-in users should still be able to, I think.

Moderation of commits is very un-wiki...

comment:2 Changed 14 years ago by tziade

imo we should do post-moderation because:

  • moderation gives up on the wiki spirit
  • a word blacklist can't be efficient on its own and can almost always be evaded by spammers; or it must be adaptive, with a Bayesian classifier, but then we also need an "un-spam" function with it (isn't this getting heavy?)

post-moderation can do the following:

each commit is published and a mail is sent to the reviewer with the content. Below the content, a direct link lets the reviewer remove the post directly
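A rough sketch of that flow; the reviewer and sender addresses, the wiki base URL, and the "remove" endpoint are assumptions for illustration, not CPSWiki API:

{{{
#!python
import smtplib
from email.message import EmailMessage

REVIEWER = "reviewer@example.com"   # assumed reviewer address
WIKI_URL = "http://wiki.example.com"  # assumed wiki base URL

def notify_reviewer(page_id, revision, content):
    """Publish-then-notify: mail the committed content to the reviewer,
    with a direct removal link below the content."""
    msg = EmailMessage()
    msg["Subject"] = "Wiki commit on %s (rev %s)" % (page_id, revision)
    msg["From"] = "wiki@example.com"
    msg["To"] = REVIEWER
    remove_link = "%s/%s/remove?rev=%s" % (WIKI_URL, page_id, revision)
    msg.set_content("%s\n\nRemove this post: %s" % (content, remove_link))
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)
}}}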

more thoughts here:

 http://wiki.tcl.tk/12559

comment:3 Changed 14 years ago by tziade

good resource to look at on this subject:  http://chongqed.org/

their list of known spammers could be useful in public wikis, maybe

comment:4 Changed 14 years ago by madarche

I 100% agree that moderation is un-wiki. It was just listed here so that all options were considered.

As for post-moderation, I have the use case of a wiki which is spammed every day by a spam-bot.

So post-moderation cannot be the solution in this case, since it would require me to reject the change every day. This is too laborious.

An adaptive word blacklist seems more efficient.

comment:5 Changed 14 years ago by tziade

let's do adaptive post-moderation then:

For each post, you receive a warning if CPSWiki has decided it's spam.

In this warning you have an "unspam" link that will:

  • retrain the Bayesian classifier to tell it this was not spam
  • commit the post

This way, the system will learn *your* way of moderating.
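A self-contained sketch of what that loop could look like, with a toy naive Bayes classifier; the class and the unspam/publish names are illustrative, not CPSWiki API:

{{{
#!python
import math
import re
from collections import Counter

class TinyBayes:
    """Minimal naive Bayes text classifier (illustrative only)."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.posts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.words[label].update(re.findall(r"[a-z]+", text.lower()))
        self.posts[label] += 1

    def classify(self, text):
        tokens = re.findall(r"[a-z]+", text.lower())
        vocab = len(set(self.words["spam"]) | set(self.words["ham"])) or 1
        scores = {}
        for label in ("spam", "ham"):
            total = sum(self.words[label].values())
            # log prior + add-one-smoothed log likelihoods
            score = math.log(self.posts[label] + 1)
            for w in tokens:
                score += math.log((self.words[label][w] + 1.0) / (total + vocab))
            scores[label] = score
        return max(scores, key=scores.get)

def unspam(classifier, text, publish):
    """The "unspam" link: retrain as ham, then commit the post."""
    classifier.train(text, label="ham")
    publish(text)
}}}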

The only caveat I see is if suddenly everyone starts to talk about viagra in your wiki :)

comment:6 Changed 14 years ago by madarche

Great! I'm convinced! Let's just wait a bit to see if others have more ideas.

comment:7 Changed 14 years ago by fguillaume

In any case, you need a blacklist of words or URLs.

I think Bayesian is overkill for a start.

comment:8 Changed 14 years ago by fguillaume

  • Keywords spam wiki added; spam, wiki, removed

comment:9 Changed 14 years ago by lpelecq

A blacklist of regular expressions like  http://blacklist.chongqed.org/ or  http://moinmaster.wikiwikiweb.de/BadContent would be enough.

Spammers add their web site URLs to increase their page rank. It is not necessary to filter on anything other than URLs.

It takes time to train a Bayesian filter, and it is much easier to implement a blacklist.
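A sketch of that approach, assuming the blacklist is published as one regular expression per line (as the chongqed and MoinMoin BadContent lists appear to be); the function names are illustrative:

{{{
#!python
import re
import urllib.request

BLACKLIST_URL = "http://blacklist.chongqed.org/"

def load_blacklist(url=BLACKLIST_URL):
    """Fetch the blacklist and compile one regex per non-comment line."""
    with urllib.request.urlopen(url) as resp:
        lines = resp.read().decode("utf-8", "replace").splitlines()
    return [re.compile(line.strip(), re.IGNORECASE)
            for line in lines if line.strip() and not line.startswith("#")]

def is_blacklisted(text, patterns):
    """True if any blacklisted regex matches the submitted content."""
    return any(p.search(text) for p in patterns)
}}}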

Also, all wiki-generated links could have the attribute rel="nofollow" so that Google, at least, doesn't consider them for page ranking:

 http://www.google.com/intl/en/googleblog/2005/01/preventing-comment-spam.html
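And a naive sketch of the rel="nofollow" idea, post-processing rendered HTML with a regex; a real implementation would set the attribute in the wiki's link renderer instead:

{{{
#!python
import re

def add_nofollow(html):
    """Add rel="nofollow" to external anchors in rendered HTML."""
    def _tag(match):
        attrs = match.group(1)
        if "rel=" in attrs:
            return match.group(0)  # leave existing rel attributes alone
        return '<a %s rel="nofollow">' % attrs.strip()
    return re.sub(r'<a\s+([^>]*href="https?://[^"]*"[^>]*)>', _tag, html)

# e.g. add_nofollow('<a href="http://spam.example/">x</a>')
# -> '<a href="http://spam.example/" rel="nofollow">x</a>'
}}}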

comment:10 Changed 14 years ago by Tarek Ziadé <tz@…>

if it's just for URL spammers, I'm also convinced by the blacklist solution

comment:11 Changed 14 years ago by madarche

I agree with Laurent that we should implement a black list of URLs.

But I would rather not implement rel="nofollow" for the wiki content. The reason is that links in a wiki are valuable knowledge, and it is legitimate that those links, if mentioned, should have their rank increased in search engines like Google.

The blacklist check should be switchable on or off, and be off by default.

comment:12 Changed 14 years ago by fguillaume

I think you mean "should be on by default". Security first.

comment:13 Changed 14 years ago by madarche

No, I meant "off by default", because this way the wiki always works as the user expects.

Otherwise the user might not understand why some modifications don't go through. But if the user has explicitly asked for spam protection, she would not be surprised that some modifications are rejected.

But if the rejected modification is presented with a clear error message, then I agree the protection can be "on by default".

comment:14 Changed 14 years ago by tziade

this might be a little bit overkill, but:

Maybe we could specialize wikis in the creation process with a small option, to decide whether to activate it:

intranet wiki / extranet wiki

(...thinking of doing less processing in big intranet wikis)

comment:15 Changed 14 years ago by fguillaume

Or ask a few parameters when you create a new wiki, one of which would be this setting.

comment:16 Changed 14 years ago by lpelecq

It would be better to provide the blacklist as a URL. With several wikis, it wouldn't be convenient to copy the same blacklist into every wiki.
