Wikipedia & Linterweb

3 February 2011

Archiving of Wikipedia external links: the problem has been fixed

Filed under: wikiwix — Matthieu @ 15:10

Last week there was a very unfortunate incident involving the cache system used by, among others, the French-speaking Wikipedia. This cache system, run by the web company Linterweb, keeps archives of the external links used as footnotes in articles.
What happened is that someone reading the article La_Quatrième Prophétie opened the archive of the first footnote and was shown the page saved in the cache of our search engine, Wikiwix. So far, everything is normal.

Above the archived page, shown as it was saved in our cache, we display some information: the URL of the archived page, the day the page was saved in our cache, how to contact us, how a webmaster can prevent their site from being archived… In addition, for a few weeks we had been adding the three top links from Results in the recent web, a new feature of our web search engine. These links are not ads. They are simply links recently posted on Twitter and related to the archived page, as determined by our search engine. Click-throughs do not generate revenue for Linterweb. The links are generated by our twitter-search algorithm, which we are putting in place to return interesting, up-to-the-minute results around search terms or, in our case, around the archived page. You can see an example of this twitter search service here:!twitter/en/&action=Wikipedia. The basic idea is to show users recent, fresh material related to their search term or to the archived page. We’d like to make it clear that we don’t make any money from it. The feature was only meant to enhance the cache service we provide to the French-speaking Wikipedia.

Well, what happened is that the first of these three top links from Results in the recent web actually led to a football site (a site apparently related in some way to the archived page, as determined by our twitter-search algorithm), on which sexy ads were displayed.

Thus, dogged by bad luck (Wikipedia -> Wikiwix archive -> somehow related tweeted link -> football site -> sexy ad), our unfortunate user reached content that was unrelated to Wikipedia and certainly inappropriate.

We are sorry about that. We are of course all the more concerned because, besides this collaboration with the French Wikipedia on the archiving and search engine system, we also provide search engine services to Vikidia, a Wikipedia-like encyclopaedia intended for children from 8 to 13 years old!!! :-S You can probably understand now how seriously we take possible problems of this nature (however, I’d also like to remind you that it is possible to install parental control software; see the Wikipedia article Parental controls and its external links for more information).

We are working on a way to improve our algorithm so that it doesn’t show results that could lead to inappropriate content. In the meantime, we have disabled the feature.

If you have any comments, feel free to leave a message on our blog.

Take care 🙂 Matthieu.

Linterweb is a web company that has, for several years now, been developing various Wikipedia-oriented programs, including:

  • Wikiwix, a semantic web search engine that returns results only from the databases of the Wikimedia Foundation projects; My Wikiwix, your own search engine for your own website; a mobile version of Wikiwix;
  • Okawix, the offline Wikipedia browser, free of copyrights and free of charge, that lets you read the articles of the various Wikimedia Foundation projects offline, as well as archives of your own website;
  • a DVD of around 2000 articles from the English-speaking Wikipedia; a USB flash drive that contains version 0.7 of the English-speaking Wikipedia;
  • a program that archives the external web pages linked from Wikipedia articles (that is, web pages outside Wikipedia but linked from a Wikipedia article), so that their content remains available and those external links don’t get broken; this program is used automatically, in particular, for all external links of the French-speaking Wikipedia.
