Wikipedia & Linterweb

15 December 2010

Wikiwix indexes tweeted web pages

Filed under: wikiwix — Matthieu @ 18:40

Dear all,

Wikiwix, the semantic search engine run by the web company Linterweb, has so far returned results only from the databases of the Wikimedia Foundation projects. From now on, it also indexes tweeted web pages.

The idea is very simple: when a tweet contains a link to a web page, the content of that page is indexed in real time and the page appears immediately among the search results. Wikiwix thus indexes about 150 pages per second in real time, matching the roughly 150 “Tweets with links” posted on Twitter every second on average.
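To make the idea concrete, here is a minimal sketch in Python of how links in a tweet could feed an index. It is not Wikiwix's actual code: the `RecentWebIndex` class, the `fetch_page` callable and the simple word-to-URL index are all assumptions chosen for illustration, and page fetching is left to the caller so the example runs without network access.

```python
import re
from collections import defaultdict

# Rough pattern for links inside a tweet (an assumption; real tweets
# usually carry shortened URLs that would need resolving first).
URL_PATTERN = re.compile(r"https?://[^\s]+")


def extract_links(tweet_text):
    """Return the URLs found in a tweet, in order of appearance."""
    # Strip trailing punctuation that belongs to the sentence, not the URL.
    return [url.rstrip(".,;:!?)") for url in URL_PATTERN.findall(tweet_text)]


class RecentWebIndex:
    """Toy in-memory index: each word maps to the set of URLs containing it."""

    def __init__(self, fetch_page):
        # fetch_page is a hypothetical callable: url -> page text.
        self.fetch_page = fetch_page
        self.postings = defaultdict(set)

    def ingest_tweet(self, tweet_text):
        """Index every page linked from the tweet; return the URLs indexed."""
        indexed = []
        for url in extract_links(tweet_text):
            text = self.fetch_page(url)
            for word in re.findall(r"\w+", text.lower()):
                self.postings[word].add(url)
            indexed.append(url)
        return indexed

    def search(self, term):
        """Return the set of indexed URLs whose text contains the term."""
        return self.postings.get(term.lower(), set())
```

In this sketch a page becomes searchable as soon as `ingest_tweet` returns, which mirrors the "indexed in real time" behaviour described above on a very small scale.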

Example: Let’s assume that you are searching for web pages relevant to the search term “Wikileaks”. The two most recent results (that is, the last two web pages that a tweet links to and that are relevant to the search term “Wikileaks”) are displayed in reverse chronological order (the most recently tweeted pages are shown first, in Last In First Out order). They appear at the top of the result page, just above the classical search results, in a panel called “Results on the recent web”. For each result, the following information is provided:

  • Title: The first line of each result shows the title of the tweeted web page. Click this title to reach that web page.
  • Snippet: Below the title is a short description of the web page, sometimes an excerpt of text from the page. This helps you decide quickly whether the result really fits your search.
  • URL: The web address of the web page appears in green.
  • Time passed: The amount of time passed since the page was tweeted appears in brackets. Google News, by contrast, indicates the time passed since the page was indexed (sometimes hours after the publication of the page! That is, often slower than Wikiwix 🙂 ).

Then click on the “Plus” symbol, at the upper right corner of this result panel, to open a page with more recent results (not only the last two of the list). You now see the whole list of recent-web results with, for each result, the same information as just described (Title, Snippet, URL, time passed) plus the tweet itself and a link to the tweet in which the web page was posted.
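The ordering of the panel and of the full list can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Result` fields, the substring match used for relevance and the wording of the elapsed-time label are hypothetical and are not taken from Wikiwix itself.

```python
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    snippet: str
    url: str
    tweeted_at: float  # time of the tweet, in seconds since the epoch


def format_age(seconds):
    """Turn an elapsed time in seconds into a short label."""
    seconds = int(seconds)
    if seconds < 60:
        return f"{seconds} s ago"
    if seconds < 3600:
        return f"{seconds // 60} min ago"
    return f"{seconds // 3600} h ago"


def recent_results(results, term, now, limit=None):
    """Return (result, age label) pairs, most recently tweeted first.

    limit=2 gives the small panel; limit=None gives the full list
    shown after clicking the "Plus" symbol.
    """
    term = term.lower()
    # Naive relevance test (an assumption): the term appears in title or snippet.
    matches = [r for r in results if term in f"{r.title} {r.snippet}".lower()]
    # Last In First Out: the newest tweet comes first.
    matches.sort(key=lambda r: r.tweeted_at, reverse=True)
    if limit is not None:
        matches = matches[:limit]
    return [(r, format_age(now - r.tweeted_at)) for r in matches]
```

Calling `recent_results(results, "wikileaks", now, limit=2)` would produce the two entries for the panel, and the same call without `limit` the longer list.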

Until now, Wikiwix had been indexing the background articles on a given subject, that is, the articles of the various Wikimedia Foundation projects. From now on, Wikiwix will also index the pages of the recent web. Used together, the two functionalities will complement each other wonderfully.
This new feature is already available in German, English, Spanish, French and Dutch.

We would be happy to read your comments, ideas, suggestions…

Take care 🙂 Matthieu.

Linterweb is a web company that has, for several years now, been developing various Wikipedia-oriented programs, including:

  • Wikiwix, a semantic web search engine that gives results only from the databases of the Wikimedia Foundation projects; My Wikiwix, your own search engine for your own website; a mobile version of Wikiwix;
  • Okawix, the offline Wikipedia browser, free of copyrights and free of charge, that lets you read offline the articles of the various Wikimedia Foundation projects, as well as archives of your own website;
  • a DVD of around 2000 articles from the English-language Wikipedia; a USB flash drive that contains version 0.7 of the English-language Wikipedia;
  • a program that archives the external web pages of the Wikipedia articles (that is, the web pages outside Wikipedia but linked from a Wikipedia article), so that their content remains available and those external links don’t break; this program is used automatically, in particular, for all external links of the French-language Wikipedia.
