Reverse Zone
 

Reverse Zone, weblog on urban planning, sustainability, and technology.

Martin Laplante

Tue, 08 Jan 2008

Wikia Search Launches, Minus the Unique Features

Wikia Search, the latest "Google Killer", has launched in alpha with much hype. What launched seems to have none of the features the hype is about. Wisdom of crowds? Socially driven search? Neither is there in the alpha, as far as I can tell.

But what is there is interesting. It recycles some well-known components: good old Grub, a distributed crawler that is now apparently open source and is so annoying that webmasters regularly ban it from their web sites, and Lucene/Nutch, a relatively unsophisticated open source search engine. Ho hum, just another amateur search engine startup. But Wikia does some unique things that I quite appreciate. It lets you download the source code for the search engine. And for every search, it lets you peek at most of the calculations and weights that produce the ranking of the web pages.

The algorithm is pretty standard tf-idf stuff. But it tells you the term frequencies and the document frequencies it is using. For instance, for one page of one of my sites it showed document frequencies like "24", while the ranking of a different site was based on tens of thousands of documents for the same term. It tells you all the factors it considers and all of the weights and exponents. For instance, the tf-idf score of search terms found in the title is raised to the power of 1.5, while the weight in the URL is raised to an amazing power of 4, with another power of 2 for the keyword in the hostname.
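To make those exponents concrete, here is a minimal Python sketch of how per-field tf-idf scores could be boosted this way. It uses a textbook tf-idf variant, not necessarily the exact Lucene/Nutch formula, and the way the per-field scores are combined (a plain sum), the exponent of 1.0 for body content, and all the names are my own assumptions rather than anything Wikia Search documents.

```python
import math

# Exponents for title, url and host are the ones quoted above.
# The "content" exponent of 1.0 is an assumption for illustration.
FIELD_EXPONENTS = {"title": 1.5, "url": 4.0, "host": 2.0, "content": 1.0}


def tf_idf(term_freq, doc_freq, num_docs):
    """Textbook tf-idf: raw term frequency times log(N / df)."""
    if term_freq == 0 or doc_freq == 0:
        return 0.0
    return term_freq * math.log(num_docs / doc_freq)


def score(field_term_freqs, doc_freq, num_docs):
    """Sum the per-field tf-idf scores, each raised to its field exponent.

    Summing the fields is an assumption; the real engine may combine
    them differently.
    """
    total = 0.0
    for field, term_freq in field_term_freqs.items():
        base = tf_idf(term_freq, doc_freq, num_docs)
        total += base ** FIELD_EXPONENTS.get(field, 1.0)
    return total


# Hypothetical numbers: the term appears in 10 of 1000 indexed documents.
title_only = score({"title": 1}, doc_freq=10, num_docs=1000)
url_only = score({"url": 1}, doc_freq=10, num_docs=1000)
print(title_only, url_only)
```

With a base score above 1, a single match in the URL outweighs a single match in the title by a wide margin, which is why the power of 4 looks so generous. It also shows why a tiny document frequency matters: the fewer documents the index knows for a term, the less reliable the idf part of the score.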

Now this "explain" facility does not explain the entire ranking. There are some unexplained differences between the explained and the actual ranking, and some ability for community members to participate. Sounds interesting. When I look at the participation so far, it seems pretty idiosyncratic. Lots of open source type sites receive favourable bias. The input is signed, and the contributors include Jimmy Wales. I decided to give a boost to the site of a complete stranger whose site ranks poorly and looks terrible but has good content, just for fun.

Google Killer? Not by a long shot. I wouldn't trust a search engine that is so easy for people like me to manipulate. The algorithms are still too rudimentary to be used in public. It doesn't have basic protection against SEO techniques, and I'm not sure that relying on people with time on their hands to manually re-rank queries is a reliable and scalable solution. Still, it gives some interesting insights into why some sites rank highly in other search engines.
