Google

Note: since I wrote this, Google have changed their service to give a location-specific result by default; in most locations my home page is no longer in the top 3 for the search term "hamish" (though it still is on google.co.uk/search?q=hamish and sometimes is on google.com/search?q=hamish).

Note 2: and more recently still, your Google results are likely to be tailored to you specifically — so you can expect to see your great Uncle Hamish way up there instead of little old me...

Google made two very significant technical advances. First, they realised (along with some others at the time) that every hyperlink on the Web is a judgement of the relevance of a particular page to a query. For example, if I insert the link "Hamish's home page", I have in effect made a comment about the relevance of my home page to queries that contain the words "hamish", "home" and "page". I've also made it more likely that a random walk of the web would reach my page and, if this page itself becomes important, then it will in turn lend more importance to the pages it links to. Combine these insights with normal methods in Information Retrieval, which measure the relative salience of terms against large document collections, and you have the essence of Google's search algorithm.
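
To make the random-walk idea concrete, here is a minimal sketch of the textbook "random surfer" calculation usually published under the name PageRank. It is an illustration only, not Google's production algorithm: the function name, the tiny example graph and the damping value of 0.85 (the figure usually quoted in the literature) are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page importance from links alone.

    links maps each page name to the list of pages it links to.
    damping is the chance that the random surfer follows a link
    rather than jumping to a random page (0.85 is an assumed value).
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with importance spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline from random jumps.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page shares its importance among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links shares with everybody.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: two pages link to "hamish",
# and "hamish" links on to "c".
example_web = {"a": ["hamish"], "b": ["hamish"], "hamish": ["c"], "c": []}
example_ranks = pagerank(example_web)
```

In this toy web, "hamish" ends up with a higher score than "a" or "b" because two pages link to it, and "c" scores well purely because an important page links to it, which is the "lending importance" effect described above.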

Their second innovation was to make both the collection and the querying of the models that they build from the Web fully distributed. That is, there is no single point that has to work in order for your query to succeed; instead there are thousands of machines working in parallel, and when one breaks nobody really cares. This has allowed Google to scale their system rapidly to cope with the truly enormous processing load that they now support.
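
The "no single point of failure" idea can be sketched in a few lines. What follows is a toy simulation under stated assumptions, not a description of Google's infrastructure: each "machine" is just a hypothetical function holding its own slice of the documents, threads stand in for separate computers, and all the names are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def make_shard(documents, broken=False):
    """Return a hypothetical shard: a function that searches its own documents."""
    def search(term):
        if broken:
            # Simulate a machine that has fallen over.
            raise ConnectionError("shard unavailable")
        return [doc for doc in documents if term in doc.lower().split()]
    return search


def distributed_search(shards, term):
    """Ask every shard in parallel and merge whatever comes back."""
    hits = []
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        futures = [pool.submit(shard, term) for shard in shards]
        for future in as_completed(futures):
            try:
                hits.extend(future.result())
            except ConnectionError:
                # One broken machine does not stop the query.
                continue
    return sorted(hits)


shards = [
    make_shard(["Hamish home page", "Sheffield weather"]),
    make_shard(["Uncle Hamish photos"], broken=True),
    make_shard(["hamish on search engines"]),
]
results = distributed_search(shards, "hamish")
```

Here the query still succeeds even though the second shard is down, although its documents are silently missing from the results. A real system would presumably keep several copies of each shard so that a failure costs nothing; that replication is left out of this sketch.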

There were other reasons why Google became the pre-eminent search engine (for example, the fast-loading, simple design of their front page), but from a technical point of view the key to the story is links as relevance judgements and scalability via distributed algorithms.

There are advantages and disadvantages, of course. One of the disadvantages of using links is that relevance becomes a popularity contest. For example, if you type "hamish" into Google, my home page is usually one of the top three hits (but see the notes above about google.com etc.). This is nice for the rather small number of people who search for me, but doesn't necessarily reflect the global significance of one more nerd sitting in an office in Sheffield. It just means that (amongst other things) more people have linked to my page than to most others. Similarly, Google probably also privileges older material over newer, and my home page is, in Web terms, ancient (birthday: January 15th 1995).