Tuesday, October 03, 2006

Beyond searching

Google Scholar may provide an easy way to search. However, with the constantly increasing quantity of scholarly data, Google Scholar will soon be facing a new challenge, as will database providers and metasearch systems: the comprehensive presentation of search results to the user.

The assumption underlying the implementation of relevance ranking and its use as a sorting order is that end-users will not scroll down and scan large amounts of data. Therefore, the results that are most likely to suit their research needs should appear at the top of the list. However, this sorting order has several drawbacks. As mentioned earlier, users have different research needs, and an item that is most relevant to one user may be less relevant to another.

Another problem with presenting search results in any type of linear list is that the result set is sometimes very large. Some users, particularly novices, may not know how to define their queries effectively; however, once the system analyzes the set of results and provides options to narrow down the list, such users can easily drill down to the relevant subset of results.

Several companies have developed technologies that enable sites to cluster search results and offer drill-down options to end-users. One such company is Vivísimo, whose technology can be seen on the Web site of the Institute of Physics (IOP).

I am looking for information about the sine-Gordon equation. When I search the IOP Web site, the traditional display provides a list of 95 articles. However, I can opt to see the list clustered. As explained on the IOP site, "when you cluster your search results, you will find them presented (unchanged) on screen alongside folders representing the clusters generated. The folders are sorted according to the number of search results in each, and according to the overall rank of the individual search results in the search engine's output". I can select any of these topic clusters, thus narrowing down my list of results, and I can drill down even further and see only the results for a particular subtopic. In this example, I quickly identify "soliton" as the topic of interest, thus decreasing the number of relevant results to 35; and if I am seeking information about magnetic fields, I can drill down further to the "magnetic fields" subtopic and see a list of four records. Note, however, that the Vivísimo IOP implementation clusters only the first 250 records.
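
The folder-and-drill-down behavior described above can be illustrated with a small Python sketch. This is not Vivísimo's algorithm, which derives clusters from the text of the results; here I simply assume that each record already carries topic labels. The function names, the record layout, and the sample data are all hypothetical. Only the ordering rule (larger folders first, then better overall rank) and the 250-record cap are taken from the description above.

```python
from collections import defaultdict

# The IOP implementation is described as clustering only the first 250 records.
MAX_CLUSTERED = 250


def cluster(results, limit=MAX_CLUSTERED):
    """Group ranked results into folders keyed by topic label.

    `results` is a list of (rank, title, topics) tuples sorted by rank,
    where `topics` is a set of labels (an assumption of this sketch).
    Returns a list of (label, records) pairs, largest folder first; ties
    are broken by the best (lowest) rank found inside the folder.
    """
    folders = defaultdict(list)
    for record in results[:limit]:
        for topic in record[2]:
            folders[topic].append(record)
    return sorted(
        folders.items(),
        key=lambda item: (-len(item[1]), min(r[0] for r in item[1])),
    )


def drill_down(results, topic):
    """Keep only the records that carry the given topic label."""
    return [record for record in results if topic in record[2]]


# Hypothetical sample data standing in for a ranked result list.
results = [
    (1, "Paper A", {"soliton", "magnetic fields"}),
    (2, "Paper B", {"soliton"}),
    (3, "Paper C", {"josephson junction"}),
    (4, "Paper D", {"soliton", "josephson junction"}),
]

folder_labels = [label for label, _ in cluster(results)]
solitons = drill_down(results, "soliton")
soliton_magnetic = drill_down(solitons, "magnetic fields")
```

Applying `drill_down` twice mirrors the two-step narrowing in the example: first to the "soliton" folder, then to its "magnetic fields" subtopic. The unclustered list itself is left unchanged, just as the IOP site presents it.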

Conclusions

Google Scholar is attracting growing attention from libraries, patrons, and publishers, whether or not librarians approve. Depending on Google's plans, Google Scholar may turn into a core resource for researchers. Perhaps the library community should encourage patrons to use this search engine when appropriate and keep a watchful eye on the quality of the results.

Google's attentiveness to the library community, as evidenced by the rapid implementation of the OpenURL standard in Google Scholar, indicates that this service might well be evolving in the right direction. Nevertheless, it is not likely to replace metasearch systems in the short term. A locally controlled and branded system that enables librarians to offer accurate, up-to-date, subject-specific research data and to customize relevant services renders metasearch systems highly valuable to the scholarly community.
