Microsoft Academic Search – A Review with Comparisons

An Overview

Microsoft Academic Search (MAS) is a free search engine for academic papers and resources, principally in the field of computer science, developed by Microsoft Research Asia in Beijing. Its database consists of the bibliographic information (metadata) for academic papers published in journals and conference proceedings, along with the citations between them.

Specific Features

Computer Science Directory
Microsoft Academic Search treats Papers, Authors, Conferences, Journals and Organizations as different types of objects in the research literature. These objects are categorized in the Computer Science Directory, which contains rank lists for each object type within the 24 research fields of Computer Science. In each rank list, a year filter is available to pick out key new scholars.

Object Detail Page
In Microsoft Academic Search, each paper, author, organization, conference or journal has a profile page, which shows detailed information about that object. If an entity matches the search query, MAS automatically returns its object detail page.

User Edit functions
Microsoft Academic Search is open for users to edit the content. Users can make corrections or updates to an author profile, publication list or paper information directly online. They can also update a publication list by uploading a PDF or BibTeX document.

Call for Papers Calendar
The Call for Papers Calendar shows the paper submission deadlines of a conference, as well as the location and dates of the event. Users can choose the conferences they're interested in, create their own conference list and view all related information in the calendar.

Co-author Network/Path
The Visual Explorer page presents the co-author relationships among scholars. On the Co-author Graph UI, each node represents an author, and a bigger node means the author has more publications. The more papers two authors have written together, the closer their nodes are positioned. The graph also contains other information, such as the paper count and basic information of each author and the number of papers co-written by two authors. The Co-author Path enables users to discover the relationship path between two scholars. The name at each end of the path can be changed to run a different search. Both graphs can be embedded in other websites.

Reviews

We evaluate an academic search engine against five categories of criteria: knowledge discovery, socialization, user experience, technical quality of service and extensibility.

1. Knowledge Discovery

- Keyword Mining

The coverage of the MAS database is much narrower than that of popular search engines such as Google Scholar, and most of its articles come from the field of Computer Science. For the toy query "Map Reduce", MAS returned 1,485 results, far fewer than the 2,710,000 results that Google returned.

MAS is driven by object-level vertical search technology. Objects are ranked according to two factors: their relevance to the query, which is computed from their attributes; and their global importance, which is calculated from their relationships with other objects. However, the ranking algorithm of MAS is not satisfactory. Its natural language processing is poor, and the relevance between papers and the query is clearly underweighted. For the keyword "click model", an established term in the field of information retrieval (IR), only 6 of the top 10 search results are actually related to the click model concept in IR, as the list below shows.

Results	Relevant
NP-Click: A Programming Model for the Intel IXP1200 (Citations: 48)	N
A dynamic bayesian network click model for web search ranking (Citations: 12)	Y
Click chain model in web search (Citations: 3)	Y
Incorporating post-click behaviors into a click model	Y
Incremental click-stream tree model: Learning from new users for web page prediction	N
Analysis of FM Systems with Co-Channel Interference Using a Click Model	N
Temporal click model for sponsored search	Y
A novel click model and its applications to online advertising	Y
The One-Click Grid-Resource Model	N
Pay-per-action Model for Online Advertising (Citations: 7)	Y

Even the first result is totally irrelevant, because the search engine simply treats the query as two separate words and ranks the documents almost entirely by citation count. Because MAS gives citation count so much weight, it tends not to rank the newest documents near the top, so it may be harder to find out what is going on recently in an academic field by using MAS.
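To make the complaint concrete, here is a minimal sketch of a scoring rule that rewards an exact phrase match and dampens the citation count with a logarithm. This is not MAS's actual algorithm, which is not public; the function names and the weights `phrase_bonus` and `citation_weight` are assumptions chosen only for illustration. The sample data uses only the three results from the table above whose citation counts are shown.

```python
import math
import re


def tokens(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def has_phrase(title, query):
    """True if the query tokens appear contiguously, in order, in the title."""
    t, q = tokens(title), tokens(query)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


def score(title, citations, query, phrase_bonus=2.0, citation_weight=0.1):
    """Hypothetical score: term overlap + phrase bonus + damped citations."""
    t, q = tokens(title), tokens(query)
    term_overlap = sum(1 for w in q if w in t) / len(q)
    phrase = phrase_bonus if has_phrase(title, query) else 0.0
    return term_overlap + phrase + citation_weight * math.log1p(citations)


def rank(papers, query):
    """Sort (title, citations) pairs by descending score."""
    return sorted(papers, key=lambda p: score(p[0], p[1], query), reverse=True)


# Titles and citation counts taken from the result table above.
PAPERS = [
    ("NP-Click: A Programming Model for the Intel IXP1200", 48),
    ("A dynamic bayesian network click model for web search ranking", 12),
    ("Click chain model in web search", 3),
]

for title, cites in rank(PAPERS, "click model"):
    print(f"{score(title, cites, 'click model'):.2f}  {title}")
```

With these assumed weights, the paper that contains the phrase "click model" moves above the NP-Click paper even though it has fewer citations. "Click chain model" still gets no phrase bonus because the two query words are not adjacent in its title, so a real system would also need some notion of term proximity or topic.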

- Topic/Category Mining

MAS maintains category lists for its database, while Google Scholar does not. It treats Papers, Authors, Conferences, Journals and Organizations as different types of objects in the research literature.
According to MAS's own claims, it provides almost complete and detailed categories, and the classification process is automatic, based on object detection technology proposed by Microsoft Research Asia.

MAS creates a page for each object, such as a conference or a paper, which shows all the detailed information about that object. This feature is a very good way to understand the current academic situation and trends, while other services don't have such useful tools.
For example, the figure below shows the page for a researcher in MAS's database. MAS provides his basic information, such as name, local name and affiliation; statistics such as publication count, citation count, H-index and G-index; and his likely interests, which are generated by a machine learning algorithm. MAS also plots how his publication and citation counts change over time, which reflects how academically active he has been. It is awesome that all the information on such an object page is generated automatically by machines, with very little manual work.
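For readers unfamiliar with the two indices on the author page, here is a small sketch of how they are commonly defined: the H-index is the largest h such that h papers have at least h citations each, and the G-index is the largest g such that the top g papers have at least g² citations in total. How MAS itself computes them is not documented here, and the citation list in the example is made up for illustration.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(ranked, start=1):
        if c >= i:
            h = i
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers have >= g*g citations in total."""
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for i, c in enumerate(ranked, start=1):
        total += c
        if total >= i * i:
            g = i
    return g


# Made-up citation counts for one hypothetical author.
example = [10, 8, 5, 4, 3]
print(h_index(example), g_index(example))
```

The G-index is never smaller than the H-index, because it gives credit for a few very highly cited papers that the H-index ignores.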

- Knowledge Network

For a particular paper, both the publications it refers to and the publications that cite it are listed on that paper's page, which is a basic function of an academic search engine. To my surprise, however, MAS sometimes also gives the citation context, which is a considerate feature for users.

The disadvantage of MAS in knowledge network mining is the lack of a "related works to a paper" feature. Not all related works are linked by citation, so such a feature is really helpful when the user wants to collect all the literature in a narrow field. Google Scholar has implemented this feature, drawing on its strong experience in document relevance and ranking.

- Accessibility

Once the user has found the paper she wants, she can easily access its full text. If the corresponding PDF file is freely accessible, a download button is shown to the right of the title of each search result, as the figure below shows.

On the particular paper page, MAS also lists all accessible sources of PDFs as the figure below shows.

I think that MAS does an excellent job of helping users access academic materials from the search results.

2. Socialization

- Academic Social Network Mining

Google seems too proud to improve its social skills, while other services such as ArnetMiner and MAS actively "socialize" their offerings, which seems to fit the trend of the modern world.
In MAS, the Visual Explorer is a tool that visualizes the so-called co-author graph, in which scholars are nodes and their co-author relationships are edges. It provides two view modes, "coauthor graph" and "coauthor path". The former mode centers on the target person and shows the links to his co-authors, with distances based on how frequently they worked together. See the figure on the left, which shows that S.J. Shenker is really an academic social activist. The latter mode shows several social paths from one target person to another. See the figure on the right, which tells us that our teacher Xiaoming Sun's Erdős number is 3, and that he would need to collaborate with other scholars to improve it. Visual Explorer impressed me with its tidy user interface and fancy animation, which are built on Silverlight.

Its competitor ArnetMiner extends this feature further by showing advisor-advisee relationships, so that users can easily find out who is whose student and infer how academic affiliations and factions grow.
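The "coauthor path" view described above boils down to a shortest-path search on the co-author graph, and an Erdős number is just the length of the shortest such path to Erdős. Below is a minimal sketch using breadth-first search. I do not know how MAS implements it; the graph, the author names and the function name `coauthor_path` are all hypothetical.

```python
from collections import deque


def coauthor_path(graph, start, goal):
    """Return a shortest co-author path from start to goal, or None."""
    if start == goal:
        return [start]
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        author = queue.popleft()
        for coauthor in graph.get(author, ()):
            if coauthor in parents:
                continue
            parents[coauthor] = author
            if coauthor == goal:
                # Walk back through the parents to rebuild the path.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(coauthor)
    return None


# Hypothetical co-author graph as an adjacency list (undirected).
GRAPH = {
    "Erdos": ["A"],
    "A": ["Erdos", "B"],
    "B": ["A", "C"],
    "C": ["B"],
    "D": [],
}

path = coauthor_path(GRAPH, "C", "Erdos")
print(path, "distance:", len(path) - 1)
```

In this toy graph the hypothetical author C is three co-authorship steps away from Erdős, and the isolated author D has no path at all.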

- Author Information - Name Disambiguation

Both MAS and ArnetMiner provide name disambiguation to distinguish different authors with the same name. But I think the user interface of this feature in MAS is much better than ArnetMiner's. In the latter's interface I could not even tell that this is a name disambiguation service, since it just lists all the results the way a web search engine does.

- User Participation

Users can participate in enriching MAS's database and can edit most of the content (see the figure below). Users can make corrections or updates to an author profile, publication list or paper information directly online. They can also update a publication list by uploading a PDF or BibTeX document.

User Experience

- User Interface

MAS's UI is clearly better than those of ArnetMiner and Google Scholar. It is well-colored, tidy, clear, graceful, and consistent with Bing's style. To be honest, it is more efficient to browse search results in Google Scholar's UI, with Google's minimalist philosophy. But I prefer the balance between good looks and efficiency that MAS strikes.

- Responding Speed

MAS requires users to install Silverlight, while ArnetMiner requires users to install Flash. These plugins increase page-loading latency significantly.

To test page-loading latency empirically, we used http://www.webpagetest.org as our testing tool. The results show that Google Scholar has by far the shortest latency. Both MAS and ArnetMiner are extremely slow (10s to 20s) on the first page load. But since ArnetMiner's script takes up a large fraction of its loading time, MAS is better than ArnetMiner on the second page load.

Google Scholar for Searching "Click Model"

ArnetMiner for Searching "Click Model"

MAS for Searching "Click Model"

Content breakdown for Google, ArnetMiner, and MAS.

3. Technical Quality of Service

- APIs

Neither MAS nor Google provides a public API. ArnetMiner, by contrast, provides a RESTful API for third-party developers.

- Coverage & Update Rate

On its homepage, MAS claims to cover more than 8 million publications and 5.7 million authors as of March 2011, with weekly updates. I had two papers published last year in the proceedings of CIKM 2010. Six months after CIKM, MAS still has not indexed my name, "Botao Hu", whereas Google Scholar updated its database with my information only one week after the conference ended. Clearly, MAS's indexing update rate is unsatisfactory compared with Google Scholar's.

Summary

Based on the reviews above, we see that MAS has done less work on search quality and more work on fancy features. So I would like to suggest that Microsoft Academic Search improve in the following ways:

Improve the search relevance and ranking algorithm.
Improve the update rate of the index.
Broaden the indexed fields beyond Computer Science.
Become more socialized. For example, provide a "like" button as Facebook does. With such social voting information, MAS could recommend papers to users.
Decrease the latency of page loading.
Add search suggestions (auto-complete).

Author: Amber

Posted on 2011-03-12 01:28 on take it and go