Search Service Components

1.Index Engine

  When you specify a content source such as a folder, a web site, Exchange folders, or Business Data, the Index Engine is responsible for parsing that content and placing the text results in the content index and the filtered properties in the Property Store.

2.Query Engine

  Once all desired content is indexed, the Query Engine is responsible for making that data available by returning results based on clients' queries. The Query Engine accepts keyword queries as well as SQL syntax queries. Given a query from a web part, custom code, or another client, it retrieves the results from the indexed content and returns them to that client.

3.Protocol Handlers

  Some content sources store properties that are not native to Microsoft, so a Protocol Handler must be created to be able to look inside those documents. Once a Protocol Handler exists that can expose the internal contents, the Index Engine can take those properties and populate them into the content index or Property Store.

4.IFilters

  IFilters work in conjunction with Protocol Handlers. The Protocol Handler provides the means to get into the document, and the IFilter is responsible for taking that content and parsing it for its text and properties.

5.Content Index

  The content index is essentially a data store. It is a binary data source that contains the information about words and their location in a content item.

6.Property Store

  The Property Store is also a data store that stores a table of name/value pairs.

7.Search Configuration Data

  Stores information used by the Search service, including crawl configuration, property schema, scopes, and so on.

8.Wordbreakers

  Used by the query and index engines to break compound words and phrases into individual words or tokens.

9.Content Crawling

  The index engine uses a pipe of shared memory to request that the Filter Daemon begin filtering the content source. For the crawl process to succeed, the content source must have an associated protocol handler that can read its protocol. The Filter Daemon invokes the appropriate protocol handler for the content source based on the start address provided by the index engine. The Filter Daemon uses protocol handlers and IFilters to extract and filter individual items from the content source.

   The appropriate IFilter is applied to each document, and the Filter Daemon passes the extracted text and metadata to the index engine through the pipe.
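   The dispatch described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Filter Daemon's actual interface: the names `pick_protocol_handler`, `pick_ifilter`, and `crawl` are hypothetical, handlers are modelled as plain functions keyed by the URL scheme of the start address, and IFilters are modelled as functions keyed by file extension.

```python
import os
from urllib.parse import urlparse


def pick_protocol_handler(start_address, handlers):
    """Choose a handler from the scheme of the start address (assumed convention)."""
    scheme = urlparse(start_address).scheme
    if scheme not in handlers:
        # Without a matching protocol handler the crawl cannot succeed.
        raise LookupError(f"no protocol handler registered for {scheme!r}")
    return handlers[scheme]


def pick_ifilter(item_name, ifilters):
    """Choose an IFilter from the item's file extension, or None if there is none."""
    extension = os.path.splitext(item_name)[1].lower()
    return ifilters.get(extension)


def crawl(start_address, handlers, ifilters):
    """Yield (name, text, metadata) for every item that has a usable IFilter."""
    handler = pick_protocol_handler(start_address, handlers)
    for name, raw in handler(start_address):
        ifilter = pick_ifilter(name, ifilters)
        if ifilter is None:
            continue  # this sketch simply skips items it cannot filter
        text, metadata = ifilter(raw)
        yield name, text, metadata
```

   In this sketch the generator plays the role of the pipe: each yielded tuple is what would be handed back to the index engine.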

   At this point in the content crawling process, the index engine saves document properties to a property store separate from the content index. The property store consists of a table of properties and their values. Properties in this store can be retrieved and stored. In addition, the store supports simple queries against properties. Each row in the table corresponds to a separate document in the full-text index. The actual text of a content item is stored in the content index, so it can be used for content queries. The property store also maintains and enforces the document-level security that is gathered when a document is crawled.
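   As a rough mental model, the property store behaves like a table with one row of name/value pairs per document plus the security information gathered at crawl time. The class below is a toy sketch under that assumption; `PropertyStore` and its methods are hypothetical names and say nothing about the real storage format.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyStore:
    """Toy property store: one row of name/value pairs per document."""
    rows: dict = field(default_factory=dict)  # doc_id -> {property name: value}
    acls: dict = field(default_factory=dict)  # doc_id -> principals allowed to read

    def store(self, doc_id, properties, allowed):
        """Save a document's properties and the security gathered at crawl time."""
        self.rows[doc_id] = dict(properties)
        self.acls[doc_id] = set(allowed)

    def retrieve(self, doc_id):
        """Return the stored properties for one document."""
        return self.rows[doc_id]

    def query(self, name, value):
        """Simple property query: ids of documents whose property equals value."""
        return sorted(d for d, props in self.rows.items() if props.get(name) == value)

    def can_read(self, doc_id, user):
        """Document-level security check."""
        return user in self.acls.get(doc_id, set())
```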

   Next, the index engine uses wordbreakers and stemmers to further process the text and properties picked up during the crawl. The wordbreaker component breaks the text into words and phrases. The stemming component generates inflected forms of a given word. The index engine also removes noise words and creates an inverted index for full-text searching.
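   To make the inverted index concrete, here is a minimal Python sketch of wordbreaking, noise-word removal, and index construction. It is an illustration only: the regular-expression wordbreaker, the `NOISE_WORDS` list, and the function names are assumptions, and stemming is left out to keep the example short.

```python
import re
from collections import defaultdict

# Illustrative noise-word list; a real deployment uses per-language noise word files.
NOISE_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def break_words(text):
    """Very simple wordbreaker: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each word to {doc_id: [positions]}, skipping noise words."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        for position, word in enumerate(break_words(text)):
            if word in NOISE_WORDS:
                continue  # noise words are not indexed
            index[word][doc_id].append(position)
    return index
```

   Positions are counted before noise words are dropped, so the index still records where each word sits in the original item.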

10.Search Query Execution

   When a search query is executed, the query engine passes the query through a language-specific wordbreaker. If there is no wordbreaker for the query language, the neutral wordbreaker is used; it does whitespace-style wordbreaking, meaning that breaks occur wherever there is whitespace in the words and phrases. After wordbreaking, the resulting words are passed through a stemmer to generate language-specific inflected forms of a given word. Using a wordbreaker and a stemmer in both the crawling and query processes makes search more effective, because more relevant alternatives to a user's query phrasing are generated.

   When the query engine executes a property value query, the index is checked first to get a list of possible matches. The properties for the matching documents are loaded from the property store, and the properties in the query are checked again to ensure that there was a match. The result of the query is a list of all matching results, ordered according to their relevance to the query words. If the user does not have permission to a matching document, the query engine filters that document out of the list that is returned.
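   The query steps above can be sketched as follows. This is a simplified illustration, not the query engine's real behavior: `neutral_wordbreak` and `run_query` are hypothetical names, stemming is omitted, the index is a plain dictionary of word to {doc_id: [positions]}, and relevance is approximated by counting word occurrences.

```python
def neutral_wordbreak(query):
    """Whitespace-style wordbreaking, as the neutral wordbreaker is described."""
    return query.lower().split()


def run_query(query, index, properties, acls, user, property_filter=None):
    """Return matching doc ids, most relevant first, trimmed by permission."""
    words = neutral_wordbreak(query)
    if not words:
        return []
    # Step 1: the index gives the list of possible matches (all words present).
    candidates = set(index.get(words[0], {}))
    for word in words[1:]:
        candidates &= set(index.get(word, {}))
    results = []
    for doc_id in candidates:
        # Step 2: re-check property values against the property store rows.
        props = properties.get(doc_id, {})
        if property_filter and any(props.get(k) != v for k, v in property_filter.items()):
            continue
        # Step 3: drop documents the user has no permission to read.
        if user not in acls.get(doc_id, set()):
            continue
        # Assumed relevance measure: total occurrences of the query words.
        score = sum(len(index[w][doc_id]) for w in words)
        results.append((score, doc_id))
    results.sort(key=lambda r: (-r[0], r[1]))
    return [doc_id for _, doc_id in results]
```

   Security trimming happens before results are returned, so a user never sees that a document they cannot read matched the query.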

Tags: MOSS
Posted on 2008-09-14 09:03 by ferlysky. Category: MOSS
