Solr Odds and Ends: CopyField

Posted in solr on October 30th, 2010 by kafka0102

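Besides fields of the basic value types, Solr's index schema also supports some special constructs, such as the commonly used copyField. Take the following schema fragment as an example: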

```xml
<schema name="eshequn.post.db_post.0" version="1.1"
        xmlns:xi="http://www.w3.org/2001/XInclude">
  <fields>
    <!-- for title -->
    <field name="t" type="text" indexed="true" stored="false" />
    <!-- for abstract -->
    <field name="a" type="text" indexed="true" stored="false" />
    <!-- for title and abstract -->
    <field name="ta" type="text" indexed="true" stored="false" multiValued="true" />
  </fields>
  <copyField source="t" dest="ta" />
  <copyField source="a" dest="ta" />
</schema>
```

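Field t holds the article's title, field a holds its abstract, and field ta is the union of title and abstract. When adding a document to the index, you only supply the contents of t and a; Solr populates and indexes ta automatically.

This is hardly an advanced feature, but if you had to implement it yourself, how would you do it? The search system I took over already had something similar: it concatenated the text of t and a and stuffed the result into ta. Nothing wrong with that. But has anyone noticed methods like `public final Field[] getFields(String name)` on Lucene's Document class? In other words, in Lucene a single field name can map to multiple Field instances. When Solr adds a document, it checks whether each field name is the source of a copyField rule; if so, it builds a dest field from that field's value. If a dest field is fed by more than one source, it must be declared multiValued.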
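To make the multi-valued idea concrete, here is a minimal Java sketch under stated assumptions. It does not use Lucene or Solr: `SimpleDoc`, `CopyFieldSchema`, and `CopyFieldDemo` are hypothetical stand-ins that model a document as a map from field name to a list of values (like a Lucene Document holding several Fields under one name), copy each source value into its dest fields, and reject a second value for a field not declared multiValued.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of copyField handling; NOT Solr's actual code.
// A document maps each field name to a list of values, mirroring how a Lucene
// Document can hold several Field instances under the same name.
class SimpleDoc {
    final Map<String, List<String>> fields = new LinkedHashMap<>();

    void add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    List<String> getValues(String name) {
        return fields.getOrDefault(name, List.of());
    }
}

class CopyFieldSchema {
    // source field -> dest fields, e.g. t -> [ta], a -> [ta]
    private final Map<String, List<String>> copyRules = new LinkedHashMap<>();
    private final Set<String> multiValued;

    CopyFieldSchema(Set<String> multiValued) {
        this.multiValued = multiValued;
    }

    void copyField(String source, String dest) {
        copyRules.computeIfAbsent(source, k -> new ArrayList<>()).add(dest);
    }

    // Adds the field itself, then (if it is a copyField source) the same
    // value to every dest field; no text is concatenated.
    void addField(SimpleDoc doc, String name, String value) {
        addChecked(doc, name, value);
        for (String dest : copyRules.getOrDefault(name, List.of())) {
            addChecked(doc, dest, value);
        }
    }

    // Refuses a second value for a field that is not multiValued.
    private void addChecked(SimpleDoc doc, String name, String value) {
        if (!doc.getValues(name).isEmpty() && !multiValued.contains(name)) {
            throw new IllegalArgumentException(
                "multiple values for non-multiValued field: " + name);
        }
        doc.add(name, value);
    }
}

class CopyFieldDemo {
    public static void main(String[] args) {
        CopyFieldSchema schema = new CopyFieldSchema(Set.of("ta"));
        schema.copyField("t", "ta");
        schema.copyField("a", "ta");

        SimpleDoc doc = new SimpleDoc();
        schema.addField(doc, "t", "Solr CopyField");
        schema.addField(doc, "a", "How copyField fills a combined field");
        // prints [Solr CopyField, How copyField fills a combined field]
        System.out.println(doc.getValues("ta"));
    }
}
```

Running `CopyFieldDemo` prints both values under `ta`. The design mirrors the point above: ta simply receives one value per source rather than a concatenated string, which is exactly why it has to be multiValued; dropping `"ta"` from the multiValued set makes the second copy throw.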

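On the query side, when a search needs to match against several fields, one option is copyField: collapse the fields into a single one. The drawback is that you can no longer weight the fields differently. A title, for example, usually matters more than an abstract; if you need that distinction, query the individual fields instead, each with its own boost.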
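The two query-side options can be sketched as Lucene query-syntax strings. `QueryBuilderSketch` is a hypothetical helper, not part of Solr; the boosts (2.0 for t, 1.0 for a) are illustrative assumptions, and the search terms are assumed to be escaped already.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds Lucene query-syntax strings for the two options.
// Field names t, a, ta come from the schema above; boosts are assumptions.
class QueryBuilderSketch {

    // Option 1: search only the copyField destination; at query time a title
    // match and an abstract match look the same.
    static String combinedFieldQuery(String terms) {
        return "ta:(" + terms + ")";
    }

    // Option 2: search each source field with its own boost, so a title match
    // can contribute more to the score than an abstract match.
    static String boostedMultiFieldQuery(String terms, Map<String, Float> boosts) {
        StringJoiner joiner = new StringJoiner(" OR ");
        for (Map.Entry<String, Float> e : boosts.entrySet()) {
            joiner.add(e.getKey() + ":(" + terms + ")^" + e.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = new LinkedHashMap<>(); // keeps t before a
        boosts.put("t", 2.0f);
        boosts.put("a", 1.0f);
        System.out.println(combinedFieldQuery("solr copyfield"));
        System.out.println(boostedMultiFieldQuery("solr copyfield", boosts));
    }
}
```

For `solr copyfield` this prints `ta:(solr copyfield)` and `t:(solr copyfield)^2.0 OR a:(solr copyfield)^1.0`. Solr's dismax query parser expresses the same per-field weighting through its `qf` parameter (for example `qf=t^2 a^1`), which may be more convenient than building the string by hand.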

=============================== The Grand Finale ================================

Author: kafka0102. Please credit the source when reposting. Thanks!!
Permalink: http://www.kafka0102.com/2010/10/394.html