Solr: synonyms.txt and stopwords.txt

Premise: the filter factories are applied at both index time and query time.

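1. With StandardTokenizerFactory, synonyms.txt at query time can only hold single-word entries or a mapping like 植物 => 动物 (plant => animal); a mapping like 英雄 => 植物 (hero => plant) does not work. The reason is that StandardTokenizerFactory splits Chinese text character by character by default, so it is enough to control the degree of matching directly. If you need word-level segmentation, use ik (IK Analyzer).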

2. With WhitespaceTokenizerFactory, on the other hand, a mapping like 英雄 => 植物 works without any problem (tokenization should not be a problem either).

3. For stopwords.txt: if you add, for example, "一" to it, that token is ignored in every search.

4. Details, from a sample synonyms.txt:

# blank lines and lines starting with pound are comments.  
#Explicit mappings match any token sequence on the LHS of "=>"
#and replace with all alternatives on the RHS.  These types of mappings  
#ignore the expand parameter in the schema.  
#Examples:  
#-----------------------------------------------------------------------  
#some test synonym mappings unlikely to appear in real input text  
aaafoo => aaabar  
bbbfoo => bbbfoo bbbbar  
cccfoo => cccbar cccbaz  
fooaaa,baraaa,bazaaa  

# Some synonym groups specific to this example  
GB,gib,gigabyte,gigabytes  
MB,mib,megabyte,megabytes  
Television, Televisions, TV, TVs   
#notice we use "gib" instead of "GiB" so any WordDelimiterFilter coming  
#after us won't split it into two words.  
飞利浦刮胡刀,飞利浦剃须刀  

# Synonym mappings can be used for spelling correction too  
pixima => pixma  

a\,a => b\,b  
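
To make the notes above easier to check, here is a minimal Python sketch of the behaviour they describe. It is a toy model, not Solr's implementation: the function names (split_terms, parse_synonyms, apply_synonyms, per_char_tokenize, remove_stopwords) are hypothetical, matching is limited to single-token left-hand sides, and the per-character tokenizer only imitates the "one token per Chinese character" behaviour attributed to StandardTokenizerFactory in note 1.

```python
import re

def split_terms(side):
    """Split on commas that are not backslash-escaped, then unescape '\\,'."""
    parts = re.split(r'(?<!\\),', side)
    return [p.strip().replace('\\,', ',') for p in parts if p.strip()]

def parse_synonyms(text, expand=True):
    """Parse synonyms.txt-style lines into {term: [replacement, ...]}."""
    rules = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):
            continue  # blank lines and '#' lines are comments
        if '=>' in line:
            # Explicit mapping: every LHS term maps to all RHS alternatives.
            lhs, rhs = line.split('=>', 1)
            targets = split_terms(rhs)
            sources = split_terms(lhs)
        else:
            # Equivalent group: expand to all terms, or reduce to the first.
            sources = split_terms(line)
            targets = sources if expand else sources[:1]
        for term in sources:
            rules.setdefault(term, []).extend(targets)
    return rules

def apply_synonyms(tokens, rules):
    """Replace each token by its alternatives (single-token matching only)."""
    return [rules.get(tok, [tok]) for tok in tokens]

def per_char_tokenize(text):
    """Toy tokenizer: one token per CJK character, other runs kept whole."""
    return re.findall(r'[\u4e00-\u9fff]|[^\s\u4e00-\u9fff]+', text)

def remove_stopwords(tokens, stopwords):
    """Drop every token listed in the stop set."""
    return [tok for tok in tokens if tok not in stopwords]

SAMPLE = r"""
# a few lines in the same style as the listing above
aaafoo => aaabar
cccfoo => cccbar cccbaz
GB,gib,gigabyte,gigabytes
英雄 => 植物
a\,a => b\,b
"""

rules = parse_synonyms(SAMPLE)
# Whitespace tokenization keeps 英雄 as one token, so the rule can match.
whitespace_result = apply_synonyms("英雄".split(), rules)
# Per-character tokenization yields 英 and 雄, so the rule never matches.
per_char_result = apply_synonyms(per_char_tokenize("英雄"), rules)
# A stop list containing 一 removes that token before matching.
filtered = remove_stopwords(per_char_tokenize("一个英雄"), {"一"})
```

In this sketch the rule 英雄 => 植物 only fires when the tokenizer emits 英雄 as a single token, which is the difference between notes 1 and 2. The escaped comma in the last sample line is kept as part of the term rather than treated as a separator.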
posted @ 2014-12-09 16:33  清晰-模块-组合-优化