Solr: synonyms.txt and stopwords.txt

Premise: the filter factories are applied both at index time and at query time.

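1. With StandardTokenizerFactory, synonyms.txt at query time can only hold single-term entries or a mapping such as 植物 => 动物 (plant => animal), but not 英雄 => 植物 (hero => plant). The reason is that StandardTokenizerFactory splits Chinese text character by character by default, so in that setup you simply tune how strictly the query has to match. If you need word-level segmentation, use IK.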

2. With WhitespaceTokenizerFactory, on the other hand, 英雄 => 植物 works without any problem (tokenization should be fine as well).

3. For stopwords.txt: if you add an entry such as 一 to it, that term is ignored in every search.

4. In detail, a sample synonyms.txt:

# blank lines and lines starting with pound are comments.  
#Explicit mappings match any token sequence on the LHS of "=>"
#and replace with all alternatives on the RHS.  These types of mappings  
#ignore the expand parameter in the schema.  
#Examples:  
#-----------------------------------------------------------------------  
#some test synonym mappings unlikely to appear in real input text  
aaafoo => aaabar  
bbbfoo => bbbfoo bbbbar  
cccfoo => cccbar cccbaz  
fooaaa,baraaa,bazaaa  

# Some synonym groups specific to this example  
GB,gib,gigabyte,gigabytes  
MB,mib,megabyte,megabytes  
Television, Televisions, TV, TVs   
#notice we use "gib" instead of "GiB" so any WordDelimiterFilter coming  
#after us won't split it into two words.  
飞利浦刮胡刀,飞利浦剃须刀  

# Synonym mappings can be used for spelling correction too  
pixima => pixma  

a\,a => b\,b
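
To make the rule syntax above more concrete, here is a minimal Python sketch of how such a file could be interpreted, together with the stopword behaviour from item 3. It is only an illustration and not Solr's or Lucene's actual implementation: the function names (`parse_synonyms`, `apply_synonyms`, `remove_stopwords`) are made up for this example, the input is assumed to be whitespace-tokenized already, only single-token left-hand sides are matched, and alternatives are returned as a flat list, whereas Lucene would stack them at the same token position.

```python
import re


def split_unescaped_commas(text):
    """Split on commas that are not escaped as '\\,', then unescape them."""
    parts = re.split(r"(?<!\\),", text)
    return [p.replace("\\,", ",").strip() for p in parts if p.strip()]


def parse_synonyms(lines, expand=True):
    """Build a dict mapping each source term to its list of replacements."""
    rules = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and '#' lines are comments
        if "=>" in line:
            # Explicit mapping: the expand setting is ignored.
            lhs, rhs = line.split("=>", 1)
            sources = split_unescaped_commas(lhs)
            targets = split_unescaped_commas(rhs)
        else:
            # Equivalence group: every term maps to all terms when expand
            # is true, otherwise every term maps to the first one.
            sources = split_unescaped_commas(line)
            targets = sources if expand else sources[:1]
        for source in sources:
            bucket = rules.setdefault(source, [])
            for target in targets:
                if target not in bucket:
                    bucket.append(target)
    return rules


def remove_stopwords(tokens, stopwords):
    """Drop every token that appears in the stopword set."""
    return [t for t in tokens if t not in stopwords]


def apply_synonyms(tokens, rules):
    """Replace each token by its alternatives (flattened into one list)."""
    out = []
    for tok in tokens:
        for alt in rules.get(tok, [tok]):
            out.extend(alt.split())  # an alternative may hold several words
    return out


SAMPLE = r"""
# comment line
aaafoo => aaabar
bbbfoo => bbbfoo bbbbar
GB,gib,gigabyte,gigabytes
pixima => pixma
a\,a => b\,b
"""

rules = parse_synonyms(SAMPLE.splitlines())
tokens = remove_stopwords("一 pixima gib".split(), {"一"})
print(apply_synonyms(tokens, rules))
# prints ['pixma', 'GB', 'gib', 'gigabyte', 'gigabytes']
```

In this sketch the stopword 一 is dropped before synonym expansion, `pixima` is corrected to `pixma`, and `gib` expands to its whole group because `expand` defaults to true. The filter order in a real field type is whatever the schema defines.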

Posted 2014-12-09 16:33 by 清晰-模块-组合-优化
