jieba word segmentation
1. Add the dependency
<dependency>
    <groupId>com.huaban</groupId>
    <artifactId>jieba-analysis</artifactId>
    <version>1.0.2</version>
</dependency>
2. Segmentation utility class
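package com.itcast.utils;

import com.huaban.analysis.jieba.JiebaSegmenter;
import com.huaban.analysis.jieba.SegToken;
import com.huaban.analysis.jieba.WordDictionary;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class JiebaAnalyzerUtil {

    /**
     * Segments a sentence into words.
     *
     * @param text the sentence to segment
     * @return the list of words produced by segmentation
     * @throws IOException if dict.txt cannot be found on the classpath
     */
    public List<String> segment(String text) throws IOException {
        List<String> words = new ArrayList<>();
        // dict.txt is the custom dictionary, looked up on the classpath
        URL resource = getClass().getClassLoader().getResource("dict.txt");
        if (resource == null) {
            throw new FileNotFoundException("dict.txt not found on the classpath");
        }
        Path dictPath = Paths.get(new File(resource.getPath()).getAbsolutePath());
        // Load the custom entries into the shared dictionary instance
        WordDictionary.getInstance().loadUserDict(dictPath);
        JiebaSegmenter segmenter = new JiebaSegmenter();
        // Segment the text in SEARCH mode
        List<SegToken> tokens = segmenter.process(text, JiebaSegmenter.SegMode.SEARCH);
        for (SegToken token : tokens) {
            // Keep only the word text of each token
            words.add(token.word);
        }
        return words;
    }
}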
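One caveat about locating dict.txt through getResource(...).getPath(): this only yields a usable file path while the resource is a plain file on disk, and it generally stops working once the application is packaged into a JAR. A minimal sketch of one possible workaround is shown below: copy the resource stream to a temporary file and pass that file's Path to loadUserDict. The class name DictResource and the method copyToTempFile are hypothetical helpers introduced here for illustration, not part of jieba-analysis, and the in-memory stream in main is only a stand-in for getResourceAsStream("dict.txt").

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (not part of jieba-analysis): writes a stream, for example
// one obtained from ClassLoader.getResourceAsStream("dict.txt"), to a temp file.
final class DictResource {

    static Path copyToTempFile(InputStream in) {
        try {
            Path tmp = Files.createTempFile("dict", ".txt");
            tmp.toFile().deleteOnExit(); // remove the copy when the JVM exits
            // createTempFile already created the file, so overwrite it
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the classpath stream so the sketch runs on its own
        byte[] data = "亲爱的 3 n\n".getBytes(StandardCharsets.UTF_8);
        Path dictPath = copyToTempFile(new ByteArrayInputStream(data));
        // dictPath could now be handed to WordDictionary.getInstance().loadUserDict(...)
        System.out.println(Files.readAllLines(dictPath, StandardCharsets.UTF_8));
    }
}
```

Wrapping the IOException in an UncheckedIOException is a design choice of this sketch; rethrowing the checked exception would work equally well.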
3. Test
public static void main(String[] args) throws IOException {
    String str = "亲爱的请帮忙推荐一个稳健型-理财基金1期封闭式净值型产品";
    List<String> segment = new JiebaAnalyzerUtil().segment(str);
    System.out.println(segment);
}
4. Result without custom dictionary entries:
[亲爱, 的, 请, 帮忙, 推荐, 一个, 稳健, 型, -, 理财, 基金, 1, 期, 封闭式, 净值, 型, 产品]
5. Specifying custom words:
Enter the following in dict.txt (word, frequency, part-of-speech tag):
亲爱的 3 n
稳健型 3 n
理财基金1期 3 n
净值型产品 3 n
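Because a malformed entry in dict.txt is easy to miss, it can help to check the file before loading it. The sketch below assumes one entry per line with three whitespace-separated fields (word, frequency, tag), as in the entries above. UserDictCheck, Entry and parse are hypothetical names introduced for illustration; they are not part of jieba-analysis, and the check may be stricter than what the library itself accepts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical validator for the dict.txt format; not part of jieba-analysis.
final class UserDictCheck {

    // One dictionary entry: word, frequency, part-of-speech tag
    static final class Entry {
        final String word;
        final int freq;
        final String tag;

        Entry(String word, int freq, String tag) {
            this.word = word;
            this.freq = freq;
            this.tag = tag;
        }
    }

    // Parses "word freq tag" lines, skipping blank lines and
    // rejecting any line that does not have exactly three fields.
    static List<Entry> parse(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 3) {
                throw new IllegalArgumentException("expected 'word freq tag': " + line);
            }
            entries.add(new Entry(parts[0], Integer.parseInt(parts[1]), parts[2]));
        }
        return entries;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "亲爱的 3 n", "稳健型 3 n", "理财基金1期 3 n", "净值型产品 3 n");
        for (Entry e : parse(lines)) {
            System.out.println(e.word + " / " + e.freq + " / " + e.tag);
        }
    }
}
```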
6. Result with the custom dictionary entries:
[亲爱的, 请, 帮忙, 推荐, 一个, 稳健型, -, 理财基金1期, 封闭式, 净值型产品]
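To see exactly what the custom entries changed, the two results can be compared directly. The sketch below hard-codes both token lists from this post and lists the tokens that appear only in the second one; it also checks that both lists concatenate to the same sentence. TokenDiff and newTokens are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical comparison helper; it does not call jieba at all.
final class TokenDiff {

    // Returns the tokens of 'after' that do not occur in 'before', in order
    static List<String> newTokens(List<String> before, List<String> after) {
        Set<String> seen = new HashSet<>(before);
        List<String> added = new ArrayList<>();
        for (String token : after) {
            if (!seen.contains(token)) {
                added.add(token);
            }
        }
        return added;
    }

    public static void main(String[] args) {
        // Result without custom entries (section 4)
        List<String> before = Arrays.asList("亲爱", "的", "请", "帮忙", "推荐", "一个",
                "稳健", "型", "-", "理财", "基金", "1", "期", "封闭式", "净值", "型", "产品");
        // Result with custom entries (section 6)
        List<String> after = Arrays.asList("亲爱的", "请", "帮忙", "推荐", "一个",
                "稳健型", "-", "理财基金1期", "封闭式", "净值型产品");

        System.out.println(newTokens(before, after));
        // Both segmentations should cover the same original sentence
        System.out.println(String.join("", before).equals(String.join("", after)));
    }
}
```

Run on the two lists above, this prints [亲爱的, 稳健型, 理财基金1期, 净值型产品] followed by true: the only new tokens are the four dict.txt entries, and no characters were lost or added.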
