Using IK Analyzer from Java (including the reset() error)
The code below, found online, uses IK Analyzer from Java, but running it fails with an error:
```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class getIKAnalyzer {
    public static void main(String[] args) throws IOException {
        // sample Chinese text: "a lightweight Chinese word-segmentation toolkit developed in Java"
        String text = "基于java语言开发的轻量级的中文分词工具包";
        // create the analyzer (true = smart segmentation mode)
        Analyzer anal = new IKAnalyzer(true);
        StringReader reader = new StringReader(text);
        // tokenize
        TokenStream ts = anal.tokenStream("", reader);
        CharTermAttribute term = ts.getAttribute(CharTermAttribute.class);
        // iterate over the tokens
        while (ts.incrementToken()) {
            System.out.print(term.toString() + "|");
        }
        reader.close();
        System.out.println();
    }
}
```
```text
Exception in thread "main" java.lang.IllegalStateException: TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.
	at org.apache.lucene.analysis.Tokenizer$1.read(Tokenizer.java:110)
	at java.io.Reader.read(Reader.java:123)
	at org.wltea.analyzer.core.AnalyzeContext.fillBuffer(AnalyzeContext.java:124)
	at org.wltea.analyzer.core.IKSegmenter.next(IKSegmenter.java:122)
	at org.wltea.analyzer.lucene.IKTokenizer.incrementToken(IKTokenizer.java:78)
	at service.getIKAnalyzer.main(getIKAnalyzer.java:20)
```
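The stack trace ends in Tokenizer$1.read, that is, inside an anonymous Reader belonging to Lucene's Tokenizer rather than in the StringReader that was passed in. As far as I understand Lucene 4.x, Tokenizer parks the supplied Reader in a pending slot and reads from a guard Reader that throws this exception until reset() swaps the real one in. The toy model below mimics that idea using only the JDK; PendingReaderDemo, ToyTokenizer and readAll are hypothetical names, not Lucene classes, and the real Tokenizer differs in detail.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/**
 * Hypothetical toy model (NOT Lucene source) of a tokenizer that only
 * exposes its input Reader after reset() has been called.
 */
public class PendingReaderDemo {

    // Guard reader: any read before reset() fails, like the error above.
    static final Reader ILLEGAL_STATE_READER = new Reader() {
        @Override
        public int read(char[] cbuf, int off, int len) {
            throw new IllegalStateException(
                "TokenStream contract violation: reset()/close() call missing, "
                + "reset() called multiple times, or subclass does not call super.reset().");
        }

        @Override
        public void close() {
        }
    };

    static class ToyTokenizer {
        private Reader input = ILLEGAL_STATE_READER; // what read() actually uses
        private Reader inputPending;                 // the caller's reader, parked

        ToyTokenizer(Reader reader) {
            this.inputPending = reader;
        }

        void reset() {
            input = inputPending;                 // expose the real reader
            inputPending = ILLEGAL_STATE_READER;  // a second reset() re-arms the guard
        }

        /** Drains the current input; stands in for the tokenizing work. */
        String readAll() throws IOException {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[64];
            int n;
            while ((n = input.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        ToyTokenizer missingReset = new ToyTokenizer(new StringReader("abc"));
        try {
            missingReset.readAll();
        } catch (IllegalStateException e) {
            System.out.println("without reset(): " + e.getClass().getSimpleName());
        }

        ToyTokenizer ok = new ToyTokenizer(new StringReader("abc"));
        ok.reset();
        System.out.println("with reset(): " + ok.readAll());
    }
}
```

In this model, calling reset() twice also swaps the guard back in, which is one way the "reset() called multiple times" case in the message can arise.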
The TokenStream Javadoc describes the correct consuming workflow as follows:

1. Instantiation of TokenStream/TokenFilters which add/get attributes to/from the AttributeSource.
2. The consumer calls reset().
3. The consumer retrieves attributes from the stream and stores local references to all attributes it wants to access.
4. The consumer calls incrementToken() until it returns false, consuming the attributes after each call.
5. The consumer calls end() so that any end-of-stream operations can be performed.
6. The consumer calls close() to release any resource when finished using the TokenStream.
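To see that whole order in one place, here is a minimal sketch of a consumer that calls reset(), fetches the attribute once, loops over incrementToken(), then calls end() and close(). It uses a hypothetical whitespace-splitting ToyTokenStream with a reusable StringBuilder standing in for CharTermAttribute, so it runs without Lucene or IK on the classpath; WorkflowDemo, ToyTokenStream, getTermAttribute and consume are illustrative names, not library APIs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy stream that enforces the consuming order; NOT Lucene code. */
public class WorkflowDemo {

    enum State { CREATED, RESET, ENDED, CLOSED }

    static class ToyTokenStream implements AutoCloseable {
        private final String[] parts;
        private final StringBuilder termAtt = new StringBuilder(); // reusable "attribute"
        private int pos = 0;
        private State state = State.CREATED;

        ToyTokenStream(String text) {
            String trimmed = text.trim();
            this.parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        }

        StringBuilder getTermAttribute() {
            return termAtt;
        }

        void reset() {
            // this toy stream is single-use: only one reset() is allowed
            if (state != State.CREATED) {
                throw new IllegalStateException("reset() called in state " + state);
            }
            state = State.RESET;
        }

        boolean incrementToken() {
            if (state != State.RESET) {
                throw new IllegalStateException("incrementToken() called in state " + state);
            }
            if (pos >= parts.length) {
                return false;
            }
            termAtt.setLength(0);          // overwrite the shared attribute in place
            termAtt.append(parts[pos++]);
            return true;
        }

        void end() {
            if (state != State.RESET) {
                throw new IllegalStateException("end() called in state " + state);
            }
            state = State.ENDED;
        }

        @Override
        public void close() {
            state = State.CLOSED;
        }
    }

    /** reset, get attribute, incrementToken loop, end; close() via try-with-resources. */
    static List<String> consume(ToyTokenStream ts) {
        List<String> tokens = new ArrayList<>();
        try (ToyTokenStream stream = ts) {
            stream.reset();
            StringBuilder term = stream.getTermAttribute(); // local reference, fetched once
            while (stream.incrementToken()) {
                tokens.add(term.toString());
            }
            stream.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(consume(new ToyTokenStream("light weight analyzer")));
    }
}
```

Running main prints [light, weight, analyzer]. The try-with-resources block is a design choice that guarantees close() runs even if incrementToken() throws; the same shape should carry over to a real TokenStream, which is Closeable.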
The fix is simply to call reset() on the stream before consuming it:
```java
public static void main(String[] args) throws IOException {
    String text = "基于java语言开发的轻量级的中文分词工具包";
    // create the analyzer
    Analyzer anal = new IKAnalyzer(true);
    StringReader reader = new StringReader(text);
    // tokenize
    TokenStream ts = anal.tokenStream("", reader);
    ts.reset(); // required before the first incrementToken()
    CharTermAttribute term = ts.getAttribute(CharTermAttribute.class);
    // iterate over the tokens
    while (ts.incrementToken()) {
        System.out.print(term.toString() + "|");
    }
    ts.end();   // perform end-of-stream operations
    ts.close(); // release the stream's resources
    reader.close();
    System.out.println();
}
```