Java code for IK Analyzer (including the reset() error)

The code below, found online, is a Java example of tokenizing text with IK Analyzer. It throws an exception when run.
import java.io.IOException;  
import java.io.StringReader;  
import org.apache.lucene.analysis.Analyzer;  
import org.apache.lucene.analysis.TokenStream;  
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;  
import org.wltea.analyzer.lucene.IKAnalyzer;

public class getIKAnalyzer {
    public static void main(String[] args) throws IOException {
        String text = "基于java语言开发的轻量级的中文分词工具包";
        // Create the analyzer
        Analyzer anal = new IKAnalyzer(true);
        StringReader reader = new StringReader(text);
        // Tokenize
        TokenStream ts = anal.tokenStream("", reader);
        CharTermAttribute term = ts.getAttribute(CharTermAttribute.class);
        // Iterate over the tokens
        // (this is where it fails: incrementToken() is called without a prior ts.reset())
        while (ts.incrementToken()) {
            System.out.print(term.toString() + "|");
        }
        reader.close();
        System.out.println();
    }
}

Exception in thread "main" java.lang.IllegalStateException: TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.
    at org.apache.lucene.analysis.Tokenizer$1.read(Tokenizer.java:110)
    at java.io.Reader.read(Reader.java:123)
    at org.wltea.analyzer.core.AnalyzeContext.fillBuffer(AnalyzeContext.java:124)
    at org.wltea.analyzer.core.IKSegmenter.next(IKSegmenter.java:122)
    at org.wltea.analyzer.lucene.IKTokenizer.incrementToken(IKTokenizer.java:78)
    at service.getIKAnalyzer.main(getIKAnalyzer.java:20)

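The cause is the contract enforced by the official Lucene jar. According to the TokenStream Javadoc, the API must be consumed in this order: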

1. Instantiation of TokenStream/TokenFilters which add/get attributes to/from the AttributeSource.
2. The consumer calls reset().
3. The consumer retrieves attributes from the stream and stores local references to all attributes it wants to access.
4. The consumer calls incrementToken() until it returns false consuming the attributes after each call.
5. The consumer calls end() so that any end-of-stream operations can be performed.
6. The consumer calls close() to release any resource when finished using the TokenStream.
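
To see why skipping reset() produces exactly this stack trace, here is a small sketch that needs no Lucene jar. The frames Tokenizer$1.read and Reader.read in the trace suggest that the tokenizer reads from a placeholder Reader until reset() swaps in the real one. ToyTokenizer, TokenStreamContractDemo and consume are hypothetical names made up for this illustration; this is a simplified stand-in under that assumption, not Lucene's actual code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TokenStreamContractDemo {
    // Follows the workflow: (reset) -> incrementToken loop -> end -> close.
    // callReset=false reproduces the mistake from the original example.
    static List<String> consume(String text, boolean callReset) {
        List<String> terms = new ArrayList<>();
        try (ToyTokenizer ts = new ToyTokenizer(new StringReader(text))) {
            if (callReset) {
                ts.reset();
            }
            while (ts.incrementToken()) {
                terms.add(ts.term());
            }
            ts.end();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(consume("lightweight Chinese word segmentation", true));
        try {
            consume("lightweight Chinese word segmentation", false);
        } catch (IllegalStateException e) {
            System.out.println("without reset(): " + e.getMessage());
        }
    }
}

// Hypothetical toy tokenizer that splits on whitespace (NOT Lucene code).
class ToyTokenizer implements AutoCloseable {
    // Placeholder reader: any read fails until reset() has been called.
    private static final Reader ILLEGAL_STATE_READER = new Reader() {
        @Override
        public int read(char[] buf, int off, int len) {
            throw new IllegalStateException("contract violation: reset() call missing");
        }

        @Override
        public void close() {
        }
    };

    private final Reader source;                 // the real input
    private Reader input = ILLEGAL_STATE_READER; // what incrementToken() reads from
    private String term = "";

    ToyTokenizer(Reader source) {
        this.source = source;
    }

    // Swap the real reader in; before this call every read throws.
    void reset() {
        input = source;
    }

    // Read the next whitespace-delimited token; false means end of input.
    boolean incrementToken() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (sb.length() > 0) {
                    break; // end of the current token
                }
            } else {
                sb.append((char) c);
            }
        }
        term = sb.toString();
        return sb.length() > 0;
    }

    String term() {
        return term;
    }

    // Nothing to finalize in this toy version.
    void end() {
    }

    @Override
    public void close() throws IOException {
        source.close();
        input = ILLEGAL_STATE_READER;
    }
}
```

Calling consume(text, true) returns the tokens, while consume(text, false) fails on the first incrementToken() with an IllegalStateException raised from inside Reader.read, the same shape as the trace above.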

The fix is simply to call reset() on the stream before consuming it:

public static void main(String[] args) throws IOException {
    String text = "基于java语言开发的轻量级的中文分词工具包";
    // Create the analyzer
    Analyzer anal = new IKAnalyzer(true);
    StringReader reader = new StringReader(text);
    // Tokenize
    TokenStream ts = anal.tokenStream("", reader);
    // Reset the stream before the first incrementToken() call
    ts.reset();
    CharTermAttribute term = ts.getAttribute(CharTermAttribute.class);
    // Iterate over the tokens
    while (ts.incrementToken()) {
        System.out.print(term.toString() + "|");
    }
    // Not needed to avoid the exception, but part of the workflow listed above
    ts.end();
    ts.close();
    reader.close();
    System.out.println();
}

posted on 2014-09-07 15:49 by HI,你的蚊香