Learning Lucene.net (Part 1)

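I started by reading what yufenfei and birdshover have published. This post is purely a record of my own learning process, in case I forget things when I run into them again later.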

yufenfei：http://yufenfei.iteye.com/

birdshover：http://www.cnblogs.com/birdshover/

deepfuture:http://deepfuture.iteye.com/blog/573707

hongfu_:http://blog.csdn.net/hongfu_/article/details/1933366

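1. Directory

A Directory is like a bookshelf. The shelf holds many books (Documents), each book has many chapters (Fields), and you can find the Documents that match a condition by searching on the content of their Fields.

Common kinds of Directory: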

RAMDirectory

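RAMDirectory keeps the index data in memory instead of on disk, which makes reading and writing its files very fast.

Drawback: because the index lives in memory, it is gone once the program exits.

FSDirectory

The known direct subclasses of Directory are FileSwitchDirectory, FSDirectory, and RAMDirectory. FSDirectory is the base class for Directory implementations that store index files in the file system, so the generated index is persisted to disk.
I have not looked at it closely; to be studied later.
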
FileSwitchDirectory

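A Directory instance that switches files between two other Directory instances. Files with the specified extensions are placed in the primary directory; all others go to the secondary directory. The set of extensions must not change once it has been passed to this class, and it must allow multiple threads to call Contains at the same time.
I have not looked at it closely; to be studied later.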

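2. After declaring a Directory, declare an Analyzer. The commonly used text analyzers are listed below, taken from yufenfei's ITeye blog.

(1) WhitespaceAnalyzer

Splits on whitespace only. It does not lowercase characters and does not support Chinese.

It applies no other normalization to the tokens it produces.

(2) SimpleAnalyzer

More capable than WhitespaceAnalyzer. It first splits the text at non-letter characters, then lowercases every token. This analyzer discards numeric characters.

(3) StopAnalyzer

StopAnalyzer goes beyond SimpleAnalyzer: on top of what SimpleAnalyzer does, it removes common English words (the, a, and so on), and you can supply your own stop-word set as needed. It does not support Chinese.

(4) StandardAnalyzer

Handles English as well as StopAnalyzer does. It supports Chinese by splitting the text into single characters. It lowercases tokens and removes stop words and punctuation.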
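
To compare the four analyzers side by side, the sketch below runs the same sentence through each one and prints the tokens. It assumes Lucene.Net 3.0.3 (the ITermAttribute API); the class name AnalyzerDemo, the helper Tokenize, and the sample sentence are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.Tokenattributes;

class AnalyzerDemo   // hypothetical name, not from the original post
{
    // Runs the analyzer over the text and collects the tokens it emits.
    static List<string> Tokenize(Analyzer analyzer, string text)
    {
        var tokens = new List<string>();
        TokenStream stream = analyzer.TokenStream("fieldname", new StringReader(text));
        ITermAttribute term = stream.AddAttribute<ITermAttribute>();
        while (stream.IncrementToken())
        {
            tokens.Add(term.Term);
        }
        stream.Dispose();
        return tokens;
    }

    static void Main()
    {
        // Assumption: Lucene.Net 3.0.3, so LUCENE_30 is available.
        var version = Lucene.Net.Util.Version.LUCENE_30;
        string text = "The Quick Brown-Fox has 2 tails";

        var analyzers = new Analyzer[]
        {
            new WhitespaceAnalyzer(),
            new SimpleAnalyzer(),
            new StopAnalyzer(version),
            new StandardAnalyzer(version)
        };

        foreach (Analyzer analyzer in analyzers)
        {
            string joined = string.Join("|", Tokenize(analyzer, text).ToArray());
            Console.WriteLine("{0}: {1}", analyzer.GetType().Name, joined);
        }
    }
}
```

Things to look for in the output: whether "The" keeps its capital letter, whether "the"/"has" style stop words survive, and what happens to the digit and the hyphen.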

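3. Document. Use IndexWriter and IndexSearcher to store and to query Documents, respectively.

IndexWriter constructor parameters include: Directory; Analyzer; bool (create: true creates a new index or overwrites an existing one, false appends to an existing index); IndexDeletionPolicy (known implementations: KeepOnlyLastCommitDeletionPolicy, SnapshotDeletionPolicy); IndexWriter.MaxFieldLength (the maximum number of terms indexed per field).

IndexSearcher: applications usually only need to call the inherited Searcher.Search(Query, int) or Searcher.Search(Query, Filter, int) methods. For performance reasons, it is recommended to open only one IndexSearcher and use it for all of your searches. When instantiating it, passing just the Directory is normally enough.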
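
The effect of the bool argument of IndexWriter is easiest to see by counting documents. The following is a minimal sketch, assuming Lucene.Net 3.0.3; CreateFlagDemo, AddOne, and Count are hypothetical names used only here.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

class CreateFlagDemo   // hypothetical name, not from the original post
{
    // Opens a writer with the given create flag and adds a single document.
    static void AddOne(Directory dir, bool create, string text)
    {
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
        using (var writer = new IndexWriter(dir, analyzer, create,
                                            IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var doc = new Document();
            doc.Add(new Field("fieldname", text, Field.Store.YES, Field.Index.ANALYZED));
            writer.AddDocument(doc);
        }
    }

    // Returns the number of documents currently in the index.
    static int Count(Directory dir)
    {
        using (IndexReader reader = IndexReader.Open(dir, true))   // read-only reader
        {
            return reader.NumDocs();
        }
    }

    static void Main()
    {
        Directory dir = new RAMDirectory();

        AddOne(dir, true, "first");     // create = true: start a new index
        AddOne(dir, false, "second");   // create = false: append to the existing index
        Console.WriteLine("after append:   " + Count(dir));

        AddOne(dir, true, "third");     // create = true again: replaces the old index
        Console.WriteLine("after recreate: " + Count(dir));

        dir.Dispose();
    }
}
```

Under these assumptions the second count should be lower than the first, because opening the writer with create = true discards what was indexed before.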

4. Query. To look up Documents through IndexSearcher, you need to build a Query. There are two ways to build one:

Instantiate a Query subclass directly, for example:

Term tr = new Term("fieldname", "text");
TermQuery query = new TermQuery(tr);
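
Use QueryParser, for example:

QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_CURRENT, "fieldname", analyzer); // the field to search
Query query = parser.Parse("text"); // matches documents whose field contains "text"

In this example both approaches return the same result. I do not yet know which scenarios each one is best suited to.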
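
One difference between the two approaches is that QueryParser passes the query text through the analyzer, while TermQuery looks up the term exactly as given. The sketch below, assuming Lucene.Net 3.0.3, searches for the capitalized word "Text" both ways; QueryDemo and the sample sentence are made up for illustration.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

class QueryDemo   // hypothetical name, not from the original post
{
    static void Main()
    {
        var version = Lucene.Net.Util.Version.LUCENE_30;
        var analyzer = new StandardAnalyzer(version);
        Directory dir = new RAMDirectory();

        // Index one document; StandardAnalyzer lowercases its tokens at index time.
        using (var writer = new IndexWriter(dir, analyzer, true,
                                            IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var doc = new Document();
            doc.Add(new Field("fieldname", "This is the Text to be indexed.",
                              Field.Store.YES, Field.Index.ANALYZED));
            writer.AddDocument(doc);
        }

        using (var searcher = new IndexSearcher(dir, true))
        {
            // TermQuery: the term is used verbatim, with no analysis.
            Query raw = new TermQuery(new Term("fieldname", "Text"));

            // QueryParser: the query string is analyzed before it becomes a query.
            Query parsed = new QueryParser(version, "fieldname", analyzer).Parse("Text");

            Console.WriteLine("TermQuery hits:   " + searcher.Search(raw, 10).TotalHits);
            Console.WriteLine("QueryParser hits: " + searcher.Search(parsed, 10).TotalHits);
        }

        dir.Dispose();
    }
}
```

Since the index only contains the lowercased token "text", the TermQuery for "Text" should find nothing, while the parsed query should find the document. This suggests TermQuery fits exact, non-analyzed values (IDs, keywords) and QueryParser fits free text typed by a user.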

5. Code example:

        // Namespaces used below: Lucene.Net.Analysis, Lucene.Net.Analysis.Standard,
        // Lucene.Net.Documents, Lucene.Net.Index, Lucene.Net.QueryParsers,
        // Lucene.Net.Search, Lucene.Net.Store.
        Directory directory;   // shared by both button handlers

        private void button5_Click(object sender, EventArgs e)
        {
            Analyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_CURRENT);

            // Store the index in memory:
            directory = new RAMDirectory();
            // To store an index on disk, use this instead:
            //directory = FSDirectory.Open("/tmp/testindex");
            IndexWriter iwriter = new IndexWriter(directory, analyzer, true, new IndexWriter.MaxFieldLength(25000));
            Document doc = new Document();
            System.String text = "This is the text to be indexed.";
            // Store the original text and index it through the analyzer:
            doc.Add(new Field("fieldname", text, Field.Store.YES, Field.Index.ANALYZED));
            iwriter.AddDocument(doc);
            iwriter.Close();
        }

        private void button6_Click(object sender, EventArgs e)
        {
            // Create the text analyzer; it must match the one used at index time.
            Analyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_CURRENT);
            // Now search the index:
            IndexSearcher isearcher = new IndexSearcher(directory, true); // read-only=true; the tool that searches the index
            // Parse a simple query that searches for "text":
            QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_CURRENT, "fieldname", analyzer);    // the field to search
            Query query = parser.Parse("text"); // matches documents whose field contains "text"
            //Term tr = new Term("fieldname", "text");
            //TermQuery query = new TermQuery(tr);
            ScoreDoc[] hits = isearcher.Search(query, null, 1000).ScoreDocs;    // the search results
            // Assert comes from a unit-test framework (e.g. NUnit or MSTest);
            // it reports a failure if the two values differ.
            Assert.AreEqual(1, hits.Length);
            // Iterate through the results:
            for (int i = 0; i < hits.Length; i++)
            {
                Document hitDoc = isearcher.Doc(hits[i].Doc);
                Assert.AreEqual("This is the text to be indexed.", hitDoc.Get("fieldname"));
            }
            isearcher.Close();
            directory.Close();
        }

posted on 2015-08-27 14:48 by 一個人過得快樂
