Lucene: update, delete and paging operations, plus IndexWriter optimization

Update operation:

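Note: if you inspect the index with Luke (lukeall-1.0.0.jar), you can see that an update is really a delete followed by an insert. As covered earlier, the index holds two kinds of data: the stored documents and the term dictionary (the "directory" part). An update only marks the old Document as deleted. Its terms stay in the term dictionary until the segments are merged. Searches already skip deleted documents, but to purge the stale terms we optimize the index afterwards.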
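The delete-then-insert behaviour can be pictured with a toy in-memory model. This is not Lucene's implementation. `ToySegmentIndex` and its methods are hypothetical, and each "document" is a single whitespace-tokenized string. Its `optimize()` simply rebuilds the data without the flagged documents. The model shows the two points from the note: searches skip deleted documents immediately, while their terms linger in the term dictionary until a merge.

```java
import java.util.*;

// Toy model only, NOT Lucene's code: deletions are flags, and postings of
// deleted docs stay in the term dictionary until a merge rewrites the data.
class ToySegmentIndex {
    private final List<String> docs = new ArrayList<>();                 // stored "documents"
    private final Map<String, List<Integer>> postings = new TreeMap<>(); // term -> doc ids
    private final BitSet deleted = new BitSet();                         // deletion flags

    // Add a document; every whitespace-separated token becomes a term.
    int add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : text.split("\\s+")) {
            postings.computeIfAbsent(term, k -> new ArrayList<>()).add(id);
        }
        return id;
    }

    // Deleting only sets a flag; the postings are left untouched.
    void deleteByTerm(String term) {
        for (int id : postings.getOrDefault(term, Collections.emptyList())) {
            deleted.set(id);
        }
    }

    // update = delete by the key term, then add the new version.
    void update(String keyTerm, String newText) {
        deleteByTerm(keyTerm);
        add(newText);
    }

    // Searching filters out flagged docs, so results are already correct.
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (int id : postings.getOrDefault(term, Collections.emptyList())) {
            if (!deleted.get(id)) hits.add(docs.get(id));
        }
        return hits;
    }

    // The term dictionary still lists terms of deleted docs until a merge.
    Set<String> terms() {
        return postings.keySet();
    }

    // "Merge": rebuild from the live docs only, which drops the stale terms.
    void optimize() {
        List<String> live = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            if (!deleted.get(id)) live.add(docs.get(id));
        }
        docs.clear();
        postings.clear();
        deleted.clear();
        for (String text : live) add(text);
    }
}
```

For example, run `add("gid1 apple")` and then `update("gid1", "gid1 banana")`. Afterwards `search("apple")` is already empty, yet `terms()` still contains `apple` until `optimize()` runs.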

// Update a product's entry in the index
public void updateGoods(Goods goods){
    // Create an IndexWriter
    IndexWriter indexWriter=null;
    try {
        indexWriter=new IndexWriter(Configuraction.getDirectory(),Configuraction.getAnalyzer(),MaxFieldLength.LIMITED);
        /*
         * Delete the matching document(s) first, then add the new one
         * Term: the delete condition; normally we delete by primary key
         * doc: the new document to add
         */
        indexWriter.updateDocument(new Term("gid",goods.getGid().toString()),DocumentUtil.GoodsToDocument(goods));
        // Commit
        indexWriter.commit();
    } catch (IOException e) {
        e.printStackTrace();
        throw new RuntimeException(e);
    }finally{
        // indexWriter is still null if the constructor itself failed
        if(indexWriter!=null){
            try {
                // The old document is only flagged as deleted; its terms are still in the term dictionary
                // optimize() merges the segments so terms and documents are back in sync
                indexWriter.optimize();
                indexWriter.close();
            } catch (Exception e) {
                e.printStackTrace();
                throw new RuntimeException(e);
            }
        }
    }
}

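Note:

Both update and delete operations should be followed by optimize(). Without it, deleted documents are still left out of search results. However, their terms stay in the term dictionary and still count toward term statistics until the segments are merged. optimize() merges the whole index into one segment and is expensive, so it is better to run it periodically on a separate thread than after every change.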
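Running optimize() periodically on its own thread could look like the sketch below. It assumes the maintenance work is a `Runnable` that calls `optimize()` on the shared writer. `PeriodicMaintenance` is a hypothetical helper built on the JDK's `ScheduledExecutorService`, and the period is up to you.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: run an (assumed) index-maintenance task, e.g. a call to
// indexWriter.optimize(), on one background thread at a fixed delay.
class PeriodicMaintenance implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "index-maintenance");
                t.setDaemon(true); // don't keep the JVM alive just for maintenance
                return t;
            });

    PeriodicMaintenance(Runnable task, long period, TimeUnit unit) {
        // Fixed delay: the next run is scheduled only after the previous one finishes
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                // An escaping exception would suppress all later runs
                e.printStackTrace();
            }
        }, period, period, unit);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

The executor uses a single daemon thread with a fixed delay, so runs never overlap. Catching exceptions inside the task matters because `ScheduledExecutorService` stops all subsequent executions once one run throws.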

Delete operation

// Delete a product's document from the index by its primary key
public void deleteGoods(int gid){
    // Create an IndexWriter
    IndexWriter indexWriter=null;
    try {
        indexWriter=new IndexWriter(Configuraction.getDirectory(),Configuraction.getAnalyzer(),MaxFieldLength.LIMITED);
        // Flag every document whose gid field equals this value as deleted
        indexWriter.deleteDocuments(new Term("gid",String.valueOf(gid)));
        // Commit
        indexWriter.commit();
    } catch (IOException e) {
        e.printStackTrace();
        throw new RuntimeException(e);
    }finally{
        // indexWriter is still null if the constructor itself failed
        if(indexWriter!=null){
            try {
                // Merge segments so the deleted document's terms are purged as well
                indexWriter.optimize();
                indexWriter.close();
            } catch (Exception e) {
                e.printStackTrace();
                throw new RuntimeException(e);
            }
        }
    }
}

Paged query

public List<Goods> query(String gname,int currentPage){
    int size=5;  // page 2 * 5 per page ---> fetch the top 10 hits
    List<Goods> goodsList=new ArrayList<Goods>();
    // The search helper
    IndexSearcher indexSearcher=null;
    try {
        QueryParser parse=new QueryParser(Version.LUCENE_30,"gname",Configuraction.getAnalyzer());
        // Parse the search keywords into a Query
        Query query=parse.parse(gname);
        indexSearcher=new IndexSearcher(Configuraction.getDirectory());
        // n is the number of hits we ask for; it is what makes paging possible
        // page 2 with size 5: 0 1 2 3 4 [5 6 7 8 9]
        TopDocs topDocs=indexSearcher.search(query,currentPage*size);
        /*
         * TopDocs:
         *     totalHits: total number of matching documents
         *     scoreDocs[]: the top hits (at most n of them), best first
         */
        System.out.println("Requested hits: " + currentPage*size);
        System.out.println("Total matching documents: " + topDocs.totalHits);
        // Lucene's internal document numbers of the top hits
        ScoreDoc[] docs=topDocs.scoreDocs;  //[0]=0 [1]=1
        System.out.println("Document numbers actually returned: " + docs.length);
        /*
         * ScoreDoc:
         *     doc: internal document number
         *     score: relevance score of this document
         */
        // docs.length == Math.min(totalHits, currentPage*size), so the loop never runs past the last hit
        for(int i=(currentPage-1)*size;i<docs.length;i++){
            //System.out.println("Document number: " + docs[i].doc);
            //System.out.println("Score: " + docs[i].score);
            // Load the stored document by its number
            Document doc=indexSearcher.doc(docs[i].doc);
            goodsList.add(DocumentUtil.DocumentToGoods(doc));
        }
    } catch (Exception e) {
        e.printStackTrace();
        throw new RuntimeException(e);
    } finally {
        // Release the searcher and the IndexReader it opened
        if(indexSearcher!=null){
            try {
                indexSearcher.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    return goodsList;
}

How the paging works:

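Neither search(query, 10000) nor search(query, 1) changes topDocs.totalHits. It is the total number of documents that matched the query (not the number of matched terms), no matter how many hits n we ask for. So either totalHits or currentPage * PAGE_SIZE can be the larger value, and the page must end at the smaller one: int endNum = Math.min(topDocs.totalHits, currentPage * PAGE_SIZE); (this equals topDocs.scoreDocs.length). A page also does not start at hit 0, so the actual loop is: for (int i = (currentPage - 1) * PAGE_SIZE; i < endNum; i++)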
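The page-window arithmetic can be isolated into a small helper, sketched below. `PageWindow` is a hypothetical name. Pages are assumed to be 1-based as in `query()`, and a page past the end yields an empty window instead of an exception. Because `scoreDocs.length` already equals min(totalHits, n), looping up to `docs.length` in `query()` gives the same bound.

```java
// Sketch of the paging arithmetic: which slice of scoreDocs belongs to a page.
final class PageWindow {
    final int start; // inclusive index into scoreDocs
    final int end;   // exclusive index into scoreDocs

    private PageWindow(int start, int end) {
        this.start = start;
        this.end = end;
    }

    /**
     * @param totalHits   TopDocs.totalHits (all matching documents)
     * @param currentPage 1-based page number
     * @param pageSize    hits per page
     */
    static PageWindow of(int totalHits, int currentPage, int pageSize) {
        if (currentPage < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        // search(query, n) is asked for n = currentPage * pageSize hits,
        // but returns fewer when fewer documents match
        int end = Math.min(totalHits, currentPage * pageSize);
        // Clamp so a page beyond the last hit becomes an empty window
        int start = Math.min((currentPage - 1) * pageSize, end);
        return new PageWindow(start, end);
    }
}
```

With 12 matching documents and 5 per page, `PageWindow.of(12, 2, 5)` covers indices 5 to 9 (end 10), and `PageWindow.of(12, 3, 5)` covers only indices 10 and 11 (end 12).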

The problem with IndexWriter

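IndexWriter: only one IndexWriter can be open on an index at a time, because every writer must hold the write.lock file. It therefore has to be a single shared (singleton) instance. IndexWriter is thread-safe, so one global IndexWriter shared by all threads is enough.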

@Test
public void testIndexWriter()throws Exception{
    IndexWriter indexWriter = new IndexWriter(Configuraction.getDirectory(),Configuraction.getAnalyzer(), MaxFieldLength.LIMITED);
    // The second writer cannot obtain write.lock and throws once the lock wait times out
    IndexWriter indexWriter2 = new IndexWriter(Configuraction.getDirectory(),Configuraction.getAnalyzer(), MaxFieldLength.LIMITED);
}

The resulting exception:

org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@D:\workspace\lucene3.0\indexData\write.lock
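The lock conflict can be reproduced with plain JDK calls. As far as I know, Lucene 3.0's `NativeFSLockFactory` (the `NativeFSLock` in the message above) relies on `java.nio` file locks on `write.lock`. The sketch below is not Lucene code. It only shows the same mechanism: a second exclusive lock on the same file cannot be obtained while the first is held. `WriteLockDemo` is a hypothetical class.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch (not Lucene's code): two "writers" competing for dir/write.lock.
final class WriteLockDemo {
    /** Tries to take an exclusive lock; returns null if someone already holds it. */
    static FileLock tryAcquire(FileChannel channel) throws IOException {
        try {
            return channel.tryLock();   // null if another process holds the lock
        } catch (OverlappingFileLockException e) {
            return null;                // already held inside this JVM
        }
    }

    /** Returns true if the first lock succeeds and the second one is refused. */
    static boolean secondLockFails(Path dir) throws IOException {
        Path lockFile = dir.resolve("write.lock");
        try (FileChannel first = FileChannel.open(lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileChannel second = FileChannel.open(lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock lock1 = tryAcquire(first);
            FileLock lock2 = tryAcquire(second);
            return lock1 != null && lock2 == null;
        } // closing the channels releases the lock
    }
}
```

IndexWriter does not fail on the first refusal. It keeps retrying until its write-lock timeout expires and then throws the `LockObtainFailedException` shown above.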

IndexWriter optimization: one shared writer

package cn.index;

import java.io.IOException;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.search.IndexSearcher;

/*
 * Keeps the IndexWriter as a single global instance: the rest of the project only uses it,
 * while LuceneUtil is responsible for creating and closing it.
 */
public class LuceneUtil {

    private static IndexWriter indexWriter = null;
    // Each IndexSearcher takes a snapshot of the index when it is created and does not see later updates,
    // similar to MySQL's REPEATABLE READ isolation level.
    // volatile is required for the double-checked locking below to be safe.
    private static volatile IndexSearcher indexSearcher=null;

    public static IndexSearcher getIndexSearcher() {
        // Create the IndexSearcher lazily if there is none yet
        if(indexSearcher==null){
            synchronized(LuceneUtil.class){
                if(indexSearcher==null){
                    try {
                        indexSearcher=new IndexSearcher(Configuraction.getDirectory());
                    } catch (Exception e) {
                        e.printStackTrace();
                        throw new RuntimeException(e);
                    }
                }
            }
        }
        return indexSearcher;
    }

    public static IndexWriter getIndexWriter() {
        // Asking for the writer means the index is about to change, so close the current
        // searcher; the next getIndexSearcher() call opens a fresh snapshot.
        // Caveat: threads still using the old searcher may fail once it is closed, and a
        // searcher reopened before the writer commits still sees the old data.
        synchronized(LuceneUtil.class){
            closeIndexSearch(indexSearcher);
            // Reset the global so it gets recreated on demand
            indexSearcher=null;
        }
        return indexWriter;
    }

    private static void closeIndexWriter(IndexWriter indexWriter){
        if(indexWriter!=null){
            try {
                indexWriter.optimize();
                indexWriter.close();
            } catch (IOException e) {
                e.printStackTrace();
                throw new RuntimeException(e);
            }
        }
    }

    private static void closeIndexSearch(IndexSearcher indexSearcher){
        if(indexSearcher!=null){
            try {
                indexSearcher.close();
            } catch (IOException e) {
                e.printStackTrace();
                throw new RuntimeException(e);
            }
        }
    }

    static {
        try {
            // Create the IndexWriter when the LuceneUtil class is loaded
            indexWriter = new IndexWriter(Configuraction.getDirectory(),
                    Configuraction.getAnalyzer(), MaxFieldLength.LIMITED);
            // Release resources when the program shuts down
            Runtime.getRuntime().addShutdownHook(new Thread(){
                @Override
                public void run(){
                    System.out.println("----run-----");
                    // Close the IndexWriter when the program exits
                    closeIndexWriter(indexWriter);
                }
            });
        } catch (Exception e) {
            e.printStackTrace();
            throw new RuntimeException(e);
        }
    }
}
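Double-checked locking, as used in `getIndexSearcher()`, is only correct when the field is `volatile`. Otherwise another thread may see a reference to a partially constructed object. A generic version is sketched below. `Lazy<T>` is a hypothetical helper. Its `reset()` plays the role of what `getIndexWriter()` does with the searcher: it drops the cached instance so the caller can close it, and the next `get()` creates a fresh one.

```java
import java.util.function.Supplier;

// Hypothetical helper: lazily creates one shared instance with double-checked locking.
final class Lazy<T> {
    private final Supplier<T> factory;
    private volatile T value; // volatile makes the unsynchronized first check safe

    Lazy(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T v = value;                   // one volatile read on the fast path
        if (v == null) {
            synchronized (this) {
                v = value;             // re-check under the lock
                if (v == null) {
                    v = factory.get(); // if this throws, value stays null
                    value = v;
                }
            }
        }
        return v;
    }

    /** Drops the cached instance and returns it (or null) so the caller can close it. */
    synchronized T reset() {
        T old = value;
        value = null;
        return old;
    }
}
```

For an instance that never needs to be replaced, such as the writer, the static initializer already used in `LuceneUtil` is simpler, because the JVM runs it exactly once.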

posted @ 2018-12-10 14:59 HelloWord404
