Practical Guide: [Lucene] FastVectorHighlighter Example

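Below is a complete Lucene 8.5.0 + FastVectorHighlighter example (JDK 8+) that you can copy and run as-is. It walks through the whole process: building the index, searching, and highlighting.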

> Key point: the highlighted field must

1. store the original content (`setStored(true)`), and

2. enable term vectors with positions and offsets (`setStoreTermVectors(true)` + `setStoreTermVectorPositions(true)` + `setStoreTermVectorOffsets(true)`).
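To confirm that a field type really stores what FastVectorHighlighter needs, you can read the term vector back from the index. The sketch below is a minimal check, assuming Lucene 8.5.0 on the classpath; the class `TermVectorCheck` and its methods are hypothetical names for illustration. It indexes one document with a given field type and asks `IndexReader.getTermVector` whether positions and offsets were recorded.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Terms;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class TermVectorCheck {

    /** Hypothetical helper: a stored text field that also keeps term vectors with positions and offsets. */
    static FieldType termVectorType() {
        FieldType type = new FieldType(TextField.TYPE_STORED);
        type.setStoreTermVectors(true);
        type.setStoreTermVectorPositions(true);
        type.setStoreTermVectorOffsets(true);
        type.freeze();
        return type;
    }

    /** Indexes one document with the given field type and reports whether its term vector has positions and offsets. */
    static boolean hasPositionsAndOffsets(FieldType type) {
        try (Directory dir = new ByteBuffersDirectory()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                Document doc = new Document();
                doc.add(new Field("body", "Lucene FastVectorHighlighter", type));
                writer.addDocument(doc); // close() commits by default
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                // getTermVector returns null when no term vector was stored for this field
                Terms tv = reader.getTermVector(0, "body");
                return tv != null && tv.hasPositions() && tv.hasOffsets();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("TextField.TYPE_STORED: " + hasPositionsAndOffsets(TextField.TYPE_STORED));
        System.out.println("with term vectors:     " + hasPositionsAndOffsets(termVectorType()));
    }
}
```

With `TextField.TYPE_STORED` the text is stored but no term vector is written, so the check reports `false`; that is exactly the configuration under which FastVectorHighlighter finds nothing to highlight.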

---

1. Maven dependencies (Lucene 8.5.0)

```xml
<dependencies>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-core</artifactId>
        <version>8.5.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-analyzers-common</artifactId>
        <version>8.5.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-highlighter</artifactId>
        <version>8.5.0</version>
    </dependency>
</dependencies>
```

---

2. Java example code

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.search.*;
import org.apache.lucene.search.vectorhighlight.*;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class FastVectorHighlighterDemo {

    public static void main(String[] args) throws Exception {
        Directory dir = new ByteBuffersDirectory();
        IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());

        // 1. Define the field type: stored + tokenized + term vectors
        FieldType fieldType = new FieldType();
        fieldType.setStored(true);                   // store the original text
        fieldType.setTokenized(true);                // run it through the analyzer
        fieldType.setStoreTermVectors(true);         // required
        fieldType.setStoreTermVectorPositions(true); // required
        fieldType.setStoreTermVectorOffsets(true);   // required
        fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        fieldType.freeze();

        // 2. Add a document (try-with-resources closes the writer)
        try (IndexWriter writer = new IndexWriter(dir, cfg)) {
            Document doc = new Document();
            doc.add(new Field("title", "Lucene 8.5.0 FastVectorHighlighter示例", fieldType));
            doc.add(new Field("body",
                    "Lucene是一个高效的全文检索库。FastVectorHighlighter利用TermVector实现高速高亮。", fieldType));
            writer.addDocument(doc);
            writer.commit();
        }

        // 3. Search & highlight.
        // StandardAnalyzer indexes every Chinese character as a separate token, so a
        // TermQuery for "全文检索" or "高亮" would match nothing. Use a PhraseQuery
        // over the individual characters instead.
        try (IndexReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            Query query = new BooleanQuery.Builder()
                    .add(new PhraseQuery("body", "全", "文", "检", "索"), BooleanClause.Occur.SHOULD)
                    .add(new PhraseQuery("body", "高", "亮"), BooleanClause.Occur.SHOULD)
                    .build();

            TopDocs topDocs = searcher.search(query, 10);
            if (topDocs.scoreDocs.length == 0) { // avoid ArrayIndexOutOfBoundsException
                System.out.println("No hits");
                return;
            }
            int docId = topDocs.scoreDocs[0].doc;

            // 4. Use FastVectorHighlighter (phraseHighlight = true, fieldMatch = true)
            FastVectorHighlighter highlighter = new FastVectorHighlighter(true, true,
                    new SimpleFragListBuilder(5),   // 5-character margin around each match
                    new ScoreOrderFragmentsBuilder(
                            BaseFragmentsBuilder.COLORED_PRE_TAGS,
                            BaseFragmentsBuilder.COLORED_POST_TAGS));
            FieldQuery fieldQuery = highlighter.getFieldQuery(query);
            // fragCharSize = 100, at most 3 fragments; null if "body" is not stored
            String[] frags = highlighter.getBestFragments(fieldQuery, reader, docId, "body", 100, 3);

            // 5. Print the results
            System.out.println("Title: " + reader.document(docId).get("title"));
            if (frags != null) {
                for (String f : frags) {
                    System.out.println("Fragment: " + f);
                }
            }
        }
    }
}
```
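The query has to match how `StandardAnalyzer` tokenizes Chinese: it emits one token per Han character, so a multi-character word such as 全文检索 never exists as a single indexed term, and a `TermQuery` for it finds nothing. The sketch below, assuming Lucene 8.5.0 (`TokenDump` and `tokens` are hypothetical names), lists the tokens the analyzer produces for a piece of text.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class TokenDump {

    /** Returns the terms StandardAnalyzer produces for the given text. */
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        try (Analyzer analyzer = new StandardAnalyzer();
             TokenStream ts = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                 // mandatory before the first incrementToken()
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();                   // finish the stream before it is closed
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("全文检索")); // prints [全, 文, 检, 索]
    }
}
```

Querying with a `PhraseQuery` over the single characters works with this analyzer; alternatively, a Chinese-aware analyzer such as `SmartChineseAnalyzer` (separate `lucene-analyzers-smartcn` module) can be used at both index and query time.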

---

3. Output (sample)

```
Title: Lucene 8.5.0 FastVectorHighlighter示例
Fragment: Lucene是一个高效的<b style="background:yellow">全文检索</b>库。FastVectorHighlighter利用TermVector实现高速<b style="background:lawngreen">高亮</b>。
```
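FastVectorHighlighter does not re-run the analyzer when highlighting: it takes the start/end offsets recorded in the term vector and uses them to cut the stored text and insert tags. The two matches above get different colors because the pre-tag array is cycled by the position of the matching term or phrase in the query. The pure-JDK sketch below is a simplified illustration of that idea only, not Lucene's code; `OffsetTagging`, `wrap` and the `{start, end, clauseIndex}` match layout are hypothetical, and the tags simply mimic those in the sample output.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Simplified illustration (not Lucene's implementation) of offset-based tagging. */
public class OffsetTagging {

    static final String[] PRE = {"<b style=\"background:yellow\">", "<b style=\"background:lawngreen\">"};
    static final String POST = "</b>";

    /** Each match is {startOffset, endOffset, clauseIndex}; matches must not overlap. */
    static String wrap(String text, int[][] matches) {
        int[][] sorted = matches.clone();
        Arrays.sort(sorted, Comparator.comparingInt(m -> m[0])); // left to right
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        for (int[] m : sorted) {
            sb.append(text, pos, m[0])            // text before the match
              .append(PRE[m[2] % PRE.length])     // tag cycles with the clause index
              .append(text, m[0], m[1])           // the matched text, cut by offsets
              .append(POST);
            pos = m[1];
        }
        return sb.append(text.substring(pos)).toString();
    }

    public static void main(String[] args) {
        String body = "Lucene是一个高效的全文检索库。FastVectorHighlighter利用TermVector实现高速高亮。";
        int q = body.indexOf("全文检索");
        int h = body.indexOf("高亮");
        System.out.println(wrap(body, new int[][] {{q, q + 4, 0}, {h, h + 2, 1}}));
    }
}
```

Because only offsets and the stored text are needed, both must be present in the index; this is why the field needs `setStored(true)` and term vector offsets.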

---

4. Common pitfalls

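| Problem | Cause |
|---|---|
| Highlighting returns `null` (or an empty array) | The field was not stored (`setStored(true)`), or term vectors with positions and offsets were not enabled |
| `MultiPhraseQuery` / `SpanQuery` matches are not highlighted | FastVectorHighlighter does not support these query types; switch to `UnifiedHighlighter`, which can highlight them (for example by re-analyzing the stored text) |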
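A minimal sketch of `UnifiedHighlighter` highlighting a `SpanNearQuery`, assuming Lucene 8.5.0 with `lucene-highlighter` on the classpath; `UnifiedSpanDemo` and `highlightSpan` are hypothetical names. The field is a plain stored `TextField` without term vectors or postings offsets, so the highlighter re-analyzes the stored text with the analyzer passed to its constructor.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;
import org.apache.lucene.search.uhighlight.UnifiedHighlighter;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class UnifiedSpanDemo {

    /** Indexes the text, searches "高" immediately followed by "亮", returns the first snippet or null. */
    static String highlightSpan(String text) {
        try (Directory dir = new ByteBuffersDirectory();
             Analyzer hlAnalyzer = new StandardAnalyzer()) { // analyzer used for re-analysis
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                Document doc = new Document();
                doc.add(new TextField("body", text, Field.Store.YES)); // stored, no term vectors
                writer.addDocument(doc);
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                // slop 0, in order: "高" directly followed by "亮"
                SpanQuery query = new SpanNearQuery(new SpanQuery[] {
                        new SpanTermQuery(new Term("body", "高")),
                        new SpanTermQuery(new Term("body", "亮"))}, 0, true);
                TopDocs topDocs = searcher.search(query, 10);
                UnifiedHighlighter highlighter = new UnifiedHighlighter(searcher, hlAnalyzer);
                String[] snippets = highlighter.highlight("body", query, topDocs);
                return snippets.length > 0 ? snippets[0] : null;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(highlightSpan("FastVectorHighlighter利用TermVector实现高速高亮。"));
    }
}
```

By default the matches are wrapped in `<b>`/`</b>` by the highlighter's default passage formatter.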

---

Copy it straight into your IDE and run it. Happy coding!

posted on 2025-08-04 11:11 ljbguanli