[GenAI] Pre-retrieval overview

Contents

  • Query Rewrite
  • Query Expand
  • Multi-query
  • Query Decomposition
  • HyDE

Query Rewrite

Query Rewrite is a rewriting strategy: without changing the user's core intent, it rewrites the original query so that it is more precise, clearer, and better suited to retrieval.

Original user query:

Apple 15 battery life how

Rewritten query:

How is the battery life performance of the iPhone 15?

This does not change the actual question. It only turns a colloquial, vague, and incomplete expression into a more complete and standardized search query.
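A minimal Python sketch of Query Rewrite, assuming the rewrite is done by an LLM behind a plain `llm(prompt) -> str` callable. The prompt wording, `rewrite_query`, and `fake_llm` are hypothetical. `fake_llm` only stands in for a real model call so the example runs.

```python
from typing import Callable

# Hypothetical prompt template; the exact wording is an assumption.
REWRITE_PROMPT = (
    "Rewrite the user query so it is clear, complete, and suitable for search.\n"
    "Keep the user's intent unchanged. Return only the rewritten query.\n\n"
    "User query: {query}"
)


def rewrite_query(query: str, llm: Callable[[str], str]) -> str:
    """Rewrite a query with an LLM; keep the original if the model returns nothing."""
    rewritten = llm(REWRITE_PROMPT.format(query=query)).strip()
    return rewritten or query


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, only here so the sketch runs."""
    if "Apple 15 battery life how" in prompt:
        return "How is the battery life of the iPhone 15?"
    return ""


print(rewrite_query("Apple 15 battery life how", fake_llm))
```

Falling back to the original query when the model returns nothing keeps a failed rewrite from breaking retrieval.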

 

Query Expand

The core idea is to add more context or related terms to the original question, while usually still issuing a single search request.

For example, the user asks:

Laptop keeps shutting down, what should I do?

After expansion, it may become:

Laptop keeps shutting down, causes, solutions, blue screen, freezing, overheating, driver issues

The key point here is that related terms are added, such as:

shutdown, blue screen, driver, overheating
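Query expansion can be sketched as appending related terms to the original query while still sending one search request. In this sketch the related terms come from a hypothetical hand-written table (`RELATED_TERMS`). Where the terms come from in practice (an LLM, a synonym dictionary, query logs) is an assumption, not something stated above.

```python
# Hypothetical trigger -> related-terms table used only for illustration.
RELATED_TERMS: dict[str, list[str]] = {
    "shutting down": ["causes", "solutions", "blue screen", "freezing",
                      "overheating", "driver issues"],
}


def expand_query(query: str, related: dict[str, list[str]]) -> str:
    """Append related terms to the query; the result is still ONE search request."""
    lowered = query.lower()
    extra: list[str] = []
    for trigger, terms in related.items():
        if trigger in lowered:
            for term in terms:
                # Skip terms already present in the query or already added.
                if term not in lowered and term not in extra:
                    extra.append(term)
    if not extra:
        return query
    # Drop trailing punctuation so the expansion reads as one keyword-style query.
    return ", ".join([query.rstrip("?!. "), *extra])


print(expand_query("Laptop keeps shutting down, what should I do?", RELATED_TERMS))
```

This prints `Laptop keeps shutting down, what should I do, causes, solutions, blue screen, freezing, overheating, driver issues`, which is still a single query, unlike Multi-query.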

 

Multi-query

The core idea is: generate multiple queries from different angles, retrieve results separately, and then merge the results.

For example, the user asks:

How to improve R&D team efficiency?

The system generates multiple queries:

Methods to improve R&D team efficiency
Best practices for improving software team collaboration efficiency
R&D process optimization OKR sprint code review
Common bottlenecks affecting R&D efficiency

This is not just “adding more keywords.” Instead, it generates multiple ways of asking the question from different perspectives.
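The "retrieve separately, then merge" step could look like the following Python sketch. The text does not say how the results are merged; this sketch swaps in Reciprocal Rank Fusion (RRF), one common choice that rewards documents ranked highly by several queries. The constant `k=60` is a frequently used default, and `FAKE_INDEX` is a toy stand-in for a real retriever.

```python
from collections import defaultdict
from typing import Callable


def multi_query_retrieve(
    queries: list[str],
    retrieve: Callable[[str], list[str]],
    k: int = 60,
) -> list[str]:
    """Retrieve once per query, then merge the ranked lists with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for q in queries:
        for rank, doc_id in enumerate(retrieve(q), start=1):
            # Documents ranked high by many queries accumulate the most score.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Toy retriever: each generated query returns a fixed ranked list of doc ids.
FAKE_INDEX = {
    "Methods to improve R&D team efficiency": ["d1", "d2", "d3"],
    "Best practices for improving software team collaboration efficiency": ["d2", "d4"],
    "R&D process optimization OKR sprint code review": ["d5", "d2"],
    "Common bottlenecks affecting R&D efficiency": ["d6", "d1"],
}

merged = multi_query_retrieve(list(FAKE_INDEX), lambda q: FAKE_INDEX.get(q, []))
print(merged)
```

For the toy data this prints `['d2', 'd1', 'd5', 'd6', 'd4', 'd3']`: d2 is returned by three of the four queries, so it ends up first even though no single query is enough to make it a clear winner.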

 

Query Decomposition

It can be understood as breaking down a question that is too broad, too complex, or hard to retrieve directly into several smaller, more specific sub-questions that are easier to retrieve.

For example, the user asks:

Which company is more worth joining, Company A or Company B?

This question is difficult to retrieve directly, because “more worth joining” is too broad.

The system may first decompose it into several sub-questions:

What is the salary level at Company A?
What is the salary level at Company B?
What is the overtime situation at Company A?
What is the overtime situation at Company B?
What is the growth prospect of Company A?
What is the growth prospect of Company B?
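A minimal Python sketch of the decomposition above for the comparison case, assuming the comparison dimensions are already known. `DIMENSIONS`, `decompose_comparison`, and `retrieve_per_subquestion` are hypothetical; a real system would more likely ask an LLM to propose the sub-questions.

```python
from typing import Callable

# Hypothetical comparison dimensions for "which company is more worth joining".
DIMENSIONS = ["salary level", "overtime situation", "growth prospect"]


def decompose_comparison(entities: list[str], dimensions: list[str]) -> list[str]:
    """Turn one broad comparison into a sub-question per (dimension, entity) pair."""
    return [
        f"What is the {dim} at {entity}?"
        for dim in dimensions
        for entity in entities
    ]


def retrieve_per_subquestion(
    sub_questions: list[str], retrieve: Callable[[str], list[str]]
) -> dict[str, list[str]]:
    """Retrieve each sub-question separately and keep results grouped by sub-question."""
    return {q: retrieve(q) for q in sub_questions}


subs = decompose_comparison(["Company A", "Company B"], DIMENSIONS)
for q in subs:
    print(q)
```

Keeping the results grouped per sub-question, rather than merging them into one list, lets the final answer compare Company A and Company B dimension by dimension.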

 

What's the difference between Query Decomposition & Multi-query?

Multi-query: the core goal is to generate several alternative phrasings or retrieval angles for the same question, so that the system retrieves more of the relevant content.

Query Decomposition: the core goal is to handle a complex question that naturally contains multiple sub-questions, so the system first breaks it down and then retrieves each part separately.

 

HyDE (Hypothetical Document Embeddings)

First let the model generate a hypothetical answer/document, then use that generated content to perform vector retrieval.

Why do this?

Because the user's original query is often short, colloquial, and information-poor, it may not directly match the most relevant content.

Therefore, the system adds an intermediate step:

  1. Based on the user's question, let the LLM first write a piece of content that may appear in the correct answer
  2. Use this generated content for vector retrieval
  3. Retrieve the truly relevant documents.
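The three steps can be sketched in Python as follows. This is a toy under loud assumptions: `embed` is a bag-of-words word counter standing in for a real dense embedding model, `fake_llm` returns the hypothetical answer from the example below instead of calling a model, and the two documents are contrived. Only the shape of the pipeline (generate, embed the draft, retrieve) is the point.

```python
import math
import re
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a real system uses a dense model)."""
    return Counter(re.findall(r"[a-z0-9\-]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query_text: str, docs: list[str], top_k: int = 1) -> list[str]:
    """Rank documents by similarity to whatever text is embedded as the query."""
    qv = embed(query_text)
    return sorted(docs, key=lambda d: cosine(qv, embed(d)), reverse=True)[:top_k]


def hyde_search(question: str, docs: list[str], llm: Callable[[str], str], top_k: int = 1) -> list[str]:
    # Step 1: let the LLM draft a hypothetical answer to the question.
    hypothetical = llm(f"Write a short passage that answers: {question}")
    # Steps 2-3: retrieve with the draft instead of the raw question.
    return search(hypothetical, docs, top_k)


QUESTION = "Why is RAG more suitable than fine-tuning for knowledge bases that are updated frequently?"
DOCS = [
    "Our cafeteria menu is updated every week.",
    "With RAG you update the external knowledge base without retraining the model.",
]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns the hypothetical answer from the example."""
    return (
        "RAG is more suitable for frequently updated knowledge bases because it does not "
        "require retraining the model. You only need to update the external knowledge base. "
        "In contrast, fine-tuning usually requires retraining or redeployment, which is more "
        "costly and has a longer update cycle."
    )


print("plain:", search(QUESTION, DOCS)[0])
print("HyDE: ", hyde_search(QUESTION, DOCS, fake_llm)[0])
```

In this contrived example, retrieving with the raw question picks the cafeteria sentence (it shares "is" and "updated" with the question), while the hypothetical answer shares many more terms with the relevant document and retrieves it instead. Real dense embeddings behave differently, so this only illustrates the mechanism.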

 

Example:

User question:

Why is RAG more suitable than fine-tuning for knowledge bases that are updated frequently?

HyDE first asks the model to generate a hypothetical answer, for example:

RAG is more suitable for frequently updated knowledge bases because it does not require retraining the model. You only need to update the external knowledge base. In contrast, fine-tuning usually requires retraining or redeployment, which is more costly and has a longer update cycle.

The key difference:

Normal retrieval: retrieve using the “question”
HyDE: retrieve using “a draft that looks like the answer”

HyDE can improve retrieval because the hypothetical answer often contains richer semantic signals than the original question, such as:

frequently updated knowledge base
no retraining
external knowledge base
fine-tuning
redeployment
cost
update cycle

HyDE adds one extra generation step, which makes the pipeline longer.

With HyDE, the process becomes:

  1. First, let the model generate a hypothetical answer
  2. Use that hypothetical answer for retrieval
  3. Then generate the final answer

This introduces:

  • Higher latency
  • Higher cost
  • More system complexity

 

How do we distinguish Query Rewrite, Multi-query, Query Decomposition, and HyDE? What scenarios are they suitable for?

My understanding is that all four are retrieval pre-processing techniques, but they solve different problems.

Query Rewrite mainly solves the problem of user queries being poorly expressed, incomplete, or too colloquial.

Essentially, it rewrites the query to make it clearer and more precise, without changing the user’s core intent.

Multi-query generates multiple phrasings or retrieval angles for the same question.
Its goal is to improve recall. It is suitable when the query is short, can be expressed in many ways, or a single phrasing is likely to miss relevant documents.

Query Decomposition breaks a complex question into several sub-questions.
It is suitable for comparative, composite, or multi-constraint questions, such as: “What is the difference between A and B?” or “Which one is more suitable for a certain scenario?”

HyDE is slightly different. It does not simply rewrite the query.
Instead, it first generates a hypothetical answer or hypothetical document, then uses that generated text for retrieval.
It is suitable when the user query and the knowledge base documents have a large wording gap, or when direct vector retrieval is unstable.

From an engineering perspective, I would usually start with Query Rewrite and Multi-query, because they are relatively low-cost and have more predictable benefits.
HyDE should be introduced later based on experimental results, because it adds extra generation cost, latency, and system complexity.
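"Based on experimental results" usually means measuring retrieval quality with and without the extra step on a labeled query set. A minimal Python sketch using recall@k, assuming you have gold relevant document ids per query; the toy labels and runs below are made up purely to exercise the functions, and the `min_gain` threshold is hypothetical.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall(runs: dict[str, list[str]], gold: dict[str, set[str]], k: int) -> float:
    """Average recall@k over a labeled query set (query -> retrieved doc ids)."""
    scores = [recall_at_k(runs.get(q, []), rel, k) for q, rel in gold.items()]
    return sum(scores) / len(scores) if scores else 0.0


def should_enable(baseline: float, variant: float, min_gain: float = 0.05) -> bool:
    """Hypothetical rule: keep the extra step only if recall improves enough."""
    return variant - baseline >= min_gain


# Toy labels and runs, made up purely for illustration (not real results).
gold = {"q1": {"d1", "d2"}, "q2": {"d3"}}
baseline = {"q1": ["d1", "d9", "d8"], "q2": ["d7", "d6", "d5"]}
with_hyde = {"q1": ["d1", "d2", "d8"], "q2": ["d3", "d6", "d5"]}

b, v = mean_recall(baseline, gold, k=3), mean_recall(with_hyde, gold, k=3)
print(b, v, should_enable(b, v))
```

In a real evaluation the recall gain would be weighed against the added latency and cost of the extra generation step, not used on its own.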

 
 
In a real RAG project, when would you consider using Multi-query or HyDE? And when would you not recommend using them?

My main decision criterion is whether the current system’s bottleneck is insufficient recall.

If the user query is very short, ambiguous, can be expressed in many different ways, or the knowledge base uses scattered terminology, I would first consider Multi-query. Its core value is to expand retrieval coverage and improve recall.

If I find a large language/style gap between the user query and the knowledge base documents — for example, the user asks in a very casual way, but the documents are written formally — or if pure vector retrieval is unstable, I would consider HyDE. HyDE helps by generating a hypothetical text that is closer to the document style, and then using that text for retrieval.

But I would not enable either of them by default.

Both approaches increase pipeline complexity, latency, and cost, and they may also introduce noise. This is especially true for HyDE: if the generated hypothetical answer drifts away from the user’s real intent, retrieval may also be biased in the wrong direction.
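The decision criterion above can be written down as a small rule, mainly to make the "not on by default" part explicit. A Python sketch, assuming the diagnostic signals have already been measured somehow; the `Diagnosis` fields and `pick_strategies` are hypothetical names, and how each signal is measured is left to the project.

```python
from dataclasses import dataclass


@dataclass
class Diagnosis:
    """Hypothetical signals; how each is measured is up to the project."""
    recall_is_bottleneck: bool       # relevant docs often missing from the top-k
    query_short_or_ambiguous: bool   # short, vague, or phrased in many possible ways
    scattered_terminology: bool      # the knowledge base uses inconsistent terms
    style_gap: bool                  # casual queries vs. formally written documents
    vector_retrieval_unstable: bool  # pure vector search results fluctuate


def pick_strategies(d: Diagnosis) -> list[str]:
    # Neither technique is on by default: both add latency, cost, and possible noise.
    if not d.recall_is_bottleneck:
        return []
    picks = []
    if d.query_short_or_ambiguous or d.scattered_terminology:
        picks.append("multi_query")
    if d.style_gap or d.vector_retrieval_unstable:
        picks.append("hyde")
    return picks


print(pick_strategies(Diagnosis(True, True, False, False, False)))
```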

 

How would you decide which path to take when a user question can be handled either by Query Rewrite or Query Decomposition?

 

I would first determine whether the question is mainly an expression problem or a structure problem.

If the core intent of the question is actually simple, but the user phrased it in a colloquial, incomplete way, used vague references, or chose imprecise keywords, I would prioritize Query Rewrite.

That type of question is still essentially one question. It only needs to be rewritten into a form that is more suitable for retrieval.

But if the question itself contains multiple dimensions — for example, comparing two options, considering time, pros and cons, and applicable scenarios at the same time — or if one question actually hides several sub-tasks, then I would prioritize Query Decomposition.

In that case, the problem cannot be solved reliably with a single retrieval query; it needs to be split into several smaller questions that are retrieved separately.

In real engineering practice, I don’t think these two are completely mutually exclusive. A common approach is to first do a lightweight rewrite to clean up the expression, then decide whether the question is complex enough to require decomposition.
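The "rewrite first, then decide whether to decompose" flow can be sketched in Python as below. The regex cues in `MULTI_PART_CUES` are a hypothetical, deliberately crude heuristic for spotting multi-part questions; a production system would more likely ask an LLM to classify the question. `rewrite` and `decompose` are passed in as plain callables, and `toy_decompose` is a hypothetical stand-in.

```python
import re
from typing import Callable

# Hypothetical surface cues that a question bundles several sub-tasks.
MULTI_PART_CUES = [
    r"\bdifference between\b",
    r"\bcompare\b",
    r"\bvs\b",
    r"\bwhich\b.*\bor\b",
    r"\bpros and cons\b",
]


def needs_decomposition(question: str) -> bool:
    q = question.lower()
    return any(re.search(cue, q) for cue in MULTI_PART_CUES)


def plan_query(
    question: str,
    rewrite: Callable[[str], str],
    decompose: Callable[[str], list[str]],
) -> list[str]:
    # Step 1: a lightweight rewrite cleans up the wording first.
    cleaned = rewrite(question)
    # Step 2: split only if the cleaned question looks structurally multi-part.
    return decompose(cleaned) if needs_decomposition(cleaned) else [cleaned]


def toy_decompose(q: str) -> list[str]:
    # Hypothetical stand-in for an LLM-based decomposer.
    return ["What is the salary level at Company A?", "What is the salary level at Company B?"]


print(plan_query("Apple 15 battery life how",
                 lambda q: "How is the battery life of the iPhone 15?", toy_decompose))
print(plan_query("Which company is more worth joining, Company A or Company B?",
                 str.strip, toy_decompose))
```

The first call stays a single rewritten query (an expression problem); the second is routed to decomposition (a structure problem).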

 

If I were designing a RAG SDK, I would put retrieval pre-processing in a query planning layer between the user request and the retriever, and I would split the design into three layers.

The first layer is the SDK core capability layer.
This layer should not be tightly coupled to any specific strategy. Instead, it should provide unified extension points and runtime orchestration, for example through interfaces such as:

QueryPreprocessor
Retriever
Postprocessor
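A minimal Python sketch of what this core layer might look like, assuming one reasonable shape for the extension points. Only the names QueryPreprocessor, Retriever, and Postprocessor come from the text; the method signatures, `Document`, `RetrievalPipeline`, and the toy plug-ins are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Document:
    doc_id: str
    text: str
    score: float = 0.0


class QueryPreprocessor(Protocol):
    # list -> list, so Rewrite (1 -> 1), Multi-query and Decomposition (1 -> N) share one shape.
    def process(self, queries: list[str]) -> list[str]: ...


class Retriever(Protocol):
    def retrieve(self, query: str, top_k: int) -> list[Document]: ...


class Postprocessor(Protocol):
    def process(self, query: str, docs: list[Document]) -> list[Document]: ...


@dataclass
class RetrievalPipeline:
    """Runtime orchestration: preprocessors -> retriever -> postprocessors."""
    retriever: Retriever
    preprocessors: list[QueryPreprocessor] = field(default_factory=list)
    postprocessors: list[Postprocessor] = field(default_factory=list)
    top_k: int = 5

    def run(self, user_query: str) -> list[Document]:
        queries = [user_query]
        for pre in self.preprocessors:
            queries = pre.process(queries)
        # Retrieve per query; de-duplicate by doc_id, keeping the highest score.
        best: dict[str, Document] = {}
        for q in queries:
            for doc in self.retriever.retrieve(q, self.top_k):
                if doc.doc_id not in best or doc.score > best[doc.doc_id].score:
                    best[doc.doc_id] = doc
        docs = sorted(best.values(), key=lambda d: d.score, reverse=True)
        for post in self.postprocessors:
            docs = post.process(user_query, docs)
        return docs


# Toy plug-ins, only to show the extension points being used.
class AddBestPracticesVariant:
    """Crude 1 -> N preprocessor standing in for a real Multi-query strategy."""
    def process(self, queries: list[str]) -> list[str]:
        return queries + [q + " best practices" for q in queries]


class StaticRetriever:
    def __init__(self, table: dict[str, list[Document]]):
        self.table = table

    def retrieve(self, query: str, top_k: int) -> list[Document]:
        return self.table.get(query, [])[:top_k]


table = {
    "rag sdk": [Document("d1", "SDK overview", 0.9), Document("d2", "Query planning", 0.4)],
    "rag sdk best practices": [Document("d2", "Query planning", 0.7), Document("d3", "Retriever tips", 0.5)],
}
pipe = RetrievalPipeline(StaticRetriever(table), preprocessors=[AddBestPracticesVariant()])
print([d.doc_id for d in pipe.run("rag sdk")])
```

Having preprocessors map a list of queries to a list of queries is one design choice, not the only one: it lets a rewrite step run before a multi-query or decomposition step without the orchestrator knowing which strategy is active, which keeps the core layer decoupled from specific strategies.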
posted @ 2026-06-01 13:35  Zhentiw