Context Compression and Filtering for LLM RAG

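I. Why compress and filter
The retriever pulls back top-k documents in one shot, and a large share of their tokens (the author's rough figure is 80%) is unrelated to the question. This causes two problems:

It wastes the LLM's context window.
Irrelevant content invites hallucination.
Goal: before anything is sent to the LLM, drop useless passages, sentences, and tokens, or compress them into a shorter representation.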

II. Overall three-step pipeline
query
  │
  ▼
[1. Retrieve] ─► raw document set D=[d1,…,dk]
  │
  ▼
[2. Compress + filter] ─► refined document set D̂=[d̂1,…,d̂m] (m≤k, |d̂i|≤|di|)
  │
  ▼
[3. Generate] ─► answer

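III. Four common compression/filtering strategies
A. FILTER - keep the sentences relevant to the query and discard the rest
B. SUMMARY - produce an extractive or abstractive summary of a long document
C. PRECISION - for structured fields, truncate decimal places and keep only the key key-value pairs
D. HYBRID - A first, then B, then C when necessary, balancing accuracy against compression ratio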
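
Strategy C is only described in words, so here is a minimal sketch of what "truncate decimals and keep only the key key-value pairs" could look like. It assumes the structured record has already been parsed into a dict; `compact_record`, `keep_keys`, and `ndigits` are hypothetical names chosen for illustration, not part of the original post.

```python
import json
from typing import Any, Dict, Iterable


def compact_record(record: Dict[str, Any], keep_keys: Iterable[str], ndigits: int = 2) -> str:
    """Keep only the selected keys and round floats to `ndigits` decimal places."""
    compact = {}
    for key in keep_keys:
        if key not in record:
            continue                       # skip keys the record does not have
        value = record[key]
        if isinstance(value, float):
            value = round(value, ndigits)  # shorten long decimals
        compact[key] = value
    # Compact separators drop the spaces json.dumps inserts by default
    return json.dumps(compact, ensure_ascii=False, separators=(",", ":"))


# Hypothetical record, for illustration only
record = {
    "product": "GPU-X",
    "price": 1299.98765,
    "latency_ms": 12.3456789,
    "internal_id": "a1b2c3",
    "notes": "long free-text field that the question does not need",
}
print(compact_record(record, keep_keys=["product", "price", "latency_ms"]))
# prints {"product":"GPU-X","price":1299.99,"latency_ms":12.35}
```

Which keys count as "key" depends on the question, so in practice `keep_keys` would likely be chosen per query or per schema rather than hard-coded.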

IV. Pseudocode implementation (a single file is enough to follow the idea)

Python

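# 0. Imports (the original listing omitted them)
import re
from dataclasses import dataclass
from enum import Enum
from typing import List

# 1. Basic data structure
@dataclass
class Document:
    content: str
    score: float = 0.0          # retrieval score; can be used for reranking

# 2. Retriever (interface)
class Retriever:
    def retrieve(self, query: str, top_k: int) -> List[Document]:
        # Vector search, Elasticsearch, or keyword search all work
        raise NotImplementedError

# 3. Generator (interface)
class Generator:
    def generate(self, prompt: str) -> str:
        # Call any LLM
        raise NotImplementedError

# Helpers used by the filter. The original leaves these undefined; the bodies
# below are simple placeholders (assumptions), to be swapped for real ones.
def split_sentences(text: str) -> List[str]:
    # Split after CJK/Western sentence enders; "." only counts when followed by whitespace
    parts = re.split(r'(?<=[。！？!?])\s*|(?<=\.)\s+', text)
    return [p.strip() for p in parts if p.strip()]

def sentence_similarity(sentence: str, query: str) -> float:
    # Placeholder: Jaccard overlap of character bigrams; use embeddings in practice
    a = {sentence[i:i + 2] for i in range(len(sentence) - 1)}
    b = {query[i:i + 2] for i in range(len(query) - 1)}
    return len(a & b) / len(a | b) if a and b else 0.0

def percentile(values: List[float], q: float) -> float:
    # Value at position floor(n * q / 100) of the sorted list, clamped to valid indices
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, max(0, int(len(ordered) * q / 100)))]

# 4. Filter: sentence-level relevance
class InformationFilter:
    def filter_sentences(self, query: str, text: str, keep_ratio: float = 0.5) -> str:
        sentences = split_sentences(text)
        if not sentences:                     # percentile() needs at least one score
            return ''
        scores = [sentence_similarity(s, query) for s in sentences]
        threshold = percentile(scores, (1 - keep_ratio) * 100)
        keep = [s for s, sc in zip(sentences, scores) if sc >= threshold]
        return ' '.join(keep)

# 5. Summarizer: one more compression pass for overly long passages
class SummaryCompressor:
    def __init__(self, generator: Generator):
        # Reuse the injected LLM; the original called a bare Generator(), which does nothing
        self.generator = generator

    def compress_document(self, query: str, text: str) -> str:
        prompt = (f"Summarize the following text in 2 sentences, keeping the "
                  f"information relevant to the question '{query}':\n{text}")
        return self.generator.generate(prompt)

# 6. Precision extractor: structured scenarios
class PrecisionExtractor:
    def extract_precise_information(self, query: str, text: str) -> str:
        # Simple example: regex-extract numbers (dates and amounts would need their own patterns)
        numbers = re.findall(r'\d+(?:\.\d+)?', text)
        return ' '.join(numbers) if numbers else text[:150]  # fallback: truncate

# 7. Compression strategy enum
class Strategy(Enum):
    FILTER = 'filter'
    SUMMARY = 'summary'
    PRECISION = 'precision'
    HYBRID = 'hybrid'

# 8. Core compression RAG class
class ContextCompressionRAG:
    def __init__(self, retriever: Retriever, generator: Generator,
                 strategy: Strategy = Strategy.HYBRID):
        self.retriever = retriever
        self.generator = generator
        self.strategy = strategy
        self.filter = InformationFilter()
        self.summarizer = SummaryCompressor(generator)
        self.preciser = PrecisionExtractor()

    def run(self, query: str, top_k: int = 5) -> str:
        docs = self.retriever.retrieve(query, top_k)
        # Drop documents that compressed down to an empty string
        compressed = [c for c in self._compress_all(query, docs) if c]
        if not compressed:
            return "Sorry, not enough information was found."
        context = '\n'.join(compressed)
        prompt = (f"Answer the question using the context below:\n{context}\n"
                  f"Question: {query}\nAnswer:")
        return self.generator.generate(prompt)

    # Single entry point for compression
    def _compress_all(self, query: str, docs: List[Document]) -> List[str]:
        if self.strategy == Strategy.FILTER:
            return [self.filter.filter_sentences(query, d.content) for d in docs]
        if self.strategy == Strategy.SUMMARY:
            return [self.summarizer.compress_document(query, d.content) for d in docs]
        if self.strategy == Strategy.PRECISION:
            return [self.preciser.extract_precise_information(query, d.content) for d in docs]
        if self.strategy == Strategy.HYBRID:
            return self._hybrid(query, docs)
        return [d.content for d in docs]

    # HYBRID: filter first, then summarize or extract precise fields
    def _hybrid(self, query: str, docs: List[Document]) -> List[str]:
        out = []
        for d in docs:
            txt = self.filter.filter_sentences(query, d.content, keep_ratio=0.6)
            if not txt:
                continue
            if len(txt) > 500:                       # still too long: summarize
                txt = self.summarizer.compress_document(query, txt)
            else:                                    # otherwise extract key fields
                # Note: this keeps only the numbers whenever the text contains any,
                # so it is lossy for non-numeric questions
                txt = self.preciser.extract_precise_information(query, txt)
            if txt:
                out.append(txt)
        return out

# 9. Usage example
if __name__ == "__main__":
    # YourRetriever / YourGenerator are placeholders for your own subclasses
    rag = ContextCompressionRAG(
            retriever=YourRetriever(),
            generator=YourGenerator(),
            strategy=Strategy.HYBRID)
    answer = rag.run("What is context compression and filtering in LLM RAG?")
    print(answer)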
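
The listing above neither guards against an empty summary nor reports how much it actually compressed. The sketch below shows one way to add both, under stated assumptions: `count_tokens` uses whitespace splitting as a rough stand-in for a real tokenizer (it will not work for unsegmented Chinese text), and `compress_with_fallback`, `compression_ratio`, and `fake_summarizer` are hypothetical names introduced only for this example.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Assumption: whitespace tokens as a rough proxy; use your model's tokenizer in practice
    return len(text.split())


def compress_with_fallback(text: str, summarize: Callable[[str], str], max_chars: int = 150) -> str:
    """Return the summary, or a truncated copy of the original if the summary is empty."""
    summary = (summarize(text) or "").strip()
    return summary if summary else text[:max_chars]


def compression_ratio(originals: List[str], compressed: List[str]) -> float:
    """Original token count divided by compressed token count (higher means more compression)."""
    before = sum(count_tokens(t) for t in originals)
    after = sum(count_tokens(t) for t in compressed)
    return before / after if after else float("inf")


def fake_summarizer(text: str) -> str:
    # Stand-in for an LLM call: "summarizes" one document and fails (empty output) on the other
    return " ".join(text.split()[:2]) if text.startswith("alpha") else ""


docs = ["alpha beta gamma delta epsilon zeta", "one two three four"]
out = [compress_with_fallback(d, fake_summarizer) for d in docs]
print(out)                           # the second entry falls back to the original text
print(compression_ratio(docs, out))  # 10 original tokens over 6 compressed tokens
```

In the listing, the same two calls could wrap `compress_document` and the return value of `_compress_all` respectively.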

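V. Key tips

Sentence-level filtering: score sentences with embedding similarity, which is fast, or rerank them with a cross-encoder, which is usually more accurate but slower.
Fallback when summarization fails: the summarization model may return an empty string; fall back to truncating the original text or to key-field extraction.
Compression ratio: actual compression ratio = total original tokens / tokens after refinement; it is easy to tally inside _compress_all.
Parallelization: following a map-reduce pattern, split long documents into small chunks, filter them independently, then merge; with enough workers the speedup can approach linear.
Learned compression: with enough data, see FlexRAG, which uses "learnable summary vectors" for end-to-end compression.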
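
The parallelization tip above can be sketched roughly as follows. This is a minimal illustration, not a tuned implementation: it assumes fixed-size character chunks and a thread pool, and `split_chunks`, `parallel_filter`, and `keyword_overlap` are hypothetical names. The toy scorer stands in for an embedding or cross-encoder call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def split_chunks(text: str, chunk_size: int) -> List[str]:
    """Cut the text into consecutive chunks of at most `chunk_size` characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def parallel_filter(text: str, query: str, score: Callable[[str, str], float],
                    chunk_size: int = 200, min_score: float = 0.1, workers: int = 4) -> str:
    chunks = split_chunks(text, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map step: score each chunk independently; results come back in input order
        scores = list(pool.map(lambda c: score(c, query), chunks))
    # Reduce step: keep the passing chunks and merge them in their original order
    return "".join(c for c, s in zip(chunks, scores) if s >= min_score)


def keyword_overlap(chunk: str, query: str) -> float:
    """Toy scorer: fraction of query words that appear in the chunk."""
    words = query.lower().split()
    return sum(w in chunk.lower() for w in words) / len(words) if words else 0.0


text = "RAG compresses context. Unrelated cooking tips. "
result = parallel_filter(text, "RAG context", keyword_overlap, chunk_size=24)
print(result)
```

Threads help mainly when scoring is I/O-bound, for example a remote embedding API. For CPU-bound scoring in pure Python, a process pool or batched model inference is likely the better fit.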

Copy the pseudocode above, replace Retriever and Generator with your own implementations, and you have a RAG prototype with context compression and filtering. Have fun!

Posted on 2025-10-11 17:43 by ExplorerMan
