Implementation Guide to Large Model Optimization Techniques

SFT (Supervised Fine-Tuning) | RLHF (Reinforcement Learning from Human Feedback) | RAG (Retrieval-Augmented Generation)

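📋 Contents

Technology Overview
RAG: Retrieval-Augmented Generation (easiest; recommended first)
SFT: Supervised Fine-Tuning (moderate difficulty)
RLHF: Reinforcement Learning from Human Feedback (most complex)
Combined Application Plans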

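🎯 Technology Overview

Comparison of the Three Techniques

Technique	Difficulty	Cost	Effectiveness	Use cases	Implementation time
RAG	⭐	Low	⭐⭐⭐⭐	Knowledge updates, enterprise documents	1-3 days
SFT	⭐⭐⭐	Medium	⭐⭐⭐⭐⭐	Specific tasks, style customization	1-2 weeks
RLHF	⭐⭐⭐⭐⭐	High	⭐⭐⭐⭐⭐	Aligning with human preferences	1-2 months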

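How the Techniques Relate

Base model (Qwen2.5:7b)
    ↓
┌──────────────────────────────────────────┐
│  Stage 1: RAG (usable immediately)       │
│  - Adds an external knowledge base       │
│  - Does not modify model parameters      │
│  - Low cost, fast results                │
└──────────────────────────────────────────┘
    ↓
┌──────────────────────────────────────────┐
│  Stage 2: SFT (supervised fine-tuning)   │
│  - Trains on domain data                 │
│  - Modifies model parameters             │
│  - Improves task-specific ability        │
└──────────────────────────────────────────┘
    ↓
┌──────────────────────────────────────────┐
│  Stage 3: RLHF (reinforcement learning)  │
│  - Optimizes with human feedback         │
│  - Aligns with human values              │
│  - Improves answer quality               │
└──────────────────────────────────────────┘
    ↓
Enterprise-grade customized model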

📚 RAG: Retrieval-Augmented Generation

Difficulty: ⭐
Recommendation: ⭐⭐⭐⭐⭐
Implementation time: 1-3 days
No model retraining required.

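1.1 What Is RAG?

RAG leaves the model itself unchanged. Before generating an answer, it retrieves relevant documents and passes them to the model as context.

User question
    ↓
Retrieve relevant documents (vector similarity)
    ↓
Build the augmented prompt (question + retrieved documents)
    ↓
LLM generates the answer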
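
The retrieve-then-prompt flow above can be sketched in a few lines of Python. This is a toy illustration, not the project's implementation: the `embed` function is a stand-in that counts character bigrams instead of calling a real embedding model, and all function names and sample documents are assumptions made for the example.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag of character bigrams."""
    t = text.lower()
    return Counter(t[i:i + 2] for i in range(len(t) - 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs, top_k=2):
    """Rank documents by similarity to the query and keep the best top_k."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:top_k]

def build_prompt(query, retrieved):
    """Combine the question and the retrieved documents into one prompt."""
    context = "\n\n".join(
        f"[Document {i + 1}]\n{doc}" for i, doc in enumerate(retrieved)
    )
    return (
        "Answer the question based on the reference documents below.\n\n"
        f"Reference documents:\n{context}\n\n"
        f"Question: {query}\n\nAnswer:"
    )

docs = [
    "Annual leave policy: employees with 3 years of service get 10 days of annual leave.",
    "Pricing: the Professional plan costs 299 yuan per month.",
    "Office hours are 9:00 to 18:00 on weekdays.",
]
question = "How many days of annual leave after 3 years?"
top = retrieve(question, docs, top_k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a real system the prompt would then be sent to the LLM; only the embedding and generation calls change, while the overall shape stays the same.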

1.2 RAG Architecture (already partly implemented in your project)

// Your project already has the RAG foundations
backend/src/langchain-version/chains/ragChain.js  // ✅ implemented
backend/src/utils/vectorDB.js                     // ✅ implemented

1.3 Complete RAG Implementation

Step 1: Deploy the vector database

# Start Qdrant (already defined in docker-compose.local.yml)
docker-compose -f docker-compose.local.yml up -d qdrant

# Verify
curl http://localhost:6333

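Step 2: Prepare the embedding model

# Download an embedding model with Chinese support
ollama pull bge-m3

# Or use a Chinese-specific alternative (check that this name is available in your Ollama registry)
ollama pull bge-large-zh

Step 3: Document processing and embedding

Create backend/src/utils/rag-pipeline.js: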

/**
 * Full RAG pipeline
 * Document processing → embedding → storage → retrieval → generation
 */

import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
import { QdrantClient } from '@qdrant/js-client-rest';
import axios from 'axios';

export class RAGPipeline {
  constructor(config = {}) {
    this.qdrantClient = new QdrantClient({
      url: config.qdrantUrl || 'http://localhost:6333',
    });
    this.collectionName = config.collectionName || 'documents';
    this.embeddingModel = config.embeddingModel || 'bge-m3';
    this.ollamaUrl = config.ollamaUrl || 'http://localhost:11434';
  }

  /**
   * Step 1: Split the document into chunks
   */
  async chunkDocument(text, metadata = {}) {
    const splitter = new RecursiveCharacterTextSplitter({
      chunkSize: 500,        // 500 characters per chunk
      chunkOverlap: 100,     // 100 characters of overlap between chunks
      separators: ['\n\n', '\n', '。', '！', '？', '；', '，', ' ', ''],
    });

    const chunks = await splitter.createDocuments([text], [metadata]);
    
    console.log(`Document split into ${chunks.length} chunks`);
    return chunks;
  }

  /**
   * Step 2: Generate an embedding vector
   */
  async generateEmbedding(text) {
    try {
      const response = await axios.post(
        `${this.ollamaUrl}/api/embeddings`,
        {
          model: this.embeddingModel,
          prompt: text,
        }
      );

      return response.data.embedding;
    } catch (error) {
      console.error('Failed to generate embedding:', error.message);
      throw error;
    }
  }

  /**
   * Step 3: Batch embedding (one request per text, in sequence)
   */
  async batchEmbeddings(texts) {
    const embeddings = [];
    
    for (let i = 0; i < texts.length; i++) {
      console.log(`Embedding progress: ${i + 1}/${texts.length}`);
      const embedding = await this.generateEmbedding(texts[i]);
      embeddings.push(embedding);
      
      // Pause briefly to avoid sending requests too quickly
      await new Promise(resolve => setTimeout(resolve, 100));
    }
    
    return embeddings;
  }

  /**
   * Step 4: Create the vector collection (vectorSize must match the embedding model's output dimension)
   */
  async createCollection(vectorSize = 1024) {
    try {
      await this.qdrantClient.createCollection(this.collectionName, {
        vectors: {
          size: vectorSize,
          distance: 'Cosine',
        },
        optimizers_config: {
          default_segment_number: 2,
        },
        replication_factor: 1,
      });
      
      console.log(`✅ Collection ${this.collectionName} created`);
    } catch (error) {
      if (error.message.includes('already exists')) {
        console.log(`⚠️  Collection ${this.collectionName} already exists`);
      } else {
        throw error;
      }
    }
  }

  /**
   * Step 5: Store documents and their vectors
   */
  async storeDocuments(chunks) {
    // Make sure the collection exists
    await this.createCollection();

    // Extract the chunk texts
    const texts = chunks.map(chunk => chunk.pageContent);
    
    // Generate embeddings
    console.log('Generating embeddings...');
    const embeddings = await this.batchEmbeddings(texts);

    // Build the points to store
    // Note: timestamp-based ids can collide if documents are added concurrently
    const points = chunks.map((chunk, index) => ({
      id: Date.now() + index,
      vector: embeddings[index],
      payload: {
        text: chunk.pageContent,
        metadata: chunk.metadata,
        timestamp: new Date().toISOString(),
      },
    }));

    // Upload in one batch
    await this.qdrantClient.upsert(this.collectionName, {
      wait: true,
      points: points,
    });

    console.log(`✅ Stored ${points.length} document chunks`);
    return points.length;
  }

  /**
   * Step 6: Retrieve relevant documents
   */
  async search(query, topK = 5) {
    // Embed the query
    const queryEmbedding = await this.generateEmbedding(query);

    // Vector search
    const searchResult = await this.qdrantClient.search(this.collectionName, {
      vector: queryEmbedding,
      limit: topK,
      with_payload: true,
      with_vector: false,
    });

    return searchResult.map(result => ({
      text: result.payload.text,
      score: result.score,
      metadata: result.payload.metadata,
    }));
  }

  /**
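   * Step 7: RAG generation (retrieve, then generate)
   */
  async generate(query, model = 'qwen2.5:7b') {
    // 1. Retrieve relevant documents
    console.log('🔍 Retrieving relevant documents...');
    const documents = await this.search(query, 5);

    if (documents.length === 0) {
      console.log('⚠️  No relevant documents found; answering with the model alone');
      return await this.directGenerate(query, model);
    }

    // 2. Build the augmented prompt
    const context = documents
      .map((doc, index) => `[Document ${index + 1}]\n${doc.text}`)
      .join('\n\n');

    const enhancedPrompt = `Answer the question based on the reference documents below. If the documents do not contain the relevant information, say so.

Reference documents:
${context}

Question: ${query}

Answer:`;

    // 3. Call the LLM to generate the answer
    console.log('💬 Generating answer...');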
    const response = await axios.post(
      `${this.ollamaUrl}/api/generate`,
      {
        model: model,
        prompt: enhancedPrompt,
        stream: false,
      }
    );

    return {
      answer: response.data.response,
      sources: documents,
      context: context,
    };
  }

  /**
   * Direct generation (no RAG)
   */
  async directGenerate(query, model) {
    const response = await axios.post(
      `${this.ollamaUrl}/api/generate`,
      {
        model: model,
        prompt: query,
        stream: false,
      }
    );

    return {
      answer: response.data.response,
      sources: [],
      context: null,
    };
  }

  /**
   * Full flow: add a document to the RAG system
   */
  async addDocument(text, metadata = {}) {
    // 1. Chunk
    const chunks = await this.chunkDocument(text, metadata);
    
    // 2. Embed and store
    const count = await this.storeDocuments(chunks);
    
    return {
      success: true,
      chunks: count,
      message: `Added ${count} document chunks`,
    };
  }
}

// Export a convenience factory function
export async function createRAGSystem(config = {}) {
  const rag = new RAGPipeline(config);
  await rag.createCollection();
  return rag;
}
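
The chunk size and overlap settings used in chunkDocument can be easier to reason about with a stripped-down version. The sketch below is a plain fixed-window splitter, which is a simplification: the RecursiveCharacterTextSplitter in the pipeline also tries to break at separators such as paragraph and sentence boundaries, which this version ignores. The function name `chunk_text` is an assumption made for illustration.

```python
def chunk_text(text, chunk_size=500, overlap=100):
    """Split text into windows of chunk_size characters that overlap by `overlap`."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "".join(str(i % 10) for i in range(1200))  # 1200 characters of dummy text
chunks = chunk_text(sample)
print([len(c) for c in chunks])
```

Each chunk repeats the last 100 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.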

Step 4: Usage example

// scripts/test-rag.js
import { RAGPipeline } from '../backend/src/utils/rag-pipeline.js';

async function demo() {
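  console.log('🚀 RAG system demo\n');

  // 1. Create a RAG instance
  const rag = new RAGPipeline({
    collectionName: 'company_docs',
  });

  // 2. Add company documents
  console.log('📄 Adding company documents...');
  await rag.addDocument(`
    Company annual leave policy:
    1. Employees with at least 1 year of service get 5 days of annual leave
    2. Employees with at least 3 years of service get 10 days of annual leave
    3. Employees with at least 5 years of service get 15 days of annual leave
    Annual leave must be used within the calendar year and cannot be carried over.
  `, { type: 'policy', category: 'hr' });

  await rag.addDocument(`
    Company product price list:
    - Basic: ¥99/month
    - Professional: ¥299/month
    - Enterprise: ¥999/month
    All plans include a 7-day free trial.
  `, { type: 'pricing', category: 'product' });

  // 3. Run test queries
  console.log('\n💬 Testing RAG queries...\n');

  const result1 = await rag.generate('How many days of annual leave after 3 years of service?');
  console.log('Question: How many days of annual leave after 3 years of service?');
  console.log('Answer:', result1.answer);
  console.log('Sources:', result1.sources.length, 'document chunks\n');

  const result2 = await rag.generate('How much is the Professional plan?');
  console.log('Question: How much is the Professional plan?');
  console.log('Answer:', result2.answer);
  console.log('Sources:', result2.sources.length, 'document chunks\n');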
}

demo().catch(console.error);

Step 5: Integrate into the API

// backend/src/routes/rag-enhanced.js
import express from 'express';
import { RAGPipeline } from '../utils/rag-pipeline.js';

const router = express.Router();

// Create the RAG instance (singleton)
const rag = new RAGPipeline({
  collectionName: 'knowledge_base',
});

/**
 * POST /api/rag/query
 * RAG query
 */
router.post('/query', async (req, res) => {
  try {
    const { query } = req.body;
    
    if (!query) {
      return res.status(400).json({ error: 'Please provide a query' });
    }

    const result = await rag.generate(query);

    res.json({
      success: true,
      answer: result.answer,
      sources: result.sources,
    });
  } catch (error) {
    console.error('RAG query error:', error);
    res.status(500).json({ error: 'Query failed' });
  }
});

/**
 * POST /api/rag/add-document
 * Add a document
 */
router.post('/add-document', async (req, res) => {
  try {
    const { text, metadata = {} } = req.body;
    
    if (!text) {
      return res.status(400).json({ error: 'Please provide the document text' });
    }

    const result = await rag.addDocument(text, metadata);

    res.json(result);
  } catch (error) {
    console.error('Add document error:', error);
    res.status(500).json({ error: 'Failed to add document' });
  }
});

export default router;

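1.4 RAG Optimization Tips

Optimization 1: Hybrid retrieval (vector + keyword)

async searchHybrid(query, topK = 5) {
  // 1. Vector retrieval
  const vectorResults = await this.search(query, topK);
  
  // 2. Keyword retrieval (keywordSearch is a placeholder you need to implement)
  const keywordResults = await this.keywordSearch(query, topK);
  
  // 3. Merge and re-rank the results (mergeResults is also a placeholder)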
  const combined = this.mergeResults(vectorResults, keywordResults);
  
  return combined.slice(0, topK);
}
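
The original does not say how the two result lists should be merged. One common choice is reciprocal rank fusion (RRF), sketched below as a possible body for a merge step. This is a swapped-in technique rather than the author's method; the function name, the document ids, and the constant `k=60` (a conventional default for RRF) are assumptions.

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in, so
    documents ranked highly by several retrievers rise to the top.
    """
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by id for determinism
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

vector_results = ["a", "b", "c"]   # ids ranked by vector similarity
keyword_results = ["b", "d", "a"]  # ids ranked by keyword match
merged = reciprocal_rank_fusion([vector_results, keyword_results])
print(merged)
```

RRF only needs ranks, not scores, which is convenient here because cosine similarities and keyword scores are on different scales.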

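Optimization 2: Reranking

async rerank(query, documents) {
  // Re-score the retrieved documents with a stronger model
  const scores = [];
  
  for (const doc of documents) {
    const prompt = `
Scoring task: rate how relevant the document is to the question (0-10)

Question: ${query}
Document: ${doc.text}

Relevance score (output the number only):`;

    // callLLM is a placeholder for your own LLM call helper
    const response = await callLLM(prompt);
    scores.push(parseFloat(response) || 0);
  }
  
  // Sort by score, highest first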
  return documents
    .map((doc, i) => ({ ...doc, rerankScore: scores[i] }))
    .sort((a, b) => b.rerankScore - a.rerankScore);
}
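
LLM replies to a scoring prompt are not always a bare number, so the parsing step deserves some care. The sketch below shows one way to extract and clamp the score before sorting. The names `parse_score`, `rerank`, and `fake_llm` are assumptions for illustration, and `fake_llm` is a stub standing in for a real model call.

```python
import re

def parse_score(reply, lo=0.0, hi=10.0):
    """Extract the first number from an LLM reply and clamp it to [lo, hi]."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        return lo  # unparseable replies count as the lowest score
    return max(lo, min(hi, float(match.group())))

def rerank(query, documents, score_fn):
    """Attach a rerank score to each document and sort, highest first."""
    scored = [
        {**doc, "rerank_score": parse_score(score_fn(query, doc["text"]))}
        for doc in documents
    ]
    return sorted(scored, key=lambda d: d["rerank_score"], reverse=True)

def fake_llm(query, text):
    """Stub scorer used only for this example."""
    return "Score: 9" if "leave" in text else "2/10"

docs = [{"text": "Pricing table"}, {"text": "Annual leave policy"}]
ranked = rerank("How much annual leave do I get?", docs, fake_llm)
print([d["text"] for d in ranked])
```

Clamping guards against a model that answers outside the requested range, which would otherwise distort the ordering.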

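Optimization 3: Adaptive retrieval

async adaptiveRAG(query, model = 'qwen2.5:7b') {
  // 1. Decide whether retrieval is needed (needsRetrieval is a placeholder)
  const needRetrieval = await this.needsRetrieval(query);
  
  if (!needRetrieval) {
    return await this.directGenerate(query, model);
  }
  
  // 2. Adjust the number of retrieved documents dynamically
  const complexity = this.assessComplexity(query);
  const topK = complexity === 'high' ? 10 : 5;
  
  // 3. RAG generation; assumes generate() is extended with a third topK
  //    argument that it forwards to search() (the version above hard-codes 5)
  return await this.generate(query, model, topK);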
}
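
The two decisions in adaptive retrieval (whether to retrieve, and how many documents to fetch) are left undefined in the original. A very rough heuristic version might look like the sketch below. The word lists and the 20-word threshold are arbitrary assumptions; a production system would more likely use a classifier or the LLM itself for both decisions.

```python
SMALL_TALK = {"hi", "hello", "thanks", "thank you", "bye"}
MULTI_PART_WORDS = {"and", "compare", "versus", "vs"}

def needs_retrieval(query):
    """Skip retrieval for greetings and other small talk."""
    normalized = query.strip().lower().rstrip("!.?")
    return normalized not in SMALL_TALK

def assess_complexity(query):
    """Treat long or multi-part questions as 'high' complexity."""
    words = query.split()
    multi_part = any(w.lower() in MULTI_PART_WORDS for w in words)
    return "high" if len(words) > 20 or multi_part else "low"

def choose_top_k(query):
    """Return 0 when no retrieval is needed, otherwise 5 or 10 documents."""
    if not needs_retrieval(query):
        return 0
    return 10 if assess_complexity(query) == "high" else 5

for q in ["Hello!", "What is the price of the Professional plan?",
          "Compare the Basic and Professional plans"]:
    print(q, "->", choose_top_k(q))
```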

🎓 SFT: Supervised Fine-Tuning

Difficulty: ⭐⭐⭐
Recommendation: ⭐⭐⭐⭐⭐
Implementation time: 1-2 weeks
Requires prepared training data.

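2.1 What Is SFT?

SFT (Supervised Fine-Tuning) continues training a pretrained model on labeled, domain-specific data so that it adapts to a particular task.

Pretrained model (general capability)
    ↓
+ Domain data (question-answer pairs)
    ↓
Fine-tuning
    ↓
Customized model (domain expert)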

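2.2 When SFT Applies

Scenario	SFT needed?	Recommendation
Enterprise customer service	✅	Prepare common question-answer pairs
Medical consultation	✅	Use professional data
Legal advice	✅	Statutes + case law
Code generation	⚠️	Consider a dedicated code model
General conversation	❌	No fine-tuning needed; RAG is enough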

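2.3 Data Preparation

Data format (Alpaca format)

[
  {
    "instruction": "What is machine learning?",
    "input": "",
    "output": "Machine learning is a branch of artificial intelligence that enables computers to learn from data without being explicitly programmed. It mainly comprises three types: supervised learning, unsupervised learning, and reinforcement learning..."
  },
  {
    "instruction": "Explain the following concept",
    "input": "Neural network",
    "output": "A neural network is a computational model that mimics how neurons in the human brain work..."
  }
]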
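
Before training, it can help to check that every record actually follows this three-field layout. The sketch below is one possible validator; the function name `validate_alpaca` and the specific rules (all three keys must be strings, `instruction` and `output` must be non-empty) are assumptions about what counts as valid.

```python
REQUIRED_KEYS = ("instruction", "input", "output")
NON_EMPTY_KEYS = ("instruction", "output")  # "input" is allowed to be empty

def validate_alpaca(records):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for i, rec in enumerate(records):
        for key in REQUIRED_KEYS:
            if not isinstance(rec.get(key), str):
                errors.append(f"record {i}: '{key}' is missing or not a string")
        for key in NON_EMPTY_KEYS:
            value = rec.get(key)
            if isinstance(value, str) and not value.strip():
                errors.append(f"record {i}: '{key}' is empty")
    return errors

good = [{"instruction": "What is machine learning?", "input": "", "output": "A branch of AI."}]
bad = [
    {"instruction": "", "input": "", "output": "An answer without a question."},
    {"instruction": "Explain the following concept", "output": "Missing the input key."},
]
print(validate_alpaca(good))
print(validate_alpaca(bad))
```

Running a check like this on the file loaded with `json.load` catches missing keys early, before they surface as a `KeyError` inside the formatting function during training.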

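How to Collect Data

Organize existing data
- Customer service chat logs
- Professional documents
- FAQ databases

Manual annotation
- Hire domain experts
- Use annotation tools
- Review for quality

Model generation (data augmentation)

# Use a strong model to generate training data
topic = "machine learning"  # example topic
prompt = f"Generate 10 question-answer pairs about {topic}"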

2.4 Fine-Tuning with Unsloth, Serving with Ollama

Option A: Unsloth (recommended, fast)

# Install Unsloth
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps xformers trl peft accelerate bitsandbytes

# Fine-tuning script

Create scripts/finetune-sft.py:

"""
SFT (supervised fine-tuning) script
Uses Unsloth to speed up fine-tuning
"""

from unsloth import FastLanguageModel
import torch
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

# 1. Load the model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/qwen2-7b-bnb-4bit",  # 4-bit quantized model
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)

# 2. Configure LoRA fine-tuning
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,                    # LoRA rank
    target_modules = [         # layers to fine-tune
        "q_proj", "k_proj", "v_proj",
        "o_proj", "gate_proj", "up_proj", "down_proj"
    ],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = True,
    random_state = 3407,
)

# 3. Load the training data
dataset = load_dataset("json", data_files="training_data.json", split="train")

# 4. Define the prompt template
alpaca_prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input_text, output in zip(instructions, inputs, outputs):  # avoid shadowing the built-in input()
        text = alpaca_prompt.format(instruction, input_text, output) + tokenizer.eos_token
        texts.append(text)
    return { "text" : texts, }

# Apply the formatting
dataset = dataset.map(
    formatting_prompts_func,
    batched = True,
)

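# 5. Configure training (argument names follow older TRL releases; newer ones may expect some of them in SFTConfig)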
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,
    dataset_num_proc = 2,
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 60,              # adjust to the number of steps you actually need
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)

# 6. Start training
trainer_stats = trainer.train()

# 7. Save the fine-tuned model (for a LoRA model this writes the adapter weights only)
model.save_pretrained("qwen2-7b-finetuned")
tokenizer.save_pretrained("qwen2-7b-finetuned")

# 8. Export to GGUF for Ollama (optional)
# model.save_pretrained_gguf("qwen2-7b-finetuned-gguf", tokenizer)

print("✅ Fine-tuning complete!")

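Run the fine-tuning

# Prepare the training data
# Create training_data.json

# Run the fine-tuning (requires a GPU)
python scripts/finetune-sft.py

# Estimated time:
# - 1,000 samples, RTX 3090: about 1 hour
# - 10,000 samples, RTX 3090: about 10 hours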
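
When adjusting max_steps, it helps to relate steps to samples. With the script's per_device_train_batch_size = 2 and gradient_accumulation_steps = 4, each optimizer step consumes 8 samples on a single GPU, so the default max_steps = 60 covers only 480 samples. The small calculator below makes that arithmetic explicit; the function names are assumptions, and a single GPU is assumed by default.

```python
import math

def effective_batch_size(per_device_batch=2, grad_accum=4, num_devices=1):
    """Samples consumed per optimizer step."""
    return per_device_batch * grad_accum * num_devices

def steps_per_epoch(num_samples, **kwargs):
    """Optimizer steps needed to see every sample once."""
    return math.ceil(num_samples / effective_batch_size(**kwargs))

def samples_seen(max_steps, **kwargs):
    """Samples processed after max_steps optimizer steps."""
    return max_steps * effective_batch_size(**kwargs)

print(effective_batch_size())   # 8
print(steps_per_epoch(1000))    # 125
print(steps_per_epoch(10000))   # 1250
print(samples_seen(60))         # 480
```

For a 1,000-sample dataset, one full pass therefore needs max_steps of at least 125 under these settings.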

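Import into Ollama

# Method 1: create a Modelfile
# Note: FROM needs full model weights (for example the GGUF export from step 8 of the script);
# a directory that contains only LoRA adapter weights will not load as a base model.
# The TEMPLATE should also mirror the prompt format used during training.
cat > Modelfile << 'EOF'
FROM ./qwen2-7b-finetuned

TEMPLATE """{{ .System }}
{{ .Prompt }}"""

PARAMETER stop "<|endoftext|>"
EOF

# Import the model
ollama create qwen2-finetuned -f Modelfile

# Test
ollama run qwen2-finetuned "Test the fine-tuning result"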

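2.5 SFT Best Practices

1. Data quality over data quantity

❌ 10,000 low-quality examples
✅ 1,000 high-quality examples

2. Data diversity

{
  "style_diversity": ["formal", "colloquial", "technical"],
  "length_diversity": ["short", "medium", "detailed"],
  "difficulty_diversity": ["basic", "intermediate", "advanced"]
}

3. Overfitting detection

# Validate every 100 steps (train_step and evaluate are placeholders)
best_val_loss = float("inf")
for step in range(1, max_steps + 1):
    train_step(model)
    if step % 100 == 0:
        val_loss = evaluate(model, val_dataset)
        if val_loss > best_val_loss:
            print("⚠️  Validation loss rose: likely overfitting, stopping")
            break
        best_val_loss = val_loss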
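
Stopping at the first rise in validation loss can be too eager, because validation loss is noisy. A common refinement is early stopping with patience: stop only after several consecutive evaluations fail to improve on the best loss so far. The sketch below is a minimal, framework-free version; the class name, the default patience of 3, and the sample loss values are assumptions for illustration.

```python
class EarlyStopping:
    """Signal a stop after `patience` evaluations without improvement."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # minimum decrease that counts as improvement
        self.best = float("inf")
        self.bad_evals = 0

    def update(self, val_loss):
        """Record one validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience

val_losses = [1.0, 0.8, 0.85, 0.79, 0.9, 0.95, 1.0]  # made-up example values
stopper = EarlyStopping(patience=3)
stopped_at = None
for i, loss in enumerate(val_losses):
    if stopper.update(loss):
        stopped_at = i
        break
print(stopped_at, stopper.best)
```

In this example the brief rise at 0.85 is tolerated, and training stops only after three evaluations in a row fail to beat 0.79.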

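🤖 RLHF: Reinforcement Learning from Human Feedback

Difficulty: ⭐⭐⭐⭐⭐
Recommendation: ⭐⭐⭐
Implementation time: 1-2 months
Requires a large amount of human feedback.

3.1 What Is RLHF?

RLHF (Reinforcement Learning from Human Feedback) optimizes the model's outputs using human feedback so that they better match human preferences.

SFT fine-tuned model
    ↓
Generate several candidate answers
    ↓
Humans rank them (which answer is better?)
    ↓
Train a reward model
    ↓
Optimize with PPO reinforcement learning
    ↓
Model aligned with human preferences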

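3.2 The Three Stages of RLHF

Stage 1: SFT base model

# Start from a model that has already been fine-tuned with SFT
# (load_model is a placeholder for your own loading code)
base_model = load_model("qwen2-7b-sft")

Stage 2: Train the reward model

"""
Reward model training
Learns human preferences
"""

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# 1. Prepare comparison data
comparison_data = [
    {
        "prompt": "What is AI?",
        "chosen": "AI is short for artificial intelligence, the technology that lets machines behave intelligently...",  # 👍 better
        "rejected": "AI is when computers get smart"  # 👎 worse
    },
    # ... more data
]

# 2. Load the model ("qwen2-7b-sft" is the SFT checkpoint from stage 1)
tokenizer = AutoTokenizer.from_pretrained("qwen2-7b-sft")
reward_model = AutoModelForSequenceClassification.from_pretrained(
    "qwen2-7b-sft",
    num_labels=1  # a single scalar reward score
)
optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-5)  # illustrative learning rate

def score(prompt, answer):
    """Return the scalar reward for one prompt/answer pair."""
    inputs = tokenizer(prompt + "\n" + answer, return_tensors="pt", truncation=True)
    return reward_model(**inputs).logits.squeeze()

# 3. Train the reward model (one example per step, for clarity)
for example in comparison_data:
    chosen_score = score(example["prompt"], example["chosen"])
    rejected_score = score(example["prompt"], example["rejected"])
    
    # Pairwise loss: push the chosen score above the rejected score
    loss = -torch.nn.functional.logsigmoid(chosen_score - rejected_score)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()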
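
The pairwise loss used for the reward model, -log(sigmoid(chosen - rejected)), is easy to explore with plain numbers. The sketch below computes it without any ML framework, using a numerically stable form; the function name and the sample scores are assumptions for illustration.

```python
import math

def pairwise_reward_loss(chosen_score, rejected_score):
    """-log(sigmoid(chosen - rejected)), computed in a numerically stable way.

    Uses the identity -log(sigmoid(d)) = log(1 + exp(-d)) = softplus(-d).
    """
    diff = chosen_score - rejected_score
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))

print(pairwise_reward_loss(0.0, 0.0))  # equal scores: log(2), about 0.693
print(pairwise_reward_loss(2.0, 0.0))  # chosen ranked higher: small loss
print(pairwise_reward_loss(0.0, 2.0))  # rejected ranked higher: large loss
```

The loss shrinks toward zero as the chosen answer's score pulls ahead, and grows roughly linearly when the model ranks the pair the wrong way round.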

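Stage 3: PPO reinforcement learning

"""
Optimize the model with the PPO algorithm
Sketch in the style of the older TRL PPOTrainer API (generate/step); newer TRL
releases changed this interface. reference_model, tokenizer, dataset and epochs
are placeholders defined elsewhere.
"""

from trl import PPOTrainer, PPOConfig

# 1. Configuration
ppo_config = PPOConfig(
    model_name="qwen2-7b-sft",
    learning_rate=1e-5,
    batch_size=128,
)

# 2. Create the trainer (rewards are computed explicitly in the loop below)
ppo_trainer = PPOTrainer(
    config=ppo_config,
    model=base_model,
    ref_model=reference_model,  # reference model (keeps the policy from drifting too far)
    tokenizer=tokenizer,
)

# 3. Training loop
for epoch in range(epochs):
    for batch in dataset:
        # Generate responses
        query_tensors = batch["input_ids"]
        response_tensors = ppo_trainer.generate(query_tensors)
        
        # Compute rewards (placeholder: score each response with the reward model)
        rewards = reward_model(response_tensors)
        
        # PPO update
        stats = ppo_trainer.step(query_tensors, response_tensors, rewards)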

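3.3 Simplified RLHF (Direct Preference Optimization, DPO)

DPO is a simplified alternative to RLHF: it trains directly on preference pairs, so no separate reward model is needed.

"""
DPO: Direct Preference Optimization
Simpler, with results close to RLHF
Argument names follow older TRL releases; model, ref_model, preference_dataset
and tokenizer are placeholders defined elsewhere.
"""

from trl import DPOTrainer

trainer = DPOTrainer(
    model,
    ref_model,
    beta=0.1,  # strength of the KL penalty toward the reference model
    train_dataset=preference_dataset,
    tokenizer=tokenizer,
)

trainer.train()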
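
To see what the trainer optimizes and what beta does, the per-example DPO loss can be written out with plain numbers. It compares how much the policy has raised the log-probability of the chosen answer, relative to the reference model, against the same quantity for the rejected answer. The sketch below is a framework-free illustration; the function name and the log-probability values are made up for the example.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    margin = (log-ratio of policy to reference on the chosen answer)
           - (log-ratio of policy to reference on the rejected answer)
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Numerically stable softplus(-margin)
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Policy identical to the reference: margin is 0, loss is log(2)
no_change = dpo_loss(-12.0, -15.0, -12.0, -15.0)
# Policy raised the chosen answer and lowered the rejected one: lower loss
improved = dpo_loss(-10.0, -18.0, -12.0, -15.0)
print(no_change, improved)
```

A larger beta makes the loss react more strongly to the same shift in log-probabilities, which is why it acts as the knob for how far the policy may move from the reference model.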

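3.4 Implementation Advice

For most enterprise applications, using RLHF directly is not recommended, because:

High cost: requires a large amount of human annotation
High complexity: training is often unstable
Limited incremental gain: SFT + RAG is usually enough

Recommended alternative:

SFT (specific tasks) + RAG (knowledge updates) + Prompt Engineering (behavior control)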

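🎯 Combined Application Plans

Plan 1: RAG + Prompt (simplest, recommended)

// Suitable for roughly 90% of scenarios (author's estimate)
// ragSearch and callLLM are placeholder helpers
async function enhancedChat(query) {
  // 1. Retrieve knowledge with RAG
  const knowledge = await ragSearch(query);
  
  // 2. A carefully designed prompt
  const prompt = `
You are a professional customer service assistant. Answer the user's question based on the knowledge base below.

Knowledge base:
${knowledge}

Answer requirements:
1. Accurate and professional
2. Concise and clear
3. If unsure, say that a human agent needs to handle it

User question: ${query}

Your answer:`;

  // 3. Call the model
  return await callLLM(prompt);
}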

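Plan 2: RAG + SFT (recommended for enterprises)

1. RAG handles knowledge updates
   └─ Enterprise documents, policies, product information

2. SFT tunes the model's style
   └─ Company tone, domain terminology, answer templates

3. Use them together
   └─ Knowledge comes from RAG, style comes from SFT

Plan 3: Full pipeline (large enterprises)

1. RAG: knowledge retrieval
   ↓
2. SFT model: understanding and generation
   ↓
3. Post-processing:
   - Sensitive-word filtering
   - Format cleanup
   - Fact checking
   ↓
4. Human review (optional)
   ↓
5. User feedback collection
   ↓
6. Periodic SFT updates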

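📊 Decision Tree for Choosing a Technique

Need to improve the model's capabilities?
    ↓
[Question 1] Do you have enterprise documents or a knowledge base?
    Yes → Use RAG ✅
    No → Continue
    ↓
[Question 2] Do you need a specific answer style or format?
    Yes → Use SFT ✅
    No → Use Prompt Engineering
    ↓
[Question 3] Do you need to align with complex human preferences?
    Yes → Use RLHF (with caution)
    No → SFT + RAG is enough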
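
The decision tree can also be expressed as a small function. The sketch below reads the three questions as cumulative (each "yes" adds a technique to the plan), which is one interpretation of the tree rather than something the original states explicitly; the function and parameter names are assumptions.

```python
def choose_techniques(has_knowledge_base, needs_custom_style,
                      needs_preference_alignment):
    """Walk the three questions in order and collect the suggested techniques."""
    plan = []
    # Question 1: enterprise documents or a knowledge base?
    if has_knowledge_base:
        plan.append("RAG")
    # Question 2: a specific answer style or format?
    plan.append("SFT" if needs_custom_style else "Prompt Engineering")
    # Question 3: complex human preferences to align with?
    if needs_preference_alignment:
        plan.append("RLHF (with caution)")
    return plan

print(choose_techniques(True, False, False))
print(choose_techniques(True, True, False))
print(choose_techniques(True, True, True))
```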

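🛠️ Implementation Roadmap

Week 1: RAG (quick wins)

✅ Deploy Qdrant
✅ Prepare enterprise documents
✅ Implement the RAG pipeline
✅ Test and tune

Weeks 2-3: Data preparation

✅ Collect labeled data
✅ Clean and format the data
✅ Review for quality
✅ Split into training and validation sets

Weeks 4-5: SFT fine-tuning

✅ Set up the environment
✅ Fine-tune the model
✅ Evaluate the results
✅ Import into Ollama

Weeks 6-8: Integration and optimization

✅ Integrate RAG + SFT
✅ Develop the API
✅ Optimize performance
✅ Run user testing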
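
For the last item, a reproducible split keeps evaluation results comparable between runs. The sketch below shuffles with a fixed seed and holds out a fraction for validation; the 10% ratio, the seed, and the function name are assumptions for illustration.

```python
import random

def train_val_split(records, val_ratio=0.1, seed=42):
    """Shuffle deterministically and hold out val_ratio of the records."""
    if not 0 < val_ratio < 1:
        raise ValueError("val_ratio must be between 0 and 1")
    shuffled = list(records)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_ratio))
    return shuffled[n_val:], shuffled[:n_val]

records = [
    {"instruction": f"question {i}", "input": "", "output": f"answer {i}"}
    for i in range(100)
]
train, val = train_val_split(records)
print(len(train), len(val))
```

Because the shuffle uses its own seeded generator, the same records always land in the same split, regardless of other random calls in the program.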

📈 Evaluating Results

Evaluation Metrics

Metric	RAG	SFT	RLHF
Accuracy	⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐
Knowledge freshness	⭐⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐
Explainability	⭐⭐⭐⭐⭐	⭐⭐⭐	⭐⭐
Cost efficiency (more stars = cheaper)	⭐⭐⭐⭐⭐	⭐⭐⭐	⭐⭐

Testing Method

# Evaluation script (model.generate and evaluate are placeholders)
test_cases = [
    {"query": "...", "expected": "...", "type": "factual"},
    {"query": "...", "expected": "...", "type": "style"},
]

for case in test_cases:
    response = model.generate(case["query"])
    score = evaluate(response, case["expected"])
    print(f"Case: {case['type']}, Score: {score}")
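
The original leaves the scoring function undefined. One simple option for factual test cases is token-overlap F1 between the response and the expected answer, sketched below. This is an assumed choice, not the author's metric; it splits on whitespace, so Chinese text would need a proper tokenizer first, and it cannot judge style.

```python
from collections import Counter

def token_f1(response, expected):
    """Harmonic mean of token precision and recall (case-insensitive)."""
    resp_tokens = response.lower().split()
    exp_tokens = expected.lower().split()
    # Multiset intersection counts each shared token at most as often as it
    # appears in both texts
    common = sum((Counter(resp_tokens) & Counter(exp_tokens)).values())
    if common == 0:
        return 0.0
    precision = common / len(resp_tokens)
    recall = common / len(exp_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("10 days", "10 days"))
print(token_f1("you get 10 days", "10 days"))
print(token_f1("no idea", "10 days"))
```

A verbose but correct answer is penalized on precision, so for longer answers it may be more useful to look at recall alone or to use an LLM-based judge.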

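💡 Best Practices

1. Start simple

Step 1: RAG (1 week)
Step 2: Prompt Engineering (1 week)
Step 3: If that is still not enough, consider SFT (2 weeks)
Step 4: RLHF is almost never needed

2. Iterate continuously

Collect user feedback
    ↓
Analyze the types of problems
    ↓
Can RAG solve it? → Add documents
Can SFT solve it? → Add training data
Can the prompt solve it? → Improve the prompt

3. Monitor and maintain

// Log every query
logger.info({
  query: query,
  method: method,  // one of 'RAG', 'SFT', 'Direct'
  satisfaction: userFeedback,
  latency: responseTime,
});

// Analyze periodically
// 1. Which questions can RAG not solve? → SFT is needed
// 2. Which documents are retrieved most often? → Prioritize improving them
// 3. How is user satisfaction trending? → Overall effectiveness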
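
The periodic analysis can start from a simple aggregation over the logged entries. The sketch below groups records by method and averages satisfaction and latency; the field names mirror the log call above, while the 1-5 satisfaction scale, the millisecond unit, and the sample values are assumptions.

```python
from collections import defaultdict

def summarize(logs):
    """Group log entries by method and compute per-method averages."""
    by_method = defaultdict(list)
    for entry in logs:
        by_method[entry["method"]].append(entry)
    return {
        method: {
            "count": len(entries),
            "avg_satisfaction": sum(e["satisfaction"] for e in entries) / len(entries),
            "avg_latency_ms": sum(e["latency"] for e in entries) / len(entries),
        }
        for method, entries in by_method.items()
    }

logs = [  # made-up sample entries
    {"query": "annual leave?", "method": "RAG", "satisfaction": 5, "latency": 800},
    {"query": "pricing?", "method": "RAG", "satisfaction": 3, "latency": 1200},
    {"query": "hello", "method": "Direct", "satisfaction": 2, "latency": 400},
]
summary = summarize(logs)
print(summary)
```

Low average satisfaction for one method is a signal to inspect those queries individually and decide whether documents, training data, or the prompt needs work.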

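🎊 Summary

Technique Priorities

RAG: implement right away; low cost, good results
Prompt Engineering: use together with RAG
SFT: do it only when there is a clear need
RLHF: a long-term project for large enterprises

Resource Requirements

Technique	People	Compute	Data	Time
RAG	1	CPU	Documents	1 week
SFT	2-3	GPU	Labeled data	1 month
RLHF	5+	GPU cluster	Large volume of feedback	3+ months

Expected Results (illustrative scores, not measurements)

Base model: 60 points
+ RAG: 75 points (+15)
+ Prompt: 80 points (+5)
+ SFT: 85 points (+5)
+ RLHF: 88 points (+3)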

Last updated: 2026-01-26
Author: 马年行大运

🎯 Start with RAG and build up step by step!

posted @ 2026-01-26 16:09 XiaoZhengTou
