Intelligent Customer Service Design Document(智能客服设计文档)
Contents
- Technical Plan: 电商客服智能体
- Overview(概述)
- Clarifications(澄清记录)
- Architecture(架构)
- State Schema(状态Schema)
- LangGraph Structure(Graph 结构)
- Tools(工具定义)
- Data Flow(数据流)
- Interrupt & Resume Mechanism(中断与恢复机制)
- Caching Strategy(缓存策略)
- Error Handling(错误处理)
- Security(安全)
- Performance Optimization(性能优化)
- Testing Strategy(测试策略)
- Deployment(部署)
- Dependencies(依赖)
- Risks & Mitigations(风险与缓解)
- Open Questions(待解决问题)
- Next Steps(下一步)
Technical Plan: E-commerce Customer Service Agent(电商客服智能体)
Feature: R-001 E-commerce Customer Service Agent(电商客服智能体)
Branch: 001-customer-service-agent
Created: 2025-12-08
Version: 1.0.0
Status: Draft
Overview(概述)
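This document defines the technical implementation plan for R-001, the e-commerce customer service agent. It covers the Multi-Agent architecture, State Schema, LangGraph structure, tool definitions, and data flow design.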
Clarifications(澄清记录)
Session 2025-12-09
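- Q: How is the interrupt state defined in the State Schema? → A: Add two dedicated fields: is_interrupted: bool and interrupt_feedback: Optional[str]
- Q: When does WorkflowAgent trigger an interrupt? → A: At every step that needs to collect information from the user; execution continues once the user has replied
- Q: How are normal input and interrupt feedback told apart when calling the LLM? → A: When building messages, interrupt feedback is added as a HumanMessage, while normal input is treated as a new conversation turn
- Q: How is Graph execution resumed after an interrupt? → A: The checkpointer saves state automatically; resume by passing interrupt_feedback together with the thread_id
- Q: What happens when an interrupt times out or the user cancels? → A: A timeout of 10 minutes applies, after which the checkpoint is cleaned up; the user can also type "取消" (cancel) to exit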
Architecture(架构)
Multi-Agent Architecture(多 Agent 架构)
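The system uses a layered Multi-Agent architecture in which each agent has a distinct role:
User input
    ↓
Coordinator
    ├──→ ReactAgent (real-time queries)
    ├──→ RAGAgent (knowledge Q&A)
    ├──→ Orchestrator (multi-intent orchestration)
    │      ├──→ ReactAgent (parallel)
    │      ├──→ ReactAgent (parallel)
    │      └──→ RAGAgent (parallel)
    ├──→ WorkflowAgent (multi-step processes)
    └──→ Handoff (transfer to a human agent)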
Agent Responsibilities(Agent 职责)
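1. Coordinator(协调者)
Responsibilities: intent recognition, entity extraction, routing decisions
Input:
- User input text
- Conversation history
Output:
- Intent classification result (one or more intents)
- Extracted entities
- Routing target (route_to)
Routing logic:
if intent_count == 1:
    if intent in [query, recommendation, comparison intents]:
        route_to = "react_agent"
    elif intent in [knowledge intents]:
        route_to = "rag_agent"
    elif intent in [workflow intents]:
        route_to = "workflow_agent"
    elif intent == HANDOFF:
        route_to = "handoff"
elif intent_count >= 2:
    route_to = "orchestrator"
else:
    route_to = "handoff"  # fallback
Tools:
- classify_intent: intent recognition (may use the cache)
- extract_entities: entity extraction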
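The routing logic above can be sketched as a small runnable function. This is only an illustration: the function name decide_route and the three grouping sets are assumptions, built from the intent names listed for each agent in this document.

```python
from typing import List

# Assumed grouping of intent names by the agent that handles them.
REACT_INTENTS = {
    "INVENTORY_CHECK", "PRICE_QUERY", "ORDER_QUERY", "LOGISTICS_QUERY",
    "PARAMS_QUERY", "PRODUCT_RECOMMEND", "PRODUCT_COMPARE",
}
RAG_INTENTS = {"USAGE_TUTORIAL", "FAULT_DIAGNOSIS", "POLICY_INQUIRY", "FAQ"}
WORKFLOW_INTENTS = {
    "RETURN_PROCESS", "EXCHANGE_PROCESS", "REPAIR_PROCESS", "INVOICE_PROCESS",
}


def decide_route(intent_names: List[str]) -> str:
    """Map the recognized intents to a routing target (hypothetical helper)."""
    if len(intent_names) >= 2:
        return "orchestrator"
    if len(intent_names) == 1:
        name = intent_names[0]
        if name in REACT_INTENTS:
            return "react_agent"
        if name in RAG_INTENTS:
            return "rag_agent"
        if name in WORKFLOW_INTENTS:
            return "workflow_agent"
    # No intent, an explicit HANDOFF, or an unknown intent: fall back to a human.
    return "handoff"
```

Falling through to "handoff" for unknown single intents keeps route_to from ever being left unset.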
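2. ReactAgent(实时查询Agent)
Responsibilities: handles query, recommendation, and comparison intents
Supported intents:
- INVENTORY_CHECK - inventory check
- PRICE_QUERY - price query
- ORDER_QUERY - order query
- LOGISTICS_QUERY - logistics query
- PARAMS_QUERY - specification query
- PRODUCT_RECOMMEND - product recommendation
- PRODUCT_COMPARE - product comparison
Tool set:
tools = [
    product_search,      # product search
    get_inventory_info,  # inventory check
    get_price_info,      # price query (including the national subsidy, 国补)
    get_order_info,      # order query
    get_logistics_info,  # logistics query
    product_compare,     # product comparison
]
Processing flow:
- Select a tool based on the intent
- Call the tool to fetch the data
- Format the response and return it to the user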
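3. RAGAgent(知识问答Agent)
Responsibilities: handles knowledge questions (usage tutorials, fault diagnosis, policy inquiries, FAQ)
Supported intents:
- USAGE_TUTORIAL - usage tutorial
- FAULT_DIAGNOSIS - fault diagnosis
- POLICY_INQUIRY - policy inquiry
- FAQ - frequently asked questions
Tool set:
tools = [
    knowledge_retriever,  # vector retrieval
    rerank_documents,     # reranking (optional)
]
Processing flow:
- Recall the Top-K documents with vector retrieval (K=10)
- Optional: rerank them with a Reranker
- Pass the documents to the LLM as context
- The LLM generates the answer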
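4. Orchestrator(多意图编排Agent)
Responsibilities: handles multi-intent queries by calling several sub-agents in parallel
Example scenario:
- Input: "对比 Find X8 和 X9 的区别,并告诉我 X9 国补后多少钱"
- Intent breakdown:
  - Intent 1: PRODUCT_COMPARE (Find X8 vs X9)
  - Intent 2: PRICE_QUERY (X9 price after the national subsidy)
Processing flow:
1. Receive the list of intents
2. Call in parallel:
   - ReactAgent handles PRODUCT_COMPARE
   - ReactAgent handles PRICE_QUERY
3. Collect all sub-results
4. Merge, format, and return them to the user
Pseudocode:
async def orchestrator_node(state):
    intents = state["intents"]  # multiple intents
    tasks = []
    for intent in intents:
        # Intent is a TypedDict, so fields are read by key
        if intent["type"] in [query, recommendation, comparison intents]:
            tasks.append(invoke_react_agent(intent))
        elif intent["type"] in [knowledge intents]:
            tasks.append(invoke_rag_agent(intent))
    results = await asyncio.gather(*tasks)
    merged_response = merge_results(results)
    return {"response": merged_response}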
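The parallel fan-out in the pseudocode above can be tried in isolation. The sketch below is a minimal, self-contained version under stated assumptions: invoke_react_agent and invoke_rag_agent are stand-in stubs, the "agent" and "fail" keys are hypothetical, and return_exceptions=True is a choice made here so that one failing sub-agent does not discard the other results.

```python
import asyncio
from typing import Any, Dict, List


async def invoke_react_agent(intent: Dict[str, Any]) -> str:
    # Stand-in for the real ReactAgent call (hypothetical stub).
    await asyncio.sleep(0)
    return f"react:{intent['type']}"


async def invoke_rag_agent(intent: Dict[str, Any]) -> str:
    # Stand-in for the real RAGAgent call; "fail" simulates a retrieval error.
    await asyncio.sleep(0)
    if intent.get("fail"):
        raise RuntimeError("retrieval failed")
    return f"rag:{intent['type']}"


async def run_parallel(intents: List[Dict[str, Any]]) -> List[str]:
    """Run one sub-agent per intent concurrently and keep partial results."""
    tasks = [
        invoke_rag_agent(i) if i["agent"] == "rag" else invoke_react_agent(i)
        for i in intents
    ]
    # gather() returns results in task order; exceptions come back as values.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return [f"error: {r}" if isinstance(r, Exception) else r for r in results]
```

A caller would merge the returned list into one reply and could decide to hand off if every entry is an error.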
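5. WorkflowAgent(流程处理Agent)
Responsibilities: handles multi-step processes (returns, exchanges, repairs, invoicing)
Supported intents:
- RETURN_PROCESS - return process
- EXCHANGE_PROCESS - exchange process
- REPAIR_PROCESS - repair process
- INVOICE_PROCESS - invoicing process
Tool set:
tools = [
    get_order_info,       # order query (eligibility check)
    create_return_order,  # create a return order
    create_repair_order,  # create a repair ticket
    generate_invoice,     # generate an invoice
]
Processing flow (using the return process as an example, with the interrupt mechanism):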
from datetime import date

from langgraph.types import interrupt

# Note: interrupt() pauses the Graph by raising an exception, so on the pass
# that triggers the interrupt the return statement after it is not reached.

async def return_workflow(state):
    step = state.get("workflow_step", 1)
    # Collected data lives in workflow_data, the field the State Schema defines for it
    data = dict(state.get("workflow_data") or {})
    # Set only when this run carries the user's reply to a pending question
    feedback = state.get("interrupt_feedback") if state.get("is_interrupted") else None
    resumed = {"is_interrupted": False, "interrupt_feedback": None}

    if step == 1:
        # Step 1: collect the order ID (triggers an interrupt)
        if feedback:
            data["order_id"] = feedback.strip()
            return {"workflow_data": data, "workflow_step": 2, **resumed}
        interrupt("请提供您的订单号")
        return {"is_interrupted": True, "workflow_step": 1}

    elif step == 2:
        # Step 2: verify that the order is eligible for a return
        # (tools defined with @tool are called through ainvoke and return a dict)
        order = await get_order_info.ainvoke({"order_id": data["order_id"]})
        if not order or order["status"] != "已签收":
            return {"response": "订单状态不符,无法退货", "workflow_complete": True}
        delivered = date.fromisoformat(order["delivery_date"])
        days_since_delivery = (date.today() - delivered).days
        if days_since_delivery > 7:
            return {
                "response": "已超过退货期限(7天无理由退货)",
                "workflow_complete": True,
            }
        # Eligible: collect the return reason (triggers an interrupt)
        if feedback:
            data["return_reason"] = feedback
            return {"workflow_data": data, "workflow_step": 3, **resumed}
        interrupt("请告知退货原因")
        return {"is_interrupted": True, "workflow_step": 2}

    elif step == 3:
        # Step 3: collect product photos (optional, triggers an interrupt)
        if feedback:
            # The reply is either a photo URL or the word "跳过" (skip)
            data["photos"] = [] if feedback.strip() == "跳过" else [feedback.strip()]
            return {"workflow_data": data, "workflow_step": 4, **resumed}
        interrupt("是否需要上传商品照片?(输入URL或输入'跳过')")
        return {"is_interrupted": True, "workflow_step": 3}

    elif step == 4:
        # Step 4: create the return order
        return_order = await create_return_order.ainvoke({
            "order_id": data["order_id"],
            "reason": data["return_reason"],
            "photos": data.get("photos", []),
        })
        return {
            "response": (
                f"退货单已生成({return_order['return_order_id']})\n"
                f"退货地址:{return_order['return_address']}\n"
                f"{return_order['instructions']}"
            ),
            "workflow_complete": True,
            "is_interrupted": False,
        }
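How the interrupt mechanism works:
- Every step that needs user input first checks is_interrupted and interrupt_feedback
- If interrupt feedback is present, the step extracts the information and moves on to the next step
- If there is no feedback, the step calls interrupt() to trigger an interrupt, and Graph execution pauses
- Once the user has replied, the system resumes execution, with interrupt_feedback containing the user's input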
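The check-then-interrupt pattern described above can be exercised without LangGraph. The sketch below is an illustration only: the STEPS table and the advance function are hypothetical, and the interrupt() call is replaced by a plain return so the state transitions can be followed step by step.

```python
from typing import Any, Dict

# Hypothetical step table: step number -> (prompt for the user, key for the reply).
STEPS = {
    1: ("请提供您的订单号", "order_id"),
    2: ("请告知退货原因", "return_reason"),
}


def advance(state: Dict[str, Any]) -> Dict[str, Any]:
    """Return the state update for one pass through the current step."""
    step = state.get("workflow_step", 1)
    if step not in STEPS:
        # All information has been collected.
        return {"workflow_complete": True, "is_interrupted": False}
    prompt, key = STEPS[step]
    if state.get("is_interrupted") and state.get("interrupt_feedback"):
        # The user answered the pending question: store it and move on.
        data = dict(state.get("workflow_data") or {})
        data[key] = state["interrupt_feedback"]
        return {
            "workflow_data": data,
            "workflow_step": step + 1,
            "is_interrupted": False,
            "interrupt_feedback": None,
        }
    # No reply yet: ask the question and mark the state as interrupted.
    return {"response": prompt, "is_interrupted": True, "workflow_step": step}
```

Each call handles exactly one transition, which mirrors how the workflow node is re-entered after every user reply.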
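6. Handoff(转人工)
Responsibilities: transfers the conversation to a human agent and records the reason
Handoff triggers:
- Intent recognition confidence < 0.5
- The user explicitly asks for a human agent
- Three consecutive failures to answer
Processing flow:
import logging

logger = logging.getLogger(__name__)

def handoff_node(state):
    reason = state.get("handoff_reason", "未知原因")
    # Log the handoff
    logger.info(f"转人工: {reason}")
    return {
        "response": "正在为您转接人工客服,请稍候...",
        "handoff": True
    }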
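The three handoff triggers listed in this section can be checked in one place before routing. The following is a minimal sketch: the function name handoff_reason, its parameters, and the returned reason strings are assumptions; only the 0.5 threshold and the limit of three failures come from the list above.

```python
from typing import Optional

CONFIDENCE_THRESHOLD = 0.5    # intent confidence below this triggers a handoff
MAX_CONSECUTIVE_FAILURES = 3  # this many failed answers in a row triggers a handoff


def handoff_reason(
    confidence: float, user_requested: bool, consecutive_failures: int
) -> Optional[str]:
    """Return a reason string if the conversation should go to a human, else None."""
    if user_requested:
        return "user_requested"
    if confidence < CONFIDENCE_THRESHOLD:
        return "low_confidence"
    if consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
        return "repeated_failures"
    return None
```

The returned string could be written to handoff_reason in the state so that handoff_node logs it.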
LLM 调用处理(Interrupt vs Normal Input)
In WorkflowAgent and any other node that calls the LLM, normal user input must be distinguished from interrupt feedback.
Processing logic:
from typing import List

from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, SystemMessage

def build_messages_for_llm(state: CustomerServiceState) -> List[BaseMessage]:
    """
    Build the messages for an LLM call, distinguishing normal input from interrupt feedback.
    """
    messages: List[BaseMessage] = [SystemMessage(content="你是 OPPO 智能客服助手...")]
    # Append the conversation history
    messages.extend(state.get("messages", []))
    # Distinguish normal input from interrupt feedback
    if state.get("is_interrupted") and state.get("interrupt_feedback"):
        # Case 1: interrupt feedback (the user is answering a question from the system)
        # Look up what the system asked last
        last_ai_message = next(
            (msg for msg in reversed(messages) if isinstance(msg, AIMessage)),
            None
        )
        if last_ai_message:
            # The feedback is a direct answer to the previous AI question
            messages.append(
                HumanMessage(content=state["interrupt_feedback"])
            )
        else:
            # No previous AI message found: mark the content as an answer
            messages.append(
                HumanMessage(
                    content=f"[回答] {state['interrupt_feedback']}"
                )
            )
    elif state.get("user_input"):
        # Case 2: normal user input (a new query or conversation turn)
        messages.append(
            HumanMessage(content=state["user_input"])
        )
    return messages

# Usage example
async def workflow_agent_with_llm(state):
    """
    Example of calling the LLM from WorkflowAgent.
    """
    messages = build_messages_for_llm(state)
    # Call the LLM
    response = await llm.ainvoke(messages)
    # Process the response...
    return {"response": response.content}
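Key differences:
- Interrupt feedback (is_interrupted=True + interrupt_feedback):
  - A direct answer to a question the system asked
  - Context comes from the previous AI message
  - Intent recognition does not need to run again
- Normal input (user_input):
  - A new query or conversation turn
  - May require intent recognition and routing
  - May trigger a new workflow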
State Schema(状态Schema)
The system defines its state with TypedDict, and all nodes share the same state.
from typing import TypedDict, List, Optional, Literal, Any
from enum import Enum

class IntentType(str, Enum):
    # Query intents
    INVENTORY_CHECK = "库存查询"
    PRICE_QUERY = "价格查询"
    ORDER_QUERY = "订单查询"
    LOGISTICS_QUERY = "物流查询"
    PARAMS_QUERY = "参数查询"
    # Comparison intents
    PRODUCT_COMPARE = "产品对比"
    # Recommendation intents
    PRODUCT_RECOMMEND = "产品推荐"
    # Workflow intents
    RETURN_PROCESS = "退货流程"
    EXCHANGE_PROCESS = "换货流程"
    REPAIR_PROCESS = "维修流程"
    INVOICE_PROCESS = "开发票流程"
    # Knowledge intents
    USAGE_TUTORIAL = "使用教程"
    FAULT_DIAGNOSIS = "故障排查"
    POLICY_INQUIRY = "政策咨询"
    FAQ = "常见问题"
    # Handoff
    HANDOFF = "转人工"

class Intent(TypedDict):
    type: IntentType
    confidence: float
    entities: dict  # extracted entities {product_model, price_range, ...}

class CustomerServiceState(TypedDict):
    # Input
    messages: List[dict]               # conversation history
    user_input: str                    # current user input
    thread_id: str                     # session ID
    # Intent recognition
    intents: List[Intent]              # recognized intents (possibly several)
    route_to: Optional[str]            # routing target
    # Agent execution
    tool_calls: List[dict]             # tool call records
    tool_results: List[Any]            # tool results
    # Workflow control
    workflow_step: Optional[int]       # current step of a workflow intent
    workflow_data: Optional[dict]      # data collected by a workflow intent
    workflow_complete: bool            # whether the workflow has finished
    # Interrupt handling
    is_interrupted: bool               # whether the Graph is in an interrupted state
    interrupt_feedback: Optional[str]  # the user's reply to the interrupt question
    # Output
    response: Optional[str]            # final reply returned to the user
    agent_name: Optional[str]          # name of the Agent currently executing
    # Error handling
    error: Optional[str]               # error message
    handoff: bool                      # whether to transfer to a human agent
    handoff_reason: Optional[str]      # reason for the handoff
LangGraph Structure(Graph 结构)
Graph Definition(Graph 定义)
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
# from langgraph.checkpoint.postgres import PostgresSaver  # for production

def build_customer_service_graph():
    workflow = StateGraph(CustomerServiceState)
    # Add nodes
    workflow.add_node("coordinator", coordinator_node)
    workflow.add_node("react_agent", react_agent_node)
    workflow.add_node("rag_agent", rag_agent_node)
    workflow.add_node("orchestrator", orchestrator_node)
    workflow.add_node("workflow_agent", workflow_agent_node)
    workflow.add_node("handoff", handoff_node)
    # Set the entry point
    workflow.set_entry_point("coordinator")
    # Conditional edges (routing from the Coordinator)
    workflow.add_conditional_edges(
        "coordinator",
        route_from_coordinator,
        {
            "react_agent": "react_agent",
            "rag_agent": "rag_agent",
            "orchestrator": "orchestrator",
            "workflow_agent": "workflow_agent",
            "handoff": "handoff"
        }
    )
    # Terminal edges
    workflow.add_edge("react_agent", END)
    workflow.add_edge("rag_agent", END)
    workflow.add_edge("orchestrator", END)
    workflow.add_edge("handoff", END)
    # WorkflowAgent may loop
    workflow.add_conditional_edges(
        "workflow_agent",
        route_from_workflow,
        {
            "continue": "workflow_agent",
            "end": END
        }
    )
    # Configure the checkpointer (enables interrupt and resume)
    # Development: in-memory storage
    checkpointer = MemorySaver()
    # Production: use a persistent checkpointer such as PostgresSaver
    # (it ships in a separate package and its construction depends on the version)
    return workflow.compile(checkpointer=checkpointer)
Routing Functions(路由函数)
def route_from_coordinator(state: CustomerServiceState) -> str:
    """Route from the Coordinator to the next node."""
    route_to = state.get("route_to")
    if route_to:
        return route_to
    # Default fallback
    return "handoff"

def route_from_workflow(state: CustomerServiceState) -> str:
    """Loop control for WorkflowAgent."""
    if state.get("workflow_complete"):
        return "end"
    return "continue"
Tools(工具定义)
查询类工具
1. product_search(产品搜索)
@tool
def product_search(
price_min: Optional[int] = None,
price_max: Optional[int] = None,
ram: Optional[int] = None,
category: Optional[str] = None,
tags: Optional[List[str]] = None
) -> List[dict]:
"""
搜索符合条件的产品
Args:
price_min: 最低价格
price_max: 最高价格
ram: 内存 (GB)
category: 分类 (手机/平板/耳机)
tags: 标签 (女性/轻薄/游戏/拍照)
Returns:
产品列表 [{"model": "Find X8", "price": 2999, ...}]
"""
pass
2. get_inventory_info(库存查询)
@tool
def get_inventory_info(
product_model: str,
color: Optional[str] = None
) -> dict:
"""
查询产品库存
Args:
product_model: 产品型号
color: 颜色(可选)
Returns:
{"in_stock": True, "quantity": 156, "colors": ["白色", "黑色"]}
"""
pass
3. get_price_info(价格查询,含国补)
@tool
def get_price_info(
product_model: str,
apply_subsidy: bool = False
) -> dict:
"""
查询产品价格
Args:
product_model: 产品型号
apply_subsidy: 是否计算国补后价格
Returns:
{
"model": "Find X9",
"original_price": 3999,
"subsidy": 500,
"final_price": 3499
}
"""
pass
4. get_order_info(订单查询)
@tool
def get_order_info(order_id: str) -> dict:
"""
查询订单信息
Args:
order_id: 订单号
Returns:
{
"order_id": "12345",
"status": "已签收",
"tracking_number": "SF123456",
"delivery_date": "2025-12-01",
"items": [...]
}
"""
pass
5. get_logistics_info(物流查询)
@tool
def get_logistics_info(tracking_number: str) -> dict:
"""
查询物流信息
Args:
tracking_number: 快递单号
Returns:
{
"status": "运输中",
"current_location": "深圳南山区",
"estimated_arrival": "2025-12-10",
"history": [...]
}
"""
pass
6. product_compare(产品对比)
@tool
def product_compare(product_models: List[str]) -> dict:
"""
对比多个产品
Args:
product_models: 产品型号列表 (2-5个)
Returns:
{
"comparison_table": {
"Find X8": {"price": 2999, "processor": "天玑9300", ...},
"Find X9": {"price": 3999, "processor": "骁龙8 Gen3", ...}
}
}
"""
pass
流程类工具
7. create_return_order(创建退货单)
@tool
def create_return_order(
order_id: str,
reason: str,
photos: Optional[List[str]] = None
) -> dict:
"""
创建退货单
Args:
order_id: 原订单号
reason: 退货原因
photos: 商品照片(可选)
Returns:
{
"return_order_id": "R12345",
"return_address": "XXX",
"instructions": "请在3天内寄回"
}
"""
pass
8. create_repair_order(创建维修工单)
@tool
def create_repair_order(
device_imei: str,
issue_description: str,
warranty_status: str,
repair_type: Literal["到店", "寄修"]
) -> dict:
"""
创建维修工单
Args:
device_imei: 设备 IMEI
issue_description: 故障描述
warranty_status: 保修状态 (保内/保外)
repair_type: 维修方式
Returns:
{
"repair_order_id": "RP12345",
"estimated_cost": 800,
"appointment_info": "..."
}
"""
pass
9. generate_invoice(生成发票)
@tool
def generate_invoice(
order_id: str,
invoice_type: Literal["个人普票", "企业普票", "企业专票"],
invoice_title: str,
tax_id: Optional[str] = None
) -> dict:
"""
生成发票
Args:
order_id: 订单号
invoice_type: 发票类型
invoice_title: 发票抬头
tax_id: 税号(企业专票必填)
Returns:
{
"invoice_id": "INV12345",
"download_url": "https://...",
"delivery_method": "电子发票/邮寄"
}
"""
pass
知识类工具
10. knowledge_retriever(知识检索)
@tool
def knowledge_retriever(
query: str,
top_k: int = 10
) -> List[dict]:
"""
从知识库检索相关文档
Args:
query: 查询文本
top_k: 返回文档数量
Returns:
[
{"content": "...", "score": 0.95, "source": "使用手册"},
...
]
"""
pass
Data Flow(数据流)
Scenario 1: simple query (single intent)
User: "Find X8 多少钱?"
    ↓
Coordinator:
  - intent = PRICE_QUERY
  - entities = {product_model: "Find X8"}
  - route_to = "react_agent"
    ↓
ReactAgent:
  - calls get_price_info("Find X8")
  - returns: "Find X8 当前售价 2999 元"
    ↓
END
Scenario 2: complex multi-intent query
User: "对比 X8 和 X9,告诉我 X9 国补后多少钱"
    ↓
Coordinator:
  - intents = [
      {type: PRODUCT_COMPARE, entities: {products: ["X8", "X9"]}},
      {type: PRICE_QUERY, entities: {product: "X9", subsidy: True}}
    ]
  - route_to = "orchestrator"
    ↓
Orchestrator:
  - parallel calls:
    ├─→ ReactAgent (PRODUCT_COMPARE)
    │     └─→ product_compare(["X8", "X9"])
    └─→ ReactAgent (PRICE_QUERY)
          └─→ get_price_info("X9", apply_subsidy=True)
  - merged result:
    "X8 vs X9 对比表: ..."
    "X9 国补后价格: 3499 元"
    ↓
END
Scenario 3: workflow (return)
User: "我要退货"
    ↓
Coordinator:
  - intent = RETURN_PROCESS
  - route_to = "workflow_agent"
    ↓
WorkflowAgent (Step 1):
  - prompt = "请提供订单号"
  - interrupt: wait for the user's reply
    ↓
User: "订单号 12345"
    ↓
WorkflowAgent (Step 2):
  - calls get_order_info("12345")
  - eligibility check passes
  - prompt = "请告知退货原因"
  - interrupt: wait for the user's reply
    ↓
User: "不喜欢"
    ↓
WorkflowAgent (final step; the optional photo step is omitted in this example):
  - calls create_return_order(...)
  - response = "退货单已生成..."
  - workflow_complete = True
    ↓
END
Interrupt & Resume Mechanism(中断与恢复机制)
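Interrupt flow(中断流程)
from langgraph.types import interrupt

async def workflow_agent_node(state):
    """
    Example WorkflowAgent node showing how an interrupt is triggered.
    """
    step = state.get("workflow_step", 1)
    # Check whether interrupt feedback is present
    if state.get("is_interrupted") and state.get("interrupt_feedback"):
        # Handle the interrupt feedback
        feedback = state["interrupt_feedback"]
        # Process the feedback according to the current step
        if step == 1:
            # Order ID collected; store it in workflow_data as defined by the State Schema
            return {
                "workflow_data": {"order_id": feedback},
                "workflow_step": 2,
                "is_interrupted": False,
                "interrupt_feedback": None
            }
    else:
        # Information still needs to be collected: trigger an interrupt
        if step == 1:
            # interrupt() pauses Graph execution by raising, so the return
            # below is not reached on the pass that triggers the interrupt
            interrupt("请提供您的订单号")
            return {
                "is_interrupted": True,
                "workflow_step": 1,
                "response": "请提供您的订单号"  # shown by the frontend
            }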
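Resume flow(恢复流程)
API layer (FastAPI)
import json
from typing import List, Optional
from uuid import uuid4

from fastapi import APIRouter
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

router = APIRouter()
graph = build_customer_service_graph()

class ChatRequest(BaseModel):
    # Fields as used by the handler below
    message: str
    thread_id: Optional[str] = None
    messages: List[dict] = []

def sse(payload: dict) -> str:
    """Serialize one event in the "data: ..." format that the frontend parses."""
    return f"data: {json.dumps(payload, ensure_ascii=False, default=str)}\n\n"

@router.post("/chat")
async def customer_service_chat(request: ChatRequest):
    """
    Handle a user request, with support for interrupt and resume.
    """
    thread_id = request.thread_id or str(uuid4())
    # Graph invocation config
    config = {
        "configurable": {
            "thread_id": thread_id
        }
    }
    # Build the input state
    input_state = {
        "user_input": request.message,
        "messages": request.messages,
        "thread_id": thread_id,
    }
    # Check whether this request resumes an interrupt
    # by querying the checkpointer for the previous state
    checkpoint = await graph.aget_state(config)
    if checkpoint and checkpoint.next:
        # There is an unfinished execution (possibly an interrupt):
        # treat the user input as interrupt_feedback
        input_state["is_interrupted"] = True
        input_state["interrupt_feedback"] = request.message
    else:
        # A new conversation, or one that has already finished
        input_state["is_interrupted"] = False
        input_state["interrupt_feedback"] = None

    # Stream the Graph execution; StreamingResponse needs str or bytes chunks
    async def event_generator():
        async for event in graph.astream(input_state, config=config):
            # Handle events...
            if "workflow_agent" in event:
                node_output = event["workflow_agent"]
                # Check whether an interrupt was triggered
                if node_output.get("is_interrupted"):
                    # Send the interrupt prompt to the frontend
                    yield sse({
                        "event": "interrupt",
                        "data": {
                            "message": node_output.get("response"),
                            "thread_id": thread_id
                        }
                    })
                else:
                    # Normal output
                    yield sse({"event": "message", "data": node_output})

    return StreamingResponse(
        event_generator(),
        media_type="text/event-stream"
    )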
Frontend handling (JavaScript)
// The frontend keeps the conversation state
let threadId = null;
let isWaitingForInterrupt = false;
const conversationHistory = [];

async function sendMessage(userInput) {
  const response = await fetch('/api/customer_service/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      message: userInput,
      thread_id: threadId,
      messages: conversationHistory
    })
  });
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // An event can be split across chunks, so buffer until the blank-line separator
    buffer += decoder.decode(value, { stream: true });
    const events = buffer.split('\n\n');
    buffer = events.pop(); // keep the incomplete tail for the next chunk
    for (const event of events) {
      if (event.startsWith('data: ')) {
        const data = JSON.parse(event.slice(6));
        if (data.event === 'interrupt') {
          // Interrupt: show the prompt and wait for user input
          displayMessage('assistant', data.data.message);
          threadId = data.data.thread_id;
          isWaitingForInterrupt = true;
          enableInput(); // let the user type
        } else if (data.event === 'message') {
          // Normal message
          displayMessage('assistant', data.data.response);
          isWaitingForInterrupt = false;
        }
      }
    }
  }
}

// User input handling
function onUserInput(input) {
  displayMessage('user', input);
  if (isWaitingForInterrupt) {
    // A reply to an interrupt: reuse the same thread_id
    sendMessage(input); // resumes execution
  } else {
    // A new conversation
    threadId = null; // or keep the existing thread_id to continue the conversation
    sendMessage(input);
  }
}
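Interrupt state management(中断状态管理)
State transition diagram
Normal execution
    ↓
User input is needed
    ↓
Call interrupt()
    ↓
[Interrupted] is_interrupted = True
    ↓
Save to the Checkpointer (thread_id)
    ↓
Return the prompt to the user
    ↓
Wait for user input...
    ↓
User provides feedback (interrupt_feedback)
    ↓
Resume execution with the same thread_id
    ↓
Continue from the interrupt point
    ↓
Process interrupt_feedback
    ↓
is_interrupted = False
    ↓
Continue to the next step or finish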
What the Checkpointer stores
# Information held in a checkpoint (illustrative)
{
    "thread_id": "abc123",
    "checkpoint_id": "checkpoint_1",
    "state": {
        "user_input": "我要退货",
        "workflow_step": 1,
        "is_interrupted": True,
        "interrupt_feedback": None,
        "messages": [...],
        "intents": [...],
        ...
    },
    "next": ["workflow_agent"],  # the next node to execute
    "metadata": {
        "source": "workflow_agent",
        "writes": {...},
        "created_at": "2025-12-09T10:30:00Z"
    }
}
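Timeout and cancellation handling(超时和取消处理)
Timeout mechanism
import logging
from datetime import datetime, timezone

logger = logging.getLogger(__name__)

# Configuration
INTERRUPT_TIMEOUT = 600  # 10 minutes, in seconds

async def check_interrupt_timeout(thread_id: str) -> bool:
    """
    Check whether a pending interrupt has timed out.
    """
    config = {"configurable": {"thread_id": thread_id}}
    checkpoint = await graph.aget_state(config)
    if not checkpoint or not checkpoint.next:
        return False  # no interrupt pending
    # The state snapshot exposes its creation time as an ISO 8601 string
    created_at = checkpoint.created_at
    if created_at:
        created_time = datetime.fromisoformat(created_at)
        if created_time.tzinfo is None:
            # Assume UTC when the timestamp carries no offset
            created_time = created_time.replace(tzinfo=timezone.utc)
        # Compare timezone-aware values to avoid naive/aware mismatches
        elapsed = (datetime.now(timezone.utc) - created_time).total_seconds()
        if elapsed > INTERRUPT_TIMEOUT:
            # Timed out: clean up the checkpoint
            await cleanup_checkpoint(thread_id)
            return True
    return False

async def cleanup_checkpoint(thread_id: str):
    """
    Clean up an expired checkpoint.
    """
    # Option 1: clean up through the LangGraph API
    # (note: the exact API depends on the LangGraph version)
    # Option 2: operate on the checkpointer directly
    config = {"configurable": {"thread_id": thread_id}}
    # checkpointer.delete(config)  # if supported
    logger.info(f"Cleaned up expired checkpoint for thread {thread_id}")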
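The elapsed-time comparison is the part of the timeout check that is easiest to get wrong, because mixing naive and timezone-aware datetime values raises a TypeError. The helper below is a hypothetical sketch that isolates that comparison so it can be tested without a Graph; the name is_interrupt_expired and the rule of treating offset-free timestamps as UTC are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

INTERRUPT_TIMEOUT = timedelta(minutes=10)


def is_interrupt_expired(created_at: str, now: Optional[datetime] = None) -> bool:
    """Return True if more than INTERRUPT_TIMEOUT has passed since created_at."""
    # Accept a trailing "Z" as well as explicit offsets such as "+00:00".
    created = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    if created.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC.
        created = created.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    return current - created > INTERRUPT_TIMEOUT
```

Passing now explicitly keeps the function deterministic in tests.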
Background cleanup task
import asyncio

async def cleanup_expired_checkpoints():
    """
    Background task: periodically clean up expired checkpoints.
    """
    while True:
        try:
            # Get all active thread_ids
            # (maintained by the application or queried from the checkpointer)
            active_threads = await get_active_threads()
            for thread_id in active_threads:
                if await check_interrupt_timeout(thread_id):
                    # Optional: notify the user
                    await notify_user_timeout(thread_id)
            # Check once per minute
            await asyncio.sleep(60)
        except Exception as e:
            logger.error(f"Cleanup task error: {e}")
            await asyncio.sleep(60)

# Start the background task (app is the FastAPI application instance)
@app.on_event("startup")
async def startup_event():
    asyncio.create_task(cleanup_expired_checkpoints())
User cancellation handling
async def workflow_agent_node(state):
    """
    WorkflowAgent node with support for user cancellation.
    """
    # Check whether the user wants to cancel
    if state.get("interrupt_feedback"):
        feedback = state["interrupt_feedback"].strip().lower()
        # Cancel keyword detection
        cancel_keywords = ["取消", "退出", "算了", "cancel", "quit", "exit"]
        if any(keyword in feedback for keyword in cancel_keywords):
            return {
                "response": "已取消当前操作,有什么可以帮您的吗?",
                "workflow_complete": True,
                "is_interrupted": False,
                "interrupt_feedback": None,
                "workflow_step": None,
                "workflow_data": None
            }
    # Normal flow...
    step = state.get("workflow_step", 1)
    if step == 1:
        # ... collect the order ID
        pass
API-layer timeout check
from fastapi.responses import JSONResponse

@router.post("/chat")
async def customer_service_chat(request: ChatRequest):
    """
    Handle a user request and check for a timeout first.
    """
    thread_id = request.thread_id or str(uuid4())
    # Check whether the session has timed out
    if await check_interrupt_timeout(thread_id):
        return JSONResponse({
            "error": "session_timeout",
            "message": "会话已超时,请重新开始"
        }, status_code=410)  # 410 Gone
    # Normal handling...
    config = {"configurable": {"thread_id": thread_id}}
    # ... remaining logic
前端超时处理
// 前端维护超时计时器
let interruptTimer = null;
const INTERRUPT_TIMEOUT = 10 * 60 * 1000; // 10 分钟
function startInterruptTimer() {
clearInterruptTimer();
interruptTimer = setTimeout(() => {
// 超时处理
displayMessage('system', '由于长时间未响应,当前操作已取消。');
isWaitingForInterrupt = false;
threadId = null; // 清除会话
}, INTERRUPT_TIMEOUT);
}
function clearInterruptTimer() {
if (interruptTimer) {
clearTimeout(interruptTimer);
interruptTimer = null;
}
}
// 收到中断时启动计时器
if (data.event === 'interrupt') {
displayMessage('assistant', data.data.message);
threadId = data.data.thread_id;
isWaitingForInterrupt = true;
startInterruptTimer(); // 启动超时计时器
enableInput();
}
// 用户输入时清除计时器
function onUserInput(input) {
clearInterruptTimer(); // 清除超时计时器
displayMessage('user', input);
sendMessage(input);
}
Cancel command example
The user can enter a cancel command at any interrupt step:
System: 请提供您的订单号
User: 取消
System: 已取消当前操作,有什么可以帮您的吗?
Timeout notification example
async def notify_user_timeout(thread_id: str):
    """
    Notify the user that the session timed out (optional).
    """
    # Notify the frontend over WebSocket or another channel
    # await websocket_manager.send_message(thread_id, {
    #     "type": "timeout",
    #     "message": "由于长时间未响应,当前操作已取消"
    # })
    logger.info(f"User {thread_id} session timed out")
Caching Strategy(缓存策略)
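Intent Cache(意图识别缓存)
- Purpose: reduce repeated LLM calls for intent recognition
- Key: md5(user_input) or a semantic hash
- TTL: 24 hours
- Expected hit rate: > 60%
Vector Search Cache(向量检索缓存)
- Purpose: speed up knowledge base retrieval
- Key: md5(query)
- TTL: 1 hour
- Expected hit rate: > 50%
Response Cache(API 响应缓存)
- Purpose: cache tool call results (inventory, price)
- Key: tool_name:params_hash
- TTL: 5 minutes
- Expected hit rate: > 70%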
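The key and TTL scheme above can be sketched with an in-memory cache. This is a stand-in under stated assumptions, not the planned Redis-backed cache: TTLCache and intent_cache_key are hypothetical names, and the whitespace and case normalization before hashing is an added assumption meant to raise the hit rate for md5(user_input) keys.

```python
import hashlib
import time
from typing import Any, Callable, Dict, Optional, Tuple


def intent_cache_key(user_input: str) -> str:
    """Hash the normalized input so trivially different spellings share a key."""
    normalized = " ".join(user_input.strip().lower().split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


class TTLCache:
    """Minimal in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # drop the expired entry
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (self._clock() + self._ttl, value)
```

One instance per cache type, each with its own TTL (24 hours, 1 hour, 5 minutes), would match the three tiers listed above.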
Error Handling(错误处理)
Retry(重试机制)
# Pseudocode: retry up to 3 times with exponential backoff on APIError
@retry(max_attempts=3, backoff=exponential, exceptions=[APIError])
async def call_tool(tool_name, params):
    ...
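The retry decorator above is pseudocode. One way to realize it with the standard library alone is sketched below; the APIError class is a stand-in for the real tool API error type, and the base_delay parameter is an assumption that replaces the backoff=exponential argument.

```python
import asyncio
import functools
from typing import Tuple, Type


class APIError(Exception):
    """Stand-in for the tool API's error type (hypothetical)."""


def retry(
    max_attempts: int = 3,
    base_delay: float = 0.5,
    exceptions: Tuple[Type[BaseException], ...] = (APIError,),
):
    """Retry an async function with exponential backoff between attempts."""

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return await func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    # Delay doubles each time: base, 2*base, 4*base, ...
                    await asyncio.sleep(base_delay * 2 ** (attempt - 1))

        return wrapper

    return decorator
```

Exceptions outside the configured tuple propagate immediately, so programming errors are not retried.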
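Fallback(降级策略)
- LLM timeout → return a preset reply
- Tool call failure → return cached data
- Knowledge base retrieval failure → transfer to a human agent
Circuit Breaker(熔断器)
- Error rate > 50% → stop calling and transfer to a human agent directly
- Recovers automatically after 5 minutes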
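The two circuit breaker rules above can be sketched as a small class. This is an illustration under assumptions: the class and method names are hypothetical, and measuring the error rate over a fixed window of the most recent calls is a choice made here, since the plan does not say how the rate is computed.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class CircuitBreaker:
    """Open when the recent error rate exceeds a threshold; close after a cooldown."""

    def __init__(
        self,
        threshold: float = 0.5,
        window: int = 20,
        cooldown: float = 300.0,  # 5 minutes
        clock: Callable[[], float] = time.monotonic,
    ):
        self._threshold = threshold
        self._results: Deque[bool] = deque(maxlen=window)  # True means failure
        self._cooldown = cooldown
        self._clock = clock
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return True if calls may proceed, False while the breaker is open."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self._cooldown:
            # Cooldown elapsed: close the breaker and start a fresh window.
            self._opened_at = None
            self._results.clear()
            return True
        return False

    def record(self, failed: bool) -> None:
        """Record one call outcome and open the breaker if the rate is too high."""
        self._results.append(failed)
        window_full = len(self._results) == self._results.maxlen
        if window_full and sum(self._results) / len(self._results) > self._threshold:
            self._opened_at = self._clock()
```

While allow() returns False, the caller would skip the tool call and route to handoff.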
Security(安全)
Prompt injection defense
import re

def validate_input(user_input: str) -> bool:
    # Detect suspicious instructions
    suspicious_patterns = [
        r"ignore.*instructions",
        r"system.*prompt",
        r"你现在是.*"
    ]
    for pattern in suspicious_patterns:
        if re.search(pattern, user_input, re.IGNORECASE):
            return False
    return True
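Tool permission isolation
# Read-only tools (no special permission required)
READONLY_TOOLS = [
    "product_search",
    "get_inventory_info",
    "get_price_info",
    "get_order_info",
    "get_logistics_info",
    "product_compare",
    "knowledge_retriever"
]
# Write tools (require user authentication)
WRITE_TOOLS = [
    "create_return_order",
    "create_repair_order",
    "generate_invoice"
]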
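Sensitive data masking
import re

def mask_sensitive_data(text: str) -> str:
    # Mask 18-character ID card numbers first (the last character may be X),
    # so that the phone rule below cannot match inside them
    text = re.sub(r"(?<!\d)(\d{6})\d{8}(\d{3}[\dXx])(?!\d)", r"\1********\2", text)
    # Mask 11-digit mobile numbers (assumed to start with 1); the lookarounds
    # keep the rule from firing inside longer digit runs such as order IDs
    text = re.sub(r"(?<!\d)(1\d{2})\d{4}(\d{4})(?!\d)", r"\1****\2", text)
    return text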
Rate limiting
# 100 requests per minute per user (pseudocode)
@rate_limit(limit=100, window=60)
async def handle_request(user_id, request):
    ...
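The rate_limit decorator above is pseudocode. A per-user sliding-window limiter is one way to back it; the sketch below keeps state in memory for illustration only. The class name and the injectable clock are assumptions, and a multi-process deployment would need shared storage such as the Redis instance listed in the configuration.

```python
import time
from collections import defaultdict, deque
from typing import Callable, DefaultDict, Deque


class SlidingWindowRateLimiter:
    """Allow at most `limit` requests per user within any `window` seconds."""

    def __init__(
        self,
        limit: int = 100,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._limit = limit
        self._window = window
        self._clock = clock
        self._hits: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        now = self._clock()
        hits = self._hits[user_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._limit:
            return False
        hits.append(now)
        return True
```

A request handler would call allow(user_id) first and return HTTP 429 when it is False.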
Performance Optimization(性能优化)
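Asynchronous processing
# Call tools in parallel (tools defined with @tool are invoked through ainvoke)
results = await asyncio.gather(
    get_price_info.ainvoke({"product_model": "X8"}),
    get_inventory_info.ainvoke({"product_model": "X8"}),
    get_order_info.ainvoke({"order_id": "12345"})
)
Connection pool
# HTTP connection pool (httpx is the HTTP client listed under Dependencies)
client = httpx.AsyncClient(
    limits=httpx.Limits(max_connections=100)
)
Batch processing
# Batch vector retrieval
embeddings = await embed_batch(queries)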
Testing Strategy(测试策略)
Unit Tests(单元测试)
- Every Agent node
- Every tool function
- Routing logic
Example:
def test_coordinator_single_intent():
    state = {
        "user_input": "Find X8 多少钱?",
        "intents": [{"type": IntentType.PRICE_QUERY, ...}]
    }
    result = coordinator_node(state)
    assert result["route_to"] == "react_agent"
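Integration Tests(集成测试)
- Full Graph execution
- Multi-intent scenarios
- Complete workflow runs
Example:
async def test_multi_intent_flow():
    # A Graph compiled with a checkpointer needs a thread_id in its config
    config = {"configurable": {"thread_id": "test-multi-intent"}}
    result = await graph.ainvoke(
        {"user_input": "对比 X8 和 X9,告诉我 X9 国补后多少钱"},
        config=config,
    )
    assert "对比" in result["response"]
    assert "3499" in result["response"]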
E2E Tests(端到端测试)
- Call through the API
- Verify the SSE streaming response
Deployment(部署)
Environment(环境)
- Python 3.12+
- LangGraph
- FastAPI
- Uvicorn
Configuration(配置)
llm:
  provider: dashscope
  model: qwen-turbo
  temperature: 0.7
cache:
  type: redis
  host: localhost
  port: 6379
vector_db:
  type: milvus
  host: localhost
  port: 19530
tools:
  api_base_url: https://api.example.com
  timeout: 5
Monitoring(监控)
- Metrics: response time, QPS, error rate, cache hit rate
- Logging: all requests, tool calls, and error logs
- Tracing: distributed tracing (LangSmith / OpenTelemetry)
Dependencies(依赖)
[tool.poetry.dependencies]
python = "^3.12"
langgraph = "^0.2.0"
langchain = "^0.3.0"
langchain-community = "^0.3.0"
fastapi = "^0.115.0"
uvicorn = "^0.32.0"
pydantic = "^2.0"
redis = "^5.0"
pymilvus = "^2.4"
httpx = "^0.27"
Risks & Mitigations(风险与缓解)
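| Risk | Mitigation |
|---|---|
| LLM misclassifies the intent | Confidence threshold, multi-turn confirmation, human fallback |
| Tool call failure | Retry, fallback strategy, circuit breaker |
| Response time exceeds the target | Caching, parallel calls, streaming responses |
| Security attacks | Input validation, permission isolation, rate limiting |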
Open Questions(待解决问题)
Next Steps(下一步)
- /speckit.task - generate development tasks
- Implement the Coordinator node (TDD)
- Implement the ReactAgent node (TDD)
- Implement mock tools
- Integration tests
Approval(批准)
| Role | Status |
|---|---|
| Tech Lead | ⏳ Pending |
| QA Lead | ⏳ Pending |