[LangGraph] MapReduce

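The MapReduce pattern is commonly used in scenarios that involve:

Splitting a task
Processing the pieces concurrently
Aggregating the results

For example:

Calling multiple external APIs in a batch
Vectorizing (embedding) every item in a list
Merging multiple retrieval results
...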

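Core idea

Divide and conquer.

This maps onto two phases in LangGraph:

Map phase: split one input (a list) into multiple pieces and run them concurrently; this is the fan-out phase
Reduce phase: aggregate the results of all branches and pass them to the next node; this is the fan-in phase

An example: run retrieval for multiple keywords concurrently, then merge the results into a single answer.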

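The overall flow is:

Input: the user's question
Map: use the MultiQuery technique to split the question into several queries and send each one to the retrieval system
MultiQuery rewrites the user's single original question into several different but related queries. It is a very common technique in RAG.

Question: How does MapReduce work in LangGraph?

LLM: rewrites the question above as
1. What is the execution mechanism of LangGraph MapReduce?
2. How does LangGraph implement fan-out and reducers?
3. How do concurrent fan-out and fan-in work in LangGraph?
4. How does LangGraph MapReduce differ from traditional MapReduce?

Reduce: merge the retrieval results from all branches, then deduplicate and rank them
Finally, pass the merged results to the LLM to produce the final answer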

The flow is illustrated below:

flowchart TD
  A[Original input] --> B{Map fan-out}
  B --> C1[Retrieval 1]
  B --> C2[Retrieval 2]
  B --> C3[Retrieval 3]
  C1 --> D{Reduce aggregation}
  C2 --> D
  C3 --> D
  D --> E[Merged final output]
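The Reduce step of the retrieval example (merge, deduplicate, rank) can be sketched in plain TypeScript. This is only an illustration under assumptions: the `RetrievedDoc` shape, its `id` and `score` fields, and the "keep the highest-scoring copy" rule are hypothetical choices, not something the original flow prescribes.

```typescript
// Hypothetical shape of one retrieved document; the field names are assumptions.
interface RetrievedDoc {
  id: string;
  text: string;
  score: number; // higher means more relevant
}

// Reduce step: walk the per-query result lists, keep the best-scoring copy
// of each document id, then sort by score in descending order.
function mergeResults(branches: RetrievedDoc[][]): RetrievedDoc[] {
  const best = new Map<string, RetrievedDoc>();
  for (const branch of branches) {
    for (const doc of branch) {
      const seen = best.get(doc.id);
      if (!seen || doc.score > seen.score) best.set(doc.id, doc);
    }
  }
  return Array.from(best.values()).sort((a, b) => b.score - a.score);
}

// Two query branches that both returned document d2.
const merged = mergeResults([
  [
    { id: "d1", text: "Send API", score: 0.9 },
    { id: "d2", text: "Reducers", score: 0.6 },
  ],
  [
    { id: "d2", text: "Reducers", score: 0.8 },
    { id: "d3", text: "Fan-out", score: 0.7 },
  ],
]);

console.log(merged.map((d) => `${d.id}:${d.score}`).join(", "));
// d1:0.9, d2:0.8, d3:0.7
```

In a real pipeline the ranking rule would likely be something more deliberate (for example a reranker); the point here is only that Reduce receives all branch outputs at once and returns one merged value.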

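Difference from concurrency

🙋 How is this different from concurrency? It looks just like the concurrency covered earlier.

Concurrency: a capability

MapReduce: a pattern built on top of that capability

Concurrency

The ability to run multiple tasks at the same time.

Node A → (dispatches to two independent nodes) → B and C run at the same time

B and C merely run concurrently: their results do not need to be aggregated, and A does not split the work based on its input data. This is plain concurrency, a basic capability.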

flowchart LR
  A[Input] --> B[Task B]
  A --> C[Task C]
  B --> D[Downstream node]
  C --> E[Another downstream node]
  style B fill:#A7E1FA,stroke:#0182C4
  style C fill:#A7E1FA,stroke:#0182C4

MapReduce

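A specific workflow pattern built on concurrency:

Split the task
Run the pieces concurrently
Aggregate the results

For example, suppose the input is ["A", "B", "C"], that is, a collection of data. Then:

Map: create one task per element, so 3 tasks run concurrently
Reduce: wait for all 3 tasks to finish, then merge the results into an array

There are 3 key points in this process:

The number of tasks is created dynamically from the data; for example, N inputs produce N concurrent branches
All branches must finish before the flow moves on
The results are finally collected into a single merged result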

flowchart TD
  A[Original input] --> M{Map split}
  M --> M1[map subtask 1]
  M --> M2[map subtask 2]
  M --> M3[map subtask 3]
  M1 --> B((Barrier))
  M2 --> B
  M3 --> B
  B --> R[Reduce aggregation node]
  R --> Out[Final output]
  style M1 fill:#FFE9A3,stroke:#D6A200
  style M2 fill:#FFE9A3,stroke:#D6A200
  style M3 fill:#FFE9A3,stroke:#D6A200
  style R fill:#C7F7C4,stroke:#29A745
  style B fill:#FFFFFF,stroke:#777,stroke-dasharray: 5 5
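The three key points (dynamic task count, a barrier, one merged result) can be shown without any framework. The following is a minimal sketch in plain TypeScript, not LangGraph code; `fanOut` and `mapReduce` are hypothetical helper names, and `Promise.all` stands in for the barrier.

```typescript
// Map: one task per input element, so N inputs produce N tasks.
function fanOut<T, R>(
  items: T[],
  worker: (item: T) => Promise<R>,
): Array<() => Promise<R>> {
  return items.map((item) => () => worker(item));
}

// Barrier + Reduce: Promise.all resolves only after every task has finished,
// then the partial results are folded into a single value.
async function mapReduce<T, R, A>(
  items: T[],
  worker: (item: T) => Promise<R>,
  reduce: (acc: A, result: R) => A,
  initial: A,
): Promise<A> {
  const tasks = fanOut(items, worker);
  const results = await Promise.all(tasks.map((run) => run()));
  return results.reduce(reduce, initial);
}

// Worker used for the demo: uppercases one element.
const toUpper = async (s: string): Promise<string> => s.toUpperCase();

mapReduce(
  ["a", "b", "c"],
  toUpper,
  (acc: string[], r: string) => [...acc, r],
  [] as string[],
).then((out) => console.log(out)); // resolves to ["A", "B", "C"]
```

`Promise.all` keeps the results in input order, which is why the merged array matches the order of the input here.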

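Send API

In LangGraph, each Send object that a routing function returns spawns a new task branch in the graph, so Send acts as the fan-out mechanism of the Map phase.

Basic syntax

new Send(targetNode, state)

targetNode: the name of the node the task is dispatched to
state: the state data sent to that node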

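Suppose we want to process a list:

const items = ["A", "B", "C"];

Inside the Map step (whose only job is to dispatch multiple tasks):

return items.map((item) => new Send("workerNode", { item }));

In the code above:

3 concurrent tasks are created dynamically
Each one flows to workerNode
Each task carries { item } as its state data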

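Using conditional edges

So far we have added edges with the addEdge() method, which adds static edges. For example:

A → B

A static edge is fixed when the graph is built. It cannot add branches at runtime, so A → B always means one path, one task, and one copy of the state. It therefore cannot express dynamic logic such as:

"There are N items, so N tasks are needed"

This need for a data-dependent number of branches is met with conditional edges.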

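// Routing function for the fan-out: returns one Send per item
const fanout = (state: TState) => state.items.map((item) => new Send("worker", { item }));
// Add a conditional edge that uses the routing function
graph.addConditionalEdges(START, fanout);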
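To make the contrast between a static edge and a routing function concrete, here is a small framework-free model. It is a sketch under assumptions: `Packet`, `staticEdge`, and `route` are hypothetical stand-ins written for this illustration and are not LangGraph's own types.

```typescript
// Local stand-in for a dispatch packet. It only mirrors the idea of
// "target node + payload"; it is NOT the library's Send class.
interface Packet<P> {
  node: string;
  payload: P;
}

interface ListState {
  items: string[];
}

// A static edge is fixed when the graph is built: always exactly one target.
const staticEdge = (): string[] => ["worker"];

// A routing function runs against the current state, so the number of
// branches follows the data: N items produce N packets.
const route = (state: ListState): Packet<{ item: string }>[] =>
  state.items.map((item) => ({ node: "worker", payload: { item } }));

console.log(staticEdge().length); // 1
console.log(route({ items: ["A", "B", "C"] }).length); // 3
console.log(route({ items: [] }).length); // 0
```

The routing function is evaluated per run, which is what lets the same graph handle 3 items in one invocation and 300 in the next.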

Hands-on practice

1. Quick start

import { StateGraph, START, END, Send } from "@langchain/langgraph";
import { registry } from "@langchain/langgraph/zod";
import { z } from "zod";

// State schema
const Schema = z.object({
  items: z.array(z.string()), // e.g. ["a", "b", "c"]
  results: z.array(z.string()).register(registry, {
    reducer: {
      fn: (prev: string[], next: string[]) => prev.concat(next), // append each branch's results
    },
    default: () => [],
  }),
});

// Derive the TypeScript type from the schema
type TState = z.infer<typeof Schema>;

// Map step: routing function that emits one Send per item
const fanout = (state: TState) =>
  state.items.map((item) => new Send("worker", { item }));

// Worker: converts its item to uppercase
const worker = (state: { item: string }) => ({
  results: [state.item.toUpperCase()],
});

// Build the graph
const graph = new StateGraph(Schema)
  .addNode("worker", worker)
  .addConditionalEdges(START, fanout)
  .addEdge("worker", END)
  .compile();

const result = await graph.invoke({
  items: ["abc", "def", "ghi"],
});
console.log(result);
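The quick-start example relies on the `results` reducer to combine what the worker branches return. The sketch below replays that merge by hand in plain TypeScript, assuming the updates are applied in input order; the order in which LangGraph applies them is not something this sketch establishes, and `branchUpdates` and `finalResults` are names introduced only for this illustration.

```typescript
// Shape of the partial update each worker branch returns.
type Update = { results: string[] };

// Same merge rule as the `results` reducer in the schema above.
const concatReducer = (prev: string[], next: string[]): string[] =>
  prev.concat(next);

// What each worker branch returns for the quick-start input.
const branchUpdates: Update[] = ["abc", "def", "ghi"].map((item) => ({
  results: [item.toUpperCase()],
}));

// Start from the channel default ([]) and apply the updates one by one.
const finalResults = branchUpdates.reduce(
  (acc, update) => concatReducer(acc, update.results),
  [] as string[],
);

console.log(finalResults); // ["ABC", "DEF", "GHI"] when applied in this order
```

This is why each worker returns a one-element array rather than a bare string: the reducer concatenates arrays, so every branch contributes its own slice of the final list.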

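2. Scenario example

An e-commerce risk-control system scores a batch of orders for risk concurrently, then aggregates the high-risk orders and a statistical report.

Business requirements:

Input: a batch of orders awaiting review, orders
Map phase: call the "risk-control service" concurrently for each order (it can be an HTTP service, or an LLM asked "Is there a fraud risk?")
Reduce phase:
- Collect the risk scores of all orders
- Compute statistics (number of high-risk orders, their share, and so on)
- Output a "high-risk order list + simple statistical report"

Mapped onto LangGraph's MapReduce pattern:

fanoutOrders (Map): emits Send("scoreOrder", { order }) for each entry in orders
scoreOrder (Worker): calls the risk-control service and returns { orderId, riskScore, level }
aggregateRisk (Reduce): aggregates riskResults[] and produces the final output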

import { StateGraph, START, END, Send } from "@langchain/langgraph";
import { registry } from "@langchain/langgraph/zod";
import { z } from "zod";
import pLimit from "p-limit";

// Structure of a single order
const OrderSchema = z.object({
  id: z.string(),
  userId: z.string(),
  amount: z.number(), // order amount
  ip: z.string(), // IP address the order was placed from
  device: z.string().optional(), // device information
});

// Risk assessment result for a single order
const RiskResultSchema = z.object({
  orderId: z.string(),
  riskScore: z.number(), // 0-100
  level: z.enum(["low", "mid", "high"]),
});
type RiskResult = z.infer<typeof RiskResultSchema>;

const State = z.object({
  // Input: a batch of orders awaiting review
  orders: z.array(OrderSchema),

  // Map phase output: the risk result of each order
  riskResults: z.array(RiskResultSchema).register(registry, {
    reducer: {
      // Merge arrays; the elements are risk results, not strings
      fn: (prev: RiskResult[], next: RiskResult[]) => prev.concat(next),
    },
    default: () => [],
  }),

  // Summary data produced in the Reduce phase
  summary: z
    .object({
      total: z.number(),
      highRiskCount: z.number(),
      highRiskRate: z.number(),
    })
    .optional(),
});

// Create a limiter that runs at most 5 tasks at the same time
const limit = pLimit(5);

// Map phase: fan out one scoring task per order in the list
const fanoutOrders = (state: z.infer<typeof State>) => {
  return state.orders.map(
    (order) => new Send("scoreOrder", { order }), // each order becomes an independent subtask
  );
};

// Simulated external risk-control service: a simple score based on the amount
async function callRiskService(order: z.infer<typeof OrderSchema>) {
  const startTime = new Date().toISOString().split("T")[1];
  console.log(`[${startTime}] Started processing order: ${order.id}`);

  // Simulate API latency (random delay of 1-3 seconds)
  const delay = 1000 + Math.random() * 2000;
  await new Promise((resolve) => setTimeout(resolve, delay));

  const base = order.amount / 100;
  const random = Math.random() * 10;
  const score = Math.min(100, base * 10 + random);

  let level: "low" | "mid" | "high" = "low";
  if (score >= 70) level = "high";
  else if (score >= 40) level = "mid";

  const endTime = new Date().toISOString().split("T")[1];
  console.log(
    `[${endTime}] Finished processing order: ${order.id} (took: ${Math.round(
      delay,
    )}ms, risk level: ${level})`,
  );

  return {
    orderId: order.id,
    riskScore: score,
    level,
  } as z.infer<typeof RiskResultSchema>;
}

// Worker phase: assess the risk of a single order (the limiter caps concurrency)
const scoreOrder = async (state: { order: z.infer<typeof OrderSchema> }) => {
  const queueTime = new Date().toISOString().split("T")[1];
  console.log(`[${queueTime}] Order ${state.order.id} queued, waiting...`);

  // Run through the limiter so that at most 5 tasks execute at the same time
  const risk = await limit(() => callRiskService(state.order));

  return {
    riskResults: [risk],
  };
};

// Reduce phase: aggregate the risk results and produce the summary
const aggregateRisk = (state: z.infer<typeof State>) => {
  const total = state.orders.length;
  const highRisk = state.riskResults.filter((r) => r.level === "high");

  const highRiskCount = highRisk.length;
  const highRiskRate = total === 0 ? 0 : highRiskCount / total;

  return {
    summary: {
      total,
      highRiskCount,
      highRiskRate,
    },
  };
};

// Build the workflow graph
const graph = new StateGraph(State)
  .addNode("scoreOrder", scoreOrder)
  .addNode("aggregateRisk", aggregateRisk)
  // START → Map: dynamic fan-out
  .addConditionalEdges(START, fanoutOrders)
  // After all scoreOrder branches finish → Reduce
  .addEdge("scoreOrder", "aggregateRisk")
  .addEdge("aggregateRisk", END)
  .compile();

const input = {
  orders: [
    { id: "o1", userId: "u1", amount: 99, ip: "1.1.1.1" },
    { id: "o2", userId: "u2", amount: 560, ip: "2.2.2.2" },
    { id: "o3", userId: "u3", amount: 3000, ip: "3.3.3.3" },
    { id: "o4", userId: "u4", amount: 1500, ip: "4.4.4.4" },
    { id: "o5", userId: "u5", amount: 800, ip: "5.5.5.5" },
    { id: "o6", userId: "u6", amount: 200, ip: "6.6.6.6" },
    { id: "o7", userId: "u7", amount: 4500, ip: "7.7.7.7" },
    { id: "o8", userId: "u8", amount: 120, ip: "8.8.8.8" },
    { id: "o9", userId: "u9", amount: 2800, ip: "9.9.9.9" },
    { id: "o10", userId: "u10", amount: 670, ip: "10.10.10.10" },
    { id: "o11", userId: "u11", amount: 1900, ip: "11.11.11.11" },
    { id: "o12", userId: "u12", amount: 350, ip: "12.12.12.12" },
  ],
  riskResults: [],
};

console.log("=".repeat(60));
console.log(`Processing ${input.orders.length} orders, max concurrency: 5`);
console.log("=".repeat(60));

const startTime = Date.now();
const result = await graph.invoke(input);
const endTime = Date.now();

console.log("=".repeat(60));
console.log(
  `✨ All orders processed! Total time: ${((endTime - startTime) / 1000).toFixed(2)}s`,
);
console.log("=".repeat(60));
console.log("\nSummary:");
console.log(JSON.stringify(result.summary, null, 2));
console.log("\nDetailed risk assessment:");
console.log(JSON.stringify(result.riskResults, null, 2));
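The scenario above fans out 12 branches but lets only 5 of them call the risk service at once, using p-limit. To show the idea behind such a limiter, here is a minimal sketch in plain TypeScript. It is not p-limit's implementation; `createLimiter`, `activeCount`, and `pendingCount` are hypothetical names chosen for this illustration.

```typescript
// A tiny concurrency limiter: at most `maxConcurrent` tasks run at once,
// the rest wait in a FIFO queue until a slot is released.
function createLimiter(maxConcurrent: number) {
  let active = 0;
  const waiting: Array<() => void> = [];

  const startNext = (): void => {
    if (active >= maxConcurrent) return; // no free slot
    const start = waiting.shift();
    if (start) start();
  };

  function limit<T>(task: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      waiting.push(() => {
        active++; // claim a slot
        const release = (): void => {
          active--; // free the slot, then let the next waiter in
          startNext();
        };
        Promise.resolve()
          .then(task)
          .then(
            (value) => { release(); resolve(value); },
            (error) => { release(); reject(error); },
          );
      });
      startNext();
    });
  }

  return {
    limit,
    activeCount: () => active,
    pendingCount: () => waiting.length,
  };
}

// Demo: 5 jobs, at most 2 running at any moment.
const limiter = createLimiter(2);
let running = 0;
let peak = 0;

const job = (id: number) => async (): Promise<number> => {
  running++;
  peak = Math.max(peak, running);
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  running--;
  return id;
};

Promise.all([1, 2, 3, 4, 5].map((id) => limiter.limit(job(id)))).then((ids) => {
  console.log(ids, "peak concurrency:", peak); // peak never exceeds 2
});
```

Because the limiter lives outside the graph nodes, every fanned-out branch still starts immediately and then waits in the queue; the graph's fan-out width and the number of in-flight service calls are controlled separately.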

Posted 2026-02-13 14:31 by Zhentiw
