LLMs: How OpenManus Works
Hot on the heels of DeepSeek, the Wuhan-based team behind Monica released Manus, billed as the world's first general-purpose agent! Media outlets, hoping to crown the next DeepSeek, covered it frantically. Then, shortly after Manus launched, five engineers from the MetaGPT team put together a demo version in a claimed 3 hours, named it OpenManus, and racked up 34.4K GitHub stars within just a few days, going viral all over again! Today let's dig into the core principles behind OpenManus.
1. First, why do we need agents at all?
- Today's LLMs can only make decisions; they cannot carry them out, so external tools are still needed to do the actual work.
- Even with various CoT techniques, an LLM cannot reliably complete the whole chain on its own; humans (or a framework) still need to step in for planning, action, and review.
Hence agents were born! Whether it's deep search, deep research, or Manus, the core idea is the same: plan -> action -> review -> action -> review ... looping until a termination condition is triggered. The rough flow looks like this:
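The loop described above can be sketched in a few lines of Python. Everything here is an illustrative stand-in (a real agent would call an LLM to plan and real tools to act), not OpenManus code:

```python
# Minimal sketch of the generic agent loop: plan -> act -> review -> ...
# All names and logic here are illustrative placeholders, not OpenManus APIs.

def run_agent(prompt: str, max_steps: int = 10) -> list[str]:
    plan = [f"step for: {prompt}"]      # plan: an LLM would decompose the prompt here
    history: list[str] = []
    for _ in range(max_steps):          # hard step cap so the loop always terminates
        if not plan:                    # finish condition: nothing left to do
            break
        action = plan.pop(0)            # act: execute the next sub-task (via a tool)
        result = f"done: {action}"
        history.append(result)          # review: results feed into the next round
    return history

print(run_agent("find the latest AI news"))
```

The two exit conditions (empty plan, step cap) mirror the real loop shown later in `agent/base.py`, which stops on `AgentState.FINISHED` or `max_steps`.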
For OpenManus specifically, the core flow is: after the user enters a prompt, a dedicated agent calls the LLM to decompose the task, breaking the complex problem into small, logically coherent sub-problems; it then invokes tools from the tool box to execute them one by one, and finally returns the result to the user.
The core competitiveness of this kind of general-purpose agent comes down to two points:
- Is the plan accurate? This mostly depends on the underlying LLM's ability to perform named-entity recognition and intent recognition on the prompt.
- Is the tool box rich enough? User needs are diverse; are there enough tools to cover them?
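A tool box is essentially a registry mapping tool names to callables, plus a description of each tool that gets handed to the LLM when it chooses. A minimal sketch (class and method names are illustrative, not the actual OpenManus `ToolCollection` API):

```python
# Minimal tool-box sketch: register tools and expose their descriptions to the LLM.
# Names and structure are illustrative, not the actual OpenManus classes.
from typing import Callable

class ToolBox:
    def __init__(self) -> None:
        # name -> (callable, human-readable description)
        self._tools: dict[str, tuple[Callable[[str], str], str]] = {}

    def register(self, name: str, fn: Callable[[str], str], description: str) -> None:
        self._tools[name] = (fn, description)

    def to_params(self) -> list[dict]:
        # What gets sent to the LLM so it can pick a tool by name
        return [{"name": n, "description": d} for n, (_, d) in self._tools.items()]

    def execute(self, name: str, arg: str) -> str:
        fn, _ = self._tools[name]
        return fn(arg)

box = ToolBox()
box.register("google_search", lambda q: f"results for {q}", "Search the web")
print(box.to_params())
print(box.execute("google_search", "AI news"))
```

The key design point: the LLM never runs code itself; it only sees `to_params()` and names a tool, and the framework calls `execute()` on its behalf.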
2. Now let's look at the OpenManus directory layout: four folders named agent, flow, prompt, and tool; the names alone tell you what each module does.
The entry point of the whole program is, of course, the agents. The relationships between the agents are as follows:
(1) One of an agent's core jobs is planning, and here is how OpenManus prompts for it: the prompt states outright that the model is an expert Planning Agent that must produce an executable plan.
```python
PLANNING_SYSTEM_PROMPT = """
You are an expert Planning Agent tasked with solving problems efficiently through structured plans.
Your job is:
1. Analyze requests to understand the task scope
2. Create a clear, actionable plan that makes meaningful progress with the `planning` tool
3. Execute steps using available tools as needed
4. Track progress and adapt plans when necessary
5. Use `finish` to conclude immediately when the task is complete

Available tools will vary by task but may include:
- `planning`: Create, update, and track plans (commands: create, update, mark_step, etc.)
- `finish`: End the task when complete

Break tasks into logical steps with clear outcomes. Avoid excessive detail or sub-steps.
Think about dependencies and verification methods.
Know when to conclude - don't continue thinking once objectives are met.
"""

NEXT_STEP_PROMPT = """
Based on the current state, what's your next action?
Choose the most efficient path forward:
1. Is the plan sufficient, or does it need refinement?
2. Can you execute the next step immediately?
3. Is the task complete? If so, use `finish` right away.

Be concise in your reasoning, then select the appropriate tool or action.
"""
```
With the prompt in place, the next step is to have the LLM generate the plan, in agent/planning.py:
```python
async def create_initial_plan(self, request: str) -> None:
    """Create an initial plan based on the request."""
    logger.info(f"Creating initial plan with ID: {self.active_plan_id}")

    messages = [
        Message.user_message(
            f"Analyze the request and create a plan with ID {self.active_plan_id}: {request}"
        )
    ]
    self.memory.add_messages(messages)
    response = await self.llm.ask_tool(
        messages=messages,
        system_msgs=[Message.system_message(self.system_prompt)],
        tools=self.available_tools.to_params(),
        tool_choice=ToolChoice.AUTO,
    )
    assistant_msg = Message.from_tool_calls(
        content=response.content, tool_calls=response.tool_calls
    )
    self.memory.add_message(assistant_msg)

    plan_created = False
    for tool_call in response.tool_calls:
        if tool_call.function.name == "planning":
            result = await self.execute_tool(tool_call)
            logger.info(
                f"Executed tool {tool_call.function.name} with result: {result}"
            )

            # Add tool response to memory
            tool_msg = Message.tool_message(
                content=result,
                tool_call_id=tool_call.id,
                name=tool_call.function.name,
            )
            self.memory.add_message(tool_msg)
            plan_created = True
            break

    if not plan_created:
        logger.warning("No plan created from initial request")
        tool_msg = Message.assistant_message(
            "Error: Parameter `plan_id` is required for command: create"
        )
        self.memory.add_message(tool_msg)
```
Once the plan is generated, the think/act loop begins! This part is implemented in agent/toolcall.py: think asks the LLM to choose which tool should do the work, and act actually invokes that tool.
```python
async def think(self) -> bool:
    """Process current state and decide next actions using tools"""
    if self.next_step_prompt:
        user_msg = Message.user_message(self.next_step_prompt)
        self.messages += [user_msg]

    # Get response with tool options: let the LLM choose which tool to use
    response = await self.llm.ask_tool(
        messages=self.messages,
        system_msgs=[Message.system_message(self.system_prompt)]
        if self.system_prompt
        else None,
        tools=self.available_tools.to_params(),
        tool_choice=self.tool_choices,
    )
    self.tool_calls = response.tool_calls

    # Log response info
    logger.info(f"✨ {self.name}'s thoughts: {response.content}")
    logger.info(
        f"🛠️ {self.name} selected {len(response.tool_calls) if response.tool_calls else 0} tools to use"
    )
    if response.tool_calls:
        logger.info(
            f"🧰 Tools being prepared: {[call.function.name for call in response.tool_calls]}"
        )

    try:
        # Handle different tool_choices modes
        if self.tool_choices == ToolChoice.NONE:
            if response.tool_calls:
                logger.warning(
                    f"🤔 Hmm, {self.name} tried to use tools when they weren't available!"
                )
            if response.content:
                self.memory.add_message(Message.assistant_message(response.content))
                return True
            return False

        # Create and add assistant message
        assistant_msg = (
            Message.from_tool_calls(
                content=response.content, tool_calls=self.tool_calls
            )
            if self.tool_calls
            else Message.assistant_message(response.content)
        )
        self.memory.add_message(assistant_msg)

        if self.tool_choices == ToolChoice.REQUIRED and not self.tool_calls:
            return True  # Will be handled in act()

        # For 'auto' mode, continue with content if no commands but content exists
        if self.tool_choices == ToolChoice.AUTO and not self.tool_calls:
            return bool(response.content)

        return bool(self.tool_calls)
    except Exception as e:
        logger.error(f"🚨 Oops! The {self.name}'s thinking process hit a snag: {e}")
        self.memory.add_message(
            Message.assistant_message(
                f"Error encountered while processing: {str(e)}"
            )
        )
        return False


async def act(self) -> str:
    """Execute tool calls and handle their results"""
    if not self.tool_calls:
        if self.tool_choices == ToolChoice.REQUIRED:
            raise ValueError(TOOL_CALL_REQUIRED)

        # Return last message content if no tool calls
        return self.messages[-1].content or "No content or commands to execute"

    results = []
    for command in self.tool_calls:
        result = await self.execute_tool(command)  # invoke the chosen tool to do the work

        if self.max_observe:
            result = result[: self.max_observe]

        logger.info(
            f"🎯 Tool '{command.function.name}' completed its mission! Result: {result}"
        )

        # Add tool response to memory
        tool_msg = Message.tool_message(
            content=result, tool_call_id=command.id, name=command.function.name
        )
        self.memory.add_message(tool_msg)
        results.append(result)

    return "\n\n".join(results)
```
think and act run in a loop until a stop condition is met; this logic is implemented in agent/base.py:
```python
async def run(self, request: Optional[str] = None) -> str:
    """Execute the agent's main loop asynchronously.

    Args:
        request: Optional initial user request to process.

    Returns:
        A string summarizing the execution results.

    Raises:
        RuntimeError: If the agent is not in IDLE state at start.
    """
    if self.state != AgentState.IDLE:
        raise RuntimeError(f"Cannot run agent from state: {self.state}")

    if request:
        self.update_memory("user", request)

    results: List[str] = []
    async with self.state_context(AgentState.RUNNING):
        while (
            # Stop conditions: max steps reached, or the agent is already FINISHED
            self.current_step < self.max_steps
            and self.state != AgentState.FINISHED
        ):
            self.current_step += 1
            logger.info(f"Executing step {self.current_step}/{self.max_steps}")
            step_result = await self.step()

            # Check for stuck state
            if self.is_stuck():
                self.handle_stuck_state()

            results.append(f"Step {self.current_step}: {step_result}")

        if self.current_step >= self.max_steps:
            self.current_step = 0
            self.state = AgentState.IDLE
            results.append(f"Terminated: Reached max steps ({self.max_steps})")

    return "\n".join(results) if results else "No steps executed"
```
Since it's a while loop, what actually changes from one iteration to the next? Take an example: find the latest AI news and save it to a file. On the first think, the LLM is called with the user's prompt, the system persona, and the list of available tools, and asked to pick a suitable tool, which it returns in its response. Here the LLM chose google search to look for the news and supplied the search query.
On the second think, the prompt fed to the LLM includes the first round's prompt and response, just like a multi-turn conversation: the accumulated context becomes the latest prompt, and the LLM outputs what the next action should be.
On the third think, the prompt again includes the previous two rounds. This time the LLM replies that no tool needs to be called anymore, so the query is fully finished at this point.
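The three rounds above can be simulated with a toy message list. This is a sketch of how context accumulates; `fake_llm` stands in for `llm.ask_tool`, and real messages carry roles, tool calls, and tool results rather than a single `tool` field:

```python
# Sketch of how context accumulates across think() rounds.
# Each round appends the previous response, so later calls see the full history.
messages = [{"role": "user", "content": "find the latest AI news and save to a file"}]

def fake_llm(history: list[dict]) -> dict:
    # Stand-in for llm.ask_tool: the tool it "picks" depends on how many
    # assistant turns it already sees in the history.
    round_no = sum(1 for m in history if m["role"] == "assistant") + 1
    tools = {1: "google_search", 2: "file_saver", 3: None}  # None = no tool needed
    return {"role": "assistant", "tool": tools[round_no]}

for _ in range(3):
    response = fake_llm(messages)
    messages.append(response)        # the response becomes part of the next prompt
    if response["tool"] is None:     # LLM says no tool needed -> query finished
        break

print([m.get("tool") for m in messages if m["role"] == "assistant"])
```

The termination signal here (the LLM declining to call a tool) is exactly what ends the example query on the third round.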
The whole flow is simple! And users can plug in their own tools, as long as they conform to the MCP protocol.
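A custom tool boils down to implementing the tool interface. Here is a hypothetical sketch; the `BaseTool` shape below is an assumption for illustration, so check the actual OpenManus/MCP interfaces before relying on it:

```python
# Hypothetical custom tool. The base-class shape is an assumption for
# illustration, not the exact OpenManus/MCP interface.
import asyncio

class BaseTool:
    name: str = ""
    description: str = ""  # shown to the LLM when it picks a tool

    async def execute(self, **kwargs) -> str:
        raise NotImplementedError

class WordCountTool(BaseTool):
    name = "word_count"
    description = "Count the words in a piece of text"

    async def execute(self, text: str = "") -> str:
        return f"{len(text.split())} words"

result = asyncio.run(WordCountTool().execute(text="hello agent world"))
print(result)  # -> 3 words
```

Tools are async because the framework awaits them inside act(); the name/description pair is what lets the LLM discover and select the tool.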
A bold prediction to close with: the hot AI sub-fields for 2025-2026:
- Reasoning models
- General-purpose agents
- Multimodality
- Inference optimization
References:
1. https://github.com/mannaandpoem/OpenManus/blob/main/README_zh.md
2. https://www.bilibili.com/video/BV1WzQPYWEGY/ — OpenManus core code walkthrough
3. https://www.bilibili.com/video/BV1AnQNYxEsy — the MCP protocol