LLMs: How OpenManus Works
Hot on the heels of DeepSeek, the Wuhan-based team behind Monica released Manus, billed as the world's first general-purpose agent! Media outlets, hoping to crown the next DeepSeek, covered it frantically. Then, shortly after Manus launched, five engineers from the MetaGPT team put together a demo version in a claimed 3 hours, named it OpenManus, and racked up 34.4K GitHub stars within just a few days, going viral all over again! Today let's dig into the core principles behind OpenManus.
1. First, why do we need agents at all?
- Today's LLMs can only make decisions; they cannot carry them out, so external tools are still needed to do the actual work.
- Even with various CoT techniques, an LLM cannot reliably complete the whole chain on its own; humans (or a framework) still need to step in for planning, action, and review.
Hence agents were born! Whether it's deep search, deep research, or Manus, the core idea is the same: plan -> action -> review -> action -> review ... looping until a termination condition is triggered. The rough flow looks like this:
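The loop described above can be sketched in a few lines of Python. Everything here is an illustrative stand-in (a real agent would call an LLM to plan and real tools to act), not OpenManus code:

```python
# Minimal sketch of the generic agent loop: plan -> act -> review -> ...
# All names and logic here are illustrative placeholders, not OpenManus APIs.

def run_agent(prompt: str, max_steps: int = 10) -> list[str]:
    plan = [f"step for: {prompt}"]      # plan: an LLM would decompose the prompt here
    history: list[str] = []
    for _ in range(max_steps):          # hard step cap so the loop always terminates
        if not plan:                    # finish condition: nothing left to do
            break
        action = plan.pop(0)            # act: execute the next sub-task (via a tool)
        result = f"done: {action}"
        history.append(result)          # review: results feed into the next round
    return history

print(run_agent("find the latest AI news"))
```

The two exit conditions (empty plan, step cap) mirror the real loop shown later in `agent/base.py`, which stops on `AgentState.FINISHED` or `max_steps`.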
For OpenManus specifically, the core flow is: after the user enters a prompt, a dedicated agent calls the LLM to decompose the task, breaking the complex problem into small, logically coherent sub-problems; it then invokes tools from the tool box to execute them one by one, and finally returns the result to the user.
The core competitiveness of this kind of general-purpose agent comes down to two points:
- Is the plan accurate? This mostly depends on the underlying LLM's ability to perform named-entity recognition and intent recognition on the prompt.
- Is the tool box rich enough? User needs are diverse; are there enough tools to cover them?
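A tool box is essentially a registry mapping tool names to callables, plus a description of each tool that gets handed to the LLM when it chooses. A minimal sketch (class and method names are illustrative, not the actual OpenManus `ToolCollection` API):

```python
# Minimal tool-box sketch: register tools and expose their descriptions to the LLM.
# Names and structure are illustrative, not the actual OpenManus classes.
from typing import Callable

class ToolBox:
    def __init__(self) -> None:
        # name -> (callable, human-readable description)
        self._tools: dict[str, tuple[Callable[[str], str], str]] = {}

    def register(self, name: str, fn: Callable[[str], str], description: str) -> None:
        self._tools[name] = (fn, description)

    def to_params(self) -> list[dict]:
        # What gets sent to the LLM so it can pick a tool by name
        return [{"name": n, "description": d} for n, (_, d) in self._tools.items()]

    def execute(self, name: str, arg: str) -> str:
        fn, _ = self._tools[name]
        return fn(arg)

box = ToolBox()
box.register("google_search", lambda q: f"results for {q}", "Search the web")
print(box.to_params())
print(box.execute("google_search", "AI news"))
```

The key design point: the LLM never runs code itself; it only sees `to_params()` and names a tool, and the framework calls `execute()` on its behalf.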
2. Now let's look at the OpenManus directory layout: four folders named agent, flow, prompt, and tool; the names alone tell you what each module does.
The entry point of the whole program is, of course, the agents. The relationships between the agents are as follows:
(1) One of an agent's core jobs is planning, and here is how OpenManus prompts for it: the prompt states outright that the model is an expert Planning Agent that must produce an executable plan.
```python
PLANNING_SYSTEM_PROMPT = """
You are an expert Planning Agent tasked with solving problems efficiently through structured plans.
Your job is:
1. Analyze requests to understand the task scope
2. Create a clear, actionable plan that makes meaningful progress with the `planning` tool
3. Execute steps using available tools as needed
4. Track progress and adapt plans when necessary
5. Use `finish` to conclude immediately when the task is complete

Available tools will vary by task but may include:
- `planning`: Create, update, and track plans (commands: create, update, mark_step, etc.)
- `finish`: End the task when complete

Break tasks into logical steps with clear outcomes. Avoid excessive detail or sub-steps.
Think about dependencies and verification methods.
Know when to conclude - don't continue thinking once objectives are met.
"""

NEXT_STEP_PROMPT = """
Based on the current state, what's your next action?
Choose the most efficient path forward:
1. Is the plan sufficient, or does it need refinement?
2. Can you execute the next step immediately?
3. Is the task complete? If so, use `finish` right away.

Be concise in your reasoning, then select the appropriate tool or action.
"""
```
With the prompt in place, the next step is to have the LLM generate the plan, in agent/planning.py:
```python
async def create_initial_plan(self, request: str) -> None:
    """Create an initial plan based on the request."""
    logger.info(f"Creating initial plan with ID: {self.active_plan_id}")

    messages = [
        Message.user_message(
            f"Analyze the request and create a plan with ID {self.active_plan_id}: {request}"
        )
    ]
    self.memory.add_messages(messages)
    response = await self.llm.ask_tool(
        messages=messages,
        system_msgs=[Message.system_message(self.system_prompt)],
        tools=self.available_tools.to_params(),
        tool_choice=ToolChoice.AUTO,
    )
    assistant_msg = Message.from_tool_calls(
        content=response.content, tool_calls=response.tool_calls
    )
    self.memory.add_message(assistant_msg)

    plan_created = False
    for tool_call in response.tool_calls:
        if tool_call.function.name == "planning":
            result = await self.execute_tool(tool_call)
            logger.info(
                f"Executed tool {tool_call.function.name} with result: {result}"
            )

            # Add tool response to memory
            tool_msg = Message.tool_message(
                content=result,
                tool_call_id=tool_call.id,
                name=tool_call.function.name,
            )
            self.memory.add_message(tool_msg)
            plan_created = True
            break

    if not plan_created:
        logger.warning("No plan created from initial request")
        tool_msg = Message.assistant_message(
            "Error: Parameter `plan_id` is required for command: create"
        )
        self.memory.add_message(tool_msg)
```
Once the plan is generated, the think/act loop begins! This part is implemented in agent/toolcall.py: think asks the LLM to choose which tool should do the work, and act actually invokes that tool.
```python
async def think(self) -> bool:
    """Process current state and decide next actions using tools"""
    if self.next_step_prompt:
        user_msg = Message.user_message(self.next_step_prompt)
        self.messages += [user_msg]

    # Get response with tool options: let the LLM choose which tool to use
    response = await self.llm.ask_tool(
        messages=self.messages,
        system_msgs=[Message.system_message(self.system_prompt)]
        if self.system_prompt
        else None,
        tools=self.available_tools.to_params(),
        tool_choice=self.tool_choices,
    )
    self.tool_calls = response.tool_calls

    # Log response info
    logger.info(f"✨ {self.name}'s thoughts: {response.content}")
    logger.info(
        f"🛠️ {self.name} selected {len(response.tool_calls) if response.tool_calls else 0} tools to use"
    )
    if response.tool_calls:
        logger.info(
            f"🧰 Tools being prepared: {[call.function.name for call in response.tool_calls]}"
        )

    try:
        # Handle different tool_choices modes
        if self.tool_choices == ToolChoice.NONE:
            if response.tool_calls:
                logger.warning(
                    f"🤔 Hmm, {self.name} tried to use tools when they weren't available!"
                )
            if response.content:
                self.memory.add_message(Message.assistant_message(response.content))
                return True
            return False

        # Create and add assistant message
        assistant_msg = (
            Message.from_tool_calls(
                content=response.content, tool_calls=self.tool_calls
            )
            if self.tool_calls
            else Message.assistant_message(response.content)
        )
        self.memory.add_message(assistant_msg)

        if self.tool_choices == ToolChoice.REQUIRED and not self.tool_calls:
            return True  # Will be handled in act()

        # For 'auto' mode, continue with content if no commands but content exists
        if self.tool_choices == ToolChoice.AUTO and not self.tool_calls:
            return bool(response.content)

        return bool(self.tool_calls)
    except Exception as e:
        logger.error(f"🚨 Oops! The {self.name}'s thinking process hit a snag: {e}")
        self.memory.add_message(
            Message.assistant_message(
                f"Error encountered while processing: {str(e)}"
            )
        )
        return False


async def act(self) -> str:
    """Execute tool calls and handle their results"""
    if not self.tool_calls:
        if self.tool_choices == ToolChoice.REQUIRED:
            raise ValueError(TOOL_CALL_REQUIRED)

        # Return last message content if no tool calls
        return self.messages[-1].content or "No content or commands to execute"

    results = []
    for command in self.tool_calls:
        result = await self.execute_tool(command)  # invoke the chosen tool to do the work

        if self.max_observe:
            result = result[: self.max_observe]

        logger.info(
            f"🎯 Tool '{command.function.name}' completed its mission! Result: {result}"
        )

        # Add tool response to memory
        tool_msg = Message.tool_message(
            content=result, tool_call_id=command.id, name=command.function.name
        )
        self.memory.add_message(tool_msg)
        results.append(result)

    return "\n\n".join(results)
```
think and act run in a loop until a stop condition is met; this logic is implemented in agent/base.py:
```python
async def run(self, request: Optional[str] = None) -> str:
    """Execute the agent's main loop asynchronously.

    Args:
        request: Optional initial user request to process.

    Returns:
        A string summarizing the execution results.

    Raises:
        RuntimeError: If the agent is not in IDLE state at start.
    """
    if self.state != AgentState.IDLE:
        raise RuntimeError(f"Cannot run agent from state: {self.state}")

    if request:
        self.update_memory("user", request)

    results: List[str] = []
    async with self.state_context(AgentState.RUNNING):
        while (
            # Stop conditions: max steps reached, or the agent is already FINISHED
            self.current_step < self.max_steps
            and self.state != AgentState.FINISHED
        ):
            self.current_step += 1
            logger.info(f"Executing step {self.current_step}/{self.max_steps}")
            step_result = await self.step()

            # Check for stuck state
            if self.is_stuck():
                self.handle_stuck_state()

            results.append(f"Step {self.current_step}: {step_result}")

        if self.current_step >= self.max_steps:
            self.current_step = 0
            self.state = AgentState.IDLE
            results.append(f"Terminated: Reached max steps ({self.max_steps})")

    return "\n".join(results) if results else "No steps executed"
```
Since it's a while loop, what actually changes from one iteration to the next? Take an example: find the latest AI news and save it to a file. On the first think, the LLM is called with the user's prompt, the system persona, and the list of available tools, and asked to pick a suitable tool, which it returns in its response. Here the LLM chose google search to look for the news and supplied the search query.
On the second think, the prompt fed to the LLM includes the first round's prompt and response, just like a multi-turn conversation: the accumulated context becomes the latest prompt, and the LLM outputs what the next action should be.
On the third think, the prompt again includes the previous two rounds. This time the LLM replies that no tool needs to be called anymore, so the query is fully finished at this point.
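The three rounds above can be simulated with a toy message list. This is a sketch of how context accumulates; `fake_llm` stands in for `llm.ask_tool`, and real messages carry roles, tool calls, and tool results rather than a single `tool` field:

```python
# Sketch of how context accumulates across think() rounds.
# Each round appends the previous response, so later calls see the full history.
messages = [{"role": "user", "content": "find the latest AI news and save to a file"}]

def fake_llm(history: list[dict]) -> dict:
    # Stand-in for llm.ask_tool: the tool it "picks" depends on how many
    # assistant turns it already sees in the history.
    round_no = sum(1 for m in history if m["role"] == "assistant") + 1
    tools = {1: "google_search", 2: "file_saver", 3: None}  # None = no tool needed
    return {"role": "assistant", "tool": tools[round_no]}

for _ in range(3):
    response = fake_llm(messages)
    messages.append(response)        # the response becomes part of the next prompt
    if response["tool"] is None:     # LLM says no tool needed -> query finished
        break

print([m.get("tool") for m in messages if m["role"] == "assistant"])
```

The termination signal here (the LLM declining to call a tool) is exactly what ends the example query on the third round.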
The whole flow is simple! And users can plug in their own tools, as long as they conform to the MCP protocol.
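A custom tool boils down to implementing the tool interface. Here is a hypothetical sketch; the `BaseTool` shape below is an assumption for illustration, so check the actual OpenManus/MCP interfaces before relying on it:

```python
# Hypothetical custom tool. The base-class shape is an assumption for
# illustration, not the exact OpenManus/MCP interface.
import asyncio

class BaseTool:
    name: str = ""
    description: str = ""  # shown to the LLM when it picks a tool

    async def execute(self, **kwargs) -> str:
        raise NotImplementedError

class WordCountTool(BaseTool):
    name = "word_count"
    description = "Count the words in a piece of text"

    async def execute(self, text: str = "") -> str:
        return f"{len(text.split())} words"

result = asyncio.run(WordCountTool().execute(text="hello agent world"))
print(result)  # -> 3 words
```

Tools are async because the framework awaits them inside act(); the name/description pair is what lets the LLM discover and select the tool.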
A bold prediction to close with: the hot AI sub-fields for 2025-2026:
- Reasoning models
- General-purpose agents
- Multimodality
- Inference optimization
References:
1. https://github.com/mannaandpoem/OpenManus/blob/main/README_zh.md
2. https://www.bilibili.com/video/BV1WzQPYWEGY/ — OpenManus core code walkthrough
3. https://www.bilibili.com/video/BV1AnQNYxEsy — the MCP protocol