LLM · Agent | Playing StarCraft with an LLM's general-purpose decision-making ability





01 main contribution

  • First, the paper provides an implementation (TextStarCraft II) that turns the StarCraft game interface into "text in, text out", so that both observations and actions can be represented as text.

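  • Then, an LLM is used to generate actions.

    • Here the LLM generates high-level actions, such as constructing buildings, training new units, and researching upgrades;
    • low-level actions are executed by a rule-based method using predefined Python scripts, similar to the approach used by OpenAI Five (an earlier, somewhat mysterious work).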
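  • How the LLM generates an action:

    • Chain of Summarization (CoS) produces the observation: each Single-Frame Summarization is obtained with a rule-based method, and several Single-Frame Summarizations are then combined, again with a rule-based method, into a Multi-Frame Summarization, which yields the L1 results describing the current situation.
    • The L1 results are then given to the LLM, which is prompted to produce the action.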
  • Fine-tuning the LLM: data from games played by the LLM, such as winning games or the top 25% of games, is used to fine-tune Qwen1.8B.
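
The observation-to-action flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the TextStarCraft II code: the frame fields, `summarize_frame`, `merge_summaries`, `HANDLERS`, and the stand-in `fake_llm` are all hypothetical names, and the rule-based logic is deliberately simplistic.

```python
from typing import Callable, Dict, List

# Hypothetical raw frame: a flat dict of game statistics (assumed layout).
Frame = Dict[str, int]


def summarize_frame(frame: Frame) -> str:
    """Rule-based single-frame summarization: turn numbers into text."""
    return (
        f"time={frame['time']}s minerals={frame['minerals']} "
        f"workers={frame['workers']} "
        f"supply={frame['supply_used']}/{frame['supply_cap']}"
    )


def merge_summaries(summaries: List[str]) -> str:
    """Rule-based multi-frame summarization: list recent frames in order."""
    return "\n".join(f"round {i + 1}: {s}" for i, s in enumerate(summaries))


# Hypothetical scripted low-level handlers, one per high-level action.
def build_pylon() -> str:
    return "scripted: worker places a Pylon near the Nexus"


def train_probe() -> str:
    return "scripted: Nexus queues one Probe"


HANDLERS: Dict[str, Callable[[], str]] = {
    "BUILD PYLON": build_pylon,
    "TRAIN PROBE": train_probe,
}


def step(frames: List[Frame], llm: Callable[[str], str]) -> str:
    """One decision step: text observation -> LLM -> scripted execution."""
    l1_results = merge_summaries([summarize_frame(f) for f in frames])
    action = llm(l1_results).strip().upper()
    handler = HANDLERS.get(action)
    if handler is None:
        # The LLM may name an action that no script implements.
        return f"ignored unknown action: {action}"
    return handler()


def fake_llm(observation: str) -> str:
    """Stand-in for a real LLM call: a fixed rule so the sketch runs offline."""
    last_line = observation.splitlines()[-1]
    used, cap = (int(x) for x in last_line.split("supply=")[1].split("/"))
    return "build pylon" if cap - used <= 2 else "train probe"
```

In this sketch the LLM only ever sees and emits text; everything that touches the game is hidden behind the handler table, which mirrors the high-level / low-level split in the notes above.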

02 prompt from the original paper

L1 results:

[Figure: example of L1 results, from the original paper]

Once the L1 results are obtained, the prompt given to the agent is as follows:

You are an AI trained in analyzing and summarizing StarCraft II games. You understand the nuances and strategies of the protoss (or zerg) race. Based on the summaries of multiple rounds in a game, we want you to analyze the game progression in a structured way. Your analysis should include the following aspects:

  1. Game Overview: Provide a brief overview of the current situation based on all the rounds.
  2. Current Game Stage: Determine the stage of the game based on the information of all rounds. Is it the early game, mid-game, or late game?
  3. Our Situation: Describe our current status in terms of:
    1. Units and Buildings: Analyze the state of our units and buildings.
    2. Economy: Evaluate our economic condition, including resource collection and usage.
    3. Technology: Describe the status of our technological research and what technologies we have unlocked so far. Analyze our technology tree, indicating the available and potential upgrades or units.
  4. Our Strategy: Infer our potential strategy based on our current situation and the information of all rounds.
  5. Enemy's Strategy: Infer the enemy’s potential strategy, based on the available information.
  6. Key Information: Highlight the most important aspects from all rounds that have significantly influenced the game.
  7. race specific prompt
    1. Zerg: For Zerg, pay attention to whether there are enough larvae. If not, we should consider adding the INJECTLARVA command to the queue.
    2. Protoss: For Protoss, keep an eye on Nexus’s energy to Chrono Boost important structures.
  8. Based on the game situation and strategies used by both sides, provide specific suggestions for the following areas:
    1. Our Strategy: Propose adjustments to our current strategy to counter the enemy’s moves and capitalize on our strengths.
    2. Units and Buildings: Offer ways to enhance our unit composition and improve our building layout, suited to the current stage of the game.
    3. Economy: Recommend better practices for resource gathering and usage, in line with our strategic needs.
    4. Technology: Suggest focused research paths to gain technological advantages, considering our current research status and technology tree.
  9. Lastly, consider the current situation and the suggestions provided, make {K} actionable and specific decisions from the action dictionary protoss_action_dict . This dictionary comprises four categories of actions: unit production, building construction, technology research, and other actions. Remember to align these decisions with the current stage of the game, and avoid proposing actions that are not currently feasible.
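
Step 9 of the prompt asks for {K} decisions drawn from an action dictionary with four categories. A hypothetical sketch of how such a request could be filled in and how the reply could be checked is shown below. The action names, the dictionary layout, and the numbered-line reply format are assumptions for illustration only; they are not the contents of the real `protoss_action_dict`.

```python
import re
from typing import Dict, List

# Assumed layout: four categories, each listing allowed action names.
# The entries are illustrative, not the real protoss_action_dict.
ACTION_DICT: Dict[str, List[str]] = {
    "unit production": ["TRAIN PROBE", "TRAIN ZEALOT"],
    "building construction": ["BUILD PYLON", "BUILD GATEWAY"],
    "technology research": ["RESEARCH WARPGATE"],
    "other actions": ["CHRONOBOOST NEXUS"],
}

# Shortened version of the final instruction in the prompt.
DECISION_TEMPLATE = (
    "Lastly, consider the current situation and the suggestions provided, "
    "make {K} actionable and specific decisions from the action dictionary."
)


def build_decision_prompt(k: int) -> str:
    """Fill the {K} placeholder with the number of decisions requested."""
    return DECISION_TEMPLATE.format(K=k)


def parse_decisions(reply: str, k: int) -> List[str]:
    """Keep at most k numbered decisions that exist in the action dictionary."""
    allowed = {a for actions in ACTION_DICT.values() for a in actions}
    decisions: List[str] = []
    for line in reply.splitlines():
        # Accept lines such as "1. ...", "2: ..." or "3) ...".
        match = re.match(r"\s*\d+[.:)]\s*(.+)", line)
        if not match:
            continue
        action = match.group(1).strip().upper()
        if action in allowed:
            decisions.append(action)
    return decisions[:k]
```

Filtering against the dictionary is one simple way to honor the prompt's request to avoid infeasible actions; it only checks that an action exists, not that the current game state allows it.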

03 misc

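  • This work currently supports only Protoss (LLM) versus Zerg (built-in AI), and offers no real-time guarantee.
  • The code appears to contain many hard-coded operations and is hard to follow.
  • Hearsay (and only hearsay): the code includes many StarCraft-specific optimizations and prompts, and the actual experimental results are mediocre.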




posted @ 2025-03-10 16:46  MoonOut