Multi-turn conversations with the OpenAI API

What is burning through my tokens?

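While writing a simple test program, I found the official OpenAI documentation painful to read. I started out with the Chat Completions API, and its multi-turn conversation handling is very verbose: the messages go into a list, and a multi-turn conversation just means appending to that list over and over.
This has a big drawback: every request carries the full prompt plus the entire conversation history, which is a poor design in terms of both network traffic and token count.
On top of that, my own version never appended the AI's replies, so the conversation did not include what the AI itself had said. If I referred to something it had said earlier, it could not answer.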
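To get a feel for how quickly resending the history adds up, here is a rough back-of-the-envelope sketch. The function name and the per-turn token counts are made-up assumptions for illustration, not measurements from my program; it simply sums the input tokens sent over a conversation in which every request carries the system prompt and all earlier turns.

```python
def cumulative_input_tokens(system_tokens, user_tokens, reply_tokens, turns):
    """Total input tokens sent over `turns` requests when each request
    resends the system prompt and the whole history so far.

    All token counts are hypothetical averages per message.
    """
    total = 0
    history = system_tokens  # tokens that will be sent with the next request
    for _ in range(turns):
        history += user_tokens   # the new user message joins the history
        total += history         # the whole history is sent as input
        history += reply_tokens  # the reply is part of the next request's input
    return total


# Compare a short and a long conversation with the same message sizes.
short_total = cumulative_input_tokens(100, 20, 50, turns=3)
long_total = cumulative_input_tokens(100, 20, 50, turns=30)
print(short_total, long_total)
```

The total grows roughly with the square of the number of turns, which is why a long chat gets expensive even when each individual message is short.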

Is this a multi-turn conversation?

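def chat_with_Haley():
    # global_prompt, get_info() and chat() are helpers from my test program, not shown here.
    messages = [{"role": "system", "content": global_prompt + get_info()}]

    while True:
        user_input = input("You: ")
        if user_input.lower() == "#exit":
            print("Conversation ended.")
            break
        messages.append({"role": "user", "content": user_input})
        reply = chat(messages)
        # Note: the reply is never appended to messages, so the model
        # never sees its own earlier answers.
        print(f'Haley:{reply}')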

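As I said above, this kind of multi-turn conversation is really just a series of single-turn requests with the history attached, which does not feel like a multi-turn feature at all.
What I expected was something like the replies in the web UI. So I went through the documentation, searched for the keyword multi, and found the section on multi-turn conversations. The implementation there turned out to be the same as mine: a history-based multi-turn conversation built on single-turn calls.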
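For reference, a history-based loop that also records the assistant's replies might look like the sketch below. The function name and its parameters are my own assumptions; `chat` stands in for whatever helper sends the message list to the API and returns the reply text, so the sketch can be run with a fake callable and no network access.

```python
def run_history_chat(chat, user_inputs, system_prompt):
    """Run a history-based conversation and return the full message list.

    `chat` is any callable that takes the message list and returns the
    assistant's reply as a string (a stand-in for the chat() helper above).
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_input in user_inputs:
        messages.append({"role": "user", "content": user_input})
        reply = chat(messages)
        # Record the reply so the next request includes what the model said.
        messages.append({"role": "assistant", "content": reply})
    return messages
```

With the assistant messages in the list, the model can refer back to its own earlier answers, at the cost of sending an even longer history with each request.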

Real multi-turn conversations

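I kept reading the documentation, looking for best practices for real multi-turn conversations, and found an implementation that uses the Responses API: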

from openai import OpenAI
client = OpenAI()

response = client.responses.create(
    model="gpt-4o-mini",
    input="tell me a joke",
)
print(response.output_text)

second_response = client.responses.create(
    model="gpt-4o-mini",
    previous_response_id=response.id,
    input=[{"role": "user", "content": "explain why this is funny."}],
)
print(second_response.output_text)

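It continues the conversation through the previous_response_id field, and each returned response carries its own id that I can pass to the next request. Calls to the Responses API are also recorded in the OpenAI dashboard, where I can see the log for each turn. This is the kind of code that matches my expectation of a multi-turn conversation. One caveat: according to the OpenAI documentation, the earlier turns in the chain are still billed as input tokens, so this shrinks the request and removes the bookkeeping rather than reducing the token count.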
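Extending the two-call example into a loop could look roughly like this. It is only a sketch: the function name `chat_turns` is my own, and `client` is assumed to be an `OpenAI()` instance (or anything with a compatible `responses.create` method). The only state kept between turns is the id of the last response.

```python
def chat_turns(client, user_inputs, model="gpt-4o-mini"):
    """Send each user input as a new turn, chaining turns by response id.

    Returns the list of reply texts, one per input.
    """
    previous_id = None
    replies = []
    for text in user_inputs:
        kwargs = {"model": model, "input": text}
        # Only the first turn has no predecessor to point at.
        if previous_id is not None:
            kwargs["previous_response_id"] = previous_id
        response = client.responses.create(**kwargs)
        previous_id = response.id  # remember where to continue from
        replies.append(response.output_text)
    return replies


# Usage (needs the openai package and an API key in the environment):
# from openai import OpenAI
# print(chat_turns(OpenAI(), ["tell me a joke", "explain why this is funny."]))
```

Compared with the history-based loop, no message list is kept on the client side; the conversation state is looked up by the server from the id.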

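There is one more small thing: the Chat Completions API needs a proxy to ~~get around the firewall~~ speed things up, while the Responses API apparently ~~is not blocked~~ needs no speed-up. Also, some proxies do not offer the Responses API at all, or do not offer it for free.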

Migrating from the Chat Completions API to the Responses API is not trivial, because the two are called somewhat differently. A comparison follows.

Example comparison

Chat Completions API


from openai import OpenAI
client = OpenAI()
completion = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {
            "role": "user",
            "content": "Write a one-sentence bedtime story about a unicorn."
        }
    ]
)
print(completion.choices[0].message.content)

Responses API


from openai import OpenAI
client = OpenAI()
response = client.responses.create(
    model="gpt-4.1",
    input=[
        {
            "role": "user",
            "content": "Write a one-sentence bedtime story about a unicorn."
        }
    ]
)
print(response.output_text)

Comparing the returned JSON

Chat Completions API

[
  {
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Under the soft glow of the moon, Luna the unicorn danced through fields of twinkling stardust, leaving trails of dreams for every child asleep.",
      "refusal": null
    },
    "logprobs": null,
    "finish_reason": "stop"
  }
]

Responses API

[
  {
    "id": "msg_67b73f697ba4819183a15cc17d011509",
    "type": "message",
    "role": "assistant",
    "content": [
      {
        "type": "output_text",
        "text": "Under the soft glow of the moon, Luna the unicorn danced through fields of twinkling stardust, leaving trails of dreams for every child asleep.",
        "annotations": []
      }
    ]
  }
]
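To make the difference concrete, here is a small sketch that pulls the reply text out of each of the two JSON shapes shown above, working on plain dicts and lists as they would look after parsing the JSON. The helper names are my own, and the sketch assumes the lists have exactly the structure shown; in the SDK itself, `completion.choices[0].message.content` and `response.output_text` do this for you.

```python
def text_from_chat_choices(choices):
    """Chat Completions shape: the text is a plain string under message.content."""
    return choices[0]["message"]["content"]


def text_from_response_output(output):
    """Responses shape: each message item holds a list of typed content parts."""
    parts = []
    for item in output:
        if item.get("type") != "message":
            continue  # skip items that are not assistant messages
        for part in item["content"]:
            if part.get("type") == "output_text":
                parts.append(part["text"])
    return "".join(parts)
```

The extra nesting on the Responses side is what makes room for things other than text in the same output list.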

As you can see, although both are LLM interfaces, the details of the two differ considerably, and beyond text generation they differ a lot as well:

Feature support comparison

Capabilities	Chat Completions API	Responses API
Text generation	✔️	✔️
Audio	✔️	Coming soon
Vision	✔️	✔️
Structured Outputs	✔️	✔️
Function calling	✔️	✔️
Web search	❌	✔️
File search	❌	✔️
Computer use	❌	✔️
Code interpreter	❌	Coming soon

Parts of the above are excerpted from the official OpenAI documentation.

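Clearly, OpenAI favors the Responses API.
The two also differ a lot in parameters and calling conventions. If, like me, you have been using the Chat Completions API, my advice is to migrate to the Responses API as soon as you can.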

posted @ 2025-05-28 20:04 mcrock
