小智AI 接入 Coze 智能体：流式输出+延时实测

picture.image

猴哥的第 183 期分享，欢迎追看

写在前面：

大家好，我是猴哥，我会在这里围绕 AI，分享工具、技术、思路、感悟。。。

所有文章纯手打，无 AI 添加剂，更多是个人思考和实操总结，欢迎交流。

若对您有帮助，欢迎一键三连，帮我增加曝光，触达更多志趣相投的朋友，谢谢支持~

前面分享了，小智 AI 如何接入 LLM：

低延迟小智AI服务端搭建-LLM篇

服务端统一采用 OpenAI 格式调用，虽然这是目前的最优解，能兼容不同厂商的大模型，并通过 MCP 接入各种外部工具。

不过，缺陷也很明显：如果 LLM 的指令遵循不 OK，就会经常一本正经地胡说八道。

picture.image

实测了以上 LLM，只有 deepseek-v3 和 glm-4-flash 能够 90% 以上成功调用 MCP，无论怎么刁难它。

没错，就这个免费的 glm-4-flash，在工具调用方面，相当出色！且延时低，非常适合拿来测试。

以上，所有 LLM，陪聊还行。

但凡要发挥点生产力，最好得是接入 Agent (智能体)！

而提到 Agent，自然绕不开Coze/扣子。

因此，本文，来聊：

如何将Coze/扣子通过 API 形式，接入小智 AI。

Coze/扣子体验

https://www.coze.cn/

Coze/扣子 是字节出品的智能体搭建工具，相信关注笔者的朋友都不陌生。

picture.image

Coze/扣子 极大降低了普通人使用AI的门槛。

如下，搭建一个陪伴型女友智能体，只需简单几步操作即可：

picture.image

测试没问题后，点击右上角发布，注意勾选下方 API：

picture.image

下面，我们重点聊聊如何调用 Coze 的 API。

Coze/扣子 API 调用

API 文档：https://www.coze.cn/open/docs/developer\_guides

picture.image

个人免费版只有 100 次免费额度，勉强够测试用，因此要想使用 Coze 的 API，需付费到个人进阶版。

Coze 为了配合前端降低大家使用AI的门槛，后端做了大量工作，为此专门设计了一套独立的 API。

这套 API 不兼容 OpenAI 格式，因此要接入小智 AI 的服务端，需进行一番适配。

2.1 准备工作

step 1 : 申请访问令牌

前往：https://www.coze.cn/open/oauth/pats

添加新令牌，注意有效期，现在最多只有一个月:

picture.image

step 2 : 获取 bot\_id

智能体的搭建页面，URL 中 bot 参数后的数字就是bot\_id。

例如 https://www.coze.cn/space/123/bot/73428668，`bot\_id` 为73428668。

2.2 创建会话

会话（Conversation）是 Coze 中一个特有的概念：

picture.image

聊天必须在会话中进行，所以须先创建会话。

python 示例代码如下：

  
headers = {  
    'Authorization': 'Bearer pat\_xxx',  
}  
def create\_conversation():  
    url = 'https://api.coze.cn/v1/conversation/create'      
    data = {  
        'bot\_id': '752'  
    }  
    response = requests.post(url, headers=headers, data=data)  
    if response.status\_code == 200:  
        print(response.json())  
        conversation\_id = response.json()['data']['id']  
        print(f'Conversation created with id: {conversation\_id}')

2.3 对话-非流式调用

非流式接口（stream: false），是异步的，它不会立即返回 LLM 的回复，而是：

先返回一个状态，再查询是否完成，最后获取结果。

因此，一次对话，需要三次请求：

step 1 : 发起会话请求：只会返回一个 chatId

  
def send\_nonstream():  
    # 不包含模型处理结果，只返回状态  
    url = 'https://api.coze.cn/v3/chat'  
    params = {'conversation\_id': '7521349315088973833'}  
    data = {  
        'bot\_id': '752',  
        'user\_id': '123',  
        'stream': False,  
        'additional\_messages': [{'role': 'user', 'content': '你好', 'content\_type': 'text'}],  
    }  
    response = requests.post(url, headers=headers, params=params, json=data)  
    if response.status\_code == 200:  
        print(response.json())  
        data = response.json()['data']  
        chat\_id = data['id']  
        print(f'Message sent with id: {chat\_id}')

step 2 : 查询是否完成：等待 status === 'completed'

  
def chat\_retrieve():  
    # 查看对话详情  
    url = 'https://api.coze.cn/v3/chat/retrieve'  
    params = {'conversation\_id': '7521349315088973833', 'chat\_id': '7521354516424458286'}  
    response = requests.get(url, headers=headers, params=params)  
    if response.status\_code == 200:  
        print(response.json())  
        data = response.json()['data']  
        status = data['status'] # in\_progress, completed, cancelled, failed  
        tokenCounts = data['usage']  
        print(f'Chat status: {status}, tokenCounts: {tokenCounts}')

step 3 : 获取结果：

  
def chat\_message():  
    # 查看对话消息详情  
    url = 'https://api.coze.cn/v3/chat/message/list'  
    params = {'conversation\_id': '7521349315088973833', 'chat\_id': '7521354516424458286'}  
    response = requests.get(url, headers=headers, params=params)  
    if response.status\_code == 200:  
        data = response.json()['data']  
        for message in data:  
            msgType = message['type'] # verbose answer follow\_up  
            content = message['content']  
            print(f'Message type: {msgType}, content: {content}')

注：实际业务场景中，可设置定时轮询，需要异步 I/O 和事件循环。

如下：

  
2025-06-30T03:06:04.472Z Message sent with id: 7521573344756662322  
2025-06-30T03:06:14.669Z Chat completed with status: completed  
2025-06-30T03:06:14.894Z llm: 亲爱的，我查到啦，北京海淀现在是阴天，温度27度，体感温度29度，湿度76%，西北风1级呢。不过比起这天气，我更在意你有没有好好照顾自己呀。你今天有没有遇到什么开心的事儿呢？

2.4 对话-流式调用

流式接口返回的是 Server-Sent Events (SSE) 格式。

流式响应的事件类型有：

chat：对话状态事件（如 created, in_progress, completed, ...）
message：消息内容事件（包含模型回复）
audio：音频事件（如果有）
还有可能有 error、end 等

每个事件的格式大致如下：

  
event: message  
data: {...}  
# or  
event: chat  
data: {...}

你需要逐行解析，并判断 event: 和 data:.

示例代码如下：

  
def send\_stream():  
    url = 'https://api.coze.cn/v3/chat'  
    params = {'conversation\_id': '7521349315088973833'}  
    data = {  
        'bot\_id': '752',  
        'user\_id': '123',  
        'stream': True,  
        'additional\_messages': [{'role': 'user', 'content': '你好', 'content\_type': 'text'}],  
    }  
    response = requests.post(url, headers=headers, params=params, json=data, stream=True)  
    if response.status\_code == 200:  
        for line in response.iter\_lines():  
            if line:  
                decoded\_line = line.decode('utf-8')  
                if decoded\_line.startswith('data:'):  
                    try:  
                        json\_data = json.loads(decoded\_line[5:].strip())  
                        msgType = json\_data.get('type', '') # verbose answer follow\_up  
                        status = json\_data.get('status', '') # created in\_progress, completed, cancelled, failed  
                        if status:  
                            tokenCounts = json\_data['usage']  
                            print(f'Chat status: {status}, tokenCounts: {tokenCounts}')  
                        if msgType:  
                            created\_at = json\_data.get('created\_at', '')  
                            content = json\_data['content']  
                            print(f'Message type: {msgType}, content: {content} at {created\_at}')  
                    except Exception as e:  
                        print(f'Invalid JSON data: {decoded\_line}')