Bringing LLMs into an Application in Practice: One-Shot Output and Streaming Output (Chain of Thought)

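When building on large models, some scenarios call for a single complete result produced according to the prompt, while others call for the model's entire reasoning process followed by the final result.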

The articles I found online all disagreed on how to do this, so I tried it myself. The approach that worked is recorded here.

One-shot output:

import os

from openai import OpenAI

def generate_huoshan(prompt):
    client = OpenAI(
        # Read your Ark API key from an environment variable
        # (ARK_API_KEY is an assumed variable name; the key in the original post was redacted)
        api_key=os.environ.get("ARK_API_KEY"),
        base_url="https://ark.cn-beijing.volces.com/api/v3",
        # Deep-reasoning models can take a long time, so set a long timeout; 30 minutes is recommended
        timeout=1800,
    )

    response = client.chat.completions.create(
        model="deepseek-r1-250120",
        messages=[
            {"role": "system", "content": "You are a professional market research assistant who needs to accurately obtain retail price information for specified electronic products in a specific market"},
            {"role": "user", "content": prompt},
        ],
        max_tokens=1024,
        temperature=0.6,
        stream=False,
    )
    # With stream=False the complete answer arrives in a single response
    answer = response.choices[0].message.content
    return answer.strip()
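
With a deep-reasoning model, the one-shot response may carry the chain of thought alongside the final answer. The following is a minimal sketch, assuming the message object exposes an optional `reasoning_content` attribute in the same way the streaming delta does; `split_message` is a hypothetical helper, not part of the SDK, and a stand-in object replaces the real response so the sketch runs offline.

```python
from types import SimpleNamespace


def split_message(message):
    """Return (reasoning, answer) from a chat completion message.

    Assumption: `reasoning_content` is an optional, provider-specific field,
    so it is read with getattr and defaults to an empty string.
    """
    reasoning = getattr(message, "reasoning_content", None) or ""
    answer = message.content or ""
    return reasoning.strip(), answer.strip()


# Stand-in for response.choices[0].message.
fake_message = SimpleNamespace(reasoning_content=" compare prices ", content=" 42 USD ")
reasoning, answer = split_message(fake_message)
```

In real use, `split_message(response.choices[0].message)` would take the place of the stand-in.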


Streaming output:

import os

from openai import OpenAI

def generate_huoshan(prompt):
    client = OpenAI(
        # ARK_API_KEY is an assumed variable name; the key in the original post was redacted
        api_key=os.environ.get("ARK_API_KEY"),
        base_url="https://ark.cn-beijing.volces.com/api/v3",
        # Deep-reasoning models can take a long time, so set a long timeout; 30 minutes is recommended
        timeout=1800,
    )

    response = client.chat.completions.create(
        model="deepseek-r1-250120",
        messages=[
            {"role": "system", "content": "You are a professional market research assistant who needs to accurately obtain retail price information for specified electronic products in a specific market"},
            {"role": "user", "content": prompt},
        ],
        max_tokens=1024,
        temperature=0.6,
        stream=True,
    )

    for chunk in response:
        # Skip chunks that carry no choices, so indexing [0] cannot fail
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        # Extract the chain-of-thought content first
        if hasattr(delta, 'reasoning_content') and delta.reasoning_content:
            yield delta.reasoning_content
            #print(f"[reasoning] {delta.reasoning_content}", end="\n", flush=True)
        # Then handle the final answer content
        elif delta.content:
            yield delta.content
            #print(f"[final answer] {delta.content}", end="", flush=True)
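
The generator above yields reasoning text and answer text through the same channel, so a consumer cannot tell them apart. If the front end should render the two differently, one option is to tag each piece. The following is a minimal sketch under that assumption; `tag_stream` and `fake_chunk` are hypothetical names, and the fake chunks only mimic the `choices[0].delta` shape used above so the sketch runs offline.

```python
from types import SimpleNamespace


def tag_stream(chunks):
    """Yield (kind, text) pairs, where kind is "reasoning" or "answer"."""
    for chunk in chunks:
        # Some chunks may carry no choices at all; skip them.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        reasoning = getattr(delta, "reasoning_content", None)
        content = getattr(delta, "content", None)
        if reasoning:
            yield ("reasoning", reasoning)
        elif content:
            yield ("answer", content)


def fake_chunk(**fields):
    """Build a stand-in for one streamed chunk."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(**fields))])


demo = [
    fake_chunk(reasoning_content="thinking", content=None),
    fake_chunk(reasoning_content=None, content="done"),
    SimpleNamespace(choices=[]),
]
tagged = list(tag_stream(demo))
```

The tag could then travel in the JSON payload next to the text, for example as an extra field.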


The outer layer returns the stream like this:

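import json

from flask import Response  # assumption: the Response(..., mimetype=...) call matches Flask

# stream_response is a hypothetical wrapper standing in for the author's view function;
# generate_text is the author's dispatcher, which yields text chunks for the chosen model
def stream_response(model_name, prompt):
    def generate_stream():
        try:
            for chunk in generate_text(model_name, prompt):
                # Wrap each chunk as one JSON object per line
                yield json.dumps({"msg": "Success", "code": 200, "data": chunk}) + '\n'
        except Exception as e:
            # Report the failure to the client as a JSON line as well
            yield json.dumps({"code": 500, "message": str(e)}) + '\n'

    headers = {
        'Content-Type': 'text/event-stream',
        'Cache-Control': 'no-cache',
        # Ask a buffering proxy such as nginx to pass chunks through immediately
        'X-Accel-Buffering': 'no',
    }
    return Response(generate_stream(), mimetype='text/event-stream', headers=headers)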
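
The response above declares `text/event-stream` but emits one JSON object per line. That works when the front end reads the body and splits on newlines itself. A browser `EventSource`, however, expects each event as a `data:` line followed by a blank line. In case that framing is needed, here is a minimal sketch; `to_sse` is a hypothetical helper, not something the original code uses.

```python
import json


def to_sse(payload):
    """Frame one JSON-serializable payload as a server-sent event."""
    # json.dumps escapes newlines inside strings, so the payload stays on one data: line.
    return "data: " + json.dumps(payload, ensure_ascii=False) + "\n\n"


event = to_sse({"msg": "Success", "code": 200, "data": "hello"})
```

Under this assumption, the generator would yield `to_sse({...})` instead of `json.dumps({...}) + '\n'`.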


The front end then parses the stream accordingly.
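
The parsing step can be sketched as follows. Because the server emits one JSON object per line, the client has to buffer incoming bytes and decode only complete lines, since a network chunk may end mid-line. This is a minimal Python sketch of that logic rather than the author's front-end code; `iter_json_lines` is a hypothetical helper and the sample bytes are made up for illustration.

```python
import json


def iter_json_lines(byte_chunks):
    """Yield one decoded JSON object per complete line in a chunked byte stream."""
    buffer = b""
    for chunk in byte_chunks:
        buffer += chunk
        # Only complete lines are parsed; a trailing partial line stays buffered.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield json.loads(line.decode("utf-8"))
    # Parse whatever remains if the stream ended without a final newline.
    if buffer.strip():
        yield json.loads(buffer.decode("utf-8"))


raw = [b'{"code": 200, "data": "Hel', b'lo"}\n{"code": 200, ', b'"data": " world"}\n']
text = "".join(msg["data"] for msg in iter_json_lines(raw) if msg["code"] == 200)
```

The same buffering idea applies in a browser when reading the response body chunk by chunk.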

posted @ 2025-04-10 16:56 MasonZhang
