AI Development - Python - LangChain Framework (1-12: Returning JSON with an Output Parser)

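Here is the key question. JSON is the most common data format in day-to-day development, especially for front-end/back-end communication. So how do we get a large language model to return data as well-formed JSON?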

Consider the following code:

from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import JsonOutputParser
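# Note: langchain_core.pydantic_v1 is a compatibility shim that newer langchain-core
# releases deprecate; on those versions use `from pydantic import BaseModel, Field`.
from langchain_core.pydantic_v1 import BaseModel, Field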
import os


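# Define the data structure you want.
# The descriptions stay in Chinese so they match the schema shown in the output below.
class Book(BaseModel):
    title: str = Field(description="书名")            # "book title"
    author: str = Field(description="作者")           # "author"
    description: str = Field(description="书的简介")   # "short summary of the book"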

# Set up a parser + inject instructions into the prompt template.
output_parser = JsonOutputParser(pydantic_object=Book)


format_instructions = output_parser.get_format_instructions()
print('原版提示词')  # label meaning "original prompt"
print(format_instructions)
print('#############')


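# Override it with a Chinese-language prompt. In English it reads:
# "The output should be formatted as a JSON instance that conforms to the JSON structure below."
# The example uses double quotes because JSON does not allow single-quoted strings.
format_instructions = '''输出应格式化为符合以下 JSON 结构的 JSON 实例。
JSON结构
```
{
"title": "书的标题",
"author": "作者",
"description": "书的简介"
}
```
'''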
prompt = PromptTemplate(
    template="{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": format_instructions},
)



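# Initialize the chat model (using the DeepSeek API)
llm = ChatOpenAI(
    api_key=os.getenv("DEEPSEEK_API_KEY"),            # read the API key from an environment variable
    base_url=os.getenv("BASE_URL"),                   # read the API base URL from an environment variable (e.g. https://api.deepseek.com)
    model="deepseek-v3:671b",                         # model version to use
    temperature=0.7,                                  # sampling randomness; 0.7 is a moderate setting
    max_tokens=1024                                   # maximum number of tokens per response
)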

chain = prompt | llm | output_parser

print('--------------')
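# The query that prompts the language model to fill in the data structure.
# In English: "Please recommend 2 classic books for learning Chinese history."
query = "请给我介绍2本学习中国历史的经典书籍"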
result = chain.invoke({"query": query})
print(result)

# Streaming output
# for s in chain.stream({"query": query}):
#     print(s)

Output:

原版提示词
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"title": {"title": "Title", "description": "\u4e66\u540d", "type": "string"}, "author": {"title": "Author", "description": "\u4f5c\u8005", "type": "string"}, "description": {"title": "Description", "description": "\u4e66\u7684\u7b80\u4ecb", "type": "string"}}, "required": ["title", "author", "description"]}
```
#############
--------------
[{'title': '中国通史', 'author': '吕思勉', 'description': '《中国通史》是吕思勉先生的代表作之一，系统全面地介绍了中国从远古时代到近代的历史发展脉络。该书内容详实，分析深入，是学习中国历史的经典入门书籍。'}, {'title': '万历十五年', 'author': '黄仁宇', 'description': '《万历十五年》是黄仁宇先生的经典著作，以明朝万历十五年为切入点，通过细致入微的历史分析，展现了当时社会的政治、经济和文化状况。该书视角独特，文笔流畅，深受读者喜爱。'}]

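The returned data has exactly the structure we asked for. Strictly speaking, what gets printed is not JSON text but the Python objects the parser built from it (hence the single quotes): a list of two dicts, one per requested book, which can be used directly or re-serialized as JSON.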
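Since the printed result is the `repr` of Python objects rather than JSON text, it usually needs to be serialized again before it is sent to a front end. A minimal sketch using only the standard library; the `result` literal is a shortened stand-in for the chain's return value (an assumption for illustration, not real model output):

```python
import json

# Shortened stand-in for the list of dicts returned by the chain (assumed data).
result = [
    {"title": "中国通史", "author": "吕思勉", "description": "..."},
    {"title": "万历十五年", "author": "黄仁宇", "description": "..."},
]

# ensure_ascii=False keeps Chinese characters readable instead of \uXXXX escapes.
json_text = json.dumps(result, ensure_ascii=False, indent=2)
print(json_text)

# Round trip: the JSON text parses back into an equal Python structure.
round_trip = json.loads(json_text)
```

With the default `ensure_ascii=True`, the same data would be written with `\uXXXX` escapes, which is why the schema in the output above shows the Chinese descriptions in escaped form.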

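The core of the code above is a complete pipeline (define the data structure, build the prompt, call the model, parse the output) that steers the model toward returning JSON in a specified format.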

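First, the Book class is defined with Pydantic's BaseModel. It declares that the output must contain the three fields title, author, and description, along with what each one means, and serves as the blueprint for the JSON output.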
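To make the "blueprint" idea concrete without depending on Pydantic, here is a rough standard-library sketch in which a dataclass carries the same three fields and a helper turns them into a machine-readable description. `BookSpec` and `describe_fields` are hypothetical names introduced for illustration; this is not how Pydantic or LangChain build their schema.

```python
from dataclasses import dataclass, field, fields


@dataclass
class BookSpec:
    """Hypothetical stand-in for the article's Pydantic Book model."""
    title: str = field(metadata={"description": "book title"})
    author: str = field(metadata={"description": "author"})
    description: str = field(metadata={"description": "short summary of the book"})


def describe_fields(spec) -> dict:
    """Map each field name to its type name and human-readable description."""
    return {
        f.name: {"type": f.type.__name__, "description": f.metadata["description"]}
        for f in fields(spec)
    }


schema = describe_fields(BookSpec)
for name, info in schema.items():
    print(name, info)
```

The point is only that a single class definition can feed both the prompt (as instructions) and the application code (as a type), which is the role Book plays in the article.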

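Next, JsonOutputParser is bound to that data structure (the key step). The parser generates format instructions from the schema automatically and later parses the model's reply into Python objects. Note that JsonOutputParser uses the Pydantic model only to build the instructions; it does not validate the parsed result against the model (PydanticOutputParser is the variant that does). The code then replaces the generated instructions with a custom Chinese-language version to make the required JSON structure easier for the model to follow, so the field names in that hand-written prompt have to be kept in sync with the Book definition.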
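Parsing a reply as JSON and checking that it has the expected fields are two separate steps. The sketch below shows both in plain Python, assuming the model may wrap its answer in a Markdown code fence; `parse_json_reply`, `check_books`, and `REQUIRED_KEYS` are hypothetical helpers, not LangChain APIs.

```python
import json
import re

REQUIRED_KEYS = {"title", "author", "description"}

# A Markdown code fence, built this way to avoid nesting fences in this example.
FENCE = "`" * 3
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_json_reply(text: str):
    """Extract and parse JSON from a reply, with or without a code-fence wrapper."""
    match = FENCED_BLOCK.search(text)
    payload = match.group(1) if match else text
    return json.loads(payload.strip())


def check_books(data) -> list:
    """Accept one book or a list of books; raise if a required key is missing."""
    books = data if isinstance(data, list) else [data]
    for book in books:
        if not isinstance(book, dict):
            raise ValueError(f"expected an object, got {type(book).__name__}")
        missing = REQUIRED_KEYS - set(book)
        if missing:
            raise ValueError(f"missing keys: {sorted(missing)}")
    return books


# Made-up reply text for illustration.
reply = FENCE + 'json\n[{"title": "A", "author": "B", "description": "C"}]\n' + FENCE
books = check_books(parse_json_reply(reply))
print(books)
```

Accepting either a single object or a list mirrors the run shown earlier, where a query for two books produced a list of two objects.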

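Then PromptTemplate combines the format requirements and the user's query into a single prompt that tells the model explicitly to return a JSON instance matching the structure.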
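The substitution itself can be sketched with plain `str.format`. One detail worth seeing: the JSON braces inside the format instructions are safe because they arrive as a substituted value and are not re-parsed as template placeholders. `build_prompt` and the English instruction text are assumptions for illustration; this is not PromptTemplate's implementation.

```python
from functools import partial

# Same template string as in the article's PromptTemplate.
TEMPLATE = "{format_instructions}\n{query}\n"


def build_prompt(format_instructions: str, query: str) -> str:
    """Fill both placeholders; braces inside the values are left untouched."""
    return TEMPLATE.format(format_instructions=format_instructions, query=query)


instructions = 'Reply with JSON shaped like {"title": "...", "author": "...", "description": "..."}'

# Pre-binding the instructions plays a role similar to partial_variables:
# only the query is left to supply at call time.
prompt_for = partial(build_prompt, instructions)

prompt_text = prompt_for("Recommend one classic book on Chinese history")
print(prompt_text)
```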

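ChatOpenAI, which also works with third-party OpenAI-compatible models, is initialized with temperature set to 0.7, a moderate value. Lower temperatures generally make the output more deterministic, so if the model drifts away from the JSON structure, reducing temperature is a reasonable first adjustment.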
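To give a feel for what the temperature setting does, here is the textbook softmax-with-temperature calculation on made-up scores. The logits are invented, and whether a given provider applies temperature in exactly this way is an assumption.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, dividing by temperature first."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens

for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

As temperature decreases, more probability mass moves to the highest-scoring token, which is why lower settings tend to produce more repeatable, format-compliant output.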

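Finally, LangChain's chain syntax (prompt | llm | output_parser) automates the whole flow of prompt assembly, model generation, and JSON parsing, and the final output is Python data (dicts and lists) that can be used directly. Together, the schema definition, the prompt instructions, and the output parser make it much more likely that the model returns usable JSON, although none of them strictly guarantees it.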
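The pipe syntax can be understood with a toy version built on Python's `__or__` operator. This is a simplified sketch, not LangChain's Runnable implementation: `Step` is a hypothetical class and `fake_llm` returns a canned string instead of calling a model.

```python
import json


class Step:
    """Wrap a function so that steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # The combined step feeds this step's output into the next one.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Three stand-in stages: prompt assembly, model call, JSON parsing.
prompt = Step(lambda inputs: f"Return JSON only.\n{inputs['query']}\n")
fake_llm = Step(lambda text: '{"title": "T", "author": "A", "description": "D"}')
parser = Step(json.loads)

chain = prompt | fake_llm | parser
result = chain.invoke({"query": "Recommend one book"})
print(result)
```

Each `|` produces a new step, so the whole chain exposes the same `invoke` method as its parts.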

posted @ 2026-02-06 10:40 万笑佛
