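This article explains how to generate images, analyze them, and recognize their content through API calls, so you can quickly build AI image products.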

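🚀 The OpenAI API examples in this article use https://api.aaaaapi.com as the endpoint. If calls are slow from your region, you can use our stable relay service instead: Quick Access to the Relay API.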

1. Overview of Image Capabilities

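OpenAI's vision-capable models can analyze the objects, colors, text, and textures in an image, and can even understand its context. The main APIs that work with images are: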

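API	What it does
Responses API	Accepts images as input and can return mixed text-and-image results
Images API	Image generation and editing only
Chat Completions API	Accepts image input and produces text, speech, and other outputs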

2. Image Generation

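You can generate high-quality images with the latest gpt-image-1 model or with DALL·E 3. The example below uses the Responses API's built-in image_generation tool: the mainline model (gpt-4.1-mini here) handles the request, and the tool, which is backed by gpt-image-1, draws the image. DALL·E 3 is available through the Images API.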

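# Install the dependency: pip install openai
from openai import OpenAI
import base64

# base_url points at the endpoint used throughout this article; replace "your-key" with your own API key
client = OpenAI(base_url="https://api.aaaaapi.com", api_key="your-key")

# The mainline model calls the built-in image_generation tool to draw the picture
response = client.responses.create(
    model="gpt-4.1-mini",
    input="Generate an image of a gray tabby cat hugging an otter wearing an orange scarf",
    tools=[{"type": "image_generation"}],
)

# Each image_generation_call output item carries the image as a Base64 string in .result
image_data = [
    output.result for output in response.output if output.type == "image_generation_call"
]

# Decode the first generated image and write it to disk
if image_data:
    with open("cat_otter.png", "wb") as f:
        f.write(base64.b64decode(image_data[0]))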

The generated image can be displayed directly in front ends such as web pages, apps, and mini programs.
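The snippet above saves only the first image, but a response may contain more than one image_generation_call item. A minimal sketch that decodes and saves all of them, assuming the response exposes `output` items with `type` and `result` attributes exactly as in the example above; `save_generated_images` and its file-naming scheme are hypothetical, not part of the openai SDK:

```python
import base64
from pathlib import Path


def save_generated_images(response, out_dir=".", prefix="image"):
    """Decode every image_generation_call result in a Responses API response.

    Hypothetical helper: it relies only on response.output items having
    .type and .result attributes, as in the example above.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    calls = [item for item in response.output if item.type == "image_generation_call"]
    paths = []
    for i, call in enumerate(calls):
        # Assumption: the tool returns PNG data (its default output format)
        path = out / f"{prefix}_{i}.png"
        path.write_bytes(base64.b64decode(call.result))
        paths.append(path)
    return paths
```

For example, `save_generated_images(response, out_dir="generated", prefix="cat_otter")` would write `generated/cat_otter_0.png`, `generated/cat_otter_1.png`, and so on, and return their paths.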

3. Image Recognition and Understanding (Vision Analysis)

You can have the model recognize image content by passing an image URL, uploading a local file, or sending the image Base64-encoded.
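All three methods below send the same message shape: one `input_text` part plus one or more `input_image` parts, each carrying either `image_url` (an http(s) URL or a data URL) or `file_id`. A minimal sketch of a helper that assembles this structure so several images can go into one request; `build_vision_input` and its parameters are hypothetical names, not part of the openai SDK:

```python
def build_vision_input(prompt, image_urls=(), file_ids=(), detail="auto"):
    """Build the `input` list for client.responses.create: text plus images.

    Hypothetical helper. image_urls may be http(s) URLs or data: URLs;
    file_ids are IDs returned by client.files.create(..., purpose="vision").
    """
    if detail not in ("low", "high", "auto"):
        raise ValueError(f"detail must be 'low', 'high' or 'auto', got {detail!r}")
    content = [{"type": "input_text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "input_image", "image_url": url, "detail": detail})
    for file_id in file_ids:
        content.append({"type": "input_image", "file_id": file_id, "detail": detail})
    if len(content) == 1:
        raise ValueError("at least one image URL or file ID is required")
    return [{"role": "user", "content": content}]
```

For example, `input=build_vision_input("Compare these two images", image_urls=[url_a, url_b])` would send two images in a single request, where `url_a` and `url_b` stand for any image URLs.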

Method 1: Analyze an image by URL

response = client.responses.create(
    model="gpt-4.1-mini",
    input=[{
        "role": "user",
        "content": [
            {"type": "input_text", "text": "What is in this image?"},
            {
                "type": "input_image",
                "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
            },
        ],
    }],
)

print(response.output_text)

Method 2: Upload a local file (via the Files API)

# Upload the image through the Files API; the returned ID can be reused in later requests
with open("your_image.jpg", "rb") as file:
    file_id = client.files.create(file=file, purpose="vision").id

response = client.responses.create(
    model="gpt-4.1-mini",
    input=[{
        "role": "user",
        "content": [
            {"type": "input_text", "text": "What does this image contain?"},
            {"type": "input_image", "file_id": file_id},
        ],
    }],
)

print(response.output_text)

Method 3: Base64 image input

# Read a local image and return its contents as a Base64 string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

base64_image = encode_image("your_image.jpg")

response = client.responses.create(
    model="gpt-4.1",
    input=[{
        "role": "user",
        "content": [
            {"type": "input_text", "text": "What is in this image?"},
            {
                "type": "input_image",
                # The MIME type in the data URL should match the file format (e.g. image/png for PNG files)
                "image_url": f"data:image/jpeg;base64,{base64_image}",
                "detail": "high"
            },
        ],
    }],
)

print(response.output_text)
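The Base64 example above hardcodes `data:image/jpeg` in the data URL, which mislabels PNG, WEBP, or GIF files. A minimal sketch that derives the MIME type from the file extension with Python's standard `mimetypes` module and rejects formats outside the supported list in the requirements table below; `to_data_url` is a hypothetical helper, and it trusts the extension rather than inspecting the file bytes:

```python
import base64
import mimetypes

# Older Python versions may not map .webp, so register it explicitly
mimetypes.add_type("image/webp", ".webp")

# Formats listed as supported in the input requirements table
SUPPORTED_MIME_TYPES = {"image/png", "image/jpeg", "image/webp", "image/gif"}


def to_data_url(image_path):
    """Return a data: URL for a local image, with the MIME type taken from its extension.

    Note: animated GIFs are not supported by the API, but this check cannot detect them.
    """
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported image type {mime_type!r} for {image_path}")
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

With this helper, the request in Method 3 could pass `"image_url": to_data_url("your_image.png")` instead of building the string by hand.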

4. Image Input Requirements and Billing Notes

Supported formats and limits:

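Item	Requirement
Image formats	PNG, JPEG, WEBP, non-animated GIF
Size limits	Up to 50 MB total payload per request; up to 500 images per request
Content	No watermarks or NSFW content; the image must be clear enough to make out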
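Checking these limits locally avoids sending a request that would only be rejected. A minimal sketch, assuming the limits in the table above and reading "50 MB" conservatively as 50,000,000 bytes; since the limit applies to the request payload, it estimates the Base64-encoded size (Base64 turns every 3 bytes into 4 characters). It cannot detect animated GIFs or content problems, and `check_request_images` is a hypothetical helper:

```python
import math
import os

MAX_IMAGES_PER_REQUEST = 500
MAX_PAYLOAD_BYTES = 50_000_000  # assumption: conservative reading of "50 MB"
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".gif"}


def check_request_images(paths):
    """Raise ValueError if these local images would exceed the per-request limits.

    Returns the estimated Base64 payload size in bytes.
    """
    if len(paths) > MAX_IMAGES_PER_REQUEST:
        raise ValueError(f"{len(paths)} images exceeds the limit of {MAX_IMAGES_PER_REQUEST}")
    total = 0
    for path in paths:
        ext = os.path.splitext(path)[1].lower()
        if ext not in ALLOWED_EXTENSIONS:
            raise ValueError(f"unsupported format: {path}")
        # Base64 encodes every 3 input bytes as 4 output characters
        total += 4 * math.ceil(os.path.getsize(path) / 3)
    if total > MAX_PAYLOAD_BYTES:
        raise ValueError(f"estimated payload of {total} bytes exceeds {MAX_PAYLOAD_BYTES}")
    return total
```

The count and format checks run before any file is read, so an obviously invalid batch fails fast.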

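The `detail` parameter:

low: the model receives a low-resolution 512x512 version of the image; faster, and billed at a small fixed token cost
high: the model sees the image in more detail, at the cost of more tokens
auto: the model decides (this is the default)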

Use low when you only need the overall structure of a simple image; use high for OCR or fine-grained detail analysis.

5. Image Billing Logic in Brief

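Image inputs are converted into tokens and billed according to the image size, the `detail` level, and the model. The rules for common models are summarized below: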

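Model	Low detail (fixed cost)	High detail (cost per tile)	Tile size
GPT-4o / GPT-4.1	85 tokens	170 tokens per tile, plus an 85-token base	512 x 512 px
GPT-4.1-mini	Not tile-based: billed by the number of 32 x 32 px patches (capped at 1,536), times a model-specific multiplier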

For a more detailed breakdown of pricing, see the official site: Pricing Calculator.
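To make the tile rule concrete: per OpenAI's vision guide, a high-detail image on a tile-based model such as GPT-4o / GPT-4.1 is first scaled to fit within 2048 x 2048 px, then scaled so its shortest side is 768 px, and billed as 85 base tokens plus 170 tokens per 512 px tile; low detail is a flat 85 tokens. A minimal sketch of that arithmetic, plus the raw 32 x 32 patch count used by patch-based models such as GPT-4.1-mini (before the model-specific multiplier, and valid only below the 1,536-patch cap, above which the API downscales the image first). The function names are hypothetical, and treating step 2 as shrink-only (small images are never enlarged) is an assumption; use the pricing calculator for authoritative numbers:

```python
import math

BASE_TOKENS = 85    # low-detail flat cost, also the base of a high-detail image (GPT-4o / GPT-4.1)
TILE_TOKENS = 170   # cost per 512 x 512 px tile in high detail
PATCH_CAP = 1536    # maximum number of 32 x 32 px patches for patch-based models


def tile_image_tokens(width, height, detail="high"):
    """Estimate input tokens for one image on a tile-based model (GPT-4o / GPT-4.1)."""
    if detail == "low":
        return BASE_TOKENS
    if detail != "high":
        raise ValueError("this estimate only covers 'low' and 'high' detail")
    # Step 1: fit within a 2048 x 2048 square, keeping the aspect ratio
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # Step 2: scale so the shortest side is 768 px (assumption: shrink only, never enlarge)
    if min(w, h) > 768:
        shrink = 768 / min(w, h)
        w, h = w * shrink, h * shrink
    # Step 3: count the 512 px tiles needed to cover the image
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return BASE_TOKENS + TILE_TOKENS * tiles


def raw_patch_count(width, height):
    """Number of 32 x 32 px patches covering an image (patch-based models, before the multiplier)."""
    patches = math.ceil(width / 32) * math.ceil(height / 32)
    if patches > PATCH_CAP:
        raise ValueError("image exceeds the patch cap; the API downscales it first")
    return patches


print(tile_image_tokens(1024, 1024))         # 765: 768x768 -> 4 tiles
print(tile_image_tokens(2048, 4096))         # 1105: 1024x2048 -> 768x1536 -> 6 tiles
print(tile_image_tokens(4096, 8192, "low"))  # 85
print(raw_patch_count(1024, 1024))           # 1024
```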

Summary

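OpenAI's image capabilities keep expanding: they can already generate Midjourney-style images and provide GPT-4 Vision-style image understanding. Whether you work in e-commerce, education, healthcare, or AI tooling, you can integrate these capabilities to ship vision features quickly.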

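If access from your region is slow or unstable, we recommend connecting to the OpenAI API through a high-availability relay 👉 visit the relay API service to improve call stability.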

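If you found this article helpful, feel free to bookmark, like, and comment. We will publish more hands-on guides to building multimodal (image + text + speech) applications with OpenAI models.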

For further integration documentation, see the official OpenAI documentation (Chinese edition).

posted on 2025-07-29 16:55 by 码农工程师
