OpenAI vs Anthropic vs Google: An In-Depth Comparison of the Three Major LLM APIs, with Unified Compatibility Solutions

When building AI applications, developers face a practical problem: the APIs of the three major LLM providers (OpenAI, Anthropic, Google) are all different. How do you choose among them? How do you call them through one interface? How do you design an architecture that can switch providers flexibly?

This article compares the three APIs in depth and presents production-grade approaches to unifying them.

1. API Design Philosophies Compared

1.1 OpenAI: The De Facto Standard

The OpenAI API is the de facto industry standard, and its design is clean and intuitive:

from openai import OpenAI

client = OpenAI(api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    temperature=0.7,
    max_tokens=150
)

print(response.choices[0].message.content)
# Output: The capital of France is Paris.

Key characteristics
- The system prompt is the first message in the array (role: "system")
- Response structure: response.choices[0].message.content
- Supports Function Calling and JSON Mode (see the sketch below)
- Streaming uses Server-Sent Events (SSE)
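
JSON Mode deserves a quick illustration. A minimal sketch, reusing the client from above: response_format={"type": "json_object"} constrains the output to valid JSON, and the API requires the word "JSON" to appear somewhere in the prompt when the mode is enabled.

import json

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        # The prompt must mention JSON when JSON Mode is enabled
        {"role": "user", "content": "Return the capitals of France, Spain and Italy as JSON under a 'capitals' key."}
    ],
    response_format={"type": "json_object"}
)

data = json.loads(response.choices[0].message.content)
print(data["capitals"])  # key name follows what the prompt requested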

1.2 Anthropic: The Long-Context Specialist

The Claude API keeps the system prompt separate from the message list, which suits complex instruction setups:

from anthropic import Anthropic

client = Anthropic(api_key="sk-ant-...")

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=150,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    temperature=0.7
)

print(response.content[0].text)
# Output: The capital of France is Paris.

Key differences
- A dedicated system parameter (not part of the messages array)
- Response structure: response.content[0].text (an array of typed content blocks, designed with multimodality in mind; see the sketch below)
- Supports a 200K-token context window
- Function Calling is implemented via tool_use blocks
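
Because response.content is a list of typed blocks, a single response can mix text blocks with tool_use blocks. A minimal sketch of collecting only the text, reusing the response object from the example above:

# Concatenate the text blocks, skipping tool_use and other block types
text = "".join(block.text for block in response.content if block.type == "text")
print(text)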

1.3 Google: Native Multimodality

The Gemini API takes an entirely different approach, with multimodal input supported natively:

import google.generativeai as genai

genai.configure(api_key="...")

model = genai.GenerativeModel('gemini-pro')

response = model.generate_content(
    "What is the capital of France?",
    generation_config={
        "temperature": 0.7,
        "max_output_tokens": 150
    }
)

print(response.text)
# Output: The capital of France is Paris.

Key characteristics
- No messages array: you pass a string or a list of parts directly (multi-turn chat goes through a chat session, sketched below)
- Response structure: response.text
- Native support for video, audio, and other multimodal inputs
- Gemini 1.5 Pro supports a 1M-token context window
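
Since generate_content takes no messages array, multi-turn conversation goes through a chat session instead. A minimal sketch with the same genai setup as above:

model = genai.GenerativeModel('gemini-pro')

# The chat session keeps the conversation history for you
chat = model.start_chat(history=[])

first = chat.send_message("What is the capital of France?")
print(first.text)

# The follow-up relies on the history held inside the session
second = chat.send_message("And what is its population?")
print(second.text)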

2. Core Feature Comparison

2.1 System Prompt Handling

Provider | System prompt location | Management
OpenAI | First entry of messages (role: "system") | Mixed into the conversation history
Anthropic | Dedicated system parameter | Kept separate from the message list
Google | system_instruction on the model object | Configured independently (see the sketch below)
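
For the Google column, the system prompt attaches to the model object rather than to any single request. A minimal sketch; system_instruction assumes a Gemini 1.5-generation model:

model = genai.GenerativeModel(
    'gemini-1.5-pro',
    system_instruction="You are a helpful assistant."
)

response = model.generate_content("What is the capital of France?")
print(response.text)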

The practical consequence

# OpenAI: the system prompt is just another message in the history
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
]
# Changing system behavior means editing the stored message list itself

Anthropic's approach

# Anthropic: the system prompt is a separate request parameter
stream = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system="Initial system prompt",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

# ✅ The system prompt can be swapped from request to request
#    without rewriting the conversation history

2.2 Function Calling

OpenAI style

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "What's the weather in SF?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"]
            }
        }
    }]
)

tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name)  # "get_weather"
print(tool_call.function.arguments)  # '{"location": "SF"}'

Anthropic style

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What's the weather in SF?"}],
    tools=[{
        "name": "get_weather",
        "description": "Get the current weather",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
        }
    }]
)

# The tool_use block is not necessarily first; look it up by type
tool_use = next(block for block in response.content if block.type == "tool_use")
print(tool_use.name)   # "get_weather"
print(tool_use.input)  # {'location': 'SF'}

Key differences
- OpenAI returns the arguments as a JSON string (you parse it yourself; see the sketch below)
- Anthropic returns a Python dict directly
- Anthropic uses input_schema where OpenAI uses parameters
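
A minimal sketch of the extra parsing step on the OpenAI side, reusing tool_call from the example above; get_weather here is a hypothetical local implementation:

import json

# OpenAI returns a JSON string; parse it before dispatching the call
args = json.loads(tool_call.function.arguments)

def get_weather(location: str) -> str:
    # Placeholder implementation for illustration
    return f"Sunny in {location}"

print(get_weather(**args))  # "Sunny in SF"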

2.3 Multimodal Support

OpenAI Vision API

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image:"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
    }]
)

Anthropic Vision API

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image:"},
            {"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": "..."}}
        ]
    }]
)

Google Gemini (natively multimodal)

model = genai.GenerativeModel('gemini-pro-vision')

response = model.generate_content([
    "Describe this image:",
    {"mime_type": "image/jpeg", "data": "..."}
])

# ✅ Gemini natively supports video and audio
video_response = model.generate_content([
    "Describe this video:",
    {"mime_type": "video/mp4", "data": "..."}
])

Comparison
- OpenAI and Anthropic require images as base64 data or URLs (a small helper is sketched below)
- Gemini accepts video and audio natively, with no extra preprocessing
- Gemini currently offers the broadest multimodal coverage
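
A minimal helper for the base64 route, usable with both the OpenAI and Anthropic payloads above (the file path is a placeholder):

import base64

def encode_image(path: str) -> str:
    """Read an image file and return its base64-encoded contents."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

image_data = encode_image("photo.jpg")

# Anthropic takes the raw base64 string in source.data;
# OpenAI takes a data URL: f"data:image/jpeg;base64,{image_data}"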

2.4 Streaming Responses

OpenAI streaming

stream = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "Tell me a joke"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Anthropic streaming

stream = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a joke"}],
    stream=True
)

for event in stream:
    if event.type == "content_block_delta":
        print(event.delta.text, end="")

Differences
- OpenAI exposes incremental text via delta.content
- Anthropic uses a typed event system (content_block_delta)
- Anthropic's event set is richer (errors, metadata, and so on); a thin adapter can hide the difference, as sketched below
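
If you call both APIs directly, a thin generator can normalize the two stream shapes into plain text chunks. A minimal sketch; openai_client and anthropic_client are assumed to be the two configured SDK clients from earlier, and only the happy-path events are handled:

def stream_text(provider: str, prompt: str):
    """Yield plain text chunks regardless of the provider's stream format."""
    if provider == "openai":
        stream = openai_client.chat.completions.create(
            model="gpt-4-turbo",
            messages=[{"role": "user", "content": prompt}],
            stream=True
        )
        for chunk in stream:
            if chunk.choices[0].delta.content:
                yield chunk.choices[0].delta.content
    elif provider == "anthropic":
        stream = anthropic_client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
            stream=True
        )
        for event in stream:
            if event.type == "content_block_delta":
                yield event.delta.text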

3. Pricing and Performance Comparison

Metric | OpenAI GPT-4 Turbo | Anthropic Claude 3.5 Sonnet | Google Gemini 1.5 Pro
Input price | $10 / 1M tokens | $3 / 1M tokens | $3.50 / 1M tokens
Output price | $30 / 1M tokens | $15 / 1M tokens | $10.50 / 1M tokens
Context window | 128K tokens | 200K tokens | 1M tokens
Typical latency | ~2-3s | ~3-4s | ~4-5s
Reliability | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐

Cost optimization tips
- Long-document work: prefer Claude 3.5 Sonnet (200K context at a low price)
- Fast responses: choose OpenAI GPT-4 Turbo (lowest latency)
- Multimodal tasks: choose Gemini 1.5 Pro (native support plus a 1M context); a quick cost estimator follows below
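
To make the price gap concrete, here is a quick estimator hard-coding the list prices from the table above (verify against the providers' current pricing pages):

# USD per 1M tokens, taken from the table above
PRICES = {
    "gpt-4-turbo":       {"input": 10.00, "output": 30.00},
    "claude-3.5-sonnet": {"input": 3.00,  "output": 15.00},
    "gemini-1.5-pro":    {"input": 3.50,  "output": 10.50},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 50K-token document summarized into 1K tokens
for model in PRICES:
    print(f"{model}: ${estimate_cost(model, 50_000, 1_000):.4f}")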

4. Unified Compatibility Solutions

4.1 Option 1: LiteLLM (recommended for production)

LiteLLM provides a single interface covering 100+ LLM providers and handles the API differences automatically.

Installation

pip install litellm

Basic usage

from litellm import completion

# Set API keys
import os
os.environ["OPENAI_API_KEY"] = "sk-..."
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."
os.environ["GOOGLE_API_KEY"] = "..."

# One interface for every provider
def ask_llm(provider: str, question: str) -> str:
    response = completion(
        model=provider,  # e.g. "gpt-4", "claude-3-opus", "gemini/gemini-pro" (check LiteLLM's model naming)
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question}
        ],
        temperature=0.7,
        max_tokens=150
    )
    return response.choices[0].message.content

# Examples
print(ask_llm("gpt-4", "What is the capital of France?"))
print(ask_llm("claude-3-opus", "Explain quantum computing"))
print(ask_llm("gemini/gemini-pro", "Write a poem about AI"))

Automatic fallback

from litellm import completion

# Fallback chain: OpenAI -> Anthropic -> Google
fallback_models = ["claude-3-opus", "gemini/gemini-pro"]

response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    fallbacks=fallback_models  # ✅ Falls back to the listed models automatically
)

# ✅ If GPT-4 is rate-limited, the call fails over to Claude Opus

Load balancing

from litellm import Router

# LiteLLM's Router distributes requests across equivalent deployments
router = Router(model_list=[
    {"model_name": "my-llm", "litellm_params": {"model": "gpt-4"}},
    {"model_name": "my-llm", "litellm_params": {"model": "claude-3-opus"}},
    {"model_name": "my-llm", "litellm_params": {"model": "gemini/gemini-pro"}},
])

response = router.completion(
    model="my-llm",  # ✅ Load-balanced across the three deployments
    messages=[{"role": "user", "content": "Hello!"}]
)

Unified streaming

from litellm import completion

response = completion(
    model="claude-3-opus",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

# ✅ LiteLLM normalizes the streaming formats of the different providers

Production-grade configuration

from litellm import completion
from typing import Optional
import os

# API keys via environment variables
os.environ["OPENAI_API_KEY"] = "sk-..."
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."
os.environ["GOOGLE_API_KEY"] = "..."

# Logging
os.environ["LITELLM_LOG"] = "INFO"

def robust_llm_call(prompt: str) -> Optional[str]:
    """Production-grade LLM call: retries and timeout are per-call parameters."""
    try:
        response = completion(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,
            num_retries=3,
            timeout=30
        )
        return response.choices[0].message.content
    except Exception as e:
        print(f"LLM call failed: {e}")
        return None

# Example
result = robust_llm_call("Explain async programming in Python")

4.2 Option 2: OpenRouter (best for open-source models)

OpenRouter provides a single interface to 100+ models, including the latest open-source ones.

Basic usage

import openai

# Point the OpenAI client at OpenRouter's base_url
client = openai.OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-..."
)

# Call models from different providers through one client
def ask_model(model_name: str, question: str) -> str:
    response = client.chat.completions.create(
        model=model_name,
        messages=[{"role": "user", "content": question}],
        temperature=0.7
    )
    return response.choices[0].message.content

# Examples: different providers, same interface
print(ask_model("openai/gpt-4-turbo", "What is AI?"))
print(ask_model("anthropic/claude-3.5-sonnet", "Explain machine learning"))
print(ask_model("google/gemini-pro-1.5", "Write code in Python"))
print(ask_model("meta-llama/llama-3-70b", "Tell me a joke"))  # open-source model

Advantages

  • ✅ Unified billing and quotas
  • ✅ Community-driven model rankings
  • ✅ Support for fine-tuned models
  • ✅ Access to the latest open-source models (Llama 3, Mistral, and more); optional attribution headers are sketched below
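
One OpenRouter-specific detail worth knowing: it accepts optional attribution headers that feed its public app rankings. A minimal sketch reusing the client above; the referer URL and app title are placeholders:

response = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_headers={
        # Optional headers used by OpenRouter for app attribution
        "HTTP-Referer": "https://example.com",
        "X-Title": "My LLM App",
    }
)
print(response.choices[0].message.content)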

4.3 Option 3: LangChain (for complex applications)

LangChain offers higher-level abstractions, well suited to building RAG pipelines, Agents, and other complex applications.

A unified interface

from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI
from langchain_google_genai import ChatGoogleGenerativeAI

# One factory for all providers
def get_llm(provider: str):
    if provider == "openai":
        return ChatOpenAI(model="gpt-4-turbo", temperature=0.7)
    elif provider == "anthropic":
        return ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=0.7)
    elif provider == "google":
        return ChatGoogleGenerativeAI(model="gemini-pro", temperature=0.7)
    raise ValueError(f"Unknown provider: {provider}")

# Example
llm = get_llm("anthropic")
response = llm.invoke("Explain quantum computing in simple terms")
print(response.content)

Building a RAG application

from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import RetrievalQA

# Build the vector store
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(
    ["Python is a programming language", "JavaScript is for web development"],
    embeddings
)

# Assemble the RAG chain
llm = get_llm("openai")
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever()
)

# Query
result = qa_chain.run("What is Python used for?")
print(result)

Building an Agent

from langchain.agents import initialize_agent, Tool

# Define a tool
def search_wiki(query: str) -> str:
    # Stand-in for a real Wikipedia search
    return f"Search results for: {query}"

tools = [
    Tool(
        name="Wikipedia",
        func=search_wiki,
        description="Search Wikipedia for information"
    )
]

# Create the agent
llm = get_llm("anthropic")
agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent="zero-shot-react-description"
)

# Run the agent
result = agent.run("What is the capital of France?")
print(result)

5. Architecture Design Recommendations

5.1 Production Architecture Diagram

(Mermaid architecture diagram from the original post, omitted here.)

5.2 Configuration Management

# config/llm_config.yaml
providers:
  openai:
    api_key_env: OPENAI_API_KEY
    models:
      primary: gpt-4-turbo
      fallback: gpt-3.5-turbo
    max_tokens: 4096
    temperature: 0.7

  anthropic:
    api_key_env: ANTHROPIC_API_KEY
    models:
      primary: claude-3-5-sonnet-20241022
      fallback: claude-3-haiku-20240307
    max_tokens: 8192
    temperature: 0.7

  google:
    api_key_env: GOOGLE_API_KEY
    models:
      primary: gemini-pro
      fallback: gemini-1.5-flash
    max_tokens: 2048
    temperature: 0.7

fallback_order:
  - openai
  - anthropic
  - google

monitoring:
  log_level: INFO
  track_costs: true
  max_retries: 3
  timeout: 30
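
A small loader makes this file usable from Python. A minimal sketch, assuming PyYAML and the path above:

import os
import yaml

def load_llm_config(path: str = "config/llm_config.yaml") -> dict:
    """Load the provider config and verify the referenced API keys exist."""
    with open(path) as f:
        config = yaml.safe_load(f)
    for name, provider in config["providers"].items():
        if provider["api_key_env"] not in os.environ:
            print(f"Warning: {provider['api_key_env']} is not set for {name}")
    return config

config = load_llm_config()
print(config["fallback_order"])  # ['openai', 'anthropic', 'google']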

5.3 A Unified Wrapper Class

# llm_manager.py
import os
from typing import Optional

import yaml
from litellm import completion

class LLMManager:
    """Unified manager over all configured LLM providers."""

    def __init__(self, config_path: str = "config/llm_config.yaml"):
        with open(config_path, 'r') as f:
            self.config = yaml.safe_load(f)
        self.current_provider = self.config['fallback_order'][0]

    def ask(
        self,
        prompt: str,
        provider: Optional[str] = None,
        system_prompt: Optional[str] = None,
        temperature: Optional[float] = None,
        max_tokens: Optional[int] = None
    ) -> str:
        """Single entry point for LLM calls."""
        provider = provider or self.current_provider
        provider_config = self.config['providers'][provider]

        # Build the message list
        messages = []
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        messages.append({"role": "user", "content": prompt})

        # Resolve parameters; explicit arguments win over config defaults
        model = provider_config['models']['primary']
        if temperature is None:
            temperature = provider_config.get('temperature', 0.7)
        if max_tokens is None:
            max_tokens = provider_config.get('max_tokens', 4096)

        try:
            response = completion(
                model=model,
                messages=messages,
                temperature=temperature,
                max_tokens=max_tokens
            )
            return response.choices[0].message.content
        except Exception as e:
            print(f"Error with {provider}: {e}")
            # Fall through to the next provider in the chain
            next_provider = self._get_next_provider(provider)
            if next_provider:
                return self.ask(prompt, next_provider, system_prompt, temperature, max_tokens)
            raise

    def _get_next_provider(self, current_provider: str) -> Optional[str]:
        """Return the next provider in the fallback chain, if any."""
        idx = self.config['fallback_order'].index(current_provider)
        if idx + 1 < len(self.config['fallback_order']):
            return self.config['fallback_order'][idx + 1]
        return None

    def stream(self, prompt: str, provider: Optional[str] = None):
        """Stream a response as plain text chunks."""
        provider = provider or self.current_provider
        provider_config = self.config['providers'][provider]
        model = provider_config['models']['primary']

        response = completion(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            stream=True
        )

        for chunk in response:
            if chunk.choices[0].delta.content:
                yield chunk.choices[0].delta.content

# Usage
llm_manager = LLMManager()

# Basic call
result = llm_manager.ask("Explain async programming in Python")
print(result)

# Streaming call
for chunk in llm_manager.stream("Tell me a story about AI"):
    print(chunk, end="")

# Pin a specific provider
result = llm_manager.ask("What is the capital of France?", provider="anthropic")
print(result)

6. Best Practices

6.1 Error Handling

from tenacity import retry, stop_after_attempt, wait_exponential
from litellm import completion
import litellm

@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10))
def robust_llm_call(prompt: str) -> str:
    """LLM call with exponential-backoff retries."""
    try:
        response = completion(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            timeout=30
        )
        return response.choices[0].message.content
    except litellm.exceptions.RateLimitError:
        print("Rate limited, waiting...")
        raise  # re-raise so tenacity backs off and retries
    except litellm.exceptions.APIError as e:
        print(f"API error: {e}")
        raise
    except Exception as e:
        print(f"Unexpected error: {e}")
        raise

6.2 Cost Monitoring

import litellm
from litellm import completion

# Enable verbose cost logging
litellm.set_verbose = True

def track_costs(prompt: str) -> dict:
    """Run a completion and report its cost and token usage."""
    response = completion(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )

    # Cost and usage as reported by LiteLLM
    cost = litellm.completion_cost(completion_response=response)
    tokens = response.usage

    return {
        "cost": cost,
        "prompt_tokens": tokens.prompt_tokens,
        "completion_tokens": tokens.completion_tokens,
        "total_tokens": tokens.total_tokens
    }

# Example
result = track_costs("Explain quantum computing")
print(f"Cost: ${result['cost']:.4f}")
print(f"Tokens: {result['total_tokens']}")

6.3 Performance Optimization

import asyncio
from typing import List
from litellm import acompletion  # async interface

async def batch_ask(prompts: List[str]) -> List[str]:
    """Fire off all prompts concurrently and gather the answers."""
    tasks = [
        acompletion(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}]
        )
        for prompt in prompts
    ]

    responses = await asyncio.gather(*tasks)
    return [r.choices[0].message.content for r in responses]

# Example
prompts = [
    "What is Python?",
    "What is JavaScript?",
    "What is Rust?"
]

results = asyncio.run(batch_ask(prompts))
for question, answer in zip(prompts, results):
    print(f"Q: {question}")
    print(f"A: {answer}\n")

7. Summary

7.1 API Comparison at a Glance

Feature | OpenAI | Anthropic | Google
Ease of use | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐
Long context | 128K | 200K | 1M
Multimodality | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐
Cost efficiency | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐
Ecosystem maturity | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐

7.2 Choosing a Solution

Scenario | Recommendation | Rationale
Production | LiteLLM | Unified interface, automatic fallback, production-grade reliability
Rapid prototyping | LangChain | High-level abstractions, rich prebuilt components
Cost optimization | OpenRouter | Access to open-source models, unified billing
Enterprise | Azure OpenAI / Bedrock | Compliance, SLA guarantees
On-premises | Ollama / vLLM | Data privacy, zero API cost

7.3 Architecture Tips

  1. Use an abstraction layer: go through LiteLLM or LangChain instead of calling the native APIs directly
  2. Configure multi-provider fallback: OpenAI → Anthropic → Google
  3. Monitor cost and performance: costs can differ by 10x across models
  4. Batch with async calls: use the async interface to raise throughput
  5. Cache common queries: avoid paying for repeated calls (a minimal sketch follows below)
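
A minimal sketch of point 5: an in-process dict keyed by model and prompt. Fine for a single process; a production setup would typically move this to Redis and account for non-deterministic outputs:

from litellm import completion

_cache: dict[str, str] = {}

def cached_ask(prompt: str, model: str = "gpt-4") -> str:
    """Return a cached answer when the exact prompt has been seen before."""
    key = f"{model}:{prompt}"
    if key not in _cache:
        response = completion(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0  # deterministic output makes caching meaningful
        )
        _cache[key] = response.choices[0].message.content
    return _cache[key]

print(cached_ask("What is the capital of France?"))  # API call
print(cached_ask("What is the capital of France?"))  # served from cache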

7.4 Outlook

As the LLM market continues to evolve rapidly, we expect:

  • API standardization: the three vendors may gradually converge on interface design
  • Falling prices: intensifying competition will keep driving costs down
  • Better performance: context windows and response latency will keep improving
  • Specialized models: purpose-built models for particular domains (code, math, healthcare)


Author's note: this article reflects the API versions as of April 2026. The LLM field moves quickly; consult the official documentation for the latest details.

Related reading
- "Best Practices for LLM Application Development"
- "Building a Production-Grade RAG System"
- "AI Agent Architecture Design Guide"
