OpenAI vs Anthropic vs Google: An In-Depth Comparison of the Three Major LLM APIs and Unified Compatibility Strategies
When building AI applications, developers face a practical problem: the APIs of the three major LLM providers (OpenAI, Anthropic, Google) all differ. Which should you choose? How do you call them through one interface? How do you design an architecture that can switch between them flexibly?
This article compares the three APIs in depth and presents production-grade strategies for unifying them.
1. API Design Philosophy
1.1 OpenAI: The Industry Standard-Setter
The OpenAI API is the de facto industry standard, with a simple and intuitive design:
from openai import OpenAI

client = OpenAI(api_key="sk-...")
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    temperature=0.7,
    max_tokens=150
)
print(response.choices[0].message.content)
# Output: The capital of France is Paris.
Key characteristics:
- The system prompt is the first message (role: "system")
- Response structure: response.choices[0].message.content
- Supports Function Calling and JSON Mode
- Streaming uses Server-Sent Events (SSE)
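JSON Mode is mentioned above but not shown, so a quick illustration is worth having. A minimal sketch using the response_format parameter of the Chat Completions API (the prompt and keys here are just examples):

import json
from openai import OpenAI

client = OpenAI(api_key="sk-...")
response = client.chat.completions.create(
    model="gpt-4-turbo",
    # JSON Mode: constrains the model to emit syntactically valid JSON
    response_format={"type": "json_object"},
    messages=[
        # The prompt must mention JSON when JSON Mode is enabled
        {"role": "system", "content": "Reply in JSON with keys 'city' and 'country'."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
data = json.loads(response.choices[0].message.content)  # e.g. {"city": "Paris", "country": "France"}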
1.2 Anthropic: The Long-Context Specialist
The Claude API takes the system prompt as a separate parameter, which suits complex instructions well:
from anthropic import Anthropic

client = Anthropic(api_key="sk-ant-...")
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=150,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    temperature=0.7
)
print(response.content[0].text)
# Output: The capital of France is Paris.
Key differences:
- A separate system parameter (not part of the messages array)
- Response structure: response.content[0].text (an array format, designed with multimodality in mind)
- Supports a 200K-token context window
- Function Calling is implemented via tool_use
1.3 Google: Native Multimodality
The Gemini API takes an entirely different approach, with multimodality built in from the start:
import google.generativeai as genai

genai.configure(api_key="...")
model = genai.GenerativeModel('gemini-pro')
response = model.generate_content(
    "What is the capital of France?",
    generation_config={
        "temperature": 0.7,
        "max_output_tokens": 150
    }
)
print(response.text)
# Output: The capital of France is Paris.
Key characteristics:
- No messages array; pass a string or a list of parts directly
- Response structure: response.text
- Native support for multimodal inputs such as video and audio
- Gemini 1.5 Pro supports a 1M-token context window
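Because there is no messages array, multi-turn conversations go through a chat session instead. A minimal sketch with the same google.generativeai SDK shown above (the history contents are illustrative):

import google.generativeai as genai

genai.configure(api_key="...")
model = genai.GenerativeModel('gemini-pro')

# A chat session carries the conversation history for you
chat = model.start_chat(history=[
    {"role": "user", "parts": ["What is the capital of France?"]},
    {"role": "model", "parts": ["The capital of France is Paris."]}
])
response = chat.send_message("And what is its population?")
print(response.text)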
2. Core Feature Comparison
2.1 System Prompt Handling
| Provider | System prompt location | Flexibility |
|---|---|---|
| OpenAI | first message (role: "system") | ❌ part of the conversation history |
| Anthropic | separate system parameter | ✅ swappable per request without touching the history |
| Google | separate system_instruction | ✅ configured independently of the messages |
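For completeness, here is how the system_instruction in the table is set in the Gemini SDK; it is bound at model-construction time. A sketch using a Gemini 1.5 model, where the parameter is available (the instruction text is illustrative):

import google.generativeai as genai

genai.configure(api_key="...")
# The system instruction is attached to the model object, not to individual messages
model = genai.GenerativeModel(
    'gemini-1.5-flash',
    system_instruction="You are a helpful assistant."
)
response = model.generate_content("What is the capital of France?")
print(response.text)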
OpenAI's limitation:
# OpenAI: the system prompt must be the first message
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
]
# ❌ Changing the system prompt means rewriting the conversation history itself
Anthropic's advantage:
# Anthropic: the system prompt is a separate parameter, outside the history
stream = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system="Initial system prompt",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)
# ✅ The next request can swap in a different system prompt while reusing the same messages
2.2 Function Calling
OpenAI approach
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "What's the weather in SF?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"]
            }
        }
    }]
)

tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name)       # "get_weather"
print(tool_call.function.arguments)  # '{"location": "SF"}'
Anthropic approach
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,  # required by the Messages API
    messages=[{"role": "user", "content": "What's the weather in SF?"}],
    tools=[{
        "name": "get_weather",
        "description": "Get the current weather",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
        }
    }]
)

tool_use = response.content[0]
print(tool_use.name)   # "get_weather"
print(tool_use.input)  # {'location': 'SF'}
Key differences:
- OpenAI returns the arguments as a JSON string (you parse it yourself)
- Anthropic returns a Python dict directly
- Anthropic uses input_schema instead of parameters
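In practice this means a thin normalization layer if you support both providers. A minimal sketch (the helper name is ours, not part of either SDK):

import json

def extract_tool_call(provider: str, response) -> tuple[str, dict]:
    """Return (tool_name, arguments_dict) regardless of provider."""
    if provider == "openai":
        call = response.choices[0].message.tool_calls[0]
        # OpenAI ships arguments as a JSON string -> parse it
        return call.function.name, json.loads(call.function.arguments)
    elif provider == "anthropic":
        # Anthropic ships a parsed dict; tool_use blocks may follow text blocks
        block = next(b for b in response.content if b.type == "tool_use")
        return block.name, block.input
    raise ValueError(f"Unknown provider: {provider}")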
2.3 Multimodal Support
OpenAI Vision API
response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image:"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
    }]
)
Anthropic Vision API
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,  # required by the Messages API
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image:"},
            {"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": "..."}}
        ]
    }]
)
Google Gemini (natively multimodal)
model = genai.GenerativeModel('gemini-pro-vision')
response = model.generate_content([
    "Describe this image:",
    {"mime_type": "image/jpeg", "data": "..."}
])

# ✅ Gemini handles video and audio natively
video_response = model.generate_content([
    "Describe this video:",
    {"mime_type": "video/mp4", "data": "..."}
])
Comparison:
- OpenAI and Anthropic require images as base64 data or URLs
- Gemini takes video and audio natively, with no extra preprocessing
- Gemini's multimodal coverage is the broadest of the three
2.4 Streaming Responses
OpenAI streaming
stream = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "Tell me a joke"}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Anthropic streaming
stream = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a joke"}],
    stream=True
)
for event in stream:
    if event.type == "content_block_delta":
        print(event.delta.text, end="")
Differences:
- OpenAI exposes incremental text via delta.content
- Anthropic uses a typed event system (content_block_delta)
- Anthropic's event types are richer (errors, metadata, and more)
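If you are not yet using an abstraction layer (see section 4), a small adapter can hide this format difference. A hedged sketch (the generator name is ours):

def stream_text(provider: str, stream):
    """Yield plain text chunks from either provider's stream."""
    if provider == "openai":
        for chunk in stream:
            # delta.content is None on role/metadata chunks
            if chunk.choices[0].delta.content:
                yield chunk.choices[0].delta.content
    elif provider == "anthropic":
        for event in stream:
            # Only content_block_delta events carry text
            if event.type == "content_block_delta":
                yield event.delta.text

for piece in stream_text("anthropic", stream):
    print(piece, end="")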
3. Pricing and Performance
| Metric | OpenAI GPT-4 Turbo | Anthropic Claude 3.5 Sonnet | Google Gemini 1.5 Pro |
|---|---|---|---|
| Input price | $10 / 1M tokens | $3 / 1M tokens | $3.50 / 1M tokens |
| Output price | $30 / 1M tokens | $15 / 1M tokens | $10.50 / 1M tokens |
| Context window | 128K tokens | 200K tokens | 1M tokens |
| Average latency | ~2-3s | ~3-4s | ~4-5s |
| Reliability | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
Cost-optimization tips:
- Long-document workloads: prefer Claude 3.5 Sonnet (200K context at a low price)
- Latency-sensitive workloads: prefer OpenAI GPT-4 Turbo (lowest latency of the three)
- Multimodal workloads: prefer Gemini 1.5 Pro (native multimodality plus a 1M context)
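To make the table concrete, here is a quick cost estimate for a typical request of 10K input and 2K output tokens, using the list prices above:

# Prices in USD per 1M tokens, taken from the table above
PRICES = {
    "gpt-4-turbo":       {"input": 10.00, "output": 30.00},
    "claude-3.5-sonnet": {"input": 3.00,  "output": 15.00},
    "gemini-1.5-pro":    {"input": 3.50,  "output": 10.50},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

for model in PRICES:
    print(f"{model}: ${estimate_cost(model, 10_000, 2_000):.4f}")
# gpt-4-turbo: $0.1600, claude-3.5-sonnet: $0.0600, gemini-1.5-pro: $0.0560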
4. Unified Compatibility Strategies
4.1 Option 1: LiteLLM (recommended for production)
LiteLLM provides one interface across 100+ LLM providers and handles the differences between them automatically.
Installation
pip install litellm
Basic usage
from litellm import completion
import os

# Set API keys
os.environ["OPENAI_API_KEY"] = "sk-..."
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."
os.environ["GOOGLE_API_KEY"] = "..."

# One interface for every provider
def ask_llm(provider: str, question: str) -> str:
    response = completion(
        model=provider,  # "gpt-4", "claude-3-sonnet", "gemini-pro"
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question}
        ],
        temperature=0.7,
        max_tokens=150
    )
    return response.choices[0].message.content

# Usage
print(ask_llm("gpt-4", "What is the capital of France?"))
print(ask_llm("claude-3-opus", "Explain quantum computing"))
print(ask_llm("gemini-pro", "Write a poem about AI"))
Automatic fallback

from litellm import completion

# Fallback chain: OpenAI -> Anthropic -> Google
fallbacks = ["claude-3-opus", "gemini-pro"]
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    fallbacks=fallbacks  # ✅ automatically switches to the backup models
)
# ✅ If GPT-4 is rate-limited, the call falls through to Claude Opus
Load balancing

Spreading traffic across providers goes through LiteLLM's Router: register several deployments under one alias, then call the alias.

from litellm import Router

# The same alias ("chat") maps to deployments on three providers
router = Router(model_list=[
    {"model_name": "chat", "litellm_params": {"model": "gpt-4"}},
    {"model_name": "chat", "litellm_params": {"model": "claude-3-opus"}},
    {"model_name": "chat", "litellm_params": {"model": "gemini-pro"}},
])
response = router.completion(
    model="chat",  # ✅ routed to one of the registered deployments
    messages=[{"role": "user", "content": "Hello!"}]
)
Unified streaming

from litellm import completion

response = completion(
    model="claude-3-opus",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)
for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
# ✅ LiteLLM normalizes every provider's stream to the OpenAI chunk format
Production configuration

from typing import Optional
import os
import litellm
from litellm import completion

# API keys via environment variables
os.environ["OPENAI_API_KEY"] = "sk-..."
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."
os.environ["GOOGLE_API_KEY"] = "..."

# Library-level retry and timeout defaults
litellm.num_retries = 3
litellm.request_timeout = 30

# Logging
os.environ["LITELLM_LOG"] = "INFO"

def robust_llm_call(prompt: str) -> Optional[str]:
    """Production-grade LLM call."""
    try:
        response = completion(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,
            num_retries=3,
            timeout=30,
            fallbacks=["claude-3-opus", "gemini-pro"]
        )
        return response.choices[0].message.content
    except Exception as e:
        print(f"LLM call failed: {e}")
        return None

# Usage
result = robust_llm_call("Explain async programming in Python")
4.2 Option 2: OpenRouter (good for open-source models)
OpenRouter exposes 100+ models, including the latest open-source ones, behind a single OpenAI-compatible interface.
Basic usage
import openai

# Point the OpenAI client at OpenRouter's base_url
client = openai.OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-..."
)

# Call models from any provider through the same client
def ask_model(model_name: str, question: str) -> str:
    response = client.chat.completions.create(
        model=model_name,
        messages=[{"role": "user", "content": question}],
        temperature=0.7
    )
    return response.choices[0].message.content

# Example: different models behind one interface
print(ask_model("openai/gpt-4-turbo", "What is AI?"))
print(ask_model("anthropic/claude-3.5-sonnet", "Explain machine learning"))
print(ask_model("google/gemini-pro-1.5", "Write code in Python"))
print(ask_model("meta-llama/llama-3-70b", "Tell me a joke"))  # open-source model
Advantages:
- ✅ Unified billing and quotas
- ✅ Community-driven model rankings
- ✅ Support for fine-tuned models
- ✅ Access to the latest open-source models (Llama 3, Mistral, and more)
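One OpenRouter-specific detail: optional attribution headers let your app appear in OpenRouter's public rankings. A sketch reusing the client above (the URL and app name are placeholders):

response = client.chat.completions.create(
    model="meta-llama/llama-3-70b",
    messages=[{"role": "user", "content": "Tell me a joke"}],
    # Optional OpenRouter attribution headers
    extra_headers={
        "HTTP-Referer": "https://example.com",  # your app's URL (placeholder)
        "X-Title": "My App"                     # your app's name (placeholder)
    }
)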
4.3 Option 3: LangChain (for complex applications)
LangChain offers higher-level abstractions, suited to building complex applications such as RAG pipelines and Agents.
Unified interface
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI
from langchain_google_genai import ChatGoogleGenerativeAI

# Uniform initialization across providers
def get_llm(provider: str):
    if provider == "openai":
        return ChatOpenAI(model="gpt-4-turbo", temperature=0.7)
    elif provider == "anthropic":
        return ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=0.7)
    elif provider == "google":
        return ChatGoogleGenerativeAI(model="gemini-pro", temperature=0.7)
    raise ValueError(f"Unknown provider: {provider}")

# Usage
llm = get_llm("anthropic")
response = llm.invoke("Explain quantum computing in simple terms")
print(response.content)
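LangChain can also express the cross-provider fallback pattern from earlier: runnables expose a with_fallbacks method. A minimal sketch built on get_llm above:

# Try Anthropic first, then fall back to OpenAI, then Google
primary = get_llm("anthropic")
llm_with_fallbacks = primary.with_fallbacks(
    [get_llm("openai"), get_llm("google")]
)
response = llm_with_fallbacks.invoke("Explain quantum computing in simple terms")
print(response.content)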
Building a RAG application

from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import RetrievalQA

# Build the vector store
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(
    ["Python is a programming language", "JavaScript is for web development"],
    embeddings
)

# Assemble the RAG chain
llm = get_llm("openai")
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever()
)

# Query
result = qa_chain.run("What is Python used for?")
print(result)
Building an Agent

from langchain.agents import initialize_agent, Tool

# Define a tool
def search_wiki(query: str) -> str:
    # A Wikipedia search implementation would go here
    return f"Search results for: {query}"

tools = [
    Tool(
        name="Wikipedia",
        func=search_wiki,
        description="Search Wikipedia for information"
    )
]

# Create the Agent
llm = get_llm("anthropic")
agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent="zero-shot-react-description"
)

# Run the Agent
result = agent.run("What is the capital of France?")
print(result)
5. Architecture Recommendations
5.1 Production Architecture
At a high level: the application talks only to a unified LLM manager (section 5.3), which routes each request through LiteLLM to OpenAI, Anthropic, or Google, applying the fallback order, retries, and cost monitoring defined in a central configuration file (section 5.2).
5.2 Configuration Management
# config/llm_config.yaml
providers:
  openai:
    api_key_env: OPENAI_API_KEY
    models:
      primary: gpt-4-turbo
      fallback: gpt-3.5-turbo
    max_tokens: 4096
    temperature: 0.7
  anthropic:
    api_key_env: ANTHROPIC_API_KEY
    models:
      primary: claude-3-5-sonnet-20241022
      fallback: claude-3-haiku-20240307
    max_tokens: 8192
    temperature: 0.7
  google:
    api_key_env: GOOGLE_API_KEY
    models:
      primary: gemini-pro
      fallback: gemini-1.5-flash
    max_tokens: 2048
    temperature: 0.7

fallback_order:
  - openai
  - anthropic
  - google

monitoring:
  log_level: INFO
  track_costs: true
  max_retries: 3
  timeout: 30
5.3 A Unified Wrapper Class
# llm_manager.py
from typing import Optional

import yaml
from litellm import completion

class LLMManager:
    """Unified LLM manager."""

    def __init__(self, config_path: str = "config/llm_config.yaml"):
        with open(config_path, 'r') as f:
            self.config = yaml.safe_load(f)
        self.current_provider = self.config['fallback_order'][0]

    def ask(
        self,
        prompt: str,
        provider: Optional[str] = None,
        system_prompt: Optional[str] = None,
        temperature: Optional[float] = None,
        max_tokens: Optional[int] = None
    ) -> str:
        """Single entry point for LLM calls."""
        provider = provider or self.current_provider
        provider_config = self.config['providers'][provider]

        # Build the message list
        messages = []
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        messages.append({"role": "user", "content": prompt})

        # Resolve parameters
        model = provider_config['models']['primary']
        temperature = temperature or provider_config.get('temperature', 0.7)
        max_tokens = max_tokens or provider_config.get('max_tokens', 4096)

        try:
            response = completion(
                model=model,
                messages=messages,
                temperature=temperature,
                max_tokens=max_tokens,
                # Fallbacks must be model names, not provider names
                fallbacks=[
                    self.config['providers'][p]['models']['primary']
                    for p in self.config['fallback_order']
                    if p != provider
                ]
            )
            return response.choices[0].message.content
        except Exception as e:
            print(f"Error with {provider}: {e}")
            # Try the next provider in the chain
            next_provider = self._get_next_provider(provider)
            if next_provider:
                return self.ask(prompt, next_provider, system_prompt, temperature, max_tokens)
            raise

    def _get_next_provider(self, current_provider: str) -> Optional[str]:
        """Return the next fallback provider, if any."""
        idx = self.config['fallback_order'].index(current_provider)
        if idx + 1 < len(self.config['fallback_order']):
            return self.config['fallback_order'][idx + 1]
        return None

    def stream(self, prompt: str, provider: Optional[str] = None):
        """Streaming responses."""
        provider = provider or self.current_provider
        provider_config = self.config['providers'][provider]
        model = provider_config['models']['primary']

        response = completion(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            stream=True
        )
        for chunk in response:
            if chunk.choices[0].delta.content:
                yield chunk.choices[0].delta.content

# Usage
llm_manager = LLMManager()

# Basic call
result = llm_manager.ask("Explain async programming in Python")
print(result)

# Streaming call
for chunk in llm_manager.stream("Tell me a story about AI"):
    print(chunk, end="")

# Pin a specific provider
result = llm_manager.ask("What is the capital of France?", provider="anthropic")
print(result)
6. Best Practices
6.1 Error Handling
from tenacity import retry, stop_after_attempt, wait_exponential
import litellm
from litellm import completion

@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10))
def robust_llm_call(prompt: str) -> str:
    """LLM call with exponential-backoff retries."""
    try:
        response = completion(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            timeout=30
        )
        return response.choices[0].message.content
    except litellm.exceptions.RateLimitError:
        print("Rate limited, waiting...")
        raise
    except litellm.exceptions.APIError as e:
        print(f"API error: {e}")
        raise
    except Exception as e:
        print(f"Unexpected error: {e}")
        raise
6.2 Cost Monitoring

import litellm
from litellm import completion

# Enable verbose logging
litellm.set_verbose = True

def track_costs(prompt: str) -> dict:
    """Report the cost of a single call."""
    response = completion(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )

    # Pull cost and token usage from the response
    cost = litellm.completion_cost(completion_response=response)
    tokens = response.usage

    return {
        "cost": cost,
        "prompt_tokens": tokens.prompt_tokens,
        "completion_tokens": tokens.completion_tokens,
        "total_tokens": tokens.total_tokens
    }

# Usage
result = track_costs("Explain quantum computing")
print(f"Cost: ${result['cost']:.4f}")
print(f"Tokens: {result['total_tokens']}")
6.3 Performance Optimization

import asyncio
from typing import List
from litellm import acompletion  # async interface

async def batch_ask(prompts: List[str]) -> List[str]:
    """Run multiple calls concurrently."""
    tasks = [
        acompletion(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}]
        )
        for prompt in prompts
    ]
    responses = await asyncio.gather(*tasks)
    return [r.choices[0].message.content for r in responses]

# Usage
prompts = [
    "What is Python?",
    "What is JavaScript?",
    "What is Rust?"
]
results = asyncio.run(batch_ask(prompts))
for question, answer in zip(prompts, results):
    print(f"Q: {question}")
    print(f"A: {answer}\n")
7. Summary
7.1 The Three APIs at a Glance
| Feature | OpenAI | Anthropic | Google |
|---|---|---|---|
| Ease of use | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Long context | 128K | 200K | 1M |
| Multimodal | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Cost efficiency | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Ecosystem maturity | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
7.2 Choosing a Strategy
| Scenario | Recommended option | Rationale |
|---|---|---|
| Production | LiteLLM | Unified interface, automatic fallback, production-grade reliability |
| Rapid prototyping | LangChain | High-level abstractions, rich prebuilt components |
| Cost optimization | OpenRouter | Access to open-source models, unified billing |
| Enterprise | Azure OpenAI / Bedrock | Compliance, SLA guarantees |
| Local deployment | Ollama / vLLM | Data privacy, zero API cost |
7.3 Architecture Checklist
- Use an abstraction layer: go through LiteLLM or LangChain instead of calling the native APIs directly
- Configure multi-provider fallback: OpenAI → Anthropic → Google
- Monitor cost and performance: prices can differ by a factor of 10 across models
- Batch with async calls: use the async interface to raise throughput
- Cache common queries: avoid paying for repeated identical calls (see the sketch below)
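A minimal in-memory cache illustrating that last point; it is keyed on the exact prompt, so it only helps verbatim repeats (LiteLLM also ships a built-in cache, but this sketch makes no assumptions about its API):

import hashlib
from litellm import completion

_cache: dict = {}

def cached_ask(prompt: str, model: str = "gpt-4") -> str:
    """Answer from the cache when an identical prompt was seen before."""
    key = hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()
    if key not in _cache:
        response = completion(
            model=model,
            messages=[{"role": "user", "content": prompt}]
        )
        _cache[key] = response.choices[0].message.content
    return _cache[key]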
7.4 Outlook
As the LLM market keeps moving quickly, we expect:
- API standardization: the three vendors may gradually converge on a common interface design
- Falling prices: intensifying competition will keep driving costs down
- Better performance: context windows and response latency will continue to improve
- Specialized models: purpose-built models for domains such as code, math, and medicine
References
- OpenAI API Documentation
- Anthropic Claude API Documentation
- Google Gemini API Documentation
- LiteLLM GitHub
- LangChain Documentation
- OpenRouter
Author's note: this article reflects the API versions available as of April 2026. The LLM field moves fast, so consult the latest official documentation for current details.
Related reading:
- "LLM Application Development Best Practices"
- "Building Production-Grade RAG Systems"
- "AI Agent Architecture Design Guide"
