Langfuse
https://langfuse.com/cn
What is Langfuse?

Langfuse is an open-source observability and analytics platform designed for applications powered by large language models (LLMs). Our mission is to help developers and organizations build and improve LLM applications. To that end, we provide deep insight into model cost, quality, and latency through advanced tracing and analytics modules.
See an example trace in our public demo.
Why Langfuse?
Langfuse is the most popular open-source LLMOps tool on the market, with a large community that builds and maintains integrations with the latest frameworks.
Langfuse is easy to self-host and can be set up in minutes. This is particularly attractive to enterprise customers in regulated industries.
Langfuse provides best-in-class tracing to help you develop and improve your product.
Note: learn more about why we built Langfuse.
Key features
Langfuse offers a range of features to support you across the entire lifecycle of an AI product: from development and testing to large-scale monitoring and debugging in production.
Monitoring
Analytics
- Evaluation: compare the performance of different models, prompts, and configurations by setting up LLM-as-a-judge evaluations or human annotation workflows.
- Testing: experiment with different versions (A/B) of your application and identify the most effective solution through testing and prompt management.
- User behavior: understand how users interact with your AI application.
Debugging
- Detailed debug logs: access comprehensive logs of all application activity for troubleshooting.
- Error tracking: detect and track errors and exceptions in your application.
Integrations
- Framework support: integrate with popular LLM frameworks such as DeepSeek, BytePlus, LangChain, LlamaIndex, and AWS Bedrock.
- Tool support: integrate with no-code builder tools such as Dify or LobeChat.
- API: use our open, powerful API for custom integrations and workflow automation.
https://langfuse.com/docs
Langfuse Overview
Langfuse is an open-source LLM engineering platform (GitHub) that helps teams collaboratively debug, analyze, and iterate on their LLM applications. All platform features are natively integrated to accelerate the development workflow. Langfuse is open, self-hostable, and extensible (why langfuse?).
Observability
- Log traces
- Lowest level transparency
- Understand cost and latency
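A trace in this model is a tree of observations, each carrying its own latency and cost, with totals rolling up from nested steps. A minimal pure-Python sketch of that aggregation (the `Observation` class and its fields are illustrative, not the actual Langfuse data model):

```python
from dataclasses import dataclass, field

@dataclass
class Observation:
    # Illustrative stand-in for a span/generation inside a trace
    name: str
    latency_ms: float = 0.0
    cost_usd: float = 0.0
    children: list["Observation"] = field(default_factory=list)

    def total_cost(self) -> float:
        # Cost rolls up from all nested observations
        return self.cost_usd + sum(c.total_cost() for c in self.children)

trace = Observation("handle-request", latency_ms=1200.0, children=[
    Observation("retrieval", latency_ms=300.0, cost_usd=0.0001),
    Observation("llm-call", latency_ms=850.0, cost_usd=0.0042),
])
print(f"{trace.total_cost():.4f}")  # → 0.0043
```

Aggregating over this tree is what lets the platform report cost and latency per trace, per step, and per model.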
Prompts
- Version control and deploy
- Collaborate on prompts
- Test prompts and models
Evaluation
- Measure output quality
- Monitor production health
- Test changes in development
Platform
- API-first architecture
- Data exports to blob storage
- Enterprise security and administration
https://langfuse.com/integrations/frameworks/langchain
LangChain Tracing & LangGraph Integration
Langfuse integrates with LangChain using LangChain Callbacks — the standard mechanism for hooking into the execution of LangChain components. The Langfuse CallbackHandler automatically captures detailed traces of your LangChain executions, LLMs, tools, and retrievers to evaluate and debug your application.
What is LangChain? LangChain is an open-source framework that helps developers build applications powered by large language models (LLMs) by providing tools to connect models with external data, APIs, and logic.
What is LangGraph? LangGraph is a framework built on top of LangChain that makes it easier to design and run stateful, multi-step AI agents using a graph-based architecture.
What is Langfuse? Langfuse is a platform for observability and tracing of LLM applications. It captures everything happening during an LLM interaction — inputs, outputs, tool usage, retries, latencies, and costs — and allows you to evaluate and debug your application.
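The callback mechanism described above is essentially an observer pattern: the framework invokes handler methods (e.g. `on_llm_start`, `on_llm_end`) at each step of execution, and a handler like Langfuse's records those events as a trace. A stripped-down sketch of that mechanism in plain Python — the method names mirror LangChain's convention, but nothing here imports the real libraries:

```python
class RecordingHandler:
    """Toy stand-in for a tracing callback handler."""
    def __init__(self):
        self.events = []

    def on_llm_start(self, prompt: str) -> None:
        self.events.append(("llm_start", prompt))

    def on_llm_end(self, output: str) -> None:
        self.events.append(("llm_end", output))

def run_chain(prompt: str, callbacks: list) -> str:
    # The framework calls each handler around the model invocation,
    # so handlers observe execution without changing application logic
    for cb in callbacks:
        cb.on_llm_start(prompt)
    output = prompt.upper()  # placeholder for a real model call
    for cb in callbacks:
        cb.on_llm_end(output)
    return output

handler = RecordingHandler()
run_chain("hello", callbacks=[handler])
print(handler.events)  # both lifecycle events were captured
```

Because the handler is passed alongside the call rather than wired into it, the same application code runs with or without tracing attached.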
https://langfuse.com/docs/evaluation/experiments/experiments-via-sdk
Experiments via SDK
Experiments via SDK are used to programmatically loop your applications or prompts through a dataset and optionally apply Evaluation Methods to the results. You can use a dataset hosted on Langfuse or a local dataset as the foundation for your experiment.
See also the JS/TS SDK reference and the Python SDK reference for more details on running experiments via the SDK.
Why use Experiments via SDK?
- Full flexibility to use your own application logic
- Use custom scoring functions to evaluate the outputs of a single item and the full run
- Run multiple experiments on the same dataset in parallel
- Easy to integrate with your existing evaluation infrastructure
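The core loop behind these points can be sketched in a few lines: iterate a dataset, run your task on each item, apply an item-level scoring function, then aggregate a run-level metric. This is a local, dependency-free sketch of that pattern, not the Langfuse SDK's implementation:

```python
import statistics

def exact_match(output: str, expected: str) -> float:
    # Item-level score: 1.0 on exact match, else 0.0
    return 1.0 if output.strip() == expected.strip() else 0.0

def run_local_experiment(data, task):
    # Minimal experiment loop: run the task on every item,
    # score each item, then compute a run-level aggregate
    scores = []
    for item in data:
        output = task(item["input"])
        scores.append(exact_match(output, item["expected_output"]))
    return {"item_scores": scores, "mean_score": statistics.mean(scores)}

data = [
    {"input": "capital of France?", "expected_output": "Paris"},
    {"input": "capital of Germany?", "expected_output": "Berlin"},
]
result = run_local_experiment(data, task=lambda q: "Paris")  # stub task
print(result["mean_score"])  # → 0.5
```

The experiment runner described below wraps this same shape and adds concurrency, tracing, and error isolation on top.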
Experiment runner SDK
Both the Python and JS/TS SDKs provide a high-level abstraction for running an experiment on a dataset. The dataset can be either local or hosted on Langfuse. Using the experiment runner is the recommended way to run an experiment on a dataset with our SDKs.
The experiment runner automatically handles:
- Concurrent execution of tasks with configurable limits
- Automatic tracing of all executions for observability
- Flexible evaluation with both item-level and run-level evaluators
- Error isolation so individual failures don’t stop the experiment
- Dataset integration for easy comparison and tracking
The experiment runner SDK supports both datasets hosted on Langfuse and local datasets. If you are using a dataset hosted on Langfuse for your experiment, the SDK will automatically create a dataset run that you can inspect and compare in the Langfuse UI. For local datasets not hosted on Langfuse, only traces and scores (if evaluations are used) are tracked in Langfuse.
from langfuse import get_client
from langfuse.openai import OpenAI

# Initialize client
langfuse = get_client()

# Define your task function
def my_task(*, item, **kwargs):
    question = item["input"]
    response = OpenAI().chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

# Run experiment on local data
local_data = [
    {"input": "What is the capital of France?", "expected_output": "Paris"},
    {"input": "What is the capital of Germany?", "expected_output": "Berlin"},
]

result = langfuse.run_experiment(
    name="Geography Quiz",
    description="Testing basic functionality",
    data=local_data,
    task=my_task,
)

# Use format method to display results
print(result.format())
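The example above runs the task without any evaluators. The SDK also accepts evaluator functions that score each item's output; the sketch below shows the shape of such an item-level evaluator. The keyword-argument names and the returned name/value mapping follow the pattern in the Langfuse docs, but treat them as assumptions and check the SDK reference — the Python SDK also provides an `Evaluation` type for this purpose:

```python
def exact_match_eval(*, input, output, expected_output, **kwargs):
    # Item-level evaluator: compares the task's output to the
    # expected answer and returns a named score.
    matched = output is not None and output.strip() == expected_output.strip()
    return {"name": "exact_match", "value": 1.0 if matched else 0.0}

# Assumed usage with the experiment runner (parameter name `evaluators`
# is taken from the SDK docs; verify against the SDK reference):
# result = langfuse.run_experiment(..., evaluators=[exact_match_eval])

print(exact_match_eval(input="q", output="Paris", expected_output="Paris")["value"])  # → 1.0
```

Run-level evaluators follow the same idea but receive all item results at once, so they can compute aggregates such as accuracy across the run.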
https://www.cnblogs.com/xiao987334176/p/19039257
Analytics dashboard
Click Home to see an overview of key information.
It displays many useful analytics and statistics, including trace counts, model cost analysis, score statistics, and response latency for different kinds of steps — very helpful for optimizing your application.

