Stay Hungry, Stay Foolish!

langfuse

https://langfuse.com/cn

What is Langfuse?

Langfuse is an open-source observability and analytics platform designed for applications powered by large language models (LLMs). Our mission is to help developers and organizations build and improve LLM applications. To that end, we provide deep insight into model cost, quality, and latency through advanced tracing and analytics modules.

See an example trace in our public demo.

Why Langfuse?

Langfuse is the most popular open-source LLMOps tool on the market, with a large community that builds and maintains integrations with the latest frameworks.

Langfuse is easy to self-host and can be set up in minutes. This is especially attractive to enterprise customers in regulated industries.

Langfuse provides best-in-class tracing to help you develop and improve your product.

Note: read more here about why we built Langfuse.

Key Features

Langfuse offers a set of features to support you throughout the entire lifecycle of an AI product: from development and testing to large-scale monitoring and debugging in production.

Monitoring

  • Tracing: capture the full context of your product, including external API or tool calls, context, prompts, and more.
  • Real-time metrics: monitor key performance indicators such as response time, error rate, and throughput.
  • Feedback: collect user feedback to improve your application's performance and user experience.

Analytics

  • Evaluation: compare the performance of different models, prompts, and configurations by setting up llm-as-a-judge evaluations or human annotation workflows.
  • Testing: experiment with different versions (A/B) of your application and identify the most effective solution through testing and prompt management.
  • User behavior: understand how users interact with your AI application.

Debugging

  • Detailed debug logs: access comprehensive logs of all application activity for troubleshooting.
  • Error tracking: detect and track errors and exceptions in your application.

Integrations

 

https://langfuse.com/docs

Langfuse Overview

Langfuse is an open-source LLM engineering platform (GitHub) that helps teams collaboratively debug, analyze, and iterate on their LLM applications. All platform features are natively integrated to accelerate the development workflow. Langfuse is open, self-hostable, and extensible (why langfuse?).

Observability

  • Log traces
  • Lowest level transparency
  • Understand cost and latency

Prompts

  • Version control and deploy
  • Collaborate on prompts
  • Test prompts and models

Evaluation

  • Measure output quality
  • Monitor production health
  • Test changes in development

Platform

  • API-first architecture
  • Data exports to blob storage
  • Enterprise security and administration

 

https://langfuse.com/integrations/frameworks/langchain

LangChain Tracing & LangGraph Integration

Langfuse integrates with LangChain using LangChain Callbacks — the standard mechanism for hooking into the execution of LangChain components. The Langfuse CallbackHandler automatically captures detailed traces of your LangChain executions, LLMs, tools, and retrievers to evaluate and debug your application.

What is LangChain? LangChain is an open-source framework that helps developers build applications powered by large language models (LLMs) by providing tools to connect models with external data, APIs, and logic.

What is LangGraph? LangGraph is a framework built on top of LangChain that makes it easier to design and run stateful, multi-step AI agents using a graph-based architecture.

What is Langfuse? Langfuse is a platform for observability and tracing of LLM applications. It captures everything happening during an LLM interaction: inputs, outputs, tool usage, retries, latencies, and costs, and it lets you evaluate and debug your application.
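
The callback mechanism described above can be illustrated with a minimal, dependency-free sketch. All class and method names below are hypothetical stand-ins for illustration only, not the actual LangChain or Langfuse API:

```python
import time

class TracingHandler:
    """Stand-in for a callback handler: records every event it is notified of.
    (Illustrative only; the real Langfuse CallbackHandler captures far more detail.)"""
    def __init__(self):
        self.events = []

    def on_llm_start(self, prompt):
        self.events.append(("llm_start", prompt, time.time()))

    def on_llm_end(self, output):
        self.events.append(("llm_end", output, time.time()))

class FakeLLM:
    """Stand-in for a framework component that fires callbacks around execution."""
    def __init__(self, callbacks=None):
        self.callbacks = callbacks or []

    def invoke(self, prompt):
        # Notify all registered handlers before and after the (simulated) model call.
        for cb in self.callbacks:
            cb.on_llm_start(prompt)
        output = f"echo: {prompt}"  # a real LLM call would happen here
        for cb in self.callbacks:
            cb.on_llm_end(output)
        return output

handler = TracingHandler()
llm = FakeLLM(callbacks=[handler])
llm.invoke("Hello")
print([name for name, *_ in handler.events])  # → ['llm_start', 'llm_end']
```

Because the handler is notified at each step rather than wrapping the call itself, it can observe nested chains, tools, and retrievers the same way, which is what makes callbacks a natural hook for tracing.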

 

https://langfuse.com/docs/evaluation/experiments/experiments-via-sdk

Experiments via SDK

Experiments via SDK let you programmatically run your application or prompts over a dataset and optionally apply Evaluation Methods to the results. You can use either a dataset hosted on Langfuse or a local dataset as the foundation for your experiment.

See also the JS/TS SDK reference and the Python SDK reference for more details on running experiments via the SDK.

Why use Experiments via SDK?

  • Full flexibility to use your own application logic
  • Use custom scoring functions to evaluate the outputs of a single item and the full run
  • Run multiple experiments on the same dataset in parallel
  • Easy to integrate with your existing evaluation infrastructure
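
Item-level and run-level scoring functions are typically plain functions. The sketch below shows the general shape; the exact evaluator signatures and return format differ between SDK versions, so treat the keyword-argument names and the `{"name", "value"}` dict shape here as assumptions to verify against the SDK reference:

```python
def exact_match(*, output, expected_output, **kwargs):
    """Item-level evaluator: 1.0 if the output matches the expected answer.
    (The return shape {"name", "value"} is an assumption; check your SDK version.)"""
    value = 1.0 if output.strip().lower() == expected_output.strip().lower() else 0.0
    return {"name": "exact_match", "value": value}

def accuracy(*, item_results, **kwargs):
    """Run-level evaluator: averages the item-level scores across the whole run.
    (`item_results` is a hypothetical name for the per-item results collection.)"""
    scores = [r["value"] for r in item_results]
    return {"name": "accuracy", "value": sum(scores) / len(scores)}

print(exact_match(output=" Paris ", expected_output="paris"))
# → {'name': 'exact_match', 'value': 1.0}
```

Keeping evaluators as pure functions of an item's output makes them easy to unit-test outside the experiment runner.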

Experiment runner SDK

Both the Python and JS/TS SDKs provide a high-level abstraction for running an experiment on a dataset. The dataset can be either local or hosted on Langfuse. Using the experiment runner is the recommended way to run an experiment on a dataset with our SDK.

The experiment runner automatically handles:

  • Concurrent execution of tasks with configurable limits
  • Automatic tracing of all executions for observability
  • Flexible evaluation with both item-level and run-level evaluators
  • Error isolation so individual failures don’t stop the experiment
  • Dataset integration for easy comparison and tracking
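
The concurrency and error-isolation behavior described above can be sketched in plain Python with the standard library. This is an illustrative sketch of the pattern, not the SDK's actual implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def run_experiment_sketch(data, task, max_concurrency=4):
    """Run `task` over every dataset item concurrently, isolating per-item failures."""
    def safe_task(item):
        try:
            return {"item": item, "output": task(item=item), "error": None}
        except Exception as exc:
            # Error isolation: record the failure and keep the experiment going.
            return {"item": item, "output": None, "error": str(exc)}

    # Bounded concurrency; map() preserves the dataset's item order.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(safe_task, data))

data = [{"input": "ok"}, {"input": "boom"}]

def task(*, item):
    if item["input"] == "boom":
        raise ValueError("simulated failure")
    return item["input"].upper()

results = run_experiment_sketch(data, task)
print([r["error"] is None for r in results])  # → [True, False]
```

One item failing produces a recorded error rather than aborting the run, so a single flaky model call cannot invalidate the rest of the experiment.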

The experiment runner SDK supports both datasets hosted on Langfuse and local datasets. If you are using a dataset hosted on Langfuse for your experiment, the SDK will automatically create a dataset run for you that you can inspect and compare in the Langfuse UI. For local datasets not hosted on Langfuse, only traces and scores (if evaluations are used) are tracked in Langfuse.

 

from langfuse import get_client
from langfuse.openai import OpenAI
 
# Initialize client (reads LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY,
# and LANGFUSE_HOST from the environment)
langfuse = get_client()
 
# Define your task function
def my_task(*, item, **kwargs):
    question = item["input"]
    response = OpenAI().chat.completions.create(
        model="gpt-4.1", messages=[{"role": "user", "content": question}]
    )
 
    return response.choices[0].message.content
 
 
# Run experiment on local data
local_data = [
    {"input": "What is the capital of France?", "expected_output": "Paris"},
    {"input": "What is the capital of Germany?", "expected_output": "Berlin"},
]
 
result = langfuse.run_experiment(
    name="Geography Quiz",
    description="Testing basic functionality",
    data=local_data,
    task=my_task,
)
 
# Use format method to display results
print(result.format())

https://www.cnblogs.com/xiao987334176/p/19039257

Analytics Dashboard

Click Home to see an overview of the project.

It surfaces many useful analytics and statistics, including trace counts, model cost analysis, score statistics, and response latency for different kinds of steps, which makes it well suited to guiding application optimization.

 

posted @ 2026-02-23 23:08 by lightsong · views (92) · comments (0)
A thousand mountains, no bird in flight; ten thousand paths, no trace of man.