The llms.txt File

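Definition
An llms.txt file is a standardized Markdown file, proposed by Jeremy Howard, that gives LLMs the information they need to use a website at inference time. Unlike traditional web content written for human readers, llms.txt offers concise, structured information that an LLM can ingest quickly. This is particularly useful for enhancing development environments, documenting programming libraries, and giving structured overviews in areas such as corporate websites, educational institutions, and personal portfolios.
The file lives at the site's root path, /llms.txt, and contains sections in a fixed order: the project name, a summary, detailed information, and lists of files with URLs that lead to further detail. This format lets an LLM efficiently reach and process the most important information about a site.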

Example: https://www.brilliantearth.com/llms.txt

# Brilliant Earth # site name

> Brilliant Earth is the global leader in ethically sourced fine jewelry. Explore engagement rings, wedding rings, Beyond Conflict Free Diamonds®, jewelry, and more. # site summary

## Product List Pages # site section

# title    # link    # description of the linked content
[Necklaces](https://www.brilliantearth.com/jewelry/necklace/shop-all/): From pearl necklaces and name necklaces to diamond and gold necklace styles, our assortment of sustainable necklaces offers both popular and classic styles.

[Jewelry](https://www.brilliantearth.com/jewelry/shop-all/): From trending styles to classic everyday staples, browse our collection of jewelry. Featuring sustainable and ethical earrings, necklaces, bracelets, and more.

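How LLMs use llms.txt

llms.txt is a structured text file that typically lists a site's URLs, titles, and descriptions. Large language models (such as ChatGPT or GPT-4) can use it to obtain site information in the following ways:

1. Reading the file directly

A model can parse the format of llms.txt (Markdown or plain text) and extract the key information: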
```markdown
# Brilliant Earth
> Brilliant Earth is the global leader in ethically sourced fine jewelry...

## Product List Pages
[Wedding Rings Buying Guide](https://uat.brilliantearth.com/wedding-rings/buying-guide/): Explore styles, trends, and budgets...
[Engagement Rings Buying Guide](https://uat.brilliantearth.com/engagement-rings/buying-guide/): Learn about diamonds, metals...
```
Use: get a quick overview of the site's structure and core content.
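To make the parsing step concrete, here is a minimal sketch that reads the sample above into a small dictionary. The function name `parse_llms_txt`, the dictionary layout, and the regular expression are assumptions made for illustration; they are not part of any official parser.

```python
import re

# Matches "[title](url): description"; the description part is optional.
LINK_RE = re.compile(
    r"^(?:[-*]\s+)?\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)

def parse_llms_txt(text):
    """Parse llms.txt text into a dict with name, summary, and sections."""
    doc = {"name": None, "summary": None, "sections": {}}
    current = None  # name of the "## " section currently being read
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            doc["sections"][current] = []
        elif line.startswith("# ") and doc["name"] is None:
            doc["name"] = line[2:].strip()
        elif line.startswith(">") and doc["summary"] is None:
            doc["summary"] = line[1:].strip()
        else:
            m = LINK_RE.match(line)
            if m and current is not None:
                doc["sections"][current].append({
                    "title": m["title"],
                    "url": m["url"],
                    "description": (m["desc"] or "").strip(),
                })
    return doc

SAMPLE = """# Brilliant Earth
> Brilliant Earth is the global leader in ethically sourced fine jewelry...

## Product List Pages
[Wedding Rings Buying Guide](https://uat.brilliantearth.com/wedding-rings/buying-guide/): Explore styles, trends, and budgets...
[Engagement Rings Buying Guide](https://uat.brilliantearth.com/engagement-rings/buying-guide/): Learn about diamonds, metals...
"""

doc = parse_llms_txt(SAMPLE)
print(doc["name"])
for section, links in doc["sections"].items():
    print(section, [link["title"] for link in links])
```

The sketch only handles the elements shown in the sample (one title, one summary, sections with link lines); anything else in the file is ignored.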

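2. As a knowledge base (RAG applications)

In retrieval-augmented generation (RAG):

Steps:
1. Split llms.txt into chunks.
2. Embed the chunks and store the vectors in a vector store (such as FAISS or Pinecone).
3. When a user asks a question, retrieve the relevant chunks first, then generate the answer.
Example:

User question: "What does Brilliant Earth's wedding ring buying guide cover?"

Model actions:

Retrieve the URLs and descriptions under ## Product List Pages in llms.txt.

Generate the answer: "The wedding ring guide covers styles, engraving, budgets, and more; see [link]..."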
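The three steps can be sketched with the standard library alone. Note the substitution: instead of learned embeddings stored in FAISS or Pinecone, this sketch uses plain word-count vectors and cosine similarity, which is only a stand-in to show the flow. The function names (`chunk_entries`, `vectorize`, `retrieve`) are hypothetical.

```python
import math
import re
from collections import Counter

def chunk_entries(text):
    """Step 1: treat each '[title](url): description' line as one chunk."""
    pattern = r"^\[(.*?)\]\((.*?)\): (.*)$"
    return [
        {"title": t, "url": u, "description": d}
        for t, u, d in re.findall(pattern, text, flags=re.MULTILINE)
    ]

def vectorize(text):
    """Step 2 (stand-in): a word-count vector instead of a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, chunks, top_k=1):
    """Step 3: rank chunks by similarity to the question, drop zero scores."""
    q = vectorize(question)
    scored = [
        (cosine(q, vectorize(f"{c['title']} {c['description']}")), c)
        for c in chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:top_k] if score > 0]

SAMPLE = """## Product List Pages
[Wedding Rings Buying Guide](https://uat.brilliantearth.com/wedding-rings/buying-guide/): Explore styles, trends, and budgets...
[Engagement Rings Buying Guide](https://uat.brilliantearth.com/engagement-rings/buying-guide/): Learn about diamonds, metals...
"""

chunks = chunk_entries(SAMPLE)
hits = retrieve("What does Brilliant Earth's wedding ring guide cover?", chunks)
for hit in hits:
    print(hit["title"], hit["url"])
```

The retrieved entries would then be passed to the model as context for generating the answer. A real deployment would likely swap `vectorize` for an embedding model, since word counts miss synonyms and word forms ("ring" versus "rings").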

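3. Supporting automated tasks

SEO analysis: extract titles and descriptions to improve search engine rankings.

Content review: check descriptions for sensitive words and links for expiry.

Multilingual translation: batch-translate titles and descriptions to support internationalization.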
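As one illustration of the content review task, the sketch below flags descriptions that contain words from a blocklist and URLs that are not absolute http(s) links. It runs offline, so it does not detect expired links; that would need an HTTP request per URL. The function name `audit_entries`, the blocklist, and the second sample entry are made up for the example.

```python
import re
from urllib.parse import urlparse

ENTRY_RE = re.compile(r"^\[(.*?)\]\((.*?)\): (.*)$", re.MULTILINE)

def audit_entries(text, blocked_words):
    """Return (title, problem) pairs for entries found in llms.txt text."""
    blocked = {w.lower() for w in blocked_words}
    problems = []
    for title, url, desc in ENTRY_RE.findall(text):
        words = set(re.findall(r"[a-z0-9]+", desc.lower()))
        for word in sorted(words & blocked):
            problems.append((title, f"blocked word: {word}"))
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append((title, f"malformed URL: {url}"))
    return problems

# The first entry follows the Brilliant Earth example; the second is invented
# to trigger both checks.
SAMPLE = """[Necklaces](https://www.brilliantearth.com/jewelry/necklace/shop-all/): Our assortment of sustainable necklaces.
[Sale](/jewelry/sale/): Cheap deals, limited time.
"""

report = audit_entries(SAMPLE, blocked_words=["cheap"])
for title, problem in report:
    print(f"{title}: {problem}")
```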

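4. Dynamic updates with a crawler

Workflow:
1. Run generate_llms_txt.py on a schedule to update the file.
2. The model reads the latest version, which keeps the information current.
Advantage: no manual maintenance, which suits large sites.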
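The post does not show generate_llms_txt.py itself. As a rough idea of what its final step might look like, the sketch below renders llms.txt text from page metadata that a crawler is assumed to have collected already. The function name `render_llms_txt` and the input layout are hypothetical.

```python
def render_llms_txt(name, summary, sections):
    """Render llms.txt text.

    sections maps a section name to a list of (title, url, description).
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for section, pages in sections.items():
        lines.append(f"## {section}")
        lines.append("")
        for title, url, desc in pages:
            lines.append(f"[{title}]({url}): {desc}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"

# Metadata a crawler might produce; values follow the Brilliant Earth example.
pages = {
    "Product List Pages": [
        (
            "Necklaces",
            "https://www.brilliantearth.com/jewelry/necklace/shop-all/",
            "Sustainable necklaces in popular and classic styles.",
        ),
    ],
}

text = render_llms_txt(
    "Brilliant Earth",
    "Brilliant Earth is the global leader in ethically sourced fine jewelry.",
    pages,
)
print(text)
```

A scheduled job would then write `text` to the file served at /llms.txt.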

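5. Limitations and caveats

Static nature: the file must be updated regularly or it may go stale.

Structural requirements: entries need a consistent format (such as [title](URL): description).

Privacy compliance: list only public URLs and include no sensitive data.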

Code example (Python + OpenAI)
```python
import re

from openai import OpenAI

# Read llms.txt
with open("llms.txt", "r", encoding="utf-8") as f:
    content = f.read()

# Split into (title, url, description) entries, one per line
entries = re.findall(r'^\[(.*?)\]\((.*?)\): (.*)$', content, flags=re.MULTILINE)

# Retrieval-augmented generation
client = OpenAI(api_key="your-api-key")

def ask_question(question):
    # 1. Retrieve entries that share a keyword with the question
    #    (the whole question is rarely a substring of a title or description)
    keywords = set(re.findall(r"\w{4,}", question.lower()))
    relevant_entries = [
        e for e in entries
        if keywords & set(re.findall(r"\w{4,}", f"{e[0]} {e[2]}".lower()))
    ]

    # 2. Call the model to generate an answer
    context = "\n".join([f"{title}: {desc} ({url})" for title, url, desc in relevant_entries])
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": f"Answer based on the following information:\n{context}"},
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content

# Example usage
print(ask_question("What does Brilliant Earth's wedding ring guide cover?"))
```
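Summary

By providing structured site metadata, llms.txt helps large models:
1. Understand site content: no real-time crawling needed.
2. Answer user questions accurately: in combination with RAG.
3. Automate operations: supports SEO, multilingual, and similar scenarios.
Choose the approach that fits your needs.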

posted @ 2025-06-15 11:10 最大的敌人是自律
