On Karpathy's LLM Wiki idea file
LLM Wiki
A pattern for building personal knowledge bases using LLMs.
This is an idea file, designed to be copied and pasted into your own LLM agent (e.g. OpenAI Codex, Claude Code, OpenCode / Pi, etc.). Its goal is to communicate the high-level idea; your agent will build out the specifics in collaboration with you.
The core idea
Most people's experience with LLMs and documents looks like RAG: you upload a collection of files, the LLM retrieves relevant chunks at query time, and generates an answer. This works, but the LLM is rediscovering knowledge from scratch on every question. There's no accumulation. Ask a subtle question that requires synthesizing five documents, and the LLM has to find and piece together the relevant fragments every time. Nothing is built up. NotebookLM, ChatGPT file uploads, and most RAG systems work this way.
The idea here is different. Instead of just retrieving from raw documents at query time, the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked collection of markdown files that sits between you and the raw sources. When you add a new source, the LLM doesn't just index it for later retrieval. It reads it, extracts the key information, and integrates it into the existing wiki — updating entity pages, revising topic summaries, noting where new data contradicts old claims, strengthening or challenging the evolving synthesis. The knowledge is compiled once and then kept current, not re-derived on every query.
This is the key difference: the wiki is a persistent, compounding artifact. The cross-references are already there. The contradictions have already been flagged. The synthesis already reflects everything you've read. The wiki keeps getting richer with every source you add and every question you ask.
You never (or rarely) write the wiki yourself — the LLM writes and maintains all of it. You're in charge of sourcing, exploration, and asking the right questions. The LLM does all the grunt work — the summarizing, cross-referencing, filing, and bookkeeping that makes a knowledge base actually useful over time. In practice, I have the LLM agent open on one side and Obsidian open on the other. The LLM makes edits based on our conversation, and I browse the results in real time — following links, checking the graph view, reading the updated pages. Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase.
This can apply to a lot of different contexts. A few examples:
- Personal: tracking your own goals, health, psychology, self-improvement — filing journal entries, articles, podcast notes, and building up a structured picture of yourself over time.
- Research: going deep on a topic over weeks or months — reading papers, articles, reports, and incrementally building a comprehensive wiki with an evolving thesis.
- Reading a book: filing each chapter as you go, building out pages for characters, themes, plot threads, and how they connect. By the end you have a rich companion wiki. Think of fan wikis like Tolkien Gateway — thousands of interlinked pages covering characters, places, events, languages, built by a community of volunteers over years. You could build something like that personally as you read, with the LLM doing all the cross-referencing and maintenance.
- Business/team: an internal wiki maintained by LLMs, fed by Slack threads, meeting transcripts, project documents, customer calls. Possibly with humans in the loop reviewing updates. The wiki stays current because the LLM does the maintenance that no one on the team wants to do.
- Competitive analysis, due diligence, trip planning, course notes, hobby deep-dives — anything where you're accumulating knowledge over time and want it organized rather than scattered.
Architecture
There are three layers:
Raw sources — your curated collection of source documents. Articles, papers, images, data files. These are immutable — the LLM reads from them but never modifies them. This is your source of truth.
The wiki — a directory of LLM-generated markdown files. Summaries, entity pages, concept pages, comparisons, an overview, a synthesis. The LLM owns this layer entirely. It creates pages, updates them when new sources arrive, maintains cross-references, and keeps everything consistent. You read it; the LLM writes it.
The schema — a document (e.g. CLAUDE.md for Claude Code or AGENTS.md for Codex) that tells the LLM how the wiki is structured, what the conventions are, and what workflows to follow when ingesting sources, answering questions, or maintaining the wiki. This is the key configuration file — it's what makes the LLM a disciplined wiki maintainer rather than a generic chatbot. You and the LLM co-evolve this over time as you figure out what works for your domain.
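As one way to picture the three layers on disk, here is a minimal sketch. The directory names `raw/` and `wiki/`, the stub schema text, and the `scaffold` helper are all illustrative assumptions; the pattern itself does not prescribe a layout.

```python
from pathlib import Path

# Hypothetical stub for the schema file; real conventions are co-evolved with the LLM.
SCHEMA_STUB = """# Wiki schema (stub)
- raw/  : immutable sources, read-only for the LLM
- wiki/ : LLM-maintained markdown pages
- On ingest: write a summary page, update wiki/index.md, append to wiki/log.md
"""


def scaffold(root: Path) -> list[Path]:
    """Create the three layers under `root`; never overwrite existing files."""
    # Layers 1 and 2: raw sources and the wiki (directory names are assumptions).
    for directory in (root / "raw", root / "wiki"):
        directory.mkdir(parents=True, exist_ok=True)

    # Layer 3 (the schema) plus the two special files described later in the text.
    files = {
        root / "AGENTS.md": SCHEMA_STUB,
        root / "wiki" / "index.md": "# Index\n",
        root / "wiki" / "log.md": "# Log\n",
    }
    created = []
    for path, text in files.items():
        if not path.exists():
            path.write_text(text, encoding="utf-8")
            created.append(path)
    return created
```

Calling `scaffold(Path("my-kb"))` a second time returns an empty list, so it is safe to re-run on an existing knowledge base.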
Operations
Ingest. You drop a new source into the raw collection and tell the LLM to process it. An example flow: the LLM reads the source, discusses key takeaways with you, writes a summary page in the wiki, updates the index, updates relevant entity and concept pages across the wiki, and appends an entry to the log. A single source might touch 10-15 wiki pages. Personally I prefer to ingest sources one at a time and stay involved — I read the summaries, check the updates, and guide the LLM on what to emphasize. But you could also batch-ingest many sources at once with less supervision. It's up to you to develop the workflow that fits your style and document it in the schema for future sessions.
Query. You ask questions against the wiki. The LLM searches for relevant pages, reads them, and synthesizes an answer with citations. Answers can take different forms depending on the question — a markdown page, a comparison table, a slide deck (Marp), a chart (matplotlib), a canvas. The important insight: good answers can be filed back into the wiki as new pages. A comparison you asked for, an analysis, a connection you discovered — these are valuable and shouldn't disappear into chat history. This way your explorations compound in the knowledge base just like ingested sources do.
Lint. Periodically, ask the LLM to health-check the wiki. Look for: contradictions between pages, stale claims that newer sources have superseded, orphan pages with no inbound links, important concepts mentioned but lacking their own page, missing cross-references, data gaps that could be filled with a web search. The LLM is good at suggesting new questions to investigate and new sources to look for. This keeps the wiki healthy as it grows.
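Two of the lint checks above (orphan pages and concepts that are linked but have no page) are mechanical enough to script. The sketch below assumes a flat wiki directory and Obsidian-style `[[Page]]` links; `lint_links` is a hypothetical helper, and the judgment-heavy checks (contradictions, stale claims) are still left to the LLM.

```python
import re
from pathlib import Path

# Matches [[Page]], [[Page|alias]] and [[Page#heading]]; skips ![[embeds]].
WIKILINK = re.compile(r"(?<!!)\[\[([^\]|#]+)")


def lint_links(wiki_dir: Path, ignore=("index", "log")) -> dict[str, list[str]]:
    """Report pages with no inbound links and link targets that have no page."""
    pages = {p.stem: p.read_text(encoding="utf-8") for p in wiki_dir.glob("*.md")}
    inbound: dict[str, set[str]] = {name: set() for name in pages}
    missing: set[str] = set()

    for name, text in pages.items():
        if name in ignore:
            # The index links to every page, so counting it would hide all orphans.
            continue
        for target in WIKILINK.findall(text):
            target = target.strip()
            if target == name:
                continue  # a self-link is not an inbound link
            if target in pages:
                inbound[target].add(name)
            else:
                missing.add(target)

    orphans = sorted(n for n, sources in inbound.items() if not sources and n not in ignore)
    return {"orphans": orphans, "missing": sorted(missing)}
```

The resulting lists can be handed to the LLM as a starting point for the lint pass rather than asking it to rediscover them by reading every page.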
Indexing and logging
Two special files help the LLM (and you) navigate the wiki as it grows. They serve different purposes:
index.md is content-oriented. It's a catalog of everything in the wiki — each page listed with a link, a one-line summary, and optionally metadata like date or source count. Organized by category (entities, concepts, sources, etc.). The LLM updates it on every ingest. When answering a query, the LLM reads the index first to find relevant pages, then drills into them. This works surprisingly well at moderate scale (~100 sources, ~hundreds of pages) and avoids the need for embedding-based RAG infrastructure.
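In the pattern the LLM maintains index.md itself, but a deterministic rebuild can be a useful cross-check. The sketch below rests on assumptions not in the original: pages live one level deep in category subdirectories (e.g. `wiki/entities/`), and the first non-heading line after any frontmatter serves as the one-line summary.

```python
from pathlib import Path


def one_line_summary(text: str, limit: int = 120) -> str:
    """First non-empty, non-heading line after optional YAML frontmatter."""
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        # Skip everything up to and including the closing frontmatter fence.
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                lines = lines[i + 1:]
                break
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            return line[:limit]
    return "(no summary)"


def build_index(wiki_dir: Path) -> str:
    """Return index text grouped by category (the name of each subdirectory)."""
    by_category: dict[str, list[str]] = {}
    for page in sorted(wiki_dir.glob("*/*.md")):
        summary = one_line_summary(page.read_text(encoding="utf-8"))
        by_category.setdefault(page.parent.name, []).append(f"- [[{page.stem}]]: {summary}")

    lines = ["# Index", ""]
    for category in sorted(by_category):
        lines.append(f"## {category}")
        lines.extend(by_category[category])
        lines.append("")
    return "\n".join(lines)
```

Comparing this output against the LLM-maintained index is a cheap way to spot pages that were created but never listed.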
log.md is chronological. It's an append-only record of what happened and when — ingests, queries, lint passes. A useful tip: if each entry starts with a consistent prefix (e.g. ## [2026-04-02] ingest | Article Title), the log becomes parseable with simple unix tools — grep "^## \[" log.md | tail -5 gives you the last 5 entries. The log gives you a timeline of the wiki's evolution and helps the LLM understand what's been done recently.
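The same prefix convention can be written and read from Python. This is a sketch under assumptions: the helper names `append_entry` and `last_entries` are hypothetical, and the regular expression accepts only the exact `## [YYYY-MM-DD] kind | title` shape shown above.

```python
import re
from datetime import date
from pathlib import Path

# Header shape taken from the example above: "## [2026-04-02] ingest | Article Title"
ENTRY = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$")


def append_entry(log_path: Path, kind: str, title: str, body: str = "", day=None) -> None:
    """Append one entry to the log; existing content is never rewritten."""
    day = day or date.today()
    with log_path.open("a", encoding="utf-8") as f:
        f.write(f"## [{day.isoformat()}] {kind} | {title}\n")
        if body:
            f.write(body.rstrip() + "\n")
        f.write("\n")


def last_entries(log_path: Path, n: int = 5) -> list[tuple[str, str, str]]:
    """Return (date, kind, title) for the last n entry headers, oldest first."""
    headers = []
    for line in log_path.read_text(encoding="utf-8").splitlines():
        match = ENTRY.match(line)
        if match:
            headers.append(match.groups())
    return headers[-n:] if n > 0 else []
```

Opening the file in append mode keeps the log append-only by construction, which is the property that makes it usable as a timeline.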
Optional: CLI tools
At some point you may want to build small tools that help the LLM operate on the wiki more efficiently. A search engine over the wiki pages is the most obvious one — at small scale the index file is enough, but as the wiki grows you want proper search. qmd is a good option: it's a local search engine for markdown files with hybrid BM25/vector search and LLM re-ranking, all on-device. It has both a CLI (so the LLM can shell out to it) and an MCP server (so the LLM can use it as a native tool). You could also build something simpler yourself — the LLM can help you vibe-code a naive search script as the need arises.
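For a sense of what a naive search script might look like, here is a sketch. It is plain keyword matching, not BM25 and not qmd: pages are ranked by how many distinct query terms they contain, with length-normalized term frequency as a tie-breaker. The function names and the scoring rule are assumptions made for illustration.

```python
import re
from collections import Counter
from pathlib import Path

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase ASCII word tokens; deliberately crude."""
    return TOKEN.findall(text.lower())


def search(wiki_dir: Path, query: str, top_k: int = 5) -> list[tuple[float, str]]:
    """Return up to top_k (score, relative path) pairs, best match first."""
    terms = set(tokenize(query))
    results = []
    for page in wiki_dir.rglob("*.md"):
        counts = Counter(tokenize(page.read_text(encoding="utf-8")))
        total = sum(counts.values())
        matched = [t for t in terms if counts[t]]
        if total == 0 or not matched:
            continue
        # Integer part: distinct query terms found. Fractional part (at most 1):
        # share of the page's tokens that are query terms.
        score = len(matched) + sum(counts[t] for t in matched) / total
        results.append((score, page.relative_to(wiki_dir).as_posix()))
    results.sort(key=lambda r: (-r[0], r[1]))
    return results[:top_k]
```

Wrapped in a small command-line entry point, something like this is enough for an agent to shell out to until the wiki outgrows it.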
Tips and tricks
- Obsidian Web Clipper is a browser extension that converts web articles to markdown. Very useful for quickly getting sources into your raw collection.
- Download images locally. In Obsidian Settings → Files and links, set "Attachment folder path" to a fixed directory (e.g. raw/assets/). Then in Settings → Hotkeys, search for "Download" to find "Download attachments for current file" and bind it to a hotkey (e.g. Ctrl+Shift+D). After clipping an article, hit the hotkey and all images get downloaded to local disk. This is optional but useful: it lets the LLM view and reference images directly instead of relying on URLs that may break. Note that LLMs can't natively read markdown with inline images in one pass; the workaround is to have the LLM read the text first, then view some or all of the referenced images separately to gain additional context. It's a bit clunky but works well enough.
- Obsidian's graph view is the best way to see the shape of your wiki: what's connected to what, which pages are hubs, which are orphans.
- Marp is a markdown-based slide deck format. Obsidian has a plugin for it. Useful for generating presentations directly from wiki content.
- Dataview is an Obsidian plugin that runs queries over page frontmatter. If your LLM adds YAML frontmatter to wiki pages (tags, dates, source counts), Dataview can generate dynamic tables and lists.
- The wiki is just a git repo of markdown files. You get version history, branching, and collaboration for free.
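The image workaround in the tips above (read the text first, then view the referenced images separately) needs a list of which images a page references. A minimal sketch, assuming standard markdown image syntax and Obsidian-style `![[file]]` embeds; `referenced_images` and the extension list are illustrative assumptions.

```python
import re
from pathlib import PurePosixPath

# Either an Obsidian embed ![[file.png|300]] or a markdown image ![alt](path "title").
IMAGE_REF = re.compile(r"!\[\[([^\]|]+)|!\[[^\]]*\]\(([^)\s]+)")
IMAGE_EXT = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg"}


def referenced_images(markdown: str) -> list[str]:
    """Image paths or URLs in document order, without duplicates."""
    seen: set[str] = set()
    ordered: list[str] = []
    for match in IMAGE_REF.finditer(markdown):
        ref = (match.group(1) or match.group(2)).strip()
        # Ignore any query string when checking the extension; this also
        # filters out embeds of other notes, which have no image suffix.
        suffix = PurePosixPath(ref.split("?")[0]).suffix.lower()
        if suffix in IMAGE_EXT and ref not in seen:
            seen.add(ref)
            ordered.append(ref)
    return ordered
```

The agent can read the page text, call this helper, and then open only the images it judges relevant to the question at hand.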
Why this works
The tedious part of maintaining a knowledge base is not the reading or the thinking — it's the bookkeeping. Updating cross-references, keeping summaries current, noting when new data contradicts old claims, maintaining consistency across dozens of pages. Humans abandon wikis because the maintenance burden grows faster than the value. LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass. The wiki stays maintained because the cost of maintenance is near zero.
The human's job is to curate sources, direct the analysis, ask good questions, and think about what it all means. The LLM's job is everything else.
The idea is related in spirit to Vannevar Bush's Memex (1945) — a personal, curated knowledge store with associative trails between documents. Bush's vision was closer to this than to what the web became: private, actively curated, with the connections between documents as valuable as the documents themselves. The part he couldn't solve was who does the maintenance. The LLM handles that.
Note
This document is intentionally abstract. It describes the idea, not a specific implementation. The exact directory structure, the schema conventions, the page formats, the tooling — all of that will depend on your domain, your preferences, and your LLM of choice. Everything mentioned above is optional and modular — pick what's useful, ignore what isn't. For example: your sources might be text-only, so you don't need image handling at all. Your wiki might be small enough that the index file is all you need, no search engine required. You might not care about slide decks and just want markdown pages. You might want a completely different set of output formats. The right way to use this is to share it with your LLM agent and work together to instantiate a version that fits your needs. The document's only job is to communicate the pattern. Your LLM can figure out the rest.
Original link: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
Plain-text version: https://gist.githubusercontent.com/karpathy/442a6bf555914893e9891c11519de94f/raw/llm-wiki.md