nomic-embed-text
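nomic-embed-text is a model for generating high-quality text embeddings.
It converts text into fixed-length vector representations, which can be used for tasks such as semantic search, text classification, and clustering.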
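To make the semantic search use case concrete, here is a minimal sketch of how embedding vectors are typically compared. It assumes cosine similarity as the measure, and the 3-dimensional vectors are toy values invented for illustration (real vectors would come from the model). The helper names `cosineSimilarity` and `rankBySimilarity` are hypothetical and not part of any library.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|). Values near 1 mean similar direction.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Convention chosen for this sketch: a zero vector is similar to nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every item against the query vector and sort best match first.
function rankBySimilarity(queryVector, items) {
  return items
    .map((item) => ({
      text: item.text,
      score: cosineSimilarity(queryVector, item.vector),
    }))
    .sort((x, y) => y.score - x.score);
}

// Toy vectors, made up for illustration only.
const items = [
  { text: "cats and dogs", vector: [0.9, 0.1, 0.0] },
  { text: "stock markets", vector: [0.0, 0.2, 0.9] },
];
const ranked = rankBySimilarity([1.0, 0.0, 0.1], items);
console.log(ranked[0].text);
```

In a real pipeline the query vector and the item vectors would both come from the same embedding model, since vectors from different models are not comparable.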
Using nomic-embed-text deployed locally with Ollama
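import { OllamaEmbeddings } from "@langchain/ollama";

// Connect to the Ollama server that hosts the model.
const embeddings = new OllamaEmbeddings({
  model: "nomic-embed-text:latest",
  baseUrl: "http://192.168.0.220:11434", // Address of the Ollama server; replace with your own host
  requestOptions: {
    useMMap: true,
    numThread: 6,
    numGpu: 1,
  },
});

const documents = ["Hello!", "abc"];
// embedDocuments returns one vector per input string.
// The result gets its own name so that it does not redeclare `embeddings`.
const vectors = await embeddings.embedDocuments(documents);
console.log(vectors);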
Generating embeddings for a local text file
1. Load the document
import { TextLoader } from "langchain/document_loaders/fs/text";

// Read a text file and return it as an array of LangChain documents.
async function load(path) {
  const loader = new TextLoader(path);
  const docs = await loader.load();
  return docs;
}
2. Split the text into chunks
import { CharacterTextSplitter } from "langchain/text_splitter";

// Split documents into chunks of up to 500 characters with a 20-character overlap.
async function split(documents) {
  const splitter = new CharacterTextSplitter({
    chunkSize: 500,
    chunkOverlap: 20,
  });
  return splitter.splitDocuments(documents);
}
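To illustrate what chunkSize and chunkOverlap mean, here is a simplified sketch that uses a fixed-width sliding window over characters. This is not how CharacterTextSplitter works internally (it splits on a separator first and then merges pieces), so treat it only as an intuition aid. The function name `chunkText` is hypothetical.

```javascript
// Simplified sliding-window chunker: each chunk has at most `chunkSize` characters,
// and consecutive chunks share `chunkOverlap` characters.
function chunkText(text, chunkSize, chunkOverlap) {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be in the range [0, chunkSize)");
  }
  const step = chunkSize - chunkOverlap; // how far the window advances each time
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

const sampleChunks = chunkText("abcdefghij", 4, 1);
console.log(sampleChunks);
```

With a chunk size of 4 and an overlap of 1, the last character of each chunk is repeated as the first character of the next one, which helps preserve context across chunk boundaries.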
3. Generate embeddings for the text chunks
import { OllamaEmbeddings } from "@langchain/ollama";

const embeddings = new OllamaEmbeddings({
  model: "nomic-embed-text:latest",
  baseUrl: "http://192.168.0.220:11434", // Address of the Ollama server; replace with your own host
  requestOptions: {
    useMMap: true,
    numThread: 6,
    numGpu: 1,
  },
});

// load() and split() are the functions defined in steps 1 and 2.
const docs = await load("说明.txt");
const splittedDocs = await split(docs);
for (const doc of splittedDocs) {
  // embedDocuments expects an array of strings, so the chunk text is wrapped in an array.
  const embedding = await embeddings.embedDocuments([doc.pageContent]);
  console.dir(embedding);
}
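The loop above only prints each vector. As a possible next step, the sketch below keeps every vector next to the chunk it came from so the results can be searched later. It assumes chunks shaped like LangChain documents (with `pageContent` and `metadata` fields) and uses stand-in data in place of real model output. The name `buildIndex` and the sample values are hypothetical.

```javascript
// Pair each chunk with its embedding vector to form a small in-memory index.
function buildIndex(chunks, vectors) {
  if (chunks.length !== vectors.length) {
    throw new Error("expected exactly one vector per chunk");
  }
  return chunks.map((chunk, i) => ({
    text: chunk.pageContent,
    metadata: chunk.metadata ?? {},
    vector: vectors[i],
  }));
}

// Stand-in data for illustration; real vectors would come from embedDocuments.
const exampleChunks = [
  { pageContent: "first chunk", metadata: { source: "example.txt" } },
  { pageContent: "second chunk", metadata: { source: "example.txt" } },
];
const exampleVectors = [
  [0.1, 0.2],
  [0.3, 0.4],
];
const index = buildIndex(exampleChunks, exampleVectors);
console.log(index.length);
```

Passing all chunk texts to a single embedDocuments call returns the vectors in input order, which is what makes pairing by position like this reasonable.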

