use-case-airflow-llm-rag-finance

https://github.com/fanqingsong/use-case-airflow-llm-rag-finance/tree/main

airflow更新文档到向量数据库，应用使用向量数据库。

LLMOps: Automatic retrieval-augmented generation with Airflow, GPT-4 and Weaviate

This repository contains the DAG code used in the LLMOps: Automatic retrieval-augmented generation with Airflow, GPT-4 and Weaviate use case. The pipeline was modelled after the Ask Astro reference architecture.

The DAGs in this repository use the following tools:

Weaviate Airflow provider

Streamlit

Weaviate

FinBERT

OpenAI GPT-4

Weaviate中的查询

https://www.studywithgpt.com/zh-cn/tutorial/pvb8jo

https://docs.weaviate.io/weaviate/search/similarity

https://docs.weaviate.io/weaviate/configuration

https://jishuzhan.net/article/1953440053019062274#google_vignette

理解 Weaviate 中的查询

在这一节中，将介绍 Weaviate 中的各种查询类型，重点关注 nearText 和 nearVector 搜索。用户将在此过程中学习如何使用 GraphQL 来检索对象，应用过滤器，并利用聚合函数来获取有关元数据的深入信息。

1. Weaviate 查询概述

Weaviate 是一个开源的图数据库，支持通过 GraphQL 进行灵活的查询。它允许用户进行语义搜索（如 nearText）和字面搜索（如 nearVector），并结合使用这些功能以更精确地检索数据对象。在 Weaviate 中，用户可以通过简单的 API 调用来执行复杂的查询。

2. 使用 `Get` 方法检索对象

要从 Weaviate 中检索数据对象，用户可以使用 Get 方法。以下是使用 Python 的示例代码：

import weaviate import json client = weaviate.Client("https://WEAVIATE_INSTANCE_URL/") # 将此处替换为你的 Weaviate 端点 some_objects = client.data_object.get() print(json.dumps(some_objects))

这段代码请求 Weaviate 返回一些 Question 类的对象。

3. 使用 `nearText` 进行向量搜索

nearText 是一种强大的向量搜索功能，允许用户根据查询概念（如“生物学”）在数据中查找最相关的对象。以下是一个使用 nearText 的示例：

import weaviate import weaviate.classes as wvc import os # 最佳实践：将凭证存储在环境变量中 wcd_url = os.environ["WCD_DEMO_URL"] wcd_api_key = os.environ["WCD_DEMO_RO_KEY"] openai_api_key = os.environ["OPENAI_APIKEY"] client = weaviate.connect_to_weaviate_cloud(  cluster_url=wcd_url, # 替换为你的 Weaviate Cloud URL  auth_credentials=wvc.init.Auth.api_key(wcd_api_key), # 替换为你的 Weaviate Cloud 密钥  headers={"X-OpenAI-Api-Key": openai_api_key} # 替换为请求 API 所需的适当头部键/值对 ) try:  questions = client.collections.get("Question")  response = questions.query.near_text(  query="biology",  limit=2  )  print(response.objects[0].properties) # 检查第一个对象的属性 finally:  client.close() # 优雅地关闭客户端

在这个例子中，用户查询了与“生物学”相关的两个问题对象。Weaviate 将通过推理 API 将文本转化为向量，并使用该向量进行搜索。

4. 使用 `nearVector` 进行自定义向量搜索

用户也可以直接输入一个向量进行搜索，例如使用自定义的外部向量化工具。这可以通过 nearVector 操作符来实现。以下是获取 OpenAI 嵌入并使用 nearVector 的示例代码：

import openai openai.api_key = "YOUR-OPENAI-API-KEY" model = "text-embedding-ada-002" oai_resp = openai.Embedding.create(input=["biology"], model=model) oai_embedding = oai_resp['data'][0]['embedding'] result = (  client.query  .get("Question", ["question", "answer"])  .with_near_vector({  "vector": oai_embedding,  "certainty": 0.7  })  .with_limit(2)  .do() ) print(json.dumps(result, indent=4))

在这个示例中，用户首先获取了“生物学”的嵌入向量，然后使用该向量进行搜索并指定相似度阈值 certainty。

posted @ 2024-11-30 22:57 lightsong 阅读(23) 评论(0) 收藏举报

刷新页面返回顶部

Stay Hungry,Stay Foolish!

lightsong

{Web: [React, Vue, NodeJS, HTTP]，DevOps:[Jenkins,Docker,K8S], Languages:[Python, JS, C, Lua, Shell, Groovy]}, AI:[LLM, langchain，langraph]

use-case-airflow-llm-rag-finance

use-case-airflow-llm-rag-finance

LLMOps: Automatic retrieval-augmented generation with Airflow, GPT-4 and Weaviate

Weaviate中的查询

理解 Weaviate 中的查询

1. Weaviate 查询概述

2. 使用 `Get` 方法检索对象

3. 使用 `nearText` 进行向量搜索

4. 使用 `nearVector` 进行自定义向量搜索

公告

Stay Hungry,Stay Foolish!

lightsong

{Web: [React, Vue, NodeJS, HTTP]，DevOps:[Jenkins,Docker,K8S], Languages:[Python, JS, C, Lua, Shell, Groovy]}, AI:[LLM, langchain，langraph]

use-case-airflow-llm-rag-finance

use-case-airflow-llm-rag-finance

LLMOps: Automatic retrieval-augmented generation with Airflow, GPT-4 and Weaviate

Weaviate中的查询

理解 Weaviate 中的查询

1. Weaviate 查询概述

2. 使用 Get 方法检索对象

3. 使用 nearText 进行向量搜索

4. 使用 nearVector 进行自定义向量搜索

公告

2. 使用 `Get` 方法检索对象

3. 使用 `nearText` 进行向量搜索

4. 使用 `nearVector` 进行自定义向量搜索