
Build a Retrieval Augmented Generation (RAG) App (web search, then answer the question)

Build a Retrieval Augmented Generation (RAG) App

https://python.langchain.com/docs/tutorials/rag/

One of the most powerful applications enabled by LLMs is sophisticated question-answering (Q&A) chatbots. These are applications that can answer questions about specific source information. These applications use a technique known as Retrieval Augmented Generation, or RAG.

This is a multi-part tutorial:

  • Part 1 (this guide) introduces RAG and walks through a minimal implementation.
  • Part 2 extends the implementation to accommodate conversation-style interactions and multi-step retrieval processes.

This tutorial will show how to build a simple Q&A application over a text data source. Along the way we’ll go over a typical Q&A architecture and highlight additional resources for more advanced Q&A techniques. We’ll also see how LangSmith can help us trace and understand our application. LangSmith will become increasingly helpful as our application grows in complexity.
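
If you want to try LangSmith tracing while following along, a minimal sketch is below (the environment variable names follow the current LangSmith docs; you need your own LangSmith API key):

import getpass
import os

os.environ["LANGSMITH_TRACING"] = "true"
if not os.environ.get("LANGSMITH_API_KEY"):
    os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter API key for LangSmith: ")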

If you're already familiar with basic retrieval, you might also be interested in this high-level overview of different retrieval techniques.

Note: Here we focus on Q&A for unstructured data. If you are interested in RAG over structured data, check out our tutorial on doing question/answering over SQL data.

Overview

A typical RAG application has two main components:

Indexing: a pipeline for ingesting data from a source and indexing it. This usually happens offline.

Retrieval and generation: the actual RAG chain, which takes the user query at run time and retrieves the relevant data from the index, then passes that to the model.

Note: the indexing portion of this tutorial will largely follow the semantic search tutorial.

The most common full sequence from raw data to answer looks like:

Indexing

  1. Load: First we need to load our data. This is done with Document Loaders.
  2. Split: Text splitters break large Documents into smaller chunks. This is useful both for indexing data and for passing it into a model, since large chunks are harder to search over and may not fit in a model's finite context window (see the short sketch after this list).
  3. Store: We need somewhere to store and index our splits, so that they can be searched over later. This is often done using a VectorStore and Embeddings model.
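
To make the Split step concrete, here is a minimal, self-contained sketch of chunking a string with RecursiveCharacterTextSplitter (the sample text and the chunk_size/chunk_overlap values are purely illustrative):

from langchain_text_splitters import RecursiveCharacterTextSplitter

sample_text = "LangChain provides document loaders, text splitters, embeddings and vector stores. " * 5
splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
chunks = splitter.split_text(sample_text)
print(f"{len(chunks)} chunks, lengths: {[len(c) for c in chunks]}")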

[Diagram: indexing pipeline (load, split, embed, store)]

Retrieval and generation

  1. Retrieve: Given a user input, relevant splits are retrieved from storage using a Retriever.
  2. Generate: A ChatModel / LLM produces an answer using a prompt that includes both the question and the retrieved data.

[Diagram: retrieval and generation (question, retrieve, prompt, answer)]

Once we've indexed our data, we will use LangGraph as our orchestration framework to implement the retrieval and generation steps.

 

 

import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
  os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain.chat_models import init_chat_model

# Chat model used by generate() further below; the tutorial uses init_chat_model,
# and gpt-4o-mini is just one example model name.
llm = init_chat_model("gpt-4o-mini", model_provider="openai")

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
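
# Optional sanity check: embed a single string (the vector length depends on the model)
vector = embeddings.embed_query("hello world")
print(len(vector))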


from langchain_core.vectorstores import InMemoryVectorStore

vector_store = InMemoryVectorStore(embeddings)

 

import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

# Load and chunk contents of the blog
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

# Index chunks
_ = vector_store.add_documents(documents=all_splits)

# Define prompt for question-answering
prompt = hub.pull("rlm/rag-prompt")
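# (Optional) hub.pull requires access to the LangChain Hub. If you prefer a local
# prompt, this paraphrase of rlm/rag-prompt (not a verbatim copy) takes the same
# {question} and {context} inputs:
#
# from langchain_core.prompts import ChatPromptTemplate
# prompt = ChatPromptTemplate.from_template(
#     "You are an assistant for question-answering tasks. Use the following "
#     "pieces of retrieved context to answer the question. If you don't know "
#     "the answer, just say that you don't know. Use three sentences maximum "
#     "and keep the answer concise.\nQuestion: {question}\nContext: {context}\nAnswer:"
# )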


# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str


# Define application steps
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}


def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}


# Compile application and test
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
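
Once compiled, the graph can be invoked with a question about the indexed blog post (this invocation comes from the LangChain tutorial):

response = graph.invoke({"question": "What is Task Decomposition?"})
print(response["answer"])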

 

RAG Bootcamp: Web Search

https://github.com/VectorInstitute/rag_bootcamp/tree/main/web_search

 

# Imports assumed by this excerpt (the full notebook in the repo sets these up
# earlier; exact package paths may differ with your LangChain version)
import requests
from bs4 import BeautifulSoup
from googlesearch import search
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA

# `query`, `EMBEDDING_MODEL_NAME` and `llm` are defined earlier in the original notebook.

# Do a Google web search and collect the text of each result page
web_documents = []
for result_url in search(query, tld="com", num=5, stop=10, pause=2):
    response = requests.get(result_url)
    soup = BeautifulSoup(response.content, 'html.parser')
    web_documents.append(soup.get_text())

# Wrap text as Document object
docs = [Document(page_content=web_txt, metadata={"source": "web"}) for web_txt in web_documents]
print(f"Number of source documents: {len(docs)}")

# Split the result text into smaller chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=32)
chunks = text_splitter.split_documents(docs)
print(f"Number of text chunks: {len(chunks)}\n")


model_kwargs = {'device': 'cuda', 'trust_remote_code': True}
encode_kwargs = {'normalize_embeddings': True} # set True to compute cosine similarity

print(f"Setting up the embeddings model...")
embeddings = HuggingFaceEmbeddings(
    model_name=EMBEDDING_MODEL_NAME,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
)


vectorstore = FAISS.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# Retrieve the most relevant context from the vector store based on the query
retrieved_docs = retriever.invoke(query)
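
# Optional: peek at the retrieved chunks before generation
for i, doc in enumerate(retrieved_docs):
    print(f"Chunk {i}: {doc.page_content[:200]}")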



rag_pipeline = RetrievalQA.from_llm(llm=llm, retriever=retriever)
result = rag_pipeline.invoke(input=query)
print(f"Result: \n\n{result['result']}")
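
If you also want to see which chunks the answer was based on, RetrievalQA accepts the standard return_source_documents option; a minimal sketch reusing the llm and retriever from above:

rag_pipeline = RetrievalQA.from_llm(llm=llm, retriever=retriever, return_source_documents=True)
result = rag_pipeline.invoke(input=query)
print(f"Result: \n\n{result['result']}")
for doc in result["source_documents"]:
    print(doc.metadata.get("source"))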

 
