deep-searcher
Open Source Deep Research Alternative to Reason and Search on Private Data. Written in Python.
https://github.com/zilliztech/deep-searcher
DeepSearcher combines cutting-edge LLMs (OpenAI o3, Qwen3, DeepSeek, Grok 3, Claude 4 Sonnet, Llama 4, QwQ, etc.) and vector databases (Milvus, Zilliz Cloud, etc.) to perform search, evaluation, and reasoning over private data, providing highly accurate answers and comprehensive reports. This project is suitable for enterprise knowledge management, intelligent Q&A systems, and information retrieval scenarios.
- Private Data Search: Maximizes the utilization of enterprise internal data while ensuring data security. When necessary, it can integrate online content for more accurate answers.
- Vector Database Management: Supports Milvus and other vector databases, allowing data partitioning for efficient retrieval.
- Flexible Embedding Options: Compatible with multiple embedding models for optimal selection.
- Multiple LLM Support: Supports DeepSeek, OpenAI, and other large models for intelligent Q&A and content generation.
- Document Loader: Supports local file loading, with web crawling capabilities under development.
https://zilliztech.github.io/deep-searcher/examples/basic_example/#overview
import logging
import os

from deepsearcher.offline_loading import load_from_local_files
from deepsearcher.online_query import query
from deepsearcher.configuration import Configuration, init_config

httpx_logger = logging.getLogger("httpx")  # disable openai's logger output
httpx_logger.setLevel(logging.WARNING)

current_dir = os.path.dirname(os.path.abspath(__file__))

config = Configuration()  # Customize your config here
init_config(config=config)

# You should clone the milvus docs repo to your local machine first, execute:
# git clone https://github.com/milvus-io/milvus-docs.git
# Then replace the path below with the path to the milvus-docs repo on your local machine
# import glob
# all_md_files = glob.glob('xxx/milvus-docs/site/en/**/*.md', recursive=True)
# load_from_local_files(paths_or_directory=all_md_files, collection_name="milvus_docs", collection_description="All Milvus Documents")

# Hint: You can also load a single file, please execute it in the root directory of the deep searcher project
load_from_local_files(
    paths_or_directory=os.path.join(current_dir, "data/WhatisMilvus.pdf"),
    collection_name="milvus_docs",
    collection_description="All Milvus Documents",
    # force_new_collection=True,  # If you want to drop origin collection and create a new collection every time, set force_new_collection to True
)

question = "Write a report comparing Milvus with other vector databases."

_, _, consumed_token = query(question, max_iter=1)
print(f"Consumed tokens: {consumed_token}")
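For reference, the Configuration object above can be customized before calling init_config to select the LLM, embedding model, and vector database. Below is a minimal sketch, assuming the set_provider_config API described in the project README; the provider names, models, and connection parameters are assumptions and may differ across versions:

from deepsearcher.configuration import Configuration, init_config

config = Configuration()
# Assumed provider names and parameters; verify against the README of your installed version.
config.set_provider_config("llm", "OpenAI", {"model": "o1-mini"})
config.set_provider_config("embedding", "OpenAIEmbedding", {"model": "text-embedding-ada-002"})
config.set_provider_config("vector_db", "Milvus", {"uri": "./milvus.db", "token": ""})
init_config(config=config)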
https://zilliz.com/blog/introduce-deepsearcher-a-local-open-source-deep-research
DeepSearcher Architecture
The architecture of DeepSearcher follows our previous post by breaking the problem up into four steps - define/refine the question, research, analyze, synthesize - although this time with some overlap. We go through each step, highlighting DeepSearcher’s improvements.
[Figure: DeepSearcher architecture]
Define and Refine the Question
Break down the original query into new sub queries: [
'How has the cultural impact and societal relevance of The Simpsons evolved from its debut to the present?',
'What changes in character development, humor, and storytelling styles have occurred across different seasons of The Simpsons?',
'How has the animation style and production technology of The Simpsons changed over time?',
'How have audience demographics, reception, and ratings of The Simpsons shifted throughout its run?']
In the design of DeepSearcher, the boundaries between researching and refining the question are blurred. The initial user query is decomposed into sub-queries, much like in the previous post; see above for the initial sub-queries produced from the query “How has The Simpsons changed over time?”. However, the research step that follows will continue to refine the question as needed.
Research and Analyze
Having broken down the query into sub-queries, the research portion of the agent begins. It has, roughly speaking, four steps: routing, search, reflection, and conditional repeat.
Routing
Our database contains multiple tables or collections from different sources. It would be more efficient if we could restrict our semantic search to only those sources that are relevant to the query at hand. A query router prompts an LLM to decide from which collections information should be retrieved.
Here is the method to form the query routing prompt:
from typing import List  # needed for the type hints below

def get_vector_db_search_prompt(
    question: str,
    collection_names: List[str],
    collection_descriptions: List[str],
    context: List[str] = None,
):
    sections = []
    # common prompt
    common_prompt = f"""You are an advanced AI problem analyst. Use your reasoning ability and historical conversation information, based on all the existing data sets, to get absolutely accurate answers to the following questions, and generate a suitable question for each data set according to the data set description that may be related to the question.
Question: {question}
"""
    sections.append(common_prompt)
    # data set prompt
    data_set = []
    for i, collection_name in enumerate(collection_names):
        data_set.append(f"{collection_name}: {collection_descriptions[i]}")
    data_set_prompt = f"""The following is all the data set information. The format of data set information is data set name: data set description.
Data Sets And Descriptions:
"""
    sections.append(data_set_prompt + "\n".join(data_set))
    # context prompt
    if context:
        context_prompt = f"""The following is a condensed version of the historical conversation. This information needs to be combined in this analysis to generate questions that are closer to the answer. You must not generate the same or similar questions for the same data set, nor can you regenerate questions for data sets that have been determined to be unrelated.
Historical Conversation:
"""
        sections.append(context_prompt + "\n".join(context))
    # response prompt
    response_prompt = f"""Based on the above, you can only select a few datasets from the following dataset list to generate appropriate related questions for the selected datasets in order to solve the above problems. The output format is json, where the key is the name of the dataset and the value is the corresponding generated question.
Data Sets:
"""
    sections.append(response_prompt + "\n".join(collection_names))
    footer = """Respond exclusively in valid JSON format matching exact JSON schema.
Critical Requirements:
- Include ONLY ONE action type
- Never add unsupported keys
- Exclude all non-JSON text, markdown, or explanations
- Maintain strict JSON syntax"""
    sections.append(footer)
    return "\n\n".join(sections)
We make the LLM return structured output as JSON in order to easily convert its output to a decision on what to do next.
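The JSON reply can then be turned into a routing decision by straightforward parsing. Below is a minimal, illustrative sketch; the helper name and fallback behavior are assumptions, not DeepSearcher's actual code:

import json
from typing import Dict

def parse_routing_response(llm_output: str) -> Dict[str, str]:
    # Parse the router LLM's JSON reply into {collection_name: generated_question}.
    # Illustrative only; the real implementation may strip markdown fences or retry on bad output.
    try:
        decision = json.loads(llm_output)
    except json.JSONDecodeError:
        return {}  # fall back to searching nothing (or everything) on malformed output
    if not isinstance(decision, dict):
        return {}
    return {name: q for name, q in decision.items() if isinstance(q, str) and q.strip()}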
Search
Having selected various database collections via the previous step, the search step performs a similarity search with Milvus. Much like the previous post, the source data has been specified in advance, chunked, embedded, and stored in the vector database. For DeepSearcher, the data sources, both local and online, must be manually specified. We leave online search for future work.
async def async_retrieve(
    self, original_query: str, **kwargs
) -> Tuple[List[RetrievalResult], int, dict]:
    max_iter = kwargs.pop("max_iter", self.max_iter)
    ### SUB QUERIES ###
    log.color_print(f"<query> {original_query} </query>\n")
    all_search_res = []
    all_sub_queries = []
    total_tokens = 0

    sub_queries, used_token = self._generate_sub_queries(original_query)
    total_tokens += used_token
    if not sub_queries:
        log.color_print("No sub queries were generated by the LLM. Exiting.")
        return [], total_tokens, {}
    else:
        log.color_print(
            f"<think> Break down the original query into new sub queries: {sub_queries}</think>\n"
        )
    all_sub_queries.extend(sub_queries)
    sub_gap_queries = sub_queries

    for iter in range(max_iter):
        log.color_print(f">> Iteration: {iter + 1}\n")
        search_res_from_vectordb = []
        search_res_from_internet = []  # TODO

        # Create all search tasks
        search_tasks = [
            self._search_chunks_from_vectordb(query, sub_gap_queries)
            for query in sub_gap_queries
        ]
        # Execute all tasks in parallel and wait for results
        search_results = await asyncio.gather(*search_tasks)
        # Merge all results
        for result in search_results:
            search_res, consumed_token = result
            total_tokens += consumed_token
            search_res_from_vectordb.extend(search_res)

        search_res_from_vectordb = deduplicate_results(search_res_from_vectordb)
        # search_res_from_internet = deduplicate_results(search_res_from_internet)
        all_search_res.extend(search_res_from_vectordb + search_res_from_internet)

        if iter == max_iter - 1:
            log.color_print("<think> Exceeded maximum iterations. Exiting. </think>\n")
            break

        ### REFLECTION & GET GAP QUERIES ###
        log.color_print("<think> Reflecting on the search results... </think>\n")
        sub_gap_queries, consumed_token = self._generate_gap_queries(
            original_query, all_sub_queries, all_search_res
        )
        total_tokens += consumed_token
        if not sub_gap_queries or len(sub_gap_queries) == 0:
            log.color_print("<think> No new search queries were generated. Exiting. </think>\n")
            break
        else:
            log.color_print(
                f"<think> New search queries for next iteration: {sub_gap_queries} </think>\n"
            )
            all_sub_queries.extend(sub_gap_queries)

    all_search_res = deduplicate_results(all_search_res)
    additional_info = {"all_sub_queries": all_sub_queries}
    return all_search_res, total_tokens, additional_info
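Under the hood, each _search_chunks_from_vectordb call amounts to a similarity search against Milvus. Below is a standalone sketch of such a search using pymilvus's MilvusClient; the connection URI, collection name, and output field are assumptions for illustration, not DeepSearcher's internals:

from pymilvus import MilvusClient

# Assumed local Milvus instance and schema; adjust URI, collection, and fields to your setup.
client = MilvusClient(uri="http://localhost:19530")

def search_chunks(query_embedding: list, top_k: int = 5) -> list:
    results = client.search(
        collection_name="milvus_docs",   # assumed collection from the loading example above
        data=[query_embedding],          # one query vector
        limit=top_k,                     # number of nearest chunks to return
        output_fields=["text"],          # assumed field holding the chunk text
    )
    # results contains one hit list per query vector
    return [hit["entity"]["text"] for hit in results[0]]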
Reflection
Unlike the previous post, DeepSearcher illustrates a true form of agentic reflection, inputting the prior outputs as context into a prompt that “reflects” on whether the questions asked so far and the relevant retrieved chunks contain any informational gaps. This can be seen as an analysis step.
Here is the method to create the prompt:
def get_reflect_prompt(
    question: str,
    mini_questions: List[str],
    mini_chuncks: List[str],
):
    mini_chunk_str = ""
    for i, chunk in enumerate(mini_chuncks):
        mini_chunk_str += f"""<chunk_{i}>\n{chunk}\n</chunk_{i}>\n"""
    reflect_prompt = f"""Determine whether additional search queries are needed based on the original query, previous sub queries, and all retrieved document chunks. If further research is required, provide a Python list of up to 3 search queries. If no further research is required, return an empty list.
If the original query is to write a report, then you prefer to generate some further queries, instead return an empty list.
Original Query: {question}
Previous Sub Queries: {mini_questions}
Related Chunks:
{mini_chunk_str}
"""
    footer = """Respond exclusively in valid List of str format without any other text."""
    return reflect_prompt + footer
Once more, we make the LLM return structured output, this time as Python-interpretable data.
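Because the model is asked to reply with a bare Python list of strings, the output can be parsed with ast.literal_eval. A minimal, illustrative sketch (not the project's exact parsing code):

import ast
from typing import List

def parse_gap_queries(llm_output: str) -> List[str]:
    # Illustrative parser for the reflection step's "List of str" output.
    try:
        parsed = ast.literal_eval(llm_output.strip())
    except (ValueError, SyntaxError):
        return []  # treat unparseable output as "no further queries"
    if isinstance(parsed, list):
        return [q for q in parsed if isinstance(q, str)]
    return []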
Here is an example of new sub-queries “discovered” by reflection after answering the initial sub-queries above:
New search queries for next iteration: [
"How have changes in The Simpsons' voice cast and production team influenced the show's evolution over different seasons?",
"What role has The Simpsons' satire and social commentary played in its adaptation to contemporary issues across decades?",
'How has The Simpsons addressed and incorporated shifts in media consumption, such as streaming services, into its distribution and content strategies?']
Conditional Repeat
Unlike our previous post, DeepSearcher illustrates conditional execution flow. After reflecting on whether the questions and answers so far are complete, the agent repeats the above steps if there are additional questions to be asked. Importantly, the execution flow (a while loop) is a function of the LLM output rather than being hard-coded. In this case there is only a binary choice: repeat the research or generate the report. In more complex agents there may be several options, such as following a hyperlink, retrieving chunks, storing in memory, or reflecting. In this way, the question continues to be refined as the agent sees fit until it decides to exit the loop and generate the report. In our Simpsons example, DeepSearcher performs two more rounds of filling the gaps with extra sub-queries.
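Schematically, the loop looks like the sketch below: the continuation condition comes from the LLM-backed reflection step rather than from hard-coded logic. The helper callables are placeholders standing in for DeepSearcher's internals, not its actual API:

from typing import Callable, List

def research_loop(
    original_query: str,
    decompose: Callable[[str], List[str]],
    search_all: Callable[[List[str]], List[str]],
    reflect: Callable[[str, List[str], List[str]], List[str]],
    max_iter: int = 3,
) -> List[str]:
    # Placeholder control flow: keep researching only while reflect() proposes new gap queries.
    retrieved_chunks: List[str] = []
    queries = decompose(original_query)
    for _ in range(max_iter):
        retrieved_chunks += search_all(queries)
        queries = reflect(original_query, queries, retrieved_chunks)
        if not queries:  # the model, not the code, decides when the loop ends
            break
    return retrieved_chunks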
Synthesize
Finally, the fully decomposed question and retrieved chunks are synthesized into a report with a single prompt. Here is the code to create the prompt:
def get_final_answer_prompt(
    question: str,
    mini_questions: List[str],
    mini_chuncks: List[str],
):
    mini_chunk_str = ""
    for i, chunk in enumerate(mini_chuncks):
        mini_chunk_str += f"""<chunk_{i}>\n{chunk}\n</chunk_{i}>\n"""
    summary_prompt = f"""You are an AI content analysis expert, good at summarizing content. Please summarize a specific and detailed answer or report based on the previous queries and the retrieved document chunks.
Original Query: {question}
Previous Sub Queries: {mini_questions}
Related Chunks:
{mini_chunk_str}
"""
    return summary_prompt
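The resulting prompt is then sent to the LLM in a single call to produce the report. A minimal sketch using the OpenAI SDK directly, purely for illustration; DeepSearcher routes this through its own LLM abstraction, and the model name here is an arbitrary assumption:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_report(question: str, mini_questions: list, mini_chuncks: list) -> str:
    # Build the synthesis prompt defined above and ask the model for the final report.
    prompt = get_final_answer_prompt(question, mini_questions, mini_chuncks)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content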
Unlike our prototype, which analyzed each question separately and simply concatenated the output, this approach produces a report whose sections are consistent with one another, i.e., it contains no repeated or contradictory information. A more complex system could combine aspects of both, using conditional execution flow to structure the report, summarize, rewrite, reflect, pivot, and so on; we leave this for future work.
Results
Here is a sample from the report generated for the query “How has The Simpsons changed over time?”, using DeepSeek-R1 with the Wikipedia page on The Simpsons as source material:
Report: The Evolution of The Simpsons (1989–Present)
1. Cultural Impact and Societal Relevance
The Simpsons debuted as a subversive critique of American middle-class life, gaining notoriety for its bold satire in the 1990s. Initially a countercultural phenomenon, it challenged norms with episodes tackling religion, politics, and consumerism. Over time, its cultural dominance waned as competitors like South Park and Family Guy pushed boundaries further. By the 2010s, the show transitioned from trendsetter to nostalgic institution, balancing legacy appeal with attempts to address modern issues like climate change and LGBTQ+ rights, albeit with less societal resonance.
…
Conclusion
The Simpsons evolved from a radical satire to a television institution, navigating shifts in technology, politics, and audience expectations. While its golden-age brilliance remains unmatched, its adaptability—through streaming, updated humor, and global outreach—secures its place as a cultural touchstone. The show’s longevity reflects both nostalgia and a pragmatic embrace of change, even as it grapples with the challenges of relevance in a fragmented media landscape.
Find the full report here, and a report produced by DeepSearcher with GPT-4o mini for comparison.
Discussion
We presented DeepSearcher, an agent for performing research and writing reports. Our system is built upon the idea in our previous article, adding features like conditional execution flow, query routing, and an improved interface. We switched from local inference with a small 4-bit quantized reasoning model to an online inference service for the massive DeepSeek-R1 model, qualitatively improving our output report. DeepSearcher works with most inference services like OpenAI, Gemini, DeepSeek and Grok 3 (coming soon!).
Reasoning models, especially as used in research agents, are inference-heavy, and we were fortunate to be able to use the fastest offering of DeepSeek-R1 from SambaNova running on their custom hardware. For our demonstration query, we made sixty-five calls to SambaNova's DeepSeek-R1 inference service, inputting around 25k tokens, outputting 22k tokens, and costing $0.30. We were impressed with the speed of inference given that the model contains 671 billion parameters and weighs in at roughly three-quarters of a terabyte. Find out more details here!
We will continue to iterate on this work in future posts, examining additional agentic concepts and the design space of research agents. In the meanwhile, we invite everyone to try out DeepSearcher, star us on GitHub, and share your feedback!
Project dockerization
https://github.com/zilliztech/deep-searcher/tree/master
https://github.com/fanqingsong/deep-searcher/tree/master
Milvus API
https://milvus.io/api-reference/restful/v2.5.x/v2/Collection%20(v2)/List.md
deep searcher doc
https://zilliztech.github.io/deep-searcher/configuration/file_loader/#docling
Milvus1
https://ds2man.github.io/posts/Milvus1/
Introduce Milvus
According to the DB-Engines Ranking website, various VectorDBs exist, including Milvus, Pinecone, and Chroma. Since I have to use a VectorDB going forward, I was uncertain about which DBMS to choose, but ultimately I selected Milvus because it ranks highest among the open-source VectorDB options. (Honestly, I wanted to use Pinecone, but I was discouraged after learning it’s a paid service…)
Functionally, Milvus offers various advantages such as high performance, scalability, diverse indexing algorithm support, easy-to-use APIs (REST API), and optimization for cloud and container environments. Honestly, there are many aspects I'm not entirely familiar with yet. As an engineer, I plan to approach it primarily from a practical usage perspective. For further details, refer to Milvus Home.
- DB-Engines Ranking of Vector DBMS (Source: DB-Engines Ranking, 2nd update: 2025-03-09)

| Rank (Mar 2025) | Rank (Feb 2025) | Rank (Mar 2024) | DBMS | Database Model | Score (Mar 2025) | Change vs Feb 2025 | Change vs Mar 2024 |
|---|---|---|---|---|---|---|---|
| 1. | 1. | 1. | Elasticsearch | Multi-model | 131.38 | -3.25 | -3.41 |
| 2. | 2. | 2. | Couchbase | Multi-model | 15.05 | -0.58 | -4.11 |
| 3. | 3. | 3. | Kdb | Multi-model | 7.10 | +0.31 | -0.59 |
| 4. | 4. | 4. | Aerospike | Multi-model | 5.22 | +0.12 | -1.30 |
| 5. | 5. | 6. | Pinecone | Vector | 3.17 | +0.22 | -0.16 |
| 6. | 6. | 9. | Milvus | Vector | 2.77 | +0.00 | +1.07 |
| 7. | 7. | 5. | DolphinDB | Multi-model | 2.29 | -0.09 | -1.85 |
| 8. | 8. | 12. | Qdrant | Vector | 2.03 | +0.10 | +0.78 |
| 9. | 9. | 7. | Chroma | Vector | 1.89 | +0.04 | -0.39 |
| 10. | 10. | 8. | Weaviate | Vector | 1.58 | -0.07 | -0.42 |
| 11. | 11. | 11. | Meilisearch | Multi-model | 1.20 | +0.07 | -0.13 |
[Figure: Milvus Architecture (Source: Milvus Architecture)]
Install Milvus, MinIO, and Attu
The Milvus GitHub Releases page is where you can check and download the official release versions of Milvus, both the latest and previous ones. It provides a docker-compose-gpu.yml file, which installs not only Milvus Standalone but also etcd and MinIO. The reason etcd and MinIO are installed together stems from Milvus's architecture: even in Standalone mode, which runs as a single instance, Milvus still requires metadata management (etcd) and object storage (MinIO).
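Once the containers are up, one quick way to verify the installation is to connect from Python and list collections. A minimal sketch, assuming Milvus Standalone is reachable on the default port 19530 exposed by the compose file:

from pymilvus import MilvusClient

# Assumes a local Milvus Standalone instance on the default gRPC port.
client = MilvusClient(uri="http://localhost:19530")
print("Milvus collections:", client.list_collections())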