完整教程:吴恩达MCP课程(5):research_server_prompt_resource.py
代码
import arxiv import json import os fromtypingimport List from mcp.server.fastmcpimportFastMCP PAPER_DIR= "papers" # Initialize FastMCP server mcp =FastMCP("research" ) @mcp.tool( ) def search_papers(topic: str ,max_results: int = 5 ) -> List[str]: """ Search for papers on arXiv based on a topic and store their information. Args: topic: The topic to search for max_results: Maximum number of results to retrieve (default: 5) Returns: List of paper IDs found in the search """ # Use arxiv to find the papersclient= arxiv.Client( ) # Search for the most relevant articles matching the queried topicsearch= arxiv.Search( query = topic,max_results=max_results,sort_by= arxiv.SortCriterion.Relevance)papers=client.results(search) # Create directory for this topic path = os.path.join(PAPER_DIR, topic.lower( ).replace(" " , "_" ) ) os.makedirs(path,exist_ok=True )file_path= os.path.join(path, "papers_info.json" ) # Try to load existing papers info try: with open(file_path, "r" ) asjson_file:papers_info= json.load(json_file) except (FileNotFoundError, json.JSONDecodeError):papers_info= { } # Process each paper and add to papers_infopaper_ids= [] for paper inpapers:paper_ids.append(paper.get_short_id( ) )paper_info= { 'title': paper.title, 'authors': [author.name forauthorin paper.authors] , 'summary': paper.summary, 'pdf_url': paper.pdf_url, 'published': str(paper.published.date( ) ) }papers_info[paper.get_short_id( )] =paper_info# Save updated papers_info to json file with open(file_path, "w" ) asjson_file: json.dump(papers_info,json_file,indent=2 ) print(f"Results are saved in:{file_path}" ) returnpaper_ids@mcp.tool( ) def extract_info(paper_id: str ) -> str: """ Search for information about a specific paper across all topic directories. Args: paper_id: The ID of the paper to look for Returns: JSON string with paper information if found, error message if not found """ for item in os.listdir(PAPER_DIR):item_path= os.path.join(PAPER_DIR, item) if os.path.isdir(item_path):file_path= os.path.join(item_path, "papers_info.json" ) if os.path.isfile(file_path): try: with open(file_path, "r" ) asjson_file:papers_info= json.load(json_file) ifpaper_idinpapers_info: return json.dumps(papers_info[paper_id] ,indent=2 ) except (FileNotFoundError, json.JSONDecodeError) as e: print(f"Error reading{file_path}: { str(e) }" ) continue return f"There's no saved information related to paper{paper_id}." @mcp.resource("papers://folders" ) def get_available_folders( ) -> str: """ List all available topic folders in the papers directory. This resource provides a simple list of all available topic folders. """folders= [] # Get all topic directories if os.path.exists(PAPER_DIR): fortopic_dirin os.listdir(PAPER_DIR):topic_path= os.path.join(PAPER_DIR,topic_dir) if os.path.isdir(topic_path):papers_file= os.path.join(topic_path, "papers_info.json" ) if os.path.exists(papers_file):folders.append(topic_dir) # Create a simple markdown listcontent= "# Available Topics\n\n" iffolders: forfolderinfolders:content+= f"- {folder}\n"content+= f"\nUse @{folder}to access papers in that topic.\n" else:content+= "No topics found.\n" returncontent@mcp.resource("papers://{topic}" ) def get_topic_papers(topic: str ) -> str: """ Get detailed information about papers on a specific topic. Args: topic: The research topic to retrieve papers for """topic_dir= topic.lower( ).replace(" " , "_" )papers_file= os.path.join(PAPER_DIR,topic_dir, "papers_info.json" ) if not os.path.exists(papers_file): return f"# No papers found for topic:{ topic }\n\nTry searching for papers on this topic first." try: with open(papers_file, 'r' ) as f:papers_data= json.load(f) # Create markdown content with paper detailscontent= f"# Papers on{ topic.replace('_' , ' ' ).title( ) }\n\n"content+= f"Total papers:{ len(papers_data) }\n\n" forpaper_id,paper_infoinpapers_data.items( ):content+= f"## {paper_info['title'] }\n"content+= f"- **Paper ID**:{paper_id}\n"content+= f"- **Authors**:{ ', '.join(paper_info['authors'] ) }\n"content+= f"- **Published**:{paper_info['published'] }\n"content+= f"- **PDF URL**: [{paper_info['pdf_url'] }]({paper_info['pdf_url'] })\n\n"content+= f"### Summary\n{paper_info['summary'][:500] }...\n\n"content+= "---\n\n" returncontentexcept json.JSONDecodeError: return f"# Error reading papers data for{ topic }\n\nThe papers data file is corrupted." @mcp.prompt( ) def generate_search_prompt(topic: str ,num_papers: int = 5 ) -> str: """Generate a prompt for Claude to find and discuss academic papers on a specific topic.""" return f"""Search for{num_papers}academic papers about '{ topic }' using the search_papers tool. Follow these instructions: 1. First, search for papers using search_papers(topic='{ topic }', max_results={num_papers}) 2. For each paper found, extract and organize the following information: - Paper title - Authors - Publication date - Brief summary of the key findings - Main contributions or innovations - Methodologies used - Relevance to the topic '{ topic }' 3. Provide a comprehensive summary that includes: - Overview of the current state of research in '{ topic }' - Common themes and trends across the papers - Key research gaps or areas for future investigation - Most impactful or influential papers in this area 4. Organize your findings in a clear, structured format with headings and bullet points for easy readability. Please present both detailed information about each paper and a high-level synthesis of the research landscape in{ topic }.""" if__name__== "__main__": # Initialize and run the server mcp.run(transport='stdio' ) 代码解释
这个research_server_prompt_resource.py文件完成了一个基于MCP(Model ContextProtocol)的研究服务器,关键用于搜索、存储和检索arXiv上的学术论文。下面是对代码的详细解释:
1. 导入和初始化
import arxiv import json import os fromtypingimport List from mcp.server.fastmcpimportFastMCP PAPER_DIR= "papers" # Initialize FastMCP server mcp =FastMCP("research" ) - 导入必要的库:
arxiv用于访问arXiv API,json用于处理JSON数据,os用于文件操作 - 定义论文存储目录为
papers - 创建一个名为"research"的FastMCP服务器实例
2. 工具函数:搜索论文
@mcp.tool( ) def search_papers(topic: str ,max_results: int = 5 ) -> List[str]: 这个函数被注册为MCP工具,用于根据主题搜索arXiv上的论文:
- 功能:搜索arXiv上与指定主题相关的论文,并将结果保存到本地
- 参数:
topic:搜索主题max_results:最大结果数(默认5篇)
- 返回值:找到的论文ID列表
核心实现:
- 使用arxiv客户端搜索相关论文
- 为主题创建目录(如果不存在)
- 尝试加载现有的论文信息(假设有)
- 处理每篇论文,提取标题、作者、摘要等信息
- 将论文信息保存到JSON文件中
3. 工具函数:提取论文信息
@mcp.tool( ) def extract_info(paper_id: str ) -> str: 这个函数被注册为MCP工具,用于根据论文ID检索特定论文的详细信息:
- 功能:在所有主题目录中搜索指定ID的论文信息
- 参数:
paper_id:要查找的论文ID - 返回值:包括论文信息的JSON字符串,或未找到时的错误消息
实现逻辑:
- 遍历
papers目录下的所有主题文件夹 - 检查每个文件夹中的
papers_info.json文件 - 如果找到匹配的论文ID,返回其详细信息
4. 资源函数:获取可用文件夹
@mcp.resource("papers://folders" ) def get_available_folders( ) -> str: 这个函数被注册为MCP资源,提供URI为papers://folders的访问点:
- 功能:列出
papers目录中所有可用的主题文件夹 - 返回值:包含所有主题列表的Markdown格式文本
实现方式:
- 扫描
papers目录,查找囊括papers_info.json文件的子目录 - 将找到的主题列表格式化为Markdown文本
- 添加使用说明(如何访问特定主题)
5. 资源函数:获取主题论文
@mcp.resource("papers://{topic}" ) def get_topic_papers(topic: str ) -> str: 这个函数被注册为MCP资源,提供动态URIpapers://{topic}的访问点:
- 功能:获取特定主题的所有论文详细信息
- 参数:
topic:要检索的研究主题 - 返回值:包含该主题所有论文详细信息的Markdown格式文本
实现细节:
- 根据主题名构建文件路径
- 检查并加载该主题的论文信息JSON文件
- 将论文信息格式化为结构化的Markdown文档
- 包含每篇论文的标题、ID、作者、发布日期、PDF链接和摘要
6. 提示函数:生成搜索提示
@mcp.prompt( ) def generate_search_prompt(topic: str ,num_papers: int = 5 ) -> str: 该函数被注册为MCP提示,用于生成结构化的搜索指令:
- 功能:生成一个提示文本,指导AI如何搜索和讨论特定主题的学术论文
- 参数:
topic:要搜索的主题num_papers:要检索的论文数量(默认5篇)
- 返回值:格式化的提示文本
提示内容包括:
- 使用
search_papers工具搜索论文的指令 - 如何提取和组织每篇论文的信息(标题、作者、发布日期等)
- 如何提供综合摘要(研究现状、共同主题、研究空白等)
- 如何组织和呈现结果
7. 主程序
if__name__== "__main__": # Initialize and run the server mcp.run(transport='stdio' ) 当脚本作为主程序运行时,启动MCP服务器,使用标准输入/输出(stdio)作为传输方式。
总结
这个服务器达成了以下核心功能:
- 论文搜索与存储:通过arXiv API搜索论文并将结果保存到本地JSON文件
- 论文信息检索:根据论文ID或主题检索论文详细信息
- 资源访问:供应URI访问点,用于获取可用主题列表和特定主题的论文信息
- 提示生成:生成结构化提示,指导AI如何搜索和分析学术论文
这个服务器设计为与MCP兼容的工具,可以被MCP客户端(如聊天机器人)调用,为用户提供学术论文搜索和分析能力。
浙公网安备 33010602011771号