Background: automating the entire research process with open-source LLMs remains largely unexplored
Task: use open-source, post-trained LLMs as agents that perform the full cycle of automated research and review (a minimal loop is sketched below)
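A minimal sketch of that cycle, with hypothetical `generate_paper` / `review_paper` stubs standing in for the actual CycleResearcher and CycleReviewer models (whose real interfaces are not specified here):

```python
# Minimal sketch of the automated research-review cycle.
# generate_paper / review_paper are hypothetical stubs standing in for
# CycleResearcher and CycleReviewer; the real models are post-trained LLMs.

def generate_paper(topic: str, feedback: str = "") -> str:
    """Stub for CycleResearcher: draft (or revise) a paper on a topic."""
    return f"Draft on {topic} (feedback addressed: {feedback or 'none'})"

def review_paper(paper: str) -> tuple[float, str]:
    """Stub for CycleReviewer: return a (score, feedback) pair."""
    return 5.36, "Strengthen the ablation study."

def run_cycle(topic: str, accept_bar: float = 5.69, max_rounds: int = 3):
    """Generate, review, and revise until the score clears the bar."""
    paper = generate_paper(topic)
    score, feedback = review_paper(paper)
    for _ in range(max_rounds - 1):
        if score >= accept_bar:
            break
        paper = generate_paper(topic, feedback=feedback)
        score, feedback = review_paper(paper)
    return paper, score

print(run_cycle("efficient attention"))
```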
Tools
Automatic Researcher: CycleResearcher
Experiment:
Results:
papers generated by CycleResearcher achieved an average score of 5.36 in simulated peer review, surpassing the 5.24 average of human-written preprints and approaching the 5.69 average of accepted papers
Automatic Reviewer: CycleReviewer
Experiment:
Results:
CycleReviewer achieved a 26.89% reduction in mean absolute error (MAE) versus individual human reviewers in predicting paper scores (see the sketch below)
LLMs can surpass expert-level performance in research evaluation
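For context, MAE here is the mean absolute gap between predicted and actual paper scores; a small sketch with made-up numbers (not the paper's data) shows how the metric and the relative improvement are computed:

```python
def mae(predicted: list[float], actual: list[float]) -> float:
    """Mean absolute error between predicted and actual paper scores."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Made-up scores for illustration only; not the Review-5K data.
actual = [5.5, 4.0, 6.5, 3.5]    # ground-truth final scores
human = [6.0, 5.0, 5.5, 4.5]     # one human reviewer's predictions
model = [5.75, 4.25, 6.0, 4.0]   # reviewer model's predictions

human_mae, model_mae = mae(human, actual), mae(model, actual)
improvement = (human_mae - model_mae) / human_mae * 100  # % MAE reduction
print(f"human MAE {human_mae:.3f}, model MAE {model_mae:.3f}, "
      f"improvement {improvement:.2f}%")
```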
Benchmarks:
Review-5K: a dataset capturing real-world peer review dynamics
Research-14K: a dataset of real-world machine learning research papers (loader sketch below)
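If the benchmarks ship as JSONL, a loader might look like the sketch below; the `paper`, `reviews`, and `final_score` field names are hypothetical, not the datasets' actual schema:

```python
import json
from typing import Iterator

def load_benchmark(path: str) -> Iterator[tuple[str, list[float], float]]:
    """Yield (paper_text, reviewer_scores, final_score) records.

    Hypothetical JSONL layout; consult the released datasets for the
    real field names.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            yield record["paper"], record["reviews"], record["final_score"]
```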