mingyin0312/RL4GenomeBench

Toward Scientific Reasoning in LLMs: Training from Expert Discussions via Reinforcement Learning

🧬 A New Benchmark Genome-Bench and RL Fine-Tuning for Scientific Reasoning 📊

Overview

We introduce Genome-Bench, a novel benchmark for evaluating and improving scientific reasoning in large language models. Genome-Bench consists of over 3,000 multiple-choice and QA items derived from CRISPR-related scientific discussions and forum threads, covering key topics in genome engineering, experimental design, and error analysis.

Our RL training pipeline, based on Group Relative Policy Optimization (GRPO), improves model performance across expert-labeled evaluation sets. For example, our fine-tuned Qwen2.5-7B model exceeds GPT-4o in accuracy and consistency on multi-hop reasoning tasks.
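
To make the "group relative" part concrete, here is a minimal sketch of how advantages can be computed within one group of sampled answers to the same question. The binary correctness reward, the `eps` term, and the use of the population standard deviation are assumptions for illustration; `training/rl_training.py` may do this differently.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for the same prompt.

    Each completion is scored relative to its siblings, so no separate
    value network is needed to provide a baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one multiple-choice question:
# 1.0 = correct option chosen, 0.0 = wrong option (hypothetical reward).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers receive a positive advantage and wrong ones a negative advantage. When every answer in the group gets the same reward, all advantages are zero and that group contributes no learning signal.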


Getting Started 🎯

Installation

git clone https://github.com/mingyin0312/RL4GenomeBench.git
cd RL4GenomeBench
pip install -r requirements.txt

Dataset Preparation

We provide tools to parse .mbox email archives and convert them into standardized MCQ and QA formats.

cd dataset_pipeline
python 1_email_parse.py
python 2_convert_MCQ_full.py
python 3_dataset_prepare.py
python 4_convert_natural_question.py
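
For orientation, the first step (reading an .mbox archive) can be sketched with Python's standard `mailbox` module. This is an illustrative sketch, not the code in `1_email_parse.py`; the function name `read_mbox` and the output fields are hypothetical.

```python
import mailbox


def read_mbox(path):
    """Return one dict per message with subject, sender, and plain-text body."""
    records = []
    # create=False raises an error instead of silently creating an empty file.
    for msg in mailbox.mbox(path, create=False):
        body = ""
        # walk() also covers non-multipart messages (it yields the message itself).
        for part in msg.walk():
            if part.get_content_type() == "text/plain":
                payload = part.get_payload(decode=True)
                if payload is not None:
                    charset = part.get_content_charset() or "utf-8"
                    body = payload.decode(charset, errors="replace")
                    break  # keep only the first plain-text part
        records.append({
            "subject": msg.get("Subject", ""),
            "from": msg.get("From", ""),
            "body": body.strip(),
        })
    return records
```

Calling `read_mbox("archive.mbox")` (a placeholder path) yields a list of dictionaries that later steps could turn into question and answer items.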

Training

Reinforcement Fine-tuning (GRPO)

python training/rl_training.py 

Supervised Fine-Tuning (SFT)

python training/sft_training.py 

Multi-Agent RL Routing

python training/rl_router_training.py 

Evaluation

To evaluate on the Genome-Bench test data:

python evaluation/genome-bench_eval.py 
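
As a rough illustration of multiple-choice scoring, the sketch below extracts an option letter from each model response and computes accuracy. The `Answer: X` response format, the A to D option range, and both function names are assumptions; the actual evaluation script may parse responses differently.

```python
import re

# Assumed response format: the model states its choice as "Answer: <letter>".
ANSWER_RE = re.compile(r"Answer:\s*([A-D])\b")


def extract_choice(response):
    """Return the last option letter stated in a response, or None if absent."""
    matches = ANSWER_RE.findall(response)
    return matches[-1] if matches else None


def accuracy(responses, gold):
    """Fraction of responses whose extracted choice equals the gold letter."""
    if not gold or len(responses) != len(gold):
        raise ValueError("responses and gold must be non-empty and equal length")
    correct = sum(extract_choice(r) == g for r, g in zip(responses, gold))
    return correct / len(gold)


responses = [
    "The guide RNA targets exon 2, so... Answer: B",
    "Answer: C",
    "I am not sure.",
]
score = accuracy(responses, ["B", "A", "D"])
```

A response with no parsable answer counts as incorrect, which keeps the metric conservative.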

Citation

@article{yin2025genome,
  title={Toward Scientific Reasoning in LLMs: Training from Expert Discussions via Reinforcement Learning},
  author={Yin, Ming and Qu, Yuanhao and Ling, Yang and Cong, Le and Wang, Mengdi},
  journal={arXiv preprint arXiv:2505.19501},
  year={2025}
}

Acknowledgement

This project leverages the 🤗 Transformer Reinforcement Learning (TRL) library, which provides tools for fine-tuning large language models with reinforcement learning techniques such as GRPO.

About

Official implementation for the paper "Toward Scientific Reasoning in LLMs: Training from Expert Discussions via Reinforcement Learning"
