Yuchen Zeng*1,2, Shuibai Zhang*1, Wonjun Kang*3,4, Shutong Wu1, Lynnix Zou1, Ying Fan1,2, Heeju Kim3, Ziqian Lin1, Jungtaek Kim1, Hyung Il Koo3, Dimitris Papailiopoulos1,2, Kangwook Lee1,5
*Equal contribution. Affiliations: 1 University of Wisconsin-Madison, 2 Microsoft Research, 3 FuriosaAI, 4 Seoul National University, 5 Krafton
Abstract: Large Reasoning Models (LRMs) are Large Language Models (LLMs) explicitly trained to generate long-form Chain-of-Thoughts (CoTs), achieving impressive success on challenging tasks like math and programming. However, their underlying reasoning "algorithms" remain poorly understood. To investigate this, we propose ReJump, which represents a reasoning trace as a visitation order over nodes in a tree of intermediate problem-solving steps. Transitions between nodes, which we term jumps, include adjacent moves that capture behaviors such as calculation, and non-adjacent moves that capture behaviors such as backtracking and verification. ReJump enables analyzing LLM reasoning with diverse metrics that quantify exploration, exploitation, overthinking, forgetting, and verification. Using our proposed LLM agent to extract reasoning traces into ReJump format, we evaluate state-of-the-art LRMs on two tasks and find that models with similar accuracy can exhibit distinct reasoning behaviors, while different tasks favor different reasoning styles (e.g., varying balance between exploration and exploitation). To further understand how learning strategies shape reasoning, we use ReJump to compare distilled LRMs with their teachers, compare CoT-prompted LLMs with LRMs, and examine how reinforcement learning affects reasoning behavior. Finally, we show that ReJump can improve reasoning quality at test time through strategies such as ReJump-guided Best-of-N selection and prompt selection.
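To make the tree-and-jump representation described in the abstract concrete, here is a minimal Python sketch. It is not the extraction code from this repository. The toy tree, the trace, the `Jump` class, and the `classify_jumps` helper are all hypothetical, and "adjacent" is assumed here to mean that the two nodes share a tree edge; the paper's exact definition may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Jump:
    src: int
    dst: int
    adjacent: bool  # assumption: True when src and dst share a tree edge


def classify_jumps(parent: Dict[int, Optional[int]], visits: List[int]) -> List[Jump]:
    """Label each transition in a visitation order as adjacent or non-adjacent."""
    jumps = []
    for src, dst in zip(visits, visits[1:]):
        adjacent = parent.get(dst) == src or parent.get(src) == dst
        jumps.append(Jump(src, dst, adjacent))
    return jumps


# Hypothetical toy tree: node 0 is the root, nodes 1 and 2 are
# alternative first steps, and node 3 continues from node 1.
parent = {0: None, 1: 0, 2: 0, 3: 1}

# Hypothetical trace: follow 0 -> 1 -> 3, return to the root, then try node 2.
visits = [0, 1, 3, 0, 2]

jumps = classify_jumps(parent, visits)
for j in jumps:
    kind = "adjacent" if j.adjacent else "non-adjacent"
    print(f"{j.src} -> {j.dst}: {kind}")
```

In this toy trace, the move from node 3 back to the root is the only non-adjacent jump, which is the kind of transition the abstract associates with backtracking.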
Links: Paper (arXiv) | OpenReview
- [May 2026] Our paper has been accepted to ICML 2026!
- [Dec 2025] Our paper is available on arXiv.
- Step 1: Set Up Environment
- Step 2: Collect LLM Responses on MATH-500, Game of 24, and Sudoku
- Step 3: Perform Reasoning Analysis via ReJump
To set up the environment for ReJump extraction, analysis, and experiment scripts, follow these steps on Linux.
- Clone this repository.

      git clone https://github.com/UW-Madison-Lee-Lab/ReJump.git
      cd ReJump

- Install dependencies.

      # create the environment that works for all experiments in our paper
      conda env create -f conda_env/rejump.yml
      conda activate rejump
      pip install -e .

- Create a local environment.py at the repository root. This file is ignored by git and must never be committed.

      cp environment.example.py environment.py
      # Fill in only the keys/paths needed for the scripts you plan to run.

  Important: do not commit environment.py. It contains local API keys and machine-specific paths. The checked-in environment.example.py contains placeholders only.
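As an optional sanity check before committing, the sketch below reports whether environment.py exists and whether git treats it as ignored. It is not part of this repository. The helper name `environment_file_status` is hypothetical, and the check relies only on the standard `git check-ignore` command.

```python
import subprocess
from pathlib import Path


def environment_file_status(repo_root: str) -> dict:
    """Report whether environment.py exists and whether git ignores it."""
    env_path = Path(repo_root) / "environment.py"
    status = {"exists": env_path.is_file(), "ignored": None}
    if not status["exists"]:
        return status
    try:
        # `git check-ignore -q <path>` exits 0 if the path is ignored
        # and 1 if it is not; other codes indicate an error.
        result = subprocess.run(
            ["git", "check-ignore", "-q", "environment.py"],
            cwd=repo_root,
            capture_output=True,
        )
    except FileNotFoundError:
        return status  # git is not installed; leave "ignored" as None
    if result.returncode in (0, 1):
        status["ignored"] = result.returncode == 0
    return status


if __name__ == "__main__":
    # Run from the repository root.
    print(environment_file_status("."))
```

An "ignored" value of False means the file could be staged by accident; None means the check could not be performed.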
Check constants.py for all supported LLMs.
python -m run_exps.create_exps \
    --dataset math500 \
    --model <model_name> \
    --mode reasoning \
    --shot 0 \
    --n_samples 500 \
    --n_query 1 \
    --exp_name <exp_name> \
    --temperature <temperature>
bash run_exps/auto/run_all_<exp_name>.sh

python -m run_exps.create_exps \
    --dataset game24 \
    --model <model_name> \
    --mode reasoning \
    --shot 0 \
    --n_samples 100 \
    --n_query 1 \
    --exp_name <exp_name> \
    --temperature <temperature>
bash run_exps/auto/run_all_<exp_name>.sh

python -m run_exps.create_exps \
    --dataset sudoku \
    --model <model_name> \
    --mode reasoning \
    --shot 0 \
    --n_samples 100 \
    --n_query 1 \
    --exp_name <exp_name> \
    --temperature <temperature>
bash run_exps/auto/run_all_<exp_name>.sh
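The three collection commands differ only in the dataset name and the sample count, so they can be generated from one small table. The sketch below is a convenience wrapper, not a script shipped with this repository. The `create_exps_command` helper, the model name "my-model", the experiment names, and the temperature 0.6 are hypothetical placeholders; the flags and sample counts are copied from the commands above.

```python
import shlex
from typing import List

# Sample counts taken from the commands above.
N_SAMPLES = {"math500": 500, "game24": 100, "sudoku": 100}


def create_exps_command(dataset: str, model: str, exp_name: str,
                        temperature: float) -> List[str]:
    """Build the argv for `python -m run_exps.create_exps` for one dataset."""
    return [
        "python", "-m", "run_exps.create_exps",
        "--dataset", dataset,
        "--model", model,
        "--mode", "reasoning",
        "--shot", "0",
        "--n_samples", str(N_SAMPLES[dataset]),
        "--n_query", "1",
        "--exp_name", exp_name,
        "--temperature", str(temperature),
    ]


for dataset in N_SAMPLES:
    # "my-model", the experiment name, and 0.6 are placeholders only.
    exp_name = f"{dataset}_run"
    print(shlex.join(create_exps_command(dataset, "my-model", exp_name, 0.6)))
    print(f"bash run_exps/auto/run_all_{exp_name}.sh")
```

The script only prints the commands, so they can be reviewed before being run by hand.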
python -m rejump_extractor.tree_vis_math_v3 \
    --dataset_name math500 \
    --model_name <model_name> \
    --temperature <temperature> \
    --num_samples 500 \
    --wandb

python -m rejump_extractor.tree_vis_game24 \
    --dataset_name game24 \
    --model_name <model_name> \
    --temperature <temperature> \
    --num_samples 100 \
    --wandb

python -m rejump_extractor.tree_vis_sudoku \
    --dataset_name sudoku \
    --model_name <model_name> \
    --temperature <temperature> \
    --num_samples 100 \
    --wandb
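Each dataset uses its own extractor module, and the sample counts match those in the collection commands. The sketch below records that mapping in one place and prints the corresponding analysis commands. It is an illustration, not code from this repository. The `extractor_command` helper, the model name "my-model", and the temperature 0.6 are hypothetical placeholders; module names and flags are copied from the commands above.

```python
import shlex
from typing import List

# Extractor module and sample count per dataset, as in the commands above.
EXTRACTORS = {
    "math500": ("rejump_extractor.tree_vis_math_v3", 500),
    "game24": ("rejump_extractor.tree_vis_game24", 100),
    "sudoku": ("rejump_extractor.tree_vis_sudoku", 100),
}


def extractor_command(dataset: str, model: str, temperature: float) -> List[str]:
    """Build the argv for the ReJump extractor of one dataset."""
    module, num_samples = EXTRACTORS[dataset]
    return [
        "python", "-m", module,
        "--dataset_name", dataset,
        "--model_name", model,
        "--temperature", str(temperature),
        "--num_samples", str(num_samples),
        "--wandb",
    ]


for dataset in EXTRACTORS:
    # "my-model" and 0.6 are placeholders; use the values from your collection run.
    print(shlex.join(extractor_command(dataset, "my-model", 0.6)))
```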