Turning the Tide: Repository-based Code Reflection

This repository contains the benchmark data for LiveRepoReflection, a project designed to evaluate the ability of Code Large Language Models (LLMs) to understand, modify, and reflect upon code within a multi-file repository context.

For the full evaluation framework, data generation pipeline, and research paper, please visit the main project repository: LiveRepoReflection/LiveRepoReflection-Project.

Benchmark Data Structure

The benchmark data is organized by programming language. Each language directory contains a set of test cases, where each test case is a self-contained programming problem within its own directory.

The directory structure for the six supported programming languages is as follows:

  .
  ├── cpp/
  │   └── exercises/practice/
  │       └── <uuid_id>/<test_case_id>/
  │           ├── .docs/
  │           ├── .meta/
  │           ├── <source_files>.cpp
  │           └── <test_files>.cpp
  ├── go/
  │   └── exercises/practice/
  │       └── <uuid_id>/<test_case_id>/
  │           ├── .docs/
  │           ├── .meta/
  │           ├── <source_files>.go
  │           └── <test_files>.go
  ├── java/
  │   └── exercises/practice/
  │       └── <uuid_id>/<test_case_id>/
  │           └── src/
  │               ├── main/java/
  │               └── test/java/
  ├── javascript/
  │   └── exercises/practice/
  │       └── <uuid_id>/<test_case_id>/
  │           ├── .docs/
  │           ├── .meta/
  │           ├── <source_files>.js
  │           └── <test_files>.js
  ├── python/
  │   └── exercises/practice/
  │       └── <uuid_id>/<test_case_id>/
  │           ├── .docs/
  │           ├── .meta/
  │           ├── <source_files>.py
  │           └── <test_files>.py
  └── rust/
      └── exercises/practice/
          └── <uuid_id>/<test_case_id>/
              ├── .docs/
              ├── .meta/
              └── src/
                  ├── lib.rs
                  └── main.rs

Inside a Test Case

Each <test_case_id> directory represents a unique problem and contains:

Source Files: One or more source code files (e.g., network_distance.py) that contain the code to be analyzed or modified by the LLM.
Test Files: Corresponding unit test files (e.g., network_distance_test.py) that define the correctness criteria for the task. The LLM's goal is to modify the source files so that these tests pass.
.docs/: (Optional) Directory containing documentation or hints related to the problem.
.meta/: (Optional) Directory containing metadata about the test case.
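The contents of a single test case can be inspected with a short script. This is a hedged sketch rather than the project's own tooling: `describe_test_case` is a hypothetical helper, and the rule "a file whose name contains `test` is a test file" is an assumption drawn only from the `network_distance_test.py` example above. Other languages may follow different naming conventions.

```python
from pathlib import Path


def describe_test_case(case_dir, extension=".py"):
    """Summarize one <test_case_id> directory.

    Assumption: test files are those whose stem contains "test"
    (as in network_distance_test.py); everything else with the
    given extension is treated as a source file.
    """
    case_dir = Path(case_dir)
    sources, tests = [], []
    for path in sorted(case_dir.rglob(f"*{extension}")):
        rel = path.relative_to(case_dir)
        # Ignore anything inside hidden directories such as .docs/ and .meta/.
        if not path.is_file() or any(part.startswith(".") for part in rel.parts):
            continue
        target = tests if "test" in path.stem.lower() else sources
        target.append(rel.as_posix())
    return {
        "sources": sources,
        "tests": tests,
        "has_docs": (case_dir / ".docs").is_dir(),  # optional directory
        "has_meta": (case_dir / ".meta").is_dir(),  # optional directory
    }
```

Because `.docs/` and `.meta/` are optional, the sketch reports their presence as booleans instead of failing when they are missing.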

How to Use the Benchmark

To run the evaluation on this data, use the automated framework provided in the LiveRepoReflection-Project repository. The README.md in that repository gives detailed instructions for running the benchmarks.

How to Cite

If you use LiveRepoReflection in your research, please cite our paper:

@misc{zhang2025LiveRepoReflection,
      title={Turning the Tide: Repository-based Code Reflection}, 
      author={Wei Zhang and Jian Yang and Jiaxi Yang and Ya Wang and Zhoujun Li and Zeyu Cui and Binyuan Hui and Junyang Lin},
      year={2025},
      eprint={2507.09866},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2507.09866}, 
}
