This repository contains the official implementation of HiFL, a hierarchical fault localization method designed to improve the accuracy of Large Language Models (LLMs) in identifying bugs within large code repositories.
LLMs are powerful, but they often struggle to pinpoint the exact location of a bug in a large project because the whole codebase does not fit in their context window. HiFL addresses this by decomposing the problem into a three-stage hierarchy:
- File-Level Localization: Identifies the most likely files related to a bug.
- Function-Level Localization: Narrows the search to specific functions or code blocks within those files.
- Line-Level Localization: Pinpoints the exact lines of code that need to be fixed.
At each stage, a specialized reward model (HiLoRM) evaluates multiple candidates generated by an LLM and selects the most promising one, significantly boosting localization accuracy.
In our experiments, this approach improves line-level recall by 12% over baseline models on SWE-bench Lite.
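At its core, each stage performs a best-of-N selection: the LLM proposes several candidates and the reward model keeps the highest-scoring one. The sketch below shows only that selection step. It assumes candidates arrive as tab-separated `candidate<TAB>score` lines, and the scores are made-up placeholders rather than real HiLoRM outputs. The function name `select_best` is hypothetical and is not part of the HiFL codebase.

```shell
# Hypothetical helper (not part of the HiFL codebase): reads
# "candidate<TAB>score" lines on stdin and prints the highest-scoring candidate.
# In HiFL the scores would come from HiLoRM; here they are supplied by hand.
select_best() {
  # Ties keep the earlier candidate because only a strictly greater score replaces it.
  awk -F '\t' 'NR == 1 || $2 + 0 > best { best = $2 + 0; cand = $1 } END { print cand }'
}

# Example with made-up scores:
printf 'src/a.py\t0.42\nsrc/b.py\t0.91\nsrc/c.py\t0.13\n' | select_best
# prints: src/b.py
```

At the function and line levels a "candidate" would be a set of locations rather than a single path, but the selection logic is the same.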
The HiLoRM reward model was trained on HiFL-44k, a dataset of approximately 44,000 fault localization instances. You can access it on Hugging Face:
First, clone the repository and install the required dependencies:
git clone https://github.com/your-username/HiFL-Method.git
cd HiFL-Method
pip install -r requirements.txt
Before running, configure the following environment variables. You can add them to your .bashrc or .zshrc file, or create a .env file in the project root.
# Your OpenAI API key for accessing LLMs
export OPENAI_API_KEY='your-api-key-here'
# Add the project directory to your Python path (run this from the repository root; in .bashrc or .zshrc, use an absolute path instead of $(pwd))
export PYTHONPATH=$PYTHONPATH:$(pwd)
# Path to the directory containing pre-processed repository structures
# This is required for the model to understand the layout of the codebases you are analyzing.
export PROJECT_FILE_LOC="/path/to/your/repo_structures"
The core workflow of HiFL is a three-stage process. Run the stages in order to localize a bug from the repository level down to the specific lines of code.
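Before starting Stage 1, it may be worth confirming that the environment variables above are actually set. The function below is a hypothetical convenience check, not a script shipped with the repository. It only tests that each variable is non-empty and that `PROJECT_FILE_LOC` points to an existing directory.

```shell
# Hypothetical preflight check (not part of the HiFL repository):
# verifies the environment variables described above before running any stage.
check_hifl_env() {
  missing=0
  for var in OPENAI_API_KEY PYTHONPATH PROJECT_FILE_LOC; do
    # Indirect lookup of the variable's value, portable to POSIX sh.
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "error: $var is not set" >&2
      missing=1
    fi
  done
  # The repository-structure path must exist as a directory.
  if [ -n "${PROJECT_FILE_LOC:-}" ] && [ ! -d "$PROJECT_FILE_LOC" ]; then
    echo "error: PROJECT_FILE_LOC ($PROJECT_FILE_LOC) is not a directory" >&2
    missing=1
  fi
  return "$missing"
}
```

Running `check_hifl_env` returns a non-zero status and lists every problem it finds, so it can gate a script with `check_hifl_env || exit 1`.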
Stage 1: File-Level Localization
Goal: Identify a small set of files most likely to contain the bug.
This stage runs in four steps:
- Localize Relevant Files: Use the LLM to find files that are likely related to the bug.
- Localize Irrelevant Files: Use the LLM to identify files that are unlikely to be related, helping to prune the search space.
- Retrieve by Similarity: Use embedding-based retrieval over the files that were not pruned as irrelevant to find additional candidates the LLM may have missed.
- Merge File Lists: Combine the results into a final candidate file list.
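Step 4 combines two ranked file lists into one. The exact rule used by `hifl/fl/combine.py` is not documented here, so the following is only a sketch under an assumption: model-ranked files take priority, retrieval-ranked files fill the remaining slots, duplicates are dropped while preserving order, and the result is cut to `top_n` entries (mirroring the `--top_n` flag in the commands below). The function name `merge_file_lists` and the one-path-per-line input format are hypothetical.

```shell
# Hypothetical sketch of a ranked-list merge (the real combine.py rule may differ):
#   $1 = file with model-ranked paths (one per line, best first)
#   $2 = file with retrieval-ranked paths (one per line, best first)
#   $3 = number of files to keep (top_n)
merge_file_lists() {
  # cat preserves priority: model results first, then retrieval results.
  # awk skips blank lines and drops any path already seen, keeping its first position.
  cat "$1" "$2" | awk 'NF && !seen[$0]++' | head -n "$3"
}
```

For example, with a model list of `a.py`, `b.py` and a retrieval list of `b.py`, `c.py`, `d.py`, `merge_file_lists model.txt retrieval.txt 3` prints `a.py`, `b.py`, and `c.py`.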
# 1. Localize relevant files
python hifl/fl/localize.py \
--model "gpt-4-turbo" \
--reward_model "lapsel/HiLoRM" \
--reward_model_type "generate" \
--file_level \
--output_folder "results/file_level" \
--num_threads 10 \
--sample 3 \
--skip_existing
# 2. Localize irrelevant files (to narrow the search)
python hifl/fl/localize.py \
--model "gpt-4-turbo" \
--reward_model "lapsel/HiLoRM" \
--reward_model_type "generate" \
--file_level \
--irrelevant \
--output_folder "results/file_level_irrelevant" \
--num_threads 15 \
--sample 3 \
--skip_existing
# 3. Retrieve related code via similarity
python hifl/fl/retrieve.py \
--index_type "simple" \
--filter_type "given_files" \
--filter_file "results/file_level_irrelevant/loc_outputs.jsonl" \
--output_folder "results/retrieval_embedding" \
--persist_dir "embedding/swe-bench_simple" \
--num_threads 10
# 4. Merge the file lists
python hifl/fl/combine.py \
--retrieval_loc_file "results/retrieval_embedding/retrieve_locs.jsonl" \
--model_loc_file "results/file_level/loc_outputs.jsonl" \
--top_n 3 \
--output_folder "results/file_level_combined"
Stage 2: Function-Level Localization
Goal: Pinpoint relevant functions and code elements within the files identified in Stage 1.
This command takes the combined file list from the previous stage as input.
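Because this stage reads Stage 1's output through `--start_file`, it can help to confirm that the file exists and is non-empty before launching a long run. The guard below is hypothetical and not part of HiFL. It assumes the output is JSON Lines with one record per line, as the `.jsonl` extension suggests, and prints the record count.

```shell
# Hypothetical guard (not part of HiFL): checks that a stage's input file
# exists and is non-empty, then prints how many JSONL records it holds.
check_stage_input() {
  if [ ! -s "$1" ]; then
    echo "error: $1 is missing or empty; run the previous stage first" >&2
    return 1
  fi
  # Count non-empty lines, assuming one JSON record per line.
  grep -c . "$1"
}
```

For this stage you would call `check_stage_input results/file_level_combined/combined_locs.jsonl`; the same check applies to `results/related_elements/loc_outputs.jsonl` before Stage 3.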
python hifl/fl/localize.py \
--related_level \
--model "gpt-4-turbo" \
--reward_model "lapsel/HiLoRM" \
--reward_model_type "generate" \
--output_folder "results/related_elements" \
--top_n 3 \
--compress \
--start_file "results/file_level_combined/combined_locs.jsonl" \
--num_threads 15 \
--skip_existing \
--sample 3
Stage 3: Line-Level Localization
Goal: Identify the exact lines of code that need to be edited to fix the bug.
This final stage uses the function-level results to find the precise edit locations.
python hifl/fl/localize.py \
--fine_grain_line_level \
--model "gpt-4-turbo" \
--reward_model "lapsel/HiLoRM" \
--reward_model_type "generate" \
--output_folder "results/edit_location_samples" \
--top_n 3 \
--compress \
--temperature 1 \
--num_samples 1 \
--start_file "results/related_elements/loc_outputs.jsonl" \
--num_threads 12 \
--skip_existing \
--sample 3