This repository is the official implementation of the ICLR 2026 paper “Latent Veracity Inference for Identifying Errors in Stepwise Reasoning.”

    pip install -r requirements.txt

Key dependencies (installed automatically):
- torch — latest stable PyTorch
- transformers — Hugging Face Transformers
- datasets — Hugging Face Datasets
- vllm — optional fast inference backend
- huggingface_hub — model hosting
- wandb — optional experiment logging
- peft — LoRA (Low-Rank Adaptation)
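
Since `vllm` and `wandb` are listed as optional, it can help to check which packages are importable before launching a run. The following is a minimal sketch using only the standard library. The helper name `check_dependencies` and the required/optional grouping are assumptions for illustration and are not part of this repository.

```python
from importlib.util import find_spec

# Import names taken from the dependency list above.
# The split into required and optional follows the "optional" notes in that list.
REQUIRED = ["torch", "transformers", "datasets", "huggingface_hub", "peft"]
OPTIONAL = ["vllm", "wandb"]


def check_dependencies(names):
    """Map each top-level package name to True if it can be located for import."""
    return {name: find_spec(name) is not None for name in names}


for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
    for name, found in check_dependencies(names).items():
        print(f"{label:8s} {name:16s} {'ok' if found else 'missing'}")
```

`find_spec` only locates a package without importing it, so the check stays fast even for heavy libraries such as `torch`.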

    bash run_scripts/baseline.sh
    bash run_scripts/veracity_search.sh
    bash run_scripts/train_avi.sh

You can edit each script or pass --model_name to change the LLM (default: Qwen3-4B).
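
For readers wiring up their own launcher, the `--model_name` override could be parsed roughly as in the sketch below. Only the flag name and the `Qwen3-4B` default come from the text above. The function `build_parser` and the example model ID are hypothetical, and the repository's entry points may define the flag differently.

```python
import argparse


def build_parser():
    """Hypothetical parser; only --model_name and its default come from the README."""
    parser = argparse.ArgumentParser(description="Illustrative launcher arguments")
    parser.add_argument(
        "--model_name",
        default="Qwen3-4B",  # README default; the real scripts may use a full hub ID
        help="Name of the LLM to load",
    )
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--model_name", "my-org/my-model"])
print(f"Using model: {args.model_name}")
```

Calling `parse_args([])` instead falls back to the default model name.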
veracity_inference/
├── baselines/
│   ├── baseline_utils/
│   │   ├── fewshot.py
│   │   ├── incontext_prompt.py
│   │   ├── logic_prompt.py
│   │   ├── math_prompt.py
│   │   ├── schema.py
│   │   ├── vllm_helper.py
│   │   └── voting.py
│   ├── base.py
│   ├── iterative.py
│   ├── many2many.py
│   ├── many2many_cot.py
│   ├── many2many_majority.py
│   ├── many2many_cot_majority.py
│   └── run_scripts/
├── search_utils/
│   ├── init_population.py
│   ├── mcmc.py
│   └── prompt.py
├── tasks/
│   ├── commonsense_reasoning/
│   │   ├── icl_examples/
│   │   └── test/
│   ├── logical_reasoning/
│   │   └── benchmark/
│   │       ├── icl_examples/
│   │       ├── pseudo_labels/
│   │       ├── test/
│   │       └── test_data/
│   └── math_reasoning/
│       └── benchmark/
│           ├── icl_examples/
│           └── test/
├── README.md
├── requirements.txt
├── run_baseline.py
├── run_search.py
└── train_avi.py

- baselines/ — baseline verifier implementations
  - baseline_utils/ — prompt builders, voting, vLLM helpers, etc.
- search_utils/ — MCMC and simulated-annealing search utilities
- tasks/ — datasets and benchmarks (logical, math, and commonsense reasoning)
- run_baseline.py — launch the baselines
- run_search.py — run Veracity Search (VS)
- train_avi.py — train the AVI model with LoRA
- requirements.txt — complete dependency list
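
To make the idea behind `search_utils/` more concrete, here is a generic sketch of Metropolis-style search with a simulated-annealing temperature schedule over binary veracity assignments (one 0/1 label per reasoning step). It is not the code in `mcmc.py`: the function `anneal_veracity`, the single-label-flip proposal, the geometric cooling schedule, and the toy scoring function are all assumptions chosen for illustration. In practice the score would presumably come from a model-based estimate rather than a hand-written function.

```python
import math
import random


def anneal_veracity(score, n_steps, n_iters=500, t_start=1.0, t_end=0.01, seed=0):
    """Search for a high-scoring 0/1 assignment over n_steps reasoning steps.

    score: callable mapping a tuple of 0/1 labels to a float (higher is better).
    Returns the best assignment seen and its score.
    """
    rng = random.Random(seed)
    current = tuple(1 for _ in range(n_steps))  # start by assuming every step is correct
    current_score = score(current)
    best, best_score = current, current_score
    for i in range(n_iters):
        # Geometric cooling from t_start down to t_end.
        frac = i / max(n_iters - 1, 1)
        temp = t_start * (t_end / t_start) ** frac
        # Propose flipping the label of one randomly chosen step.
        j = rng.randrange(n_steps)
        proposal = current[:j] + (1 - current[j],) + current[j + 1:]
        proposal_score = score(proposal)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        delta = proposal_score - current_score
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = proposal, proposal_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score


# Toy score (hypothetical): count agreements with a hidden reference labeling.
hidden = (1, 1, 0, 1, 0)


def toy_score(labels):
    return sum(a == b for a, b in zip(labels, hidden))


best, best_score = anneal_veracity(toy_score, n_steps=len(hidden))
print(best, best_score)
```

Early iterations run at a high temperature and accept some worse assignments, which helps the search leave poor starting points. Later iterations are nearly greedy. Tracking the best assignment separately means a late unlucky move cannot discard a good solution.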