alstn12088/veracity_inference

Latent Veracity Inference for Identifying Errors in Stepwise Reasoning

This repository is the official implementation of the ICLR 2026 paper “Latent Veracity Inference for Identifying Errors in Stepwise Reasoning.”


Installation

pip install -r requirements.txt

Key dependencies (installed automatically):

  • torch — latest stable PyTorch
  • transformers — Hugging Face Transformers
  • datasets — Hugging Face Datasets
  • vllm — optional fast inference backend
  • huggingface_hub — model hosting
  • wandb — optional experiment logging
  • peft — LoRA (Low-Rank Adaptation)
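After installing, it can be useful to confirm which of the packages listed above are importable. The snippet below is a minimal sketch using only the standard library; the required/optional split is an assumption based on the "optional" labels in the list, and `missing_packages` is a hypothetical helper, not part of this repository.

```python
from importlib.util import find_spec

# Package names come from the dependency list above. Treating everything
# not labeled "optional" as required is an assumption.
REQUIRED = ["torch", "transformers", "datasets", "huggingface_hub", "peft"]
OPTIONAL = ["vllm", "wandb"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


print("missing required:", missing_packages(REQUIRED) or "none")
print("missing optional:", missing_packages(OPTIONAL) or "none")
```

If a required package is reported missing, rerunning the install command above is the first thing to try.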

Run scripts

Baseline methods

bash run_scripts/baseline.sh

Veracity Search (VS)

bash run_scripts/veracity_search.sh

Train Amortized Veracity Inference (AVI) using pseudo-labels from VS

bash run_scripts/train_avi.sh

You can edit each script or pass --model_name to change the LLM (default: Qwen3-4B).
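As an illustration of how a flag like `--model_name` is typically handled, here is a small `argparse` sketch. Only the flag name and its default (Qwen3-4B) come from this README; `build_parser` and everything else are hypothetical, and the actual scripts may parse their arguments differently.

```python
import argparse


def build_parser():
    # Hypothetical parser: only --model_name and its default are taken
    # from the README; the real entry points may define more options.
    parser = argparse.ArgumentParser(description="Illustrative argument parsing")
    parser.add_argument("--model_name", type=str, default="Qwen3-4B",
                        help="LLM to use")
    return parser


parser = build_parser()
# No flag given: the default is used.
print(parser.parse_args([]).model_name)  # Qwen3-4B
# Flag given: the default is overridden ("some-other-model" is a placeholder).
print(parser.parse_args(["--model_name", "some-other-model"]).model_name)  # some-other-model
```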


Repository Structure

veracity_inference/
├── baselines/
│   ├── baseline_utils/
│   │   ├── fewshot.py
│   │   ├── incontext_prompt.py
│   │   ├── logic_prompt.py
│   │   ├── math_prompt.py
│   │   ├── schema.py
│   │   ├── vllm_helper.py
│   │   └── voting.py
│   ├── base.py
│   ├── iterative.py
│   ├── many2many.py
│   ├── many2many_cot.py
│   ├── many2many_majority.py
│   ├── many2many_cot_majority.py
│   └── run_scripts/
├── search_utils/
│   ├── init_population.py
│   ├── mcmc.py
│   └── prompt.py
├── tasks/
│   ├── commonsense_reasoning/
│   │   ├── icl_examples/
│   │   └── test/
│   ├── logical_reasoning/
│   │   └── benchmark/
│   │       ├── icl_examples/
│   │       ├── pseudo_labels/
│   │       ├── test/
│   │       └── test_data/
│   └── math_reasoning/
│       └── benchmark/
│           ├── icl_examples/
│           └── test/
├── README.md
├── requirements.txt
├── run_baseline.py
├── run_search.py
└── train_avi.py

Description

  • baselines/ — baseline verifier implementations

    • baseline_utils/ — prompt builders, voting, vLLM helpers, etc.
  • search_utils/ — MCMC and simulated-annealing search utilities

  • tasks/ — datasets and benchmarks (commonsense, logical, and math reasoning)

  • run_baseline.py — launch baselines

  • run_search.py — run VS

  • train_avi.py — train LoRA AVI

  • requirements.txt — complete dependency list
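To give a feel for what a simulated-annealing search over per-step correctness labels can look like, here is a generic, self-contained sketch. It is not the code in `search_utils/mcmc.py`: the function `anneal`, the cooling schedule, and the toy score are all assumptions chosen for illustration. In practice the score would come from a model rather than from a fixed target.

```python
import math
import random


def anneal(score, n_steps, n_iters=2000, t_start=1.0, t_end=0.01, seed=0):
    """Search over binary vectors of length n_steps for a high-scoring one."""
    rng = random.Random(seed)
    current = [1] * n_steps  # start by assuming every step is correct
    current_score = score(current)
    best, best_score = list(current), current_score
    for i in range(n_iters):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (i / max(n_iters - 1, 1))
        proposal = list(current)
        flip = rng.randrange(n_steps)
        proposal[flip] = 1 - proposal[flip]  # flip one label
        proposal_score = score(proposal)
        delta = proposal_score - current_score
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = proposal, proposal_score
            if current_score > best_score:
                best, best_score = list(current), current_score
    return best, best_score


# Toy score (purely illustrative): reward agreement with a fixed labeling.
TARGET = [1, 1, 0, 1, 0]


def toy_score(labels):
    return sum(int(a == b) for a, b in zip(labels, TARGET))


best, best_score = anneal(toy_score, n_steps=len(TARGET))
print(best, best_score)
```

Flipping one label per proposal keeps each move local, and the decreasing temperature makes the search more greedy over time while still allowing early escapes from poor labelings.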
