IAD-R1: Reinforcing Consistent Reasoning in Industrial Anomaly Detection

We propose IAD-R1, a universal post-training framework that enhances Vision-Language Models for industrial anomaly detection through a two-stage training strategy.

📔 Paper │ 🤖 Models │ 💾 Datasets

πŸ—žοΈ More Experiment Results

  • We further refine the captions in the Expert-AD dataset and retrain the model, obtaining IAD-R1-UPDATE. Compared with IAD-R1, IAD-R1-UPDATE achieves consistently better zero-shot detection performance.
    • IAD-R1-UPDATE (Qwen2.5-VL-Instruct-3B) improves average accuracy by 2.7% over IAD-R1 (Qwen2.5-VL-Instruct-3B).
    • IAD-R1-UPDATE (Qwen2-VL-Instruct-2B) improves average accuracy by 1.8% over IAD-R1 (Qwen2-VL-Instruct-2B).
    • IAD-R1-UPDATE (Qwen2.5-VL-Instruct-7B) improves average accuracy by 1.2% over IAD-R1 (Qwen2.5-VL-Instruct-7B).
    • Additional experimental results are provided in supplementary_results/IAD-R1-UPDATE_Results.
  • We additionally report the zero-shot results of Qwen-VL-MAX on the six evaluation benchmarks; see supplementary_results/Commercial-model_Results/qwen_vl_max.
  • We further include 2-shot and 4-shot results for both the pretrained models and their IAD-R1–finetuned counterparts; see supplementary_results/Few-Shot-Settings_Results.

🔥 Overview

Industrial anomaly detection is a critical component of modern manufacturing, yet the scarcity of defective samples restricts traditional detection methods to scenario-specific applications. Although Vision-Language Models (VLMs) demonstrate significant advantages in generalization, their performance on industrial anomaly detection remains limited. To address this challenge, we propose IAD-R1, a universal post-training framework applicable to VLMs of different architectures and parameter scales that substantially enhances their anomaly detection capabilities. IAD-R1 employs a two-stage training strategy: the Perception Activation Supervised Fine-Tuning (PA-SFT) stage trains on a meticulously constructed high-quality Chain-of-Thought dataset (Expert-AD), enhancing anomaly perception and establishing reasoning-to-answer correlations; the Structured Control Group Relative Policy Optimization (SC-GRPO) stage employs carefully designed reward functions to achieve a capability leap from "Anomaly Perception" to "Anomaly Interpretation". Experimental results demonstrate that IAD-R1 achieves significant improvements across 7 VLMs; the largest gain is on the DAGM dataset, where average accuracy is 43.3% higher than the 0.5B baseline. Notably, the 0.5B model trained with IAD-R1 surpasses commercial models including GPT-4.1 and Claude-Sonnet-4 in zero-shot settings, demonstrating the effectiveness and superiority of IAD-R1.
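
To make the SC-GRPO idea concrete, the sketch below shows one plausible shape of a structured reward: it checks that a completion follows the <think>/<location>/<type>/<answer> tag format used by the training data and that the final verdict matches the label. This is an illustrative assumption only, not the reward implementation released with the paper; the function name and weights are hypothetical.

import re

# Illustrative sketch of a structured format + accuracy reward in the spirit
# of SC-GRPO. The tag layout mirrors the Expert-AD training data; the weights
# and exact checks are assumptions, not the released reward design.
def structured_reward(completion: str, gt_answer: str) -> float:
    reward = 0.0
    answer = re.search(r"<answer>(Yes|No)</answer>", completion)
    # Format component: reason inside <think>, commit inside <answer>.
    if re.search(r"<think>.+?</think>", completion, re.DOTALL) and answer:
        reward += 0.5
    # Anomalous samples must also localize and name the defect.
    if gt_answer == "Yes" and not (
        re.search(r"<location>.+?</location>", completion, re.DOTALL)
        and re.search(r"<type>.+?</type>", completion, re.DOTALL)
    ):
        return reward  # structured fields missing: no accuracy credit
    # Accuracy component: the final verdict matches the ground truth.
    if answer and answer.group(1) == gt_answer:
        reward += 1.0
    return reward

# A well-formed, correct "No" earns both components (1.5 here).
print(structured_reward("<think>Surface looks uniform.</think><answer>No</answer>", "No"))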

👾 IAD-R1 Models

For each base model, both IAD-R1 and IAD-R1-UPDATE checkpoints are available:

  • LLaVA-OneVision-SI-0.5B: IAD-R1(LLaVA-OneVision-SI-0.5B), IAD-R1-UPDATE(LLaVA-OneVision-SI-0.5B)
  • Qwen2-VL-Instruct-2B: IAD-R1(Qwen2-VL-Instruct-2B), IAD-R1-UPDATE(Qwen2-VL-Instruct-2B)
  • Qwen2.5-VL-Instruct-3B: IAD-R1(Qwen2.5-VL-Instruct-3B), IAD-R1-UPDATE(Qwen2.5-VL-Instruct-3B)
  • Qwen2.5-VL-Instruct-7B: IAD-R1(Qwen2.5-VL-Instruct-7B), IAD-R1-UPDATE(Qwen2.5-VL-Instruct-7B)
  • LLaVA-1.5-7B: IAD-R1(LLaVA-1.5-7B), IAD-R1-UPDATE(LLaVA-1.5-7B)
  • LLaVA-OneVision-SI-7B: IAD-R1(LLaVA-OneVision-SI-7B), IAD-R1-UPDATE(LLaVA-OneVision-SI-7B)
  • LLaVA-1.6-8B: IAD-R1(LLaVA-1.6-8B), IAD-R1-UPDATE(LLaVA-1.6-8B)

🚀 Getting Started

πŸ› οΈ Environment Configuration

# Clone the project.
git clone https://github.com/Yanhui-Lee/IAD-R1.git
cd IAD-R1

# Build and activate the conda environment, then install dependencies.
conda create -n IAD-R1 python=3.10
conda activate IAD-R1
pip install -r requirements.txt
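
After installation, a quick sanity check run inside the activated environment can confirm that the core stack imports and the GPU is visible. This assumes requirements.txt pins PyTorch and Transformers; adjust to whatever your environment actually installs.

# Post-install sanity check (assumes torch and transformers come from
# requirements.txt; adapt if your pinned packages differ).
import torch
import transformers

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)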

🧰 Data Preparation

Training data

  • Construct the training datasets for PA-SFT and SC-GRPO (a builder sketch follows the two format examples below).

For PA-SFT training:

[
  {
      "images": "image_path",
      "messages": [
        {
          "role": "user",
          "content": "<image>\nAre there any defects in the query image?"
        },
        {
          "role": "assistant",
          "content": "<think>[Thinking process]</think><answer>No</answer>"
        }
      ]
  },
  {
      "images": "image_path",
      "messages": [
        {
          "role": "user",
          "content": "<image>\nAre there any defects in the query image?"
        },
        {
          "role": "assistant",
          "content": "<think>[Thinking process]</think><location>[Location]</location><type>[Type]</type><answer>Yes</answer>"
        }
      ]
  }
]

For SC-GRPO training:

[
  {
      "id": "xxx",
      "image": "image_path",
      "problem": "Are there any defects in the query image?",
      "solution": "<answer>No</answer>"
  },
  {
      "id": "xxx",
      "image": "image_path",
      "problem": "Are there any defects in the query image?",
      "solution": "<location>[Location]</location><type>[Type]</type><answer>Yes</answer>"
  }
]
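
If you generate these files programmatically, a minimal sketch along the following lines works, assuming you already hold per-image annotations (a reasoning caption plus, for defective images, the defect location and type). The build_entries helper is hypothetical and not part of the repository:

import json

QUESTION = "Are there any defects in the query image?"

# Hypothetical builder for the two formats above; not shipped with the repo.
# Each sample is assumed to look like:
#   {"id": "0001", "image": "path.png", "defect": bool,
#    "think": "...", "location": "...", "type": "..."}
def build_entries(samples):
    sft, grpo = [], []
    for s in samples:
        if s["defect"]:
            answer = ("<think>{think}</think><location>{location}</location>"
                      "<type>{type}</type><answer>Yes</answer>").format(**s)
            solution = ("<location>{location}</location>"
                        "<type>{type}</type><answer>Yes</answer>").format(**s)
        else:
            answer = "<think>{think}</think><answer>No</answer>".format(**s)
            solution = "<answer>No</answer>"
        sft.append({"images": s["image"], "messages": [
            {"role": "user", "content": "<image>\n" + QUESTION},
            {"role": "assistant", "content": answer},
        ]})
        grpo.append({"id": s["id"], "image": s["image"],
                     "problem": QUESTION, "solution": solution})
    return sft, grpo

sft, grpo = build_entries([{"id": "0001", "image": "images/bottle_000.png",
                            "defect": False,
                            "think": "The surface appears uniform and intact."}])
with open("pa_sft.json", "w") as f:
    json.dump(sft, f, indent=2)
with open("sc_grpo.json", "w") as f:
    json.dump(grpo, f, indent=2)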
  • We adopt a zero-shot setup during training. If you want a one-shot training setup instead, construct the data as shown below (a conversion sketch appears after the note following these examples).

For PA-SFT training:

[
  {
      "images": [
        "ref_image_path",   
        "query_image_path" 
      ],
      "messages": [
        {
          "role": "user",
          "content": "'few-shot prompt' + <image>\n<image>\nAre there any defects in the query image?"
        },
        {
          "role": "assistant",
          "content": "<think>[Thinking process]</think><location>[Location]</location><type>[Type]</type><answer>Yes</answer>"
        }
      ]
  }
]

For SC-GRPO training:

[
  {
      "id": "xxx",
      "image": [
          "ref_image_path", 
          "query_image_path"
      ],
      "problem": "Are there any defects in the query image?",
      "solution": "<location>[Location]</location><type>[Type]</type><answer>Yes</answer>"
  }
]

Don't forget to set '--single_img 0' in the SC-GRPO stage scripts so that the prompts for the 1-shot setting are used.
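
For the SC-GRPO side, turning zero-shot entries into 1-shot entries amounts to pairing each query with a defect-free reference image from the same category (for PA-SFT, additionally prepend the few-shot prompt and a second <image> token as shown above). The helper below is an assumed illustration; reference_for is a lookup you would supply yourself.

# Hypothetical zero-shot -> 1-shot conversion for SC-GRPO entries; not part
# of the repository. `reference_for` maps a query image to a defect-free
# reference image from the same category.
def to_one_shot(entries, reference_for):
    return [{
        "id": e["id"],
        "image": [reference_for(e["image"]), e["image"]],  # [reference, query]
        "problem": e["problem"],
        "solution": e["solution"],
    } for e in entries]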

Inference data

  • Download the test data we compiled and organize it according to the following format.
# Inference images
Industrial_test
├── DAGM/
│   ├── Class1
│   ├── Class2
│   └── ...
├── DS-MVTec/
│   ├── bottle
│   ├── cable
│   └── ...
├── DTD/
│   ├── Blotchy_099
│   ├── Fibrous_183
│   └── ...
├── MPDD/
│   ├── bracket_black
│   ├── bracket_brown
│   └── ...
├── SDD/
│   └── test
└── VisA/
    ├── candle
    ├── capsules
    └── ...

# Inference json
data/Test/
├── test_DAGM_format.json
├── test_MVTec_format.json
├── test_DTD_format.json
├── test_MPDD_format.json
├── test_SDD_format.json
└── test_VisA_format.json
  • Construct your test dataset.

Prepare a meta.json file for your dataset:

{
    "category1":
    [
        {
            "img_path": "your image path",
            "mask_path": "",
            "cls_name": "object1",
            "specie_name": "good",
            "anomaly": 0
        },
        {
            "img_path": "your image path",
            "mask_path": "",
            "cls_name": "object2",
            "specie_name": "good",
            "anomaly": 0
        }
    ]
}
# Convert meta.json into the unified test JSON format.
cd Industrial_test
python convert.py
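
If you prefer to generate meta.json automatically, a directory walk along these lines works, assuming a <category>/<specie>/<image> layout in which the specie folder named "good" holds normal samples. The builder below is a sketch under those assumptions, not part of the repository:

import json
from pathlib import Path

# Sketch of a meta.json builder; assumes a <category>/<specie>/<image> layout
# with the specie folder "good" marking normal samples. Adjust the glob
# pattern and the mask lookup to match your data.
def build_meta(root):
    meta = {}
    for category in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        entries = []
        for img in sorted(category.rglob("*.png")):
            specie = img.parent.name
            entries.append({
                "img_path": str(img),
                "mask_path": "",  # fill in if ground-truth masks exist
                "cls_name": category.name,
                "specie_name": specie,
                "anomaly": 0 if specie == "good" else 1,
            })
        meta[category.name] = entries
    return meta

with open("meta.json", "w") as f:
    json.dump(build_meta("your_dataset_root"), f, indent=4)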

🧠 Training

Model Preparation

Download these models into your local project (final_model/Pretrain).

Perception Activation Supervised Fine-Tuning Stage Scripts

# PA-SFT Stage

## LLaVA-OneVision-SI-0.5B
bash scripts/train/PA_SFT/PA_SFT_LLaVA_OneVision_SI_0.5B.sh

## Qwen2-VL-Instruct
bash scripts/train/PA_SFT/PA_SFT_Qwen_Instruct_2_VL.sh

## Qwen2.5-VL-Instruct-3B
bash scripts/train/PA_SFT/PA_SFT_Qwen_Instruct_2_5_VL_3B.sh

## Qwen2.5-VL-Instruct-7B
bash scripts/train/PA_SFT/PA_SFT_Qwen_Instruct_2_5_VL_7B.sh

## LLaVA-1.5
bash scripts/train/PA_SFT/PA_SFT_LLaVA_1_5.sh

## LLaVA-OneVision-SI-7B
bash scripts/train/PA_SFT/PA_SFT_LLaVA_OneVision_SI_7B.sh

## LLaVA-1.6 (LLaVA-NeXT)
bash scripts/train/PA_SFT/PA_SFT_LLaVA_1_6.sh

Structured Control Group Relative Policy Optimization Stage Scripts

# SC-GRPO Stage

## LLaVA-OneVision-SI-0.5B
bash scripts/train/SC_GRPO/SC_GRPO_LLaVA_OneVision_SI_0.5B.sh

## Qwen2-VL-Instruct
bash scripts/train/SC_GRPO/SC_GRPO_Qwen_Instruct_2_VL.sh

## Qwen2.5-VL-Instruct-3B
bash scripts/train/SC_GRPO/SC_GRPO_Qwen_Instruct_2_5_VL_3B.sh

## Qwen2.5-VL-Instruct-7B
bash scripts/train/SC_GRPO/SC_GRPO_Qwen_Instruct_2_5_VL_7B.sh

## LLaVA-1.5
bash scripts/train/SC_GRPO/SC_GRPO_LLaVA_1_5.sh

## LLaVA-OneVision-SI-7B
bash scripts/train/SC_GRPO/SC_GRPO_LLaVA_OneVision_SI_7B.sh

## LLaVA-1.6 (LLaVA-NeXT)
bash scripts/train/SC_GRPO/SC_GRPO_LLaVA_1_6.sh

📊 Inference

  • Commercial Model Inference Scripts.
# Commercial Model Inference

## OpenAI
### GPT 4.1
bash scripts/Inference/Commercial-Inference/ChatGPT_4.1_Inference.sh

### GPT 4.1-mini
bash scripts/Inference/Commercial-Inference/ChatGPT_4.1_mini_Inference.sh

### GPT 4.1-nano
bash scripts/Inference/Commercial-Inference/ChatGPT_4.1_nano_Inference.sh

### GPT 4o
bash scripts/Inference/Commercial-Inference/Chatgpt_4o_Inference.sh

### GPT 4o-mini
bash scripts/Inference/Commercial-Inference/ChatGPT_4o_mini_Inference.sh

## Claude-Sonnet-4
bash scripts/Inference/Commercial-Inference/Claude_Sonnet_4_Inference.sh

## Qwen-VL-MAX
bash scripts/Inference/Commercial-Inference/Qwen_VL_MAX_Inference.sh
  • Pretrained Model Inference Scripts.
# Pretrain Model Inference

## LLaVA-OneVision-SI-0.5B
bash scripts/Inference/Pretrain-Inference/vLLM_LLaVA_OneVision_SI_0_5_B_Inference.sh

## Qwen2-VL-Instruct-2B
bash scripts/Inference/Pretrain-Inference/vLLM_Qwen2_VL_Instruct_2B_Inference.sh

## Qwen2.5-VL-Instruct-3B
bash scripts/Inference/Pretrain-Inference/vLLM_Qwen2_5_VL_Instruct_3B_Inference.sh

## InternVL-2.5-4B
bash scripts/Inference/Pretrain-Inference/vLLM_InternVL_2_5_4B_Inference.sh

## Qwen2.5-VL-Instruct-7B
bash scripts/Inference/Pretrain-Inference/vLLM_Qwen2_5_VL_Instruct_7B_Inference.sh

## LLaVA-OneVision-SI-7B
bash scripts/Inference/Pretrain-Inference/vLLM_LLaVA_OneVision_SI_7B_Inference.sh

## LLaVA-1.5-7B
bash scripts/Inference/Pretrain-Inference/vLLM_LLaVA_1_5_7B_Inference.sh

## LLaVA-1.6(mistral)-8B
bash scripts/Inference/Pretrain-Inference/vLLM_LLaVA_1_6_8B_Inference.sh

## LLaVA-1.5-13B
bash scripts/Inference/Pretrain-Inference/vLLM_LLaVA_1_5_13B_Inference.sh

## LLaVA-1.6-34B
bash scripts/Inference/Pretrain-Inference/vLLM_LLaVA_1_6_34B_Inference.sh

## Qwen2.5-VL-Instruct-72B
bash scripts/Inference/Pretrain-Inference/vLLM_Qwen2_5_VL_Instruct_72B_Inference.sh
  • IAD-R1 Model Inference Scripts.
# IAD-R1 Model Inference 

## LLaVA-OneVision-SI-0.5B
bash scripts/Inference/IAD-R1-Inference/vLLM_LLaVA_OneVision_SI_0_5_B_Inference.sh

## Qwen2-VL-Instruct-2B
bash scripts/Inference/IAD-R1-Inference/vLLM_Qwen2_VL_Instruct_Inference.sh

## Qwen2.5-VL-Instruct-3B
bash scripts/Inference/IAD-R1-Inference/vLLM_Qwen2_5_VL_Instruct_3B_Inference.sh

## Qwen2.5-VL-Instruct-7B
bash scripts/Inference/IAD-R1-Inference/vLLM_Qwen2_5_VL_Instruct_7B_Inference.sh

## LLaVA-1.5-7B
bash scripts/Inference/IAD-R1-Inference/vLLM_LLaVA_1_5_Inference.sh

## LLaVA-OneVision-SI-7B
bash scripts/Inference/IAD-R1-Inference/vLLM_LLaVA_OneVision_SI_7B_Inference.sh

## LLaVA-1.6(mistral)-8B
bash scripts/Inference/IAD-R1-Inference/vLLM_LLaVA_1_6_Inference.sh
  • Open-source Model Inference Script.
# Open-source Model Inference

## Anomaly-R1
bash scripts/Inference/Anomaly-R1-Inference/Anomaly-R1-Inference.sh
