Paper | Models | Datasets
- We further refine the captions in the Expert-AD dataset and retrain the model, obtaining IAD-R1-UPDATE. Compared with IAD-R1, IAD-R1-UPDATE achieves consistently better zero-shot detection performance.
- IAD-R1-UPDATE (Qwen2.5-VL-Instruct-3B) improves average accuracy by 2.7% over IAD-R1 (Qwen2.5-VL-Instruct-3B).
- IAD-R1-UPDATE (Qwen2-VL-Instruct-2B) improves average accuracy by 1.8% over IAD-R1 (Qwen2-VL-Instruct-2B).
- IAD-R1-UPDATE (Qwen2.5-VL-Instruct-7B) improves average accuracy by 1.2% over IAD-R1 (Qwen2.5-VL-Instruct-7B).
- Additional experimental results are provided in supplementary_results/IAD-R1-UPDATE_Results.
- We additionally report the zero-shot results of Qwen-VL-MAX on the six evaluation benchmarks; see supplementary_results/Commercial-model_Results/qwen_vl_max.
- We further include 2-shot and 4-shot results for both the pretrained models and their IAD-R1-finetuned counterparts; see supplementary_results/Few-Shot-Settings_Results.
Industrial anomaly detection is a critical component of modern manufacturing, yet the scarcity of defective samples restricts traditional detection methods to scenario-specific applications. Although Vision-Language Models (VLMs) generalize well, their performance in industrial anomaly detection remains limited. To address this challenge, we propose IAD-R1, a universal post-training framework applicable to VLMs of different architectures and parameter scales, which substantially enhances their anomaly detection capabilities. IAD-R1 employs a two-stage training strategy. The Perception Activation Supervised Fine-Tuning (PA-SFT) stage trains on a carefully constructed, high-quality Chain-of-Thought dataset (Expert-AD), enhancing anomaly perception and establishing reasoning-to-answer correlations. The Structured Control Group Relative Policy Optimization (SC-GRPO) stage employs carefully designed reward functions to achieve a capability leap from "Anomaly Perception" to "Anomaly Interpretation". Experimental results demonstrate that IAD-R1 achieves significant improvements across 7 VLMs; the largest gain is on the DAGM dataset, where average accuracy is 43.3% higher than the 0.5B baseline. Notably, the 0.5B parameter model trained with IAD-R1 surpasses commercial models including GPT-4.1 and Claude-Sonnet-4 in zero-shot settings, demonstrating the effectiveness of IAD-R1.
| | LLaVA-OneVision-SI-0.5B | Qwen2-VL-Instruct-2B | Qwen2.5-VL-Instruct-3B | Qwen2.5-VL-Instruct-7B | LLaVA-1.5-7B | LLaVA-OneVision-SI-7B | LLaVA-1.6-8B |
|---|---|---|---|---|---|---|---|
| IAD-R1 | IAD-R1(LLaVA-OneVision-SI-0.5B) | IAD-R1(Qwen2-VL-Instruct-2B) | IAD-R1(Qwen2.5-VL-Instruct-3B) | IAD-R1(Qwen2.5-VL-Instruct-7B) | IAD-R1(LLaVA-1.5-7B) | IAD-R1(LLaVA-OneVision-SI-7B) | IAD-R1(LLaVA-1.6-8B) |
| IAD-R1-UPDATE | IAD-R1-UPDATE(LLaVA-OneVision-SI-0.5B) | IAD-R1-UPDATE(Qwen2-VL-Instruct-2B) | IAD-R1-UPDATE(Qwen2.5-VL-Instruct-3B) | IAD-R1-UPDATE(Qwen2.5-VL-Instruct-7B) | IAD-R1-UPDATE(LLaVA-1.5-7B) | IAD-R1-UPDATE(LLaVA-OneVision-SI-7B) | IAD-R1-UPDATE(LLaVA-1.6-8B) |
# clone our project.
git clone https://github.com/Yanhui-Lee/IAD-R1.git
cd IAD-R1
# build conda env.
conda create -n IAD-R1 python=3.10
conda activate IAD-R1
pip install -r requirements.txt
- Construct Training Dataset for PA-SFT, SC-GRPO.
For PA-SFT training:
[
{
"images": "image_path",
"messages": [
{
"role": "user",
"content": "<image>\nAre there any defects in the query image?"
},
{
"role": "assistant",
"content": "<think>[Thinking process]</think><answer>No</answer>"
}
]
},
{
"images": "image_path",
"messages": [
{
"role": "user",
"content": "<image>\nAre there any defects in the query image?"
},
{
"role": "assistant",
"content": "<think>[Thinking process]</think><location>[Location]</location><type>[Type]</type><answer>Yes</answer>"
}
]
}
]
For SC-GRPO training:
[
{
"id": "xxx",
"image": "image_path",
"problem": "Are there any defects in the query image?",
"solution": "<answer>No</answer>"
},
{
"id": "xxx",
"image": "image_path",
"problem": "Are there any defects in the query image?",
"solution": "<location>[Location]</location><type>[Type]</type><answer>Yes</answer>"
}
]
- We train in a zero-shot setup. To use a one-shot setup instead, construct the data as follows.
For PA-SFT training:
[
{
"images": [
"ref_image_path",
"query_image_path"
],
"messages": [
{
"role": "user",
"content": "'few-shot prompt' + <image>\n<image>\nAre there any defects in the query image?"
},
{
"role": "assistant",
"content": "<think>[Thinking process]</think><location>[Location]</location><type>[Type]</type><answer>Yes</answer>"
}
]
}
]
For SC-GRPO training:
[
{
"id": "xxx",
"image": [
"ref_image_path",
"query_image_path"
],
"problem": "Are there any defects in the query image?",
"solution": "<location>[Location]</location><type>[Type]</type><answer>Yes</answer>"
}
]
Don't forget to set '--single_img 0' in the SC-GRPO stage scripts to use the prompts for the 1-shot setting.
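The zero-shot record layouts in this section can also be generated programmatically. The following is a minimal sketch, assuming the field names and tag strings shown in the JSON examples; the helper names (`pa_sft_record`, `sc_grpo_record`, `tagged_answer`) are hypothetical and are not part of the repository.

```python
import json

QUESTION = "Are there any defects in the query image?"


def tagged_answer(is_defective, location=None, defect_type=None):
    """Build the tag string shared by both formats (everything after <think>)."""
    if not is_defective:
        return "<answer>No</answer>"
    if location is None or defect_type is None:
        raise ValueError("defective samples need both a location and a type")
    return (
        f"<location>{location}</location>"
        f"<type>{defect_type}</type><answer>Yes</answer>"
    )


def pa_sft_record(image_path, thinking, is_defective, location=None, defect_type=None):
    """One zero-shot PA-SFT sample: an image plus a user/assistant exchange."""
    answer = tagged_answer(is_defective, location, defect_type)
    return {
        "images": image_path,
        "messages": [
            {"role": "user", "content": f"<image>\n{QUESTION}"},
            {"role": "assistant", "content": f"<think>{thinking}</think>{answer}"},
        ],
    }


def sc_grpo_record(sample_id, image_path, is_defective, location=None, defect_type=None):
    """One SC-GRPO sample; the solution carries no <think> section."""
    return {
        "id": sample_id,
        "image": image_path,
        "problem": QUESTION,
        "solution": tagged_answer(is_defective, location, defect_type),
    }


# Placeholder values mirror the README examples; substitute real paths and text.
pa_sft_samples = [
    pa_sft_record("image_path", "[Thinking process]", False),
    pa_sft_record("image_path", "[Thinking process]", True, "[Location]", "[Type]"),
]
sc_grpo_samples = [
    sc_grpo_record("xxx", "image_path", False),
    sc_grpo_record("xxx", "image_path", True, "[Location]", "[Type]"),
]
pa_sft_json = json.dumps(pa_sft_samples, indent=2)
sc_grpo_json = json.dumps(sc_grpo_samples, indent=2)
```

Passing a two-element list such as `["ref_image_path", "query_image_path"]` as `image_path` to `sc_grpo_record` yields the one-shot SC-GRPO layout. The one-shot PA-SFT prompt is not covered here because the few-shot prompt text is not given in this section.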
- Download the test data we compiled and organize it according to the following format.
# Inference images
Industrial_test
├── DAGM/
│   ├── Class1
│   ├── Class2
│   └── ...
├── DS-MVTec/
│   ├── bottle
│   ├── cable
│   └── ...
├── DTD/
│   ├── Blotchy_099
│   ├── Fibrous_183
│   └── ...
├── MPDD/
│   ├── bracket_black
│   ├── bracket_brown
│   └── ...
├── SDD/
│   └── test
└── VisA/
    ├── candle
    ├── capsules
    └── ...
# Inference json
data/Test/
├── test_DAGM_format.json
├── test_MVTec_format.json
├── test_DTD_format.json
├── test_MPDD_format.json
├── test_SDD_format.json
└── test_VisA_format.json
- Construct your test dataset.
Prepare meta.json for your dataset
{
"category1":
[
{
"img_path": "your image path",
"mask_path": "",
"cls_name": "object1",
"specie_name": "good",
"anomaly": 0
},
{
"img_path": "your image path",
"mask_path": "",
"cls_name": "object2",
"specie_name": "good",
"anomaly": 0
}
]
}
cd Industrial_test
python convert.py
- LLaVA-OneVision-SI-0.5B
- Qwen2-VL-Instruct-2B
- Qwen2.5-VL-Instruct-3B
- InternVL-2.5-4B
- Qwen2.5-VL-Instruct-7B
- LLaVA-1.5-7B
- LLaVA-OneVision-SI-7B
- LLaVA-1.6-8B(mistral)
- LLaVA-1.5-13B
- LLaVA-1.6-34B
- Qwen2.5-VL-Instruct-72B
Download these models to your local project (final_model/Pretrain).
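One possible way to fetch checkpoints into that directory is `snapshot_download` from the `huggingface_hub` package. The sketch below rests on several assumptions: the Hub repo ids listed are illustrative and may not be the exact checkpoints the authors used, the one-folder-per-model layout under final_model/Pretrain is a guess, and the helper names are hypothetical. Check the training scripts for the paths they actually expect.

```python
from pathlib import Path

# Directory named in this README.
PRETRAIN_DIR = Path("final_model/Pretrain")

# Assumed Hub repo ids (illustrative only); replace with the checkpoints you need.
ASSUMED_REPOS = [
    "Qwen/Qwen2-VL-2B-Instruct",
    "Qwen/Qwen2.5-VL-3B-Instruct",
]


def local_dir_for(repo_id, root=PRETRAIN_DIR):
    """Map a Hub repo id to a folder under root, named after the repo (assumed layout)."""
    return Path(root) / repo_id.split("/")[-1]


def download_all(repo_ids=ASSUMED_REPOS, root=PRETRAIN_DIR):
    """Download each repo into its own folder under root (requires network access)."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo_id in repo_ids:
        target = local_dir_for(repo_id, root)
        target.mkdir(parents=True, exist_ok=True)
        snapshot_download(repo_id=repo_id, local_dir=str(target))
```

Calling `download_all()` from the project root would place each model under final_model/Pretrain; nothing is downloaded until that function is called.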
# PA-SFT Stage
## LLaVA-OneVision-SI-0.5B
bash scripts/train/PA_SFT/PA_SFT_LLaVA_OneVision_SI_0.5B.sh
## Qwen2-VL-Instruct
bash scripts/train/PA_SFT/PA_SFT_Qwen_Instruct_2_VL.sh
## Qwen2.5-VL-Instruct-3B
bash scripts/train/PA_SFT/PA_SFT_Qwen_Instruct_2_5_VL_3B.sh
## Qwen2.5-VL-Instruct-7B
bash scripts/train/PA_SFT/PA_SFT_Qwen_Instruct_2_5_VL_7B.sh
## LLaVA-1.5
bash scripts/train/PA_SFT/PA_SFT_LLaVA_1_5.sh
## LLaVA-OneVision-SI-7B
bash scripts/train/PA_SFT/PA_SFT_LLaVA_OneVision_SI_7B.sh
## LLaVA-1.6(LLaVA-next)
bash scripts/train/PA_SFT/PA_SFT_LLaVA_1_6.sh
# SC-GRPO Stage
## LLaVA-OneVision-SI-0.5B
bash scripts/train/SC_GRPO/SC_GRPO_LLaVA_OneVision_SI_0.5B.sh
## Qwen2-VL-Instruct
bash scripts/train/SC_GRPO/SC_GRPO_Qwen_Instruct_2_VL.sh
## Qwen2.5-VL-Instruct-3B
bash scripts/train/SC_GRPO/SC_GRPO_Qwen_Instruct_2_5_VL_3B.sh
## Qwen2.5-VL-Instruct-7B
bash scripts/train/SC_GRPO/SC_GRPO_Qwen_Instruct_2_5_VL_7B.sh
## LLaVA-1.5
bash scripts/train/SC_GRPO/SC_GRPO_LLaVA_1_5.sh
## LLaVA-OneVision-SI-7B
bash scripts/train/SC_GRPO/SC_GRPO_LLaVA_OneVision_SI_7B.sh
## LLaVA-1.6(LLaVA-next)
bash scripts/train/SC_GRPO/SC_GRPO_LLaVA_1_6.sh
- Commercial Model Inference Scripts.
# Commercial Model Inference
## OpenAI
### GPT 4.1
bash scripts/Inference/Commercial-Inference/ChatGPT_4.1_Inference.sh
### GPT 4.1-mini
bash scripts/Inference/Commercial-Inference/ChatGPT_4.1_mini_Inference.sh
### GPT 4.1-nano
bash scripts/Inference/Commercial-Inference/ChatGPT_4.1_nano_Inference.sh
### GPT 4o
bash scripts/Inference/Commercial-Inference/Chatgpt_4o_Inference.sh
### GPT 4o-mini
bash scripts/Inference/Commercial-Inference/ChatGPT_4o_mini_Inference.sh
## Claude-Sonnet-4
bash scripts/Inference/Commercial-Inference/Claude_Sonnet_4_Inference.sh
## Qwen-VL-MAX
bash scripts/Inference/Commercial-Inference/Qwen_VL_MAX_Inference.sh
- Pretrain Model Inference Scripts.
# Pretrain Model Inference
## LLaVA-OneVision-SI-0.5B
bash scripts/Inference/Pretrain-Inference/vLLM_LLaVA_OneVision_SI_0_5_B_Inference.sh
## Qwen2-VL-Instruct-2B
bash scripts/Inference/Pretrain-Inference/vLLM_Qwen2_VL_Instruct_2B_Inference.sh
## Qwen2.5-VL-Instruct-3B
bash scripts/Inference/Pretrain-Inference/vLLM_Qwen2_5_VL_Instruct_3B_Inference.sh
## InternVL-2.5-4B
bash scripts/Inference/Pretrain-Inference/vLLM_InternVL_2_5_4B_Inference.sh
## Qwen2.5-VL-Instruct-7B
bash scripts/Inference/Pretrain-Inference/vLLM_Qwen2_5_VL_Instruct_7B_Inference.sh
## LLaVA-OneVision-SI-7B
bash scripts/Inference/Pretrain-Inference/vLLM_LLaVA_OneVision_SI_7B_Inference.sh
## LLaVA-1.5-7B
bash scripts/Inference/Pretrain-Inference/vLLM_LLaVA_1_5_7B_Inference.sh
## LLaVA-1.6(mistral)-8B
bash scripts/Inference/Pretrain-Inference/vLLM_LLaVA_1_6_8B_Inference.sh
## LLaVA-1.5-13B
bash scripts/Inference/Pretrain-Inference/vLLM_LLaVA_1_5_13B_Inference.sh
## LLaVA-1.6-34B
bash scripts/Inference/Pretrain-Inference/vLLM_LLaVA_1_6_34B_Inference.sh
## Qwen2.5-VL-Instruct-72B
bash scripts/Inference/Pretrain-Inference/vLLM_Qwen2_5_VL_Instruct_72B_Inference.sh
- IAD-R1 Model Inference Scripts.
# IAD-R1 Model Inference
## LLaVA-OneVision-SI-0.5B
bash scripts/Inference/IAD-R1-Inference/vLLM_LLaVA_OneVision_SI_0_5_B_Inference.sh
## Qwen2-VL-Instruct-2B
bash scripts/Inference/IAD-R1-Inference/vLLM_Qwen2_VL_Instruct_Inference.sh
## Qwen2.5-VL-Instruct-3B
bash scripts/Inference/IAD-R1-Inference/vLLM_Qwen2_5_VL_Instruct_3B_Inference.sh
## Qwen2.5-VL-Instruct-7B
bash scripts/Inference/IAD-R1-Inference/vLLM_Qwen2_5_VL_Instruct_7B_Inference.sh
## LLaVA-1.5-7B
bash scripts/Inference/IAD-R1-Inference/vLLM_LLaVA_1_5_Inference.sh
## LLaVA-OneVision-SI-7B
bash scripts/Inference/IAD-R1-Inference/vLLM_LLaVA_OneVision_SI_7B_Inference.sh
## LLaVA-1.6(mistral)-8B
bash scripts/Inference/IAD-R1-Inference/vLLM_LLaVA_1_6_Inference.sh
- Open-source Model Inference Script.
# Open-source Model Inference
## Anomaly-R1
bash scripts/Inference/Anomaly-R1-Inference/Anomaly-R1-Inference.sh
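When inspecting model outputs by hand, it can help to pull the tagged fields out of a response. The sketch below assumes that inference responses follow the same tag layout as the training targets described earlier in this README (`<think>`, `<location>`, `<type>`, `<answer>`); the function names are hypothetical, and this is not the repository's evaluation code.

```python
import re

TAGS = ("think", "location", "type", "answer")


def parse_response(text):
    """Extract the tagged fields from a response; tags that are absent map to None."""
    fields = {}
    for tag in TAGS:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        fields[tag] = match.group(1).strip() if match else None
    return fields


def is_anomalous(text):
    """Return True for 'Yes', False for 'No', and None if the answer is missing or unexpected."""
    answer = parse_response(text)["answer"]
    if answer is None:
        return None
    return {"yes": True, "no": False}.get(answer.lower())
```

Returning `None` for malformed responses, rather than guessing a label, keeps formatting failures visible so they can be counted separately from wrong answers.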