Abstract
Single Image Reflection Separation (SIRS) disentangles mixed images into transmission and reflection layers. Existing methods suffer from transmission-reflection confusion under nonlinear mixing, particularly in deep decoder layers, due to implicit fusion mechanisms and inadequate multi-scale coordination. We propose ReflexSplit, a dual-stream framework with three key innovations. (1) Cross-scale Gated Fusion (CrGF) adaptively aggregates semantic priors, texture details, and decoder context across hierarchical depths, stabilizing gradient flow and maintaining feature consistency. (2) Layer Fusion-Separation Blocks (LFSB) alternate between fusion for shared structure extraction and differential separation for layer-specific disentanglement. Inspired by the Differential Transformer, we extend attention cancellation to dual-stream separation via cross-stream subtraction. (3) Curriculum training progressively strengthens differential separation through depth-dependent initialization and epoch-wise warmup. Extensive experiments on synthetic and real-world benchmarks demonstrate state-of-the-art performance with superior perceptual quality and robust generalization.

pip install "torch>=2.0" torchvision
pip install numpy scipy scikit-learn matplotlib opencv-python tqdm einops tensorboardx tensorboard dominate
Datasets/
├── dataset1/
│   ├── blended/
│   │   ├── 1.png
│   │   ├── 2.png
│   │   ...
│   ├── reflection_layer/
│   │   ├── 1.png
│   │   ├── 2.png
│   │   ...
│   └── transmission_layer/
│       ├── 1.png
│       ├── 2.png
│       ...
├── dataset2/
│   ├── blended/
│   │   ├── 1.png
│   │   ├── 2.png
│   │   ...
│   └── transmission_layer/
│       ├── 1.png
│       ├── 2.png
│       ...
...
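
The layout above pairs images by file name across sibling folders, and `reflection_layer/` is optional (dataset2 has none). As a minimal sketch of how such a folder could be enumerated, assuming only the directory names shown in the tree, the helper below collects matching paths. The function name `list_pairs` is hypothetical and is not part of this repository; the project's own data loaders may behave differently.

```python
from pathlib import Path


def list_pairs(dataset_dir):
    """Collect (blended, transmission, reflection_or_None) paths for one dataset folder.

    Hypothetical helper: assumes the blended/, transmission_layer/ and optional
    reflection_layer/ sub-folders shown in the tree, with matching file names.
    """
    root = Path(dataset_dir)
    reflection_dir = root / "reflection_layer"
    triples = []
    # sorted() orders names lexicographically, so "10.png" comes before "2.png"
    for blended in sorted((root / "blended").glob("*.png")):
        transmission = root / "transmission_layer" / blended.name
        if not transmission.is_file():
            continue  # skip blended images that have no transmission ground truth
        reflection = reflection_dir / blended.name
        triples.append(
            (blended, transmission, reflection if reflection.is_file() else None)
        )
    return triples
```

Calling `list_pairs("Datasets/dataset1")` would return one triple per image, with `None` in the third slot wherever no reflection layer exists.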
- 7,643 images from the Pascal VOC dataset, center-cropped to 224 x 224 slices to synthesize training pairs;
- 90 real-world training pairs provided by Zhang et al.;
- 200 real-world training pairs provided by IBCLN;
- 45 real-world testing images from the CEILNet dataset;
- 20 real testing pairs provided by Zhang et al.;
- 20 real testing pairs provided by IBCLN;
- 500 real testing pairs from the SIR^2 dataset, split into three subsets: Objects (200), Postcard (199), and Wild (101).
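
The Pascal VOC images in the list above are center-cropped to 224 x 224 before training pairs are synthesized. The snippet below is a rough NumPy sketch of such a crop, assuming an image array in height x width (x channels) order. The name `center_crop` is illustrative only; the repository's actual preprocessing and its synthesis step are not shown here and may differ.

```python
import numpy as np


def center_crop(image, size=224):
    """Return the central size x size window of an H x W (x C) array.

    Illustrative helper, not taken from the repository.
    """
    height, width = image.shape[:2]
    if height < size or width < size:
        raise ValueError(f"image of {height}x{width} is smaller than {size}x{size}")
    top = (height - size) // 2
    left = (width - size) // 2
    # Slicing returns a view; call .copy() if the crop must outlive the source
    return image[top:top + size, left:left + size]
```

For a 375 x 500 RGB array, `center_crop(img).shape` is `(224, 224, 3)`.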
to be continued...
python train.py --name train --size_rounded --batchSize 1 --base_dir <YOUR_DATA_DIR>

python eval.py --name eval --size_rounded --test_nature --weight_path <YOUR_WEIGHT_PATH> --base_dir <YOUR_DATA_DIR>

@article{lee2026reflexsplit,
title={ReflexSplit: Single Image Reflection Separation via Layer Fusion-Separation},
author={Lee, Chia-Ming and Lin, Yu-Fan and Jiang, Jin-Hui and Hsiao, Yu-Jou and Hsu, Chih-Chung and Liu, Yu-Lun},
journal={arXiv preprint arXiv:2601.17468},
year={2026}
}



