Research Vision (Research Focus)

Our research addresses two intertwined challenges in modern computer vision: making vision systems secure and robust against real-world degradations and adversarial threats, and deploying AI efficiently on resource-constrained platforms. On the robustness front, we work on media integrity, perceptual restoration, and robust tracking for robotics and sports analytics. On the efficiency front, we develop hardware-agnostic attention mechanisms that require no retraining, together with quantization-friendly fine-tuning, to bring state-of-the-art models to edge devices. Together, these directions aim to make vision AI both trustworthy and deployable in the real world.

Research Categories

Robust Vision Perception: PhaSR, ReflexSplit, PromptHSI, hyperspectral restoration
Media Security & Integrity: GRACEv2, UMCL, DeepFake detection, trustworthy media analysis
Robotics & Sports Vision: autonomous driving, tracking, embodied perception, challenge-driven evaluation
Efficient AI: ELSA, QuantTune, hardware-aware deployment and model compression

A short introduction to my research: [PDF] (Last updated: Oct. 2024)

Robust Shadow Removal

PhaSR: Generalized Image Shadow Removal with Physically Aligned Priors

Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026.

Shadow removal under complex and multi-source lighting is hindered by the mismatch between physical illumination priors and learned features. PhaSR couples physically aligned normalization with geometry-semantic rectification to deliver robust shadow removal that generalizes beyond traditional single-light settings.

Research Direction. Robust Vision Perception / Image Restoration

[arXiv] [GitHub]

Reflection Separation in the Wild

ReflexSplit: Single Image Reflection Separation via Layer Fusion-Separation

Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026.

Reflections on glass introduce nonlinear layer mixing that often breaks existing separation networks. ReflexSplit uses dual-stream fusion-separation blocks and curriculum training to achieve robust performance on both synthetic and real-world benchmarks.

Research Direction. Robust Vision Perception / Image Restoration

[arXiv] [GitHub]

Efficient AI Inference

ELSA: Exact Linear-Scan Attention for Fast and Memory-Light Vision Transformers

Accepted to the CVPR 2026 Findings Workshop.

ELSA reformulates exact softmax attention as a prefix scan over an associative monoid, achieving memory-light inference with provable FP32 stability and no retraining. Implemented in Triton and CUDA C++, it improves deployability on both data-center and edge hardware.
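
The idea of treating exact softmax attention as a reduction over an associative operator can be illustrated with a small sketch. This is not the ELSA implementation (which is written in Triton and CUDA C++); it is a plain-Python illustration of the widely used "running max, normalizer, weighted sum" state for a single query, and the names `leaf`, `combine`, `scan_attention`, and `reference_attention` are hypothetical. Because `combine` is associative, partial states can be merged in any grouping, which is what makes a prefix-scan formulation possible.

```python
import math
from functools import reduce

def leaf(score, value):
    """Partial softmax state for one key: (max score, normalizer, weighted value sum)."""
    return (score, 1.0, list(value))

def combine(a, b):
    """Merge two partial states; rescaling by the shared max keeps exp() bounded."""
    m_a, l_a, o_a = a
    m_b, l_b, o_b = b
    m = max(m_a, m_b)
    w_a = math.exp(m_a - m)
    w_b = math.exp(m_b - m)
    l = l_a * w_a + l_b * w_b
    o = [x * w_a + y * w_b for x, y in zip(o_a, o_b)]
    return (m, l, o)

def scan_attention(scores, values):
    """Attention output for one query, built by folding `combine` over the keys."""
    _, l, o = reduce(combine, (leaf(s, v) for s, v in zip(scores, values)))
    return [x / l for x in o]

def reference_attention(scores, values):
    """Textbook softmax-weighted average, used only as a check."""
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    dim = len(values[0])
    return [sum(w[i] * values[i][d] for i in range(len(values))) / z
            for d in range(dim)]

# Toy query-key scores and value vectors (illustrative numbers only).
scores = [0.5, 2.0, -1.0, 3.0]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 0.5]]

out = scan_attention(scores, values)
ref = reference_attention(scores, values)

# Associativity check: (a . b) . c versus a . (b . c).
a, b, c = (leaf(s, v) for s, v in list(zip(scores, values))[:3])
left = combine(combine(a, b), c)
right = combine(a, combine(b, c))
```

In this sketch `out` matches `ref` up to floating-point rounding, and `left` matches `right`, so the same result can be obtained from a sequential fold, a tree reduction, or a blocked scan without materializing the full attention matrix.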

Research Direction. Efficient AI / Hardware-Agnostic Inference

arXiv coming soon

Quantization-Friendly Deployment

QuantTune: Optimizing Model Quantization with Adaptive Outlier-Driven Fine Tuning

Published in IEEE International Conference on Multimedia Information Processing and Retrieval (MIPR) 2025.

QuantTune addresses outlier-driven dynamic range amplification during Transformer quantization and substantially reduces accuracy loss under low-bit settings. The method requires no extra inference-time hardware complexity and transfers across ViT, BERT, and OPT models.
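
The dynamic-range problem that outliers cause can be seen in a toy example. The sketch below is not QuantTune itself; it only illustrates, under the assumption of symmetric per-tensor uniform quantization, how one large activation inflates the quantization step and degrades every other value. The helper names and the synthetic activations are hypothetical.

```python
def quantize_dequantize(xs, bits=4):
    """Symmetric per-tensor uniform quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(x) for x in xs) / qmax  # step size set by the largest magnitude
    return [round(x / scale) * scale for x in xs], scale

def mean_abs_error(xs, ys):
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

# Synthetic "well-behaved" activations spread over [-1.0, 0.99].
activations = [((i * 37) % 200 - 100) / 100.0 for i in range(200)]

# Case 1: no outlier, so the step size matches the typical range.
deq_clean, scale_clean = quantize_dequantize(activations)
err_clean = mean_abs_error(activations, deq_clean)

# Case 2: a single outlier stretches the range that the 4-bit grid must cover.
with_outlier = activations + [60.0]
deq_outlier, scale_outlier = quantize_dequantize(with_outlier)
# Error measured on the original 200 values only, excluding the outlier.
err_outlier = mean_abs_error(activations, deq_outlier[:-1])
```

With the outlier present, the step size grows by the ratio of the outlier to the typical range, so the ordinary activations collapse onto very few grid points and their error rises sharply. Outlier-aware fine-tuning targets this effect at training time rather than at inference time.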

Research Direction. Efficient AI / Model Compression

[arXiv] [IEEE Xplore]

Universal Hyperspectral Restoration

PromptHSI: Universal Hyperspectral Image Restoration with Vision-Language Modulated Frequency Adaptation

Published in IEEE Transactions on Geoscience and Remote Sensing (TGRS), Early Access, Feb. 2026.

PromptHSI is a universal all-in-one framework for hyperspectral restoration that combines frequency-aware modulation with vision-language guided prompt learning. A single model can handle cloud occlusion, blur, noise, and spectral band loss across remote sensing scenarios.

Research Direction. Robust Vision Perception / Hyperspectral Remote Sensing

[IEEE Xplore] [arXiv] [GitHub]

Media Security & DeepFake Robustness

Towards Robust DeepFake Detection under Unstable Face Sequences: Adaptive Sparse Graph Embedding with Order-Free Representation and Explicit Laplacian Spectral Prior

Submitted to IEEE Transactions on Information Forensics and Security (TIFS).

GRACEv2 targets unstable face sequences caused by compression, occlusion, and shuffled or missing frames. By combining order-free temporal graph embedding with an explicit Laplacian spectral prior, it makes DeepFake detection more robust under severe real-world disruptions.
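
One reason graph spectral quantities suit shuffled frame sequences is that they do not depend on node order. The sketch below is not the GRACEv2 architecture; it is a minimal illustration, assuming a dense Gaussian-similarity graph over per-frame feature vectors, that the spectral moments trace(L^k) of the graph Laplacian are unchanged when the frames are permuted. All function names and the toy features are hypothetical.

```python
import math

def similarity_graph(features):
    """Dense weighted adjacency from Gaussian similarity between frame features."""
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                adj[i][j] = math.exp(-d2)
    return adj

def laplacian(adj):
    """Unnormalized graph Laplacian L = D - A."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def spectral_moments(lap, max_order=3):
    """trace(L^k) for k = 1..max_order, i.e. sums of eigenvalue powers."""
    moments, power = [], lap
    for k in range(1, max_order + 1):
        if k > 1:
            power = matmul(power, lap)
        moments.append(sum(power[i][i] for i in range(len(lap))))
    return moments

# Toy per-frame features and a fixed reordering that simulates shuffled frames.
frames = [[0.1, 0.2], [0.4, 0.1], [0.3, 0.9], [0.8, 0.5], [0.6, 0.7]]
shuffled = [frames[i] for i in (2, 0, 4, 1, 3)]

m_orig = spectral_moments(laplacian(similarity_graph(frames)))
m_shuf = spectral_moments(laplacian(similarity_graph(shuffled)))
```

Here `m_orig` and `m_shuf` agree up to floating-point rounding, because permuting the frames only relabels the graph nodes and leaves the Laplacian spectrum intact.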

Research Direction. Media Security & Integrity / Secured Robust Vision

[arXiv]

Cross-Compression DeepFake Detection

UMCL: Unimodal-Generated Multimodal Contrastive Learning for Cross-compression-rate Deepfake Detection

Published in International Journal of Computer Vision (IJCV), Jan. 2026.

UMCL synthesizes compression-robust multimodal cues, including rPPG, temporal landmarks, and semantic embeddings, from a single visual input. The framework improves cross-compression DeepFake detection while preserving interpretable feature relationships.

Research Direction. Media Security & Integrity / DeepFake Detection

[Springer] [DOI] [arXiv]