Research Vision
Our research addresses two intertwined challenges in modern computer vision: securing and robustifying vision systems against real-world degradations and adversarial threats, and deploying AI efficiently on resource-constrained platforms. On the robustness front, we tackle media integrity, perceptual restoration, and robust tracking for robotics and sports analytics. On the efficiency front, we develop hardware-agnostic attention mechanisms and quantization-friendly training to bring state-of-the-art models to edge devices without retraining. Together, these directions aim to make vision AI both trustworthy and deployable in the real world.
Research Categories
- Robust Vision Perception: PhaSR, ReflexSplit, PromptHSI, hyperspectral restoration
- Media Security & Integrity: GRACEv2, UMCL, DeepFake detection, trustworthy media analysis
- Robotics & Sports Vision: autonomous driving, tracking, embodied perception, challenge-driven evaluation
- Efficient AI: ELSA, QuantTune, hardware-aware deployment and model compression
A short introduction to my research: [PDF] (Last updated: Oct. 2024)
Robust Shadow Removal
PhaSR: Generalized Image Shadow Removal with Physically Aligned Priors
Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026.
Shadow removal under complex and multi-source lighting is hindered by the mismatch between physical illumination priors and learned features. PhaSR couples physically aligned normalization with geometry-semantic rectification to deliver robust shadow removal that generalizes beyond traditional single-light settings.
Research Direction. Robust Vision Perception / Image Restoration
Reflection Separation in the Wild
ReflexSplit: Single Image Reflection Separation via Layer Fusion-Separation
Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026.
Reflections on glass introduce nonlinear layer mixing that often breaks existing separation networks. ReflexSplit uses dual-stream fusion-separation blocks and curriculum training to achieve robust performance on both synthetic and real-world benchmarks.
Research Direction. Robust Vision Perception / Image Restoration
Efficient AI Inference
ELSA: Exact Linear-Scan Attention for Fast and Memory-Light Vision Transformers
Accepted to the CVPR 2026 Findings Workshop.
ELSA reformulates exact softmax attention as a prefix scan over an associative monoid, achieving memory-light inference with provable FP32 stability and no retraining. Implemented in Triton and CUDA C++, it improves deployability on both data-center and edge hardware.
Research Direction. Efficient AI / Hardware-Agnostic Inference
arXiv preprint coming soon
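The prefix-scan view of softmax attention mentioned for ELSA can be illustrated with a small, generic sketch. This is not the ELSA implementation (which the summary describes as written in Triton and CUDA C++); it only shows the well-known associative merge of partial softmax states for a single query. The names `lift`, `combine`, `scan_attention`, and `reference_attention` are hypothetical and chosen for illustration.

```python
import math
from functools import reduce

def lift(score, value):
    """Turn one (score, value) pair into a partial state:
    (running max, normalizer, un-normalized weighted sum)."""
    return (score, 1.0, list(value))

def combine(a, b):
    """Merge two partial softmax states. The operation is associative,
    so partial states can be merged in any grouping (e.g. by a scan)."""
    m_a, s_a, o_a = a
    m_b, s_b, o_b = b
    m = max(m_a, m_b)
    # Rescale both sides to the shared maximum for numerical stability.
    w_a = math.exp(m_a - m)
    w_b = math.exp(m_b - m)
    s = s_a * w_a + s_b * w_b
    o = [x * w_a + y * w_b for x, y in zip(o_a, o_b)]
    return (m, s, o)

def scan_attention(scores, values):
    """Exact softmax-weighted average computed by folding `combine`."""
    _, s, o = reduce(combine, (lift(sc, v) for sc, v in zip(scores, values)))
    return [x / s for x in o]

def reference_attention(scores, values):
    """Standard two-pass softmax attention for one query, for comparison."""
    m = max(scores)
    weights = [math.exp(sc - m) for sc in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]

# Illustrative attention scores and value vectors for one query.
scores = [0.5, 2.0, -1.0, 3.5]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 0.5]]
out_scan = scan_attention(scores, values)
out_ref = reference_attention(scores, values)
print(out_scan, out_ref)
```

Because `combine` is associative, the fold can be regrouped into a parallel scan without changing the result beyond floating-point rounding, and no full attention matrix has to be stored.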
Quantization-Friendly Deployment
QuantTune: Optimizing Model Quantization with Adaptive Outlier-Driven Fine Tuning
Published in IEEE International Conference on Multimedia Information Processing and Retrieval (MIPR) 2025.
QuantTune addresses outlier-driven dynamic range amplification during Transformer quantization and substantially reduces accuracy loss under low-bit settings. The method requires no extra inference-time hardware complexity and transfers across ViT, BERT, and OPT models.
Research Direction. Efficient AI / Model Compression
[arXiv] [IEEE Xplore]
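The problem QuantTune targets, outliers inflating the dynamic range that a quantizer must cover, can be seen in a toy example. The sketch below is not QuantTune itself; it assumes plain symmetric per-tensor quantization at 4 bits, and the function names and the synthetic activation values are hypothetical.

```python
def fake_quantize(values, bits):
    """Symmetric per-tensor quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values)
    if peak == 0.0:
        return list(values)
    # The step size is set by the largest magnitude in the tensor.
    scale = peak / qmax
    return [round(v / scale) * scale for v in values]

def mean_abs_error(reference, approx):
    """Average absolute reconstruction error."""
    return sum(abs(r - a) for r, a in zip(reference, approx)) / len(reference)

# Synthetic activations in [-0.5, 0.5], plus the same tensor with one outlier.
typical = [i / 100 - 0.5 for i in range(101)]
with_outlier = typical + [60.0]

clean_error = mean_abs_error(typical, fake_quantize(typical, bits=4))
# Measure the error on the typical values only, after the outlier set the scale.
outlier_error = mean_abs_error(
    typical, fake_quantize(with_outlier, bits=4)[:len(typical)]
)
print(clean_error, outlier_error)
```

A single large value stretches the quantization step so far that every typical activation collapses to zero, which is the accuracy loss that outlier-aware fine-tuning aims to avoid.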
Universal Hyperspectral Restoration
PromptHSI: Universal Hyperspectral Image Restoration with Vision-Language Modulated Frequency Adaptation
Published in IEEE Transactions on Geoscience and Remote Sensing (TGRS), Early Access, Feb. 2026.
PromptHSI is a universal all-in-one framework for hyperspectral restoration that combines frequency-aware modulation with vision-language guided prompt learning. A single model can handle cloud occlusion, blur, noise, and spectral band loss across remote sensing scenarios.
Research Direction. Robust Vision Perception / Hyperspectral Remote Sensing
[IEEE Xplore] [arXiv] [GitHub]
Media Security & DeepFake Robustness
Towards Robust DeepFake Detection under Unstable Face Sequences: Adaptive Sparse Graph Embedding with Order-Free Representation and Explicit Laplacian Spectral Prior
Submitted to IEEE Transactions on Information Forensics and Security (TIFS).
GRACEv2 targets unstable face sequences caused by compression, occlusion, and shuffled or missing frames. By combining order-free temporal graph embedding with an explicit Laplacian spectral prior, it improves robust DeepFake detection under severe real-world disruptions.
Research Direction. Media Security & Integrity / Secured Robust Vision
[arXiv]
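Two ideas named in the GRACEv2 summary, an order-free graph over frames and a Laplacian-based prior, can be illustrated generically. The sketch below is an assumption-laden toy, not the GRACEv2 method: it builds a sparse k-nearest-neighbour similarity graph from per-frame features, forms the graph Laplacian L = D - A, and evaluates the quadratic form x^T L x for a per-frame signal. All names, the Gaussian similarity, and the feature values are hypothetical.

```python
import math

def similarity_graph(features, k=2):
    """Symmetric k-nearest-neighbour adjacency built from feature similarity.
    Frame order is never used, only pairwise feature distances."""
    n = len(features)

    def sim(i, j):
        d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
        return math.exp(-d2)

    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        nearest = sorted(others, key=lambda j: sim(i, j), reverse=True)[:k]
        for j in nearest:
            adj[i][j] = adj[j][i] = sim(i, j)
    return adj

def laplacian(adj):
    """Combinatorial graph Laplacian L = D - A."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]

def dirichlet_energy(lap, signal):
    """Quadratic form x^T L x: small when similar frames carry similar values."""
    n = len(signal)
    return sum(signal[i] * lap[i][j] * signal[j]
               for i in range(n) for j in range(n))

# Hypothetical per-frame features and per-frame scores.
features = [[0.0, 0.1], [0.25, 0.0], [0.9, 1.0], [1.0, 0.8], [0.1, 0.4]]
frame_scores = [0.1, 0.2, 0.9, 0.8, 0.15]
energy = dirichlet_energy(laplacian(similarity_graph(features)), frame_scores)

# Shuffle the frames: the graph-based quantity is unchanged.
perm = [3, 0, 4, 1, 2]
shuffled_energy = dirichlet_energy(
    laplacian(similarity_graph([features[p] for p in perm])),
    [frame_scores[p] for p in perm],
)
print(energy, shuffled_energy)
```

Since the graph depends only on feature similarity, shuffling the frames leaves the energy unchanged up to rounding, which is the sense in which such a representation is order-free.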
Cross-Compression DeepFake Detection
UMCL: Unimodal-Generated Multimodal Contrastive Learning for Cross-compression-rate Deepfake Detection
Published in International Journal of Computer Vision (IJCV), Jan. 2026.
UMCL synthesizes compression-robust multimodal cues, including rPPG, temporal landmarks, and semantic embeddings, from a single visual input. The framework improves cross-compression DeepFake detection while preserving interpretable feature relationships.
Research Direction. Media Security & Integrity / DeepFake Detection
New SOTA SR Model
DRCT: Saving Image Super-Resolution away from Information Bottleneck
Presented at IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024, NTIRE Workshop [Oral].
Chih-Chung Hsu, Chia-Ming Lee, Yi-Shiuan Chou
Semi-Supervised Learning in CT Scan Detection
A Closer Look at Spatial-Slice Features for COVID-19 Detection
Presented at IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024, DEF-AI-MIA Workshop.
Chih-Chung Hsu, Chia-Ming Lee, Yang Fan Chiang, Yi-Shiuan Chou, Chih-Yu Jiang, Shen-Chieh Tai, Chi-Han Tsai
[PDF] [arXiv] [GitHub] [Project Page]
Ultra Fast Hyperspectral Image Compressive Sensing
Real-Time Compressed Sensing for Joint Hyperspectral Image Transmission and Restoration for CubeSat
Published in IEEE Transactions on Geoscience and Remote Sensing (TGRS).
Future Tech Award (未來科技獎)
Chih-Chung Hsu, Chih-Yu Jian, Eng-Shen Tu, Chia-Ming Lee, Guan-Lin Chen
[IEEE Xplore] [GitHub]
COVID-19 Symptoms Detection in CT Scan
Selected challenge papers and results
IEEE ECCV Workshop 2022 [1st place in COV19D challenge]
IEEE ICCV Workshop 2021 [3rd place in COV19D challenge]
Our models are designed for noisy, in-the-wild CT scans and remain robust across varying spatial and slice resolutions.
Social Media Prediction as Longitudinal Task (2022-)
A Comprehensive Study of Spatiotemporal Feature Learning for Social Media Popularity Prediction
Published in ACM Multimedia 2022.
C.C. Hsu, P.J. Tsai, T.C. Yeh, and X.U. Hou
We reformulate social media popularity prediction as an identity-preserving longitudinal task and study how multimodal temporal features improve prediction reliability over time.
[PDF]
Semantic Segmentation for Autonomous Driving (2021-)
Selected papers for robust and efficient scene understanding
IEEE ICME Workshop 2022
Augmented-Training-Aware Bisenet for Real-Time Semantic Segmentation [PDF]
IEEE ICASSP 2022
DCSN: Deformable Convolutional Semantic Segmentation Neural Network for Non-Rigid Scenes [PDF]
These projects focus on stable, real-time semantic understanding for autonomous driving, balancing robustness and low-compute deployment.
Fake Image/Video (DeepFake) Detection (2018-)
Selected papers and outreach
IEEE ICIP 2019 and Applied Sciences
Detecting Generated Image Based on Coupled Network with Two-Step Pairwise Learning
IEEE IS3C 2018
Learning to Detect Fake Face Images in the Wild
[Project] [PDF] [GitHub] [Online Demo]
Detection of forged and fake photos, focusing on trustworthy media analysis and on countering fake photos and fake news.
Deep Compressed Sensing for Hyperspectral Images (2020-)
Selected papers for efficient satellite sensing
IEEE Transactions on Geoscience and Remote Sensing
DCSN: Deep Compressed Sensing Network for Efficient Hyperspectral Data Transmission of Miniaturized Satellite [PDF]
CVGIP 2020
Deep Joint Compression and Super-Resolution Low-Rank Network for Fast Hyperspectral Data Transmission
Development of deep-learning-based super-resolution and compressed-sensing techniques for hyperspectral and multispectral images.
Decision-Making of Autonomous Vehicles Using Vision Information (2019-)
Selected work on robust visual decision-making
Multimedia Tools and Applications
Deep Learning-based Vehicle Trajectory Prediction based on Generative Adversarial Network for Autonomous Driving Applications
IEEE ICCE-TW 2020
Learning to Predict Risky Driving Behaviors for Autonomous Driving
[Large-Scale Vehicle Collision Dataset @ TW] [Link]
Risky driving behavior prediction for autonomous-vehicle vision systems, and construction of a dataset covering roads in Taiwan.
Social Media Prediction (2016-)
Selected outputs and awards
- ACM Multimedia 2017-2020
- Social Media Prediction Based on Residual Learning and Random Forest (2017). See the publication list for newer versions.
- 2 Best-Performance Awards and 2 Top-Performance Awards
- Best Grand Challenge Paper Award (2017)
- [GitHub] [PDF]
Predicting the click-through rate of social media posts and how their popularity changes over the long term.
Image Deblocking and Super-Resolution (2013-2014)
Learning-Based Joint Super-Resolution and Deblocking for a Highly Compressed Image
Published in IEEE Transactions on Multimedia (TMM) 2015 and presented at MMSP 2013.
MMSP 2013 Top 10% Paper Award
[Project Page] [PDF] [Matlab Source Code (32-bit only)]
Jointly removes blocking artifacts and increases resolution so that the upscaled image stays sharp.
Super-Resolution of Textured Video (2012-2014)
Temporally Coherent Super-Resolution of Textured Video via Dynamic Texture Synthesis
Published in IEEE Transactions on Image Processing (TIP) and presented at MMSP 2014.
[Project Page] [PDF] [Matlab Code]
Super-resolution for dynamic-texture video that improves detail and temporal consistency after upscaling.
Quality Assessment for Image Retargeting (2011-2013)
Objective Quality Assessment for Image Retargeting Based on Perceptual Geometric Distortion and Information Loss
Published in IEEE Journal of Selected Topics in Signal Processing and presented at VCIP 2013.
[Project Page] [PDF] [Matlab Code]
Assesses the quality of image retargeting methods by quantifying geometric distortion and information loss.
Super-Resolution (2010-2011)
Image Super-Resolution via Feature-Based Affine Transform
Presented at MMSP 2011.
[Project Page] [PDF] [Executable Code (Matlab)]
Note. As an example, we provide an implementation of non-local means (NLM) combined with the proposed method.
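Image super-resolution depends on an example database; we propose a method that enriches the variety of the database to improve upscaling quality.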
Face Hallucination (2008-2010)
Face Hallucination Using Bayesian Global Estimation and Local Basis Selection
Presented at MMSP 2010.
[Project Page] [PDF] [Matlab Code & Database]
Face super-resolution that reconstructs a clearer face from an extremely low-resolution face image.
Video Forensics (2007-2008)
Video Forgery Detection Using the Correlation of Noise Residue
Presented at MMSP 2008.
Citations > 100
[PDF] [Matlab Code] [Database]
Video forensics focused on video forgery detection and trustworthy media analysis.
Image Authentication (2006-2007)
Image Authentication and Tampering Localization Based on Watermark Embedding in the Wavelet Domain
Published in Optical Engineering.
[PDF] [Source Code]
Embeds a watermark in the image that withstands various attacks, enabling image authentication and tampering localization.
