- [2026/01] 🔥 Our paper has been accepted by ICASSP 2026!
This repository contains the official PyTorch implementation of HSSDCT (Hierarchical Spatial-Spectral Dense Correlation Network).
HSSDCT introduces a novel framework for fusing low-resolution hyperspectral images (LR-HSI) with high-resolution multispectral images (HR-MSI). Unlike recent Transformer-based methods, whose self-attention cost grows quadratically with image size, our framework introduces two key components:
- Hierarchical Dense-Residue Transformer Block (HDRTB): Progressively enlarges receptive fields with dense-residue connections for multi-scale feature aggregation.
- Spatial-Spectral Correlation Layer (SSCL): Explicitly factorizes spatial and spectral dependencies, reducing self-attention to linear complexity while mitigating spectral redundancy.
Extensive experiments demonstrate that HSSDCT achieves state-of-the-art performance with significantly lower computational costs compared to recent methods like FusionMamba and QRCODE.
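For intuition on why factorizing attention along the spectral axis is cheap, the NumPy sketch below computes channel-wise (spectral) self-attention: the affinity matrix is C×C, so the cost is linear in the number of pixels. This is a minimal illustration of the general idea only, not the SSCL implementation in `models/hssdct.py`; learned Q/K/V projections and the spatial branch are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spectral_attention(x):
    """Channel-wise (spectral) self-attention on a (C, H*W) feature map.

    The affinity matrix is C x C, so the total cost is O(HW * C^2):
    linear in the number of pixels, unlike spatial self-attention,
    which is quadratic in HW.
    """
    c, hw = x.shape
    # Q/K/V projections are omitted for brevity; identity is used instead.
    attn = softmax(x @ x.T / np.sqrt(hw), axis=-1)  # (C, C) affinity
    return attn @ x                                  # (C, H*W) output

feat = np.random.rand(172, 64 * 64).astype(np.float32)  # 172 bands, 64x64 pixels
out = spectral_attention(feat)
print(out.shape)  # (172, 4096)
```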
- Environment Setup
- Data Preparation
- Usage
- Project Structure
- Configuration Options
- Evaluation Metrics
- Citation
- Acknowledgments
## Environment Setup

- Python 3.8 or higher
- CUDA 12.x compatible GPU (recommended: NVIDIA GPU with ≥16GB memory)
- Conda (recommended) or pip
```bash
# Clone the repository
git clone https://github.com/your-username/HSSDCT.git
cd HSSDCT

# Create a virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

Alternatively, with conda:

```bash
# Create conda environment
conda create -n hssdct python=3.10
conda activate hssdct

# Install PyTorch with CUDA support
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia

# Install remaining dependencies
pip install -r requirements.txt
```

## Data Preparation

The dataset should be organized in the following structure under your data root directory:
```
<data_root>/
├── train/
│   ├── <scene_name>/
│   │   ├── GT.npz.npy   # Ground truth HRHSI (256×256×172), float32
│   │   ├── hrmsi.npz    # High-resolution MSI, contains keys: 'hrmsi4', 'hrmsi6'
│   │   └── lrhsi.npz    # Low-resolution HSI, contains keys: 'lrhsi', 'lrhsi1', 'lrhsi2', 'lrhsi3'
│   ├── Hama9_15/
│   ├── TBD4_39/
│   └── ...
├── val/
│   ├── <scene_name>/
│   │   ├── GT.npz.npy
│   │   ├── hrmsi.npz
│   │   └── lrhsi.npz
│   └── ...
└── test/
    ├── <scene_name>/
    │   ├── GT.npz.npy
    │   ├── hrmsi.npz
    │   └── lrhsi.npz
    └── ...
```
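A quick way to catch incomplete scenes before training is to scan the tree above for the three expected files. The helper below is not part of the repository; it is a small sketch assuming the layout shown:

```python
import os

REQUIRED_FILES = ("GT.npz.npy", "hrmsi.npz", "lrhsi.npz")

def missing_files(data_root):
    """Return (split, scene, filename) for every scene missing a required file."""
    problems = []
    for split in ("train", "val", "test"):
        split_dir = os.path.join(data_root, split)
        if not os.path.isdir(split_dir):
            continue  # tolerate absent splits
        for scene in sorted(os.listdir(split_dir)):
            scene_dir = os.path.join(split_dir, scene)
            if not os.path.isdir(scene_dir):
                continue
            for fname in REQUIRED_FILES:
                if not os.path.isfile(os.path.join(scene_dir, fname)):
                    problems.append((split, scene, fname))
    return problems
```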
| File | Format | Shape | Description |
|---|---|---|---|
| `GT.npz.npy` | NumPy array | (256, 256, 172) | Ground truth high-resolution hyperspectral image |
| `hrmsi.npz` | NumPy compressed | (256, 256, 4/6) | High-resolution multispectral image (4 or 6 bands) |
| `lrhsi.npz` | NumPy compressed | (64, 64, 172) | Low-resolution hyperspectral image (4× downsampled) |
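The unusual `.npz.npy` extension is what `np.save` produces when given a filename already ending in `.npz` (it appends `.npy`). A minimal loading sketch follows; the key names come from the table above, while the dummy data written here is purely illustrative:

```python
import os
import tempfile

import numpy as np

scene = tempfile.mkdtemp()

# np.save appends ".npy", which is how "GT.npz" becomes "GT.npz.npy" on disk.
np.save(os.path.join(scene, "GT.npz"), np.zeros((256, 256, 172), np.float32))
np.savez(os.path.join(scene, "hrmsi.npz"), hrmsi4=np.zeros((256, 256, 4), np.float32))
np.savez(os.path.join(scene, "lrhsi.npz"), lrhsi=np.zeros((64, 64, 172), np.float32))

gt = np.load(os.path.join(scene, "GT.npz.npy"))              # plain .npy array
hrmsi = np.load(os.path.join(scene, "hrmsi.npz"))["hrmsi4"]  # pick the 4-band MSI
lrhsi = np.load(os.path.join(scene, "lrhsi.npz"))["lrhsi"]
print(gt.shape, hrmsi.shape, lrhsi.shape)  # (256, 256, 172) (256, 256, 4) (64, 64, 172)
```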
## Usage

To train the model from scratch:

```bash
python train.py \
  --root /path/to/your/data \
  --train_file ./data_path/train.txt \
  --val_file ./data_path/val.txt \
  --prefix EXPERIMENT_NAME \
  --batch_size 6 \
  --epochs 1000 \
  --lr 0.000055 \
  --lr_scheduler cosine \
  --msi_bands 4 \
  --bands 172 \
  --crop_size 128 \
  --image_size 256 \
  --network_mode 1 \
  --device cuda:0
```

| Argument | Default | Description |
|---|---|---|
| `--root` | `/ssd4t/Fusion_data` | Root directory of the dataset |
| `--prefix` | `ASTUDY_num2_BAND4_SWINY_SNR0` | Experiment name for checkpoints and logs |
| `--batch_size` | 6 | Training batch size |
| `--epochs` | 1000 | Total training epochs |
| `--lr` | 5.5e-5 | Initial learning rate |
| `--lr_scheduler` | `cosine` | Learning rate scheduler (`cosine` or `step`) |
| `--network_mode` | 1 | Network mode: 0=Single, 1=LRHSI+HRMSI, 2=Triplet |
| `--msi_bands` | 4 | Number of HRMSI spectral bands (4 or 6) |
| `--bands` | 172 | Number of hyperspectral bands |
| `--crop_size` | 128 | Training patch size |
| `--snr` | 0 | Signal-to-noise ratio for AWGN (0 = no noise) |
| `--nf` | 96 | Base number of feature channels |
| `--gc` | 32 | Growth channels in dense blocks |
| `--joint_loss` | 1 | Enable joint loss (L1 + SAM + BandWiseMSE) |
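For intuition, the joint loss combines a pixel-wise L1 term, a spectral-angle (SAM) term, and a band-wise MSE term. The sketch below uses textbook definitions in NumPy; the exact formulations and the term weights `w_sam` and `w_bmse` are assumptions here and may differ from those in `trainOps.py`.

```python
import numpy as np

def sam_loss(pred, target, eps=1e-8):
    """Mean spectral angle (radians) over pixels; pred/target: (H, W, C)."""
    dot = (pred * target).sum(-1)
    denom = np.linalg.norm(pred, axis=-1) * np.linalg.norm(target, axis=-1) + eps
    return float(np.mean(np.arccos(np.clip(dot / denom, -1.0, 1.0))))

def bandwise_mse(pred, target):
    """MSE computed per spectral band, then averaged over bands."""
    return float(((pred - target) ** 2).mean(axis=(0, 1)).mean())

def joint_loss(pred, target, w_sam=0.1, w_bmse=1.0):
    """L1 + weighted SAM + weighted band-wise MSE (weights are assumed)."""
    l1 = float(np.abs(pred - target).mean())
    return l1 + w_sam * sam_loss(pred, target) + w_bmse * bandwise_mse(pred, target)
```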
To resume training from a checkpoint:

```bash
python train.py \
  --resume_ind 100 \
  --resume_ckpt ./checkpoint/EXPERIMENT_NAME/best.pth \
  [other arguments...]
```

## Project Structure

```
HSSDCT/
├── train.py              # Main training script (training loop & validation)
├── dataset.py            # Dataset classes for data loading
│                         #   - Pairwise (LRHSI + HRMSI) & Triplet loading
├── trainOps.py           # Training utilities & evaluation metrics
│                         #   - Losses: SAM loss, band-wise MSE
│                         #   - Metrics: PSNR, ERGAS, RMSE
├── utils.py              # General utilities (activation, padding)
├── requirements.txt      # Python dependencies
├── models/
│   ├── __init__.py
│   ├── hssdct.py         # Main model architecture
│   │                     #   - HSSDCT framework
│   │                     #   - HDRTB (Hierarchical Dense-Residue Transformer Block)
│   │                     #   - SSCL (Spatial-Spectral Correlation Layer)
│   └── module_util.py    # Weight initialization utilities
└── data_path/
    ├── train.txt         # Training sample list
    ├── val.txt           # Validation sample list
    └── test.txt          # Test sample list
```
## Configuration Options

| Parameter | Default | Description |
|---|---|---|
| `--nf` | 96 | Base feature channels |
| `--gc` | 32 | Growth channels in dense blocks |
| `--num_blocks` | 6 | Number of repeated backbone blocks |
| `--groups` | 1 | Group convolution factor (1 = full, 4 = light) |
| `--out_nc` | 172 | Output channels (should match `--bands`) |
## Evaluation Metrics

The model is evaluated using the following metrics:
| Metric | Description | Optimal |
|---|---|---|
| SAM | Spectral Angle Mapper (degrees) | Lower is better |
| PSNR | Peak Signal-to-Noise Ratio (dB) | Higher is better |
| RMSE | Root Mean Square Error | Lower is better |
| ERGAS | Erreur Relative Globale Adimensionnelle de Synthèse | Lower is better |
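For reference, these are the conventional definitions of PSNR, RMSE, and ERGAS sketched in NumPy (SAM is the per-pixel spectral angle averaged over the image, reported in degrees). The implementations in `trainOps.py` may differ in normalization details; `ratio` in ERGAS is the spatial downsampling factor, 4 for this dataset.

```python
import numpy as np

def rmse(pred, target):
    """Root mean square error over all pixels and bands."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB for a given dynamic range."""
    mse = np.mean((pred - target) ** 2)
    return float(10.0 * np.log10(data_range ** 2 / mse))

def ergas(pred, target, ratio=4):
    """ERGAS: 100/ratio * sqrt(mean over bands of (RMSE_b / mean_b)^2)."""
    rmse_b = np.sqrt(np.mean((pred - target) ** 2, axis=(0, 1)))  # per-band RMSE
    mean_b = target.mean(axis=(0, 1))                             # per-band mean
    return float(100.0 / ratio * np.sqrt(np.mean((rmse_b / mean_b) ** 2)))
```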
Quantitative comparison on the AVIRIS dataset (see Table 1 in the paper):
| Method | Params (M) | FLOPs (G) | PSNR (dB) | SAM |
|---|---|---|---|---|
| FusionMamba | 21.68 | 134.47 | 30.741 | 1.978 |
| QRCODE | 41.88 | 2231.19 | 35.361 | 1.623 |
| HSSDCT (Ours) | 6.78 | 283.84 | 37.212 | 1.348 |
Visual comparison of fused results (Figure 5 from the paper).
## Citation

If you find this work useful in your research, please consider citing:
```bibtex
@inproceedings{lee2026hssdct,
  title     = {HSSDCT: Factorized Spatial-Spectral Correlation for Hyperspectral Image Fusion},
  author    = {Lee, Chia-Ming and Ho, Yu-Hao and Lin, Yu-Fan and Lee, Jen-Wei and Kang, Li-Wei and Hsu, Chih-Chung},
  booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year      = {2026}
}
```

This project is licensed under the MIT License - see the LICENSE file for details.
For questions or issues, please open a GitHub issue or contact [jemmy112322@gmail.com].
