This repository contains algorithmic solutions for both tasks of the PANTHER Challenge, which focuses on automated segmentation of pancreatic tumors from abdominal MRI scans.
The algorithms leverage nnU-Net with several advanced techniques including:
- Pseudo-labeling to leverage unlabeled data
- 5-fold ensemble inference for robust predictions
- Noisy student training for semi-supervised learning
- ResEnc (Residual Encoder) U-Net architecture variants
- Pre-training and fine-tuning for effective transfer between the two tasks
.
├── Task1/ # Pancreatic tumor segmentation in diagnostic MRIs
│ ├── Dockerfile # Container definition for Task 1
│ ├── inference.py # 5-fold ensemble inference pipeline
│ ├── custom_trainers/ # Custom nnU-Net trainer implementations
│ ├── nnUNet_results/ # Model configurations and plans
│ ├── training/ # Training scripts and utilities
│ │ ├── colab_training.py # Round 0: 5-fold teacher model training
│ │ ├── r1_pseudo_panther/ # Round 1: 5-fold teacher model pseudo-labeling
│ │ ├── r1_students_panther/ # Round 1: 5-fold noisy student model training
│ │ ├── r2_pseudo_panther/ # Round 2: 5-fold student model pseudo-labeling
│ │ └── r2_students_panther/ # Round 2: Enhanced 5-fold noisy student model training
│ └── model/ # Checkpoint storage directory
│
└── Task2/ # Pancreatic tumor segmentation in MR-Linac MRIs
├── Dockerfile # Container definition for Task 2
├── inference.py # Cropped inference with MRSegmentator
├── data_utils.py # Image preprocessing utilities
├── nnUNet_results/ # Model configurations
├── training/ # Training scripts and utilities
│ └── train.py # 3-fold ResEncM fine-tuning
└── model/ # Checkpoint storage directory
Architecture: ResEnc-M (Residual Encoder U-Net Medium variant) with 3-class output (background as label 0, tumor as label 1, and pancreas as label 2)
Training Strategy - Iterative Noisy Student Training:
Round 0 - Teacher Training:
- Train 5-fold ResEnc-M models on 92 labeled samples
- 300 epochs with initial LR of 0.005
Round 1 - First Student Generation:
- Teachers generate pseudo-labels for 367 unlabeled samples
- Filter out background-only predictions (70 samples excluded)
- Train 5-fold student models on combined labeled + pseudo-labeled data (800 epochs)
Round 2 - Enhanced Student Generation:
- Round 1 students generate pseudo-labels for 367 unlabeled samples
- Filter out background-only predictions (29 samples excluded)
- Train final 5-fold student models with 1600 epochs and LR of 0.003
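The background filter used in Rounds 1 and 2 can be sketched as follows. This is a minimal illustration with hypothetical helper names; numpy arrays stand in for the saved pseudo-label volumes.

```python
import numpy as np

def keep_pseudo_label(mask: np.ndarray) -> bool:
    """Keep a pseudo-label only if it contains at least one foreground voxel.

    Background-only teacher predictions carry no training signal for the
    tumor/pancreas classes, so they are excluded from the student's data.
    """
    return bool(np.any(mask > 0))

def filter_pseudo_labels(masks):
    """Drop background-only predictions from a batch of teacher outputs."""
    return [m for m in masks if keep_pseudo_label(m)]
```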
Inference:
- 5-fold ensemble with softmax probability averaging
- Multi-class predictions (3 classes) with binary tumor extraction
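The ensemble step above can be sketched in numpy. This is a simplified illustration (hypothetical `ensemble_predict` helper), assuming each fold's output is already softmax-normalized along the class axis; the actual pipeline runs inside `inference.py`.

```python
import numpy as np

def ensemble_predict(fold_probs):
    """Average per-fold softmax probability maps, then take the argmax.

    fold_probs: list of arrays shaped (num_classes, D, H, W), one per fold.
    Returns (multi_class_labels, binary_tumor_mask); label 1 is the tumor
    class in the 3-class formulation (0=background, 1=tumor, 2=pancreas).
    """
    mean_probs = np.mean(np.stack(fold_probs, axis=0), axis=0)
    labels = np.argmax(mean_probs, axis=0)
    tumor = (labels == 1).astype(np.uint8)
    return labels, tumor
```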
Architecture: ResEnc-M fine-tuned from Task 1 models
Training:
- 3-fold cross-validation on 50 labeled MR-Linac samples
- Fine-tune from one of the 5-fold Task 1 checkpoints (500 epochs, LR 0.001)
- Maintain 3-class formulation for consistency with Task 1 pre-training, despite binary evaluation
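The weight-transfer rule behind fine-tuning can be sketched as: keep only pretrained parameters whose name and shape match the target network (the same idea as a non-strict state-dict load). This is a minimal sketch with a hypothetical `transferable_weights` helper and numpy arrays standing in for tensors; the actual loading is handled by the training script.

```python
import numpy as np

def transferable_weights(pretrained, target):
    """Keep only pretrained tensors whose name and shape match the target
    network; mismatched layers (e.g. a resized head) stay randomly
    initialized."""
    return {k: v for k, v in pretrained.items()
            if k in target and v.shape == target[k].shape}
```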
Inference:
Input Processing:
- Load original high-resolution T2-weighted MRI
- Create low-resolution copy with 3.0×3.0×6.0 mm spacing for computational efficiency
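The size arithmetic behind the low-resolution copy is that physical extent is preserved: `new_size ≈ size * spacing / new_spacing`. A minimal sketch of that computation (hypothetical `resampled_size` helper; the actual resampling in the pipeline is done with SimpleITK):

```python
def resampled_size(size, spacing, new_spacing=(3.0, 3.0, 6.0)):
    """Compute the voxel grid size after resampling to new_spacing (mm),
    keeping the physical field of view unchanged."""
    return tuple(int(round(s * sp / nsp))
                 for s, sp, nsp in zip(size, spacing, new_spacing))
```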
Organ Detection:
- Apply MRSegmentator with 5-fold ensemble (folds 0-4) on the low-resolution image
- Extract pancreas segmentation mask (organ class #7 in MRSegmentator output)
- Binary conversion: pancreas=1, everything else=0
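The binary conversion step is a single thresholding operation on the multi-organ label map. A minimal numpy sketch (hypothetical `pancreas_mask` helper; the pancreas label index follows this pipeline's use of MRSegmentator output):

```python
import numpy as np

PANCREAS_LABEL = 7  # pancreas class index in the MRSegmentator label map, as used here

def pancreas_mask(organ_labels: np.ndarray) -> np.ndarray:
    """Binarize the multi-organ output: pancreas=1, everything else=0."""
    return (organ_labels == PANCREAS_LABEL).astype(np.uint8)
```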
ROI Definition:
- Compute 3D bounding box around the detected pancreas region
- Add 30mm safety margins in all directions to ensure complete tumor coverage
- Transform coordinates back to original image space
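The ROI computation above can be sketched as: take the bounding box of the nonzero pancreas voxels, convert the 30 mm margin to voxels per axis using the image spacing, and clip to the image bounds. A minimal sketch with a hypothetical `roi_with_margin` helper:

```python
import numpy as np

def roi_with_margin(mask, spacing, margin_mm=30.0):
    """3D bounding box of the nonzero region, expanded by margin_mm on each
    side (converted to voxels via the per-axis spacing in mm) and clipped
    to the image bounds. Returns a (start, stop) index pair per axis."""
    coords = np.nonzero(mask)
    bounds = []
    for ax, idx in enumerate(coords):
        pad = int(np.ceil(margin_mm / spacing[ax]))
        lo = max(int(idx.min()) - pad, 0)
        hi = min(int(idx.max()) + pad + 1, mask.shape[ax])
        bounds.append((lo, hi))
    return bounds
```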
Focused Processing:
- Crop the original high-resolution MRI using the computed ROI (preserving original resolution)
Tumor Detection:
- Run nnU-Net 3-fold ensemble (folds 0, 1, 2) on the cropped high-resolution region
Full Resolution Reconstruction:
- Map the predicted tumor mask from cropped space back to original image dimensions
- Place predictions at the correct anatomical location using saved crop coordinates
- Output final binary mask (0=background, 1=tumor) at original resolution
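The reconstruction step amounts to pasting the cropped-space prediction into a zero-filled volume of the original shape at the saved crop coordinates. A minimal sketch (hypothetical `restore_to_full` helper):

```python
import numpy as np

def restore_to_full(pred_crop, full_shape, bounds):
    """Place the cropped-space prediction back at its anatomical location.

    bounds: per-axis (start, stop) crop coordinates saved during ROI
    cropping; everything outside the ROI is background (0).
    """
    full = np.zeros(full_shape, dtype=pred_crop.dtype)
    (z0, z1), (y0, y1), (x0, x1) = bounds
    full[z0:z1, y0:y1, x0:x1] = pred_crop
    return full
```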
- Pseudo-labeling: Leverages 367 unlabeled samples in Task 1
- Noisy Student Training: Iterative refinement through teacher-student paradigm
- Background Filtering: Excludes pseudo-labels with pure background to maintain quality
- Task 1: 5-fold cross-validation ensemble
- Task 2: 3-fold cross-validation ensemble
- Softmax Averaging: Probabilistic combination of fold predictions
- PyTorch 2.3.1 with CUDA 11.8
- nnU-Net v2
- SimpleITK
- MRSegmentator (Task 2 only)
- surface-distance (for evaluation)
- GPU with ~9GB VRAM for ResEnc-M variant
- 24GB VRAM for ResEnc-L variant (optional)
- 40GB VRAM for ResEnc-XL variant (optional)
Note: I did not observe an explicit performance boost from replacing ResEnc-M with the L or XL variant for either task. My interpretation is that the dataset is relatively small, so the larger models tend to underfit.
# Task 1
cd Task1
./do_build.sh panther-task1-5fold-ensemble
./do_test_run.sh # Test with sample data
./do_save.sh # Create submission package
# Task 2
cd Task2
./do_build.sh panther-task2-baseline
./do_test_run.sh
./do_save.sh

# Round 0: Initial teacher training (implemented on Google Colab)
# Refer to ./Task1/training/colab_training.py
# Round 1: Generate pseudo-labels and train students
python ./Task1/training/r1_pseudo_panther/generate_teacher_predictions.py
python ./Task1/training/r1_students_panther/train_students.py
# Round 2: Refined pseudo-labels and final students
python ./Task1/training/r2_pseudo_panther/generate_prediction.py
python ./Task1/training/r2_students_panther/train.py --fold 0  # Repeat for folds 0-4

# Fine-tune from Task 1 (3 folds)
python ./Task2/training/train.py --fold 0 --task1_checkpoint path/to/checkpoint.pth
python ./Task2/training/train.py --fold 1 --task1_checkpoint path/to/checkpoint.pth
python ./Task2/training/train.py --fold 2 --task1_checkpoint path/to/checkpoint.pth

Both tasks include evaluation scripts that compute:
- Volumetric Dice Score
- Surface Dice (5mm tolerance)
- 95% Hausdorff Distance
- Mean Average Surface Distance (MASD)
- Tumor Burden RMSE
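For orientation, the first metric above (volumetric Dice) can be computed as a minimal numpy sketch. The `dice_score` helper is hypothetical, not the challenge's official implementation, which lives in the evaluation scripts:

```python
import numpy as np

def dice_score(pred, gt):
    """Volumetric Dice: 2|P∩G| / (|P| + |G|); defined as 1.0 when both
    masks are empty."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0
    return 2.0 * np.logical_and(pred, gt).sum() / denom
```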
# Example evaluation
python ./Task2/training/evaluate_local_fixed.py \
--pred_dir predictions/ \
--gt_dir ground_truth/ \
--verbose

- Network: ResidualEncoderUNet
- Normalization: Z-Score
- Patch Size:
  - Task 1: [48, 160, 224]
  - Task 2: [64, 112, 160]
- Batch Size: 2
- Deep Supervision: Enabled
| Parameter | Task 1 Teacher | Task 1 Student | Task 2 |
|---|---|---|---|
| Epochs | 300 | 800-1600 | 500 |
| Initial LR | 0.005 | 0.003-0.005 | 0.001 |
| Optimizer | SGD with momentum | SGD with momentum | SGD with momentum |
| Objectives | CE + Dice (1:1.5) | CE + Dice | CE + Dice |