Inspiration
Reading CT and X-ray studies often means scrolling through hundreds of slices to find small, high-stakes pathologies. Many hospitals can’t use cloud APIs for PHI, and hosting giant ViTs locally is impractical. I wanted a tool that:
- runs fully on CPU, on-prem,
- gives interpretable, per-finding evidence (not just a score),
- and can be tuned to a hospital’s own data.
To the best of my knowledge (Aug 10, 2025), no one had shown a CPU-only, on-device Convolutional Sparse Autoencoder (Conv-SAE) doing unsupervised, evidence-grounded localization in radiology with single-latent ablation masks. That gap, plus clinicians' need for trustworthy visual evidence, sparked RadiSpect.
What it does
RadiSpect helps clinicians see where to look and why:
1) Safe Spans (peer review of an existing report):
Click a finding in the report -> see the Conv-SAE’s per-latent mask aligned to that finding. High-activation latents not covered in the text are highlighted as potential misses.
2) Clinician Assist (draft a new report):
Shows top-activated latents with overlays. Accept the useful ones, ignore false positives, and use the masks as evidence breadcrumbs.
3) Report Cross-Check (QA):
If a report is written, RadiSpect flags strongly activated latents that the text didn’t mention (an on-device second set of eyes).
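The cross-check flow above can be sketched in a few lines. This is an illustrative toy, not RadiSpect's actual code: the function names, the energy threshold, and the substring match against the report text are all assumptions made for the example.

```python
# Sketch of Report Cross-Check: flag strongly activated latents whose
# mined label never appears in the written report (illustrative only).

def cross_check(report_text, latent_energies, latent_labels, energy_thresh=0.5):
    """Return labels of high-energy latents that the report does not mention."""
    text = report_text.lower()
    flags = []
    for i, energy in latent_energies.items():
        label = latent_labels.get(i, "")
        if energy >= energy_thresh and label and label.lower() not in text:
            flags.append(label)
    return flags

report = "Mild cardiomegaly. No pleural effusion."
energies = {0: 0.9, 1: 0.8, 2: 0.1}
labels = {0: "cardiomegaly", 1: "right apical opacity", 2: "effusion"}
print(cross_check(report, energies, labels))  # → ['right apical opacity']
```

Latent 1 fires strongly but "right apical opacity" is absent from the text, so it is surfaced as a potential miss; latent 2 is below the energy gate and stays quiet.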
How we built it
Model. A compact Convolutional Sparse Autoencoder trained unsupervised on images: an encoder \(f_\theta\) maps an image \(x\) to a sparse latent code \(z=(z_1,\dots,z_n)=f_\theta(x)\), and a decoder \(g_\phi\) reconstructs \(\hat{x}=g_\phi(z)\) under a sparsity penalty on \(z\).
Evidence maps via single-latent ablation. For latent \(z_i\):
$$ \hat{x}^{(i=0)} = g_\phi\big([z_1,\dots,0,\dots,z_n]\big),\quad \Delta_i = \big| \hat{x} - \hat{x}^{(i=0)} \big|,\quad M_i = \mathrm{Thresh}(\Delta_i) $$
We overlay \(M_i\) on the original image to show what changes when only that concept is removed.
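The ablation step can be sketched as follows. This is a NumPy toy with a linear stand-in for the decoder \(g_\phi\); the real model is a convolutional autoencoder in PyTorch, and the relative threshold is an assumption for the example.

```python
import numpy as np

# NumPy sketch of single-latent ablation: zero latent z_i, decode, and
# threshold the reconstruction difference into a mask M_i.

rng = np.random.default_rng(0)
n_latents, img_pixels = 8, 64
W = rng.normal(size=(img_pixels, n_latents))    # toy linear decoder weights

def decode(z):
    return W @ z                                # stand-in for g_phi(z)

def ablation_mask(z, i, thresh=0.5):
    """Delta_i = |x_hat - x_hat with z_i zeroed|; M_i thresholds Delta_i."""
    x_hat = decode(z)
    z_abl = z.copy()
    z_abl[i] = 0.0                              # remove only concept i
    delta = np.abs(x_hat - decode(z_abl))
    return (delta > thresh * delta.max()).astype(np.uint8)

z = np.abs(rng.normal(size=n_latents))          # sparse-ish latent code
M3 = ablation_mask(z, 3)
print(M3.shape, int(M3.sum()))
```

Reshaped to the image grid, \(M_3\) is exactly the overlay shown to the clinician: the pixels that change when concept 3 alone is removed.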
Label mining (v0). For each latent \(i\), collect reports from images with high \(z_i\), extract frequent phrases to propose a short label (optionally refined later).
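A minimal version of that phrase mining, on made-up report snippets (the bigram choice and the toy data are assumptions; v0 used the same frequent-phrase idea):

```python
from collections import Counter
import re

# Sketch of v0 label mining: pool reports from a latent's top-activating
# images and propose the most frequent bigram as a short label.

def mine_label(reports, top_k=1):
    counts = Counter()
    for rep in reports:
        words = re.findall(r"[a-z]+", rep.lower())
        counts.update(zip(words, words[1:]))    # count adjacent word pairs
    return [" ".join(bg) for bg, _ in counts.most_common(top_k)]

top_reports = [
    "small left pleural effusion",
    "trace pleural effusion on the left",
    "left pleural effusion unchanged",
]
print(mine_label(top_reports))  # → ['pleural effusion']
```

"pleural effusion" appears in all three reports, so it wins over "left pleural"; the proposed label is then refined or rejected by a human.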
Quality gates. We keep masks that meet energy/consistency checks (e.g., energy \(E_i=\sum M_i\); basic monotonicity under increased ablation strength).
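The gates can be expressed as a simple predicate. The specific thresholds and the "soft scaling" interpretation of increased ablation strength are assumptions; the source only specifies the energy check and basic monotonicity.

```python
import numpy as np

# Sketch of the quality gates: keep a mask only if its energy E_i = sum(M_i)
# is large enough and energy does not shrink as ablation strength grows.

def passes_gates(masks_by_strength, min_energy=10):
    """masks_by_strength: masks for the same latent at increasing strength."""
    energies = [int(m.sum()) for m in masks_by_strength]
    strong_enough = energies[-1] >= min_energy
    monotone = all(a <= b for a, b in zip(energies, energies[1:]))
    return strong_enough and monotone

m_weak = np.zeros((8, 8), dtype=np.uint8); m_weak[2:4, 2:4] = 1   # E = 4
m_full = np.zeros((8, 8), dtype=np.uint8); m_full[1:5, 1:5] = 1   # E = 16
print(passes_gates([m_weak, m_full]))  # → True
```

A mask whose energy collapses when the ablation is strengthened fails the monotonicity check and is discarded as noise.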
Data & stack.
- Dataset: IU X-Ray (paired images + reports).
- Runtime: Python + PyTorch (CPU), Streamlit viewer.
- Why CPU-only? To honor privacy constraints and prove edge feasibility. Many radiologists work on machines without GPUs, and the tool should meet them there.
Challenges we ran into
- CT scarcity & time: Open CT datasets with aligned reports are scarce; training pipelines are heavier and unlikely to succeed within the timeframe of this hackathon. I built the X-ray proof of concept first to validate compute and workflow.
- Unsupervised noise: Without labels, some masks bleed into irrelevant areas. I use energy thresholds, simple consistency checks, and human judgment in the loop.
- Threshold tuning: I iterated on per-latent scaling and percentile rules to keep overlays stable.
- Label fidelity: Phrase mining can be noisy. I kept labels short, factual, and tied to top-activator galleries rather than over-promising.
- Provable privacy: Judges want proof. I run in offline mode, show “Local Mode: ON (0 API calls)”, and keep models small.
Accomplishments that we're proud of
- End-to-end on device: Training + inference on CPU, no cloud, no GPUs.
- 1:1 provenance: Every accepted finding is backed by a single-latent ablation mask.
- Clinically aligned UX: Three flows that map to real tasks: peer review, assistive drafting, and QA.
- Lightweight & fast: Small Conv-SAE with interactive overlays on commodity hardware.
- Clear scope of novelty: First to combine on-device Conv-SAE + unsupervised evidence maps for radiology localization.
What we learned
- Interpretability drives adoption. Visual evidence beats a single probability score when clinicians need to trust a tool.
- Unsupervised doesn't mean unreliable. With sparsity + careful ablations, evidence maps can be useful even without segmentation labels.
- Human-in-the-loop matters. False positives are manageable when the UI lets clinicians quickly accept/ignore overlays.
- Compute realism helps. Proving CPU-only viability makes deployment conversations (privacy, security, cost) much easier.
- Metrics need to match the task. Energy/coverage and “no mask -> no claim” are better aligned to evidence-grounded assistance than generic accuracy.
What’s next for RadiSpect
- CT extension (retrain, same method): Train on HU-windowed slices or short 2.5D stacks; aggregate per-slice masks (e.g., max-energy) to surface top frames. No architecture change required.
- Packaging for hospitals: One-click local install; logs proving offline mode; admin knob for on-prem fine-tuning.
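The planned max-energy aggregation for CT is straightforward; a sketch under stated assumptions (per-slice binary masks for one latent, illustrative function name):

```python
import numpy as np

# Sketch of the planned CT step: score each slice by its mask energy for a
# latent, then surface the top-k slices for the clinician to review first.

def top_slices(slice_masks, k=2):
    """slice_masks: list of per-slice binary masks M_i; return top-k indices."""
    energies = np.array([m.sum() for m in slice_masks])
    return [int(i) for i in np.argsort(energies)[::-1][:k]]

masks = [np.zeros((4, 4)) for _ in range(5)]
masks[1][0, 0] = 1          # energy 1
masks[3][:2, :2] = 1        # energy 4
print(top_slices(masks))    # → [3, 1]
```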
Built With
- convolutional-sparse-autoencoder
- python
- streamlit
- torch