Inspiration

Reading CT and X-ray studies often means scrolling through hundreds of slices to find small, high-stakes pathologies. Many hospitals can’t use cloud APIs for PHI, and hosting giant ViTs locally is impractical. I wanted a tool that:

  • runs fully on CPU, on-prem,
  • gives interpretable, per-finding evidence (not just a score),
  • and can be tuned to a hospital’s own data.

To the best of my knowledge (Aug 10, 2025), no one had shown a CPU-only, on-device Convolutional Sparse Autoencoder (Conv-SAE) doing unsupervised, evidence-grounded localization in radiology with single-latent ablation masks. That gap, and clinicians’ need for trustworthy visual evidence, sparked RadiSpect.


What it does

RadiSpect helps clinicians see where to look and why:

1) Safe Spans (peer review of an existing report):
Click a finding in the report -> see the Conv-SAE’s per-latent mask aligned to that finding. High-activation latents not covered in the text are highlighted as potential misses.

2) Clinician Assist (draft a new report):
Shows top-activated latents with overlays. Accept the useful ones, ignore false positives, and use the masks as evidence breadcrumbs.

3) Report Cross-Check (QA):
If a report is written, RadiSpect flags strongly activated latents that the text didn’t mention (an on-device second set of eyes).

How we built it

Model. A compact Convolutional Sparse Autoencoder trained unsupervised on images; a sparsity penalty on the latents encourages each one to capture a distinct visual concept.
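A minimal PyTorch sketch of such a Conv-SAE, assuming an L1 activation penalty for sparsity; the layer widths and kernel sizes here are illustrative, not the project's actual configuration:

```python
import torch
import torch.nn as nn

class ConvSAE(nn.Module):
    """Compact convolutional sparse autoencoder (layer sizes are illustrative)."""
    def __init__(self, n_latents: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=4, stride=2, padding=1),  # halves H, W
            nn.ReLU(),
            nn.Conv2d(32, n_latents, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),  # non-negative latents make "ablate to zero" well-defined
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(n_latents, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1),
            nn.Sigmoid(),  # reconstructions in [0, 1], like normalized X-rays
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

def sae_loss(x, x_hat, z, l1_weight: float = 1e-3):
    """Reconstruction error plus an L1 penalty pushing latents toward sparsity."""
    return nn.functional.mse_loss(x_hat, x) + l1_weight * z.abs().mean()
```

Keeping the latents spatial (a channel map rather than a flat vector) is what lets each latent's ablation produce a localized evidence mask later.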

Evidence maps via single-latent ablation. For latent \(z_i\):

$$ \hat{x} = g_\phi(z),\qquad \hat{x}^{(i=0)} = g_\phi\big([z_1,\dots,0,\dots,z_n]\big),\quad \Delta_i = \big| \hat{x} - \hat{x}^{(i=0)} \big|,\quad M_i = \mathrm{Thresh}(\Delta_i) $$

We overlay \(M_i\) on the original image to show what changes when only that concept is removed.
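The ablation step maps directly to code. A minimal sketch, assuming a model that exposes `encoder` and `decoder` submodules; using a percentile cutoff for \(\mathrm{Thresh}\) is an illustrative choice, not necessarily the project's rule:

```python
import torch

@torch.no_grad()
def ablation_mask(model, x, latent_idx: int, percentile: float = 95.0):
    """Evidence map M_i: zero out latent channel i and diff the reconstructions."""
    z = model.encoder(x)
    x_hat = model.decoder(z)               # reconstruction with all latents

    z_ablated = z.clone()
    z_ablated[:, latent_idx] = 0.0         # remove concept i only
    x_hat_ablated = model.decoder(z_ablated)

    delta = (x_hat - x_hat_ablated).abs()  # Δ_i
    thresh = torch.quantile(delta.flatten(), percentile / 100.0)
    return (delta >= thresh).float()       # M_i = Thresh(Δ_i)
```

The mask has the same spatial shape as the input, so it can be alpha-blended over the original image as an overlay.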

Label mining (v0). For each latent \(i\), collect reports from images with high \(z_i\), extract frequent phrases to propose a short label (optionally refined later).
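A rough version of this mining step, assuming each latent already has the list of report strings from its top-activating images; the bigram-count heuristic is purely illustrative:

```python
import re
from collections import Counter

def propose_label(reports: list[str], top_k: int = 3) -> str:
    """Propose a short label from frequent bigrams in high-activation reports."""
    counts = Counter()
    for report in reports:
        words = re.findall(r"[a-z]+", report.lower())
        counts.update(zip(words, words[1:]))  # count adjacent word pairs
    phrases = [" ".join(bigram) for bigram, _ in counts.most_common(top_k)]
    return "; ".join(phrases)
```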

Quality gates. We keep masks that meet energy/consistency checks (e.g., mask energy \(E_i=\sum M_i\); basic monotonicity under increased ablation strength).
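One way these gates might look in code. The energy threshold and the two ablation strengths are hypothetical placeholders, and this sketch reads "monotonicity" as: the reconstruction difference should grow as the latent is ablated more strongly:

```python
import torch

@torch.no_grad()
def passes_gates(model, x, latent_idx: int, min_energy: float = 10.0) -> bool:
    """Keep a latent only if its ablation effect is large enough and grows
    monotonically with ablation strength (illustrative gate choices)."""
    z = model.encoder(x)
    x_hat = model.decoder(z)

    def delta_energy(strength: float) -> float:
        z_mod = z.clone()
        z_mod[:, latent_idx] *= (1.0 - strength)  # strength=1 is full ablation
        return (x_hat - model.decoder(z_mod)).abs().sum().item()

    energies = [delta_energy(s) for s in (0.5, 1.0)]
    monotone = energies[0] <= energies[1]
    return energies[1] >= min_energy and monotone
```

Latents that fail either gate simply produce no overlay, which keeps to the "no mask -> no claim" principle.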

Data & stack.

  • Dataset: IU X-Ray (paired images + reports).
  • Runtime: Python + PyTorch (CPU), Streamlit viewer.

  • Why CPU-only? To honor privacy constraints and prove edge feasibility; many radiologists work without GPUs, and the tool should be built for that reality.

Challenges we ran into

  • CT scarcity & time: Open CT datasets with aligned reports are scarce; training pipelines are heavier and unlikely to succeed within the timeframe of this hackathon. I built the X-ray proof of concept first to validate compute and workflow.
  • Unsupervised noise: Without labels, some masks bleed into irrelevant areas. I use energy thresholds, simple consistency checks, and human judgment in the loop.
  • Threshold tuning: I iterated on per-latent scaling and percentile rules to keep overlays stable.
  • Label fidelity: Phrase mining can be noisy. I kept labels short, factual, and tied to top-activator galleries rather than over-promising.

  • Provable privacy: Judges want proof, so I run in offline mode, show “Local Mode: ON (0 API calls)”, and keep models small.

Accomplishments that we're proud of

  • End-to-end on device: Training + inference on CPU, no cloud, no GPUs.
  • 1:1 provenance: Every accepted finding is backed by a single-latent ablation mask.
  • Clinically aligned UX: Three flows that map to real tasks: peer review, assistive drafting, and QA.
  • Lightweight & fast: Small Conv-SAE with interactive overlays on commodity hardware.
  • Clear scope of novelty: First (to my knowledge) to combine an on-device Conv-SAE with unsupervised evidence maps for radiology localization.

What we learned

  • Interpretability drives adoption. Visual evidence beats a single probability score when clinicians need to trust a tool.
  • Unsupervised doesn't mean unreliable. With sparsity + careful ablations, evidence maps can be useful even without segmentation labels.
  • Human-in-the-loop matters. False positives are manageable when the UI lets clinicians quickly accept/ignore overlays.
  • Compute realism helps. Proving CPU-only viability makes deployment conversations (privacy, security, cost) much easier.
  • Metrics need to match the task. Energy/coverage and “no mask -> no claim” are better aligned to evidence-grounded assistance than generic accuracy.

What’s next for RadiSpect

  • CT extension (retrain, same method):
    Train on HU-windowed slices or short 2.5D stacks; aggregate per-slice masks (e.g., max-energy) to surface top frames. No architecture change required.
  • Packaging for hospitals: One-click local install; logs proving offline mode; admin knob for on-prem fine-tuning.
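The per-slice aggregation described for the CT extension can be as simple as ranking slices by evidence-mask energy, assuming per-slice masks are already computed (the function name and shapes here are illustrative):

```python
import numpy as np

def top_slices(masks: np.ndarray, k: int = 3) -> list[int]:
    """Rank CT slices by evidence-mask energy; masks has shape (n_slices, H, W)."""
    energy = masks.reshape(masks.shape[0], -1).sum(axis=1)  # E per slice
    return np.argsort(energy)[::-1][:k].tolist()            # highest energy first
```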

Built With

  • convolutional-sparse-autoencoder
  • python
  • streamlit
  • torch