✨ What is MediaSKINoscope?

MediaSKINoscope harnesses real‑time WebRTC streaming and Google’s MedSigLIP encoder to run vector‑based, zero‑shot analysis of skin conditions directly on live video feeds. It identifies and highlights dermatologic anomalies such as rashes, lesions, or unusual pigmentation, for both clinicians and content creators.


💡 Inspiration

Medical foundation models now enable a unified understanding of both medical images and clinical texts within a single embedding space. By harnessing lightweight, multimodal encoders trained on diverse medical datasets, we unlock new possibilities for on‑device, real‑time inference without sacrificing accuracy. This project investigates how these compact models can be paired with more powerful, agent‑driven orchestration layers to deliver seamless, scalable medical AI workflows.


🔍 What It Does

MedSigLIP Encoder
A lightweight, 400 M‑parameter dual‑tower model from Google HAI-DEF that brings medical images and clinical texts into a single embedding space, enabling robust zero‑shot classification and retrieval on edge devices. Trained on paired image–text data covering chest x‑rays, dermatology photos, histopathology slides, ophthalmologic images, and CT/MRI slices, MedSigLIP is designed for strong out‑of‑the‑box performance across modalities in a wide variety of clinical domains.
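Conceptually, zero‑shot classification in a shared embedding space reduces to cosine similarity between an image embedding and a set of text‑prompt embeddings. A minimal NumPy sketch, using random stand‑in vectors in place of real MedSigLIP tower outputs (the dimension, prompts, and embeddings here are illustrative):

```python
import numpy as np

def zero_shot_classify(image_emb, prompt_embs, labels):
    """Rank text labels by cosine similarity to an image embedding.

    image_emb:   (d,) vector from the image tower
    prompt_embs: (n, d) matrix, one row per text prompt
    labels:      list of n label strings
    """
    img = image_emb / np.linalg.norm(image_emb)
    txt = prompt_embs / np.linalg.norm(prompt_embs, axis=1, keepdims=True)
    scores = txt @ img                    # cosine similarity per prompt
    order = np.argsort(scores)[::-1]      # best match first
    return [(labels[i], float(scores[i])) for i in order]

# Stand-in embeddings; in practice both come from MedSigLIP's two towers.
rng = np.random.default_rng(0)
d = 8
prompts = ["a photo of healthy skin", "a photo of a skin rash"]
img_emb = rng.normal(size=d)
txt_embs = rng.normal(size=(len(prompts), d))
ranked = zero_shot_classify(img_emb, txt_embs, prompts)
```

Because the label set is just a list of prompts, adding a new condition means adding a sentence, not retraining, which is what makes the zero‑shot setup attractive for rare presentations.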

⚠️ The Problem

  • On‑Stream Expert Coverage: Telemedicine consults and social‑media streams have no way to detect and surface medical problems as they appear in real time.
  • Workflow Interruptions: Pausing a live feed to capture, upload, and analyze images is unnatural, disrupting conversation flow and engagement.
  • Rare‑Condition Blind Spots: Traditional models struggle to recognize less‑common skin issues on the fly without abundant labeled examples.

💡 Our Solution

Livestream Medical Alert Systems

  1. Live‑Stream Ingestion: Capture and buffer incoming video via WebRTC or native streaming APIs, breaking it into a continuous sequence of frames without interrupting the user experience.
  2. Embedding Extraction: Send each frame through MedSigLIP’s encoder to produce compact, medical‑grade visual embeddings that summarize the clinical appearance.
  3. Zero‑Shot Detection: Perform vector‑similarity search against pre‑computed “normal” and “abnormal” skin descriptors (or a library of disease‑state prompts) to instantly flag anomalies.
  4. Real‑Time Alerting: When similarity scores exceed clinician‑defined thresholds, trigger on‑screen notifications and log timestamps, then send to downstream agentic pipelines or expert review.
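Steps 3 and 4 boil down to a thresholded similarity check per frame. A minimal sketch, where the descriptor vectors, the 0.4 threshold, and the alert record format are illustrative placeholders rather than the project's actual values:

```python
import time
import numpy as np

ALERT_THRESHOLD = 0.4  # illustrative; clinicians would tune this per condition

def check_frame(frame_emb, condition_embs, condition_names,
                threshold=ALERT_THRESHOLD):
    """Return alert records for conditions whose similarity exceeds threshold."""
    frame = frame_emb / np.linalg.norm(frame_emb)
    conds = condition_embs / np.linalg.norm(condition_embs, axis=1, keepdims=True)
    scores = conds @ frame
    alerts = []
    for name, score in zip(condition_names, scores):
        if score > threshold:
            alerts.append({
                "condition": name,
                "score": float(score),
                "timestamp": time.time(),  # logged for downstream review
            })
    return alerts

# Toy example: a frame embedding closely aligned with the "rash" descriptor.
rash = np.array([1.0, 0.0, 0.0])
normal = np.array([0.0, 1.0, 0.0])
frame = np.array([0.9, 0.1, 0.0])
alerts = check_frame(frame, np.stack([normal, rash]), ["normal skin", "rash"])
```

Each alert record carries the timestamp needed for the on‑screen notification and for hand‑off to downstream agentic pipelines or expert review.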

🚀 Features

  • Zero‑Shot Flexibility: Broad clinical coverage out-of-the-box, with optional domain-specific fine‑tuning.
  • Live‑Stream Speed: Analyze every frame in under 100 ms to keep pace with real‑time video.
  • Human‑Friendly Insights: Produce concise, natural‑language explanations linked to medical ontologies.
  • Privacy‑Preserving Inference: Optional on‑device processing to keep patient data local and secure.
  • Adjustable Sensitivity: Enable clinicians to tweak alert thresholds per condition for optimal precision and recall.

🛠 Tech Stack

  • Live Streaming: FastRTC for WebRTC‑based video capture and transport, with built‑in Gradio UI for seamless demos.
  • Embeddings: MedSigLIP encoder model transforms each frame into a unified medical embedding for zero‑shot tasks.
  • LLM Inference: LlamaIndex orchestrates the Retrieval‑Augmented Generation (RAG) workflow and downstream medical data analysis.
  • TTS Orchestration: VAPI handles text‑to‑speech generation, voicing LlamaIndex outputs as spoken alerts.
  • Frontend: Gradio powers a lightweight dashboard and alert overlay for instant deployment without custom UI code.
  • Backend: FastAPI with Uvicorn manages the async pipeline (frame ingestion → MedSigLIP → LlamaIndex → VAPI TTS) for high‑throughput, event‑driven processing.
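The event‑driven backend pipeline can be sketched with asyncio producer/consumer queues. The encode/detect stage below is a stub standing in for the MedSigLIP, LlamaIndex, and VAPI calls, and the frame dicts are hypothetical:

```python
import asyncio

async def ingest(frames, queue):
    """Stage 1: push incoming frames onto the pipeline (stub for WebRTC capture)."""
    for frame in frames:
        await queue.put(frame)
    await queue.put(None)  # sentinel: stream ended

async def process(queue, alerts):
    """Stages 2-4: encode each frame, run detection, and collect alerts."""
    while True:
        frame = await queue.get()
        if frame is None:
            break
        # Stub: a real pipeline would run the MedSigLIP encoder and vector
        # search here, then hand flagged frames to LlamaIndex and VAPI TTS.
        if frame.get("suspicious"):
            alerts.append({"frame_id": frame["id"], "status": "flagged"})

async def run_pipeline(frames):
    queue = asyncio.Queue(maxsize=8)  # bounded buffer applies backpressure
    alerts = []
    await asyncio.gather(ingest(frames, queue), process(queue, alerts))
    return alerts

frames = [{"id": 0, "suspicious": False}, {"id": 1, "suspicious": True}]
result = asyncio.run(run_pipeline(frames))
```

The bounded queue keeps ingestion from outrunning inference: if the encoder falls behind, capture blocks instead of accumulating stale frames, which matters for staying under a real‑time latency budget.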

🧠 What We Learned

  • WebRTC Power: Leveraging WebRTC and frameworks like FastRTC simplifies real‑time video capture and transport, enabling low‑latency inference on live streams.
  • Optimized Pipelines Matter: Lightweight model orchestration and efficient frame handling are essential for consistent, sub‑100 ms inference speeds.
  • Explainability Boosts Trust: Generating concise natural‑language summaries tied to clinical ontologies significantly improves user confidence and interpretability.

🔜 What’s Next for MediaSKINoscope

MediaSKINoscope’s lightweight, on‑device embedding and vector‑search core can be repurposed for other live‑stream medical co‑pilot scenarios. By running at the network edge, it minimizes latency and provides situational awareness when it matters. Potential extensions include:

  • Tele‑Ultrasound: Highlight fluid collections or organ margins in real time.
  • Endoscopy & Colonoscopy: Flag inflamed mucosa or colorectal polyps mid‑procedure without pausing.
  • Otoscopy: Classify tympanic membrane pathologies on portable or smartphone devices on the fly.
  • Minimally‑Invasive Surgery: Detect critical anatomy or perfusion anomalies during laparoscopic procedures.
  • Robotic Surgery Oversight: Surface live visual cues such as tissue tension or bleeding risk to surgeons operating robotic arms, without disrupting the workflow.
