Inspiration

Clinical decision-making is often constrained by one basic problem: physicians rarely see how a patient is actually functioning between visits.

In practice, they depend heavily on retrospective self-report, caregiver recollection, and brief in-clinic snapshots. That creates a fidelity gap. Patients may not accurately remember how often they were disengaged, distracted, inactive, or behaviorally different over the past several days or weeks, and subtle changes can be hard to describe even when they are clinically meaningful.

We were motivated by that gap both personally and professionally. Through our own experience, as well as conversations with physicians and researchers, we kept hearing the same theme: one of the hardest problems in healthcare is obtaining reliable, longitudinal, real-world behavioral data without increasing burden on the patient or clinician.

MedSight was built around a simple idea: if we can passively capture first-person context outside the clinic, then we can transform everyday behavior into structured clinical signals.

Instead of relying only on memory or fragmented notes, physicians can review objective trend data and summarized behavioral patterns over time. The goal is not diagnosis. The goal is better observational intelligence: earlier visibility into change, more grounded follow-up questions, and more informed clinical judgment.


What it does

MedSight is an AI-powered clinical observation platform that uses smart glasses to passively monitor patient behavior in the real world.

The system captures periodic first-person images, extracts structured behavioral observations, and analyzes those observations over time to generate a concise clinical report.

A physician begins by specifying, in natural language, what they want monitored. For example, they may want to watch for declining activity, reduced engagement, or increased distraction. That prompt becomes the clinical context for the session and conditions how incoming observations are interpreted.

MedSight converts raw visual input into structured behavioral telemetry such as:

  • activity and activity score
  • engagement level and engagement score
  • distraction presence and distraction score
  • detected environmental objects
  • model confidence and rationale

These signals are aggregated across multiple time horizons and compared against a personalized baseline.
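One such observation can be pictured as a small structured record. The sketch below is illustrative only, assuming normalized scores in [0, 1]; the field names are hypothetical, not MedSight's actual schema:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class BehavioralObservation:
    """One structured observation extracted from a single frame.
    Field names are illustrative; scores are assumed normalized to [0, 1]."""
    timestamp: datetime
    activity: str             # e.g. "reading", "walking"
    activity_score: float
    engagement_level: str     # e.g. "high", "low"
    engagement_score: float
    distraction_present: bool
    distraction_score: float
    detected_objects: list[str] = field(default_factory=list)
    confidence: float = 0.0
    rationale: str = ""

obs = BehavioralObservation(
    timestamp=datetime(2024, 5, 1, 9, 30),
    activity="reading",
    activity_score=0.7,
    engagement_level="high",
    engagement_score=0.8,
    distraction_present=False,
    distraction_score=0.1,
    detected_objects=["book", "table"],
    confidence=0.85,
    rationale="Patient appears focused on reading material.",
)
```

Keeping each frame's output in a flat, typed record like this is what makes the later aggregation and baseline comparison straightforward.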

The system outputs both quantitative and qualitative insights:

  • metric shifts
  • trend direction
  • narrative interpretation
  • key findings
  • recommended follow-up actions

The result is a physician-facing report that provides longitudinal visibility into how a patient is functioning outside the exam room.


How we built it

We built MedSight as a full-stack, multi-layer clinical analytics pipeline.

Data Capture

Smart glasses periodically capture first-person images and send them to the backend. Each frame is tied to a session containing the physician’s prompt, metadata, and patient context.

AI Observation Extraction

A vision pipeline uses an LLM with structured output constraints to convert images into machine-readable behavioral observations (activity, engagement, distraction, confidence, rationale).
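The key to this step is rejecting any model response that does not match the expected structure before it enters the pipeline. A minimal validation sketch, with the actual LLM call abstracted away and hypothetical field names:

```python
import json

# Fields the model is constrained to emit; names are illustrative.
REQUIRED_FIELDS = {
    "activity": str, "activity_score": float,
    "engagement_level": str, "engagement_score": float,
    "distraction_present": bool, "distraction_score": float,
    "confidence": float, "rationale": str,
}

def parse_observation(raw: str) -> dict:
    """Parse one raw LLM response into a validated observation dict.
    Raises ValueError on missing fields or mistyped values."""
    data = json.loads(raw)
    for name, typ in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing required field: {name}")
        value = data[name]
        if typ is float and isinstance(value, int) and not isinstance(value, bool):
            data[name] = float(value)  # tolerate integer-valued scores
        elif not isinstance(value, typ):
            raise ValueError(f"wrong type for {name}: {type(value).__name__}")
    return data

obs = parse_observation(
    '{"activity": "reading", "activity_score": 0.7,'
    ' "engagement_level": "high", "engagement_score": 0.8,'
    ' "distraction_present": false, "distraction_score": 0.1,'
    ' "confidence": 1, "rationale": "focused on book"}'
)
```

Failing fast here means downstream aggregation only ever sees well-formed records.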

Time-Based Modeling

We organize data into three layers:

  • Primary: frame-level observations
  • Secondary: minute-level aggregates
  • Tertiary: hour-level aggregates

This enables progression from moment-level insight → short-term patterns → longitudinal trends.
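The roll-up from one layer to the next can be sketched as a simple time-bucketed aggregation (a minimal illustration, assuming timestamps in seconds since session start; the same pattern rolls minutes into hours for the tertiary layer):

```python
from collections import defaultdict
from statistics import mean

def aggregate_by_minute(observations):
    """Roll frame-level observations (primary layer) into minute-level
    aggregates (secondary layer). Each observation is a dict with a
    'timestamp' in seconds since session start plus numeric scores."""
    buckets = defaultdict(list)
    for obs in observations:
        buckets[int(obs["timestamp"] // 60)].append(obs)
    return {
        minute: {
            "mean_engagement": mean(o["engagement_score"] for o in frames),
            "mean_distraction": mean(o["distraction_score"] for o in frames),
            "frame_count": len(frames),
        }
        for minute, frames in sorted(buckets.items())
    }

frames = [
    {"timestamp": 10, "engagement_score": 0.8, "distraction_score": 0.1},
    {"timestamp": 40, "engagement_score": 0.6, "distraction_score": 0.3},
    {"timestamp": 70, "engagement_score": 0.4, "distraction_score": 0.5},
]
minutes = aggregate_by_minute(frames)
# minute 0 averages the first two frames; minute 1 holds the third
```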

Analytics + Reporting

The system computes baseline-relative changes and generates a structured clinical report including:

  • status headline
  • risk level
  • quantitative snapshot
  • qualitative interpretation
  • key findings
  • recommended actions
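The baseline-relative comparison behind these fields reduces to computing each metric's shift against the patient's personalized baseline and labeling the trend. A minimal sketch; the 10% stability threshold is illustrative, not a clinical constant:

```python
def baseline_shift(current: float, baseline: float, threshold: float = 0.10):
    """Compare a current metric value to the patient's personalized
    baseline. Returns (relative_change, trend_label); the threshold
    defining 'stable' is an illustrative placeholder."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    change = (current - baseline) / baseline
    if change > threshold:
        trend = "increasing"
    elif change < -threshold:
        trend = "decreasing"
    else:
        trend = "stable"
    return change, trend

# Engagement at 0.45 against a baseline of 0.60 is 25% below baseline.
change, trend = baseline_shift(current=0.45, baseline=0.60)
```

Per-metric outputs like these feed the quantitative snapshot directly, while the trend labels ground the narrative interpretation.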

We also built:

  • a physician web app for report viewing
  • a device/mobile layer for session and glasses control
  • Firebase-backed storage
  • backend orchestration for ingestion, analytics, and reporting

Challenges we ran into

  • Metric determination: Deciding what the system should output (i.e., quantitative scores vs. qualitative descriptions); we weighed the trade-offs of each and ultimately chose to surface both
  • Prompt → structure translation: Converting flexible physician language into stable, structured monitoring logic
  • Signal reliability: Working with sparse image snapshots instead of continuous data
  • Aggregation design: Choosing appropriate time windows and baselines
  • UI abstraction: Designing a report that is concise, structured, and clinically usable

Accomplishments that we're proud of

  • Built a complete end-to-end system from wearable capture → clinical report
  • Designed a multi-layer data architecture (primary → secondary → tertiary)
  • Created a physician-first interface focused on decision support

What we learned

  • Raw data is not the product—insight is the product
  • Multi-layer reasoning significantly improves system credibility
  • Healthcare UX requires clarity and restraint, not feature overload
  • Structured outputs are critical for reliable aggregation and analysis
  • Product framing matters—positioning as observational support increased realism and trust

What's next for MedSight

  • Continue work on compliance, privacy, and security
  • Move from descriptive → predictive analytics
  • Improve explainability (trace contributing observations and trends)
  • Build deeper personalization based on patient-specific baselines
  • Add physician feedback loops to refine monitoring and interpretation
  • Validate on real-world datasets and clinical workflows
  • Integrate with remote patient monitoring systems and EHRs

Long-term vision:
Make passive, real-world behavioral observation a standard input into clinical care.
