Inspiration

Clinical decision-making is often constrained by one basic problem: physicians rarely see how a patient is actually functioning between visits.

In practice, they depend heavily on retrospective self-report, caregiver recollection, and brief in-clinic snapshots. That creates a fidelity gap. Patients may not accurately remember how often they were disengaged, distracted, inactive, or behaviorally different over the past several days or weeks, and subtle changes can be hard to describe even when they are clinically meaningful.

We were motivated by that gap both personally and professionally. Through our own experience, as well as conversations with physicians and researchers, we kept hearing the same theme: one of the hardest problems in healthcare is obtaining reliable, longitudinal, real-world behavioral data without increasing burden on the patient or clinician.

MedSight was built around a simple idea: if we can passively capture first-person context outside the clinic, then we can transform everyday behavior into structured clinical signals.

Instead of relying only on memory or fragmented notes, physicians can review objective trend data and summarized behavioral patterns over time. The goal is not diagnosis. The goal is better observational intelligence: earlier visibility into change, more grounded follow-up questions, and more informed clinical judgment.


What it does

MedSight is an AI-powered clinical observation platform that uses smart glasses to passively monitor patient behavior in the real world.

The system captures periodic first-person images, extracts structured behavioral observations, and analyzes those observations over time to generate a concise clinical report.

A physician begins by specifying, in natural language, what they want monitored. For example, they may want to watch for declining activity, reduced engagement, or increased distraction. That prompt becomes the clinical context for the session and conditions how incoming observations are interpreted.

MedSight converts raw visual input into structured behavioral telemetry such as:

  • activity and activity score
  • engagement level and engagement score
  • distraction presence and distraction score
  • detected environmental objects
  • model confidence and rationale

These signals are aggregated across multiple time horizons and compared against a personalized baseline.
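One such observation can be pictured as a small structured record. The sketch below is illustrative only, assuming normalized scores in [0, 1]; the field names are hypothetical, not MedSight's actual schema:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class BehavioralObservation:
    """One structured observation extracted from a single frame.
    Field names are illustrative; scores are assumed normalized to [0, 1]."""
    timestamp: datetime
    activity: str             # e.g. "reading", "walking"
    activity_score: float
    engagement_level: str     # e.g. "high", "low"
    engagement_score: float
    distraction_present: bool
    distraction_score: float
    detected_objects: list[str] = field(default_factory=list)
    confidence: float = 0.0
    rationale: str = ""

obs = BehavioralObservation(
    timestamp=datetime(2024, 5, 1, 9, 30),
    activity="reading",
    activity_score=0.7,
    engagement_level="high",
    engagement_score=0.8,
    distraction_present=False,
    distraction_score=0.1,
    detected_objects=["book", "table"],
    confidence=0.85,
    rationale="Patient appears focused on reading material.",
)
```

Keeping each frame's output in a flat, typed record like this is what makes the later aggregation and baseline comparison straightforward.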

The system outputs both quantitative and qualitative insights:

  • metric shifts
  • trend direction
  • narrative interpretation
  • key findings
  • recommended follow-up actions

The result is a physician-facing report that provides longitudinal visibility into how a patient is functioning outside the exam room.


How we built it

We built MedSight as a full-stack, multi-layer clinical analytics pipeline.

Data Capture

Smart glasses periodically capture first-person images and send them to the backend. Each frame is tied to a session containing the physician’s prompt, metadata, and patient context.

AI Observation Extraction

A vision pipeline uses an LLM with structured output constraints to convert images into machine-readable behavioral observations (activity, engagement, distraction, confidence, rationale).
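The key to this step is rejecting any model response that does not match the expected structure before it enters the pipeline. A minimal validation sketch, with the actual LLM call abstracted away and hypothetical field names:

```python
import json

# Fields the model is constrained to emit; names are illustrative.
REQUIRED_FIELDS = {
    "activity": str, "activity_score": float,
    "engagement_level": str, "engagement_score": float,
    "distraction_present": bool, "distraction_score": float,
    "confidence": float, "rationale": str,
}

def parse_observation(raw: str) -> dict:
    """Parse one raw LLM response into a validated observation dict.
    Raises ValueError on missing fields or mistyped values."""
    data = json.loads(raw)
    for name, typ in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing required field: {name}")
        value = data[name]
        if typ is float and isinstance(value, int) and not isinstance(value, bool):
            data[name] = float(value)  # tolerate integer-valued scores
        elif not isinstance(value, typ):
            raise ValueError(f"wrong type for {name}: {type(value).__name__}")
    return data

obs = parse_observation(
    '{"activity": "reading", "activity_score": 0.7,'
    ' "engagement_level": "high", "engagement_score": 0.8,'
    ' "distraction_present": false, "distraction_score": 0.1,'
    ' "confidence": 1, "rationale": "focused on book"}'
)
```

Failing fast here means downstream aggregation only ever sees well-formed records.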

Time-Based Modeling

We organize data into three layers:

  • Primary: frame-level observations
  • Secondary: minute-level aggregates
  • Tertiary: hour-level aggregates

This enables progression from moment-level insight → short-term patterns → longitudinal trends.
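The roll-up from one layer to the next can be sketched as a simple time-bucketed aggregation (a minimal illustration, assuming timestamps in seconds since session start; the same pattern rolls minutes into hours for the tertiary layer):

```python
from collections import defaultdict
from statistics import mean

def aggregate_by_minute(observations):
    """Roll frame-level observations (primary layer) into minute-level
    aggregates (secondary layer). Each observation is a dict with a
    'timestamp' in seconds since session start plus numeric scores."""
    buckets = defaultdict(list)
    for obs in observations:
        buckets[int(obs["timestamp"] // 60)].append(obs)
    return {
        minute: {
            "mean_engagement": mean(o["engagement_score"] for o in frames),
            "mean_distraction": mean(o["distraction_score"] for o in frames),
            "frame_count": len(frames),
        }
        for minute, frames in sorted(buckets.items())
    }

frames = [
    {"timestamp": 10, "engagement_score": 0.8, "distraction_score": 0.1},
    {"timestamp": 40, "engagement_score": 0.6, "distraction_score": 0.3},
    {"timestamp": 70, "engagement_score": 0.4, "distraction_score": 0.5},
]
minutes = aggregate_by_minute(frames)
# minute 0 averages the first two frames; minute 1 holds the third
```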

Analytics + Reporting

The system computes baseline-relative changes and generates a structured clinical report including:

  • status headline
  • risk level
  • quantitative snapshot
  • qualitative interpretation
  • key findings
  • recommended actions
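The baseline-relative comparison behind these fields reduces to computing each metric's shift against the patient's personalized baseline and labeling the trend. A minimal sketch; the 10% stability threshold is illustrative, not a clinical constant:

```python
def baseline_shift(current: float, baseline: float, threshold: float = 0.10):
    """Compare a current metric value to the patient's personalized
    baseline. Returns (relative_change, trend_label); the threshold
    defining 'stable' is an illustrative placeholder."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    change = (current - baseline) / baseline
    if change > threshold:
        trend = "increasing"
    elif change < -threshold:
        trend = "decreasing"
    else:
        trend = "stable"
    return change, trend

# Engagement at 0.45 against a baseline of 0.60 is 25% below baseline.
change, trend = baseline_shift(current=0.45, baseline=0.60)
```

Per-metric outputs like these feed the quantitative snapshot directly, while the trend labels ground the narrative interpretation.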

We also built:

  • a physician web app for report viewing
  • a device/mobile layer for session and glasses control
  • Firebase-backed storage
  • backend orchestration for ingestion, analytics, and reporting

Challenges we ran into

  • Metric determination: Deciding what the system should output (i.e., quantitative scores vs. qualitative descriptions); we weighed the trade-offs of each and ultimately chose to surface both
  • Prompt → structure translation: Converting flexible physician language into stable, structured monitoring logic
  • Signal reliability: Working with sparse image snapshots instead of continuous data
  • Aggregation design: Choosing appropriate time windows and baselines
  • UI abstraction: Designing a report that is concise, structured, and clinically usable

Accomplishments that we're proud of

  • Built a complete end-to-end system from wearable capture → clinical report
  • Designed a multi-layer data architecture (primary → secondary → tertiary)
  • Created a physician-first interface focused on decision support

What we learned

  • Raw data is not the product—insight is the product
  • Multi-layer reasoning significantly improves system credibility
  • Healthcare UX requires clarity and restraint, not feature overload
  • Structured outputs are critical for reliable aggregation and analysis
  • Product framing matters—positioning as observational support increased realism and trust

What's next for MedSight

  • Continue work on compliance, privacy, and security
  • Move from descriptive → predictive analytics
  • Improve explainability (trace contributing observations and trends)
  • Build deeper personalization based on patient-specific baselines
  • Add physician feedback loops to refine monitoring and interpretation
  • Validate on real-world datasets and clinical workflows
  • Integrate with remote patient monitoring systems and EHRs

Long-term vision:
Make passive, real-world behavioral observation a standard input into clinical care.
