Inspiration

I was inspired by the potential of computer vision and agentic swarms to assist doctors in diagnosing skin conditions more accurately and efficiently. By leveraging the HAM10000 dataset, I wanted to build an app that can analyze skin images and provide actionable insights, helping medical professionals catch conditions early and improve patient care.

What it does

In its ideal form, this app is meant to be integrated with Apple Vision Pro and WebSpatial; for the purposes of the hackathon, it uses the webcam instead. The app detects skin abnormalities, suggests possible diagnoses with confidence intervals, displays similar cases, and performs deep research with Claude for an in-depth explanation of why the system suggested that diagnosis.

How we built it

First, the dataset is loaded into the server directory and embedded: each image, together with its metadata, is converted into a high-dimensional vector that captures its semantic meaning, using a HuggingFace vision model (CLIP). The data is chunked (split into smaller pieces for fast retrieval) and stored in Pinecone. I used RAG (Retrieval Augmented Generation), which abstracts away the math (cosine similarity) used to retrieve similar images and their metadata: two vectors with a high similarity score are considered similar. After the user captures a frame in the client, the image encoding is sent to an Express REST endpoint, where it is decoded from binary and compared against all vectors stored in the Pinecone index. For the Anthropic integration, I built a multi-agent research swarm with the Anthropic SDK and prompt engineering to orchestrate deep research into the suggested diagnosis.
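The similarity search described above can be sketched as a cosine-similarity ranking. This is a minimal illustration of what the vector index does under the hood, not the actual Pinecone implementation; the function names, the `IndexedCase` shape, and the metadata fields are hypothetical:

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). A score near 1 means the two
// embeddings point in nearly the same direction, i.e. the images are similar.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface IndexedCase {
  id: string;                       // e.g. a HAM10000 image id (hypothetical shape)
  vector: number[];                 // CLIP embedding of the image
  metadata: { diagnosis: string };  // label carried alongside the vector
}

// Rank every stored case against the query embedding and return the top k.
// A real vector database uses approximate nearest-neighbor search instead
// of this brute-force scan, but the ranking criterion is the same.
function topKSimilar(query: number[], index: IndexedCase[], k: number): IndexedCase[] {
  return [...index]
    .sort((x, y) => cosineSimilarity(query, y.vector) - cosineSimilarity(query, x.vector))
    .slice(0, k);
}
```

The top-k cases returned this way supply both the "similar cases" display and the metadata (diagnosis labels) that ground the confidence estimate.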

Challenges we ran into

Some challenges I ran into were figuring out how to structure the server directory to accommodate over 3GB of medical data, creating an immersive, futuristic-feeling frontend, and integrating Claude into the backend endpoints.

Accomplishments that we're proud of

I'm proud of getting the chance to learn more about computer vision and of developing this system end-to-end. I am also proud of figuring out how to embed a large dataset and import it into a vector database index.

What we learned

Through developing DermaVisionXR, I learned how to architect and orchestrate multi-agent AI systems using Claude to create specialized medical consultants that work in parallel. I also gained experience integrating technologies like vector databases (Pinecone) and vision transformers (CLIP) into a web application that bridges agentic AI and computer vision and applies them to the medical sector.

What's next for DermaVisionXR

I want to utilize Apple Vision Pro's SDK with WebSpatial, or MentraOS's SDK, so that dermatologists no longer have to rely on their computer's webcam to assist with diagnosis. WebSpatial will allow me to create immersive 3D experiences for dermatologists, and Mentra will provide the hardware and SDKs to develop this app cross-platform (iOS and Mentra). Additionally, I would like to expand the dataset beyond dermatology to other conditions.

Built With

  • anthropic-api
  • anthropic-claude-sonnet-4
  • apple-vision-pro
  • canvas-api
  • clip-vision-transformer
  • cors
  • css-animations
  • css-grid
  • css-in-js
  • dotenv
  • eslint
  • express.js
  • fetch-api
  • flexbox
  • getusermedia-api
  • glass-morphism-effects
  • ham10000-dataset
  • javascript
  • jsx/tsx
  • multer
  • multi-agent-ai-system
  • node.js
  • nodemon
  • npm
  • pinecone
  • pinecone-cloud
  • react-18
  • restful-api
  • sharp
  • single-page-application
  • swift
  • typescript
  • typescript-compiler
  • visionos-2.0+
  • vite
  • webspatial/react
  • xcode
  • xenova/transformers