Inspiration
Holoray, a medical education company, needed a lightweight system to superimpose annotations onto live footage of medical procedures. These medical procedures tend to feature a lot of moving objects. As such, the annotations must stay in place as objects within the video feed move. In addition, should the camera pan away from the annotations and pan back to them at a later point, they should move offscreen with the camera and come back onscreen when the camera returns to them. The service must be able to accomplish this using only the raw video footage. Finally, medical professionals should not be expected to change their habits in order to make the most of this service – it should just work.
How we built it
The backend of our project uses a Python FastAPI server with three key endpoints. The first receives the live camera feed, either from a webcam or from a medical imaging machine’s output, via the WebSocket protocol. The second receives annotations encoded as JSON over HTTP. The third streams the camera footage back out, again over WebSockets, with the annotations superimposed on it. The server dynamically recalculates the position of these annotations based on anchor points it identifies throughout the video.
The frontend is implemented in React. It allows the user to both subscribe to the live camera feed provided by the backend and send annotations back to the backend to be layered on top of the camera feed.
Challenges we ran into
We encountered two primary challenges. The first was understanding how to leverage OpenCV, a computer vision library, for this particular use case, while the second was allowing users to accurately make annotations on live video feed.
Regarding the first challenge, we learned how to leverage anchor points in OpenCV to pinpoint key locations in the video feed. The backend then associates annotations with these key locations and constantly recalculates the annotations' positions relative to them.
We believe that precision is key when communicating medical ideas. As such, our second challenge was allowing users to make accurate edits in real time. When a user begins making an annotation, the video feed is paused so they can sketch precisely. However, when the user finishes the annotation, it must be associated with the correct video frame to preserve that accuracy. We achieved this by implementing a sliding-window cache: when a user begins annotating a frame, the system can later retrieve that frame and its associated anchor points from the cache.
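The sliding-window cache can be sketched as a bounded, insertion-ordered map from frame IDs to frames and their anchor points; the class and field names below are illustrative assumptions, not the project's actual code:

```python
from collections import OrderedDict

class FrameCache:
    """Keeps the most recent `window` frames so that an annotation
    submitted after a pause can be matched to the frame it was drawn on."""

    def __init__(self, window=120):
        self.window = window          # how many recent frames to retain
        self._frames = OrderedDict()  # frame_id -> (frame, anchor_points)

    def put(self, frame_id, frame, anchor_points):
        self._frames[frame_id] = (frame, anchor_points)
        while len(self._frames) > self.window:
            self._frames.popitem(last=False)  # evict the oldest frame

    def get(self, frame_id):
        # Returns (frame, anchor_points), or None if the frame has
        # already slid out of the window.
        return self._frames.get(frame_id)
```

For example, with a window of 3, inserting frames 0 through 4 evicts frames 0 and 1, while frames 2 through 4 remain retrievable.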
Accomplishments that we're proud of
This was our first time working with OpenCV and building a computer-vision project, and a major accomplishment was moving beyond tutorials to apply it to real-world problems. We learned how to load and process frames, experiment with feature detection, and reason about what the computer actually “sees” in each image. Another milestone was bringing artificial intelligence and computer vision, two new domains for us, into a medical context, which forced us to think carefully about accuracy, reliability, and how clinicians might interact with the output. We also worked through the full pipeline of how frames are sent between the frontend, backend, and a simulated webcam, which meant handling timing, synchronization, and data flow across components. Overall, the project pushed us to connect theory with practice and gave us the confidence to pursue further projects in this domain.
What's next for TissueTrackr
The next step for TissueTrackr is integration into Holoray’s HoloXR remote learning technology. The simplicity of this system should make that process relatively straightforward.