Capsure

The user can draw rectangles on their reference photo to highlight the specific areas they want to align.
The user can also save their favourite compositions and reload them at any time.
User interface
Face and eye detection: tracks head orientation (pitch, yaw, roll) and detects whether the eyes are open, which is ideal for recreating portraits.

Inspiration

When travelling solo (or even with people we know), we often find the photos others take for us unsatisfactory. For example, your eyes may be closed when someone takes a photo of you, or their style of photography may simply differ from yours. It can also be difficult or awkward to explain to others exactly the photograph we want. Selfie sticks offer only a very limited perspective on the scenery. Editing the photo afterwards compromises the integrity of travel moments, which hold emotional value. Editing is also limited: it can crop but cannot add content that was never photographed, and it can only apply 2D transformations to the image.

This matters because photography is a central part of the travelling experience: photos remind us of good memories and carry strong emotional value.

This is especially relevant now: research (e.g. from Forbes) shows that solo travel has become increasingly popular, doubling in popularity between 2018 and 2023 alone, and the meteoric rise of social media (Instagram and TikTok in particular) has made photo quality increasingly important.

What it does

User = the person who is in the photo (and wants someone to take a photo of him/her)
Photographer = the person who is taking the photo

We built an app called ‘Capsure’ that lets the user give the photographer clear, on-phone instructions about their preferred position and orientation relative to the background. The user first takes a picture of the location he/she wants to be in, then draws a custom box on it to indicate where he/she wants to appear in the frame.

Using computer vision, the app offers adaptive guidance that helps the photographer take the photo exactly the way the user wants. It also issues warnings when certain preset undesirable features are detected (e.g. closed eyes).
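The writeup does not say how closed eyes are detected. One widely used heuristic is the eye aspect ratio (EAR): the ratio of the eyelid-to-eyelid distances to the corner-to-corner distance, which drops towards zero as the eye closes. The sketch below assumes six landmarks per eye have already been picked from the MediaPipe Face Mesh output (which landmark indices to use is left as an assumption), and the 0.2 threshold is a hypothetical starting value, not a figure from Capsure.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical threshold: below this the eye is treated as closed. Tune per setup.
EAR_CLOSED_THRESHOLD = 0.2


def eye_aspect_ratio(eye: Sequence[Point]) -> float:
    """Eye aspect ratio for six landmarks ordered p1..p6.

    p1 and p4 are the eye corners; (p2, p6) and (p3, p5) are the
    upper/lower eyelid pairs. The ratio drops towards 0 as the eye closes.
    """
    if len(eye) != 6:
        raise ValueError("expected exactly six eye landmarks")
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    if horizontal == 0:
        raise ValueError("eye corners coincide")
    return vertical / (2.0 * horizontal)


def eyes_closed(left: Sequence[Point], right: Sequence[Point],
                threshold: float = EAR_CLOSED_THRESHOLD) -> bool:
    """Return True (i.e. raise a warning) when the average EAR of both eyes is below the threshold."""
    avg = (eye_aspect_ratio(left) + eye_aspect_ratio(right)) / 2.0
    return avg < threshold
```

Averaging both eyes makes the check less sensitive to a single noisy landmark; a real app might also require the condition to hold for a few consecutive frames before warning.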

The app also lets the user save multiple locations and come back later to take the photo. This is especially useful at busy tourist sites where there is no immediate opportunity to take a photograph, and it lets the user remember and preserve their favourite unique perspective even after wandering away.
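Saving a location essentially means persisting the reference photo together with the box drawn on it. The sketch below shows one way to do that with a JSON file; the `Composition` schema and the function names are hypothetical and not Capsure's actual storage format.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List


@dataclass
class Composition:
    """One saved shot (hypothetical schema, not Capsure's actual format)."""
    name: str
    reference_image: str  # path to the stored reference photo
    box: List[int]        # target rectangle [x, y, width, height] in image pixels


def to_json(comps: List[Composition]) -> str:
    return json.dumps([asdict(c) for c in comps], indent=2)


def from_json(text: str) -> List[Composition]:
    return [Composition(**item) for item in json.loads(text)]


def save_compositions(path: Path, comps: List[Composition]) -> None:
    path.write_text(to_json(comps), encoding="utf-8")


def load_compositions(path: Path) -> List[Composition]:
    # A missing file simply means nothing has been saved yet.
    if not path.exists():
        return []
    return from_json(path.read_text(encoding="utf-8"))
```

Because the box is stored in reference-image pixels rather than screen pixels, a reloaded composition does not depend on the display size it was drawn at.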

How we built it

Frontend: a JavaScript-based web interface with real-time video streaming and an interactive canvas for drawing target regions.

Backend: a Python Flask server that handles all the heavy lifting:
OpenCV powers our computer vision pipeline: it detects features in images, matches them between the reference and live frames, and computes homography transformations to calculate precise camera movement instructions.
MediaPipe Face Mesh provides robust face angle estimation and eye tracking without requiring any external model files.
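The final step of that pipeline, turning a homography into movement instructions, can be sketched as follows. With OpenCV, the matrix would typically come from `cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)` (which returns the matrix and an inlier mask) applied to matched keypoints, for example from `cv2.ORB_create` and `cv2.BFMatcher`, and `cv2.perspectiveTransform` would project the box corners. To stay dependency-free, the sketch does the projection by hand. It assumes H maps reference-image pixels to live-frame pixels; the function name `guidance`, the tolerances, and the instruction wording are hypothetical rather than Capsure's actual rules.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def project_point(H: Matrix3, p: Point) -> Point:
    """Apply a 3x3 homography to one point (homogeneous divide included)."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def center_and_area(pts: Sequence[Point]) -> Tuple[float, float, float]:
    """Average of the corners plus polygon area via the shoelace formula."""
    n = len(pts)
    cx = sum(p[0] for p in pts) / n
    cy = sum(p[1] for p in pts) / n
    twice_area = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
                     for i in range(n))
    return cx, cy, abs(twice_area) / 2.0


def guidance(H: Matrix3, box: Tuple[float, float, float, float],
             tol_px: float = 20.0, tol_scale: float = 0.1) -> List[str]:
    """Hypothetical rules turning the live-frame position of the target box into hints.

    H maps reference-image pixels to live-frame pixels; box is (x, y, w, h)
    as drawn on the reference photo.
    """
    x, y, w, h = box
    target = [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
    current = [project_point(H, p) for p in target]
    tx, ty, t_area = center_and_area(target)
    cx, cy, c_area = center_and_area(current)

    hints = []
    dx, dy = cx - tx, cy - ty
    # Region appears further right than in the reference: turning right shifts it back left.
    if dx > tol_px:
        hints.append("pan right")
    elif dx < -tol_px:
        hints.append("pan left")
    # Region appears lower (image y grows downwards): tilting down shifts it back up.
    if dy > tol_px:
        hints.append("tilt down")
    elif dy < -tol_px:
        hints.append("tilt up")
    # Linear size of the region in the live view relative to the reference.
    scale = (c_area / t_area) ** 0.5
    if scale < 1 - tol_scale:
        hints.append("move closer")
    elif scale > 1 + tol_scale:
        hints.append("step back")
    return hints or ["hold still"]
```

Comparing the projected box with the drawn box, rather than comparing raw keypoints, means the instructions only care about the region the user marked, which is the part of the scene they asked to align.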

Challenges we ran into

Long computation time: We originally used a more accurate computer vision model to recognise key features in images; however, it took too long to compute and made the app lag. We therefore switched to a simpler but faster model so that the app could run at a normal frame rate.

The canvas overlay problem: Allowing users to draw rectangles on the reference image sounds simple, but positioning an interactive canvas perfectly over a responsive image is tricky. We ran into coordinate scaling issues where the drawn rectangles did not line up with where users clicked. The fix required careful math to translate mouse coordinates between the canvas's display size and the actual image dimensions.
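The coordinate translation described above is a per-axis offset and scale. The sketch expresses that math in Python; in the browser the display rectangle would come from `canvas.getBoundingClientRect()` and the natural size from `img.naturalWidth` / `img.naturalHeight`. It assumes the image fills the canvas exactly (no letterboxing from something like `object-fit: contain`), and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class DisplayRect:
    """Where the canvas is rendered on the page, in CSS pixels."""
    left: float
    top: float
    width: float
    height: float


def client_to_image(client_x: float, client_y: float, rect: DisplayRect,
                    natural_w: float, natural_h: float) -> Tuple[float, float]:
    """Map a mouse position (e.g. event.clientX/clientY) to original-image pixels."""
    x = (client_x - rect.left) * (natural_w / rect.width)
    y = (client_y - rect.top) * (natural_h / rect.height)
    # Clamp so a drag that leaves the canvas still yields a valid rectangle.
    return (min(max(x, 0.0), natural_w), min(max(y, 0.0), natural_h))


def image_to_client(img_x: float, img_y: float, rect: DisplayRect,
                    natural_w: float, natural_h: float) -> Tuple[float, float]:
    """Inverse mapping from original-image pixels back to client (page) coordinates."""
    return (img_x * rect.width / natural_w + rect.left,
            img_y * rect.height / natural_h + rect.top)
```

For example, a 1600x1200 photo displayed at 400x300 has a scale factor of 4 on both axes, so a click 200 px right of and 150 px below the canvas's top-left corner lands on image pixel (800, 600).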

Accomplishments that we're proud of

Real-time homography transformation that actually works

What we learned

Stability matters more than speed: optimising for speed can pay off in the short run, but for the app to run reliably over time, stability is more important.