Inspiration
Our inspiration comes from the desire to help people bring their imagination to life — transforming a flat, inanimate idea or 2D object into an interactive and engaging experience. As such, we wanted to create a bridge between the real world and animated, interactive objects that people of all ages could enjoy.
Two use cases we considered as inspiration for our product: 1) For children, we wanted to create an experience similar to popular games from our childhoods, such as Talking Tom, Angela, or Pou, but with the added excitement of bringing a character of their own design to life (e.g. an imaginary friend :o ?). This concept was inspired by the children's story "Harold and the Purple Crayon" by Crockett Johnson, in which Harold’s doodles come to life as he goes on adventures. In the same way, children can turn their imagination into an interactive model in their augmented view of the real world.
2) For the senior demographic, especially those at risk of dementia or Alzheimer’s, "social activities can prevent social isolation and loneliness, which are linked to higher risks for cognitive decline and Alzheimer’s disease." By offering an interactive companion that provides a sense of presence, we hope to support feelings of connection in this demographic.
What it does
LensLab turns inanimate 2D images or text in the user's surroundings into interactive 3D models. With a quick hand gesture, the user captures their point of view; LensLab then identifies a distinct object, character, or piece of text in the captured frame and generates, in real time, an interactive 3D model that can be poked, stretched, moved around, viewed from multiple angles, and more!
The experience is rounded out by a sleek UI that sits cleanly over the user's view, simple and intuitive UX, and sound and visual effects.
How we built it
Using the Snap Spectacles and Lens Studio, we developed a friendly user interface with interactive buttons and a live camera feed that the user can use as a reference when framing their image capture.
With this AR interface in place, a hand gesture triggers the camera capture: an image of the user's perspective is taken and passed through a data processing pipeline. In this pipeline, the image is parsed by the Gemini API, which we prompted to return a short description of the character shown in the image (e.g. "black cat, yellow eyes"). This description is then fed into the Snap3D object generator (Snap3DInteractableFactory), which generates a 3D model of the object in front of the user.
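To make the flow concrete, here is a minimal sketch of the pipeline's shape in TypeScript. The `CaptureService`, `GeminiClient`, and `Snap3DFactory` interfaces below are illustrative stand-ins, not the actual Lens Studio or Snap3D APIs; only the overall capture, describe, and generate flow reflects what we built.

```typescript
// Minimal sketch of the capture -> describe -> generate pipeline.
// The three interfaces are hypothetical stand-ins for the real services.

interface CaptureService {
  // Captures the current camera frame, e.g. as a base64-encoded image.
  captureFrame(): Promise<string>;
}

interface GeminiClient {
  // Sends the image and a prompt to Gemini and returns its text response.
  describeImage(imageBase64: string, prompt: string): Promise<string>;
}

interface Snap3DFactory {
  // Generates and places an interactable 3D model from a short text description.
  createInteractable(description: string): Promise<void>;
}

const DESCRIPTION_PROMPT =
  'Describe the main character or text in this image in a few words, ' +
  'e.g. "black cat, yellow eyes". Return only the description.';

// Runs when the capture hand gesture is detected.
async function onCaptureGesture(
  camera: CaptureService,
  gemini: GeminiClient,
  snap3d: Snap3DFactory
): Promise<void> {
  const frame = await camera.captureFrame(); // 1. capture the user's point of view
  const description = await gemini.describeImage(frame, DESCRIPTION_PROMPT); // 2. get a short description
  await snap3d.createInteractable(description.trim()); // 3. spawn the 3D model in front of the user
}
```

In the actual lens, the roles of these stand-ins are played by the Spectacles camera capture, our Gemini API request, and Snap3DInteractableFactory.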
To ensure a smooth and positive user experience, we also designed and implemented a UI/UX workflow and added sound and visual effects to our app.
Challenges we ran into
As this was the first time any of us had developed an AR app, the technology we used was relatively unfamiliar. Our main initial challenge was therefore getting accustomed to Lens Studio, the platform we used to build the product.
Since augmented reality is still relatively new and uncommon among regular consumers, Lens Studio itself is also relatively new to the market and has some limitations that we learned to work through or around as we experimented with building our product.
We also went through a lot of trial and error, drawing on the developer documentation, sample projects, and our own experiments, to develop our image capture and processing pipeline. It took a few iterations to set up each of the individual behaviours we wanted, such as capturing the user's real-time view and the basic pipeline of taking an image input and generating a prompt description from it.
Accomplishments that we're proud of
After refining our prompts and learning the prompting style preferred by Snap3D object generation, we landed on a pipeline that generates an accurate 3D model of whatever is captured in the user's line of sight, which the user can then interact with in real time. We're proud of how accurately the 3D object generator recreates the captured image, especially given that the data passes through multiple parsing steps before the 3D model is generated.
Regarding quality-of-life refinements, we also succeeded in keeping the image parser and prompt generator accurate even when the character or text in focus takes up only a small proportion of the frame (i.e. the user is far away from the object in question). The UI is another aspect we're proud of: AR-based UI design was new both to us and to most users, yet the workflow we designed is intuitive even for first-time users of our product.
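As a rough illustration of the prompting style (the wording below is an assumed reconstruction, not our exact production prompt), the key was to ask for a short, attribute-style description and to explicitly direct the model to the main subject even when it is small in the frame:

```typescript
// Illustrative prompt only; the exact wording is an assumption, not our production prompt.
const SUBJECT_DESCRIPTION_PROMPT = [
  'Identify the single most prominent drawn character, object, or piece of text in this image,',
  'even if it takes up only a small part of the frame.',
  'Respond with a short, comma-separated description of its key visual attributes,',
  'for example: "black cat, yellow eyes".',
  'Do not describe the background and do not return anything other than the description.',
].join(' ');
```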
What we learned
Developing with Lens Studio is similar to working in a game engine, and building for games and AR involves a slightly different ideation and feature-planning process than other kinds of products, such as web apps.
Additionally, we learned that new technology such as augmented reality lenses inherently lacks documentation and resources, simply because it has not existed long enough to mature. As such, a great deal of iteration, innovation, and creativity is required from developers to learn and work within the platform's current limitations.
Overall, this has been a great learning experience and an opportunity for all of us to work with leading modern technology. 🐈⬛
What's next for LensLab
- Sessions and auth
- More detailed 3D models
- Animated and more interactive models, beyond just resizing, translating, and rotating
- Generating entire scenes from text (book visualisation, movie from a book)
- Chatbot-style object instance for characters
- Voice for characters
- Croppable gesture
- Option for seamless integration into other lenses with a transparent overlay and environment-based loading
Built With
- lens-studio
- snap-ar
- snap-spectacles
- typescript

