Inspiration:

In a world saturated with intrusive advertising, we want to transform ads from annoying interruptions into delightful, engaging experiences. Research shows that consumers respond significantly better to positive, emotionally appealing content than to neutral or information-heavy ads.

That’s why we created slynk, an augmented reality (AR) app that’s bringing sci‑fi to life, revolutionizing the way we interact with ads and shop. By crafting immersive, interactive experiences with novel technology, we’re creating ads that resonate with people long after the interaction ends. With slynk, scanning an ad transforms it into an immersive, personalized experience. The celebrity or spokesperson comes to life in real‑time, walking you through the product’s features, answering questions, and showing you exactly how it would look or work in your own life. It’s not just another static image or video—slynk lets you engage and interact with the products you want to see, right in front of you.

For consumers, this means shopping that’s more than just a transaction—it’s an interactive, engaging experience. You can visualize products in your space, ask questions, and get real‑time, personalized information that helps you make better decisions faster. No more scrolling endlessly through product pages; with slynk, you have everything you need to make informed, confident choices—all while saving time.

For businesses, big or small, slynk is a game‑changer. It doesn’t matter if you're a startup with limited resources or a large corporation with a vast product catalog. By boosting engagement on product pages, where interactive AR content has been linked to conversion-rate lifts of up to 94% (market.us), slynk helps businesses connect with consumers in a more meaningful way. It gives small businesses the tools they need to create personalized, memorable experiences that compete with the biggest names in the industry. Larger businesses, on the other hand, can leverage slynk’s analytics to refine their marketing strategies and deliver the kind of targeted, immersive advertising that drives results.

What it does:

slynk brings a whole new level of interactivity to ads. When you come across an ad that catches your attention, simply open the slynk app and point your phone at the ad. Instantly, the celebrity or spokesperson will appear on your screen, lip-syncing an AI-generated script that showcases the product in a way that aligns with your personal preferences. You can then interact with the product itself by resizing it—making it larger or smaller—allowing you to get a closer look and explore the product in more detail. If you like what you see, you can easily add it to your "Liked Items" list. From there, you can revisit your liked products, click through to their websites, and make a purchase directly. This creates a seamless, personalized shopping experience that allows you to interact with ads like never before.

How we built it:

AR: Xcode

We used the ARKit, RealityKit, UIKit, Vision, and AVFoundation frameworks in Xcode.

  • ARKit handles the core augmented reality functionality by providing motion tracking and scene understanding for placing virtual content in the real world.
  • RealityKit builds upon ARKit to deliver high-fidelity 3D rendering, physics simulation, and advanced visual effects for our AR experiences.
  • UIKit manages the app’s user interface components and provides the basic application development environment.
  • AVFoundation handles all audio-visual media processing including video playback, audio recording, and media file management for our ad content.

Together, these frameworks create a comprehensive stack that enables us to build an immersive AR advertising platform with sophisticated UI, realistic 3D content, and seamless media handling capabilities.
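
To make that concrete, here is a simplified sketch of how the pieces fit together: ARKit detects a reference image of the ad, and RealityKit anchors a video-textured plane on top of it, with AVFoundation driving playback. The class name, the "AdMarkers" resource group, and the video URL are illustrative placeholders rather than our production code.

```swift
import ARKit
import RealityKit
import AVFoundation

// Illustrative sketch: detect a known ad image and overlay a video plane on it.
// "AdMarkers" is a placeholder AR Resource Group in the asset catalog.
final class AdARView: ARView, ARSessionDelegate {

    private let player = AVPlayer()

    func startTracking() {
        let config = ARImageTrackingConfiguration()
        config.trackingImages = ARReferenceImage.referenceImages(
            inGroupNamed: "AdMarkers", bundle: nil) ?? []
        config.maximumNumberOfTrackedImages = 1
        session.delegate = self
        session.run(config)
    }

    // ARKit reports the detected ad as an ARImageAnchor; RealityKit renders on top of it.
    func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
        for case let imageAnchor as ARImageAnchor in anchors {
            let size = imageAnchor.referenceImage.physicalSize
            let plane = ModelEntity(
                mesh: .generatePlane(width: Float(size.width), depth: Float(size.height)),
                materials: [VideoMaterial(avPlayer: player)]  // AVFoundation handles playback
            )
            let anchorEntity = AnchorEntity(anchor: imageAnchor)
            anchorEntity.addChild(plane)
            scene.addAnchor(anchorEntity)

            // The lip-synced spokesperson video (see Visual Enhancement) would be loaded here.
            if let url = URL(string: "https://example.com/spokesperson.mp4") {  // placeholder URL
                player.replaceCurrentItem(with: AVPlayerItem(url: url))
                player.play()
            }
        }
    }
}
```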

Software:

  • GroqCloud (Llama 3.2 90B Vision) for generating ad-related content
  • Sync.so for lip-syncing the static video of the ad promoter
  • ElevenLabs for text-to-speech of the fine-tuned ad promoter audio
  • ngrok tunneling for hosting public videos that the APIs can access
  • OpenAI Whisper for speech-to-text of user prompts and interaction
  • Prompt-based object detection (Grounding DINO 1.5), plus object removal and inpaint/fill via Inpaint Anything (Segment Anything SAM 2 ViT-B and LaMA big-lama)
  • MediaPipe for face mesh and smooth mapping
  • FileStack + ngrok tunneling for the public file upload and hosting API

Input Processing

  • Speech input converted to text via OpenAI Whisper (see the sketch after this list)
  • Video/image input processed through MediaPipe face mesh for 3D landmark detection
  • Object detection and segmentation performed using Grounding DINO 1.5, SAM2 ViT-b
  • Captured static video hosted through ngrok tunneling for API access
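
As a rough illustration of the speech-to-text step, the sketch below sends a recorded prompt to OpenAI's hosted Whisper transcription endpoint. The function name and multipart details are simplified placeholders; in practice Whisper could equally run server-side.

```swift
import Foundation

// Illustrative sketch: send a recorded user prompt (e.g. an .m4a from AVAudioRecorder)
// to OpenAI's Whisper transcription endpoint and return the recognized text.
func transcribePrompt(audioURL: URL, apiKey: String,
                      completion: @escaping (String?) -> Void) throws {
    var request = URLRequest(url: URL(string: "https://api.openai.com/v1/audio/transcriptions")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")

    let boundary = UUID().uuidString
    request.setValue("multipart/form-data; boundary=\(boundary)", forHTTPHeaderField: "Content-Type")

    var body = Data()
    // "model" form field
    body.append("--\(boundary)\r\nContent-Disposition: form-data; name=\"model\"\r\n\r\nwhisper-1\r\n".data(using: .utf8)!)
    // "file" form field containing the recorded audio
    body.append("--\(boundary)\r\nContent-Disposition: form-data; name=\"file\"; filename=\"prompt.m4a\"\r\nContent-Type: audio/m4a\r\n\r\n".data(using: .utf8)!)
    body.append(try Data(contentsOf: audioURL))
    body.append("\r\n--\(boundary)--\r\n".data(using: .utf8)!)
    request.httpBody = body

    URLSession.shared.dataTask(with: request) { data, _, _ in
        // The endpoint responds with JSON of the form {"text": "..."}.
        guard let data = data,
              let json = try? JSONSerialization.jsonObject(with: data) as? [String: Any],
              let text = json["text"] as? String else { return completion(nil) }
        completion(text)
    }.resume()
}
```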

Core Processing

  • Llama 3.2 11B Vision model processes visual content and generates ad-related content (see the sketch after this list)
  • Face mesh coordinates mapped and smoothed using MediaPipe’s 468 3D landmarks
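
The generation step boils down to a single chat-completions request against GroqCloud's OpenAI-compatible endpoint, roughly as sketched below; the model ID, prompt text, and function name are placeholders.

```swift
import Foundation

// Illustrative sketch: ask a Llama 3.2 vision model on GroqCloud to generate a
// spokesperson script for the scanned ad image (hosted via ngrok).
func generateAdScript(imageURL: String, userPrompt: String, apiKey: String,
                      completion: @escaping (String?) -> Void) {
    var request = URLRequest(url: URL(string: "https://api.groq.com/openai/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")

    // OpenAI-style multimodal message: a text prompt plus the publicly hosted ad image.
    let payload: [String: Any] = [
        "model": "llama-3.2-90b-vision-preview",   // placeholder model ID
        "messages": [[
            "role": "user",
            "content": [
                ["type": "text", "text": "Write a short, friendly spokesperson script for this ad. \(userPrompt)"],
                ["type": "image_url", "image_url": ["url": imageURL]]
            ]
        ]]
    ]
    request.httpBody = try? JSONSerialization.data(withJSONObject: payload)

    URLSession.shared.dataTask(with: request) { data, _, _ in
        // Pull the generated script out of the first choice in the response.
        guard let data = data,
              let json = try? JSONSerialization.jsonObject(with: data) as? [String: Any],
              let choices = json["choices"] as? [[String: Any]],
              let message = choices.first?["message"] as? [String: Any],
              let text = message["content"] as? String else { return completion(nil) }
        completion(text)
    }.resume()
}
```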

Visual Enhancement

  • Lip movements synchronized with the generated audio and the static video input (hosted via ngrok) using Sync.so
  • Generated text converted to speech with ElevenLabs voice synthesis and passed into Sync.so with a specified voice ID (see the sketch after this list)
  • Background inpainting and filling where needed with LaMA
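
The text-to-speech leg of this stage is a single ElevenLabs request, sketched below with a placeholder voice ID and model choice; the Sync.so lip-sync call that consumes the resulting audio is left out of the sketch.

```swift
import Foundation

// Illustrative sketch: convert the generated ad script to speech with ElevenLabs.
// The voice ID is a placeholder; the returned audio would then be handed to Sync.so
// together with the ngrok-hosted spokesperson video for lip-syncing.
func synthesizeSpeech(script: String, voiceID: String, apiKey: String,
                      completion: @escaping (Data?) -> Void) {
    let url = URL(string: "https://api.elevenlabs.io/v1/text-to-speech/\(voiceID)")!
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue(apiKey, forHTTPHeaderField: "xi-api-key")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")

    let payload: [String: Any] = [
        "text": script,
        "model_id": "eleven_multilingual_v2"   // placeholder model choice
    ]
    request.httpBody = try? JSONSerialization.data(withJSONObject: payload)

    // The endpoint responds with raw MPEG audio bytes.
    URLSession.shared.dataTask(with: request) { data, _, _ in
        completion(data)
    }.resume()
}
```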

Distribution

  • Sync.so output video mapped onto the detected object with MediaPipe
  • The loop repeats from Core Processing for further user interaction and prompts about the ad product
  • The iOS app’s “Like” feature saves the ad URL into a local storage database for future viewing (see the sketch after this list)
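
The “Like” feature itself is simple persistence. The sketch below uses UserDefaults as a stand-in for the local store, with illustrative names.

```swift
import Foundation

// Illustrative sketch of the "Like" feature: persist liked ad URLs locally so the
// user can revisit them later. UserDefaults stands in for whatever local database
// the app actually uses.
struct LikedItemsStore {
    private let key = "likedAdURLs"
    private let defaults = UserDefaults.standard

    // Save a liked ad URL, skipping duplicates.
    func like(_ adURL: URL) {
        var urls = allLiked().map(\.absoluteString)
        guard !urls.contains(adURL.absoluteString) else { return }
        urls.append(adURL.absoluteString)
        defaults.set(urls, forKey: key)
    }

    // Return all liked ads for the "Liked Items" screen.
    func allLiked() -> [URL] {
        let strings = defaults.stringArray(forKey: key) ?? []
        return strings.compactMap(URL.init(string:))
    }
}
```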

Challenges we ran into:

Our original goal was to implement our app on the Meta Quest 3, bringing ads to life by recognizing ad boards and creating interactive VR avatars that advertise products and answer questions about them. However, because Meta Quest Link requires a Windows system with a discrete NVIDIA GPU, we had to pivot to an idea that was more realistic while still immersive. We transitioned to a new AR stack built on Xcode (Swift) libraries and had to learn how to combine ARKit with all of the APIs.

Accomplishments that we’re proud of:

Going into this project, we knew it would be quite ambitious. Building an AR/VR platform that augments images into animated, talking personas with as little latency as possible is not just a hackathon problem: startups and established companies are developing their own reality-swapping and mixed-reality solutions to manipulate media for content creation, distribution, and consumption. Learning two completely new platforms (Unity + Quest 3, Xcode + ARKit) and having to scrap one was not only a big planning hurdle but also a significant mental one. We are proud of the creative ways we found to implement parts of our ambitious idea, researching and trying out different models along the way (Stability AI's Stable Diffusion 2, GLIGEN for generative filling and inpainting, Grounding DINO for scene-aware object detection, Grounded SAM for detailed segmentation, and Vespa.ai's VLM for vision-language understanding). After trying multiple approaches and consulting as a team, we settled on the solution that best balanced performance and speed for our application.

What we learned:

Ananya: I learned a lot about building AR in Swift and accomplished far more in a short amount of time than I expected, even while learning a new platform and language. I have previous experience developing AR apps, but this was my first time using Swift; I liked how robust its features are while staying simple.

Manav: Learned a completely new language with Swift and got a lot of experience with API paradigms, which will be quite helpful for future development.

Kathleen: Team synergy is very important, and I am glad that we had a lot of it. I really valued learning a new development language and platform through Swift and Xcode as well.

Robert: I got to learn so much about dynamic video manipulation and extraction with novel, advanced models. The project gave me a better understanding of the variety of models available, the emerging areas of R&D in capable, low-latency computer vision and generative media, and hands-on familiarity with developing against new models and libraries and integrating dynamic CV and media into a meaningful app.

What’s next for slynk:

slynk is just getting started. We wanted to make ads a positive, immersive interaction rather than the dull, nagging presence they are today. Once we acquire the required hardware, we will bring our app and its features into virtual reality on the Apple Vision Pro. With that, our system for turning a recognized 2D image into an interactive 3D virtual avatar can power real virtual assistants that users can talk to and physically interact with for a more immersive experience, and we will expand the avatar to full body. The same avatar generation can also be applied to other applications such as virtual shopping and fitting for clothes and furniture, personalized sports broadcasting, and VR presentation generation.
