🛒 Shop the Frame

About the project

Every day, millions of people watch YouTube videos and think:

“Where did they get that hoodie?”
“What keyboard is that?”
“I want that desk lamp.”

But today, discovering those products means pausing the video, searching Google, scrolling through comments, or hoping someone left a link — a process that’s slow, fragmented, and often unsuccessful.

Shop the Frame was inspired by that moment of curiosity.


💡 Inspiration

Video content is no longer just entertainment — it has become one of the strongest drivers of online purchasing behavior.

According to Shopify research:

  • Video accounts for over 82% of global internet traffic, making visual media the dominant way people consume content online.
  • Shoppers are 88% more likely to purchase after watching a product video, and ecommerce pages with video can see conversion increases of up to 144%.
  • Shopify merchants collectively reached over 875 million buyers globally, operating in more than 175 countries, demonstrating the massive scale of commerce happening across the platform.

At the same time, consumer behavior has shifted:

  • People increasingly discover products outside traditional storefronts — through creators, vlogs, lifestyle videos, and everyday content.
  • Yet product discovery remains disconnected: viewers must manually search for items they see, often abandoning the purchase altogether.
  • While video clearly influences buying decisions, there is still no seamless way to connect seeing with shopping in real time.

These insights revealed a clear gap:

Video inspires demand — but commerce tools haven’t caught up.

Rather than relying on ads or manual product tagging, we asked:

What if any moment of curiosity inside a video could instantly become a shopping experience?

That question became the foundation of Shop the Frame — a tool that transforms organic video moments into real product discovery opportunities powered by AI and Shopify’s ecosystem.


🚀 How we built it

The project consists of three main components:


1. Chrome Extension (Frontend)

  • Runs directly on YouTube watch pages
  • Captures the current video frame using the browser's <video> element and Canvas APIs
  • Displays a slide-in side panel UI
  • Sends the screenshot to the backend for analysis
  • Renders product results in real time

This keeps the experience fast and intuitive — no accounts, no links, no interruptions.
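The capture step can be sketched roughly as follows. This is a simplified illustration, not our exact implementation; the helper names and the 1280px cap are assumptions:

```javascript
// Content-script sketch: grab the current YouTube frame as a JPEG blob.
// Runs in the page context of a watch page; names here are illustrative.

// Scale the capture down so uploads stay small (pure helper, easy to test).
// The 1280px cap is an assumed default, not a hard requirement.
function scaledSize(width, height, maxWidth = 1280) {
  if (width <= maxWidth) return { width, height };
  const ratio = maxWidth / width;
  return { width: maxWidth, height: Math.round(height * ratio) };
}

function captureCurrentFrame() {
  const video = document.querySelector('video');
  if (!video || video.readyState < 2) return null; // no frame decoded yet

  const { width, height } = scaledSize(video.videoWidth, video.videoHeight);
  const canvas = document.createElement('canvas');
  canvas.width = width;
  canvas.height = height;
  canvas.getContext('2d').drawImage(video, 0, 0, width, height);

  // JPEG keeps the payload small for the backend round trip.
  return new Promise((resolve) => canvas.toBlob(resolve, 'image/jpeg', 0.85));
}
```

The resulting blob is what gets POSTed to the backend for analysis.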


2. Gemini Vision + Function Calling (AI Layer)

We used Gemini’s multimodal capabilities to analyze each screenshot.

Given a single video frame, Gemini:

  • Identifies visually distinct, purchasable objects
  • Extracts attributes such as color, material, and form factor
  • Produces search-ready ecommerce queries
  • Uses function calling to determine which product searches should be executed

Rather than hard-coding detection logic, Gemini acts as an intelligent agent — reasoning over the image and dynamically selecting the most relevant commerce queries.

This allows the system to work across:

  • lifestyle vlogs
  • study-with-me videos
  • desk setups
  • everyday YouTube content
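Conceptually, the function-calling setup looks something like the sketch below. The tool name "search_products", its parameter fields, and the dispatch helper are illustrative stand-ins, not our exact declarations:

```javascript
// Tool declaration handed to Gemini so it can request product searches
// instead of replying with free-form text. Names and fields are illustrative.
const searchProductsTool = {
  functionDeclarations: [
    {
      name: 'search_products',
      description: 'Search the product catalog for an item seen in the frame',
      parameters: {
        type: 'OBJECT',
        properties: {
          query: { type: 'STRING', description: 'Search-ready ecommerce query' },
          category: { type: 'STRING', description: 'e.g. apparel, electronics' },
        },
        required: ['query'],
      },
    },
  ],
};

// When Gemini responds with function calls, each one is routed to a handler
// that actually executes the commerce search.
function dispatchCalls(functionCalls, handlers) {
  return functionCalls
    .filter((call) => typeof handlers[call.name] === 'function')
    .map((call) => handlers[call.name](call.args));
}
```

This is what lets Gemini decide *which* searches to run, while the backend stays in control of *how* they run.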

3. Shopify Catalog API (Commerce Layer)

Once Gemini generates product queries, the backend calls Shopify’s Catalog API, which provides global product discovery across eligible Shopify merchants.

For each detected item, we:

  • Search Shopify’s centralized catalog
  • Retrieve live product listings
  • Normalize pricing and imagery
  • Return real storefront links

All results are:

  • Sold by Shopify merchants
  • Available in real stores
  • Retrieved without scraping or manual indexing
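For each detected item, the backend step amounts to roughly the following. The endpoint shape and raw response fields are simplified assumptions for illustration, not the exact Catalog API contract:

```javascript
// Normalize a raw catalog result into the shape the side panel renders.
// The field names on the raw listing are assumptions about the response.
function normalizeListing(raw) {
  return {
    title: raw.title,
    price: `${raw.price.currencyCode} ${Number(raw.price.amount).toFixed(2)}`,
    image: raw.imageUrl ?? null,
    url: raw.productUrl, // real storefront link, no scraping involved
  };
}

// Query the catalog for one Gemini-generated search string.
async function searchCatalog(query, { endpoint, token }) {
  const res = await fetch(`${endpoint}?q=${encodeURIComponent(query)}`, {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`Catalog search failed: ${res.status}`);
  const { products = [] } = await res.json();
  return products.map(normalizeListing);
}
```

Normalizing early keeps the extension UI simple: it renders one consistent shape regardless of how individual listings come back.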

🧠 What we learned

Building Shop the Frame taught us several key lessons:

  • Multimodal AI is most powerful when paired with tools.
    Gemini’s reasoning combined with function calling enabled far richer outcomes than image labeling alone.

  • Commerce discovery is fundamentally a UX problem.
    Users don’t want more ads — they want answers at the moment curiosity occurs.

  • Screenshot-based interaction dramatically simplifies video analysis.
    A single frame provides enough semantic context without complex video pipelines.

  • Shopify’s Catalog API enables entirely new discovery surfaces.
    Commerce no longer has to live only inside storefronts.


🧗 Challenges we faced

🎥 Video frame capture

Capturing YouTube frames reliably required careful handling of video playback states, canvas rendering, and resolution scaling.


🧠 Avoiding hallucinated products

Vision models can infer details that may not exist. We mitigated this by:

  • Only allowing brand detection when logos or text were clearly visible
  • Enforcing strict JSON schemas
  • Filtering for clearly purchasable items
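The filtering step can be sketched as a post-validation pass over Gemini's JSON before any search runs. The field names (purchasable, brandEvidence) are illustrative, not our exact schema:

```javascript
// Post-validate Gemini's structured output. An item survives only if it has
// the required fields and is flagged purchasable; a brand is kept only when
// the model reports visible logo/text evidence, never when merely inferred.
function filterDetections(items) {
  return items
    .filter((item) => item.name && item.searchQuery && item.purchasable === true)
    .map((item) => ({
      ...item,
      brand: item.brandEvidence === 'visible' ? item.brand : null,
    }));
}
```

Dropping inferred brands costs a little recall but avoids confidently sending users to products that were never in the frame.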

⚡ Latency and concurrency

Each frame can produce multiple product searches. We addressed this by:

  • Parallelizing Shopify Catalog queries
  • Caching access tokens
  • Balancing response speed with result quality
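The concurrency and caching pieces look roughly like this. The token shape and the 30-second refresh margin are assumptions for the sketch:

```javascript
// Cache the access token so repeated frames don't re-authenticate.
let cachedToken = null;
let tokenExpiresAt = 0;

async function getToken(fetchToken) {
  if (cachedToken && Date.now() < tokenExpiresAt) return cachedToken;
  const { token, expiresInSeconds } = await fetchToken();
  cachedToken = token;
  // Refresh slightly early so we never use a token at the edge of expiry.
  tokenExpiresAt = Date.now() + (expiresInSeconds - 30) * 1000;
  return token;
}

// Run all product searches for a frame in parallel. allSettled means one
// slow or failed query doesn't sink the whole frame's results.
async function searchAll(queries, searchOne) {
  const settled = await Promise.allSettled(queries.map((q) => searchOne(q)));
  return settled
    .filter((r) => r.status === 'fulfilled')
    .flatMap((r) => r.value);
}
```

Promise.allSettled (rather than Promise.all) is the key speed-vs-quality trade: partial results render immediately instead of waiting on, or failing with, the slowest query.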

🎯 Keeping the experience organic

We intentionally avoided staged or sponsored content. Designing the system to work on normal YouTube videos — not ads — was one of the most important challenges.


🌟 Why this matters

Shop the Frame introduces a new commerce surface:

  • For viewers: instant product discovery
  • For creators: monetization without intrusive ads
  • For merchants: organic exposure beyond storefronts

It demonstrates how multimodal AI + agentic reasoning + commerce APIs can redefine how people interact with video content.


🧰 Built with

Languages

  • JavaScript (ES6+)

Frontend

  • Chrome Extension (Manifest V3)
  • HTML / CSS
  • Browser Canvas & Media APIs

Backend

  • Node.js 18
  • Express.js
  • Multer (image uploads)
  • Native Fetch API

AI & Machine Learning

  • Google Gemini Vision
  • Gemini Function Calling

Commerce & APIs

  • Shopify Catalog API
  • Shopify MCP tool interface

🏁 Closing

Shop the Frame transforms moments of curiosity into moments of commerce.

By combining Gemini’s multimodal reasoning with Shopify’s global product catalog, we demonstrate a future where anything you see — not just what’s advertised — can be instantly shopped.
