🛒 Shop the Frame

About the project

Every day, millions of people watch YouTube videos and think:

“Where did they get that hoodie?”
“What keyboard is that?”
“I want that desk lamp.”

But today, discovering those products means pausing the video, searching Google, scrolling through comments, or hoping someone left a link — a process that’s slow, fragmented, and often unsuccessful.

Shop the Frame was inspired by that moment of curiosity.


💡 Inspiration

Video content is no longer just entertainment — it has become one of the strongest drivers of online purchasing behavior.

According to Shopify research:

  • Video accounts for over 82% of global internet traffic, making visual media the dominant way people consume content online.
  • Shoppers are 88% more likely to purchase after watching a product video, and ecommerce pages with video can see conversion increases of up to 144%.
  • Shopify merchants collectively reached over 875 million buyers globally, operating in more than 175 countries, demonstrating the massive scale of commerce happening across the platform.

At the same time, consumer behavior has shifted:

  • People increasingly discover products outside traditional storefronts — through creators, vlogs, lifestyle videos, and everyday content.
  • Yet product discovery remains disconnected: viewers must manually search for items they see, often abandoning the purchase altogether.
  • While video clearly influences buying decisions, there is still no seamless way to connect seeing with shopping in real time.

These insights revealed a clear gap:

Video inspires demand — but commerce tools haven’t caught up.

Rather than relying on ads or manual product tagging, we asked:

What if any moment of curiosity inside a video could instantly become a shopping experience?

That question became the foundation of Shop the Frame — a tool that transforms organic video moments into real product discovery opportunities powered by AI and Shopify’s ecosystem.


🚀 How we built it

The project consists of three main components:


1. Chrome Extension (Frontend)

  • Runs directly on YouTube watch pages
  • Captures the current video frame using the browser's <video> element and Canvas APIs
  • Displays a slide-in side panel UI
  • Sends the screenshot to the backend for analysis
  • Renders product results in real time

This keeps the experience fast and intuitive — no accounts, no links, no interruptions.
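The capture step can be sketched roughly as follows. This is a simplified illustration, not our exact implementation; the helper names and the 1280px cap are assumptions:

```javascript
// Content-script sketch: grab the current YouTube frame as a JPEG blob.
// Runs in the page context of a watch page; names here are illustrative.

// Scale the capture down so uploads stay small (pure helper, easy to test).
// The 1280px cap is an assumed default, not a hard requirement.
function scaledSize(width, height, maxWidth = 1280) {
  if (width <= maxWidth) return { width, height };
  const ratio = maxWidth / width;
  return { width: maxWidth, height: Math.round(height * ratio) };
}

function captureCurrentFrame() {
  const video = document.querySelector('video');
  if (!video || video.readyState < 2) return null; // no frame decoded yet

  const { width, height } = scaledSize(video.videoWidth, video.videoHeight);
  const canvas = document.createElement('canvas');
  canvas.width = width;
  canvas.height = height;
  canvas.getContext('2d').drawImage(video, 0, 0, width, height);

  // JPEG keeps the payload small for the backend round trip.
  return new Promise((resolve) => canvas.toBlob(resolve, 'image/jpeg', 0.85));
}
```

The resulting blob is what gets POSTed to the backend for analysis.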


2. Gemini Vision + Function Calling (AI Layer)

We used Gemini’s multimodal capabilities to analyze each screenshot.

Given a single video frame, Gemini:

  • Identifies visually distinct, purchasable objects
  • Extracts attributes such as color, material, and form factor
  • Produces search-ready ecommerce queries
  • Uses function calling to determine which product searches should be executed

Rather than hard-coding detection logic, Gemini acts as an intelligent agent — reasoning over the image and dynamically selecting the most relevant commerce queries.

This allows the system to work across:

  • lifestyle vlogs
  • study-with-me videos
  • desk setups
  • everyday YouTube content
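Conceptually, the function-calling setup looks something like the sketch below. The tool name "search_products", its parameter fields, and the dispatch helper are illustrative stand-ins, not our exact declarations:

```javascript
// Tool declaration handed to Gemini so it can request product searches
// instead of replying with free-form text. Names and fields are illustrative.
const searchProductsTool = {
  functionDeclarations: [
    {
      name: 'search_products',
      description: 'Search the product catalog for an item seen in the frame',
      parameters: {
        type: 'OBJECT',
        properties: {
          query: { type: 'STRING', description: 'Search-ready ecommerce query' },
          category: { type: 'STRING', description: 'e.g. apparel, electronics' },
        },
        required: ['query'],
      },
    },
  ],
};

// When Gemini responds with function calls, each one is routed to a handler
// that actually executes the commerce search.
function dispatchCalls(functionCalls, handlers) {
  return functionCalls
    .filter((call) => typeof handlers[call.name] === 'function')
    .map((call) => handlers[call.name](call.args));
}
```

This is what lets Gemini decide *which* searches to run, while the backend stays in control of *how* they run.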

3. Shopify Catalog API (Commerce Layer)

Once Gemini generates product queries, the backend calls Shopify’s Catalog API, which provides global product discovery across eligible Shopify merchants.

For each detected item, we:

  • Search Shopify’s centralized catalog
  • Retrieve live product listings
  • Normalize pricing and imagery
  • Return real storefront links

All results are:

  • Sold by Shopify merchants
  • Available in real stores
  • Retrieved without scraping or manual indexing
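For each detected item, the backend step amounts to roughly the following. The endpoint shape and raw response fields are simplified assumptions for illustration, not the exact Catalog API contract:

```javascript
// Normalize a raw catalog result into the shape the side panel renders.
// The field names on the raw listing are assumptions about the response.
function normalizeListing(raw) {
  return {
    title: raw.title,
    price: `${raw.price.currencyCode} ${Number(raw.price.amount).toFixed(2)}`,
    image: raw.imageUrl ?? null,
    url: raw.productUrl, // real storefront link, no scraping involved
  };
}

// Query the catalog for one Gemini-generated search string.
async function searchCatalog(query, { endpoint, token }) {
  const res = await fetch(`${endpoint}?q=${encodeURIComponent(query)}`, {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`Catalog search failed: ${res.status}`);
  const { products = [] } = await res.json();
  return products.map(normalizeListing);
}
```

Normalizing early keeps the extension UI simple: it renders one consistent shape regardless of how individual listings come back.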

🧠 What we learned

Building Shop the Frame taught us several key lessons:

  • Multimodal AI is most powerful when paired with tools.
    Gemini’s reasoning combined with function calling enabled far richer outcomes than image labeling alone.

  • Commerce discovery is fundamentally a UX problem.
    Users don’t want more ads — they want answers at the moment curiosity occurs.

  • Screenshot-based interaction dramatically simplifies video analysis.
    A single frame provides enough semantic context without complex video pipelines.

  • Shopify’s Catalog API enables entirely new discovery surfaces.
    Commerce no longer has to live only inside storefronts.


🧗 Challenges we faced

🎥 Video frame capture

Capturing YouTube frames reliably required careful handling of video playback states, canvas rendering, and resolution scaling.


🧠 Avoiding hallucinated products

Vision models can infer details that may not exist. We mitigated this by:

  • Only allowing brand detection when logos or text were clearly visible
  • Enforcing strict JSON schemas
  • Filtering for clearly purchasable items
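The filtering step can be sketched as a post-validation pass over Gemini's JSON before any search runs. The field names (purchasable, brandEvidence) are illustrative, not our exact schema:

```javascript
// Post-validate Gemini's structured output. An item survives only if it has
// the required fields and is flagged purchasable; a brand is kept only when
// the model reports visible logo/text evidence, never when merely inferred.
function filterDetections(items) {
  return items
    .filter((item) => item.name && item.searchQuery && item.purchasable === true)
    .map((item) => ({
      ...item,
      brand: item.brandEvidence === 'visible' ? item.brand : null,
    }));
}
```

Dropping inferred brands costs a little recall but avoids confidently sending users to products that were never in the frame.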

⚡ Latency and concurrency

Each frame can produce multiple product searches. We addressed this by:

  • Parallelizing Shopify Catalog queries
  • Caching access tokens
  • Balancing response speed with result quality
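The concurrency and caching pieces look roughly like this. The token shape and the 30-second refresh margin are assumptions for the sketch:

```javascript
// Cache the access token so repeated frames don't re-authenticate.
let cachedToken = null;
let tokenExpiresAt = 0;

async function getToken(fetchToken) {
  if (cachedToken && Date.now() < tokenExpiresAt) return cachedToken;
  const { token, expiresInSeconds } = await fetchToken();
  cachedToken = token;
  // Refresh slightly early so we never use a token at the edge of expiry.
  tokenExpiresAt = Date.now() + (expiresInSeconds - 30) * 1000;
  return token;
}

// Run all product searches for a frame in parallel. allSettled means one
// slow or failed query doesn't sink the whole frame's results.
async function searchAll(queries, searchOne) {
  const settled = await Promise.allSettled(queries.map((q) => searchOne(q)));
  return settled
    .filter((r) => r.status === 'fulfilled')
    .flatMap((r) => r.value);
}
```

Promise.allSettled (rather than Promise.all) is the key speed-vs-quality trade: partial results render immediately instead of waiting on, or failing with, the slowest query.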

🎯 Keeping the experience organic

We intentionally avoided staged or sponsored content. Designing the system to work on normal YouTube videos — not ads — was one of the most important challenges.


🌟 Why this matters

Shop the Frame introduces a new commerce surface:

  • For viewers: instant product discovery
  • For creators: monetization without intrusive ads
  • For merchants: organic exposure beyond storefronts

It demonstrates how multimodal AI + agentic reasoning + commerce APIs can redefine how people interact with video content.


🧰 Built with

Languages

  • JavaScript (ES6+)

Frontend

  • Chrome Extension (Manifest V3)
  • HTML / CSS
  • Browser Canvas & Media APIs

Backend

  • Node.js 18
  • Express.js
  • Multer (image uploads)
  • Native Fetch API

AI & Machine Learning

  • Google Gemini Vision
  • Gemini Function Calling

Commerce & APIs

  • Shopify Catalog API
  • Shopify MCP tool interface

🏁 Closing

Shop the Frame transforms moments of curiosity into moments of commerce.

By combining Gemini’s multimodal reasoning with Shopify’s global product catalog, we demonstrate a future where anything you see — not just what’s advertised — can be instantly shopped.
