Project Story

About the Project

Open Stitch was inspired by our shared passion for video content creation. While creating content is fun and expressive, we repeatedly found that the editing phase was the most tedious and time-consuming part of the workflow. Trimming clips, syncing audio, structuring timelines, and rendering final edits required repetitive manual effort that slowed down creativity.

We wanted a way to let creators focus on storytelling rather than technical editing. This led to the core idea behind Open Stitch: an AI-powered system that can automatically transform raw video clips into a polished final video based on natural language instructions. Instead of manually arranging clips, users simply describe their vision and let the system handle the editing pipeline.


What Inspired Us

Our inspiration came from observing how content creators spend hours performing mechanical editing tasks that don’t add creative value. The creative intent is often clear in the creator’s mind, but translating that intent into a finished video requires navigating complex editing tools and workflows.

We envisioned a startup concept where AI acts as an intelligent editor. The goal was to reduce friction between creative intent and final output. Open Stitch aims to bridge this gap by using AI agents to interpret instructions, analyze video content, and automatically synthesize edits into a coherent timeline.


How We Built It

Open Stitch is built as a full-stack AI video editing pipeline. The backend uses FastAPI to orchestrate the workflow, while the frontend uses React and Vite to provide an interactive user interface. The system integrates multiple AI components, including speech recognition, vision-language models, and agent-based planning, to generate structured edit plans and render final compositions.

At a high level, the pipeline ingests raw video clips, extracts frames and audio, and processes them through ASR and vision-language analysis. These signals are merged into a unified timeline representation, which is then handled by a series of graph-managed agents responsible for planning, synthesis, verification, and final QA. The resulting edit plan is rendered automatically into a final video composition.
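The unified timeline representation mentioned above might look something like the following sketch. The field names (transcript, scene labels, per-clip offsets) are assumptions chosen to show how ASR and vision-language signals can be merged onto one structure, not the project's actual schema.

```python
# Illustrative unified-timeline structure; field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class Segment:
    clip_id: str
    start: float            # seconds into the source clip
    end: float
    transcript: str = ""    # from ASR (e.g. faster-whisper output)
    scene_labels: list[str] = field(default_factory=list)  # from vision analysis

@dataclass
class Timeline:
    segments: list[Segment] = field(default_factory=list)

    def add(self, seg: Segment) -> None:
        assert seg.end > seg.start, "segment must have positive duration"
        self.segments.append(seg)
        # keep segments ordered so downstream agents see a deterministic timeline
        self.segments.sort(key=lambda s: (s.clip_id, s.start))

    def total_duration(self) -> float:
        return sum(s.end - s.start for s in self.segments)
```

Keeping both modalities on one `Segment` is what lets the planning agents reason over speech and visuals together rather than reconciling two separate streams.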

We designed the system using a graph orchestration approach so that each agent has a specific responsibility, such as clarification, planning, synthesis, or verification. This modular architecture keeps the pipeline robust, debuggable, and capable of automated retries and validation checks before rendering.
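The graph-orchestration idea can be sketched with a minimal hand-rolled executor. The real system uses LangGraph; this stand-in only shows the shape of the design, with each node owning one responsibility and returning the name of the next node (node names mirror the roles listed above and the agent bodies are stubs).

```python
# Minimal hand-rolled sketch of graph orchestration (the real pipeline
# uses LangGraph); agent bodies are stand-ins for LLM-backed steps.
from typing import Callable

State = dict  # shared state passed between agents

def clarify(state: State) -> str:
    state["prompt_ok"] = True                 # stand-in for clarification step
    return "plan"

def plan(state: State) -> str:
    state["plan"] = ["cut", "sync_audio"]     # stand-in for planning agent
    return "synthesize"

def synthesize(state: State) -> str:
    state["edits"] = [f"applied:{step}" for step in state["plan"]]
    return "verify"

def verify(state: State) -> str:
    # retry synthesis if validation failed, otherwise finish
    return "done" if state.get("edits") else "synthesize"

NODES: dict[str, Callable[[State], str]] = {
    "clarify": clarify, "plan": plan, "synthesize": synthesize, "verify": verify,
}

def run(state: State, start: str = "clarify") -> State:
    node = start
    while node != "done":
        node = NODES[node](state)             # each agent names its successor
    return state
```

Because every transition is an explicit edge, a failed verification can loop back to synthesis instead of crashing the whole pipeline, which is the retry behavior described above.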


What We Learned

Building Open Stitch taught us how to design complex AI systems that coordinate multiple models and processing stages. We learned how to combine speech recognition outputs, visual scene understanding, and structured planning into a single coherent editing workflow. We also gained experience orchestrating AI agents through a graph-based execution model, which provided stronger control over reliability and state transitions.

Additionally, we developed a deeper understanding of how full-stack systems operate in production-like environments. This included managing containerized infrastructure, handling asynchronous processing, and designing APIs that connect frontend interactions to long-running AI pipelines.


Challenges We Faced

One of the biggest challenges was ensuring consistency across the entire pipeline. Each component—speech recognition, visual analysis, planning, and rendering—produces outputs with different formats and uncertainties. Aligning these signals into a single deterministic edit timeline required careful schema design and validation logic.

Another challenge was balancing automation with user control. Fully automated editing risks misinterpreting creative intent, so we introduced clarification and verification stages to confirm structured prompts before synthesis. This helped maintain alignment between the creator’s vision and the generated output.

Performance and reliability were also ongoing concerns. Processing video data is computationally expensive, and orchestrating multiple AI steps introduced potential points of failure. Implementing fallback flows, deterministic checks, and QA gates ensured that the system could recover gracefully and still produce a usable final render.
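The retry-then-fallback pattern described above can be sketched as a small wrapper around a flaky pipeline stage. The QA check and fallback renderer here are hypothetical stand-ins for the real deterministic checks and QA gates.

```python
# Illustrative retry-with-fallback wrapper for a flaky pipeline stage;
# qa_check and fallback are hypothetical stand-ins for the real gates.
from typing import Callable

def run_with_fallback(
    primary: Callable[[], dict],
    fallback: Callable[[], dict],
    qa_check: Callable[[dict], bool],
    retries: int = 2,
) -> dict:
    for _ in range(retries + 1):
        try:
            result = primary()
            if qa_check(result):   # deterministic QA gate before accepting
                return result
        except Exception:
            pass                   # treat crashes like failed QA and retry
    return fallback()              # guaranteed-usable final render
```

Wrapping each expensive AI step this way is what lets the pipeline degrade gracefully instead of failing outright when a model call misbehaves.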


Looking Forward

Open Stitch represents our vision of the future of content creation: a workflow where creators describe their ideas, and AI handles the heavy lifting of editing. By automating the most tedious part of video production, we aim to empower creators to focus on creativity, storytelling, and experimentation rather than manual technical work.

Built With

  • asyncpg
  • dagre
  • dnd-kit
  • docker-sandbox
  • fastapi
  • faster-whisper
  • ffmpeg
  • ffprobe
  • gemini-2.5
  • gemini-3-pro
  • gpt-5-mini
  • gpt-5.2
  • httpx
  • langgraph
  • minio-client
  • motion
  • nvidia-compatible/chat/completions-endpoint
  • postgresql
  • pydantic
  • radix-ui
  • react-19
  • redis
  • remotion
  • sqlalchemy
  • tailwind-4
  • typescript
  • uvicorn
  • vite
  • xyflow
  • zustand