Inspiration

Short attention spans have driven the demand for quick, engaging, bite-sized content, particularly among the younger generation. We were primarily inspired by Fireship videos, which explain complex development concepts in a modern, fast-paced, and highly engaging style.

Existing AI video generators often rely on diffusion models, which frequently result in spatial and temporal inconsistencies (e.g., objects changing size, disappearing, or morphing between frames). Our key differentiator is that we use code to define the video, ensuring complete coherence and consistency. Our pipeline meticulously emulates the process of a human video editor: planning, scripting, asset retrieval, and final composition.


What it Does

Simply type in any concept you wish to have explained, select a character, and our video editing pipeline takes over:

  • Generates a comprehensive plan and script for the video.
  • Generates narration using the voice of the selected character.
  • Gathers relevant images and selects the optimal candidate for each segment.
  • Gathers and places relevant sound effects according to the script.
  • Stitches all assets together using code.
  • Renders the code into the final video.
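The stages above can be sketched as a simple orchestrator. This is a minimal, illustrative sketch: every stage function here is a stub standing in for the real LLM, TTS, and asset services, and the names (`generate_plan`, `compose_video`, etc.) are hypothetical rather than our actual API.

```python
# Minimal sketch of the video-generation pipeline.
# All stage functions are stubs standing in for real services.

def generate_plan(concept: str) -> list[str]:
    # Stub: the real stage asks an LLM for a segment-by-segment plan.
    return [f"{concept}: part {i}" for i in range(3)]

def write_script(plan: list[str]) -> list[str]:
    # Stub: the real stage turns each planned segment into narration text.
    return [f"Narration for {segment}" for segment in plan]

def synthesize_narration(script: list[str], character: str) -> list[str]:
    # Stub: the real stage runs TTS in the selected character's voice.
    return [f"{character}_{i}.wav" for i, _ in enumerate(script)]

def gather_images(script: list[str]) -> list[str]:
    # Stub: the real stage retrieves candidates and picks the best image.
    return [f"image_{i}.png" for i, _ in enumerate(script)]

def compose_video(script, audio, images) -> dict:
    # Stub: the real stage stitches assets in the animation engine and renders.
    return {"segments": list(zip(script, audio, images))}

def run_pipeline(concept: str, character: str) -> dict:
    plan = generate_plan(concept)
    script = write_script(plan)
    audio = synthesize_narration(script, character)
    images = gather_images(script)
    return compose_video(script, audio, images)
```

Because the video is defined by code at the end of this chain, every asset stays exactly where the script placed it, which is what gives us frame-to-frame consistency.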

How We Built It

  • A robust, end-to-end video-generation pipeline that handles user input through to final video creation.
  • A custom animation engine designed to ensure smooth and fluid motion.
  • Frontend: Svelte / TypeScript
  • Backend: Python
  • Animation: Unity / C#

Challenges We Ran Into

  • Optimizing runtime: To keep video generation fast, we implemented parallel processing, running the same LLM prompt multiple times simultaneously and selecting the best result to minimize errors. We also decoupled image retrieval from script generation, fetching images only after the plan is finalized, rather than making the script dependent on them. This approach significantly reduced delays and improved overall efficiency.
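The best-of-N trick described above can be sketched with Python's standard library. In this sketch, `call_llm` and `score` are placeholders for our actual prompt call and quality check, not real APIs:

```python
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str, seed: int) -> str:
    # Placeholder for a real (network-bound) LLM API call.
    return f"candidate {seed} for: {prompt}"

def score(candidate: str) -> float:
    # Placeholder quality check, e.g. schema validation or a critic pass.
    return float(len(candidate))

def best_of_n(prompt: str, n: int = 4) -> str:
    # Fire the same prompt n times in parallel and keep the best-scoring
    # result, so a single malformed response doesn't stall the pipeline.
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda s: call_llm(prompt, s), range(n)))
    return max(candidates, key=score)
```

A thread pool is enough here because the real calls are I/O-bound; the latency of one generation attempt then bounds the latency of all n.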

Accomplishments We're Proud Of

  • We successfully created custom, replicable voices for our characters using XTTS technology.
  • Our interactive loading screen keeps users engaged with a cute, real-time animation that reports generation progress while the video is being created.
  • The sound effects, narration, images, and animations are seamlessly integrated, resulting in a cohesive and polished final video.
  • Our generated videos are factually accurate and tailored to fit our target audience, ensuring clarity and reliability of the content.

What’s Next for "Replace your Professor"?

  • Expand customization options for characters and voice styles.
  • Improve the editing workflow, allowing users to fine-tune videos post-generation with ease.
  • Continue refining the pipeline for faster, more dynamic, and more interactive video creation.
