Skip to content

GVCLab/CutClaw

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

13 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

CutClaw teaser

🦞CutClaw: Agentic Hours-Long Video Editing via Music Synchronization

🎬 Your personal editor for turning hours of footage into cinematic montages.

arXiv GitHub Stars

Hours-Long Footage Music Beat Sync Instruction Following One-Click Editing LiteLLM Powered

Overview β€’ Features β€’ Gallery β€’ Quick Start β€’ Troubleshooting β€’ Citation


Demo.mp4

πŸ’‘ Overview

CutClaw is an end-to-end editing system for long-form footage + music.

It first deconstructs raw video/audio into structured captions, then uses a multi-agent pipeline to plan shots (shot_plan), select clip timestamps (shot_point), and validate final quality before rendering.

CutClaw Pipeline


✨ Key Features

🎬 One-Click Deconstruction

Long-Form Processing

Effortlessly transforms hours-long raw video and audio into structured, searchable assets with a single click.

🎯 Instruction Control

Text to Edit

Requires only one text instruction to steer the editing styleβ€”easily generating fast-paced character montages or slow-paced emotional narratives.

πŸ“± Smart Auto-Cropping

Smart Adaptation

Content-aware cropping automatically identifies core subjects and adjusts aspect ratios to fit various social platforms.

🎡 Music-Aware Sync

Audio Sync

Extracts musical beats and energy signals to build rhythm-aware cuts that perfectly match the music's pacing.


πŸ–ΌοΈ Gallery(remember to turn on the audioοΌ‰

dark_knight.mp4
kyoto.mp4
paprika.mp4
chongqing.mp4
interstellar.mp4
naruto.mp4
lalaland.mp4
swiss.mp4
titanic.mp4

πŸš€ Quick Start

1. Install

git clone https://github.com/GVCLab/CutClaw.git
cd CutClaw
conda create -n CutClaw python=3.12
conda activate CutClaw
pip install -r requirements.txt

We strongly recommend the GPU-accelerated Decord/NVDEC build for faster video decoding. Build from source.

2. Add your files

resource/
β”œβ”€β”€ video/      ← put your .mp4 / .mkv here
β”œβ”€β”€ audio/      ← put your .mp3 / .wav here
└── subtitle/   ← optional .srt (skips ASR, saves time)

3. Run

UI (recommended)

streamlit run app.py

Then open http://localhost:8501 in your browser. (*If http://localhost:8501 does not work well, try http://127.0.0.1:8501)

CutClaw UI demo

Place your footage in the paths above, then you can directly select those files in the UI.

Model selection guidance:

  • Video model

    • Role: shot/scene understanding and visual captioning.
    • Recommended: Gemini-3, Qwen3.5, GPT-5.3
  • Audio model

    • Role: ASR plus music-structure parsing (beat/downbeat, pitch, energy) for music-aware segmentation.
    • Recommended: Gemini-3
  • Agent model

    • Role: drives the Screenwriter + Editor + Reviewer loop to generate shot_plan and shot_point.
    • Recommended: MiniMax-2.7, Kimi-2.5, Claude-4.5

We leverage LiteLLM as the api manager gateway, the typical Model name is e.g. 'openai/MiniMax-2.7' which means using openai protocol to call the given model, more information see LiteLLM documents.

CLI (advanced)
python local_run.py \
  --Video_Path "resource/video/xxxx.mp4" \
  --Audio_Path "resource/audio/xxxx.mp3" \
  --Instruction "xxxx"
Common config overrides

Any src/config.py parameter can be overridden with --config.PARAM_NAME VALUE.

Parameter Default Effect
VIDEO_PATH "resource/video/The_Dark_Knight.mkv" Default input video path used by UI remembered inputs
AUDIO_PATH "resource/audio/Way_Down_We_Go.mp3" Default input audio path used by UI remembered inputs
INSTRUCTION "Joker's crazy that want to change the world." Default editing instruction prompt
ASR_BACKEND "litellm" ASR engine (litellm cloud or whisper_cpp local)
VIDEO_FPS 2 Sampling FPS for preprocessing
MAIN_CHARACTER_NAME "Joker" Protagonist name for character-focused edits
AUDIO_MIN_SEGMENT_DURATION 3.0 Minimum beat segment duration (seconds)
AUDIO_MAX_SEGMENT_DURATION 5.0 Maximum beat segment duration (seconds)
AUDIO_DETECTION_METHODS ["downbeat", "pitch", "mel_energy"] Audio keypoint detection methods
PARALLEL_SHOT_MAX_WORKERS 4 Parallel shot selection workers

Example:

python local_run.py \
  --Video_Path "resource/video/xxxx.mp4" \
  --Audio_Path "resource/audio/xxxx.mp3" \
  --Instruction "xxxx" \
  --config.MAIN_CHARACTER_NAME "Batman" \
  --config.VIDEO_FPS 2 \
  --config.AUDIO_TOTAL_SHOTS 50

Then render manually:

python render/render_video.py \
  --shot-plan  "Output/<video_audio>/shot_plan_*.json" \
  --shot-json  "Output/<video_audio>/shot_point_*.json" \
  --video  "resource/video/xxxx.mp4" \
  --audio  "resource/audio/xxxx.mp3" \
  --output "output/final.mp4" \
  --crop-ratio "9:16" \
  --no-labels --render-hook-dialogue

πŸ› οΈ Troubleshooting

Very slow runtime

  1. API latency β€” the pipeline sends a large number of concurrent requests to vision/language APIs. Speed is heavily dependent on your API provider's response time and rate limits.
  2. First-run Footage Deconstruction β€” the first time you process a video, shot detection, captioning, ASR, and scene analysis all run from scratch. This is a one-time cost per video; subsequent edits with the same footage reuse the cached results and are much faster.
  3. GPU acceleration β€” a CUDA-capable GPU significantly speeds up video decoding and encoding. We recommend building Decord with NVDEC support (see Install section).
  4. Video codec compatibility β€” if the pipeline appears to hang during video-related steps, the source video's encoding may be the cause. In our testing, videos encoded with libx264 worked reliably.

⭐ Citation

If you find CutClaw useful for your research, welcome to cite our work using the following BibTeX:

@article{cutclaw,
 title={CutClaw: Agentic Hours-Long Video Editing via Music Synchronization},
 author={Shifang Zhao, Yihan Hu, Ying Shan, Yunchao Wei, Xiaodong Cun},
 journal={arXiv preprint arXiv:2603.29664},
 year={2026}
}

About

CutClaw: Agentic Hours-Long Video Editing via Music Synchronization

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages