OptiMulti-Video

High-Performance Multimodal Attention with Custom CUDA Kernels

This project is a vertical slice of a high-performance multimodal AI system. It features:

- Custom CUDA Kernel: A fused "Normalize & Project" kernel written in CUDA C++ for low-latency fusion of video and text embeddings.
- Multimodal Architecture: A compact Video-Text Transformer model.
- Distributed Training: An FSDP (Fully Sharded Data Parallel) training loop designed to run on dual T4 GPUs. Kaggle's free tier offers a 2x T4 accelerator; Colab's free tier typically provides a single T4, so the multi-GPU path is best exercised on Kaggle.
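To make the "Normalize & Project" idea concrete, here is a minimal pure-Python reference of what such a fused operation might compute. It is a sketch under assumptions, not the kernel in this repository: the function name `normalize_and_project`, the choice of L2 normalization, the `eps` value, and the `(out_dim x in_dim)` weight layout are all hypothetical, since the README does not specify them.

```python
import math
from typing import List, Sequence


def normalize_and_project(
    x: Sequence[float],
    weight: Sequence[Sequence[float]],
    eps: float = 1e-6,
) -> List[float]:
    """Reference semantics (assumed): L2-normalize x, then apply weight @ x.

    weight is laid out as out_dim rows of in_dim values (an assumption).
    """
    for row in weight:
        if len(row) != len(x):
            raise ValueError("each weight row must match the input dimension")

    # Step 1: L2 norm of the embedding; eps guards against division by zero.
    norm = math.sqrt(sum(v * v for v in x)) + eps

    # Step 2: project the normalized vector. A fused GPU kernel would perform
    # both steps in one pass rather than writing the normalized vector back
    # to global memory between two separate kernel launches.
    return [sum(w * v / norm for w, v in zip(row, x)) for row in weight]


if __name__ == "__main__":
    video_embedding = [3.0, 4.0]
    projection = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(normalize_and_project(video_embedding, projection))
```

A reference like this is mainly useful as a correctness oracle: the output of the CUDA kernel can be compared against it on small inputs before measuring latency.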

How to Run on Google Colab / Kaggle (for testing):

1. Push to GitHub: Sync this local folder to a public GitHub repository named OptiMulti-Video.
2. Open the Notebook: Upload notebooks/colab_demo.ipynb to Google Colab.
3. Run: The notebook is pre-configured to:
- Clone your repository.
- Install dependencies.
- Compile the Custom CUDA kernel (JIT compilation).
- Run the FSDP training demo.
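The JIT compilation step can be sketched as follows, assuming the kernel sources live under `src/` as `.cu` and `.cpp` files. The helper names `find_kernel_sources` and `jit_compile`, the extension name `fused_kernel`, and the file layout are hypothetical; the notebook in this repository may do this differently. `torch.utils.cpp_extension.load` is PyTorch's standard JIT loader for C++/CUDA extensions and requires PyTorch plus a CUDA toolchain (`nvcc`) at build time.

```python
from pathlib import Path
from typing import List


def find_kernel_sources(repo_root: str, src_dir: str = "src") -> List[str]:
    """Collect C++/CUDA source files under <repo_root>/<src_dir>.

    The result is sorted so repeated builds see the same source order.
    """
    root = Path(repo_root) / src_dir
    return sorted(
        str(p) for p in root.rglob("*") if p.suffix in {".cu", ".cpp"}
    )


def jit_compile(repo_root: str, name: str = "fused_kernel"):
    """JIT-compile the collected sources into a loadable Python module."""
    # Imported lazily so the source discovery above works without PyTorch.
    from torch.utils.cpp_extension import load

    sources = find_kernel_sources(repo_root)
    if not sources:
        raise FileNotFoundError(f"no .cu or .cpp files found under {repo_root}")

    # verbose=True prints the compiler commands, which helps when a build
    # fails in a notebook environment.
    return load(name=name, sources=sources, verbose=True)
```

In a notebook, `jit_compile("/content/OptiMulti-Video")` would be called once after cloning (the path is illustrative). The first call compiles the extension and later calls reuse PyTorch's build cache.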
