
Compatibility Report: Running Stream-DiffVSR on Modern Hardware (RTX 50-Series / Blackwell) & Dependency Concerns #10

@medienbueroleipzig


Hello Stream-DiffVSR Team,

First of all, thank you for sharing this work! The results – once running – are impressive, and the approach to low-latency VSR is very promising.

I am writing this report to share my experience trying to run this repository on brand-new hardware (NVIDIA RTX 5070 Ti, 16GB VRAM) and a modern Linux environment (Arch-based). I ran into significant friction due to the legacy dependency stack defined in requirements.yml, which seems out of sync with current hardware capabilities.

Here is a summary of the hurdles and the necessary fixes, along with a question regarding the project's environment choices.

  1. The "Legacy Stack" vs. Modern Hardware Conflict
    The repository enforces Python 3.8 / 3.9, CUDA 11, and PyTorch 2.0.1.
    However, modern GPUs (RTX 50-series / Blackwell architecture, sm_120) require CUDA 12.6+ and PyTorch Nightly (2.6+) to function correctly. The provided requirements.yml is physically incompatible with this hardware generation.

Impact: Installing via the provided Conda instructions leads to Error 804: forward compatibility was attempted on non supported HW or simply fails to detect the GPU.
Workaround: I had to manually build a Python 3.11 environment and install PyTorch Nightly (cu124/cu126 build) to get the GPU recognized.
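For anyone hitting the same wall, here is the sanity check I would run after installing the nightly build (a minimal sketch using only standard torch.cuda queries; the exact compute capability reported for Blackwell may vary by driver and build):

```python
import torch

# Confirm the driver/runtime combination actually exposes the GPU.
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    # Blackwell should report compute capability (12, 0), i.e. sm_120.
    print("Compute capability:", torch.cuda.get_device_capability(0))
    # The installed wheel must list sm_120 here; otherwise kernels fail
    # at runtime even though the device itself is detected.
    print("Compiled archs:", torch.cuda.get_arch_list())
```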
2. VRAM Management on 16GB Cards (OOM Issues)
Despite having 16GB VRAM, the default inference.py script crashes with CUDA Out Of Memory even at moderate resolutions (e.g., 512x512 input -> 2048x2048 output).
The script loads the entire pipeline (ControlNet, UNet, VAE, RAFT) into VRAM simultaneously. While this might work on an RTX 3090/4090 (24GB), it renders the tool unusable on 16GB consumer cards.

Workaround: I had to modify inference.py to include standard Diffusers memory optimizations, replacing pipeline.to(device) with:

```python
pipeline.enable_model_cpu_offload()
pipeline.enable_vae_slicing()
```
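For context, a minimal sketch of what the modified setup looks like. It assumes a Diffusers-style pipeline exposing the standard offloading helpers (shown here with StableDiffusionPipeline as a stand-in; the actual class and checkpoint path in Stream-DiffVSR will differ), and enable_vae_tiling() is an extra option that may help at 2048x2048 outputs:

```python
import torch
from diffusers import StableDiffusionPipeline  # stand-in; the repo wires its own pipeline

# Hypothetical checkpoint path, for illustration only.
pipeline = StableDiffusionPipeline.from_pretrained(
    "path/to/stream-diffvsr-weights", torch_dtype=torch.float16
)

# Instead of pipeline.to("cuda"): keep submodules (ControlNet, UNet, VAE)
# in system RAM and move each onto the GPU only while it executes.
# Requires the accelerate package.
pipeline.enable_model_cpu_offload()

# Decode latents in slices instead of as one large batch.
pipeline.enable_vae_slicing()

# Optional extra headroom: tile the VAE decode spatially for large outputs.
pipeline.enable_vae_tiling()
```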
3. Precision Mismatch in Manual Optimization
When attempting to reduce VRAM usage further by casting the optical-flow model (RAFT) to half precision (.half()), the inference loop crashes with RuntimeError: Input type (float) and bias type (c10::Half) should be the same.
The root cause is that the frame tensors are still passed in as float32 while the RAFT weights are now fp16, and the inference loop lacks an automatic mixed-precision context to bridge the two.

Workaround: Wrapping the inference call in with torch.autocast("cuda"): solved this and allowed the pipeline to run smoothly with a significantly reduced memory footprint.
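In sketch form (pipeline, raft, and frames are hypothetical stand-ins for the actual objects in inference.py, and the call signatures are illustrative):

```python
import torch

def upscale(pipeline, raft, frames):
    # `raft` has been cast with .half(); `frames` is a float32 CUDA tensor
    # of shape (T, C, H, W). autocast inserts fp16 casts around eligible
    # ops, so the float32 inputs no longer collide with c10::Half weights.
    with torch.no_grad(), torch.autocast("cuda", dtype=torch.float16):
        flows = raft(frames[:-1], frames[1:])  # pairwise optical flow
        return pipeline(frames, flows)         # hypothetical call signature
```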
Question & Suggestion
Given that Stream-DiffVSR is a recent release (late 2024/2025) and targets high-performance scenarios, why was the decision made to base the codebase on the aging CUDA 11 / Python 3.8 ecosystem?

This legacy stack makes it unnecessarily difficult to deploy the software on modern systems without fighting the package manager. Updating the baseline to Python 3.10+ and PyTorch 2.4+ (CUDA 12) would resolve many compatibility issues with newer NVIDIA drivers and hardware out of the box.

I hope this report helps to improve the robustness of the repository for future users!

Best regards,
A Linux & RTX 50-Series User
