Shenyuan Gao comments

Results: 40 comments by Shenyuan Gao

CPU utils occupies a lot when inference

Hi. I think the following code may help you solve this issue. In my case, inserting this code reduced CPU usage, and the inference speed...

Positional encoding for time for temporal attention layers in SVD

None, but from their [code](https://github.com/Stability-AI/generative-models/blob/main/sgm/modules/video_attention.py#L266-L276), I think such cues are fed to temporal blocks via frame index embedding.
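To illustrate what a frame index embedding can look like, here is a minimal sketch in plain Python. It assumes a standard sinusoidal embedding applied to the integer frame index; the function names, the `max_period` default, and the cos/sin layout are my own choices for illustration and are not taken from the linked code, which may differ in details.

```python
import math


def sinusoidal_embedding(index, dim, max_period=10000.0):
    """Map an integer position (here, a frame index) to a vector of length `dim`.

    Hypothetical helper for illustration; requires dim >= 2.
    """
    half = dim // 2
    # Geometrically spaced frequencies, starting at 1 and decaying towards 1/max_period.
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    args = [index * f for f in freqs]
    emb = [math.cos(a) for a in args] + [math.sin(a) for a in args]
    if dim % 2 == 1:
        emb.append(0.0)  # pad odd dimensions with a zero
    return emb


def frame_index_embeddings(num_frames, dim):
    """Return one embedding per frame index 0 .. num_frames - 1."""
    return [sinusoidal_embedding(t, dim) for t in range(num_frames)]
```

In a temporal block, a vector like this would typically be passed through a small learned projection and added to the features of the corresponding frame, so that the temporal attention can tell frames apart by their position.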

[Stable Video Diffusion] first frame is not equal to initial image

I think the minor difference may come from the lossy reconstruction of the autoencoder.

[Stable Video Diffusion] first frame is not equal to initial image

Yes! I also think it will be amazing if its generation quality can be extended to much longer sequences. BTW, I guess the temporal-aware deflickering decoder may also affect the...

[Stable Video Diffusion] first frame is not equal to initial image

Yes, I have also tried the image decoder, but it doesn't help. The temporal-aware decoder greatly reduces the jittering but doesn't affect the content. To enable identical preservation, it...

RuntimeError: One of the differentiated Tensors does not require grad

For the SD project, setting `use_checkpoint` to `False` can solve this issue. It won't affect the performance, but it may slightly increase your GPU memory requirement.
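For context on why the flag trades memory for nothing else, here is a toy sketch of a block with a `use_checkpoint` switch. It assumes PyTorch's built-in `torch.utils.checkpoint.checkpoint` rather than the SD project's own checkpoint helper, and `ToyBlock` is a hypothetical module made up for illustration. With checkpointing on, intermediate activations are recomputed during the backward pass instead of being stored; with it off, they are kept in memory, so the outputs are the same and only the memory footprint changes.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class ToyBlock(nn.Module):
    """Hypothetical residual block with an optional gradient-checkpointing switch."""

    def __init__(self, dim, use_checkpoint=True):
        super().__init__()
        self.use_checkpoint = use_checkpoint
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def _forward(self, x):
        return x + self.net(x)

    def forward(self, x):
        if self.use_checkpoint and self.training:
            # Activations inside _forward are recomputed in the backward pass.
            return checkpoint(self._forward, x, use_reentrant=False)
        # Plain forward: activations are stored, which costs more memory.
        return self._forward(x)
```

Flipping `use_checkpoint` to `False` on such a block leaves the forward result unchanged, which matches the observation above that performance is not affected.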

Multi-gpu training

> I do implement the multi-gpu version for V2X-ViT, and the results actually raise some points. For people who want to use, please leave your email here. Thanks a lot...

Some questions about the implementation of spatial confidence-aware message fusion.

As far as I know, no complete official code has been open-sourced yet. You will need to implement the remaining functions yourself according to the paper.

first frame of i2v

Hi. I am also curious about how you made it. Did you apply any other techniques to ensure the identity of the first frame, or did you simply fix the...

A question about the implementation of communication mask generation.

The public code is not perfectly organized and has several missing parts, but it should be easy to reproduce their reported performance even if you simply ignore and bypass those incomplete implementations....