Hi. I have trouble understanding how and when you send tensors to the GPU. I have a custom training routine where I use your dataloader and some modules built via the build_X methods. When the data comes from your dataloaders, neither the data nor the model appears to be required to be on the GPU. In fact, the tensors must stay on the CPU for the modules to handle them; otherwise they raise an error about mismatched device allocation between the input and the model. For example, in Anchor3dHead, the passed tensor is sent through a Conv2d layer (
```python
cls_score = self.conv_cls(x)
```
), and I cannot find any .to(device) or .cuda() call anywhere along the way. Yet the processing seems really fast. Can you give me some insight into how this is done?
Reference: mmdetection3d/mmdet3d/models/dense_heads/anchor3d_head.py, line 150 at commit 78f4562.
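For context on the error I mean, here is a minimal PyTorch sketch (my own toy example, not mmdetection3d code): nn.Module never moves tensors on its own, so a Conv2d's weights and its input must already share a device before the forward call.

```python
import torch
import torch.nn as nn

# A toy head similar in spirit to Anchor3dHead's conv_cls: a plain Conv2d.
conv_cls = nn.Conv2d(in_channels=4, out_channels=2, kernel_size=1)

# Pick the device explicitly; nothing in nn.Module moves tensors for you.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Both the module's parameters and the input must live on the same device.
conv_cls = conv_cls.to(device)
x = torch.randn(1, 4, 8, 8, device=device)

cls_score = conv_cls(x)  # works: weights and input share a device
print(cls_score.shape)   # torch.Size([1, 2, 8, 8])

# A CPU input against CUDA weights is exactly the mismatch error described above.
if torch.cuda.is_available():
    try:
        conv_cls(torch.randn(1, 4, 8, 8))
    except RuntimeError as e:
        print("device mismatch:", type(e).__name__)
```

This is the usual contract I expected, which is why the absence of any explicit .to(device) in the head's forward path confused me.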