As a deep learning engineer, leveraging the massive parallel processing power of GPUs is critical for accelerating neural network training. Compared to CPUs, high-end GPUs can provide over 50x speedups that reduce training times from weeks to hours.
In this comprehensive guide, I'll share the key methods for checking whether PyTorch is utilizing your Nvidia or AMD GPU.
Why Check if PyTorch is Using the GPU?
Here's a benchmark of training the ResNet-50 model on CPU vs GPU hardware:
| Hardware | Training Time | Speedup vs CPU |
|---|---|---|
| Intel i9-9900K CPU | 15 hours | 1x |
| Nvidia RTX 2080 Ti | 18 mins | 50x |
As you can see, the GPU provides enormous performance benefits. By verifying PyTorch takes advantage of your graphics card, you ensure fast training.
Checking also helps troubleshoot if there are any issues leveraging the GPU like:
- No Nvidia/AMD drivers installed
- CUDA not configured properly
- Old GPU lacking compute capability
- Resource contention from other processes
Next, let's explore how to check if PyTorch is using the GPU.
1. Check CUDA Availability
PyTorch relies on Nvidia's CUDA toolkit and driver (plus the cuDNN library) for GPU acceleration. We can check if CUDA is available:
```python
import torch

if torch.cuda.is_available():
    print("CUDA is available")
else:
    print("CUDA NOT available")
```
This tests whether PyTorch can access an Nvidia GPU with CUDA installed and configured. However, it doesn't confirm active usage.
On Apple M-series GPUs, .is_available() returns False; the Metal backend is checked separately via torch.backends.mps.is_available(). AMD GPUs are supported through ROCm builds of PyTorch, which reuse the torch.cuda API.
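These backend checks can be folded into one small helper. This is a sketch, not part of the PyTorch API, and the name pick_device is mine:

```python
import torch

def pick_device() -> torch.device:
    """Return the best available accelerator, falling back to CPU."""
    if torch.cuda.is_available():          # Nvidia CUDA (or AMD ROCm builds)
        return torch.device("cuda")
    if getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
        return torch.device("mps")         # Apple M-series (Metal backend)
    return torch.device("cpu")

device = pick_device()
print(f"Using device: {device}")
```

The getattr guard keeps the snippet working on older PyTorch versions that predate the MPS backend.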
2. Detect Active CUDA Tensors
When you move a tensor to the GPU with .cuda(), PyTorch automatically begins using the graphics card for all operations:
```python
import torch

x = torch.rand(5, 5)       # CPU tensor
if torch.cuda.is_available():
    x = x.cuda()           # copies x to the GPU
print(f"X tensor on {x.device}")
```

```
X tensor on cuda:0
```
The .device shows the active device. We can see X is a CUDA tensor allocated on GPU 0.
Internally, CUDA tensors allocate memory on the GPU and execute kernels.

Visualization of CUDA tensor data movement (Credit: Nvidia)
Now GPU usage begins for computing tensor operations, accelerating training.
Calling x.cpu() returns a CPU copy; the GPU allocation is freed once the original CUDA tensor is garbage-collected, and torch.cuda.empty_cache() hands cached blocks back to the driver.
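The memory lifecycle can be sketched with torch.cuda.memory_allocated, which reports bytes held by live CUDA tensors. The helper name gpu_bytes_in_use is mine, and the GPU-specific part is guarded so the snippet is harmless on CPU-only machines:

```python
import torch

def gpu_bytes_in_use() -> int:
    """Bytes currently held by live CUDA tensors (0 without a GPU)."""
    return torch.cuda.memory_allocated() if torch.cuda.is_available() else 0

if torch.cuda.is_available():
    x = torch.rand(1024, 1024, device="cuda")
    print(gpu_bytes_in_use())      # roughly 4 MB held by x
    x = x.cpu()                    # host copy; drops the last GPU reference to the data
    torch.cuda.empty_cache()       # return cached blocks to the driver
    print(gpu_bytes_in_use())      # back near zero
```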
3. Getting Active GPU Properties
We can check which GPU model PyTorch is using and query other properties like memory usage:
```python
import torch

if torch.cuda.is_available():
    gpu_index = 0
    gpu = torch.cuda.get_device_properties(gpu_index)
    print(f"GPU: {gpu.name}")
    print(f"Memory: {gpu.total_memory / 1024**3:.2f} GB")
```

```
GPU: Nvidia GeForce RTX 3090
Memory: 24.77 GB
```
This is useful to confirm the expected GPU is in use, or to check whether the card has enough memory for large models and batches.
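Since get_device_properties only reports total memory, free memory can be queried with torch.cuda.mem_get_info (available in PyTorch 1.10+). The wrapper name free_memory_gib is mine:

```python
import torch

def free_memory_gib() -> float:
    """Free GPU memory in GiB, or 0.0 when no CUDA device is present."""
    if not torch.cuda.is_available():
        return 0.0
    free_bytes, total_bytes = torch.cuda.mem_get_info()  # (free, total) in bytes
    return free_bytes / 1024**3

print(f"Free GPU memory: {free_memory_gib():.2f} GiB")
```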
4. Monitor GPU Utilization
While training neural networks, we can poll GPU utilization to spot underused hardware. The third-party GPUtil package (pip install gputil) makes this easy:
```python
import time
import GPUtil  # third-party: pip install gputil

while True:                    # run alongside training, e.g. in another terminal
    gpus = GPUtil.getGPUs()    # queries nvidia-smi for fresh stats
    if not gpus:
        break                  # no Nvidia GPU visible
    print(f"GPU load: {gpus[0].load * 100:.0f}%")
    time.sleep(10)
```
This loops every 10 seconds, fetching the GPU load percentage. The output may look like:

```
GPU load: 86%
GPU load: 97%
GPU load: 96%
```
If GPU load consistently sits well below 90-95%, the GPU is likely being starved for data, often due to a small batch size or a slow input pipeline. Increasing the batch size or the DataLoader worker count usually improves utilization.
Hardware monitoring showing GPU utilization over time (Credit: Nvidia)
5. Using Multiple CUDA Devices
If you have multiple Nvidia GPUs for training very large models, PyTorch can leverage them with:
- torch.nn.DataParallel
- torch.distributed.DistributedDataParallel
Both parallelize training across GPUs; DistributedDataParallel is the recommended option and scales with nearly linear speedups.
To check multi-GPU usage:
```python
import torch

print(torch.cuda.device_count())
if torch.cuda.device_count() > 1:
    print("Using Multiple GPUs!")
```
We query the number of visible devices, then confirm more than one is available.
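As a minimal sketch of the single-process option, DataParallel only takes effect when more than one GPU is visible (DistributedDataParallel needs process-group setup, so it is omitted here); the tiny nn.Linear stands in for any real model:

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 10)              # stand-in for any nn.Module
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)      # splits each input batch across visible GPUs
if torch.cuda.is_available():
    model = model.cuda()
print(f"Training on {max(torch.cuda.device_count(), 1)} device(s)")
```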
6. Speed Up GPU Training
There are a few best practices to maximize GPU performance in PyTorch:
Batched Data: Always use batched inputs instead of single examples. Typical batch sizes range from 32-512. The DataLoader helps automatically batchify data.
Half Precision: Switch the model and input tensors to 16-bit with .half(), halving memory usage and allowing larger batch sizes. (Automatic mixed precision via torch.cuda.amp.autocast is usually safer than a blanket .half().)
CUDA Events: Profile kernel timings with torch.cuda.Event() to identify bottlenecks.
Employ these together with a fast development workflow to improve productivity.
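As a sketch of the CUDA-events idea above, this times one matrix multiply; the helper name time_matmul is mine, and it falls back to a wall-clock timer so it runs on CPU-only machines too:

```python
import time
import torch

def time_matmul(n: int = 1024) -> float:
    """Return elapsed milliseconds for one n x n matmul."""
    if torch.cuda.is_available():
        a = torch.rand(n, n, device="cuda")
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        a @ a
        end.record()
        torch.cuda.synchronize()        # wait for the kernel to finish
        return start.elapsed_time(end)  # milliseconds between the two events
    a = torch.rand(n, n)
    t0 = time.perf_counter()
    a @ a
    return (time.perf_counter() - t0) * 1000

print(f"matmul took {time_matmul():.2f} ms")
```

Events are needed because CUDA kernels launch asynchronously; timing with the host clock alone would only measure the launch, not the execution.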
Example Training Loop
Here is an example training loop using the GPU efficiently for computer vision:
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import ImageFolder
from torchvision.models import resnet18

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Dataset stays on the CPU; individual batches are moved to the GPU in the loop
transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train_dataset = ImageFolder(train_dir, transform=transform)  # train_dir: your image folder
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=4)

# Half-precision model + optimizer
model = resnet18().half().to(device)
optimizer = optim.SGD(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

for epoch in range(10):
    for inputs, labels in train_loader:
        inputs = inputs.half().to(device)   # cast batch to fp16 and move to GPU
        labels = labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()                     # backprop
        optimizer.step()
    print(f"Epoch {epoch}, Loss: {loss.item():.4f}")
```
Each batch is cast to half precision, moved to the GPU, forwarded through the model, then backpropagated, greatly accelerating training over CPU.
7. Troubleshooting CUDA Errors
If you encounter a CUDA error such as:

```
CUDA error: no kernel image is available for execution on the device
```
Try these troubleshooting tips:
- Update to the latest Nvidia drivers
- Reinstall PyTorch with a CUDA build that matches your driver (pip uninstall torch, then reinstall)
- Try a different PCIe slot on the machine if available
- Confirm your GPU's compute capability is supported by the installed PyTorch build (the "no kernel image" error usually means it is not)
- Disable any GPU overclocking/underclocking
Checking the output and logs helps diagnose CUDA configuration issues blocking GPU usage.
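A quick diagnostic dump of the build and device information narrows most of these cases down; everything here uses the standard torch.cuda API, guarded so it also runs on CPU-only installs:

```python
import torch

print(f"PyTorch: {torch.__version__}")
print(f"Built with CUDA: {torch.version.cuda}")       # None on CPU-only builds
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"Compute capability: {major}.{minor}")     # must be supported by the wheel
    print(f"Device: {torch.cuda.get_device_name(0)}")
```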
Conclusion
I hope this guide has equipped you to thoroughly validate and monitor if PyTorch is leveraging Nvidia or AMD GPUs for accelerating deep learning.
Some key takeaways:
- Check for CUDA availability with torch.cuda.is_available()
- Detect active CUDA tensors on GPU using .device
- Monitor GPU load for optimization opportunities
- Employ batching, half precision, CUDA events for speedups
- Tweak model architecture and data pipelines to maximize throughput
Properly accessing hardware accelerators unlocks order-of-magnitude training speedups critical for delivering production models faster.
Now go leverage those GPUs to train your neural networks!


