🐛 Bug
The following exception is raised:
RuntimeError: Pin memory thread exited unexpectedly
This happens at the end of an epoch (using PyTorch Lightning). Here's the traceback:
    for batch_idx, (batch, is_last_batch) in train_dataloader:
  File ".../pytorch_lightning/profiler/profilers.py", line 80, in profile_iterable
    value = next(iterator)
  File ".../pytorch_lightning/trainer/connectors/data_connector.py", line 45, in _with_is_last
    it = iter(iterable)
  File ".../torch/utils/data/dataloader.py", line 356, in __iter__
    self._iterator._reset(self)
  File ".../torch/utils/data/dataloader.py", line 936, in _reset
    data = self._get_data()
  File ".../torch/utils/data/dataloader.py", line 1113, in _get_data
    raise RuntimeError('Pin memory thread exited unexpectedly')
RuntimeError: Pin memory thread exited unexpectedly
Since it's the end of an epoch, the data loader is being reset (see self._iterator._reset(self) in the traceback), and for reasons I don't understand, the reset causes issues with the pin memory thread.
The DataLoader is created with pin_memory=True, num_workers=8, prefetch_factor=8, and persistent_workers=True. The error comes from the pin memory thread, so pin_memory=True is necessarily involved, but I suspect persistent_workers=True is the actual trigger.
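For anyone trying to reproduce this outside of PyTorch Lightning, here is a rough sketch of the setup described above. The four DataLoader flags are the ones from this report; everything else is an assumption for illustration: the random TensorDataset, batch_size=32, and the helper names make_loader and run_epochs are hypothetical. I have not confirmed that this triggers the error.

```python
# Flags reported in this issue. batch_size is an assumption (not stated above).
LOADER_KWARGS = dict(
    batch_size=32,
    num_workers=8,
    prefetch_factor=8,
    pin_memory=True,
    persistent_workers=True,
)


def make_loader(**overrides):
    """Build a DataLoader over a hypothetical random dataset (stand-in for the real data)."""
    import torch
    from torch.utils.data import DataLoader, TensorDataset

    dataset = TensorDataset(torch.randn(1024, 16))
    return DataLoader(dataset, **{**LOADER_KWARGS, **overrides})


def run_epochs(loader, epochs=3):
    """Iterate the loader several times and return the number of batches seen.

    Each new `for` loop calls iter(loader). With persistent_workers=True, every
    call after the first goes through self._iterator._reset(self), which is the
    call that appears in the traceback.
    """
    batches = 0
    for _ in range(epochs):
        for _batch in loader:
            batches += 1
    return batches
```

Running run_epochs(make_loader()) and then run_epochs(make_loader(persistent_workers=False)) on a CUDA machine would be one way to check whether persistent_workers=True is what makes the difference.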
Environment
I unfortunately don't have access to easily run the environment collection script. I will figure this out if it's critical; however, it's a very recently built version of PyTorch running on Linux with V100 GPUs.
cc @ssnl @VitalyFedyunin @ejguan