📚 The doc issue
I am using torchdata 0.4.0 and found that a shuffled data pipe yields its elements in the same order in every epoch, unless I call shuffle() again at the beginning of each epoch.
# same_result.py
import torch
import torchdata.datapipes as dp
X = torch.randn(200, 5)
dpX = dp.map.SequenceWrapper(X)
dpXS = dpX.shuffle()  # shuffled once, outside the epoch loop
for _ in range(5):
    for i in dpXS:
        print(i)  # always prints the same value
        break
# different_result.py
import torch
import torchdata.datapipes as dp
X = torch.randn(200, 5)
dpX = dp.map.SequenceWrapper(X)
for _ in range(5):
    dpXS = dpX.shuffle()  # shuffled again at the start of every epoch
    for i in dpXS:
        print(i)  # prints different values
        break
What is the recommended way to reshuffle the data at the beginning of every epoch? Neither the documentation nor the examples seem to answer this question.
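For reference, the behaviour I expected can be sketched without torchdata at all. The snippet below is only an illustration of "draw a new permutation each epoch", not a torchdata API and not a claim about the recommended practice. The helper name `iter_reshuffled` and the `rng` parameter are hypothetical, and a plain list stands in for the rows of `X`.

```python
import random


def iter_reshuffled(data, rng=None):
    """Yield the items of an indexable `data` in a fresh random order.

    Hypothetical helper (not part of torchdata): a new permutation of the
    indices is drawn on every call, i.e. once per epoch.
    """
    if rng is None:
        rng = random  # fall back to the module-level generator
    order = list(range(len(data)))
    rng.shuffle(order)  # new permutation for this epoch
    for idx in order:
        yield data[idx]


data = list(range(200))  # stand-in for the 200 rows of X
rng = random.Random(0)   # seeded only to make the run repeatable

first_items = []
for epoch in range(5):
    for item in iter_reshuffled(data, rng):
        first_items.append(item)  # first element seen in this epoch
        break
print(first_items)
```

Every epoch still visits each element exactly once; only the order changes between epochs, which is what different_result.py achieves by calling shuffle() inside the loop.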
Suggest a potential alternative/fix
No response