add test if dataset samples can be collated #5233
Summing up what we discussed with Philip offline: the issue here is that some of our datasets (those listed in the PR description) can have `None` targets. The default collate function of `DataLoader` cannot handle `None`:

    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms
    dataset = datasets.CLEVRClassification("~/datasets", split="test", transform=transforms.PILToTensor())
    data_loader = DataLoader(dataset, batch_size=3)

A decent workaround for these right now is to write a custom collate function that doesn't try to wrap the `None` targets into a batch:

    import torch

    def collate_fn(batch):
        # Stack only the images; every target in this split is None.
        imgs = [img for (img, _) in batch]
        return torch.stack(imgs), [None] * len(batch)

We'll reach out to the torchdata team to see what they think of this. Perhaps the default collate function could start accepting `None`.
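The workaround could be generalized along the following lines. This is only a sketch: `collate_allowing_none` and its `collate` parameter are hypothetical names, and the plain `list` default is a stand-in for `default_collate` so that the snippet runs without PyTorch installed.

```python
from typing import Any, Callable, Optional, Sequence, Tuple


def collate_allowing_none(
    batch: Sequence[Tuple[Any, Any]],
    collate: Callable[[Sequence[Any]], Any] = list,
) -> Tuple[Any, Optional[Any]]:
    """Collate (input, target) samples, passing all-None targets through.

    `collate` is a stand-in (assumption) for the real batching function,
    e.g. `torch.utils.data.dataloader.default_collate`.
    """
    inputs, targets = zip(*batch)  # transpose the list of pairs into two tuples
    if all(target is None for target in targets):
        # Nothing to batch on the target side, so skip collation entirely.
        return collate(inputs), None
    if any(target is None for target in targets):
        raise ValueError("batch mixes None and non-None targets")
    return collate(inputs), collate(targets)


unlabeled = collate_allowing_none([("img0", None), ("img1", None)])
labeled = collate_allowing_none([("img0", 3), ("img1", 7)])
```

With PyTorch available, one would presumably bind the real function via `functools.partial(collate_allowing_none, collate=default_collate)` and pass the result as `collate_fn` to the `DataLoader`.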
|
QQ: If there is no target in the test set, why not omit the target and only keep the image?
|
It's because we want to keep the returned types and length consistent across splits. Otherwise users need to unpack the returned values and check whether there is a target or not, which can be cumbersome.
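A toy sketch of the consistency argument follows. `ToySplitDataset` and `count_labeled` are hypothetical and not part of torchvision; the point is only that a fixed `(image, target)` shape lets the same unpacking code run on every split.

```python
from typing import List, Optional, Tuple


class ToySplitDataset:
    """Hypothetical dataset whose test split has no labels."""

    def __init__(self, split: str) -> None:
        self.images = ["img0", "img1"]
        # Only the train split carries labels in this toy example.
        self.labels: Optional[List[int]] = [0, 1] if split == "train" else None

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, index: int) -> Tuple[str, Optional[int]]:
        target = self.labels[index] if self.labels is not None else None
        return self.images[index], target  # always a 2-tuple


def count_labeled(dataset: ToySplitDataset) -> int:
    # The same unpacking works for every split; no length check is needed.
    labeled = 0
    for index in range(len(dataset)):
        _image, target = dataset[index]
        if target is not None:
            labeled += 1
    return labeled
```

If the test split returned only the image instead, `_image, target = dataset[index]` would need a per-split branch in user code.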
Makes sense. And what would we expect in the batch before collate if the target is `None`? We need a custom map (collate) function for either case.

Case 1 (each sample is an `(img, target)` pair):

    from torch.utils.data.dataloader import default_collate

    def collate_fn(batch):
        imgs, targets = tuple(zip(*batch))
        return default_collate(imgs), None

Case 2 (each sample is only the image):

    def collate_fn(batch):
        return default_collate(batch), None

I'm using
That is the root of the problem: we cannot say past |
|
I've summarized my proposal here. Please have a look @NicolasHug, @ejguan, @NivekT, @VitalyFedyunin |
The following datasets currently fail the test:

- `WIDERFace`
- `Kitti`
- `Sintel`
- `KittiFlow`
- `HD1K`
- `FER2013`
- `CLEVRClassification`
- `OxfordIIITPet`

They all fail since they return `None` in at least one configuration and `default_collate` does not support it. This in turn means that the datasets with the offending configurations cannot be used in a `torch.utils.data.DataLoader(dataset)` without passing a custom `collate_fn`.

cc @pmeier
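To locate the offending configurations, a small diagnostic along these lines may help. It is a sketch and not the test added in this PR: `find_none_paths` is a hypothetical helper that only looks for `None` inside nested tuples, lists, and dicts, rather than calling `default_collate` itself.

```python
from typing import Any, List


def find_none_paths(sample: Any, path: str = "sample") -> List[str]:
    """Return the location of every None nested inside a dataset sample."""
    if sample is None:
        return [path]
    if isinstance(sample, dict):
        items = [(f"{path}[{key!r}]", value) for key, value in sample.items()]
    elif isinstance(sample, (list, tuple)):
        items = [(f"{path}[{index}]", value) for index, value in enumerate(sample)]
    else:
        return []  # leaf value such as an image, tensor, or number
    paths: List[str] = []
    for child_path, value in items:
        paths.extend(find_none_paths(value, child_path))
    return paths


# Placeholder samples; a real check would index into an actual dataset.
no_target = find_none_paths(("img", None))
partial_target = find_none_paths(("img", {"boxes": [1, 2], "label": None}))
full_target = find_none_paths(("img", 3))
```

An empty result suggests the sample contains no `None`; any reported path points at a field that a custom `collate_fn` would have to handle.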