Conversation
vfdev-5 left a comment
Thanks for the draft! I left a few comments.
    class MovingMNIST(VisionDataset):
        """MovingMNIST"""
Could you please add a link (http://www.cs.toronto.edu/~nitish/unsupervised_video/) to the dataset, like
`MNIST <http://yann.lecun.com/exdb/mnist/>`_ Dataset.
and define the docstring (Args etc.) as is done for the other datasets?
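A hedged sketch of the requested docstring style, modeled on other torchvision datasets; the `Args` names here are assumptions, and a local stub stands in for `VisionDataset` so the sketch is self-contained:

```python
class VisionDataset:  # stand-in stub for torchvision.datasets.VisionDataset
    pass


class MovingMNIST(VisionDataset):
    """`MovingMNIST <http://www.cs.toronto.edu/~nitish/unsupervised_video/>`_ Dataset.

    Args:
        root (string): Root directory of the dataset.
        transform (callable, optional): A function/transform that takes in a
            video tensor and returns a transformed version.
        download (bool, optional): If True, downloads the dataset from the
            internet and puts it in the root directory.
    """
```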
    return int(codecs.encode(b, 'hex'), 16)

    class MovingMNIST(VisionDataset):
Can't we inherit it from MNIST?
If we use this as a video dataset, we shouldn't. We probably need to use MNIST to generate the training split though.
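Generating a training split from MNIST digits on the fly, as suggested above, might look roughly like this sketch: a single digit bouncing inside a 64x64 canvas. The canvas size, velocities, and function name are assumptions for illustration, not the authors' exact script:

```python
import numpy as np

def generate_sequence(digit, num_frames=20, canvas=64, rng=None):
    """Bounce one (h, w) uint8 digit around a (canvas, canvas) frame."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = digit.shape
    # random start position and velocity (assumed ranges)
    x = int(rng.integers(0, canvas - w + 1))
    y = int(rng.integers(0, canvas - h + 1))
    dx, dy = (int(v) for v in rng.choice([-2, -1, 1, 2], size=2))
    frames = np.zeros((num_frames, canvas, canvas), dtype=np.uint8)
    for t in range(num_frames):
        frames[t, y:y + h, x:x + w] = digit
        # reflect the velocity when the digit would leave the canvas
        if not (0 <= x + dx <= canvas - w):
            dx = -dx
        if not (0 <= y + dy <= canvas - h):
            dy = -dy
        x += dx
        y += dy
    return frames
```

A real implementation would draw the digits from the MNIST training set and typically overlay two digits per sequence, as in the authors' script.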
pmeier left a comment
Hey @vballoli,
thanks a lot for this PR. We discussed this internally and have some comments on this:
1. MovingMNIST is a video dataset, i.e. the individual frames have a strong temporal relation. Currently you are treating it as an image dataset. We should return a 4D tensor representing a single video (channels x depth x height x width) rather than an image with multiple channels (channels x height x width). This also means that we shouldn't apply the transforms to the frames individually.
2. You only implemented the test split, since this is the only generated data provided by the authors. Since the authors also provide the code to generate the sequences (see for example the implementation by tensorflow), it might be beneficial to also provide a train split. If we do this, we need to decide whether to generate the sequences ahead of time or on the fly. Depending on how large the training split is, the former might put too much of a burden on memory (even the test split is ~1GB and resides completely in memory).
3. Although the authors split the sequences in two with 10 frames each, this should not be required, since MovingMNIST is marketed as an unsupervised dataset. This is especially important if we generate the training data with a deviating number of frames. IMO we should simply return a single video. Maybe we could provide a (static) method that implements the authors' split.
Let us know what you think and if you are willing to continue working on this.
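Points 1 and 3 above could be sketched roughly as follows. numpy stands in for torch to keep the sketch dependency-free, and the helper names and shapes are assumptions, not the final API:

```python
import numpy as np

def to_video(frames):
    """Turn a (num_frames, H, W) uint8 array into a 4D video array
    of shape (channels, depth, height, width) with a single channel."""
    return frames[np.newaxis, ...]

def split_input_target(video, num_input=10):
    """Static helper reproducing the authors' split: the first
    `num_input` frames vs. the rest, along the depth (time) axis."""
    return video[:, :num_input], video[:, num_input:]

# one MovingMNIST test sequence: 20 frames of 64x64 pixels
seq = np.zeros((20, 64, 64), dtype=np.uint8)
video = to_video(seq)                    # shape (1, 20, 64, 64)
inp, target = split_input_target(video)  # shapes (1, 10, 64, 64) each
```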
    self.file)))

    def __getitem__(self, index: int) -> Tuple[Any, Any]:
Without a length, the dataset is not iterable.
Suggested change:

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, index: int) -> Tuple[Any, Any]:
    with open(path, 'rb') as f:
        x = torch.tensor(np.load(f))
        assert(x.dtype == torch.uint8)
        return x

(no newline at end of file)
Nit:

Suggested change:

    return x

(i.e. add the missing newline at the end of the file)
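For reference, the loading helper under discussion could read roughly like this sketch: pure numpy, with the bare `assert` replaced by an explicit error. The torch conversion is omitted to keep the sketch self-contained, and the function name is taken from the diff context; everything else is an assumption:

```python
import numpy as np

def read_numpy_file(path):
    """Load a .npy file and verify that it contains uint8 data."""
    with open(path, "rb") as f:
        x = np.load(f)
    if x.dtype != np.uint8:
        raise RuntimeError(f"expected uint8 data, got {x.dtype}")
    return x
```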
To complement @pmeier's answer, if you could let the dataset return a single tensor of shape
@pmeier Is somebody working on closing this PR? If not, I can help.
Thanks @vballoli for this great contribution, and sorry for the long time it took to get a new reply! Unfortunately, there is a new datasets API in the works here, and new datasets should follow the new API design. That being said, rest assured, there are at least two options:
Hope one of the two options works for you @vballoli. 👌
PR in reference to #2676. Currently in draft, since the dataset train-test split needs discussion. Input regarding how to better handle the train-test arguments is appreciated! @vfdev-5 @pmeier @fmassa