Add iNaturalist dataset #4123

Merged: fmassa merged 7 commits into master from inat on Jul 1, 2021

@dgenzel
Contributor

@dgenzel dgenzel commented Jun 25, 2021

Adding iNaturalist dataset from https://github.com/visipedia/inat_comp
This relies on the data files only, not using annotations.

Resolves #3292

@pmeier pmeier mentioned this pull request Jun 28, 2021
Contributor

@pmeier pmeier left a comment

Hey @dgenzel2, and thanks for the PR! While iNaturalist is on the list of potential new datasets in #3562, I don't recall any decision on it. Did I miss something?

Member

@fmassa fmassa left a comment

Thanks a lot for the quick PR Dmitriy!

@pmeier I discussed this dataset with Dmitriy as a good onboarding task. We decided to provide only the labels for now, not the bounding boxes.

I've done an initial pass and the PR looks good to me.

I made a few minor comments, but I'll let @pmeier do a more thorough review.

Comment thread torchvision/datasets/inaturalist.py
Comment thread torchvision/datasets/__init__.py
Comment thread torchvision/datasets/inaturalist.py
Contributor

@pmeier pmeier left a comment

Looking good so far. I have some comments inline. Also, could you add the dataset to the documentation?

Comment thread torchvision/datasets/inaturalist.py Outdated
Comment thread torchvision/datasets/inaturalist.py Outdated
Comment thread torchvision/datasets/inaturalist.py Outdated
@dgenzel
Contributor Author

dgenzel commented Jun 30, 2021

It turned out that the format for earlier years was different, so I had to make some changes. But now download is supported, and I verified it manually.

Contributor

@pmeier pmeier left a comment

Checking image / video folders for integrity is not feasible, so we normally go another way: we skip the integrity check completely and bail out if we encounter already extracted folders together with download=True:

if path.exists(self.split_folder):
    raise RuntimeError(
        f"The directory {self.split_folder} already exists. "
        f"If you want to re-download or re-extract the images, delete the directory."
    )

IMO we should adopt the same approach here, to avoid accidentally downloading again.
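
For illustration, the guard described above can be tried in isolation. This is only a minimal sketch: the function name `ensure_not_extracted` and its `split_folder` argument are hypothetical stand-ins for the dataset attribute in the snippet, and it says nothing about how the final implementation in this PR is wired up.

```python
import os


def ensure_not_extracted(split_folder: str) -> None:
    """Refuse to continue if the extraction target already exists.

    No integrity check is attempted: the mere presence of the
    directory is treated as "already downloaded and extracted".
    """
    if os.path.exists(split_folder):
        raise RuntimeError(
            f"The directory {split_folder} already exists. "
            f"If you want to re-download or re-extract the images, delete the directory."
        )
```

A caller would run this check only when `download=True`, before starting the download, so that an existing folder is never silently overwritten or fetched a second time.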

Comment thread torchvision/datasets/inaturalist.py Outdated
Member

@fmassa fmassa left a comment

I think this is looking pretty good, thanks a lot Dmitriy!

I've left a minor comment that can be addressed in a follow-up PR. @pmeier I'm merging this PR, but let us know if you have further comments and we can address them in a follow-up PR.

Comment thread test/test_datasets.py

ADDITIONAL_CONFIGS = datasets_utils.combinations_grid(
    target_type=("kingdom", "full", "genus", ["kingdom", "phylum", "class", "order", "family", "genus", "full"]),
    version=("2021_train",),
Member

For the future, it would be good to also test the other years, as they take different code paths in the initialization phase.
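
One way such a wider test grid could look is sketched below. It rests on several assumptions: `combinations_grid` here is a local stand-in written for this example, not the `datasets_utils` helper itself; the version names `"2017"` and `"2018"` are assumed identifiers for the earlier years; and the earlier years are paired only with `target_type="full"`, because the thread does not say which other target types they accept.

```python
import itertools


def combinations_grid(**kwargs):
    """Local stand-in: build one dict per combination of the given option values."""
    return [
        dict(zip(kwargs.keys(), values))
        for values in itertools.product(*kwargs.values())
    ]


# Grid for the 2021 release, using target types from the existing test.
CONFIGS_2021 = combinations_grid(
    target_type=("kingdom", "full", "genus"),
    version=("2021_train",),
)

# Assumed grid for earlier years, restricted to the "full" target.
CONFIGS_EARLIER = combinations_grid(
    target_type=("full",),
    version=("2017", "2018"),  # assumed version names
)

ALL_CONFIGS = CONFIGS_2021 + CONFIGS_EARLIER
```

Keeping the two grids separate avoids generating combinations that one of the year-specific code paths might reject.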

@fmassa fmassa merged commit ef71159 into master Jul 1, 2021
@fmassa fmassa deleted the inat branch July 1, 2021 18:33
@fmassa
Member

fmassa commented Jul 1, 2021

Failures are unrelated, merging

@github-actions

github-actions bot commented Jul 1, 2021

Hey @fmassa!

You merged this PR, but no labels were added.

facebook-github-bot pushed a commit that referenced this pull request Jul 12, 2021
Summary:
* Add iNaturalist dataset

* Add download support

* address comments

Reviewed By: fmassa

Differential Revision: D29659493

fbshipit-source-id: 9bdb53c24aeb6fdba9cf0604f1f824ed506d3c89

Co-authored-by: dgenzel <dgenzel@fb.com>
Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
@ferreirafabio

The torchvision iNaturalist dataset code does not allow loading the test split, e.g. the 2017 or 2018 test split. What is the suggested way to use the torchvision code when one also needs the test split?

@pmeier
Contributor

pmeier commented Sep 19, 2022

> What is the suggested way to use the torchvision code when one also needs the test split?

Unfortunately, there is none at the moment. We are working on revamping our datasets API, after which all splits will be supported, but this is not ready yet.

We could introduce the test splits on the current API by returning None for the labels. Some of our datasets already do this, but this is not supported by the default collation. @ferreirafabio Could you open an issue, so we can discuss there?
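
To make the collation problem concrete, here is a rough sketch of a custom collate function that tolerates `None` labels. The name `collate_unlabeled` is hypothetical, the function is written in plain Python so it runs without PyTorch, and it is not something torchvision provides. It assumes each sample is an `(image, target)` pair.

```python
from typing import Any, List, Optional, Sequence, Tuple


def collate_unlabeled(
    batch: Sequence[Tuple[Any, Optional[Any]]],
) -> Tuple[List[Any], Optional[List[Any]]]:
    """Group (image, target) samples into a batch, allowing missing targets.

    Returns (images, None) when no sample has a label, and
    (images, targets) when every sample has one.
    """
    images = [image for image, _ in batch]
    targets = [target for _, target in batch]

    if all(target is None for target in targets):
        # Unlabeled split: there is nothing to collate for the labels.
        return images, None
    if any(target is None for target in targets):
        raise ValueError("Batch mixes labeled and unlabeled samples.")
    return images, targets
```

A function like this could be passed as the `collate_fn` argument of `torch.utils.data.DataLoader`. A real version would likely also stack the images into a single tensor, which is left out here.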
