
@PatWie
Contributor

PatWie commented Oct 22, 2014

Setting the shuffle flag in the HDF5Data layer shuffles both the order in which the HDF5 files are read and the order of the entries within each HDF5 file.
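A minimal C++ sketch of such a two-level shuffle: permute the file order, then permute the rows of each file as it is visited. It assumes the number of rows in each file is known up front. The helper name TwoLevelShuffleOrder and the use of std::shuffle with std::mt19937 are illustrative assumptions, not the code in this PR.

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: returns the global read order as (file, row) pairs.
// The order of the files is permuted first; then the rows inside each file
// are permuted independently when that file is visited.
std::vector<std::pair<int, int>> TwoLevelShuffleOrder(
    const std::vector<int>& rows_per_file, std::mt19937& rng) {
  std::vector<int> file_order(rows_per_file.size());
  std::iota(file_order.begin(), file_order.end(), 0);
  std::shuffle(file_order.begin(), file_order.end(), rng);

  std::vector<std::pair<int, int>> order;
  for (int f : file_order) {
    // Shuffle the row indices of file f only; its rows stay contiguous.
    std::vector<int> row_order(rows_per_file[f]);
    std::iota(row_order.begin(), row_order.end(), 0);
    std::shuffle(row_order.begin(), row_order.end(), rng);
    for (int r : row_order) order.emplace_back(f, r);
  }
  return order;
}
```

Because each file's rows are read as one contiguous block, only one file presumably needs to be held in memory at a time. The trade-off is that the result is not a uniform shuffle of the whole dataset unless there is a single file.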

@baeuml
Contributor

baeuml commented Oct 24, 2014

Other than that it looks good to me! Cool!

@PatWie
Contributor Author

PatWie commented Oct 24, 2014

I changed some parts. Should I open a new pull request, since rebasing does not work for me? I did a git push --force.

@Yangqing
Member

Could you also do a speed benchmark and see how shuffling affects typical read speed? Random reads used to cause a lot of trouble with LevelDB. Large-scale datasets usually don't need shuffling that much, so if speed is a concern, it might be better to keep reading sequentially.

(Since shuffling is turned off by default, I think having the capability is good.)

@PatWie
Contributor Author

PatWie commented Oct 26, 2014

Not everybody is able to run Caffe on a high-end GPU for the ImageNet challenge ;-)
For smaller datasets it is crucial to use a random order (see issue #1249 from @bearpaw). Maybe the BVLC team should provide a standard HDF5 dataset and net layout for benchmarking code changes.

@shelhamer
Member

@jeffdonahue can you review and merge if this looks good to you?

Contributor


Please clarify this comment to explain that the HDF5 files themselves are shuffled but the order within any given file is fixed.


Everything will be shuffled: the HDF5 files and the entries within these HDF5 files.

Contributor


Thanks, I see that now, my bad. I think it should still be clarified, though: it's not actually a full shuffle of the dataset (i.e., some orderings of the dataset are impossible to obtain) unless you only have a single HDF5 file (or each HDF5 file only has a single entry).
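To make this concrete: for k non-empty files holding n_1, ..., n_k entries (N in total), shuffling the file order and then the entries within each file can reach k! · n_1! · ... · n_k! of the N! possible orderings. The brute-force C++ sketch below checks this on tiny inputs; CountOrderings is a hypothetical helper, not part of Caffe, and it assumes each file is read as one contiguous block.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper: returns {reachable, total}, where total is the number
// of orderings of all entries (N!) and reachable is how many of them a
// two-level shuffle (files first, then entries within each file) can produce.
std::pair<std::size_t, std::size_t> CountOrderings(
    const std::vector<int>& rows_per_file) {
  // file_of[e] is the file that entry e belongs to.
  std::vector<int> file_of;
  for (int f = 0; f < static_cast<int>(rows_per_file.size()); ++f)
    file_of.insert(file_of.end(), rows_per_file[f], f);

  std::vector<int> perm(file_of.size());
  std::iota(perm.begin(), perm.end(), 0);
  std::size_t reachable = 0, total = 0;
  do {  // Enumerate all N! orderings, starting from the sorted one.
    ++total;
    // Reachable iff every file's entries form one contiguous run, i.e. the
    // read order never returns to a file after leaving it.
    std::vector<bool> finished(rows_per_file.size(), false);
    bool ok = true;
    for (std::size_t i = 0; i < perm.size() && ok; ++i) {
      int f = file_of[perm[i]];
      if (finished[f]) ok = false;
      if (i + 1 < perm.size() && file_of[perm[i + 1]] != f) finished[f] = true;
    }
    if (ok) ++reachable;
  } while (std::next_permutation(perm.begin(), perm.end()));
  return {reachable, total};
}
```

For two files with two entries each, only 8 of the 24 orderings are reachable; with a single file ({3}) or one entry per file ({1, 1, 1}), all 6 orderings are reachable, matching the two exceptions above.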

shelhamer added the JD label Mar 7, 2015
jeffdonahue mentioned this pull request Mar 13, 2015
@jeffdonahue
Contributor

Replaced by #2118.


6 participants