-
Notifications
You must be signed in to change notification settings - Fork 18.6k
leveldb/lmdb refactoring #1238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
leveldb/lmdb refactoring #1238
Conversation
|
@shelhamer Sure, no problem. #1261 has the OpenCV 3 and other minor fixes. I updated create_cifar10.sh to use lmdb, but I did not touch other build scripts. |
tools/extract_features.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One should be able to specify the type of database, instead of assume "leveldb"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
Overall a nice PR, @kmatzen please address the comments above before merging. |
|
Just a few more comments about the interface that I don't quite like. (1) If there a failure anywhere (e.g. commit when the db was open in RO mode), the whole thing just fails on a |
|
@kmatzen thanks for your updates. Regarding your questions:
One more thing, I'm not sure if Would be possible to get a vector with all the keys, by calling |
|
Referencing previous comment. (1) Done. https://github.com/kmatzen/caffe/commit/175192eb8a522fd0d7b2eb88b646abeb646d4e00 buffer_t renaming: Done. https://github.com/kmatzen/caffe/commit/aee2b9543a8648c573b2b839a922634e8f17bcda keys(): Done. https://github.com/kmatzen/caffe/commit/66316564204222d2ef63bb54c808c645c96d8886 |
|
This a very good example to prove the power of "programing to an interface, not an implementation". |
|
The tests for different db implmentations are almost the same. They can reuse the common parts. |
include/caffe/database.hpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference between vector<char> and string? It seems that you are following the conventions of the standard template library. Why not make Database more generic?
template<typename KeyType, typename ValueType>
class Database {
...
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
Please pay attention to properly ordering the keys as described in #1158. |
|
Besides the extract_features tool, there are also a few other tools that directly use the raw data store. |
|
#1266 tries to get input data from a remote service which is not a database. A more generic abstraction is better called Dataset. |
…finable KCoder and VCoder which default to a set of DefaultCoder's based on types K and V. Reworked the DefaultCoder's such that if none are available, a static assertion fails with a relevant message.
|
Rebased |
|
@kmatzen is this PR ready to be reviewed again and merged? |
|
Yes, go for it. |
|
While this PR has unified the leveldb and lmdb interfaces the joint interface has yielded a net addition of 1,000+ lines of code while introducing new classes. In my opinion it's desirable to cut this back and simplify as I'm having a hard time reasoning about the DB operations now -- although I admit that could be my own failing. My concern is sharpened by the LMDB crash I am now encountering with my reshaping data layer #1313 although all it does is use |
|
@shelhamer I also spend quite a bit of time trying to understand it. I think we need more test for this, like for instance I had a similar error while doing Maybe the @shelhamer do you encounter the same error when using LEVELDB? |
|
See #1319 and let me know if that helps. |
|
With @shelhamer, I am concerned about the haste with which we are merging code here and in #1239. I had hoped this PR would reduce the duplication and overhead of working with the databases, resulting in a mildly negative net number of lines of code added. Instead we are up 1400. That's too much for me to give a full and coherent analysis of right now, but here are a few concerns that come to mind on a first read:
I'm sure there may be some good reasons for the above that I've missed, but, as @shelhamer and @sguada seem to agree, it's not rapidly digestible. Abstractions should not be added before they are needed. More code means more bugs, more mental effort, and more maintenance. Now we have the problem @shelhamer has run into, we have dev apparently broken as described in #1319, which is a partial fix, and some concerning comments such as #1316 (comment). I'm all for moving fast in dev, but I don't see any reason to merge code until (1) it is both necessary and sufficient to provide some benefit over what came before, and (2) we are confident that it doesn't break things. Code doesn't have to be merged to exist; we have PRs, branches, and forks as well. Of course, this is no fault to @kmatzen, who has clearly put a lot of good thought into how this should work. But I think discussion deserves to be superlinear in number of lines of code added, and I think there is no reason to merge code before having confidence in its correctness. Now, we could attempt to revert dev to a known good state, or we could dig our way out by fixes such as #1319 and some culling as suggested by @shelhamer. Opinions are welcome, but I don't think @shelhamer or I have time to do a lot of hacking on this, so I think my hope is that we can fix the current issues and simplify the interface with a few small, obviously correct PRs. |
|
I agree with what's been said here. My objective is not to push code upstream quickly. My objective is to receive a high quality code review. It would be nice in the future to receive more feedback such as "I don't understand this, please provide more comments", "you're missing test coverage for this one area", or even "this design is bad, let's discuss a better one" before it is merged. |
|
Sorry for pushing quickly, @kmatzen could you do a simplified version which only have key=string and value=Datum. If not, I will work on that tomorrow |
|
Sure. Do you want me to remove the instantiations for the other value types or do you want me to remove the templating entirely? |
|
I think for starting it would be better to remove the whole template, and keep it simple, just abstracting the code was in DataLayer before. Let's use a DatasetParam to build the Dataset (which would be defined in caffe.proto) and contains all the needed parameters, like source, backend (or any other parameters specific to the backend).
To simplify even further we can only allow reads in ReadMode and writes in WriteMode, that way the transactions don't get mixed. |
|
Sounds good everybody. While we're revising how about DataSource instead of
|
|
The confidence of the correctness of PRs should be proportional to the test coverage. #1177 aimed reduce the human efforts needed to figure it out. |
leveldb/lmdb refactoring
|
@shelhamer I got the similar error,but it occurs all of sudden。i swear i did not change anything。 F1011 17:58:04.521750 10269 db_lmdb.hpp:15] Check failed: mdb_status == 0 (-30790 vs. 0) MDB_READERS_FULL: Environment maxreaders limit reached |
This refactoring places leveldb and lmdb access behind a common interface. The interface supports:
It would be nice to use this interface to implement an even simpler database where binary blobs are written sequentially to file and then mmapped in during training. It's not clear why leveldb and lmdb are used when all that's being done is a sequential scan and I'd like to have timing measurements to learn what is best.
There are also a few bug fixes scattered throughout this pull request. io.hpp allocates too little memory for a strcpy, there's a delete/delete[] mismatch, opencv 3.0 alpha doesn't work as-is, and more.See #1261.Existing tests pass, but none of the tools are tested as far as I can tell. This patch might also break any special leveldb or lmdb configurations. There wasn't an explanation as to why some flags were chosen, so I copied and pasted flags blindly.