leveldb/lmdb refactoring #1238

kmatzen · 2014-10-07T22:37:18Z

This refactoring places leveldb and lmdb access behind a common interface. The interface supports:

open
close
put
sequential scan
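
As a rough illustration of the four operations listed above, here is a minimal sketch of what such a common interface could look like. The names `KeyValueStore`, `InMemoryStore`, and `CountRecords` are hypothetical and are not taken from this PR; the in-memory backend only stands in for a LevelDB- or LMDB-backed implementation.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical common interface; the names are illustrative, not the PR's.
class KeyValueStore {
 public:
  virtual ~KeyValueStore() {}
  virtual void Open(const std::string& path) = 0;
  virtual void Close() = 0;
  virtual void Put(const std::string& key, const std::string& value) = 0;
  // Sequential scan: call Rewind, then Next until it returns false.
  virtual void Rewind() = 0;
  virtual bool Next(std::string* key, std::string* value) = 0;
};

// In-memory stand-in for a LevelDB- or LMDB-backed implementation.
// Records are visited in key order, as an ordered key-value store would.
class InMemoryStore : public KeyValueStore {
 public:
  InMemoryStore() : open_(false), records_(), cursor_(records_.begin()) {}
  virtual void Open(const std::string& /*path*/) {
    open_ = true;
    cursor_ = records_.begin();
  }
  virtual void Close() { open_ = false; }
  virtual void Put(const std::string& key, const std::string& value) {
    if (open_) records_[key] = value;
  }
  virtual void Rewind() { cursor_ = records_.begin(); }
  virtual bool Next(std::string* key, std::string* value) {
    if (!open_ || cursor_ == records_.end()) return false;
    *key = cursor_->first;
    *value = cursor_->second;
    ++cursor_;
    return true;
  }

 private:
  typedef std::map<std::string, std::string> Records;
  bool open_;
  Records records_;
  Records::const_iterator cursor_;
};

// Caller code depends only on the interface, not on the backend.
std::size_t CountRecords(KeyValueStore* store) {
  std::size_t n = 0;
  std::string key, value;
  store->Rewind();
  while (store->Next(&key, &value)) ++n;
  return n;
}
```

A tool written against `KeyValueStore*` would not need to change when the backend does, which is the point of the refactoring.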

It would be nice to use this interface to implement an even simpler database where binary blobs are written sequentially to a file and then mmapped in during training. It's not clear why leveldb and lmdb are used when all that's being done is a sequential scan, and I'd like to have timing measurements to learn which is best.
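
The flat-file idea could be sketched roughly as follows, assuming a POSIX system. The record layout (a 4-byte native-endian length followed by the bytes) and the function names `WriteBlobs` and `ReadBlobs` are assumptions made for illustration; nothing like this is part of the PR. The reader copies each blob into a `std::string` for simplicity, whereas a training loop would more likely parse directly from the mapping.

```cpp
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

// Write each blob as a 4-byte native-endian length followed by its bytes.
bool WriteBlobs(const std::string& path,
                const std::vector<std::string>& blobs) {
  std::ofstream out(path.c_str(), std::ios::binary | std::ios::trunc);
  if (!out) return false;
  for (std::size_t i = 0; i < blobs.size(); ++i) {
    const uint32_t len = static_cast<uint32_t>(blobs[i].size());
    out.write(reinterpret_cast<const char*>(&len), sizeof(len));
    out.write(blobs[i].data(), len);
  }
  out.flush();
  return out.good();
}

// Map the file read-only and walk the records front to back.
bool ReadBlobs(const std::string& path, std::vector<std::string>* blobs) {
  blobs->clear();
  const int fd = open(path.c_str(), O_RDONLY);
  if (fd < 0) return false;
  struct stat st;
  if (fstat(fd, &st) != 0) { close(fd); return false; }
  const std::size_t size = static_cast<std::size_t>(st.st_size);
  if (size == 0) { close(fd); return true; }  // mmap rejects zero length
  void* base = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);  // the mapping stays valid after the descriptor is closed
  if (base == MAP_FAILED) return false;

  const char* p = static_cast<const char*>(base);
  const char* end = p + size;
  bool ok = true;
  while (p < end) {
    uint32_t len;
    if (static_cast<std::size_t>(end - p) < sizeof(len)) { ok = false; break; }
    std::memcpy(&len, p, sizeof(len));
    p += sizeof(len);
    if (static_cast<std::size_t>(end - p) < len) { ok = false; break; }
    blobs->push_back(std::string(p, len));
    p += len;
  }
  munmap(base, size);
  return ok;
}
```

Timing this against LevelDB and LMDB scans over the same data would give the measurements asked for above.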

There are also a few bug fixes scattered throughout this pull request. io.hpp allocates too little memory for a strcpy, there's a delete/delete[] mismatch, opencv 3.0 alpha doesn't work as-is, and more. See #1261.

Existing tests pass, but none of the tools are tested as far as I can tell. This patch might also break any special leveldb or lmdb configurations. There wasn't an explanation as to why some flags were chosen, so I copied and pasted flags blindly.

shelhamer · 2014-10-08T17:02:08Z

Nice refactoring! Can you switch the cifar10 data to lmdb while you're at it since Caffe is shifting to lmdb defaults everywhere? #1128 #1131.

shelhamer · 2014-10-10T06:22:36Z

@kmatzen please make another PR of your misc. fixes in the first commit. Regarding OpenCV, please take a look at #1247 and compare.

kmatzen · 2014-10-10T23:48:56Z

@shelhamer Sure, no problem. #1261 has the OpenCV 3 and other minor fixes. I updated create_cifar10.sh to use lmdb, but I did not touch other build scripts.

shelhamer · 2014-10-11T06:53:23Z

@kmatzen please squash the lint commit and drop c4b632a now that it's a separate PR so this'll be clear to merge.

sguada · 2014-10-11T21:19:19Z

@kmatzen I think this refactoring, together with #1239, could allow ImageDataLayer to be included as another type of DB, one which in reality doesn't store anything and just reads the needed files when requested.

sguada · 2014-10-12T05:59:21Z

tools/extract_features.cpp

One should be able to specify the type of database, instead of assuming "leveldb".

Done.
https://github.com/kmatzen/caffe/commit/9edaa6fe44f61e2349dc31352eb6ad123bee9cc1

sguada · 2014-10-12T06:17:17Z

Overall a nice PR, @kmatzen please address the comments above before merging.

kmatzen · 2014-10-12T21:06:10Z

Just a few more comments about the interface that I don't quite like.

(1) If there is a failure anywhere (e.g. commit when the db was opened in RO mode), the whole thing just fails on a CHECK*. It would be nice to replace that behavior with either exceptions or error codes that the caller is then responsible for handling (e.g. LevelDB's Status). I don't really see either in Caffe's existing codebase. What's the accepted method?
(2) The key and value for put and the key for get are all passed by non-const pointer. I did this since LMDB's MDB_val.mv_data is of type char*, not const char*. I'd rather pass these by const reference. Any recommendations? I could pass by const reference and then make a copy inside put or get for LMDB. LevelDB is fine since Slice takes const char*.
(3) The iterator invalidation behavior is going to lead to bugs if used incorrectly. LevelDB iterators must be deallocated before the database is deallocated. If you create an iterator, then it means that the database must own and track that iterator, but creating a const_iterator usually implies that you are not mutating the collection or the collection itself is const. Maybe add a mutable vector<shared_ptr<leveldb::Iterator> > collection to the LeveldbDatabase and store a weak_ptr<leveldb::Iterator> in the LeveldbState?
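
The ownership scheme proposed in (3) could look roughly like the sketch below. It uses C++11 `std::shared_ptr` and `std::weak_ptr` as an assumption (the codebase may use the Boost equivalents), and `FakeIterator` and `FakeDatabase` are hypothetical stand-ins for `leveldb::Iterator` and `LeveldbDatabase`; no LevelDB calls are made.

```cpp
#include <memory>
#include <vector>

// Stand-in for leveldb::Iterator; counts live instances for demonstration.
struct FakeIterator {
  static int live;
  FakeIterator() { ++live; }
  ~FakeIterator() { --live; }
};
int FakeIterator::live = 0;

class FakeDatabase {
 public:
  // Logically const for the caller, yet it records the new iterator,
  // which is why the owning vector is declared mutable.
  std::weak_ptr<FakeIterator> NewIterator() const {
    std::shared_ptr<FakeIterator> it = std::make_shared<FakeIterator>();
    iterators_.push_back(it);
    return it;
  }

  // Iterators are destroyed here, before the real database handle would be.
  // Caveat: a caller still holding a shared_ptr obtained through lock()
  // would keep its iterator alive past this point.
  ~FakeDatabase() { iterators_.clear(); }

 private:
  mutable std::vector<std::shared_ptr<FakeIterator> > iterators_;
};
```

State objects would hold only the `weak_ptr` and call `lock()` before each use, so a stale handle is detected instead of dereferenced.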

sguada · 2014-10-12T22:48:45Z

@kmatzen thanks for your updates. Regarding your questions:

(1) Not sure about the CHECK_; in general a CHECK should happen when it is crucial for valid execution. If the DB cannot be opened, it doesn't make sense to continue; however, when checking whether a key exists, maybe it makes sense not to fail and to return false instead.
(2) I think the arguments to put and get should be consistent with our standard: const buffer_t& for inputs and buffer_t* for outputs. Maybe put and get could return a bool to indicate whether they succeeded or failed.
(3) Not sure about this; I don't fully follow the logic behind it, but maybe adding a check before deallocating the database would be enough.

One more thing, I'm not sure if buffer_t is the best name for the type, since they are vector<char>, what about vector_char, or even better having key_type and value_type which in this case both are vector<char>.

Would it be possible to get a vector with all the keys by calling vector<key_type> keys()?
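
A minimal sketch of the conventions suggested above: inputs by const reference, outputs by pointer, `bool` return values, `key_type`/`value_type` typedefs, and a `keys()` accessor. The class name `MapDataset` and the choice to reject duplicate keys in `put` are assumptions for illustration, not what the PR implements.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

class MapDataset {
 public:
  typedef std::vector<char> key_type;
  typedef std::vector<char> value_type;

  // Inputs by const reference. Returns false if the key already exists
  // (an assumption of this sketch; a real backend might overwrite).
  bool put(const key_type& key, const value_type& value) {
    return records_.insert(std::make_pair(key, value)).second;
  }

  // Output by pointer. A missing key yields false, not a CHECK failure.
  bool get(const key_type& key, value_type* value) const {
    std::map<key_type, value_type>::const_iterator it = records_.find(key);
    if (it == records_.end()) return false;
    *value = it->second;
    return true;
  }

  // All keys, in the store's iteration order.
  std::vector<key_type> keys() const {
    std::vector<key_type> out;
    for (std::map<key_type, value_type>::const_iterator it = records_.begin();
         it != records_.end(); ++it) {
      out.push_back(it->first);
    }
    return out;
  }

 private:
  std::map<key_type, value_type> records_;
};
```

Note that `keys()` materializes every key in memory, which may be costly for a large database.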

kmatzen · 2014-10-13T01:45:56Z

Referencing previous comment.

(1) Done. https://github.com/kmatzen/caffe/commit/175192eb8a522fd0d7b2eb88b646abeb646d4e00
(2) Done. https://github.com/kmatzen/caffe/commit/6df63648b984f49747d1bebdd14d9c971ef9ffcf
(3) Done. https://github.com/kmatzen/caffe/commit/ea3813136b90ef693c7ebcca6a6f2eea8b1438ab

buffer_t renaming: Done. https://github.com/kmatzen/caffe/commit/aee2b9543a8648c573b2b839a922634e8f17bcda

keys(): Done. https://github.com/kmatzen/caffe/commit/66316564204222d2ef63bb54c808c645c96d8886

kloudkl · 2014-10-13T09:21:35Z

This PR makes it possible to simplify #1074 and provide a simple solution to #1155.

Edit: I just read the code. #1155 has been solved.

kloudkl · 2014-10-13T10:46:32Z

This is a very good example of the power of "programming to an interface, not an implementation".

kloudkl · 2014-10-13T11:39:29Z

The tests for the different db implementations are almost the same. They can reuse the common parts.

kloudkl · 2014-10-13T12:01:38Z

include/caffe/database.hpp

What's the difference between vector<char> and string? It seems that you are following the conventions of the standard template library. Why not make Database more generic?

template<typename KeyType, typename ValueType> class Database { ... };

Done. https://github.com/kmatzen/caffe/commit/8d4c9743f71f1a4f23af383135405c3b8f7f4356

kloudkl · 2014-10-13T12:10:00Z

Please pay attention to properly ordering the keys as described in #1158.

kloudkl · 2014-10-13T12:33:23Z

Besides the extract_features tool, there are also a few other tools that directly use the raw data store.

kloudkl · 2014-10-13T14:20:13Z

#1266 tries to get input data from a remote service which is not a database. A more generic abstraction is better called Dataset.

kmatzen · 2014-10-14T23:49:11Z

Rebased

sguada · 2014-10-15T16:50:48Z

@kmatzen is this PR ready to be reviewed again and merged?

kmatzen · 2014-10-15T17:16:53Z

Yes, go for it.

shelhamer · 2014-10-18T04:14:55Z

While this PR has unified the leveldb and lmdb interfaces the joint interface has yielded a net addition of 1,000+ lines of code while introducing new classes. In my opinion it's desirable to cut this back and simplify as I'm having a hard time reasoning about the DB operations now -- although I admit that could be my own failing.

My concern is sharpened by the LMDB crash I am now encountering with my reshaping data layer #1313

F1017 02:29:37.982517 28564 lmdb_dataset.cpp:274] Check failed: 0 == retval (0 vs. -30790) mdb_txn_begin failed MDB_READERS_FULL: Environment maxreaders limit reached

although all it does is use iter_->value. This could be my fault in mis-use of the interface, but I'm troubled by the amount of effort I am now spending to figure this out.

sguada · 2014-10-18T04:38:51Z

@shelhamer I also spent quite a bit of time trying to understand it. I think we need more tests for this; for instance, I had a similar error when calling first_key() and last_key(), which I only discovered because I had a test where I asked for both.

Maybe the deserializer, which shares data to avoid copies, is introducing some undesired side effects.

@shelhamer do you encounter the same error when using LEVELDB?

kmatzen · 2014-10-18T06:58:27Z

See #1319 and let me know if that helps.

longjon · 2014-10-18T10:33:04Z

With @shelhamer, I am concerned about the haste with which we are merging code here and in #1239.

I had hoped this PR would reduce the duplication and overhead of working with the databases, resulting in a mildly negative net number of lines of code added. Instead we are up 1400. That's too much for me to give a full and coherent analysis of right now, but here are a few concerns that come to mind on a first read:

We have type parameters for KCoder and VCoder, but it seems there is only one coder, and it's unclear why this abstraction is needed. (As far as I can tell, #1239 doesn't use it.)
We have DEFAULT_CODER_NOT_AVAILABLE, which I guess is some kind of template metaprogramming trick to tell us something the compiler already knows when sizeof(T) == 0, although that is not the complement of the cases for which the DefaultCoder is available...
As far as I know, we always store data as Datum, but now we have string and vector<char> options as well, at the cost of a bunch more code. Is there an existing use case for this?
Do we really need a std::iterator for datasets? I'm not opposed to making good use of STL, but this seems chiefly to introduce a bunch of boilerplate...
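
For readers unfamiliar with the `sizeof(T) == 0` idiom mentioned above, the following is a hedged sketch of how a default-coder lookup with a static assertion can be written. It uses C++11 `static_assert` as an assumption and is not the PR's code; `Encoded` is a hypothetical wrapper showing how a coder parameter can default to `DefaultCoder<V>`.

```cpp
#include <string>
#include <vector>

// Primary template: selected only when no specialization matches. The
// condition depends on T, so the assertion fires when the template is
// instantiated for an unsupported type, not when it is defined.
template <typename T>
struct DefaultCoder {
  static_assert(sizeof(T) == 0, "DEFAULT_CODER_NOT_AVAILABLE for this type");
};

template <>
struct DefaultCoder<std::string> {
  static std::vector<char> encode(const std::string& s) {
    return std::vector<char>(s.begin(), s.end());
  }
  static std::string decode(const std::vector<char>& bytes) {
    return std::string(bytes.begin(), bytes.end());
  }
};

template <>
struct DefaultCoder<std::vector<char> > {
  static std::vector<char> encode(const std::vector<char>& v) { return v; }
  static std::vector<char> decode(const std::vector<char>& bytes) {
    return bytes;
  }
};

// The coder defaults to DefaultCoder<V> but can be overridden by the user.
template <typename V, typename VCoder = DefaultCoder<V> >
struct Encoded {
  static std::vector<char> serialize(const V& v) { return VCoder::encode(v); }
  static V deserialize(const std::vector<char>& bytes) {
    return VCoder::decode(bytes);
  }
};
```

With these definitions, calling `Encoded<int>::serialize(1)` would fail to compile with the message above, while `Encoded<std::string>` and `Encoded<std::vector<char> >` work.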

I'm sure there may be some good reasons for the above that I've missed, but, as @shelhamer and @sguada seem to agree, it's not rapidly digestible. Abstractions should not be added before they are needed. More code means more bugs, more mental effort, and more maintenance.

Now we have the problem @shelhamer has run into, we have dev apparently broken as described in #1319, which is a partial fix, and some concerning comments such as #1316 (comment).

I'm all for moving fast in dev, but I don't see any reason to merge code until (1) it is both necessary and sufficient to provide some benefit over what came before, and (2) we are confident that it doesn't break things. Code doesn't have to be merged to exist; we have PRs, branches, and forks as well.

Of course, this is no fault to @kmatzen, who has clearly put a lot of good thought into how this should work. But I think discussion deserves to be superlinear in number of lines of code added, and I think there is no reason to merge code before having confidence in its correctness.

Now, we could attempt to revert dev to a known good state, or we could dig our way out by fixes such as #1319 and some culling as suggested by @shelhamer. Opinions are welcome, but I don't think @shelhamer or I have time to do a lot of hacking on this, so I think my hope is that we can fix the current issues and simplify the interface with a few small, obviously correct PRs.

kmatzen · 2014-10-18T17:34:21Z

I agree with what's been said here. My objective is not to push code upstream quickly. My objective is to receive a high quality code review. It would be nice in the future to receive more feedback such as "I don't understand this, please provide more comments", "you're missing test coverage for this one area", or even "this design is bad, let's discuss a better one" before it is merged.

sguada · 2014-10-18T21:26:51Z

Sorry for pushing quickly. @kmatzen, could you do a simplified version which only has key=string and value=Datum? If not, I will work on that tomorrow.

kmatzen · 2014-10-18T21:29:14Z

Sure. Do you want me to remove the instantiations for the other value types or do you want me to remove the templating entirely?

sguada · 2014-10-18T23:26:44Z

I think to start with it would be better to remove the whole template and keep it simple, just abstracting the code that was in DataLayer before.

Let's use a DatasetParam to build the Dataset; it would be defined in caffe.proto and contain all the needed parameters, like source, backend, or any other parameters specific to the backend.
The basic functionality should be:

new Dataset(DatasetParam)
open(Mode)
close()
get(key, value)
put(key, value)
commit()
And a way to iterate either through next() or iterators.

To simplify even further we can only allow reads in ReadMode and writes in WriteMode, that way the transactions don't get mixed.
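
The simplified plan above could be sketched along these lines. `DatasetParam` is shown as a plain struct standing in for the message that would be defined in caffe.proto, and an in-memory map stands in for the backend; both are assumptions for illustration only.

```cpp
#include <map>
#include <string>

// Plain struct standing in for a message that would live in caffe.proto.
struct DatasetParam {
  std::string source;
  std::string backend;  // e.g. "leveldb" or "lmdb"; unused by this sketch
};

class Dataset {
 public:
  enum Mode { ReadMode, WriteMode };

  explicit Dataset(const DatasetParam& param)
      : param_(param), open_(false), mode_(ReadMode),
        committed_(), pending_(), cursor_(committed_.begin()) {}

  bool open(Mode mode) {
    open_ = true;
    mode_ = mode;
    pending_.clear();
    cursor_ = committed_.begin();
    return true;
  }
  void close() {
    pending_.clear();  // uncommitted writes are dropped
    open_ = false;
  }

  // Reads are allowed only in ReadMode.
  bool get(const std::string& key, std::string* value) const {
    if (!open_ || mode_ != ReadMode) return false;
    Records::const_iterator it = committed_.find(key);
    if (it == committed_.end()) return false;
    *value = it->second;
    return true;
  }

  // Writes are allowed only in WriteMode and become visible after commit().
  bool put(const std::string& key, const std::string& value) {
    if (!open_ || mode_ != WriteMode) return false;
    pending_[key] = value;
    return true;
  }
  bool commit() {
    if (!open_ || mode_ != WriteMode) return false;
    for (Records::const_iterator it = pending_.begin();
         it != pending_.end(); ++it) {
      committed_[it->first] = it->second;
    }
    pending_.clear();
    return true;
  }

  // Sequential iteration in ReadMode; returns false at the end.
  bool next(std::string* key, std::string* value) {
    if (!open_ || mode_ != ReadMode || cursor_ == committed_.end()) {
      return false;
    }
    *key = cursor_->first;
    *value = cursor_->second;
    ++cursor_;
    return true;
  }

 private:
  typedef std::map<std::string, std::string> Records;
  DatasetParam param_;
  bool open_;
  Mode mode_;
  Records committed_;
  Records pending_;
  Records::const_iterator cursor_;
};
```

Separating the two modes means a read never observes a half-written batch, at the cost of reopening the dataset to switch between writing and reading.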

longjon · 2014-10-18T23:39:41Z

@kmatzen Yep, we appreciate your contributions, and sorry to be giving feedback post-hoc; that's our fault. (Personally, I did not look at this until it was brought to my attention, due to being in deadline mode.)

Thanks for taking action on this guys; @sguada's plan looks good to me.

shelhamer · 2014-10-19T00:29:25Z

Sounds good everybody. While we're revising, how about DataSource instead of DataSet, since it is really about a format / container and not a given set of data?

futurely · 2014-10-21T10:56:51Z

Confidence in the correctness of PRs should be proportional to the test coverage. #1177 aimed to reduce the human effort needed to figure it out.

KangolHsu · 2017-10-11T10:08:26Z

@shelhamer I got a similar error, but it occurred all of a sudden. I swear I did not change anything.

F1011 17:58:04.521750 10269 db_lmdb.hpp:15] Check failed: mdb_status == 0 (-30790 vs. 0) MDB_READERS_FULL: Environment maxreaders limit reached

shelhamer force-pushed the dev branch from d8eb4df to 914da95 Compare October 8, 2014 16:36

shelhamer added interface enhancement labels Oct 8, 2014

shelhamer mentioned this pull request Oct 10, 2014

Allow using encoded images in Datum, LevelDB, LMDB #1239

Merged

shelhamer assigned sguada Oct 11, 2014

sguada reviewed Oct 12, 2014

sguada mentioned this pull request Oct 12, 2014

Proto data layer #1266

Closed

kloudkl mentioned this pull request Oct 13, 2014

Requesting lmdb support in extract_features.cpp #1155

Closed

kloudkl reviewed Oct 13, 2014

This was referenced Oct 13, 2014

Enable the users to disable optional dependencies #1074

Closed

Add support for RocksDB which is open sourced by Facebook #817

Closed

Kevin James Matzen added 3 commits October 14, 2014 19:40

Renamed Database interface to Dataset.

d275f77

Had forgotten to set some of the Dataset test cases to LMDB backend.

6fd72e1

Reworked the Coder interface such that a Dataset now has both user-definable KCoder and VCoder which default to a set of DefaultCoder's based on types K and V. Reworked the DefaultCoder's such that if none are available, a static assertion fails with a relevant message.

ef518dc

sguada mentioned this pull request Oct 15, 2014

Added first_key and last_key to dataset #1288

Merged

sguada added a commit that referenced this pull request Oct 15, 2014

Merge pull request #1238 from kmatzen/db

a23c9bf

leveldb/lmdb refactoring

sguada merged commit a23c9bf into BVLC:dev Oct 15, 2014

shelhamer mentioned this pull request Oct 18, 2014

Memory leak in InternalThread::StartInternalThread() #1316

Closed

seanbell mentioned this pull request Oct 19, 2014

LMDB read-only transaction limit fix #1319

Merged

sguada mentioned this pull request Oct 31, 2014

Caffe memory increases with time(iterations?) #1377

Closed

RazvanRanca pushed a commit to RazvanRanca/caffe that referenced this pull request Nov 4, 2014

Merge pull request BVLC#1238 from kmatzen/db

b0c5905

leveldb/lmdb refactoring

shelhamer mentioned this pull request Nov 11, 2014

training is freezing for multiple hours #1412

Closed

sguada mentioned this pull request Nov 28, 2014

Indirection layer #1414

Closed

sguada mentioned this pull request Dec 12, 2014

Datum db #1568

Closed

shelhamer mentioned this pull request Jan 16, 2015

[bug report] When extracting multiple features with extract_features the first batch contains only features of the last blob #1192

Closed

longjon mentioned this pull request Jan 19, 2015

Simple database wrappers #1748

Merged
