feat(storage): Add PrefixDB to pkg internal/storage#4562
feat(storage): Add PrefixDB to pkg internal/storage#4562alesforz merged 29 commits intofeature/remove-cometbftdbfrom
PrefixDB to pkg internal/storage#4562Conversation
|
|
||
| if !source.Valid() || !bytes.HasPrefix(source.Key(), prefix) { | ||
| return itInvalid, nil | ||
| } |
There was a problem hiding this comment.
Why do we create this itInvalid object a few lines above? Since it's an invalid iterator, any method call on it will cause a panic, thus making it useless. Why not return an error here instead?
There was a problem hiding this comment.
I think we should return an error
There was a problem hiding this comment.
After some testing, I now understand why itInvalid is created.
The current design allows the creation of an Iterator with the key range [nil:nil], which we interpret as "from the first to the last key in the database." However, there is a corner case: creating an Iterator with the range [nil:some_key] when there is only one key in the database (or only one key with the given prefix, i.e., prefix+some_key). Because the iterator's upper bound is exclusive, this results in an empty key range. There might be other corner cases that I haven't found, though.
In the case of Pebble (and possibly other databases), such a scenario leads to an invalid iterator state. According to Pebble's documentation, this situation produces an iterator that "has a non-exhausted internalIterator, but has reached a limit without any key for the caller." In other words, a call to Valid() will fail, and therefore the creation of the Iterator itself will fail.
I think cometbft’s code depends on newPrefixDBIterator returning an invalid iterator rather than an error. For example, in the evidence package, returning an invalid iterator that returns false on the first call to iterator.Valid() allows the related functions to proceed normally. They end up creating an empty Evidence slice, which cometbft interprets as "no evidence to broadcast." If we returned an error instead, these functions would fail, breaking the process.
This design is not ideal, but changing it would likely require revisiting a large portion of cometbft’s codebase.
| return | ||
| } | ||
|
|
||
| if bytes.Equal(it.source.Key(), it.prefix) { |
There was a problem hiding this comment.
Is the purpose of these if statements to double-check that the current key of the wrapped Iterator in prefixDBIterator is correct?
We already call Iterator.Valid on the wrapped iterator to check the validity of the key. Assuming a prefixDBIterator is correctly initialized with the key range [prefix|start_key : prefix|end_key], in what scenarios are these additional checks needed?
For example, consider our implementation of the Pebble iterator pebbleDBIterator. Its Valid() method ensures that the iterator's current key is between start and end. Now, suppose we create a prefixDBIterator wrapping a pebbleDBIterator with prefix foo, start key a, and end key f. The key range of this iterator will be [fooa : fooe] (since the end is exclusive, foof won't be included). If I understand correctly, a call to pebbleDBIterator.Valid() should be sufficient to verify that the current key is valid, as any key outside [fooa : fooe] will be considered invalid.
Therefore, it seems that checking the prefix again in the prefixDBIterator methods might be unnecessary.
Returning to my initial question: Are we performing these additional checks to guard against third-party Iterator implementations that might not strictly enforce key validity and could return keys we should consider invalid? If not, what am I missing?
There was a problem hiding this comment.
Agree that it's not needed.
There was a problem hiding this comment.
After some thinking, we have decided to leave these checks in place. It is very likely that the reason they are there is to guard against third-party Iterator implementations that might not strictly enforce key validity and could return keys we should consider invalid. Changing this requires some in-depth testing to make sure we aren't screwing up validation checks on the keys. We, unfortunately, don't have enough time to do it at the moment.
|
There might be a subtle bug in the validity check of for ; itr.Valid(); itr.Next() {
key := itr.Key()
value := itr.Value()
// do something...
}panics unexpectedly. After moving the iterator to the next key (which is beyond the last key of the iterator's range) we call Therefore, I think we need to make sure that In general, I think we should revisit the implementation of |
The Iterator would panic after reaching the last key of its range. The bug was caused by a naive refactoring of the `Valid` and `Next` methods of `PrefixDB`.
Found and solved the problem. It was caused by a refactoring of the
is something we should consider. |
|
Are there any particular reasons why for |
This will maintain compatibility with cometbft-db.
|
|
||
| if !source.Valid() || !bytes.HasPrefix(source.Key(), prefix) { | ||
| return itInvalid, nil | ||
| } |
There was a problem hiding this comment.
I think we should return an error
| return | ||
| } | ||
|
|
||
| if bytes.Equal(it.source.Key(), it.prefix) { |
There was a problem hiding this comment.
Agree that it's not needed.
| if err != nil { | ||
| return nil, fmt.Errorf("prefixed DB namespace get: %w", err) | ||
| } | ||
| return value, nil |
There was a problem hiding this comment.
I think we should rethink the policy of ignoring the case when the key is not found. We'll discuss synchronously, but as we are now using only one DB we can check whether this still makes sense.
There was a problem hiding this comment.
We are already ignoring that case. If pebble returns a pebble.ErrNotFound error, we ignore it and return an empty slice. CometBFT's code always checks if a len(returnedSlice) == 0 to see if the key was found.
It is sufficient to rely on the wrapped database's concurrency control mechanism. That is, concurrent calls to PrefixDB's methods will already "lock" on the calls to the wrapped DB methods. Additionally, pebble has conconcurrency control mechanisms of its own.
…end`. The previous code did not do so, and that was an oversight.
…s invalid. Previously, it returne an invalid `prefixDBIterator`, that is, an iterator that would panic on the next method call, thus making it useless.
…rator` is invalid." This reverts commit 9fb1647.
Apparently an oversight, changed the receiver to a pointer. |
Partially addresses #4486.
Context
CometBFT will stop importing cometbft-db as a dependency to support multiple database backends. Instead, it will use pebble by default.
Changes
This PR adds type
PrefixDBtointernal/storagefrom cometbft-db.PrefixDBis a database wrapper that implements theDB,Batch, andIteratorinterfaces.Note: this PR makes some changes to the implementation of
PrefixDB. Namely:PrefixDBno longer uses a mutex, because we think that it is sufficient to rely on the wrapped database's concurrency control mechanism. That is, concurrent calls toPrefixDB's methods will already "lock" on the calls to the wrapped DB methods.NewPrefixDBnow returns an error if the given prefix is empty. We think allowing the creation of aPrefixDBwith no prefix was an oversight in the design.prefixDBBatchmethods now want a pointer receiver.PrefixDB.Compactnow prepends the prefix tostartandendkeys defining the iterator's range. The previous implementation did not do so, and this was likely an oversight. We believed this behavior contradicted the definition of aPrefixDBas a logical database that prepends the prefix to all keys before operating on the underlying database.PR checklist
[ ] Changelog entry added in.changelog(we use unclog to manage our changelog)docs/orspec/) and code comments