storage: use consistent iterators when needed, and when possible#58515
storage: use consistent iterators when needed, and when possible#58515craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
nvb
left a comment
There was a problem hiding this comment.
Reviewed 4 of 7 files at r1.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, @petermattis, and @sumeerbhola)
pkg/storage/engine.go, line 277 at r1 (raw file):
// TODO(sumeer): this partial consistency can be a source of bugs if future // code starts relying on it, but rarely uses a Reader that does not guarantee // it. Can we enumerate the current cases where KV uses Engine as a Reader?
Is Engine the only implementation that does not guarantee a consistent snapshot? It might be worth adding a ConsistentReads() bool method to this interface, if only to force each implementation to consider the question and to serve as documentation.
pkg/storage/pebble.go, line 1204 at r1 (raw file):
// iterators that make separated locks/intents look as interleaved need to // use both simultaneously. // When the first iterator is initialized, the underlying *pebble.Iterator
So this is still grabbing the snapshot lazily? That will make an issue like #55461 difficult to address in its entirety because it means that we still have less control than we'd like of exactly when the snapshot it captured. How do we feel about pebbleReadOnly grabbing an iterator immediately upon creation?
pkg/storage/pebble.go, line 1210 at r1 (raw file):
// which causes pebbleIterator.Close to not close iter. iter will be closed // when pebbleReadOnly.Close is called. prefixIter pebbleIterator
Not directly related to this PR, but I remember @petermattis mentioning that Pebble would give us the ability to switch an iterator into and out of prefix mode without needing to create two separate iterators. Is that something we'd like to pursue?
pkg/storage/pebble_iterator.go, line 88 at r1 (raw file):
// there are no timestamp hints, else it is created using handle. func (p *pebbleIterator) init( handle pebble.Reader, iterToClone *pebble.Iterator, opts IterOptions,
To prevent abuse and to make the intention abundantly clear in this param and in places where we hold on to this, we might want to wrap this iterator in some slim interface that only allows for cloning.
e4779bb to
245a009
Compare
sumeerbhola
left a comment
There was a problem hiding this comment.
TFTR!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @petermattis)
pkg/storage/engine.go, line 277 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is Engine the only implementation that does not guarantee a consistent snapshot? It might be worth adding a
ConsistentReads() boolmethod to this interface, if only to force each implementation to consider the question and to serve as documentation.
Yes, Engine is the only implementation that does not make that guarantee.
I've added ConsistentIterators.
Can you shed any light on this question about what parts of KV use Engine directly?
pkg/storage/pebble.go, line 1204 at r1 (raw file):
So this is still grabbing the snapshot lazily?
Yes
That will make an issue like #55461 difficult to address in its entirety because it means that we still have less control than we'd like of exactly when the snapshot it captured. How do we feel about pebbleReadOnly grabbing an iterator immediately upon creation?
I've added a TODO here. I am worried about introducing more semantic differences among different Reader implementations. I would like to strengthen the calling-side first so we have a clearer understanding of when different Reader implementations are being used. Once we have that, this would be a trivial change. What do you think?
Also, do we need this eager creation for pebbleBatch too, as indicated in that issue? Unlike pebbleReadOnly, I am guessing there are clients of pebbleBatch who never create an iterator, so creating one eagerly may be wasteful -- are there any such cases?
pkg/storage/pebble.go, line 1210 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Not directly related to this PR, but I remember @petermattis mentioning that Pebble would give us the ability to switch an iterator into and out of prefix mode without needing to create two separate iterators. Is that something we'd like to pursue?
Do we never simultaneously need a prefix and non-prefix iterator?
If so, I can add a TODO here to remind us to improve this, since Pebble already supports it.
pkg/storage/pebble_iterator.go, line 88 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
To prevent abuse and to make the intention abundantly clear in this param and in places where we hold on to this, we might want to wrap this iterator in some slim interface that only allows for cloning.
Done
petermattis
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, @petermattis, and @sumeerbhola)
pkg/storage/pebble.go, line 1210 at r1 (raw file):
Do we never simultaneously need a prefix and non-prefix iterator?
I have a recollection that we do, though perhaps only in tests. When we initially introduced Pebble we tried to use a single iterator and switch it back and forth between prefix and non-prefix modes. Something broke, but I don't recall offhand what it was. You might be able to look back at old PRs and see if there is any commentary on the breakage.
|
pkg/storage/pebble.go, line 1210 at r1 (raw file): Previously, petermattis (Peter Mattis) wrote…
My memory is also hazy from late 2019 when this came up, but we did go with the single-iter path at first and came across one code path where we quickly spin up a prefixIter while having an open normal iter on a |
nvb
left a comment
There was a problem hiding this comment.
Reviewed 1 of 7 files at r1, 7 of 7 files at r2.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @itsbilal, @petermattis, and @sumeerbhola)
pkg/storage/engine.go, line 277 at r1 (raw file):
I've added ConsistentIterators.
You might want to mention in the comment that it is meant for documentation purposes, just so future readers don't think it's used for more than it is.
There are a number of users of Engine, but there should be none in the request evaluation path. From what I'm aware, it's mostly used for server-level tasks, which access it through Store.Engine and don't need consistency across multiple iterators.
pkg/storage/pebble.go, line 1204 at r1 (raw file):
I would like to strengthen the calling-side first so we have a clearer understanding of when different Reader implementations are being used.
By that, do you mean create an extension to the Reader interface (i.e. ConsistentReader) that some callers expect so that we can capture this property in the type system? If so, I'm on board with that.
Also, do we need this eager creation for pebbleBatch too, as indicated in that issue? Unlike pebbleReadOnly, I am guessing there are clients of pebbleBatch who never create an iterator, so creating one eagerly may be wasteful -- are there any such cases?
The only places where that might be the case are when a request results in a call to MVCCBlindPut/MVCCBlindConditionalPut/MVCCBlindInitPut, but it's not clear to me that even in those cases, the pebbleBatch is never read from. It certainly is for transactional batches (see checkIfTxnAborted) and I think it may also be read from optimizePuts` even for non-transactional batches.
pkg/storage/pebble.go, line 1210 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
My memory is also hazy from late 2019 when this came up, but we did go with the single-iter path at first and came across one code path where we quickly spin up a prefixIter while having an open normal iter on a
pebbleReadOnly. One of the unit tests surfaced that, but I don't remember which one it was, and I can't find the corresponding PR.
It sounds like we should open a TODO to track this then. Bilal, do you have thoughts about how we could try to surface the issue again so we can get a better feel for what the blocker is? Or how we could hunt down that old PR?
petermattis
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @sumeerbhola)
pkg/storage/pebble.go, line 1210 at r1 (raw file):
It sounds like we should open a TODO to track this then. Bilal, do you have thoughts about how we could try to surface the issue again so we can get a better feel for what the blocker is? Or how we could hunt down that old PR?
Agreed. Part of the intention behind Pebble's SeekPrefixGE API was being able to switch an iterator from normal to prefix iteration and back. I'm hopeful we can get down to a single iterator. If we can't for some reason, it would be good to know why.
itsbilal
left a comment
There was a problem hiding this comment.
Reviewed 2 of 7 files at r1, 7 of 7 files at r2.
Reviewable status:complete! 2 of 0 LGTMs obtained (waiting on @nvanbenschoten, @petermattis, and @sumeerbhola)
pkg/storage/pebble.go, line 1210 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
It sounds like we should open a TODO to track this then. Bilal, do you have thoughts about how we could try to surface the issue again so we can get a better feel for what the blocker is? Or how we could hunt down that old PR?
Agreed. Part of the intention behind Pebble's
SeekPrefixGEAPI was being able to switch an iterator from normal to prefix iteration and back. I'm hopeful we can get down to a single iterator. If we can't for some reason, it would be good to know why.
I can take this on as a follow-on task. Finding the test case could be as simple as panicking on the second concurrent iterator opening, and running the test suite. Let me try that and report back.
For pebbleBatch and pebbleReadOnly, all iterators without timestamp hints see the same underlying engine state, with no interface change for the caller. This is done by cloning the first created pebble.Iterator. intentInterleavingIter explicitly requests a clone, when it creates two iterators, which ensures the consistency guarantee applies to all Reader implementations. Informs cockroachdb#55461 Informs cockroachdb#41720 Release note: None
245a009 to
49d259b
Compare
sumeerbhola
left a comment
There was a problem hiding this comment.
TFTRs!
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @itsbilal, @nvanbenschoten, and @petermattis)
pkg/storage/engine.go, line 277 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I've added ConsistentIterators.
You might want to mention in the comment that it is meant for documentation purposes, just so future readers don't think it's used for more than it is.
There are a number of users of
Engine, but there should be none in the request evaluation path. From what I'm aware, it's mostly used for server-level tasks, which access it throughStore.Engineand don't need consistency across multiple iterators.
I've added panics in a couple of places in replica_read.go and replica_write.go so that this is more than documentation.
pkg/storage/pebble.go, line 1204 at r1 (raw file):
By that, do you mean create an extension to the Reader interface (i.e. ConsistentReader) that some callers expect so that we can capture this property in the type system? If so, I'm on board with that.
Something like that. I've added a link to this discussion to the TODO so that I don't forget.
pkg/storage/pebble.go, line 1210 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
I can take this on as a follow-on task. Finding the test case could be as simple as panicking on the second concurrent iterator opening, and running the test suite. Let me try that and report back.
Thanks Bilal!
|
bors r+ |
nvb
left a comment
There was a problem hiding this comment.
Reviewed 4 of 4 files at r3.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @itsbilal, @nvanbenschoten, and @petermattis)
|
Build succeeded: |
For pebbleBatch and pebbleReadOnly, all iterators without timestamp
hints see the same underlying engine state, with no interface
change for the caller. This is done by cloning the first created
pebble.Iterator.
intentInterleavingIter explicitly requests a clone, when it creates
two iterators, which ensures the consistency guarantee applies to
all Reader implementations.
Informs #55461
Informs #41720
Release note: None