storage: Implement teeing Engine to test/compare pebble and rocksdb#42494
storage: Implement teeing Engine to test/compare pebble and rocksdb#42494craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
|
Some aspects of this are more open to discussion:
Also note that not all |
petermattis
left a comment
There was a problem hiding this comment.
I've only partially reviewed this, but it won't be until later tonight or tomorrow until I can finish. Generally looks like what I was imagining. A few quick comments to get you started. The additional checks can be left as TODOs.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajkr, @itsbilal, @petermattis, and @sumeerbhola)
pkg/storage/engine/tee_pebble_rocksdb.go, line 40 at r1 (raw file):
// NewTee creates a new instance of the TeePebbleRocksDB engine. func NewTee(rocksdb *RocksDB, pebble *Pebble) *TeePebbleRocksDB {
Does anything actually require that one engine is RocksDB and the other is Pebble? I'm wondering if this should be:
NewTee(a, b Engine) *Tee {
...
}
And then you'd rename TeePebbleRocksDB to Tee
pkg/storage/engine/tee_pebble_rocksdb.go, line 64 at r1 (raw file):
// Closed implements the Engine interface. func (t *TeePebbleRocksDB) Closed() bool { return t.pebble.Closed() && t.rocksDB.Closed()
Let's verify that values are the same.
pkg/storage/engine/tee_pebble_rocksdb.go, line 74 at r1 (raw file):
io IterOptions, ) ([]byte, roachpb.BulkOpSummary, error) { return t.pebble.ExportToSst(startKey, endKey, startTS, endTS, exportAllRevisions, io)
This should run ExportToSst on both engines and compare the result.
fc634d4 to
5a02690
Compare
itsbilal
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajkr, @petermattis, and @sumeerbhola)
pkg/storage/engine/tee_pebble_rocksdb.go, line 40 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Does anything actually require that one engine is RocksDB and the other is Pebble? I'm wondering if this should be:
NewTee(a, b Engine) *Tee { ... }And then you'd rename
TeePebbleRocksDBtoTee
In addition to always ordering pebble before rocksdb in any panic text, we also have to pass the specialized iterators down in ClearIterRange (since the code there expects the right specialized type of iterators to be passed in). There were other cases where we had to specialize, like in MVCCScan (where we could efficiently compare kvData if we knew rocks would only return one large buffer), but now that that function is no longer an Iterator method, that complexity is gone.
pkg/storage/engine/tee_pebble_rocksdb.go, line 64 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Let's verify that values are the same.
Done.
pkg/storage/engine/tee_pebble_rocksdb.go, line 74 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
This should run
ExportToSston both engines and compare the result.
Done.
petermattis
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajkr, @itsbilal, and @sumeerbhola)
pkg/storage/engine/tee_pebble_rocksdb.go, line 40 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
In addition to always ordering pebble before rocksdb in any panic text, we also have to pass the specialized iterators down in
ClearIterRange(since the code there expects the right specialized type of iterators to be passed in). There were other cases where we had to specialize, like in MVCCScan (where we could efficiently compare kvData if we knew rocks would only return one large buffer), but now that that function is no longer an Iterator method, that complexity is gone.
For the panic ordering, we could have a convention that the first engine is expected to be correct and the second engine is the one we're checking. That would allow you to display the panic text in the order you desire.
I looked at ClearIterRange and didn't see the specialization you were talking about. Yes, you have to pass the correct iterator to ClearIterRange, but that isn't too difficult.
pkg/storage/engine/tee_pebble_rocksdb.go, line 51 at r2 (raw file):
return err } else if err != nil || err2 != nil { panic(fmt.Sprintf("error mismatch between pebble and rocksdb: %v != %v", err, err2))
Rather than panic, I think this should log.Fatalf. The former can cause tests to tear down in weird ways, while the latter is guaranteed to exit cleanly. Same goes for all of the panics in this file.
pkg/storage/engine/tee_pebble_rocksdb.go, line 843 at r2 (raw file):
} var _ Iterator = &TeePebbleRocksDBIter{}
I think this needs to implement MVCCOpsSpecialized, MVCCGet, and MVCCScan. The pebble version would then call back into mvccGet and mvccScan.
pkg/storage/engine/tee_pebble_rocksdb.go, line 855 at r2 (raw file):
t.pebble.SeekGE(key) t.rocksdb.SeekGE(key) valid, _ := t.Valid()
It is somewhat subtle that this has a side-effect of checking that the validity of both iterators is equal. I think it might be more obvious if you added an explicit TeePebbleRocksDBIter.check method that you call after every positioning method. Then Valid, Unsafe{Key,Value}, etc, can simply pass through to the RocksDB iterator.
pkg/storage/engine/tee_pebble_rocksdb.go, line 961 at r2 (raw file):
start, end roachpb.Key, nowNanos int64, ) (enginepb.MVCCStats, error) { return t.pebble.ComputeStats(start, end, nowNanos)
This needs to compare the results with the RocksDB version.
pkg/storage/engine/tee_pebble_rocksdb.go, line 968 at r2 (raw file):
start, end, minSplitKey roachpb.Key, targetSize int64, ) (MVCCKey, error) { return t.pebble.FindSplitKey(start, end, minSplitKey, targetSize)
This needs to compare the results with the RocksDB version.
pkg/storage/engine/tee_pebble_rocksdb.go, line 975 at r2 (raw file):
sstData []byte, start, end roachpb.Key, ) (enginepb.MVCCStats, error) { return t.pebble.CheckForKeyCollisions(sstData, start, end)
This needs to compare the results with the RocksDB version.
5a02690 to
53246d9
Compare
itsbilal
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajkr, @petermattis, and @sumeerbhola)
pkg/storage/engine/tee_pebble_rocksdb.go, line 40 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
For the panic ordering, we could have a convention that the first engine is expected to be correct and the second engine is the one we're checking. That would allow you to display the panic text in the order you desire.
I looked at
ClearIterRangeand didn't see the specialization you were talking about. Yes, you have to pass the correct iterator toClearIterRange, but that isn't too difficult.
Good point. Though in light of the most recent change, and how MVCCScan / MVCCGet have to treat the two iterators differently, I assume we need to continue to explicitly distinguish the two iterators/engines/etc in the code. Let me know if you still have an opinion on this - we can move towards a more abstract MVCCScan (that calls mvccScanToBytes on both iterators and lets that function deal with the specialization) and so on. The panic / error text convention makes sense as well.
pkg/storage/engine/tee_pebble_rocksdb.go, line 51 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Rather than
panic, I think this shouldlog.Fatalf. The former can cause tests to tear down in weird ways, while the latter is guaranteed to exit cleanly. Same goes for all of thepanicsin this file.
Done. No more panics, save for one instance that is a different kind of should-never-happen error (rocksdb returning multiple slices in MVCCScan, the lack of which lets us simplify some equality code around there).
pkg/storage/engine/tee_pebble_rocksdb.go, line 843 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I think this needs to implement
MVCCOpsSpecialized,MVCCGet, andMVCCScan. The pebble version would then call back intomvccGetandmvccScan.
Done.
pkg/storage/engine/tee_pebble_rocksdb.go, line 855 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
It is somewhat subtle that this has a side-effect of checking that the validity of both iterators is equal. I think it might be more obvious if you added an explicit
TeePebbleRocksDBIter.checkmethod that you call after every positioning method. ThenValid,Unsafe{Key,Value}, etc, can simply pass through to the RocksDB iterator.
Done. Passing it to the pebble iterator in the read path instead, but presumably any inconsistencies would have been caught well before Key() etc get called.
pkg/storage/engine/tee_pebble_rocksdb.go, line 961 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
This needs to compare the results with the RocksDB version.
Done.
pkg/storage/engine/tee_pebble_rocksdb.go, line 968 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
This needs to compare the results with the RocksDB version.
Done.
pkg/storage/engine/tee_pebble_rocksdb.go, line 975 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
This needs to compare the results with the RocksDB version.
Done.
53246d9 to
c1dffb0
Compare
petermattis
left a comment
There was a problem hiding this comment.
modulo my final comment about removing the hardcoding of
*Pebble and *RocksDB.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @ajkr, @itsbilal, and @sumeerbhola)
pkg/storage/engine/tee_pebble_rocksdb.go, line 40 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Good point. Though in light of the most recent change, and how
MVCCScan/MVCCGethave to treat the two iterators differently, I assume we need to continue to explicitly distinguish the two iterators/engines/etc in the code. Let me know if you still have an opinion on this - we can move towards a more abstractMVCCScan(that callsmvccScanToByteson both iterators and lets that function deal with the specialization) and so on. The panic / error text convention makes sense as well.
Yes, we definitely need to know which iterator goes to which engine. Perhaps you're misunderstanding my suggestion. I'm suggesting that you s/pebble/eng1/g, s/rocksDB/eng2 everywhere. You'll still track which iterator comes form eng1 and which comes from eng2. I'm not seeing what the difficulty is in making the handling of MVCCScan / MVCCGet symmetric for the two iterators. Looks like that "should just work", though it is possible I'm missing something and there is a small amount of refactoring to do.
pkg/storage/engine/tee_pebble_rocksdb.go, line 1023 at r3 (raw file):
) (*roachpb.Value, *roachpb.Intent, error) { pebbleValue, pebbleIntent, err := mvccGet(t.ctx, t.pebble, key, timestamp, opts) rocksValue, rocksIntent, err2 := t.rocksDB.(MVCCIterator).MVCCGet(key, timestamp, opts)
I believe this can be mvccGet(t.ctx, t.rocksdb, ...).
pkg/storage/engine/tee_pebble_rocksdb.go, line 1042 at r3 (raw file):
) (kvData [][]byte, numKVs int64, resumeSpan *roachpb.Span, intents []roachpb.Intent, err error) { pebbleKvData, pebbleNumKVs, pebbleResumeSpan, pebbleIntents, err := mvccScanToBytes(t.ctx, t.pebble, start, end, max, timestamp, opts) rocksKvData, rocksNumKVs, rocksResumeSpan, rocksIntents, err2 := t.rocksDB.(MVCCIterator).MVCCScan(start, end, max, timestamp, opts)
I believe this can be mvccScanToBytes(t.ctx, t.rocksdb, ...).
c1dffb0 to
e196e37
Compare
itsbilal
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajkr, @petermattis, and @sumeerbhola)
pkg/storage/engine/tee_pebble_rocksdb.go, line 40 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Yes, we definitely need to know which iterator goes to which engine. Perhaps you're misunderstanding my suggestion. I'm suggesting that you
s/pebble/eng1/g,s/rocksDB/eng2everywhere. You'll still track which iterator comes formeng1and which comes fromeng2. I'm not seeing what the difficulty is in making the handling ofMVCCScan / MVCCGetsymmetric for the two iterators. Looks like that "should just work", though it is possible I'm missing something and there is a small amount of refactoring to do.
Yep, I was understanding it - just wanted to confirm if you still wanted to do it in light of TeeTestEngine.MVCCScan relying on the assumption that one side will return a slice with only one slice inside it. Either way, I've done it, and made some interesting discoveries/compromises along the way:
rocksdb.GetStats()andPreIngestDelay()segfaults inTestDBAddSSTable/store=on-disk, but the Pebble one succeeds, and Rocks also succeeds when the teeing engine is not used. This engine has switched toeng2for those functions since that's usually pebble, but I've also added a todo to investigate this further.- Some tests succeed in isolation but fail when the entire package is run, with errors about mismatched iterator validity or different values in
MVCCGet. Only happens in non-mem instances - may warrant further investigation, but it could just be a result of pebble not cleaning up its temp directory while the rocksdb one gets cleaned up somewhere in between tests? rocksdb.IngestExternalFiles()deletes the files passed in while Pebble's does not, so the RocksDB one has to always be run last.IngestExternalFilesin the teeing engine now checks ifeng1is RocksDB, and runs it last if that's the case.
That should be it. The rest is just a straightforward s/pebble/eng1 and s/rocksdb/eng2 rename. Let me know if this is still good to merge. Thanks for the review!
pkg/storage/engine/tee_pebble_rocksdb.go, line 1023 at r3 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I believe this can be
mvccGet(t.ctx, t.rocksdb, ...).
Done.
pkg/storage/engine/tee_pebble_rocksdb.go, line 1042 at r3 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I believe this can be
mvccScanToBytes(t.ctx, t.rocksdb, ...).
Done.
e196e37 to
ee46d16
Compare
petermattis
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @ajkr, @itsbilal, and @sumeerbhola)
pkg/server/config.go, line 443 at r4 (raw file):
details = append(details, fmt.Sprintf("Pebble cache size: %s", humanizeutil.IBytes(cfg.CacheSize))) pebbleCache = pebble.NewCache(cfg.CacheSize) } else if cfg.StorageEngine == enginepb.EngineTypeRocksDB || cfg.StorageEngine == enginepb.EngineTypeTeePebbleRocksDB {
When we're using the teeing engine, should we halve the cache size used by each engine?
pkg/storage/engine/tee_pebble_rocksdb.go, line 40 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Yep, I was understanding it - just wanted to confirm if you still wanted to do it in light of
TeeTestEngine.MVCCScanrelying on the assumption that one side will return a slice with only one slice inside it. Either way, I've done it, and made some interesting discoveries/compromises along the way:
rocksdb.GetStats()andPreIngestDelay()segfaults inTestDBAddSSTable/store=on-disk, but the Pebble one succeeds, and Rocks also succeeds when the teeing engine is not used. This engine has switched toeng2for those functions since that's usually pebble, but I've also added a todo to investigate this further.- Some tests succeed in isolation but fail when the entire package is run, with errors about mismatched iterator validity or different values in
MVCCGet. Only happens in non-mem instances - may warrant further investigation, but it could just be a result of pebble not cleaning up its temp directory while the rocksdb one gets cleaned up somewhere in between tests?rocksdb.IngestExternalFiles()deletes the files passed in while Pebble's does not, so the RocksDB one has to always be run last.IngestExternalFilesin the teeing engine now checks ifeng1is RocksDB, and runs it last if that's the case.That should be it. The rest is just a straightforward s/pebble/eng1 and s/rocksdb/eng2 rename. Let me know if this is still good to merge. Thanks for the review!
If you're not going to tackle the TODOs immediately, make sure to file issues about them so they don't get lsot.
pkg/storage/engine/tee_pebble_rocksdb.go, line 1 at r4 (raw file):
// Copyright 2019 The Cockroach Authors.
Nit: s/tee_pebble_rocksdb.go/tee.go/g
pkg/storage/engine/tee_pebble_rocksdb.go, line 31 at r4 (raw file):
// This engine is only meant to be used in testing. No performance or // stability guarantees are made about this engine in production. type TeeTestEngine struct {
Nit: while this is only used for testing, I don't think the term Test needs to appear in the name. s/TeeTestEngine/TeeEngine/g.
This change adds a new Engine implementation: TeeTestEngine, as well as associated objects (batch, snapshots, files, iterators). When engine type EngineTypeTeePebbleRocksDB is specified, both pebble and rocksdb instances are spun up and passed to this engine. It writes to both of them in all write paths, and compares their outputs in the read path. A panic is thrown if there's a mismatch. This engine can be used in practice with `--storage-engine pebble+rocksdb`, or the related env variable `COCKROACH_STORAGE_ENGINE` as in `COCKROACH_STORAGE_ENGINE=pebble+rocksdb make test ...`. Fixes cockroachdb#42381. Release note: None
ee46d16 to
2782e89
Compare
itsbilal
left a comment
There was a problem hiding this comment.
Thanks for the review!
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @ajkr, @itsbilal, @petermattis, and @sumeerbhola)
pkg/server/config.go, line 443 at r4 (raw file):
Previously, petermattis (Peter Mattis) wrote…
When we're using the teeing engine, should we halve the cache size used by each engine?
I'd say no - the current setup lets us test how each engine would work on its own with the same parameters, at the cost of twice the memory utilization which is fine since we make no performance guarantees. Happy to change it either way though.
pkg/storage/engine/tee_pebble_rocksdb.go, line 1 at r4 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Nit:
s/tee_pebble_rocksdb.go/tee.go/g
Done.
pkg/storage/engine/tee_pebble_rocksdb.go, line 31 at r4 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Nit: while this is only used for testing, I don't think the term
Testneeds to appear in the name.s/TeeTestEngine/TeeEngine/g.
Done.
|
CI is green, merging for now. Will file follow-up issues on the discrepancies. Thanks! bors r+ |
41806: docs/RFCs: draft RFC for protected timestamps r=nvanbenschoten a=ajwerner This commit provides a draft RFC for a subsystem to protect values in spans of data alive at specific timestamps from being garbage collected. Release note: None 42053: changefeedccl: add scan boundaries based on change in set of columns r=danhhz a=aayushshah15 Currently, the changefeed poller detects a scan boundary when detects that the last version of a table descriptor has a mutation but the current version doesn't. In case of an `ALTER TABLE DROP COLUMN` statement, the point at which this happens is the point at which the schema change backfill completes. This is incorrect since the dropped column is logically dropped before this point. This PR corrects this problem by instead checking that the last version of the descriptor has a mutation AND that the number of columns in the current table descriptor is different from the number of columns in the last table descriptor. Fixes #41961 Release note (bug fix): Changefeeds now emit backfill row updates for dropped column when the table descriptor drops that column. 42494: storage: Implement teeing Engine to test/compare pebble and rocksdb r=itsbilal a=itsbilal This change adds a new Engine implementation: TeePebbleRocksDB, as well as associated objects (batch, snapshots, files, iterators). This engine type spins up instances of both pebble and rocksdb under the hood, writes to both of them in all write paths, and compares their outputs in the read path. A panic is thrown if there's a mismatch. This engine can be used in practice with `--storage-engine pebble+rocksdb`, or the related env variable `COCKROACH_STORAGE_ENGINE` as in `COCKROACH_STORAGE_ENGINE=pebble+rocksdb make test ...`. Fixes #42381. Release note: None 42635: importccl: don't diasable nullif option when escape character is spec… r=spaskob a=spaskob …ified By default the string 'NULL' is parsed as null value in DELIMITED IMPORT mode. It's usefull to treat 'NULL' as just a normal string and parse it verbatim; this can be accomplished by providing option `WITH fields_escaped_by = '\'` and using '\N' to specify null values and so in this case the parser upon seeing the string 'NULL' will treat it as a normal string. However if the customer provides their own null string via `WITH nullif = 'my_null'` it does not make sense to disregard it if they use escaping as well for some other purposes, for example escaping the field separator. A typical use case is when the empty string should be treated as null. Fixes: https://github.com/cockroachlabs/support/issues/345. Release note (bug fix): when custom nullif is provided always treat it as null. Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com> Co-authored-by: Aayush Shah <aayush.shah15@gmail.com> Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com> Co-authored-by: Spas Bojanov <spas@cockroachlabs.com>
petermattis
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @ajkr, @itsbilal, and @sumeerbhola)
pkg/server/config.go, line 443 at r4 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
I'd say no - the current setup lets us test how each engine would work on its own with the same parameters, at the cost of twice the memory utilization which is fine since we make no performance guarantees. Happy to change it either way though.
Yeah, that is a good point. This is fine for now.
Build succeeded |
This change adds a new Engine implementation: TeePebbleRocksDB, as well
as associated objects (batch, snapshots, files, iterators). This engine
type spins up instances of both pebble and rocksdb under the hood, writes
to both of them in all write paths, and compares their outputs in the
read path. A panic is thrown if there's a mismatch.
This engine can be used in practice with
--storage-engine pebble+rocksdb, orthe related env variable
COCKROACH_STORAGE_ENGINEas inCOCKROACH_STORAGE_ENGINE=pebble+rocksdb make test ....Fixes #42381.
Release note: None