[DNM]: gc garbage generator and KVDebug service #18661

Closed

tbg wants to merge 1 commit into cockroachdb:master from
Conversation
tbg (Member) commented:
This PR shows the tooling I used to [stress test the GC queue]. In short, I needed a way to put
a large number of intents on a single range; I didn't particularly care to do this on a multi-node
cluster, but I needed to do it efficiently for quick turnaround (and also to prevent the GC queue
from cleaning up my garbage faster than I could insert it).
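The last point is just a rate race: the generator only accumulates garbage if it outpaces the GC queue. A toy stdlib-only model (all names and numbers invented for illustration; this is not the actual generator):

```go
package main

import "fmt"

// garbageAfter models a writer laying down garbage at one rate while a GC
// pass collects at another, over a number of ticks. Garbage accumulates only
// when the write rate exceeds the GC rate.
func garbageAfter(ticks, writePerTick, gcPerTick int) int {
	garbage := 0
	for i := 0; i < ticks; i++ {
		garbage += writePerTick
		if garbage < gcPerTick {
			garbage = 0 // GC caught up entirely this tick
		} else {
			garbage -= gcPerTick
		}
	}
	return garbage
}

func main() {
	fmt.Println(garbageAfter(100, 10, 3)) // generator outpaces GC: garbage grows
	fmt.Println(garbageAfter(100, 3, 10)) // GC outpaces generator: stays at zero
}
```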
This was also a good opportunity to investigate "better" debugging tools and to revisit the
`ExternalServer` interface, which historically has been the KV store we once wanted to expose to
clients. It has since become internal and is technically slated for removal, but at the same time it
has seen continued use. The reasons for keeping (something like it) are:
1. debugging running clusters that are potentially wedged due to invalid KV data: being able to read
transaction entries and raw KV data that the SQL layer does not expect.
2. creating, in our testing, problematic conditions that are unattainable through the public
interfaces (artificial GC pressure being one example).
I also think there's a case for adding functionality such as being able to force a
Range to run garbage collection, etc., though that's out of scope here.
In this PR, I've sketched out a TxnCoordSender-level entry point that is tied to a bidirectional
streaming connection. This has the advantage that a context is available whose lifetime
is tied to the connection, which means that `TxnCoordSender` can base its transaction heartbeats
on it (this is not to suggest that we should be running serious transactions through this
interface, but it establishes parity and, assuming that `client.NewSender` went through this
endpoint instead, `TxnCoordSender` could be simplified to always use the incoming context). There is
more subtlety in this topic since we want to [merge] `TxnCoordSender` and `client.{DB,Txn}`, though,
so don't take this as a concrete suggestion.
What's been more immediately useful is a fairly low-level endpoint that allows evaluating a
`BatchRequest` on any given `Replica` (bypassing the command queue, etc.) and seeing the results.
More controversially, and important for `gcpressurizer`, is the ability to *execute* these batches,
something that's quite dangerous in the wrong hands due to the potential for creating inconsistency
and its insufficient synchronization with splits, etc. I think that's the part worth exploring,
since it's a universally useful last resort when things go wrong and visibility into on-disk state
is desired without shutting down the node.
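The evaluate/execute split can be illustrated with a toy store (every type and name below is invented; a real `BatchRequest` against a `Replica` is far richer): evaluation runs the batch against a copy of the state so results are visible but nothing changes, while execution mutates the state directly.

```go
package main

import "fmt"

// kv stands in for a replica's state; req for a single request in a batch.
type kv map[string]string
type req struct{ op, key, val string } // op: "put" or "get"

func apply(store kv, batch []req) []string {
	var out []string
	for _, r := range batch {
		switch r.op {
		case "put":
			store[r.key] = r.val
			out = append(out, "ok")
		case "get":
			out = append(out, store[r.key])
		}
	}
	return out
}

// evalBatch runs the batch against a copy: the caller sees the results, but
// the store is untouched -- the safe, read-only debugging mode.
func evalBatch(store kv, batch []req) []string {
	tmp := kv{}
	for k, v := range store {
		tmp[k] = v
	}
	return apply(tmp, batch)
}

// execBatch applies the batch to the store itself -- the dangerous mode, as
// nothing here coordinates with concurrent activity.
func execBatch(store kv, batch []req) []string { return apply(store, batch) }

func main() {
	store := kv{"a": "1"}
	fmt.Println(evalBatch(store, []req{{"put", "a", "2"}, {"get", "a", ""}}), store["a"])
	fmt.Println(execBatch(store, []req{{"put", "a", "2"}, {"get", "a", ""}}), store["a"])
}
```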
Long story short, I have this code and it's definitely not something to check in, but to discuss.
It'd be nice to programmatically test the GC queue in that way, and perhaps randomly "pollute"
some of our test clusters in ever-escalating ways, to improve their resilience.
[stress test the GC queue]: cockroachdb#9540
[merge]: cockroachdb#16000
tbg added a commit to tbg/cockroach that referenced this pull request on Apr 16, 2018:
A simple data generator that makes a single large range. The created dataset can then be used for various tests, for example to exercise issues such as the ones ultimately leading to cockroachdb#20589, or to make sure [large snapshots] work (once implemented).

This is a work in progress because I haven't reached clarity on the best way to hook things up in these tests. Do we want to create the datasets and upload them somewhere? That has been fragile in the past, as the upload process usually gets seldom exercised and thus rots. The alternative (which I'm leaning towards) is to bundle this binary with the test code (either explicitly or via use as a library) and create fresh test data every time (these tests would run as nightlies, so dataset generation speed isn't the top concern).

In making these decisions, we should also take into account more involved datasets that can't as easily be generated from a running cluster, such as [gcpressurizer]. For those, my current take is that we'll just generate an initialized data dir, open the resulting RocksDB instance manually again, and write straight into it (via some facility that updates stats correctly, i.e. presumably `MVCCPut` and friends).

Release note: None

[large snapshots]: cockroachdb#16954
[gcpressurizer]: cockroachdb#18661