Skip to content

storage: max number of intents in WriteIntentError during scan#65465

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
aliher1911:add-scan-intent-limit
Jun 1, 2021
Merged

storage: max number of intents in WriteIntentError during scan#65465
craig[bot] merged 1 commit intocockroachdb:masterfrom
aliher1911:add-scan-intent-limit

Conversation

@aliher1911
Copy link
Copy Markdown
Contributor

@aliher1911 aliher1911 commented May 19, 2021

Previously consistent scan operation could return all encountered
intents from a range which could lead to process running out of memory.
This was the case when trying to cleanup built up intents by a high pri
query.

To address this, limit is added to the scan, to only return so much
intents before aborting.

This limit is controlled by cluster setting
storage.mvcc.max_intents_per_error which is replacing previously
available setting storage.sst_export.max_intents_per_error. New
setting will cover both scan and export commands.

Release note (ops change): Added limit to number of intents
collected by scan before aborting.

@aliher1911 aliher1911 requested review from nvb and sumeerbhola May 19, 2021 15:37
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@aliher1911
Copy link
Copy Markdown
Contributor Author

Some context.

Limit is added as env var as
a) we are not likely to change it much
b) currently mvcc doesn't have easy access to cluster config same way as sst export does. refactoring scanner to be a part of reader (similar to mvcciterator) could solve the problem, but that would be hard to backport to older versions.

Copy link
Copy Markdown
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 3 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @nvanbenschoten)


pkg/storage/pebble_mvcc_scanner.go, line 453 at r1 (raw file):

			// that lie before the resume key.
			return false
		}

the code here looks peculiar. It is checking p.results.count but adding to p.intents, so it seems there isn't a limit on p.intents.Count.

@aliher1911 aliher1911 force-pushed the add-scan-intent-limit branch from 760f3bb to 1ec07a9 Compare May 19, 2021 20:03
@aliher1911
Copy link
Copy Markdown
Contributor Author


pkg/storage/pebble_mvcc_scanner.go, line 453 at r1 (raw file):

Previously, sumeerbhola wrote…

the code here looks peculiar. It is checking p.results.count but adding to p.intents, so it seems there isn't a limit on p.intents.Count.

That's the reason I didn't apply limit to inconsistent scans. It should return resumespan which should be after last value key, but should have no intents. Not sure who expects it though as there are quite a lot of call sites ending up in the scan.
To make it work we need to rewind to last key once we hit intent limit or something.

Copy link
Copy Markdown
Contributor

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 3 files at r1, 2 of 2 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @sumeerbhola)


pkg/storage/mvcc.go, line 83 at r2 (raw file):

	}())

// maxIntentsPerScanError is maximum number of intents returned in

Is there a reason to keep this separate from the storage.sst_export.max_intents_per_error cluster setting? I was imagining that we'd generalize that a little bit. For instance, rename it to storage.mvcc.max_intents_per_error and use it in both places.

If the concern is that we don't want to plumb cluster settings down here then we could explore plumbing it through the MVCCScanOptions.


pkg/storage/mvcc_test.go, line 670 at r2 (raw file):

			// Limit number of intents per error in consistent scan.
			maxIntentsPerScanError = 2

Setting a global variable like this in a test is an anti-pattern. The effect is that any test that runs after this one in this package will now see the new value, which is not desirable.


pkg/storage/mvcc_test.go, line 696 at r2 (raw file):

					expIntents: []roachpb.Intent{
						roachpb.MakeIntent(&txn2ts.TxnMeta, testKey4),
						roachpb.MakeIntent(&txn2ts.TxnMeta, testKey5),

Is this test case hitting the limit? Wouldn't we want to make txn2 write 3 intents and show that only two are returned?


pkg/storage/mvcc_test.go, line 710 at r2 (raw file):

				},
				{
					name:       "inconsistent--all-keys",

nit: extra -?


pkg/storage/mvcc_test.go, line 734 at r2 (raw file):

					if wiErr == nil != !scan.consistent {
						t.Errorf("expected write intent error; got %s", err)
						return

Now that you're using a subtest, you can change the t.Errorf to t.Fatalf and you won't have to worry about returning.

@aliher1911 aliher1911 force-pushed the add-scan-intent-limit branch from 1ec07a9 to 9e0c010 Compare May 20, 2021 09:16
Copy link
Copy Markdown
Contributor Author

@aliher1911 aliher1911 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)


pkg/storage/mvcc.go, line 83 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is there a reason to keep this separate from the storage.sst_export.max_intents_per_error cluster setting? I was imagining that we'd generalize that a little bit. For instance, rename it to storage.mvcc.max_intents_per_error and use it in both places.

If the concern is that we don't want to plumb cluster settings down here then we could explore plumbing it through the MVCCScanOptions.

I totally agree with this approach to reuse the same cluster setting and maybe rename it to reflect its broader use. MVCCScanOptions are created in multiple call sites:

pkg/kv/kvserver/replica_raftstorage.go:242:
pkg/kv/kvserver/store.go:1338:
pkg/kv/kvserver/abortspan/abortspan.go:107:
pkg/kv/kvserver/batcheval/cmd_reverse_scan.go:42:
pkg/kv/kvserver/batcheval/cmd_refresh_range.go:62:
pkg/kv/kvserver/batcheval/cmd_scan.go:42:
pkg/kv/kvserver/gc/gc.go:500:
pkg/kv/kvserver/gc_queue_test.go:898:
pkg/cli/debug.go:704:
pkg/cli/debug_check_store.go:241:
pkg/storage/metamorphic/operations.go:366:
pkg/storage/mvcc.go:2326:
pkg/storage/mvcc.go:2618:

and i don't think most of them has immediate access to settings either. They already operate on storage.Reader at that point.
Second thing that I don't like about putting that to options is that this is a safety feature and it should work regardless of scan request options because it is easy to misuse otherwise and pushes burden on the scan user.
Imho to make it proper we could do something similar to what was done with mvcc iterator. Make it specific to underlying storage (it is already is because it is even called pebbleMVCCScanner) and obtain it from reader same way as mvcciterator. In that case we can reuse option and let storage control this in uniform way.
But that would make it hard to backport to 20.2 where it is needed because of rocksdb storage and different code there.
Maybe the right way is to have an env in 20.2 and make it different 21.1 and on with scanner being part of reader contract?


pkg/storage/mvcc_test.go, line 670 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Setting a global variable like this in a test is an anti-pattern. The effect is that any test that runs after this one in this package will now see the new value, which is not desirable.

Done.


pkg/storage/mvcc_test.go, line 696 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is this test case hitting the limit? Wouldn't we want to make txn2 write 3 intents and show that only two are returned?

This particular one was not. Added another transaction to the mix.
Done.


pkg/storage/mvcc_test.go, line 710 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: extra -?

Done.

@aliher1911 aliher1911 force-pushed the add-scan-intent-limit branch 2 times, most recently from faf2440 to 4c1cec2 Compare May 20, 2021 10:09
Copy link
Copy Markdown
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/storage/pebble_mvcc_scanner.go, line 453 at r1 (raw file):

Previously, aliher1911 (Oleg) wrote…

That's the reason I didn't apply limit to inconsistent scans. It should return resumespan which should be after last value key, but should have no intents. Not sure who expects it though as there are quite a lot of call sites ending up in the scan.
To make it work we need to rewind to last key once we hit intent limit or something.

Yes, the "intents should only correspond to KVs that lie before the resume key" seems to tie our hands. Can you add a TODO here that this means that if we see a value, followed by N intents, where N >> maxIntents, we have to keep adding those intents until we see the next value, and this could cause OOMs. We should remember to fix this. I don't know how widespread a change will be needed to fix this since I don't know what semantics we desire from inconsistent scans. Looking at the code here, it seems that if we find an intent for a key, we also want to add a value (if one exists for that key) below that intent. So if all we need is completeness in terms of a resume key representing that all intents and values before that key (for forward scans) have been included, we could fix the enforcement logic locally and that would be sufficient (@nvanbenschoten for clarification).

And looking more closely, I'm not sure this "now we're adding the resume key" comment is correct. If I am reading correctly, the only place where we do maxKeys enforcement is in addAndAdvance and in all those cases we immediately unwind out of getAndAdvance up to the top-level of scan which is calling getAndAdvance in a loop. Once the maxKeys is reached, that loop terminates and we use advanceKey to find the resume key, not getAndAdvance, where advanceKey doesn't care whether something is an intent or not (it just wants the roachpb.Key, and not the timestamp part).

So one relatively clean solution for maxKeys and maxIntents enforcement for inconsistent scan could be: look at maxIntents in the code here, and if exceeded, call seekVersion as we do below, but regardless of what value that seekVersion returns, return false from getAndAdvance. This should work since seekVersion will either add a value, or there is no value to add. Either way, we have a safe stopping point.

If I am incorrect of the desired semantics a less complete fix would be: in addAndAdvance after adding a value, if maxIntents has been exceeded, return false. This is because a value addition is always a safe stopping point before we generate the resume span using advanceKey.

Copy link
Copy Markdown
Contributor

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)


pkg/storage/mvcc.go, line 83 at r2 (raw file):
@sumeerbhola do you have thoughts about this? Making this specific to the underlying storage feels strange to me, as does using the storage.Reader to access cluster settings. How much context do we expect these storage-level interfaces to have? Would we want to expose a Settings method on storage.Reader/Writer?

it is already is because it is even called pebbleMVCCScanner

Despite the name, pebbleMVCCScanner is actually not a specific pebble implementation. It operates on the MVCCIterator interface.


pkg/storage/pebble_mvcc_scanner.go, line 453 at r1 (raw file):
This analysis (And looking more closely, ...) and the proposed solution (So one relatively clean solution ...) all appear correct to me.

Although,

This should work since seekVersion will either add a value, or there is no value to add.

If there is no value to add at this key, this will mean that the iterator will be positioned at the next key, right? So then will the call to advanceKey outside of the getAndAdvance loop iterate again and skip over the next key?

That's the reason I didn't apply limit to inconsistent scans.

I don't think the maxIntents limit should be applied to inconsistent scans. Such scans treat intents similar to committed values and don't return WriteIntentErrors. So if anything, we should apply the maxKeys limit to inconsistent scans.

Copy link
Copy Markdown
Contributor Author

@aliher1911 aliher1911 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)


pkg/storage/mvcc.go, line 83 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

@sumeerbhola do you have thoughts about this? Making this specific to the underlying storage feels strange to me, as does using the storage.Reader to access cluster settings. How much context do we expect these storage-level interfaces to have? Would we want to expose a Settings method on storage.Reader/Writer?

it is already is because it is even called pebbleMVCCScanner

Despite the name, pebbleMVCCScanner is actually not a specific pebble implementation. It operates on the MVCCIterator interface.

Who do you think should be responsible for enforcing the limit in general?
It is either command or the underlying resource in my understanding. We have multiple call sites/code paths that end up in scan. Should they be responsible for pushing the limit here? In that case we should push that up the the request itself and propagate through options down. Alternatively if it is resources responsibility to maintain system health then they should set those limits.
I think it is latter is better because having commands that can break system if they forgot to set limits is bad.
As for the option itself, currently commands operate on reader using request data once they pass down the replica_evaluate and they don't have access to replica or other server components that has notion of configs.
For export command I used information in pebble since export is a method on reader itself and it has all the config references, but scan is better abstracted and it only relies on iterator on reader to do the scan.


pkg/storage/pebble_mvcc_scanner.go, line 453 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This analysis (And looking more closely, ...) and the proposed solution (So one relatively clean solution ...) all appear correct to me.

Although,

This should work since seekVersion will either add a value, or there is no value to add.

If there is no value to add at this key, this will mean that the iterator will be positioned at the next key, right? So then will the call to advanceKey outside of the getAndAdvance loop iterate again and skip over the next key?

That's the reason I didn't apply limit to inconsistent scans.

I don't think the maxIntents limit should be applied to inconsistent scans. Such scans treat intents similar to committed values and don't return WriteIntentErrors. So if anything, we should apply the maxKeys limit to inconsistent scans.

If we apply maxKeys, I assume we want to count results and intents together for that purpose. That would make more sense to me as it is even called keys not max values.

@aliher1911
Copy link
Copy Markdown
Contributor Author


pkg/storage/pebble_mvcc_scanner.go, line 453 at r1 (raw file):

Previously, aliher1911 (Oleg) wrote…

If we apply maxKeys, I assume we want to count results and intents together for that purpose. That would make more sense to me as it is even called keys not max values.

What is confusing to me in this check is how could we get even get here with this check failing. getAndAdvance would always return values with addAndAdvance which will perform this check and make getAndAdvance return false and break the scan loop. So we would not get here with those checks passing as true. But I'm sure there's some corner case here which I don't see.

@aliher1911
Copy link
Copy Markdown
Contributor Author


pkg/storage/pebble_mvcc_scanner.go, line 453 at r1 (raw file):

Previously, aliher1911 (Oleg) wrote…

What is confusing to me in this check is how could we get even get here with this check failing. getAndAdvance would always return values with addAndAdvance which will perform this check and make getAndAdvance return false and break the scan loop. So we would not get here with those checks passing as true. But I'm sure there's some corner case here which I don't see.

Now that i traced thing myself I see what Sumeer is talking about the same thing. 🗡️

@aliher1911 aliher1911 force-pushed the add-scan-intent-limit branch from 4c1cec2 to 5ece185 Compare May 27, 2021 12:55
Copy link
Copy Markdown
Contributor Author

@aliher1911 aliher1911 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the way intent limits are calculated for inconsistent mode. Looking on what we do we iterate certain number of keys and return them regardless if they are values or intents so that we can resolve them before resuming. In that case it doesn't make much sense to count them separately, we only want to count keys and stop once we find desired number of keys among intents and keys.
That would prevent cases where we could collect lots of intents where we don't have a value in the middle and respect the limits that caller set.

The question with how to provide configuration (if at all) for WriteIntentError's is still open. I don't think there's currently an easy way, but do we really need to configure this value? Maybe current "hardcoded" approach would cut it with emergency override vie env?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)


pkg/storage/pebble_mvcc_scanner.go, line 453 at r1 (raw file):

Previously, aliher1911 (Oleg) wrote…

Now that i traced thing myself I see what Sumeer is talking about the same thing. 🗡️

So I tried a couple of things and that's the result so far.
So simple solution was not enough because of how logic to set the resume span interacts with seekVersion. Seek version would always push to next key if it can't find value. If it can find value it would stay on the same key if we reached limits or it advances if it doesn't.
I decided that should be explicit then, if we stop at intent then we don't advance second time and just check that underlying iterator is valid. That should cover both cases.

Copy link
Copy Markdown
Contributor

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach here looks good. However, I am starting to think that we should split out the pagination of intents in a WriteIntentError from the pagination of intents during an inconsistent scan. These are largely different concerns, and there's a good reason not to combine them into a single PR. To start, the latter change here is making this PR significantly more complex, to the point where we would not feel comfortable backporting this. The code here around counting keys and tracking where the iterator is pointed is just too subtle, and we've seen bugs in this area in the past. We'll also want to check that the latter change is needed (does anyone issue inconsistent scans with limits?) and, if so, that these users are properly written to handle pagination. Finally, I think there's still some uncertainty about the exact semantics we want key limits applied to inconsistent scans to have. Right now, each distinct key that has either a value, an intent, or both, is counted against the limit. There are good reasons for that from an implementation perspective, but do we know that this is what callers want? If I'm issuing an inconsistent scan, am I applying a limit to bound the memory size of the result set, or am I applying a limit to bound the span of keys scanned to N committed values (plus any intents I see along the way)? I don't know the answer to that question, but I could see callers wanting both. So we should understand how this is being used today.

For all of these reasons, I think we'd be better served splitting up the changes.

Reviewed 1 of 2 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @nvanbenschoten, and @sumeerbhola)


pkg/storage/mvcc.go, line 83 at r2 (raw file):

I think it is latter is better because having commands that can break system if they forgot to set limits is bad.

On the other hand, if a request did not set a key limit and scanned over a keyspace with 10 million committed keys, it would also cause issues. So I don't know that this logic holds in all cases.

As for the option itself, currently commands operate on reader using request data once they pass down the replica_evaluate and they don't have access to replica or other server components that has notion of configs.

Commands have access to the replica and the settings through the CommandArgs.EvalContext.


pkg/storage/mvcc_test.go, line 1461 at r4 (raw file):

}

func TestMVCCScanInconsistentLimits(t *testing.T) {

Let's give this a comment.


pkg/storage/mvcc_test.go, line 1473 at r4 (raw file):

			ts1 := hlc.Timestamp{WallTime: 1}
			ts2 := hlc.Timestamp{WallTime: 2}
			//ts3 := hlc.Timestamp{WallTime: 3}

nit: remove the unused vars.


pkg/storage/pebble_mvcc_scanner.go, line 118 at r4 (raw file):

	targetBytes int64
	// Stop adding intents and abort scan once maxIntents threshold is reached.
	// Ignored if zero or if doing inconsistent scan.

Mention why it is ignored if doing an inconsistent scan. Something about how the maxKeys limit also applies to intent in that case.


pkg/storage/pebble_mvcc_scanner.go, line 156 at r4 (raw file):

	// [1, maxItersBeforeSeek] and defaults to maxItersBeforeSeek/2 .
	itersBeforeSeek int
	// Number of keys collected as values and intents in inconsistent mode.

s/and intents in inconsistent mode/or intents (if in inconsistent mode)/


pkg/storage/pebble_mvcc_scanner.go, line 158 at r4 (raw file):

	// Number of keys collected as values and intents in inconsistent mode.
	// This is different from number of results plus number of intents as we could
	// have intent without value and value without intent as well as both.

"have intent without a value, value without an intent, or keys with both"


pkg/storage/pebble_mvcc_scanner.go, line 160 at r4 (raw file):

	// have intent without value and value without intent as well as both.
	collectedKeys int64
	// This flag indicates that last iteration collected only intent, but not

s/intent/an intent/ throughout.


pkg/storage/pebble_mvcc_scanner.go, line 162 at r4 (raw file):

	// This flag indicates that last iteration collected only intent, but not
	// a preceding value for it. It is set whenever intent is discovered and reset
	// back if corresponding value is present. It is used to calculate number of

s/number/the number/

Copy link
Copy Markdown
Contributor Author

@aliher1911 aliher1911 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the second part didn't work out. Some tests failed and they seem to rely on particular behaviour like if you scan hits intent then it expects to get a subsequent key as a result if it is under limit, so the whole idea that maxkeys could just limit what we return seem flawed. I think scanner behaviour is defined vaguely, so let's not touch it then and limit it to writeintenterror.
Let me revert all the parts that are not relevant to that and I'll push option to config via context.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @nvanbenschoten, and @sumeerbhola)

Copy link
Copy Markdown
Contributor Author

@aliher1911 aliher1911 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)


pkg/storage/mvcc.go, line 83 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think it is latter is better because having commands that can break system if they forgot to set limits is bad.

On the other hand, if a request did not set a key limit and scanned over a keyspace with 10 million committed keys, it would also cause issues. So I don't know that this logic holds in all cases.

As for the option itself, currently commands operate on reader using request data once they pass down the replica_evaluate and they don't have access to replica or other server components that has notion of configs.

Commands have access to the replica and the settings through the CommandArgs.EvalContext.

Changed to use shared cluster setting.

@aliher1911 aliher1911 force-pushed the add-scan-intent-limit branch 5 times, most recently from 216e8f5 to c9ad0c3 Compare May 28, 2021 07:59
Copy link
Copy Markdown
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. A few minor comments.

Reviewed 4 of 5 files at r5, 6 of 6 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @nvanbenschoten, and @sumeerbhola)


pkg/storage/mvcc.go, line 83 at r2 (raw file):

Previously, aliher1911 (Oleg) wrote…

Unfortunately

I missed the earlier ping. It seems there are arguments for both ways.
Given this case is one where we are returning an error, and this is just limiting the size of the error, it sounds like a reasonable choice to have it here. Things with resume spans are different -- the caller has to understand the partial result semantics, unlike this error case where the incompleteness of the intent list is irrelevant to the caller.


pkg/storage/mvcc.go, line 2529 at r6 (raw file):

	// consistent mode. Scanning will stop without reaching MaxKeys if this
	// threshold is reached.
	// For inconsistent scans number of intents is limited by number of keys

Is this comment about inconsistent scans still relevant, given the discussion about not making any change to it in this PR?


pkg/storage/pebble.go, line 81 at r6 (raw file):

	"storage.sst_export.max_intents_per_error",
	"maximum number of intents returned in error when sst export fails",
	maxIntentsPerSstExportErrorDefault)

what's our story with changing names of settings? Was the PR that introduced this backported to various versions? Is this fine since this was not public? I've seen warnings in logs before when playing around with settings names in PRs, so I wonder if this change could worry users who see unnecessary warnings.


pkg/storage/pebble_mvcc_scanner.go, line 122 at r6 (raw file):

	// intents as an error. For inconsistent scans number of intents is limited
	// by number of keys scan requests and clients should rely on pagination
	// to sensibly process large ranges.

I find the "For inconsistent scans ..." phrasing confusing. There is no limit on intents there. Whatever limit happens is just a side-effect of counting non-intent keys, and stopping after maxKeys, yes?

Copy link
Copy Markdown
Contributor Author

@aliher1911 aliher1911 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @nvanbenschoten, and @sumeerbhola)


pkg/storage/mvcc.go, line 2529 at r6 (raw file):

Previously, sumeerbhola wrote…

Is this comment about inconsistent scans still relevant, given the discussion about not making any change to it in this PR?

I think I can improve the comment, but the discussion was that we need to mention inconsistent scans for completeness here.


pkg/storage/pebble.go, line 81 at r6 (raw file):

Previously, sumeerbhola wrote…

what's our story with changing names of settings? Was the PR that introduced this backported to various versions? Is this fine since this was not public? I've seen warnings in logs before when playing around with settings names in PRs, so I wonder if this change could worry users who see unnecessary warnings.

@nvanbenschoten can you elaborate? I don't think this option was used by anyone yet and I don't think we encourage tweaking this option unless absolutely necessary. Other option would be not to rename and reuse the same setting, but that would be confusing and harder to find. Or we can have another one, which will also add complexity as you may need to cap both of those values to reduce memory usage in case of intent buildup.


pkg/storage/pebble_mvcc_scanner.go, line 122 at r6 (raw file):

Previously, sumeerbhola wrote…
	// intents as an error. For inconsistent scans number of intents is limited
	// by number of keys scan requests and clients should rely on pagination
	// to sensibly process large ranges.

I find the "For inconsistent scans ..." phrasing confusing. There is no limit on intents there. Whatever limit happens is just a side-effect of counting non-intent keys, and stopping after maxKeys, yes?

That is correct. Caller gets as many intents as fits in between keys. Let me try to make it more clear.

@aliher1911 aliher1911 force-pushed the add-scan-intent-limit branch from c9ad0c3 to 1c4caef Compare May 28, 2021 16:09
Copy link
Copy Markdown
Contributor Author

@aliher1911 aliher1911 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @nvanbenschoten, and @sumeerbhola)


pkg/storage/mvcc.go, line 2529 at r6 (raw file):

Previously, aliher1911 (Oleg) wrote…

I think I can improve the comment, but the discussion was that we need to mention inconsistent scans for completeness here.

Done.


pkg/storage/pebble_mvcc_scanner.go, line 122 at r6 (raw file):

Previously, aliher1911 (Oleg) wrote…

That is correct. Caller gets as many intents as fits in between keys. Let me try to make it more clear.

Done.

Copy link
Copy Markdown
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r7.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aliher1911, @nvanbenschoten, and @sumeerbhola)

Copy link
Copy Markdown
Contributor

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: looks great!

Reviewed 3 of 5 files at r5, 5 of 6 files at r6, 2 of 2 files at r7.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @aliher1911 and @sumeerbhola)


pkg/storage/mvcc.go, line 2529 at r6 (raw file):

Previously, aliher1911 (Oleg) wrote…

Done.

Let's make a note that MaxIntents is a maximum number of intents collected by scanner in consistent mode before returning a WriteIntentError.


pkg/storage/mvcc.go, line 80 at r7 (raw file):

// MaxIntentsPerWriteIntentError sets maximum number of intents returned in
// WriteIntentError in operations that return multiple intents per error.
// Currently it is used in Scan and ExportToSST.

Scan, ReverseScan, and ExportToSST.


pkg/storage/pebble.go, line 81 at r6 (raw file):

Previously, aliher1911 (Oleg) wrote…

@nvanbenschoten can you elaborate? I don't think this option was used by anyone yet and I don't think we encourage tweaking this option unless absolutely necessary. Other option would be not to rename and reuse the same setting, but that would be confusing and harder to find. Or we can have another one, which will also add complexity as you may need to cap both of those values to reduce memory usage in case of intent buildup.

Usually, the story is that we try not to rename settings and we add the deleted setting name to the retiredSettings set to avoid ever creating them again. I agree that we don't expect anyone to have ever set storage.sst_export.max_intents_per_error, especially if we backport this to 20.2 and 21.1 when we can, so I don't think there's much of a concern that someone sets it and then its no longer working after an upgrade. My suggestion is to add it to retiredSettings and proceed with the rename.

Previously consistent scan operation could return all encountered
intents from a range which could lead to process running out of memory.
This was the case when trying to cleanup built up intents by a high pri
query.

To address this, limit is added to the scan, to only return so much
intents before aborting.

This limit is controlled by cluster setting
storage.mvcc.max_intents_per_error which is replacing previously
available setting storage.sst_export.max_intents_per_error. New
setting will cover both scan and export commands.

Release note (ops change): Added limit to number of intents
collected by scan before aborting.
@aliher1911 aliher1911 force-pushed the add-scan-intent-limit branch from 1c4caef to 75bddec Compare May 31, 2021 19:03
Copy link
Copy Markdown
Contributor Author

@aliher1911 aliher1911 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @nvanbenschoten and @sumeerbhola)


pkg/storage/mvcc.go, line 2529 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Let's make a note that MaxIntents is a maximum number of intents collected by scanner in consistent mode before returning a WriteIntentError.

Done.

@aliher1911
Copy link
Copy Markdown
Contributor Author

bors r=sumeerbhola,nvanbenschoten

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jun 1, 2021

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

kv: Add server option for maximum number of intents per WriteIntentError for scans

4 participants