@alexmiller-apple

This PR is rebased on top of #2208, so start the review from the commit "Spill SharedTLog when there's more than one".

Internally, we have a larger container around TLog generations, called a SharedTLog. Currently, switching the configuration between, e.g., log_version:=4 and log_version:=5 causes two different SharedTLogs to be created, because each log version wants to live in its own file. SharedTLog was meant to let multiple generations of TLogs cooperate so that together they never store more than 2GB of mutations in memory, and having more than one SharedTLog defeats this.

This PR is a re-post of #2029 and applies the same fix, plus more. When a new TLog is recruited:
(1) The ID of the SharedTLog that did the recruitment is announced to all SharedTLogs.
(2) All old SharedTLogs stop all of their TLog generations.
(3) All old SharedTLogs lower their allowed memory usage limit to 40MB after a 10s delay, to force spilling.

The 10s delay keeps this from affecting the normal steady state of a cluster: most data is written to a transaction log and popped within 10s, so we wait 10s before starting to spill. Not clever, but probably effective.
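As a rough illustration of the three steps and the 10s delay, here is a minimal Python model. It is not the FoundationDB implementation: `SharedTLogModel`, `announce_recruitment`, `on_recruitment`, and `tick` are hypothetical names, time is passed in explicitly instead of using real timers, and the exact byte values for "2GB" and "40MB" are assumptions. It also assumes that only SharedTLogs other than the active one end up with the lowered limit.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed exact values for the "2GB" and "40MB" figures in the description.
NORMAL_LIMIT_BYTES = 2 * 1024**3
SPILL_LIMIT_BYTES = 40 * 1024**2
SPILL_DELAY_SECONDS = 10.0


@dataclass
class SharedTLogModel:
    """Hypothetical stand-in for a SharedTLog; not the real class."""
    uid: str
    memory_limit: int = NORMAL_LIMIT_BYTES
    generations_stopped: bool = False
    active_uid: Optional[str] = None
    spill_check_at: Optional[float] = None

    def on_recruitment(self, active_uid: str, now: float) -> None:
        # Step (1): learn which SharedTLog did the recruitment.
        self.active_uid = active_uid
        # Step (2): a SharedTLog that is not the active one stops its generations.
        if self.uid != active_uid:
            self.generations_stopped = True
        # Step (3), first half: schedule the limit change after the delay.
        self.spill_check_at = now + SPILL_DELAY_SECONDS

    def tick(self, now: float) -> None:
        # Step (3), second half: once the delay has passed, lower the limit,
        # but only if this SharedTLog is still not the active one.
        if self.spill_check_at is None or now < self.spill_check_at:
            return
        self.spill_check_at = None
        if self.uid != self.active_uid:
            self.memory_limit = SPILL_LIMIT_BYTES


def announce_recruitment(shared_tlogs: List[SharedTLogModel],
                         active_uid: str, now: float) -> None:
    """Announce the recruiting SharedTLog's ID to every SharedTLog."""
    for tlog in shared_tlogs:
        tlog.on_recruitment(active_uid, now)


# Usage: an old SharedTLog (log_version 4) and a new one (log_version 5).
old, new = SharedTLogModel("v4"), SharedTLogModel("v5")
announce_recruitment([old, new], active_uid="v5", now=0.0)
for t in (5.0, 10.0):
    old.tick(t)
    new.tick(t)
```

In this model the old SharedTLog keeps its normal limit until the delay has elapsed, which mirrors the intent that data popped within 10s never needs to be spilled.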

The ASSERT_WE_THINK in the spilling loop verifies that a TLog is not configured with the 40MB limit if it is the live TLog generation, so simulation will catch bugs that would cause excessive spilling.
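The invariant behind that check can be sketched in Python as follows. This is only an illustration of the condition being verified: `check_spill_invariant` and its parameters are hypothetical, the 40MB byte value is an assumption, and the sketch assumes the check is enforced only when running under simulation.

```python
# Assumed exact value for the "40MB" forced-spill limit.
SPILL_LIMIT_BYTES = 40 * 1024**2


def check_spill_invariant(is_live_generation: bool,
                          memory_limit: int,
                          in_simulation: bool) -> None:
    """Hypothetical stand-in for the check in the spilling loop.

    A live TLog generation must never be running with the forced-spill
    limit; an old generation is allowed to.
    """
    if not in_simulation:
        # Assumption: outside simulation the check is not enforced.
        return
    if is_live_generation and memory_limit == SPILL_LIMIT_BYTES:
        raise AssertionError(
            "live TLog generation is configured with the forced-spill limit")


# An old generation with the lowered limit passes the check.
check_spill_invariant(is_live_generation=False,
                      memory_limit=SPILL_LIMIT_BYTES,
                      in_simulation=True)
```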

alexmiller-apple and others added 5 commits October 7, 2019 18:08
When switching between spill_type or log_version, a new instance of a
SharedTLog is created in the transaction log processes.  If this is done
in a saturated database, then doubling the amount of memory to hold
mutations in memory can cause TLogs to be uncomfortably close to the 8GB
OOM limit.

Instead, we now thread through the UID of the active SharedTLog, and the
other TLogs spill out the majority of their mutations.
And add some useful logging about when things do or do not spill.
Co-Authored-By: Jingyu Zhou <jingyuzhou@gmail.com>
@alexmiller-apple

And #2208 is now merged, so I've rebased out its commits.

@jzhou77 jzhou77 merged commit fef89aa into apple:master Oct 8, 2019
alexmiller-apple added a commit to alexmiller-apple/foundationdb that referenced this pull request Oct 17, 2019
When switching between spill_type or log_version, a new instance of a
SharedTLog is created in the transaction log processes.  If this is done
in a saturated database, then doubling the amount of memory to hold
mutations in memory can cause TLogs to be uncomfortably close to the 8GB
OOM limit.

Instead, we now thread through the UID of the active SharedTLog, and the
other TLogs spill out the majority of their mutations.

This is a backport of apple#2213 (fef89aa) to release-6.2