Stop and spill TLogs when a new TLog is recruited in a different SharedTLog #2213

alexmiller-apple · 2019-10-04T08:41:08Z

This PR is being posted as rebased on top of #2208, so begin the review only from "Spill SharedTLog when there's more than one".

Internally, we have a larger container around TLog generations, called a SharedTLog. Currently, changing the configuration between e.g. log_version:=4 and log_version:=5 causes two different SharedTLogs to be created, because each log version wants to live in its own file. SharedTLog was meant to let multiple generations of TLogs work together so that they never store more than 2GB of mutations in memory, and having more than one SharedTLog defeats this.

This PR is a re-post of #2029, and does the same fix, but more. When a new TLog is recruited:
(1) We announce the ID of the SharedTLog that did the recruitment to all SharedTLogs.
(2) All old SharedTLogs stop all of their TLog generations.
(3) All SharedTLogs set their allowed memory usage limit to 40MB after a 10s delay, to force spilling.

The 10s is so that this doesn't affect the normal steady state of a cluster. Most data gets written to a transaction log, and popped within 10s. So we wait 10s before starting to spill. Not clever, but probably effective.
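The three steps and the 10s delay can be sketched roughly as below. This is a minimal illustration in plain C++, not the fdbserver code: `SharedTLogSketch`, `onRecruitment`, `advanceTo`, `announceRecruitment` and the constant names are all hypothetical, time is a simulated number rather than a real timer, and it assumes that only SharedTLogs other than the recruiting one end up at the 40MB limit (the recruiting one keeps the normal 2GB budget).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Numbers quoted in the description; the names are hypothetical.
constexpr std::int64_t kNormalLimitBytes = 2LL * 1024 * 1024 * 1024; // ~2GB budget
constexpr std::int64_t kSpillLimitBytes = 40LL * 1024 * 1024;        // 40MB forces spilling
constexpr double kSpillDelaySeconds = 10.0;

// Hypothetical stand-in for a SharedTLog; not the fdbserver type.
struct SharedTLogSketch {
    std::string id;
    int generations;            // TLog generations hosted by this SharedTLog
    int stoppedGenerations = 0; // how many of them have been stopped
    std::int64_t memoryLimitBytes = kNormalLimitBytes;
    double spillAt = -1.0;      // simulated time to lower the limit; < 0 if unscheduled
    std::string activeId;       // last announced recruiting SharedTLog

    SharedTLogSketch(std::string id_, int generations_)
        : id(std::move(id_)), generations(generations_) {
        assert(generations >= 0);
    }

    // Steps (1) and (2): learn who recruited; if it was not us, stop our
    // generations and schedule the lower limit.
    void onRecruitment(const std::string& recruiterId, double now) {
        activeId = recruiterId;
        if (id == recruiterId) {
            // Assumption: the recruiting SharedTLog keeps the normal budget.
            spillAt = -1.0;
            memoryLimitBytes = kNormalLimitBytes;
            return;
        }
        stoppedGenerations = generations;
        spillAt = now + kSpillDelaySeconds;
    }

    // Step (3): once the delay has elapsed, and if we are still not the
    // active SharedTLog, lower the limit so remaining data is spilled.
    void advanceTo(double now) {
        if (spillAt >= 0.0 && now >= spillAt && id != activeId) {
            memoryLimitBytes = kSpillLimitBytes;
            spillAt = -1.0;
        }
    }
};

// Step (1) as seen from the worker: tell every SharedTLog who recruited.
inline void announceRecruitment(std::vector<SharedTLogSketch>& all,
                                const std::string& recruiterId, double now) {
    for (auto& s : all) s.onRecruitment(recruiterId, now);
}
```

In this sketch, data that is popped within the 10s window never sees the lower limit, which is the point of the delay: only mutations that linger in an old SharedTLog are forced to disk.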

The ASSERT_WE_THINK in the spilling loop verifies that the TLog isn't configured to the 40MB limit if it is the live TLog generation, so simulation will catch bugs that would cause excessive spilling.
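As a rough illustration of that invariant, the sketch below runs one pass of a simplified spilling loop and checks, only when a `simulated` flag is set, that a live generation is never at the forced-spill limit. `TLogStateSketch`, `assertInSimulation` and `spillOnce` are hypothetical names; `assertInSimulation` is a stand-in written for this example and is not the definition of ASSERT_WE_THINK.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

constexpr std::int64_t kSpillLimitBytes = 40LL * 1024 * 1024; // 40MB, from the description

// Hypothetical stand-in for an assertion that is only enforced in simulation.
inline void assertInSimulation(bool simulated, bool condition, const char* what) {
    if (simulated && !condition) throw std::logic_error(what);
}

// Hypothetical, heavily reduced view of one TLog's spilling state.
struct TLogStateSketch {
    bool isLiveGeneration;
    std::int64_t memoryLimitBytes;
    std::int64_t bytesInMemory;
};

// One pass of a simplified spilling loop. Returns the number of bytes spilled.
inline std::int64_t spillOnce(TLogStateSketch& t, bool simulated) {
    assert(t.bytesInMemory >= 0);
    // The invariant: the live generation must not be using the 40MB limit.
    assertInSimulation(simulated,
                       !(t.isLiveGeneration && t.memoryLimitBytes == kSpillLimitBytes),
                       "live TLog generation is using the forced-spill limit");
    if (t.bytesInMemory <= t.memoryLimitBytes) return 0;
    const std::int64_t spilled = t.bytesInMemory - t.memoryLimitBytes;
    t.bytesInMemory = t.memoryLimitBytes; // everything above the limit goes to disk
    return spilled;
}
```

An old generation holding 100MB under the 40MB limit spills 60MB in this sketch, while a live generation configured with the same limit trips the check when `simulated` is true.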

When switching between spill_type or log_version, a new instance of a SharedTLog is created in the transaction log processes. If this is done in a saturated database, then doubling the amount of memory used to hold mutations can bring TLogs uncomfortably close to the 8GB OOM limit. Instead, we now thread through the UID of the SharedTLog that is active, and the other TLogs spill out the majority of their mutations.
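The memory arithmetic behind this can be sketched as follows, using the 2GB per-SharedTLog budget and the 40MB forced-spill limit quoted in the PR description. The function and constant names are hypothetical, 2GB is treated as 2048 MiB, and the result only bounds mutation bytes, not total process memory.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::int64_t kMiB = 1024LL * 1024;
constexpr std::int64_t kNormalLimitBytes = 2048 * kMiB; // 2GB budget per SharedTLog
constexpr std::int64_t kSpillLimitBytes = 40 * kMiB;    // limit applied to inactive SharedTLogs

// Upper bound on mutation bytes held in memory by one transaction log process,
// given how many SharedTLogs run at each limit.
inline std::int64_t worstCaseMutationBytes(int atNormalLimit, int atSpillLimit) {
    assert(atNormalLimit >= 0 && atSpillLimit >= 0);
    return atNormalLimit * kNormalLimitBytes + atSpillLimit * kSpillLimitBytes;
}
```

With two SharedTLogs both at the normal limit the bound is 4096 MiB; with one active and one forced to spill it drops to 2088 MiB.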

And add some useful logging about when things do or do not spill.

Co-Authored-By: Jingyu Zhou <jingyuzhou@gmail.com>

alexmiller-apple · 2019-10-08T01:08:58Z

And #2208 is now merged, so I've rebased out its commits.

When switching between spill_type or log_version, a new instance of a SharedTLog is created in the transaction log processes. If this is done in a saturated database, then doubling the amount of memory used to hold mutations can bring TLogs uncomfortably close to the 8GB OOM limit. Instead, we now thread through the UID of the SharedTLog that is active, and the other TLogs spill out the majority of their mutations. This is a backport of apple#2213 (fef89aa) to release-6.2.

alexmiller-apple requested a review from jzhou77 October 4, 2019 08:41

alexmiller-apple assigned jzhou77 Oct 4, 2019

jzhou77 reviewed Oct 4, 2019

fdbserver/worker.actor.cpp (outdated, resolved)

jzhou77 reviewed Oct 4, 2019

fdbserver/OldTLogServer_6_2.actor.cpp (outdated, resolved)

fdbserver/TLogServer.actor.cpp (outdated, resolved)

jzhou77 reviewed Oct 4, 2019

fdbserver/worker.actor.cpp (outdated, resolved)

alexmiller-apple commented Oct 8, 2019

fdbserver/OldTLogServer_6_0.actor.cpp (outdated, resolved)

fdbserver/TLogServer.actor.cpp (outdated, resolved)

fdbserver/OldTLogServer_6_2.actor.cpp (outdated, resolved)

alexmiller-apple and others added 5 commits October 7, 2019 18:08

Fix a bug that would cause active logs to spill aggressively

71af24d

And add some useful logging about when things do or do not spill.

Fix whitespace.

b3fd4f6

Shuffle member initialization in constructor.

a34a009

Comment variable and code style fix

77c72de

Co-Authored-By: Jingyu Zhou <jingyuzhou@gmail.com>

alexmiller-apple force-pushed the new-log-spill-default branch from 7b681ed to 77c72de Compare October 8, 2019 01:08

jzhou77 approved these changes Oct 8, 2019


jzhou77 merged commit fef89aa into apple:master Oct 8, 2019

alexmiller-apple mentioned this pull request Oct 17, 2019

Spill SharedTLog when there's more than one #2256

Merged
