QQ: when invoking drain only shut down small batches at a time #14401

Merged
michaelklishin merged 2 commits into main from qq-drain-chunks
Aug 22, 2025

Contributor

@kjnilsson kjnilsson commented Aug 19, 2025

When draining a node, shut down quorum queue members in small batches,
then wait for elections to complete before shutting down further
members.

This should help avoid election storms when enabling maintenance
mode.

Transfer Khepri leadership before the queues to ensure the metadata
store is ready to accept pid updates.

Also includes some other state-related tweaks.
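
For readers unfamiliar with the approach, the batching idea described above can be sketched roughly as follows. This is an illustrative Python sketch only, not the code in this PR (which is Erlang and is not shown in this conversation). The function names, the batch size, the polling loop, and the timeout are all hypothetical assumptions made for the example.

```python
import time
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def drain_in_batches(
    members: Sequence[str],
    stop_member: Callable[[str], None],        # hypothetical: stops one local member
    election_settled: Callable[[str], bool],   # hypothetical: True once a leader exists again
    batch_size: int = 64,                      # assumed value, not taken from the PR
    poll_interval: float = 0.05,
    timeout: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Stop members one batch at a time, waiting for elections between batches."""
    stopped: List[str] = []
    for batch in chunked(list(members), batch_size):
        # Shut down only this small batch of members.
        for member in batch:
            stop_member(member)
            stopped.append(member)
        # Wait until every queue in the batch has settled before moving on,
        # so that only a bounded number of elections run at any one time.
        deadline = time.monotonic() + timeout
        pending = set(batch)
        while pending:
            pending = {m for m in pending if not election_settled(m)}
            if not pending:
                break
            if time.monotonic() >= deadline:
                raise TimeoutError(f"elections still pending for {sorted(pending)}")
            sleep(poll_interval)
    return stopped
```

The point of the sketch is the ordering: no member of a later batch is stopped until the elections triggered by the previous batch have finished, which bounds how many elections can be in flight at once.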

@kjnilsson kjnilsson marked this pull request as draft August 19, 2025 07:17
@kjnilsson kjnilsson requested a review from mkuratczyk August 19, 2025 07:17
@michaelklishin michaelklishin added this to the 4.2.0 milestone Aug 19, 2025
@kjnilsson kjnilsson force-pushed the qq-drain-chunks branch 2 times, most recently from f4bc9a2 to 16b7daa Compare August 20, 2025 16:08
@kjnilsson kjnilsson marked this pull request as ready for review August 21, 2025 08:19
@michaelklishin michaelklishin self-assigned this Aug 22, 2025
Contributor

@mkuratczyk mkuratczyk left a comment

This works for me, but only if I increase the AWAIT_TIMEOUT in ra_log_segment_writer. I think we should increase that timeout as well; otherwise, with enough QQs, node startup fails.

@michaelklishin michaelklishin merged commit 425a9d6 into main Aug 22, 2025
560 of 562 checks passed
@michaelklishin michaelklishin deleted the qq-drain-chunks branch August 22, 2025 22:19
@michaelklishin
Collaborator

@mkuratczyk fair enough. What AWAIT_TIMEOUT value in ra_log_segment_writer did you use? Would doubling it to 60s be enough?

@michaelklishin
Collaborator

For example, rabbitmq/ra#556 doubles the aforementioned timeout.

@michaelklishin
Collaborator

@Mergifyio backport v4.1.x

mergify bot commented Aug 26, 2025

backport v4.1.x

❌ No backport have been created

Details
  • Backport to branch v4.1.x failed

GitHub error: Branch not found

michaelklishin added a commit that referenced this pull request Aug 26, 2025
This was referenced Aug 26, 2025

3 participants