Skip to content

Avoid Keeper crash on shutdown (fix test_keeper_snapshot_on_exit)#44908

Merged
antonio2368 merged 1 commit intomasterfrom
fix-keeper-snapshot-on-exit
Jan 5, 2023
Merged

Avoid Keeper crash on shutdown (fix test_keeper_snapshot_on_exit)#44908
antonio2368 merged 1 commit intomasterfrom
fix-keeper-snapshot-on-exit

Conversation

@antonio2368
Copy link
Copy Markdown
Member

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Fix the issue found by test_keeper_snapshot_on_exit after #44881 was merged.

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-not-for-changelog This PR should not be mentioned in the changelog label Jan 4, 2023
Copy link
Copy Markdown
Member

@azat azat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antonio2368 thank you for the fix!

true /*is_local*/);
for (const auto & response : responses)
if (!responses_queue.push(response))
if (!responses_queue.push(response) && !responses_queue.isFinished())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is better to move this check out from loop?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question: is it possible that the queue is finished between the loop iterations? If so it's better to add:

if (responses_queue.isFinished())
    break;

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To address this and the comment above, the biggest problem is the NuRaft.
If I throw an exception from one of its threads, std::abort happens (that's what caused the problem), that lib doesn't handle exceptions nicely as CH. Assume all methods here are called by NuRaft and not CH.

The queue can be finished in between the loop iterations but it can also be finished between the checks:

if finished -> finished is false
  break;
if (!push(...)) -> finished is true, i.e we have same problem

I can add the check for break to not do unnecessary iterations as @novikd suggested, but because I would still need to keep the check after an unsuccessful push I wouldn't change much code-wise.

So I would like to merge this to fix CI ASAP and then polish it.

LOG_DEBUG(log, "Session ID response {} with timeout {}", session_id, session_id_request.session_timeout_ms);
response->session_id = session_id;
if (!responses_queue.push(response_for_session))
if (!responses_queue.push(response_for_session) && !responses_queue.isFinished())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is better to move this check to the beginning of the function?
Also what will be if exception will be thrown if queue is finished? Or at least some log message maybe?
@antonio2368 what do you think?

true /*is_local*/);
for (const auto & response : responses)
if (!responses_queue.push(response))
if (!responses_queue.push(response) && !responses_queue.isFinished())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question: is it possible that the queue is finished between the loop iterations? If so it's better to add:

if (responses_queue.isFinished())
    break;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants