Fix halt on 'PROXYSQL RESUME' command #4757

Merged
renecannao merged 3 commits into v2.7 from v2.7-fix_hang_on_resume
Nov 26, 2024

@JavierJF
Collaborator

Issue Description

During a RESUME operation, if a 'MySQL_Thread' that is bootstrapping listeners (in MySQL_Thread::run_BootstrapListener) detects that MySQL_Threads_Handler::bootstrapping_listeners is 'false', it prematurely clears its own bootstrapping flag in 'mypolls' (ProxySQL_Poll::bootstrapping_listeners). Since this thread will then never bootstrap its corresponding listening sockets, the other worker threads stall waiting on it, eventually triggering the watchdog and crashing the instance.

Solution

Simplified the logic to use a single bootstrapping flag (MySQL_Threads_Handler::bootstrapping_listeners) and introduced a sensible delay to reduce the overhead of worker threads busy-waiting for their turn to start their listening sockets, as well as the corresponding overhead on the Admin thread while it performs the PAUSE operation.
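
To illustrate the idea (this is not the code from this PR), here is a minimal, self-contained sketch of sequential bootstrapping driven by one shared flag, with a short sleep replacing the tight spin. The `Handler` struct, the `turn` counter, the `resume_all` function and the 200 microsecond delay are all hypothetical names and values chosen for the example; only the notion of a single `bootstrapping_listeners` flag comes from the description above.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical stand-in for the handler-level state. Only the idea of a
// single shared bootstrapping flag is taken from the PR description.
struct Handler {
    std::atomic<bool> bootstrapping_listeners{false};
    std::atomic<int> turn{0};  // hypothetical: id of the worker allowed to bootstrap
};

// Simulates a RESUME: each worker opens its listeners strictly in order.
// Returns the order in which the workers bootstrapped.
std::vector<int> resume_all(int num_workers) {
    std::vector<int> order;
    if (num_workers <= 0) return order;

    Handler h;
    h.bootstrapping_listeners.store(true, std::memory_order_release);

    std::vector<std::thread> workers;
    for (int id = 0; id < num_workers; ++id) {
        workers.emplace_back([&h, &order, id, num_workers] {
            // Wait for our turn, sleeping briefly instead of spinning.
            // A worker never gives up early: it always bootstraps.
            while (h.turn.load(std::memory_order_acquire) != id) {
                std::this_thread::sleep_for(std::chrono::microseconds(200));
            }
            order.push_back(id);  // stands in for opening the listening sockets
            if (id + 1 == num_workers) {
                // Only the last worker clears the shared flag.
                h.bootstrapping_listeners.store(false, std::memory_order_release);
            }
            h.turn.store(id + 1, std::memory_order_release);  // hand over the turn
        });
    }
    for (auto& t : workers) t.join();

    assert(!h.bootstrapping_listeners.load(std::memory_order_acquire));
    return order;
}
```

Because there is no per-thread flag that a worker could clear on its own, no worker can drop out of the sequence and leave the others waiting. Calling `resume_all(4)` yields the workers in order 0 to 3.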

During a RESUME operation, if a 'MySQL_Thread' that is bootstrapping
listeners (in `MySQL_Thread::run_BootstrapListener`) detects that
`MySQL_Threads_Handler::bootstrapping_listeners` is 'false', it
prematurely clears its own bootstrapping flag in 'mypolls'
(`ProxySQL_Poll::bootstrapping_listeners`). Since this thread will then
never bootstrap its corresponding listening sockets, the other worker
threads stall waiting on it, eventually triggering the watchdog and
crashing the instance.

Since the bootstrapping operation is sequential, it's expected that all
the threads except the one currently starting its listening sockets are
in an active wait. A sensible delay has been introduced to reduce the
overhead of this wait.

Since stopping each worker thread's listeners is performed during
'maintenance_loops', the active wait taking place in 'listener_del' is
likely to take some time. A sensible delay has been added to reduce
unnecessary load.

This is a follow-up of commit #19c8f8698
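
The Admin-side wait described for 'listener_del' can be pictured with the following sketch. It is an assumption-laden model, not ProxySQL's implementation: the `Worker` struct, its flags, the 5 ms maintenance interval and the function `listener_del_with_delay` are all hypothetical. It only shows a requester sleeping between polls while a worker acknowledges the removal on its next maintenance pass.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical worker model: removal requests are only handled during a
// periodic maintenance pass, so the requester has to wait for the next pass.
struct Worker {
    std::atomic<bool> stop{false};
    std::atomic<bool> del_requested{false};
    std::atomic<bool> del_done{false};

    void run() {
        while (!stop.load(std::memory_order_acquire)) {
            // Simulated maintenance pass: pick up a pending removal request.
            if (del_requested.exchange(false, std::memory_order_acq_rel)) {
                // The listener would be closed here.
                del_done.store(true, std::memory_order_release);
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(5));
        }
    }
};

// Requests removal and waits for the acknowledgement, sleeping between
// polls rather than spinning. Returns true once the worker has confirmed.
bool listener_del_with_delay(std::chrono::microseconds poll_delay) {
    Worker w;
    std::thread t([&w] { w.run(); });

    w.del_requested.store(true, std::memory_order_release);
    while (!w.del_done.load(std::memory_order_acquire)) {
        std::this_thread::sleep_for(poll_delay);  // the added delay
    }

    w.stop.store(true, std::memory_order_release);
    t.join();
    return w.del_done.load(std::memory_order_acquire);
}
```

The poll delay trades a little latency in noticing the acknowledgement for far fewer wake-ups on the waiting thread, which matters when the worker may take a full maintenance interval to respond.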
@renecannao merged commit 7c64542 into v2.7 on Nov 26, 2024
renecannao added a commit that referenced this pull request Nov 26, 2024
 Fix halt on 'PROXYSQL RESUME' command - Port of #4757