Fix leak of SSL caches for auxiliary threads #4801
Merged
renecannao merged 1 commit into v2.7 on Jan 24, 2025
Whenever a thread creates MySQL connections using SSL params, the thread-local caches are populated. These thread-local resources need to be freed on thread exit via 'mysql_thread_end'; otherwise these objects will be leaked. This is not relevant for the fixed-size thread pools, but it is for workloads making use of temporary/auxiliary threads.
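The cleanup-on-exit requirement described above can be sketched with a small RAII guard. This is only an illustration under stated assumptions, not the patch from this PR: `fake_ssl_connect`, `fake_thread_end`, `ThreadEndGuard`, and `run_aux_threads` are hypothetical names, and a counter stands in for the per-thread SSL cache so the example builds without the client library. In real code the guard's destructor would call `mysql_thread_end()`.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in: counts per-thread "SSL caches" that are still alive.
std::atomic<int> live_caches{0};
thread_local bool cache_populated = false;

// Stand-in for creating an SSL connection: populates the thread-local cache once.
void fake_ssl_connect() {
    if (!cache_populated) {
        cache_populated = true;
        ++live_caches;
    }
}

// Stand-in for mysql_thread_end(): releases this thread's cache, if any.
void fake_thread_end() {
    if (cache_populated) {
        cache_populated = false;
        --live_caches;
    }
}

// RAII guard: runs the per-thread cleanup when the thread body goes out of scope.
struct ThreadEndGuard {
    ~ThreadEndGuard() { fake_thread_end(); }
};

// Spawns n short-lived auxiliary threads and returns how many caches
// are still alive after all of them have exited.
int run_aux_threads(int n, bool with_guard) {
    live_caches = 0;
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([with_guard] {
            if (with_guard) {
                ThreadEndGuard guard;  // cleanup runs on scope exit
                fake_ssl_connect();
            } else {
                fake_ssl_connect();    // no cleanup: the cache outlives the thread
            }
        });
    }
    for (auto& t : threads) t.join();
    return live_caches.load();
}
```

Calling `run_aux_threads(8, false)` leaves one cache behind per thread, while `run_aux_threads(8, true)` leaves none. A fixed-size pool pays this cost only once per worker, which is why the leak only shows up with temporary threads.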
Can one of the admins verify this patch?
renecannao added a commit that referenced this pull request on Feb 3, 2025:
Fix leak of SSL caches for auxiliary threads - Port of #4801
Issue Description
This PR fixes a memory leak that can be triggered when SSL parameters are used for backend servers and the workload forces the creation of auxiliary threads. For example, when ProxySQL needs to kill many backend connections, it spawns many kill_query_thread threads.
Reproduction
kill_query_thread
There are several ways to reproduce the leak: one for kill_query_thread and another for auxiliary monitoring threads. The former requires a backend server configured with SSL enabled and the mysql-ssl_p2s_% config variables set. The reproduction test should be executed in a loop. For the leak to be noticeable, a high number of backend connections should be created; a sensible number for the test is 100000. This also depends on the size of the SSL certificates in use.
Monitor Auxiliary Threads
To make the leak noticeable faster in this scenario, we need to create a sizeable number of auxiliary threads for monitoring. The first step is to configure ProxySQL with a high number of backend servers and reduce the monitoring intervals to the minimum.
Even with this, ProxySQL is too conservative in creating auxiliary threads for the leak to be clearly noticeable. So we slightly tweak the ProxySQL code itself to exaggerate thread creation and prevent reuse; a small patch is sufficient.
Memory Analysis
This leak is trickier to detect using memory analysis tools that focus on the heap, or even the jemalloc profiler itself. This is because the "lost references" come from thread-local variables that are lost when threads exit. This prevents the profilers from tracing this memory as they would normally do with lost references to heap-allocated resources.
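How a reference can vanish on thread exit is easier to see in a reduced example. The sketch below is an illustration only and uses no ProxySQL or MariaDB Connector/C code: `Buffer`, `raw_cache`, `owned_cache`, and both functions are hypothetical names. It contrasts a raw thread-local pointer, whose target is never released when the thread ends, with a thread-local owner whose destructor runs at thread exit.

```cpp
#include <atomic>
#include <cstddef>
#include <memory>
#include <thread>

// Bytes currently held by live Buffer objects, across all threads.
std::atomic<std::size_t> outstanding_bytes{0};

// Hypothetical stand-in for a cached, heap-allocated resource.
struct Buffer {
    std::size_t size;
    explicit Buffer(std::size_t n) : size(n) { outstanding_bytes += n; }
    ~Buffer() { outstanding_bytes -= size; }
};

// Raw thread-local pointer: the pointer disappears at thread exit,
// but the Buffer it pointed to is never destroyed.
thread_local Buffer* raw_cache = nullptr;

// Owning thread-local: the unique_ptr destructor runs at thread exit.
thread_local std::unique_ptr<Buffer> owned_cache;

// Returns the bytes still outstanding after a thread that cached n bytes
// behind a raw thread-local pointer has exited (intentionally leaks n bytes).
std::size_t leak_with_raw_pointer(std::size_t n) {
    outstanding_bytes = 0;
    std::thread t([n] { raw_cache = new Buffer(n); });
    t.join();
    return outstanding_bytes.load();
}

// Same workload, but the thread-local owns the allocation and frees it on exit.
std::size_t no_leak_with_owner(std::size_t n) {
    outstanding_bytes = 0;
    std::thread t([n] { owned_cache = std::make_unique<Buffer>(n); });
    t.join();
    return outstanding_bytes.load();
}
```

In the first function, once the thread has been joined, no remaining variable points at the buffer. This is the situation described above, where the only reference lived in thread-local storage.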
This can be seen in the two attached dumps. In report-no-leak.pdf we can see that when jemalloc focuses on in-use space, it is unable to report any leaks, and it reports almost no memory usage, despite the process retaining several hundred MB of (resident) memory. This is expected, as the references were lost on thread exit. Instead, we need to focus on alloc_space (total memory allocated) and how much of that memory jemalloc is able to trace back. When checking report-alloc.pdf, we can see that ma_tls_set_certs is responsible for an unusually large amount of allocated memory (due to connection creation), and more importantly, jemalloc hasn't traced that memory back to the allocator.
If we compare the previous reports with the new ones after the fix, we can see that report-fix-inuse.pdf shows no difference, while report-fix-alloc.pdf shows that jemalloc is now able to trace the memory allocated for ma_tls_set_certs back to the allocator. Process resident memory is now stable.
The dumps and the ProxySQL binary used for the analysis have been attached.
report-no-leak.pdf
report-alloc.pdf
report-fix-alloc.pdf
report-fix-inuse.pdf
memory-dumps-01.tar.gz
memory-dumps-00.tar.gz