[percona-xtrabackup] prevent High memory usage for no reason (IO_CLOSE)#1724
Conversation
|
Hi, why did you close this PR? |
|
Hi @sduehr, |
|
Sure, I've set the on-hold label. We have a systemtest for the plugin, see The systemtest is probably too specific to be used outside of the build environment, but Thanks again, waiting for your notification start review. Regards, |
3382006 to
8171df0
Compare
|
Hi @sduehr, is there any way to rerun tests? |
8171df0 to
2e34195
Compare
|
Thanks, build and tests are automatically triggered when new commits are pushed. |
2e34195 to
051ac71
Compare
|
I rewrote the commit message and added details and explanations about the importance of the changes. |
sduehr
left a comment
There was a problem hiding this comment.
Did you consider trying self.stream.terminate() first, and only if that fails use self.stream.kill()?
core/src/plugins/filed/python/percona-xtrabackup/bareos-fd-percona-xtrabackup.py
Outdated
Show resolved
Hide resolved
core/src/plugins/filed/python/percona-xtrabackup/bareos-fd-percona-xtrabackup.py
Outdated
Show resolved
Hide resolved
Good idea. Honestly, I wrote a separate function for the dump process termination with timeouts and TERM signal instead of killing it straightaway. |
6a2b699 to
b9d2b5f
Compare
|
Thanks, normally an additional commit is ok. But in this case it's only one file, so force push is ok, too. |
|
Thanks, looks good so far. |
sduehr
left a comment
There was a problem hiding this comment.
Please check my suggestions. Are you able to reproduce the case where end_dumper() would go in the second timeout, where the process didn't terminate even after sending SIGTERM?
core/src/plugins/filed/python/percona-xtrabackup/bareos-fd-percona-xtrabackup.py
Outdated
Show resolved
Hide resolved
core/src/plugins/filed/python/percona-xtrabackup/bareos-fd-percona-xtrabackup.py
Show resolved
Hide resolved
core/src/plugins/filed/python/percona-xtrabackup/bareos-fd-percona-xtrabackup.py
Outdated
Show resolved
Hide resolved
core/src/plugins/filed/python/percona-xtrabackup/bareos-fd-percona-xtrabackup.py
Outdated
Show resolved
Hide resolved
I'm not sure that is 100% reproducible on my side. I saw an empty return code straight after sending SIGTERM which is why I put an additional 30 seconds waiting for the correct termination. I agree that that is possible not to wait 2nd 30 sec and jump to the else block with setting 9 to the stream's return code, but prefer to wait a bit more seconds. |
Fix non-terminating subprocess behaviour in case of bareos-storage unavailable during a Backup job. This commit terminates the subprocess in case bareos-storage is unavailable for some reason during a Backup job. This is a bug fix because the previous behaviour causes high memory usage that may cause OOM in some cases. Move the logic of terminating the xtrabackup(dumper) subprocess in case of IO_CLOSE to separate function end_dumper(). Add timeouts and change the kill signal to terminate. Add convenient debug and job messages.
Code formatted using black and corrected copyright yeyr to comply with our coding guidelines.
Co-authored-by: sduehr <stephan.duehr@bareos.com>
Agree, however, I'm not sure that the 9 exit code is the right one here. I've never been there or seen such a scenario, but potentially it may happen. Co-authored-by: sduehr <stephan.duehr@bareos.com>
Co-authored-by: sduehr <stephan.duehr@bareos.com>
3c72f85 to
c712412
Compare
|
OK, thanks. Yes, setting 9 to the stream's return isn't perfect, but ok for now. |
In the case of IO_CLOSE inside the plugin, we don't need to wait for the subprocess to finish, because it trying to get the data from xtrabackup, but the destination for such data to backup is unavailable.
It causes high memory usage and OOM in some cases.
Please check
If you have any questions or problems, please give a comment in the PR.
Helpful documentation and best practices
Checklist for the reviewer of the PR (will be processed by the Bareos team)
Make sure you check/merge the PR using
devtools/pr-toolto have some simple automated checks run and a proper changelog record added.General
Source code quality