Bug #57165
closed
expected valgrind issues and found none
0%
Description
2022-08-16T16:25:31.998 INFO:tasks.ceph:Checking for errors in any valgrind logs...
2022-08-16T16:25:31.999 DEBUG:teuthology.orchestra.run.smithi202:> sudo zgrep '<kind>' /var/log/ceph/valgrind/* /dev/null | sort | uniq
2022-08-16T16:25:32.087 INFO:tasks.ceph:Archiving crash dumps...
2022-08-16T16:25:32.089 DEBUG:teuthology.misc:Transferring archived files from smithi202:/var/lib/ceph/crash to /home/teuthworker/archive/yuriw-2022-08-16_15:48:32-rados-wip-yuri4-testing-2022-08-15-0951-distro-default-smithi/6975390/remote/smithi202/crash
2022-08-16T16:25:32.091 DEBUG:teuthology.orchestra.run.smithi202:> sudo tar cz -f - -C /var/lib/ceph/crash -- .
2022-08-16T16:25:32.121 INFO:tasks.ceph:Compressing logs...
2022-08-16T16:25:32.122 DEBUG:teuthology.orchestra.run.smithi202:> sudo find /var/log/ceph -name '*.log' -print0 | sudo xargs -0 --no-run-if-empty -- gzip --
2022-08-16T16:25:32.483 INFO:tasks.ceph:Archiving logs...
2022-08-16T16:25:32.484 DEBUG:teuthology.misc:Transferring archived files from smithi202:/var/log/ceph to /home/teuthworker/archive/yuriw-2022-08-16_15:48:32-rados-wip-yuri4-testing-2022-08-15-0951-distro-default-smithi/6975390/remote/smithi202/log
2022-08-16T16:25:32.487 DEBUG:teuthology.orchestra.run.smithi202:> sudo tar cz -f - -C /var/log/ceph -- .
2022-08-16T16:25:32.639 ERROR:teuthology.run_tasks:Manager failed: ceph
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_git_teuthology_9e7483cc68a9eb6b54dacbb0bec3bf23a5d32425/teuthology/run_tasks.py", line 188, in run_tasks
suppress = manager.__exit__(*exc_info)
File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
next(self.gen)
File "/home/teuthworker/src/github.com_ceph_ceph-c_53c6490a385b0352f40139eaaa2c2f607e42701d/qa/tasks/ceph.py", line 1922, in task
check_status=False,
File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
next(self.gen)
File "/home/teuthworker/src/git.ceph.com_git_teuthology_9e7483cc68a9eb6b54dacbb0bec3bf23a5d32425/teuthology/contextutil.py", line 55, in nested
raise exc[1]
File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
self.gen.throw(type, value, traceback)
File "/home/teuthworker/src/github.com_ceph_ceph-c_53c6490a385b0352f40139eaaa2c2f607e42701d/qa/tasks/ceph.py", line 251, in ceph_log
yield
File "/home/teuthworker/src/git.ceph.com_git_teuthology_9e7483cc68a9eb6b54dacbb0bec3bf23a5d32425/teuthology/contextutil.py", line 47, in nested
if exit(*exc):
File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
next(self.gen)
File "/home/teuthworker/src/github.com_ceph_ceph-c_53c6490a385b0352f40139eaaa2c2f607e42701d/qa/tasks/ceph.py", line 359, in valgrind_post
raise Exception('expected valgrind issues and found none')
/a/yuriw-2022-08-16_15:48:32-rados-wip-yuri4-testing-2022-08-15-0951-distro-default-smithi/6975390
Updated by Matan Breizman over 3 years ago
/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973889
/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973682
Updated by Laura Flores over 3 years ago
/a/yuriw-2022-08-22_20:21:58-rados-wip-yuri11-testing-2022-08-22-1005-distro-default-smithi/6986197
Updated by Laura Flores over 3 years ago
To me, this seems like a Teuthology failure. Perhaps Zack Cerza can rule this theory in/out.
In any case, it looks like the mgr is failing due to:
/a/yuriw-2022-08-22_20:21:58-rados-wip-yuri11-testing-2022-08-22-1005-distro-default-smithi/6986197/remote/smithi125/ceph-mgr.x.log.gz
2022-08-22T21:31:15.511+0000 7f1a27f90700 -1 received signal: Terminated from /usr/bin/python3 /bin/daemon-helper term ceph-mgr -f --cluster ceph -i x (PID: 41266) UID: 0
2022-08-22T21:31:15.511+0000 7f1a27f90700 -1 mgr handle_mgr_signal *** Got signal Terminated ***
Updated by Zack Cerza over 3 years ago
What I'm seeing is that the jobs in question were told to expect valgrind errors via the expect_valgrind_errors: true item in their job configs. They didn't find any, so they failed the job. Here's me doing a simpler version of what the ceph task does:
$ d=/a/yuriw-2022-08-16_15:48:32-rados-wip-yuri4-testing-2022-08-15-0951-distro-default-smithi/6975390
$ zgrep -l 'kind' $d/remote/*/log/valgrind/*
$ d=/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973889
$ zgrep -l 'kind' $d/remote/*/log/valgrind/*
$ d=/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973682
$ zgrep -l 'kind' $d/remote/*/log/valgrind/*
$ d=/a/yuriw-2022-08-22_20:21:58-rados-wip-yuri11-testing-2022-08-22-1005-distro-default-smithi/6986197
$ zgrep -l 'kind' $d/remote/*/log/valgrind/*
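For readers who want to reproduce the check offline, here is a minimal Python sketch of the logic described above: scan the valgrind logs for <kind> entries and fail when expect_valgrind_errors is set but nothing is found. This is not the actual qa/tasks/ceph.py code. The function names find_valgrind_kinds and check_valgrind are hypothetical, and the sketch assumes the logs sit in a single directory and may be gzip-compressed. Only the "expected but none found" branch is modeled, since that is the one shown in the traceback.

```python
import gzip
import re
from pathlib import Path

# Matches the <kind>...</kind> element that valgrind writes for each error
# in its XML output.
KIND_RE = re.compile(r"<kind>(.*?)</kind>")


def find_valgrind_kinds(log_dir):
    """Return the sorted, de-duplicated <kind> values found under log_dir.

    Hypothetical helper: roughly what `zgrep '<kind>' ... | sort | uniq`
    does, but returning only the kind names.
    """
    kinds = set()
    for path in sorted(Path(log_dir).glob("*")):
        if not path.is_file():
            continue
        # Read .gz files through gzip, everything else as plain text.
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", errors="replace") as fh:
            for line in fh:
                kinds.update(KIND_RE.findall(line))
    return sorted(kinds)


def check_valgrind(log_dir, expect_valgrind_errors=False):
    """Raise if errors were expected but no <kind> entry exists (sketch only)."""
    kinds = find_valgrind_kinds(log_dir)
    if expect_valgrind_errors and not kinds:
        # Same message as the exception in the traceback above.
        raise Exception('expected valgrind issues and found none')
    return kinds
```

Pointing check_valgrind at one of the archived remote/*/log/valgrind directories with expect_valgrind_errors=True should raise for the jobs listed here, since the zgrep commands above matched no files.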
Updated by Radoslaw Zarzynski over 3 years ago
- Assignee set to Nitzan Mordechai
- Priority changed from Normal to High
Bumped the priority up because I'm afraid that the longer we wait to ensure valgrind is fully operational, the greater the risk that we'll be flooded with leak reports once it is restored.
Updated by Nitzan Mordechai over 3 years ago
We are leaking memory with "ceph tell mon.a leak_some_memory", but for some reason we are not seeing any memory leak in the valgrind logs.
I checked with and without tcmalloc; neither shows any memory leak.
I also removed the valgrind.supp file and still couldn't spot the memory leak. Something is causing that leak to disappear.
Updated by Nitzan Mordechai over 3 years ago
- Status changed from New to In Progress
Updated by Nitzan Mordechai over 3 years ago
This is a memory optimization "fault" - the new gcc optimizes the code so that the memory we are intentionally trying to leak is not actually leaked.
Updated by Nitzan Mordechai over 3 years ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 47802
Updated by Laura Flores over 3 years ago
/a/yuriw-2022-08-17_19:34:54-rados-wip-yuri7-testing-2022-08-17-0943-quincy-distro-default-smithi/6977767
Updated by Kefu Chai over 3 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Upkeep Bot over 3 years ago
- Copied to Backport #57346: quincy: expected valgrind issues and found none added
Updated by Yaarit Hatuka over 3 years ago
Updated by Nitzan Mordechai almost 3 years ago
- Status changed from Pending Backport to Resolved
Updated by Radoslaw Zarzynski over 2 years ago
- Related to Bug #63501: ceph::common::leak_some_memory() got interpreted as an actual leak added
Updated by Upkeep Bot 8 months ago
- Merge Commit set to 0daef7c82ec1b4e4ffc4ea74806fbf7e7adf54f2
- Fixed In set to v17.0.0-14667-g0daef7c82ec
- Released In set to v18.2.0~1451
- Upkeep Timestamp set to 2025-07-13T06:11:22+00:00