Bug #59196: ceph_test_lazy_omap_stats segfault while waiting for active+clean
Status: Closed
Description
2023-03-11T08:23:47.545 DEBUG:teuthology.orchestra.run.smithi005:> sudo TESTDIR=/home/ubuntu/cephtest bash -c ceph_test_lazy_omap_stats
2023-03-11T08:23:48.487 INFO:teuthology.orchestra.run.smithi005.stdout:pool 'lazy_omap_test_pool' created
2023-03-11T08:23:48.489 INFO:teuthology.orchestra.run.smithi005.stdout:Querying pool id
2023-03-11T08:23:48.492 INFO:teuthology.orchestra.run.smithi005.stdout:Found pool ID: 2
2023-03-11T08:23:48.496 INFO:teuthology.orchestra.run.smithi005.stdout:Created payload with 2000 keys of 445 bytes each. Total size in bytes = 890000
2023-03-11T08:23:48.496 INFO:teuthology.orchestra.run.smithi005.stdout:Waiting for active+clean
2023-03-11T08:23:48.513 DEBUG:teuthology.orchestra.run:got remote process result: None
2023-03-11T08:23:48.513 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_teuthology_a6a4a7f010ae6b3f7fc2aef91377d4a6bee6de40/teuthology/run_tasks.py", line 103, in run_tasks
manager = run_one_task(taskname, ctx=ctx, config=config)
File "/home/teuthworker/src/git.ceph.com_teuthology_a6a4a7f010ae6b3f7fc2aef91377d4a6bee6de40/teuthology/run_tasks.py", line 82, in run_one_task
return task(**kwargs)
File "/home/teuthworker/src/git.ceph.com_teuthology_a6a4a7f010ae6b3f7fc2aef91377d4a6bee6de40/teuthology/task/exec.py", line 66, in task
remote.run(
File "/home/teuthworker/src/git.ceph.com_teuthology_a6a4a7f010ae6b3f7fc2aef91377d4a6bee6de40/teuthology/orchestra/remote.py", line 525, in run
r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
File "/home/teuthworker/src/git.ceph.com_teuthology_a6a4a7f010ae6b3f7fc2aef91377d4a6bee6de40/teuthology/orchestra/run.py", line 455, in run
r.wait()
File "/home/teuthworker/src/git.ceph.com_teuthology_a6a4a7f010ae6b3f7fc2aef91377d4a6bee6de40/teuthology/orchestra/run.py", line 161, in wait
self._raise_for_status()
File "/home/teuthworker/src/git.ceph.com_teuthology_a6a4a7f010ae6b3f7fc2aef91377d4a6bee6de40/teuthology/orchestra/run.py", line 179, in _raise_for_status
raise CommandCrashedError(command=self.command)
teuthology.exceptions.CommandCrashedError: Command crashed: 'sudo TESTDIR=/home/ubuntu/cephtest bash -c ceph_test_lazy_omap_stats'
2023-03-11T08:23:48.594 ERROR:teuthology.run_tasks: Sentry event: https://sentry.ceph.com/organizations/ceph/?query=bc132455da90423caddad14e0a097e30
Found this in the system kernel and journalctl logs.
2023-03-11T08:23:48.510950+00:00 smithi005 kernel: [ 641.041577] ceph_test_lazy_[34021]: segfault at 7ffda9628fd8 ip 0000563670ee5b19 sp 00007ffda9628fe0 error 6 in ceph_test_lazy_omap_stats[563670ec7000+21000]
Updated by Brad Hubbard almost 3 years ago
- Tags set to test-failure
Note that this tracker was originally #59058 until it was accidentally deleted by myself.
Below is a summary of the comments in that previous tracker.
Issue #59058 has been updated by Brad Hubbard.
Reproduced this. I suspect it only happens on Jammy, since it has been seen
only once, and on that distro, which we have only recently started testing with.
It looks like a stack overflow due to unbounded recursion in std::regex code,
which has precedents. I may be able to work around it by massaging the regular
expression being used; we'll see after some more testing.
Issue #59058 has been updated by Brad Hubbard.
We may be dealing with something similar to
https://tracker.ceph.com/issues/55304 here. I cannot reproduce this issue on
the latest Jammy container image, and if I upload the version of
ceph_test_lazy_omap_stats that I built and successfully ran in my local
container to the smithi machine failing the test, it runs without segfaulting,
whereas the version of ceph_test_lazy_omap_stats that was installed for the
test does segfault when run manually.
I see some differences in symbols when I compare the output of 'nm', and that
led me to compare the versions of gcc the two binaries were compiled with.
root@smithi026:/home/ubuntu# strings ./ceph_test_lazy_omap_stats|grep "GCC: ("
GCC: (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
root@smithi026:/home/ubuntu# strings /usr/lib/debug/.build-id/08/de203b7b3fa0b5080173750be0a7b2576335d9.debug|grep "GCC: ("
GCC: (Ubuntu 11.2.0-19ubuntu1) 11.2.0
So the binary that works was built with gcc 11.3.0 and the version that fails
with 11.2.0. The next step is to see whether I can set up a Jammy system with
11.2.0 installed and build ceph_test_lazy_omap_stats there, to see whether
that reproduces the issue.
Issue #59058 has been updated by Brad Hubbard.
Compiling with 11.2.0 failed to reproduce the problem, but in comparing the
failing binary to the succeeding one under the debugger, I found what appears
to be the low-level cause of the issue (the higher-level cause is still a
mystery, but is most likely some sort of issue in the build environment, or in
the build vs. runtime environment).
The last of our code before we enter the std:: code is these two lines.
If I place a breakpoint on the last line, I get the following discrepancy:
Success case:
(gdb) whatis match
type = std::__cxx11::smatch
(gdb) p match
$1 = {<std::vector<std::__cxx11::sub_match<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::__cxx11::sub_match<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > >> = std::vector of length 0, capacity 0, _M_begin = non-dereferenceable iterator for std::vector}
Fail case:
(gdb) whatis match
type = std::__cxx11::smatch
(gdb) p match
$1 = {<std::vector<std::__cxx11::sub_match<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::__cxx11::sub_match<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > >> = std::vector of length 172889574, capacity -1954508256483 = {
<error reading variable: Cannot access memory at address 0x7fff00000010>
So in the failure case there appears to be an issue with the just-initialised
smatch variable. Continuing to look at this.
Issue #59058 has been updated by Laura Flores.
- Tags set to test-failure
/a/lflores-2023-03-27_02:17:31-rados-wip-aclamk-bs-elastic-shared-blob-save-25.03.2023-a-distro-default-smithi/7220933
/a/lflores-2023-03-27_02:17:31-rados-wip-aclamk-bs-elastic-shared-blob-save-25.03.2023-a-distro-default-smithi/7221086
Issue #59058 has been updated by Laura Flores.
/a/yuriw-2023-03-27_23:05:54-rados-wip-yuri4-testing-2023-03-25-0714-distro-default-smithi/7222036
Issue #59058 has been updated by Laura Flores.
This is happening quite frequently in the rados suite. It certainly points to a recent regression.
Updated by Radoslaw Zarzynski almost 3 years ago
- Status changed from New to In Progress
Looks like the problem is under investigation. Please correct me if I'm wrong.
Updated by Laura Flores almost 3 years ago
Yes Radek, it is being investigated by Brad.
/a/yuriw-2023-03-27_23:05:54-rados-wip-yuri4-testing-2023-03-25-0714-distro-default-smithi/7222036
Updated by Laura Flores almost 3 years ago
/a/yuriw-2023-03-30_21:53:20-rados-wip-yuri7-testing-2023-03-29-1100-distro-default-smithi/7227904
Updated by Laura Flores almost 3 years ago
/a/yuriw-2023-04-04_15:24:40-rados-wip-yuri4-testing-2023-03-31-1237-distro-default-smithi/7231452
Updated by Laura Flores almost 3 years ago
/a/yuriw-2023-03-30_21:29:24-rados-wip-yuri2-testing-2023-03-30-0826-distro-default-smithi/7227539
Updated by Laura Flores almost 3 years ago
/a/lflores-2023-04-07_22:22:04-rados-wip-yuri4-testing-2023-04-07-1825-distro-default-smithi/7235344
Updated by Sridhar Seshasayee almost 3 years ago
/a/sseshasa-2023-05-02_03:12:27-rados-wip-sseshasa3-testing-2023-05-01-2154-distro-default-smithi/7260300
journalctl-b0.gz:May 02 04:26:30 smithi175 sudo[34509]: ubuntu : PWD=/home/ubuntu ; USER=root ; ENV=TESTDIR=/home/ubuntu/cephtest ; COMMAND=/usr/bin/bash -c ceph_test_lazy_omap_stats
journalctl-b0.gz:May 02 04:26:31 smithi175 kernel: ceph_test_lazy_[34510]: segfault at 7ffeedae1ff8 ip 00005567a18b6549 sp 00007ffeedae1f30 error 6 in ceph_test_lazy_omap_stats[5567a1898000+21000]
kern.log.gz:2023-05-02T04:26:31.788640+00:00 smithi175 kernel: [ 640.526218] ceph_test_lazy_[34510]: segfault at 7ffeedae1ff8 ip 00005567a18b6549 sp 00007ffeedae1f30 error 6 in ceph_test_lazy_omap_stats[5567a1898000+21000]
Updated by Radoslaw Zarzynski almost 3 years ago
Let's check whether this reproduces in Reef too. If so, then... there is no OMAP without RocksDB and we upgraded it recently...
Updated by Brad Hubbard almost 3 years ago
Radoslaw Zarzynski wrote:
Let's check whether this reproduces in Reef too. If so, then... there is no OMAP without RocksDB and we upgraded it recently...
Hey Radek,
To me it's more significant that every instance above was seen on VERSION="22.04.1 LTS (Jammy Jellyfish)", and I think this has something to do with the way we are building for Jammy. I think we are somehow exposing some sort of library mismatch, or something similar. I need to try to reproduce the build environment to test this theory, which I may need some help with.
Updated by Laura Flores almost 3 years ago
/a/yuriw-2023-05-11_15:01:38-rados-wip-yuri8-testing-2023-05-10-1402-distro-default-smithi/7271184
So far, no Reef sightings.
Updated by Brad Hubbard almost 3 years ago
Laura Flores wrote:
/a/yuriw-2023-05-11_15:01:38-rados-wip-yuri8-testing-2023-05-10-1402-distro-default-smithi/7271184
So far, no Reef sightings.
And Jammy yet again.
Updated by Laura Flores almost 3 years ago
- Backport set to reef
/a/lflores-2023-05-22_16:08:13-rados-wip-yuri6-testing-2023-05-19-1351-reef-distro-default-smithi/7282703
Was already in Reef as far back as March 11 (/a/yuriw-2023-03-10_22:46:37-rados-reef-distro-default-smithi/7203287), so this test batch is not introducing the bug to Reef.
Updated by Laura Flores almost 3 years ago
/a/yuriw-2023-05-24_14:33:21-rados-wip-yuri6-testing-2023-05-23-0757-reef-distro-default-smithi/7285192
Updated by Radoslaw Zarzynski almost 3 years ago
The RocksDB upgrade PR has been merged on 1st March.
Updated by Radoslaw Zarzynski almost 3 years ago
Brad, let's sync up and talk about that in the DS meeting.
Updated by Laura Flores over 2 years ago
/a/yuriw-2023-06-22_20:29:56-rados-wip-yuri3-testing-2023-06-22-0812-reef-distro-default-smithi/7313235
Updated by Matan Breizman over 2 years ago
/a/yuriw-2023-08-22_18:16:03-rados-wip-yuri10-testing-2023-08-17-1444-distro-default-smithi/7376687
Updated by Radoslaw Zarzynski over 2 years ago
This time it's CentOS!
rzarzynski@teuthology:/a/yuriw-2023-08-22_18:16:03-rados-wip-yuri10-testing-2023-08-17-1444-distro-default-smithi/7376687$ less teuthology.log
...
2023-08-22T21:37:19.930 DEBUG:teuthology.packaging:Querying https://shaman.ceph.com/api/search?status=ready&project=ceph&flavor=default&distros=centos%2F9%2Fx86_64&ref=wip-yuri10-testing-2023-08-17-1444
2023-08-22T21:37:20.152 INFO:teuthology.task.internal:Found packages for ceph version 18.0.0-5573.gf0ed7046
Updated by Brad Hubbard over 2 years ago
Taking a fresh look at this, thanks Radek.
Updated by Laura Flores over 2 years ago
/a/yuriw-2023-08-15_18:58:56-rados-wip-yuri3-testing-2023-08-15-0955-distro-default-smithi/7369175
Updated by Matan Breizman over 2 years ago
/a/yuriw-2023-10-11_14:08:36-rados-wip-yuri11-testing-2023-10-10-1226-reef-distro-default-smithi/7421542/
/a/yuriw-2023-10-11_14:08:36-rados-wip-yuri11-testing-2023-10-10-1226-reef-distro-default-smithi/7421695/
Updated by Nitzan Mordechai over 2 years ago
/a/yuriw-2023-10-16_14:44:27-rados-wip-yuri10-testing-2023-10-11-0812-distro-default-smithi/7429668
/a/yuriw-2023-10-16_14:44:27-rados-wip-yuri10-testing-2023-10-11-0812-distro-default-smithi/7429845
/a/yuriw-2023-10-16_14:44:27-rados-wip-yuri10-testing-2023-10-11-0812-distro-default-smithi/7429846
Updated by Laura Flores over 2 years ago
/a/yuriw-2023-10-24_00:11:54-rados-wip-yuri4-testing-2023-10-23-0903-distro-default-smithi/7435549
Updated by Nitzan Mordechai over 2 years ago
/a/yuriw-2023-10-30_15:34:36-rados-wip-yuri10-testing-2023-10-27-0804-distro-default-smithi/7441096
/a/yuriw-2023-10-30_15:34:36-rados-wip-yuri10-testing-2023-10-27-0804-distro-default-smithi/7441250
Updated by Laura Flores over 2 years ago
/a/yuriw-2023-10-31_14:43:48-rados-wip-yuri4-testing-2023-10-30-1117-distro-default-smithi/7442155
Updated by Laura Flores over 2 years ago
/a/yuriw-2023-11-02_14:20:05-rados-wip-yuri6-testing-2023-11-01-0745-reef-distro-default-smithi/7444597
Updated by Laura Flores over 2 years ago
/a/yuriw-2023-11-05_15:32:58-rados-reef-release-distro-default-smithi/7448518
Updated by Nitzan Mordechai over 2 years ago
/a/yuriw-2023-12-07_16:37:24-rados-wip-yuri8-testing-2023-12-06-1425-distro-default-smithi/7482188
/a/yuriw-2023-12-07_16:37:24-rados-wip-yuri8-testing-2023-12-06-1425-distro-default-smithi/7482168
Updated by Matan Breizman about 2 years ago
/a/yuriw-2023-12-26_16:10:01-rados-wip-yuri3-testing-2023-12-19-1211-distro-default-smithi/7501415
Updated by Aishwarya Mathuria about 2 years ago
/a/yuriw-2024-01-03_16:19:00-rados-wip-yuri6-testing-2024-01-02-0832-distro-default-smithi/7505560/
/a/yuriw-2024-01-03_16:19:00-rados-wip-yuri6-testing-2024-01-02-0832-distro-default-smithi/7505716/
Updated by Nitzan Mordechai about 2 years ago
/a/yuriw-2024-01-18_15:10:37-rados-wip-yuri3-testing-2024-01-17-0753-distro-default-smithi/7520620
/a/yuriw-2024-01-18_15:10:37-rados-wip-yuri3-testing-2024-01-17-0753-distro-default-smithi/7520463
Updated by Kamoltat (Junior) Sirivadhna about 2 years ago
/a/yuriw-2024-01-31_19:20:14-rados-wip-yuri3-testing-2024-01-29-1434-distro-default-smithi/7540671
Updated by Laura Flores about 2 years ago
/a/yuriw-2024-02-05_19:32:33-rados-wip-yuri4-testing-2024-02-05-0849-distro-default-smithi/7547525
Updated by Matan Breizman about 2 years ago
/a/yuriw-2024-02-09_00:15:46-rados-wip-yuri2-testing-2024-02-08-0727-distro-default-smithi/7553332
/a/yuriw-2024-02-09_00:15:46-rados-wip-yuri2-testing-2024-02-08-0727-distro-default-smithi/7553494
Updated by Radoslaw Zarzynski about 2 years ago
- Assignee changed from Brad Hubbard to Nitzan Mordechai
Updated by Aishwarya Mathuria about 2 years ago
/a/yuriw-2024-02-13_15:50:02-rados-wip-yuri2-testing-2024-02-12-0808-reef-distro-default-smithi/7558347/
Updated by Laura Flores about 2 years ago
/a/lflores-2024-02-13_16:18:32-rados-wip-yuri5-testing-2024-02-12-1152-distro-default-smithi/7558507
Updated by Aishwarya Mathuria about 2 years ago
/a/yuriw-2024-02-14_14:58:57-rados-wip-yuri4-testing-2024-02-13-1546-distro-default-smithi/7560007/
Updated by Nitzan Mordechai about 2 years ago
- Status changed from In Progress to Fix Under Review
Updated by Laura Flores about 2 years ago
/a/yuriw-2024-02-28_22:53:11-rados-wip-yuri2-testing-2024-02-16-0829-reef-distro-default-smithi/7576306
Updated by Radoslaw Zarzynski about 2 years ago
note from scrub: the PR is approved. Needs-qa.
Updated by Sridhar Seshasayee about 2 years ago
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587684
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587943
With the fix applied, ceph_test_lazy_omap_stats still appears to crash at a later point, while processing the "pg dump" output.
Logs:
2024-03-10T00:36:58.974 DEBUG:teuthology.orchestra.run.smithi005:> sudo TESTDIR=/home/ubuntu/cephtest bash -c ceph_test_lazy_omap_stats
2024-03-10T00:36:59.122 INFO:teuthology.orchestra.run.smithi005.stdout:pool 'lazy_omap_test_pool' created
2024-03-10T00:36:59.126 INFO:teuthology.orchestra.run.smithi005.stdout:Querying pool id
2024-03-10T00:36:59.128 INFO:teuthology.orchestra.run.smithi005.stdout:Found pool ID: 2
2024-03-10T00:36:59.131 INFO:teuthology.orchestra.run.smithi005.stdout:Created payload with 2000 keys of 445 bytes each. Total size in bytes = 890000
2024-03-10T00:36:59.132 INFO:teuthology.orchestra.run.smithi005.stdout:Waiting for active+clean
2024-03-10T00:36:59.384 INFO:teuthology.orchestra.run.smithi005.stdout:.
2024-03-10T00:37:00.168 INFO:teuthology.orchestra.run.smithi005.stdout:Wrote 2000 omap keys of 445 bytes to the 69650377-ca6a-4d76-9ed0-b8232baf4954 object
2024-03-10T00:37:00.168 INFO:teuthology.orchestra.run.smithi005.stdout:Scrubbing
2024-03-10T00:37:00.168 INFO:teuthology.orchestra.run.smithi005.stdout:Before scrub stamps:
2024-03-10T00:37:00.170 INFO:teuthology.orchestra.run.smithi005.stdout:dumped all
2024-03-10T00:37:00.189 INFO:teuthology.orchestra.run.smithi005.stdout:pg = 1.0 stamp = 2024-03-10T00:36:54.112470+0000
2024-03-10T00:37:00.189 INFO:teuthology.orchestra.run.smithi005.stdout:pg = 1.1 stamp = 2024-03-10T00:36:54.112470+0000
2024-03-10T00:37:00.189 INFO:teuthology.orchestra.run.smithi005.stdout:pg = 1.2 stamp = 2024-03-10T00:36:54.112470+0000
2024-03-10T00:37:00.189 INFO:teuthology.orchestra.run.smithi005.stdout:pg = 1.3 stamp = 2024-03-10T00:36:54.112470+0000
...
2024-03-10T00:37:25.609 INFO:teuthology.orchestra.run.smithi005.stdout:Scrubbing complete
2024-03-10T00:37:25.610 INFO:teuthology.orchestra.run.smithi005.stdout:dumped all
2024-03-10T00:37:25.610 INFO:teuthology.orchestra.run.smithi005.stdout:version 29
2024-03-10T00:37:25.610 INFO:teuthology.orchestra.run.smithi005.stdout:stamp 2024-03-10T00:37:25.107820+0000
2024-03-10T00:37:25.610 INFO:teuthology.orchestra.run.smithi005.stdout:last_osdmap_epoch 0
2024-03-10T00:37:25.610 INFO:teuthology.orchestra.run.smithi005.stdout:last_pg_scan 0
2024-03-10T00:37:25.610 INFO:teuthology.orchestra.run.smithi005.stdout:PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG LOG_DUPS DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN LAST_SCRUB_DURATION SCRUB_SCHEDULING OBJECTS_SCRUBBED OBJECTS_TRIMMED
2024-03-10T00:37:25.611 INFO:teuthology.orchestra.run.smithi005.stdout:2.1f 0 0 0 0 0 0 0 0 0 0 0 active+clean 2024-03-10T00:37:17.212953+0000 0'0 15:22 [0,1,2] 0 [0,1,2] 0 0'0 2024-03-10T00:37:17.212856+0000 0'0 2024-03-10T00:37:17.212856+0000 0 0 periodic scrub scheduled @ 2024-03-11T01:30:12.432348+0000 0 0
...
2024-03-10T00:37:25.614 INFO:teuthology.orchestra.run.smithi005.stdout:2 1 0 0 0 0 0 890000 2000 2 2
2024-03-10T00:37:25.614 INFO:teuthology.orchestra.run.smithi005.stdout:1 0 0 0 0 0 0 0 0 0 0
2024-03-10T00:37:25.614 INFO:teuthology.orchestra.run.smithi005.stdout:
2024-03-10T00:37:25.615 INFO:teuthology.orchestra.run.smithi005.stdout:sum 1 0 0 0 0 0 890000 2000 2 2
2024-03-10T00:37:25.615 INFO:teuthology.orchestra.run.smithi005.stdout:OSD_STAT USED AVAIL USED_RAW TOTAL HB_PEERS PG_SUM PRIMARY_PG_SUM
2024-03-10T00:37:25.615 INFO:teuthology.orchestra.run.smithi005.stdout:2 27 MiB 100 GiB 27 MiB 100 GiB [0,1] 36 11
2024-03-10T00:37:25.615 INFO:teuthology.orchestra.run.smithi005.stdout:1 27 MiB 100 GiB 27 MiB 100 GiB [0,2] 38 16
2024-03-10T00:37:25.615 INFO:teuthology.orchestra.run.smithi005.stdout:0 27 MiB 100 GiB 27 MiB 100 GiB [1,2] 38 13
2024-03-10T00:37:25.615 INFO:teuthology.orchestra.run.smithi005.stdout:sum 80 MiB 300 GiB 80 MiB 300 GiB
2024-03-10T00:37:25.615 INFO:teuthology.orchestra.run.smithi005.stdout:
2024-03-10T00:37:25.615 INFO:teuthology.orchestra.run.smithi005.stdout:* NOTE: Omap statistics are gathered during deep scrub and may be inaccurate soon afterwards depending on utilization. See http://docs.ceph.com/en/latest/dev/placement-group/#omap-statistics for further details.
2024-03-10T00:37:25.615 INFO:teuthology.orchestra.run.smithi005.stdout:
2024-03-10T00:37:25.792 DEBUG:teuthology.orchestra.run:got remote process result: None
2024-03-10T00:37:25.793 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/run_tasks.py", line 105, in run_tasks
manager = run_one_task(taskname, ctx=ctx, config=config)
File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/run_tasks.py", line 83, in run_one_task
return task(**kwargs)
File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/task/exec.py", line 66, in task
remote.run(
File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/orchestra/remote.py", line 523, in run
r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/orchestra/run.py", line 455, in run
r.wait()
File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/orchestra/run.py", line 161, in wait
self._raise_for_status()
File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/orchestra/run.py", line 179, in _raise_for_status
raise CommandCrashedError(command=self.command)
teuthology.exceptions.CommandCrashedError: Command crashed: 'sudo TESTDIR=/home/ubuntu/cephtest bash -c ceph_test_lazy_omap_stats'
2024-03-10T00:37:26.000 ERROR:teuthology.util.sentry: Sentry event: https://sentry.ceph.com/organizations/ceph/?query=b4490d53d0074f1ea4e0a94a7cf24187
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/run_tasks.py", line 105, in run_tasks
manager = run_one_task(taskname, ctx=ctx, config=config)
File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/run_tasks.py", line 83, in run_one_task
return task(**kwargs)
File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/task/exec.py", line 66, in task
remote.run(
File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/orchestra/remote.py", line 523, in run
r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/orchestra/run.py", line 455, in run
r.wait()
File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/orchestra/run.py", line 161, in wait
self._raise_for_status()
File "/home/teuthworker/src/git.ceph.com_teuthology_e691533f9cbb33d85b2187bba20d7102f098636d/teuthology/orchestra/run.py", line 179, in _raise_for_status
raise CommandCrashedError(command=self.command)
teuthology.exceptions.CommandCrashedError: Command crashed: 'sudo TESTDIR=/home/ubuntu/cephtest bash -c ceph_test_lazy_omap_stats'
2024-03-10T00:37:26.002 DEBUG:teuthology.run_tasks:Unwinding manager ceph
2024-03-10T00:37:26.011 INFO:tasks.ceph.ceph_manager.ceph:waiting for clean
Updated by Radoslaw Zarzynski about 2 years ago
The fix isn't merged yet, which could explain the recurrence above.
Updated by Sridhar Seshasayee about 2 years ago
Radoslaw Zarzynski wrote:
The fix isn't merged yet, which could explain the recurrence above.
The run mentioned in #note-47 above includes the associated PR for testing. The fix apparently worked, but the test then failed at some later point.
Updated by Nitzan Mordechai about 2 years ago
According to the console logs:
[ 473.104619] ceph_test_lazy_[35269]: segfault at 7fff643adff8 ip 0000558a2c9a3953 sp 00007fff643adf20 error 6 in ceph_test_lazy_omap_stats[558a2c987000+20000] likely on CPU 7 (core 3, socket 0)
We are still getting a segfault somehow; checking.
Updated by Nitzan Mordechai about 2 years ago
The segfault now happens in the check_one function, where we also have a preliminary regex to truncate the output, and that is what is causing the segfault. I fixed it as well and pushed to the existing PR.
Updated by Brad Hubbard about 2 years ago
Nitzan Mordechai wrote:
The segfault now happens in the check_one function, where we also have a preliminary regex to truncate the output, and that is what is causing the segfault. I fixed it as well and pushed to the existing PR.
Explained this in the PR and resubmitted for testing since needs_qa was removed due to the test failures. This is probably my fault as I should have picked that up during my review. Hopefully we are not going to see too many more of these.
Updated by Nitzan Mordechai about 2 years ago
Brad Hubbard wrote:
Nitzan Mordechai wrote:
The segfault now happens in the check_one function, where we also have a preliminary regex to truncate the output, and that is what is causing the segfault. I fixed it as well and pushed to the existing PR.
Explained this in the PR and resubmitted for testing since needs_qa was removed due to the test failures. This is probably my fault as I should have picked that up during my review. Hopefully we are not going to see too many more of these.
Thanks for bringing this up. I saw you already added the PR note and the needs-qa tag, thanks!
Updated by Aishwarya Mathuria about 2 years ago
/a/yuriw-2024-03-19_00:09:45-rados-wip-yuri5-testing-2024-03-18-1144-distro-default-smithi/7609959
Updated by Brad Hubbard about 2 years ago
Looking at the above crash, which is referred to in https://github.com/ceph/ceph/pull/55596#issuecomment-2011798771.
#0 0x000055555557a081 in std::__detail::_Executor<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::sub_match<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::__cxx11::regex_traits<char>, true>::_M_handle_match (
__match_mode=std::__detail::_Executor<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::sub_match<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::__cxx11::regex_traits<char>, true>::_Match_mode::_Prefix, __i=11,
this=0x7fffffffdcd0) at /usr/include/c++/11/bits/regex_executor.tcc:326
...
#72779 std::regex_search<std::char_traits<char>, std::allocator<char>, std::allocator<std::__cxx11::sub_match<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, char, std::__cxx11::regex_traits<char> > (__s="version 843\nstamp 2024-03-22T00:37:58.305948+0000\nlast_osdmap_epoch 0\nlast_pg_scan 0\nPG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG LOG_DUPS "..., __s="version 843\nstamp 2024-03-22T00:37:58.305948+0000\nlast_osdmap_epoch 0\nlast_pg_scan 0\nPG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG LOG_DUPS "..., __f=0, __e=..., __m=...) at /usr/include/c++/11/bits/regex.h:2445
#72780 LazyOmapStatsTest::check_one (this=0x7fffffffe460) at ./src/test/lazy-omap-stats/lazy_omap_stats_test.cc:305
#72781 0x000055555556d240 in LazyOmapStatsTest::run (this=0x7fffffffe460, argc=<optimized out>, argv=<optimized out>) at ./src/test/lazy-omap-stats/lazy_omap_stats_test.cc:602
#72782 0x0000555555560233 in main (argc=1, argv=0x7fffffffe638) at ./src/test/lazy-omap-stats/main.cc:20
(gdb) f 72780
#72780 LazyOmapStatsTest::check_one (this=0x7fffffffe460) at ./src/test/lazy-omap-stats/lazy_omap_stats_test.cc:305
305 regex_search(full_output, match, reg);
(gdb) l
300 string full_output = get_output();
301 cout << full_output << endl;
302 regex reg(
303 "\n((PG_STAT[\\s\\S]*)\n)OSD_STAT"); // Strip OSD_STAT table so we don't find matches there
304 smatch match;
305 regex_search(full_output, match, reg);
306 auto truncated_output = match[1].str();
307 cout << truncated_output << endl;
308 reg = regex(
309 "\n"
So this is the same issue with the new code.
This is most likely https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86164, and it looks like we need an approach similar to https://github.com/scylladb/scylladb/pull/13452.
I'm going to look at the feasibility of moving to boost::regex, at least until this gets sorted out in libstdc++.
Stand by.
Updated by Laura Flores almost 2 years ago
/a/yuriw-2024-03-22_13:09:48-rados-wip-yuri11-testing-2024-03-21-0851-reef-distro-default-smithi/7616706
Updated by Laura Flores almost 2 years ago
/a/yuriw-2024-03-20_18:33:32-rados-wip-yuri6-testing-2024-03-18-1406-squid-distro-default-smithi/7613235
Updated by Laura Flores almost 2 years ago
- Backport changed from reef to squid,reef
Updated by Brad Hubbard almost 2 years ago
Closing https://github.com/ceph/ceph/pull/55596 in favour of https://github.com/ceph/ceph/pull/56574
Updated by Brad Hubbard almost 2 years ago
- Pull request ID changed from 55596 to 56574
Updated by Laura Flores almost 2 years ago
/a/yuriw-2024-03-24_22:19:24-rados-wip-yuri10-testing-2024-03-24-1159-distro-default-smithi/7620621
Updated by Brad Hubbard almost 2 years ago
- Assignee changed from Nitzan Mordechai to Brad Hubbard
Taking this back.
Updated by Aishwarya Mathuria almost 2 years ago
/a/yuriw-2024-04-09_14:35:50-rados-wip-yuri5-testing-2024-03-21-0833-distro-default-smithi/7648693/
Updated by Matan Breizman almost 2 years ago
/a/yuriw-2024-04-16_23:25:35-rados-wip-yuriw-testing-20240416.150233-distro-default-smithi/7659395
/a/yuriw-2024-04-16_23:25:35-rados-wip-yuriw-testing-20240416.150233-distro-default-smithi/7659539
Updated by Radoslaw Zarzynski almost 2 years ago · Edited
QA Review in progress.
Updated by Matan Breizman almost 2 years ago
/a/teuthology/yuriw-2024-04-20_01:10:46-rados-wip-yuri7-testing-2024-04-18-1351-reef-distro-default-smithi/7664305
Updated by Sridhar Seshasayee almost 2 years ago
Observed on Squid:
/a/yuriw-2024-04-30_03:21:19-rados-wip-yuri4-testing-2024-04-29-0642-distro-default-smithi/7680244
/a/yuriw-2024-04-30_03:21:19-rados-wip-yuri4-testing-2024-04-29-0642-distro-default-smithi/7680389
Updated by Aishwarya Mathuria almost 2 years ago
/a/yuriw-2024-04-30_14:17:59-rados-wip-yuri5-testing-2024-04-17-1400-distro-default-smithi/7680957/
/a/yuriw-2024-04-30_14:17:59-rados-wip-yuri5-testing-2024-04-17-1400-distro-default-smithi/7681056/
Updated by Laura Flores almost 2 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Upkeep Bot almost 2 years ago
- Copied to Backport #65997: reef: ceph_test_lazy_omap_stats segfault while waiting for active+clean added
Updated by Upkeep Bot almost 2 years ago
- Copied to Backport #65998: squid: ceph_test_lazy_omap_stats segfault while waiting for active+clean added
Updated by Laura Flores over 1 year ago
/a/lflores-2024-06-18_23:03:53-rados-squid-release-distro-default-smithi/7762908
Updated by Upkeep Bot over 1 year ago
- Tags (freeform) set to backport_processed
Updated by Laura Flores over 1 year ago
- Backport changed from squid,reef to squid,reef,quincy
/a/yuriw-2024-08-01_19:59:16-rados-wip-yuri5-testing-2024-08-01-0821-quincy-distro-default-smithi/7830806
Updated by Aishwarya Mathuria over 1 year ago
/a/yuriw-2024-08-07_21:27:43-rados-wip-yuri2-testing-2024-08-05-1243-reef-distro-default-smithi/7842747
Updated by Laura Flores over 1 year ago
/a/yuriw-2024-08-09_22:43:18-rados-wip-yuri2-testing-2024-08-09-0834-reef-distro-default-smithi/7846350
Updated by Laura Flores over 1 year ago
/a/yuriw-2024-08-26_21:22:35-rados-wip-yuri2-testing-2024-08-26-1032-quincy-distro-default-smithi/7874410
Updated by Brad Hubbard over 1 year ago
@Laura Flores do we need a quincy backport?
Updated by Brad Hubbard over 1 year ago
- Copied to Backport #67822: quincy: ceph_test_lazy_omap_stats segfault while waiting for active+clean added
Updated by Sridhar Seshasayee over 1 year ago
/a/skanta-2024-09-19_07:49:31-rados-wip-bharath11-testing-2024-09-18-1115-quincy-distro-default-smithi/7912358
Updated by Nitzan Mordechai over 1 year ago
/a/skanta-2024-09-26_23:50:23-rados-wip-bharath13-testing-2024-09-26-2103-reef-distro-default-smithi/7921085
/a/skanta-2024-09-26_23:50:23-rados-wip-bharath13-testing-2024-09-26-2103-reef-distro-default-smithi/7921256
Updated by Aishwarya Mathuria over 1 year ago
/a/yuriw-2024-10-04_14:39:24-rados-wip-yuri3-testing-2024-10-02-1405-reef-distro-default-smithi/7932715
/a/yuriw-2024-10-04_14:39:24-rados-wip-yuri3-testing-2024-10-02-1405-reef-distro-default-smithi/7932724
Updated by Kamoltat (Junior) Sirivadhna over 1 year ago
/a/yuriw-2024-10-10_14:29:43-rados-wip-yuri11-testing-2024-10-08-0753-quincy-distro-default-smithi/7943419
/a/yuriw-2024-10-10_14:29:43-rados-wip-yuri11-testing-2024-10-08-0753-quincy-distro-default-smithi/7943404
Updated by Aishwarya Mathuria over 1 year ago
/a/yuriw-2024-10-10_14:21:40-rados-wip-yuri8-testing-2024-10-07-1646-quincy-distro-default-smithi/7943302
Updated by Brad Hubbard over 1 year ago
$ git branch -r --contains c09ef76
  ceph-ci/wip-yuri11-testing-2024-10-14-0808-quincy
  upstream/quincy
Only one ceph-ci branch currently has the patch, so it's not surprising you are still seeing this. Try adding c09ef76 to your branch.
Updated by Shraddha Agrawal over 1 year ago
/a/skanta-2024-10-05_10:36:55-rados-wip-bharath15-testing-2024-10-05-1105-quincy-distro-default-smithi/7934682
Updated by Brad Hubbard over 1 year ago
Please stop reporting this in branches where the fix is available but you have not included it; it is confusing and means I have to keep checking your branches. Thanks!
Updated by Yuri Weinstein over 1 year ago
Updated by Brad Hubbard over 1 year ago
- Status changed from Pending Backport to Resolved
Updated by Upkeep Bot 8 months ago
- Merge Commit set to 05b534b5ea2cf69321b648a741283746846f60ad
- Fixed In set to v19.3.0-2127-g05b534b5ea2
- Upkeep Timestamp set to 2025-07-12T16:41:54+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v19.3.0-2127-g05b534b5ea2 to v19.3.0-2127-g05b534b5ea
- Upkeep Timestamp changed from 2025-07-12T16:41:54+00:00 to 2025-07-14T23:40:17+00:00
Updated by Upkeep Bot 5 months ago
- Released In set to v20.2.0~2914
- Upkeep Timestamp changed from 2025-07-14T23:40:17+00:00 to 2025-11-01T01:34:35+00:00