Bug #66658
closed
qa/workunits/dencoder/test-dencoder.sh: Error encountered in subprocess. Command: ['ceph-dencoder', 'type', 'cls_rgw_reshard_get_ret'
0%
Description
/a/yuriw-2024-06-20_13:41:02-rados-wip-yuri11-testing-2024-06-19-1425-distro-default-smithi/7765270/
2024-06-21T04:00:25.104 INFO:tasks.workunit.client.0.smithi192.stdout:dencoder test for /home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/pg_stat_t/fbd7ad2b3ea90c7418f64f8762e4bf57
2024-06-21T04:00:25.104 INFO:tasks.workunit.client.0.smithi192.stdout:dencoder test for /home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/SequencerPosition/785cfb3496866c599b040977c79e27ec
2024-06-21T04:00:25.104 INFO:tasks.workunit.client.0.smithi192.stdout:dencoder test for /home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/ScrubMap::object/f81d88bd53fba42eef521c3ea5aa335d
2024-06-21T04:00:25.104 INFO:tasks.workunit.client.0.smithi192.stdout:dencoder test for /home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/ScrubMap::object/b5d2bff1d33b15ac9d748de4506d3663
2024-06-21T04:00:25.104 INFO:tasks.workunit.client.0.smithi192.stdout:dencoder test for /home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/ScrubMap::object/fe2c864935473ee22f7e3d9167711b81
2024-06-21T04:00:25.104 INFO:tasks.workunit.client.0.smithi192.stdout:Error encountered in subprocess. Command: ['ceph-dencoder', 'type', 'cls_rgw_reshard_get_ret', 'import', PosixPath('/home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/cls_rgw_reshard_get_ret/eef7aa6337f7cb0f82f62cc06807b169'), 'decode', 'dump_json']
2024-06-21T04:00:25.104 INFO:tasks.workunit.client.0.smithi192.stdout:Return code: 1 Command:['ceph-dencoder', 'type', 'cls_rgw_reshard_get_ret', 'import', PosixPath('/home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/cls_rgw_reshard_get_ret/eef7aa6337f7cb0f82f62cc06807b169'), 'decode', 'dump_json'] Output:
2024-06-21T04:00:25.104 INFO:tasks.workunit.client.0.smithi192.stdout:Error encountered in subprocess. Command: ['ceph-dencoder', 'type', 'cls_rgw_reshard_get_ret', 'import', PosixPath('/home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/cls_rgw_reshard_get_ret/eef7aa6337f7cb0f82f62cc06807b169'), 'decode', 'encode', 'decode', 'dump_json']
2024-06-21T04:00:25.104 INFO:tasks.workunit.client.0.smithi192.stdout:Return code: 1 Command:['ceph-dencoder', 'type', 'cls_rgw_reshard_get_ret', 'import', PosixPath('/home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/cls_rgw_reshard_get_ret/eef7aa6337f7cb0f82f62cc06807b169'), 'decode', 'encode', 'decode', 'dump_json'] Output:
2024-06-21T04:00:25.104 INFO:tasks.workunit.client.0.smithi192.stdout:dencoder test for /home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/ScrubMap::object/fec7d28512c0c03c6f0332cea66f3c04
2024-06-21T04:00:25.105 INFO:tasks.workunit.client.0.smithi192.stdout:dencoder test for /home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/ScrubMap::object/fa25a4e609ea04e8dadb9e20322ff36a
2024-06-21T04:00:25.105 INFO:tasks.workunit.client.0.smithi192.stdout:dencoder test for /home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/ScrubMap::object/fa52e8476e88ff741b644e63360aafa2
2024-06-21T04:00:25.105 INFO:tasks.workunit.client.0.smithi192.stdout:dencoder test for /home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/ScrubMap::object/f7a74186b5107c1af627f7a4b00f5771
2024-06-21T04:00:25.108 INFO:tasks.workunit.client.0.smithi192.stdout:dencoder test for /home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/ACLGrant/e2abd25aeb0558a9138ba7114a9ca0f4
2024-06-21T04:00:25.108 INFO:tasks.workunit.client.0.smithi192.stdout:dencoder test for /home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/ghobject_t/ff558ab198526851482017414a73502a
2024-06-21T04:00:25.108 INFO:tasks.workunit.client.0.smithi192.stdout:FAILED 80/13421 tests.
2024-06-21T04:00:25.109 DEBUG:teuthology.orchestra.run:got remote process result: 1
2024-06-21T04:00:25.110 INFO:tasks.workunit:Stopping ['dencoder/test-dencoder.sh'] on client.0...
2024-06-21T04:00:25.110 DEBUG:teuthology.orchestra.run.smithi192:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2024-06-21T04:00:25.419 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/run_tasks.py", line 105, in run_tasks
manager = run_one_task(taskname, ctx=ctx, config=config)
File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/run_tasks.py", line 83, in run_one_task
return task(**kwargs)
File "/home/teuthworker/src/github.com_ceph_ceph-c_97d1f68a77dd6bf1e17c01ce278ad49b6eb45aa4/qa/tasks/workunit.py", line 126, in task
with parallel() as p:
File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/parallel.py", line 84, in __exit__
for result in self:
File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/parallel.py", line 98, in __next__
resurrect_traceback(result)
File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/parallel.py", line 30, in resurrect_traceback
raise exc.exc_info[1]
File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/parallel.py", line 23, in capture_traceback
return func(*args, **kwargs)
File "/home/teuthworker/src/github.com_ceph_ceph-c_97d1f68a77dd6bf1e17c01ce278ad49b6eb45aa4/qa/tasks/workunit.py", line 434, in _run_tests
remote.run(
File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/orchestra/remote.py", line 523, in run
r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/orchestra/run.py", line 455, in run
r.wait()
File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/orchestra/run.py", line 161, in wait
self._raise_for_status()
File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/orchestra/run.py", line 181, in _raise_for_status
raise CommandFailedError(
teuthology.exceptions.CommandFailedError: Command failed (workunit test dencoder/test-dencoder.sh) on smithi192 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=97d1f68a77dd6bf1e17c01ce278ad49b6eb45aa4 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/dencoder/test-dencoder.sh'
2024-06-21T04:00:25.603 ERROR:teuthology.util.sentry: Sentry event: https://sentry.ceph.com/organizations/ceph/?query=2903b18d045c44a7b8bab3316f8512a1
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/run_tasks.py", line 105, in run_tasks
manager = run_one_task(taskname, ctx=ctx, config=config)
File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/run_tasks.py", line 83, in run_one_task
return task(**kwargs)
File "/home/teuthworker/src/github.com_ceph_ceph-c_97d1f68a77dd6bf1e17c01ce278ad49b6eb45aa4/qa/tasks/workunit.py", line 126, in task
with parallel() as p:
File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/parallel.py", line 84, in __exit__
for result in self:
File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/parallel.py", line 98, in __next__
resurrect_traceback(result)
File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/parallel.py", line 30, in resurrect_traceback
raise exc.exc_info[1]
File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/parallel.py", line 23, in capture_traceback
return func(*args, **kwargs)
File "/home/teuthworker/src/github.com_ceph_ceph-c_97d1f68a77dd6bf1e17c01ce278ad49b6eb45aa4/qa/tasks/workunit.py", line 434, in _run_tests
remote.run(
File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/orchestra/remote.py", line 523, in run
r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/orchestra/run.py", line 455, in run
r.wait()
File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/orchestra/run.py", line 161, in wait
self._raise_for_status()
File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/orchestra/run.py", line 181, in _raise_for_status
raise CommandFailedError(
teuthology.exceptions.CommandFailedError: Command failed (workunit test dencoder/test-dencoder.sh) on smithi192 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=97d1f68a77dd6bf1e17c01ce278ad49b6eb45aa4 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/dencoder/test-dencoder.sh'
2024-06-21T04:00:25.605 DEBUG:teuthology.run_tasks:Unwinding manager cephadm
2024-06-21T04:00:25.616 INFO:tasks.cephadm:Teardown begin
2024-06-21T04:00:25.616 DEBUG:teuthology.orchestra.run.smithi192:> sudo rm -f /etc/ceph/ce
Updated by Kamoltat (Junior) Sirivadhna over 1 year ago
- Tags set to main-failures
Updated by Radoslaw Zarzynski over 1 year ago
- Status changed from New to In Progress
Updated by Nitzan Mordechai over 1 year ago
This is not a test failure; this is a real bug.
void decode(ceph::buffer::list::const_iterator& bl) {
  DECODE_START(2, bl);
  decode(time, bl);
  decode(tenant, bl);
  decode(bucket_name, bl);
  decode(bucket_id, bl);
  if (struct_v < 2) {
    std::string new_instance_id; // removed in v2
    decode(new_instance_id, bl);
  }
  decode(old_num_shards, bl);
  decode(new_num_shards, bl);
  DECODE_FINISH(bl);
}
[root@bca3fe3043c2 /]# /usr/bin/ceph-dencoder type cls_rgw_reshard_add_op import ~/ceph-object-corpus/archive/18.2.0/objects/cls_rgw_reshard_add_op/eef7aa6337f7cb0f82f62cc06807b169 hexdump
00000000  01 01 22 00 00 00 02 01  1c 00 00 00 00 00 00 00  |..".............|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  00 00 00 00 00 00 00 00                           |........|
00000028
still checking
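To make the hexdump above easier to read, here is a minimal, self-contained sketch that parses the two 6-byte headers at the start of the object. It assumes the usual versioned layout (one byte struct_v, one byte struct_compat, then a 32-bit little-endian length); `Header`, `read_header` and `corpus_bytes` are illustrative names, not Ceph APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Header assumed to precede every versioned struct: version, compat, length.
struct Header {
  uint8_t struct_v;
  uint8_t struct_compat;
  uint32_t struct_len;  // little-endian on the wire
};

// Read a 6-byte header at offset `off` and advance `off` past it.
Header read_header(const std::vector<uint8_t>& buf, std::size_t& off) {
  assert(off + 6 <= buf.size());
  Header h;
  h.struct_v = buf[off];
  h.struct_compat = buf[off + 1];
  h.struct_len = static_cast<uint32_t>(buf[off + 2]) |
                 (static_cast<uint32_t>(buf[off + 3]) << 8) |
                 (static_cast<uint32_t>(buf[off + 4]) << 16) |
                 (static_cast<uint32_t>(buf[off + 5]) << 24);
  off += 6;
  return h;
}

// The 40 bytes from the hexdump: two headers followed by 28 zero bytes.
std::vector<uint8_t> corpus_bytes() {
  std::vector<uint8_t> buf = {0x01, 0x01, 0x22, 0x00, 0x00, 0x00,
                              0x02, 0x01, 0x1c, 0x00, 0x00, 0x00};
  buf.resize(40, 0x00);
  return buf;
}
```

Read this way, the outer header is v1 with a 34-byte body and the inner header is v2 (compat 1) with a 28-byte body. Presumably the inner one is the cls_rgw_reshard_entry, and 28 bytes would fit an 8-byte time, three empty length-prefixed strings and two 32-bit shard counts, with no room left for new_instance_id.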
Updated by Nitzan Mordechai over 1 year ago
The encode/decode for cls_rgw_reshard_entry was updated during v18: we removed new_instance_id, but we didn't raise the compat version:
void encode(ceph::buffer::list& bl) const {
  ENCODE_START(2, 1, bl);
  encode(time, bl);
  encode(tenant, bl);
  encode(bucket_name, bl);
  encode(bucket_id, bl);
  encode(old_num_shards, bl);
  encode(new_num_shards, bl);
  ENCODE_FINISH(bl);
}
void decode(ceph::buffer::list::const_iterator& bl) {
  DECODE_START(2, bl);
  decode(time, bl);
  decode(tenant, bl);
  decode(bucket_name, bl);
  decode(bucket_id, bl);
  if (struct_v < 2) {
    std::string new_instance_id; // removed in v2
    decode(new_instance_id, bl);
  }
  decode(old_num_shards, bl);
  decode(new_num_shards, bl);
  DECODE_FINISH(bl);
}
But the Quincy encode/decode was not updated to match, so when Quincy tries to decode/encode a cls_rgw_reshard_entry, it runs past the buffer boundary and fails.
@Radoslaw Zarzynski any thoughts?
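The overrun described above can be reproduced outside Ceph with a small simulation. This is only a sketch: `Reader`, `decode_entry` and `overruns` are made-up names, the reader stands in for a bufferlist iterator, and it assumes the older decoder reads new_instance_id unconditionally, that time is two 32-bit values, and that strings carry a 32-bit length prefix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Bounds-checked reader standing in for a bufferlist iterator (illustrative only).
class Reader {
 public:
  explicit Reader(const std::vector<uint8_t>& buf) : buf_(buf) {}

  // Read a little-endian 32-bit value.
  uint32_t u32() {
    need(4);
    uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
      v |= static_cast<uint32_t>(buf_[off_ + i]) << (8 * i);
    off_ += 4;
    return v;
  }

  // Skip a length-prefixed string.
  void skip_str() {
    const uint32_t n = u32();
    need(n);
    off_ += n;
  }

 private:
  // Throw if fewer than n bytes remain.
  void need(std::size_t n) const {
    if (n > buf_.size() - off_) throw std::out_of_range("end of buffer");
  }
  const std::vector<uint8_t>& buf_;
  std::size_t off_ = 0;
};

// Walk the payload; reads_instance_id selects the old (v1) or new (v2) layout.
void decode_entry(const std::vector<uint8_t>& payload, bool reads_instance_id) {
  Reader r(payload);
  r.u32();
  r.u32();                              // time (assumed 2 x 32 bits)
  r.skip_str();                         // tenant
  r.skip_str();                         // bucket_name
  r.skip_str();                         // bucket_id
  if (reads_instance_id) r.skip_str();  // new_instance_id, absent in v2
  r.u32();                              // old_num_shards
  r.u32();                              // new_num_shards
}

// True if decoding runs off the end of the payload.
bool overruns(const std::vector<uint8_t>& payload, bool reads_instance_id) {
  try {
    decode_entry(payload, reads_instance_id);
  } catch (const std::out_of_range&) {
    return true;
  }
  return false;
}
```

With a 28-byte all-zero v2 payload, the old layout consumes four extra bytes for new_instance_id and then has nothing left for new_num_shards, so `overruns(payload, true)` reports a failure while the v2-aware path reads the same bytes cleanly.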
Updated by Laura Flores over 1 year ago
Note from bug scrub: Radek will respond.
Updated by Radoslaw Zarzynski over 1 year ago
First thought: have you considered an encoder that would check the feature bits of a decoder to generate fitting bytestreams?
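One possible shape of such a feature-aware encoder, as a rough standalone sketch rather than a proposed patch: `FEATURE_RESHARD_ENTRY_V2` is a hypothetical feature bit (no such Ceph flag is implied), and `Entry`, `put_u32`, `put_str` and `encode_entry` are illustrative helpers that mimic the wire layout with plain byte vectors.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical feature bit; not a real Ceph feature flag.
constexpr uint64_t FEATURE_RESHARD_ENTRY_V2 = 1ull << 0;

// Simplified stand-in for the reshard entry fields.
struct Entry {
  uint32_t time_sec = 0, time_nsec = 0;
  std::string tenant, bucket_name, bucket_id;
  uint32_t old_num_shards = 0, new_num_shards = 0;
};

// Append a little-endian 32-bit value.
void put_u32(std::vector<uint8_t>& out, uint32_t v) {
  for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

// Append a length-prefixed string.
void put_str(std::vector<uint8_t>& out, const std::string& s) {
  put_u32(out, static_cast<uint32_t>(s.size()));
  out.insert(out.end(), s.begin(), s.end());
}

// Encode header + body, picking the layout from the peer's feature bits.
std::vector<uint8_t> encode_entry(const Entry& e, uint64_t peer_features) {
  const bool v2 = (peer_features & FEATURE_RESHARD_ENTRY_V2) != 0;
  std::vector<uint8_t> body;
  put_u32(body, e.time_sec);
  put_u32(body, e.time_nsec);
  put_str(body, e.tenant);
  put_str(body, e.bucket_name);
  put_str(body, e.bucket_id);
  if (!v2) put_str(body, "");  // empty new_instance_id for peers on the old layout
  put_u32(body, e.old_num_shards);
  put_u32(body, e.new_num_shards);

  std::vector<uint8_t> out;
  out.push_back(v2 ? 2 : 1);  // struct_v
  out.push_back(1);           // struct_compat
  put_u32(out, static_cast<uint32_t>(body.size()));
  out.insert(out.end(), body.begin(), body.end());
  return out;
}
```

A peer without the bit would receive the 32-byte v1 body it expects, while an upgraded peer gets the 28-byte v2 body. Whether a real feature bit is available and appropriate for this type is a separate question for the rgw maintainers.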
Updated by Nitzan Mordechai over 1 year ago
Radoslaw Zarzynski wrote in #note-7:
First thought: have you considered an encoder that would check the feature bits of a decoder to generate fitting bytestreams?
I haven't suggested a fix yet; I just wanted your thoughts on whether this is a real bug that may affect users immediately.
Updated by Nitzan Mordechai over 1 year ago
@Casey Bodley do you mind taking a look?
Our encode/decode test caught this bug: it took a cls_rgw_reshard_entry encoded by 18.2 and tried to decode it using quincy.
One of the rgw PRs removed a member from the middle of the encoded structure, but the older version can't handle that and hits an end-of-buffer error.
btw - another commit bumped the encode version but left the decode version as is (https://github.com/ceph/ceph/commit/9302fbb3f5416871c1978af5d45f3bf568c2c190), but that is a separate, unrelated issue.
Updated by Casey Bodley over 1 year ago
thanks Nitzan,
this is a real bug, but i expect its effect to be minor and short-lived until upgrades complete. can we just whitelist this in ceph-object-corpus? would something like https://github.com/ceph/ceph-object-corpus/pull/19 work?
Updated by Casey Bodley over 1 year ago
Nitzan Mordechai wrote in #note-9:
btw - another commit bumped the encode version but left the decode version as is (https://github.com/ceph/ceph/commit/9302fbb3f5416871c1978af5d45f3bf568c2c190), but that is a separate, unrelated issue.
thanks for the heads-up, i opened https://github.com/ceph/ceph/pull/58399 to fix that part
Updated by Nitzan Mordechai over 1 year ago
@Casey Bodley thank you very much for the quick response! I thought it would be a bigger issue if the client has a different version than the OSDs in the same cluster, but if you are OK with that, I'm OK with it as well.
Thanks for the fix; it will work for the encode/decode tests!
Updated by Nitzan Mordechai over 1 year ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 58404
Updated by Nitzan Mordechai over 1 year ago
- Related to Bug #66918: dencoder/test-dencoder.sh: dencoder tests fail when tested against quincy added
Updated by Laura Flores over 1 year ago
/a/yuriw-2024-07-17_13:32:02-rados-wip-yuri12-testing-2024-07-16-1122-distro-default-smithi/7805530
Updated by Aishwarya Mathuria over 1 year ago
/a/yuriw-2024-07-16_01:05:51-rados-wip-yuri6-testing-2024-07-15-1335-distro-default-smithi/7803124/
Updated by Aishwarya Mathuria over 1 year ago
/a/yuriw-2024-07-17_13:35:08-rados-wip-yuri10-testing-2024-07-15-1330-distro-default-smithi/7805755/
Updated by Radoslaw Zarzynski over 1 year ago
- Status changed from Fix Under Review to Pending Backport
Merged!
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #67234: squid: qa/workunits/dencoder/test-dencoder.sh: Error encountered in subprocess. Command: ['ceph-dencoder', 'type', 'cls_rgw_reshard_get_ret' added
Updated by Upkeep Bot over 1 year ago
- Tags (freeform) set to backport_processed
Updated by Laura Flores over 1 year ago
/a/yuriw-2024-07-23_19:38:12-rados-wip-yuri5-testing-2024-07-23-0804-distro-default-smithi/7814448
Updated by Laura Flores over 1 year ago
/a/yuriw-2024-10-15_14:06:51-rados-wip-yuri8-testing-2024-10-14-1103-distro-default-smithi/7948102
Updated by Aishwarya Mathuria over 1 year ago
/a/yuriw-2024-10-13_19:06:13-rados-wip-yuri4-testing-2024-10-13-0836-distro-default-smithi/7944843
Updated by Laura Flores over 1 year ago
/a/yuriw-2024-10-23_23:17:32-rados-wip-yuri13-testing-2024-10-23-0743-distro-default-smithi/7963675
Updated by Laura Flores over 1 year ago
- Related to Bug #69009: dencoder/test-dencoder.sh: Error encountered with cls_rgw_reshard_get_ret added
Updated by Upkeep Bot 9 months ago
- Status changed from Pending Backport to Resolved
- Upkeep Timestamp set to 2025-07-08T18:35:39+00:00
Updated by Upkeep Bot 8 months ago
- Merge Commit set to d09655fab6373ded077555f2a0d45746bc387dd9
- Fixed In set to v19.3.0-3814-gd09655fab6
- Upkeep Timestamp changed from 2025-07-08T18:35:39+00:00 to 2025-08-02T04:50:40+00:00
Updated by Upkeep Bot 5 months ago
- Released In set to v20.2.0~2360
- Upkeep Timestamp changed from 2025-08-02T04:50:40+00:00 to 2025-11-01T01:35:59+00:00