Bug #66658

closed

qa/workunits/dencoder/test-dencoder.sh: Error encountered in subprocess. Command: ['ceph-dencoder', 'type', 'cls_rgw_reshard_get_ret'

Added by Kamoltat (Junior) Sirivadhna over 1 year ago. Updated 5 months ago.

Status: Resolved
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source:
Backport: squid
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Tags (freeform): backport_processed
Fixed In: v19.3.0-3814-gd09655fab6
Released In: v20.2.0~2360
Upkeep Timestamp: 2025-11-01T01:35:59+00:00

Description

/a/yuriw-2024-06-20_13:41:02-rados-wip-yuri11-testing-2024-06-19-1425-distro-default-smithi/7765270/

2024-06-21T04:00:25.104 INFO:tasks.workunit.client.0.smithi192.stdout:dencoder test for /home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/pg_stat_t/fbd7ad2b3ea90c7418f64f8762e4bf57
2024-06-21T04:00:25.104 INFO:tasks.workunit.client.0.smithi192.stdout:dencoder test for /home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/SequencerPosition/785cfb3496866c599b040977c79e27ec
2024-06-21T04:00:25.104 INFO:tasks.workunit.client.0.smithi192.stdout:dencoder test for /home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/ScrubMap::object/f81d88bd53fba42eef521c3ea5aa335d
2024-06-21T04:00:25.104 INFO:tasks.workunit.client.0.smithi192.stdout:dencoder test for /home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/ScrubMap::object/b5d2bff1d33b15ac9d748de4506d3663
2024-06-21T04:00:25.104 INFO:tasks.workunit.client.0.smithi192.stdout:dencoder test for /home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/ScrubMap::object/fe2c864935473ee22f7e3d9167711b81
2024-06-21T04:00:25.104 INFO:tasks.workunit.client.0.smithi192.stdout:Error encountered in subprocess. Command: ['ceph-dencoder', 'type', 'cls_rgw_reshard_get_ret', 'import', PosixPath('/home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/cls_rgw_reshard_get_ret/eef7aa6337f7cb0f82f62cc06807b169'), 'decode', 'dump_json']
2024-06-21T04:00:25.104 INFO:tasks.workunit.client.0.smithi192.stdout:Return code: 1 Command:['ceph-dencoder', 'type', 'cls_rgw_reshard_get_ret', 'import', PosixPath('/home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/cls_rgw_reshard_get_ret/eef7aa6337f7cb0f82f62cc06807b169'), 'decode', 'dump_json'] Output:
2024-06-21T04:00:25.104 INFO:tasks.workunit.client.0.smithi192.stdout:Error encountered in subprocess. Command: ['ceph-dencoder', 'type', 'cls_rgw_reshard_get_ret', 'import', PosixPath('/home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/cls_rgw_reshard_get_ret/eef7aa6337f7cb0f82f62cc06807b169'), 'decode', 'encode', 'decode', 'dump_json']
2024-06-21T04:00:25.104 INFO:tasks.workunit.client.0.smithi192.stdout:Return code: 1 Command:['ceph-dencoder', 'type', 'cls_rgw_reshard_get_ret', 'import', PosixPath('/home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/cls_rgw_reshard_get_ret/eef7aa6337f7cb0f82f62cc06807b169'), 'decode', 'encode', 'decode', 'dump_json'] Output:
2024-06-21T04:00:25.104 INFO:tasks.workunit.client.0.smithi192.stdout:dencoder test for /home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/ScrubMap::object/fec7d28512c0c03c6f0332cea66f3c04
2024-06-21T04:00:25.105 INFO:tasks.workunit.client.0.smithi192.stdout:dencoder test for /home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/ScrubMap::object/fa25a4e609ea04e8dadb9e20322ff36a
2024-06-21T04:00:25.105 INFO:tasks.workunit.client.0.smithi192.stdout:dencoder test for /home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/ScrubMap::object/fa52e8476e88ff741b644e63360aafa2
2024-06-21T04:00:25.105 INFO:tasks.workunit.client.0.smithi192.stdout:dencoder test for /home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/ScrubMap::object/f7a74186b5107c1af627f7a4b00f5771
2024-06-21T04:00:25.108 INFO:tasks.workunit.client.0.smithi192.stdout:dencoder test for /home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/ACLGrant/e2abd25aeb0558a9138ba7114a9ca0f4
2024-06-21T04:00:25.108 INFO:tasks.workunit.client.0.smithi192.stdout:dencoder test for /home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/18.2.0/objects/ghobject_t/ff558ab198526851482017414a73502a
2024-06-21T04:00:25.108 INFO:tasks.workunit.client.0.smithi192.stdout:FAILED 80/13421 tests.
2024-06-21T04:00:25.109 DEBUG:teuthology.orchestra.run:got remote process result: 1
2024-06-21T04:00:25.110 INFO:tasks.workunit:Stopping ['dencoder/test-dencoder.sh'] on client.0...
2024-06-21T04:00:25.110 DEBUG:teuthology.orchestra.run.smithi192:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2024-06-21T04:00:25.419 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/run_tasks.py", line 105, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/run_tasks.py", line 83, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_97d1f68a77dd6bf1e17c01ce278ad49b6eb45aa4/qa/tasks/workunit.py", line 126, in task
    with parallel() as p:
  File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/parallel.py", line 84, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/parallel.py", line 98, in __next__
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/parallel.py", line 30, in resurrect_traceback
    raise exc.exc_info[1]
  File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/parallel.py", line 23, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_97d1f68a77dd6bf1e17c01ce278ad49b6eb45aa4/qa/tasks/workunit.py", line 434, in _run_tests
    remote.run(
  File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/orchestra/remote.py", line 523, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/orchestra/run.py", line 455, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_teuthology_8e9714173de9e92c97e8ef1045d333e96b793454/teuthology/orchestra/run.py", line 181, in _raise_for_status
    raise CommandFailedError(
teuthology.exceptions.CommandFailedError: Command failed (workunit test dencoder/test-dencoder.sh) on smithi192 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=97d1f68a77dd6bf1e17c01ce278ad49b6eb45aa4 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/dencoder/test-dencoder.sh'
2024-06-21T04:00:25.603 ERROR:teuthology.util.sentry: Sentry event: https://sentry.ceph.com/organizations/ceph/?query=2903b18d045c44a7b8bab3316f8512a1
2024-06-21T04:00:25.605 DEBUG:teuthology.run_tasks:Unwinding manager cephadm
2024-06-21T04:00:25.616 INFO:tasks.cephadm:Teardown begin
2024-06-21T04:00:25.616 DEBUG:teuthology.orchestra.run.smithi192:> sudo rm -f /etc/ceph/ce

Related issues 3 (1 open, 2 closed)

Related to RADOS - Bug #66918: dencoder/test-dencoder.sh: dencoder tests fail when tested against quincy (Duplicate, Nitzan Mordechai)

Related to RADOS - Bug #69009: dencoder/test-dencoder.sh: Error encountered with cls_rgw_reshard_get_ret (Pending Backport, Nitzan Mordechai)

Copied to RADOS - Backport #67234: squid: qa/workunits/dencoder/test-dencoder.sh: Error encountered in subprocess. Command: ['ceph-dencoder', 'type', 'cls_rgw_reshard_get_ret' (Resolved, Nitzan Mordechai)

Actions #1

Updated by Nitzan Mordechai over 1 year ago

  • Assignee set to Nitzan Mordechai
Actions #2

Updated by Kamoltat (Junior) Sirivadhna over 1 year ago

  • Tags set to main-failures
Actions #3

Updated by Radoslaw Zarzynski over 1 year ago

  • Status changed from New to In Progress
Actions #4

Updated by Nitzan Mordechai over 1 year ago

This is not a test failure; it is a real bug.

  void decode(ceph::buffer::list::const_iterator& bl) {
    DECODE_START(2, bl);
    decode(time, bl);
    decode(tenant, bl);
    decode(bucket_name, bl);
    decode(bucket_id, bl);
    if (struct_v < 2) {
      std::string new_instance_id; // removed in v2
      decode(new_instance_id, bl);
    }
    decode(old_num_shards, bl);
    decode(new_num_shards, bl);
    DECODE_FINISH(bl);
  }

[root@bca3fe3043c2 /]# /usr/bin/ceph-dencoder type cls_rgw_reshard_add_op import ~/ceph-object-corpus/archive/18.2.0/objects/cls_rgw_reshard_add_op/eef7aa6337f7cb0f82f62cc06807b169 hexdump  
00000000  01 01 22 00 00 00 02 01  1c 00 00 00 00 00 00 00  |..".............|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  00 00 00 00 00 00 00 00                           |........|
00000028

still checking
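
As a rough aid for reading the hexdump above, here is a minimal Python sketch that splits the two leading headers. It assumes the usual ENCODE_START framing of one byte struct_v, one byte struct_compat and a 32-bit little-endian payload length, and it assumes the outer header belongs to cls_rgw_reshard_add_op and the inner one to the cls_rgw_reshard_entry it wraps. The helper name `read_header` is made up for illustration.

```python
import struct

# Bytes copied from the hexdump above (0x28 = 40 bytes in total).
raw = bytes.fromhex("01 01 22 00 00 00 02 01 1c 00 00 00" + " 00" * 28)

def read_header(buf, offset):
    """Return (struct_v, struct_compat, payload_len, payload_offset).

    Assumed framing: u8 version, u8 compat, u32 little-endian length.
    """
    struct_v, struct_compat, length = struct.unpack_from("<BBI", buf, offset)
    return struct_v, struct_compat, length, offset + 6

outer = read_header(raw, 0)          # assumed: cls_rgw_reshard_add_op
inner = read_header(raw, outer[3])   # assumed: nested cls_rgw_reshard_entry

print("outer header:", outer)
print("inner header:", inner)
print("bytes left for the inner payload:", len(raw) - inner[3])
```

Under these assumptions the inner payload is exactly as long as the v2 field list (an 8-byte time, three empty length-prefixed strings and two 32-bit shard counts), leaving no room for a decoder that still expects new_instance_id.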

Actions #5

Updated by Nitzan Mordechai over 1 year ago

The cls_rgw_reshard_entry encode/decode was updated during v18: we removed new_instance_id, but the compat version was left unchanged:

  void encode(ceph::buffer::list& bl) const {
    ENCODE_START(2, 1, bl);
    encode(time, bl);
    encode(tenant, bl);
    encode(bucket_name, bl);
    encode(bucket_id, bl);
    encode(old_num_shards, bl);
    encode(new_num_shards, bl);
    ENCODE_FINISH(bl);
  }

  void decode(ceph::buffer::list::const_iterator& bl) {
    DECODE_START(2, bl);
    decode(time, bl);
    decode(tenant, bl);
    decode(bucket_name, bl);
    decode(bucket_id, bl);
    if (struct_v < 2) {
      std::string new_instance_id; // removed in v2
      decode(new_instance_id, bl);
    }
    decode(old_num_shards, bl);
    decode(new_num_shards, bl);
    DECODE_FINISH(bl);
  }

However, the quincy encode/decode was not updated to match, so when quincy tries to decode a cls_rgw_reshard_entry encoded by v18 it reads past the end of the buffer and fails.
@Radoslaw Zarzynski any thoughts?
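
The failure mode described above can be reproduced with a small Python model. This is only a sketch under simplifying assumptions: strings are length-prefixed with a 32-bit little-endian count, time is modelled as 8 zero bytes, and the versioned header is left out. The names `Reader`, `EndOfBuffer`, `encode_v2` and `decode` are hypothetical and not part of Ceph.

```python
import struct

class EndOfBuffer(Exception):
    """Raised when a decoder asks for more bytes than the payload holds."""

class Reader:
    def __init__(self, buf):
        self.buf, self.pos = buf, 0

    def take(self, n):
        if self.pos + n > len(self.buf):
            raise EndOfBuffer(f"need {n} bytes at offset {self.pos}")
        chunk = self.buf[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def u32(self):
        return struct.unpack("<I", self.take(4))[0]

    def string(self):
        return self.take(self.u32()).decode()

def pack_string(text):
    data = text.encode()
    return struct.pack("<I", len(data)) + data

def encode_v2(tenant, bucket_name, bucket_id, old_shards, new_shards):
    # v2 layout: new_instance_id is no longer written.
    return (bytes(8)  # time, modelled as 8 zero bytes
            + pack_string(tenant) + pack_string(bucket_name)
            + pack_string(bucket_id)
            + struct.pack("<II", old_shards, new_shards))

def decode(payload, struct_v, decoder_knows_v2):
    r = Reader(payload)
    r.take(8)  # time
    out = {"tenant": r.string(), "bucket_name": r.string(),
           "bucket_id": r.string()}
    # A v2-aware decoder skips new_instance_id when struct_v >= 2;
    # an older decoder always expects it.
    if not decoder_knows_v2 or struct_v < 2:
        out["new_instance_id"] = r.string()
    out["old_num_shards"] = r.u32()
    out["new_num_shards"] = r.u32()
    return out

payload = encode_v2("", "bucket", "id-1", 11, 23)
print(decode(payload, 2, decoder_knows_v2=True))
try:
    decode(payload, 2, decoder_knows_v2=False)
except EndOfBuffer as exc:
    print("old decoder failed:", exc)
```

In this model the old decoder misreads old_num_shards as the length of new_instance_id and runs off the end of the payload. Because the encoder still advertises compat 1 via ENCODE_START(2, 1, bl), an old decoder appears to have no reason to reject the encoding before it starts reading fields.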

Actions #6

Updated by Laura Flores over 1 year ago

Note from bug scrub: Radek will respond.

Actions #7

Updated by Radoslaw Zarzynski over 1 year ago

First thought: have you considered an encoder that would check the feature bits of a decoder to generate fitting bytestreams?
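
One way to picture the suggestion above is an encoder that picks its layout from the peer's feature bits. The Python sketch below is purely illustrative: `FEATURE_RESHARD_ENTRY_V2` is a hypothetical flag, not a real Ceph feature bit, and the header and field layout follow the same simplified assumptions as the C++ snippets in this thread (u8 version, u8 compat, u32 length, length-prefixed strings).

```python
import struct

# Hypothetical feature bit; not a real Ceph feature flag.
FEATURE_RESHARD_ENTRY_V2 = 1 << 0

def pack_string(text):
    data = text.encode()
    return struct.pack("<I", len(data)) + data

def encode_entry(entry, peer_features):
    """Emit the v2 layout only for peers that advertise support for it."""
    use_v2 = bool(peer_features & FEATURE_RESHARD_ENTRY_V2)
    body = struct.pack("<II", *entry["time"])  # seconds, nanoseconds
    body += pack_string(entry["tenant"])
    body += pack_string(entry["bucket_name"])
    body += pack_string(entry["bucket_id"])
    if not use_v2:
        # Older peers still expect new_instance_id; send an empty placeholder.
        body += pack_string("")
    body += struct.pack("<II", entry["old_num_shards"], entry["new_num_shards"])
    struct_v = 2 if use_v2 else 1
    return struct.pack("<BBI", struct_v, 1, len(body)) + body

entry = {"time": (0, 0), "tenant": "", "bucket_name": "", "bucket_id": "",
         "old_num_shards": 0, "new_num_shards": 0}
print(len(encode_entry(entry, FEATURE_RESHARD_ENTRY_V2)))
print(len(encode_entry(entry, 0)))
```

The trade-off is that the encoder has to know the peer's features at encode time, which may not hold for data that is written once and read later by whichever version happens to be running.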

Actions #8

Updated by Nitzan Mordechai over 1 year ago

Radoslaw Zarzynski wrote in #note-7:

First thought: have you considered an encoder that would check the feature bits of a decoder to generate fitting bytestreams?

I haven't suggested a fix yet; I just wanted your thoughts on whether this is a real bug that may affect users immediately.

Actions #9

Updated by Nitzan Mordechai over 1 year ago

@Casey Bodley do you mind taking a look?
Our test caught this bug in the encode/decode test: a cls_rgw_reshard_entry encoded by 18.2 was decoded using quincy.
One of the rgw PRs removed a member from the middle of the encoded structure, but the older version can't handle that and hits an end-of-buffer error.

btw - another commit bumped the encode version but left the decode version as is (https://github.com/ceph/ceph/commit/9302fbb3f5416871c1978af5d45f3bf568c2c190), but that is a separate, unrelated issue.

Actions #10

Updated by Casey Bodley over 1 year ago

thanks Nitzan,

this is a real bug, but i expect its effect to be minor and short-lived until upgrades complete. can we just whitelist this in ceph-object-corpus? would something like https://github.com/ceph/ceph-object-corpus/pull/19 work?

Actions #11

Updated by Casey Bodley over 1 year ago

Nitzan Mordechai wrote in #note-9:

btw - another commit bumped the encode version but left the decode version as is (https://github.com/ceph/ceph/commit/9302fbb3f5416871c1978af5d45f3bf568c2c190), but that is a separate, unrelated issue.

thanks for the heads-up, i opened https://github.com/ceph/ceph/pull/58399 to fix that part

Actions #12

Updated by Nitzan Mordechai over 1 year ago

@Casey Bodley thanks a lot for the quick response! I thought it would be a bigger issue if the client has different OSD versions in the same cluster, but if you are ok with that, I'm ok with it as well.

Thanks for the fix, it will work for the encode/decode tests!

Actions #13

Updated by Nitzan Mordechai over 1 year ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 58404
Actions #14

Updated by Nitzan Mordechai over 1 year ago

  • Backport set to squid
Actions #15

Updated by Radoslaw Zarzynski over 1 year ago

scrub note: reviewed, awaits QA.

Actions #16

Updated by Nitzan Mordechai over 1 year ago

  • Related to Bug #66918: dencoder/test-dencoder.sh: dencoder tests fail when tested against quincy added
Actions #17

Updated by Radoslaw Zarzynski over 1 year ago

scrub note: awaits QA.

Actions #18

Updated by Laura Flores over 1 year ago

/a/yuriw-2024-07-17_13:32:02-rados-wip-yuri12-testing-2024-07-16-1122-distro-default-smithi/7805530

Actions #19

Updated by Radoslaw Zarzynski over 1 year ago

scrub note: still in QA.

Actions #20

Updated by Aishwarya Mathuria over 1 year ago

/a/yuriw-2024-07-16_01:05:51-rados-wip-yuri6-testing-2024-07-15-1335-distro-default-smithi/7803124/

Actions #21

Updated by Aishwarya Mathuria over 1 year ago

/a/yuriw-2024-07-17_13:35:08-rados-wip-yuri10-testing-2024-07-15-1330-distro-default-smithi/7805755/

Actions #22

Updated by Radoslaw Zarzynski over 1 year ago

  • Status changed from Fix Under Review to Pending Backport

Merged!

Actions #23

Updated by Upkeep Bot over 1 year ago

  • Copied to Backport #67234: squid: qa/workunits/dencoder/test-dencoder.sh: Error encountered in subprocess. Command: ['ceph-dencoder', 'type', 'cls_rgw_reshard_get_ret' added
Actions #24

Updated by Upkeep Bot over 1 year ago

  • Tags (freeform) set to backport_processed
Actions #25

Updated by Laura Flores over 1 year ago

/a/yuriw-2024-07-23_19:38:12-rados-wip-yuri5-testing-2024-07-23-0804-distro-default-smithi/7814448

Actions #26

Updated by Laura Flores over 1 year ago

/a/yuriw-2024-10-15_14:06:51-rados-wip-yuri8-testing-2024-10-14-1103-distro-default-smithi/7948102

Actions #27

Updated by Aishwarya Mathuria over 1 year ago

/a/yuriw-2024-10-13_19:06:13-rados-wip-yuri4-testing-2024-10-13-0836-distro-default-smithi/7944843

Actions #28

Updated by Laura Flores over 1 year ago

/a/yuriw-2024-10-23_23:17:32-rados-wip-yuri13-testing-2024-10-23-0743-distro-default-smithi/7963675

Actions #29

Updated by Laura Flores over 1 year ago

  • Related to Bug #69009: dencoder/test-dencoder.sh: Error encountered with cls_rgw_reshard_get_ret added
Actions #30

Updated by Upkeep Bot 9 months ago

  • Status changed from Pending Backport to Resolved
  • Upkeep Timestamp set to 2025-07-08T18:35:39+00:00
Actions #31

Updated by Upkeep Bot 8 months ago

  • Merge Commit set to d09655fab6373ded077555f2a0d45746bc387dd9
  • Fixed In set to v19.3.0-3814-gd09655fab6
  • Upkeep Timestamp changed from 2025-07-08T18:35:39+00:00 to 2025-08-02T04:50:40+00:00
Actions #32

Updated by Upkeep Bot 5 months ago

  • Released In set to v20.2.0~2360
  • Upkeep Timestamp changed from 2025-08-02T04:50:40+00:00 to 2025-11-01T01:35:59+00:00