Bug #64832

open

valgrind UninitCondition in RGWSelectObj_ObjStore_S3::run_s3select_on_csv

Added by Casey Bodley about 2 years ago. Updated 4 months ago.

Status: Pending Backport
Priority: Urgent
Assignee:
Target version: -
% Done: 0%
Source:
Backport: squid tentacle
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Tags (freeform): backport_processed
Fixed In: v20.3.0-4084-ga46507ac4b
Released In:
Upkeep Timestamp: 2025-11-11T08:58:37+00:00

Description

from rgw/verify job: http://qa-proxy.ceph.com/teuthology/cbodley-2024-03-10_14:49:11-rgw-wip-cbodley-testing-distro-default-smithi/7589454/teuthology.log
valgrind log: http://qa-proxy.ceph.com/teuthology/cbodley-2024-03-10_14:49:11-rgw-wip-cbodley-testing-distro-default-smithi/7589454/remote/smithi154/log/valgrind/ceph.client.0.log.gz

<error>
  <unique>0xacb11</unique>
  <tid>298</tid>
  <threadname>io_context_pool</threadname>
  <kind>UninitCondition</kind>
  <what>Conditional jump or move depends on uninitialised value(s)</what>
  <stack>
    <frame>
      <ip>0x8F4C10</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>RGWSelectObj_ObjStore_S3::run_s3select_on_csv(char const*, char const*, unsigned long)</fn>
    </frame>
    <frame>
      <ip>0x8F76E1</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>RGWSelectObj_ObjStore_S3::csv_processing(ceph::buffer::v15_2_0::list&amp;, long, long)</fn>
    </frame>
    <frame>
      <ip>0x7B3BD6</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>RGWGetObj_Decompress::handle_data(ceph::buffer::v15_2_0::list&amp;, long, long)</fn>
    </frame>
    <frame>
      <ip>0x9D47E7</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>get_obj_data::flush(rgw::OwningList&lt;rgw::AioResultEntry&gt;&amp;&amp;)</fn>
    </frame>
    <frame>
      <ip>0x9E552A</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>RGWRados::Object::Read::iterate(DoutPrefixProvider const*, long, long, RGWGetDataCB*, optional_yield)</fn>
    </frame>
    <frame>
      <ip>0x7FD455</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>RGWGetObj::execute(optional_yield)</fn>
    </frame>
    <frame>
      <ip>0x8F9E20</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>RGWSelectObj_ObjStore_S3::execute(optional_yield)</fn>
    </frame>
    <frame>
      <ip>0x6BC4CB</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>rgw_process_authenticated(RGWHandler_REST*, RGWOp*&amp;, RGWRequest*, req_state*, optional_yield, rgw::sal::Driver*, bool)</fn>
    </frame>
    <frame>
      <ip>0x6C054D</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>process_request(RGWProcessEnv const&amp;, RGWRequest*, std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const&amp;, RGWRestfulIO*, optional_yield, rgw::dmclock::Scheduler*, std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt;*, std::chrono::duration&lt;unsigned long, std::ratio&lt;1l, 1000000000l&gt; &gt;*, int*)</fn>
    </frame>
    <frame>
      <ip>0x5E60F0</ip>
      <obj>/usr/bin/radosgw</obj>
    </frame>
    <frame>
      <ip>0x62757F</ip>
      <obj>/usr/bin/radosgw</obj>
    </frame>
    <frame>
      <ip>0x12591A6</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>make_fcontext</fn>
    </frame>
  </stack>
</error>
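
For anyone comparing these reports across runs, here is a minimal sketch of pulling the error kind and the symbolized frames out of a valgrind XML `<error>` element like the one above. It uses only Python's standard `xml.etree.ElementTree`. The `summarize_error` helper and the `SAMPLE` string are made up for illustration (the sample is an abbreviated copy of three frames from this report), not part of any Ceph or teuthology tooling.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the <error> element above (three of its frames).
SAMPLE = """\
<error>
  <unique>0xacb11</unique>
  <tid>298</tid>
  <kind>UninitCondition</kind>
  <what>Conditional jump or move depends on uninitialised value(s)</what>
  <stack>
    <frame>
      <ip>0x8F4C10</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>RGWSelectObj_ObjStore_S3::run_s3select_on_csv(char const*, char const*, unsigned long)</fn>
    </frame>
    <frame>
      <ip>0x8F76E1</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>RGWSelectObj_ObjStore_S3::csv_processing(ceph::buffer::v15_2_0::list&amp;, long, long)</fn>
    </frame>
    <frame>
      <ip>0x5E60F0</ip>
      <obj>/usr/bin/radosgw</obj>
    </frame>
  </stack>
</error>
"""


def summarize_error(error):
    """Return (kind, [frame labels]) for one valgrind <error> element (hypothetical helper)."""
    kind = error.findtext("kind")
    # Frames without symbols have no <fn> child; fall back to the address.
    frames = [f.findtext("fn") or f.findtext("ip") for f in error.iter("frame")]
    return kind, frames


kind, frames = summarize_error(ET.fromstring(SAMPLE))
print(kind)
for label in frames:
    print("  " + label)
```

For a full log, the same helper could be applied to each `error` element found under the document root, assuming the file has already been decompressed.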

Files


Related issues 2 (2 open, 0 closed)

Copied to rgw - Backport #73786: squid: valgrind UninitCondition in RGWSelectObj_ObjStore_S3::run_s3select_on_csv (New, Gal Salomon)
Copied to rgw - Backport #73787: tentacle: valgrind UninitCondition in RGWSelectObj_ObjStore_S3::run_s3select_on_csv (New, Gal Salomon)
Actions #1

Updated by Casey Bodley almost 2 years ago

  • Status changed from New to Can't reproduce
Actions #2

Updated by Casey Bodley about 1 year ago

  • Status changed from Can't reproduce to New

from https://qa-proxy.ceph.com/teuthology/cbodley-2024-12-13_17:43:49-rgw-wip-65369-distro-default-smithi/8035717/teuthology.log on main

valgrind error: UninitCondition

RGWSelectObj_ObjStore_S3::run_s3select_on_csv(char const*, char const*, unsigned long)
RGWSelectObj_ObjStore_S3::csv_processing(ceph::buffer::v15_2_0::list&, long, long)
RGWGetObj_Decompress::handle_data(ceph::buffer::v15_2_0::list&, long, long)

Actions #3

Updated by J. Eric Ivancich about 1 year ago

Just wanted to make sure you are aware of the status change of this one, Gal.

Actions #4

Updated by J. Eric Ivancich about 1 year ago

pinged Gal

Actions #6

Updated by Gal Salomon about 1 year ago · Edited

J. Eric Ivancich wrote in #note-5:

ping @Gal Salomon

Hi @J. Eric Ivancich
I tried to reproduce that, but I'm not getting that stack.

Do you have an updated valgrind log?

Actions #7

Updated by Casey Bodley about 1 year ago

Gal Salomon wrote in #note-6:

J. Eric Ivancich wrote in #note-5:

ping @Gal Salomon

Hi @J. Eric Ivancich
I tried to reproduce that, but I'm not getting that stack.

Do you have an updated valgrind log?

https://qa-proxy.ceph.com/teuthology/cbodley-2024-12-13_17:43:49-rgw-wip-65369-distro-default-smithi/8035717/remote/smithi082/log/valgrind/ceph.client.0.log.gz

Actions #9

Updated by J. Eric Ivancich about 1 year ago

ping @Gal Salomon Just want to make sure you saw Casey's new comment above.

Actions #10

Updated by Gal Salomon about 1 year ago

@J. Eric Ivancich
Yeah, I saw that (it's gibberish).

From teuthology.log I can see it's the same stack as before.

Thanks.

Actions #11

Updated by Casey Bodley about 1 year ago

just saw this in 2 jobs of https://pulpito.ceph.com/cbodley-2025-02-28_00:57:03-rgw-wip-70013-distro-default-smithi/

Gal Salomon wrote in #note-10:

Yeah, I saw that (it's gibberish).

the web server is misconfigured somehow, so most logs get double-gzipped
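
In case it helps with reading those logs, a minimal sketch of unwrapping a double-gzipped download with Python's standard `gzip` module. The `gunzip_all` helper and its `max_layers` cap are made up for illustration; it keeps stripping layers while the payload still starts with the gzip magic bytes, so it assumes the real log content does not itself begin with `\x1f\x8b`.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"


def gunzip_all(data, max_layers=4):
    """Strip gzip layers until the payload no longer starts with the gzip magic."""
    for _ in range(max_layers):
        if not data.startswith(GZIP_MAGIC):
            break
        data = gzip.decompress(data)
    return data


# Self-check with a synthetic double-gzipped payload.
original = b"<valgrindoutput></valgrindoutput>\n"
twice = gzip.compress(gzip.compress(original))
assert gunzip_all(twice) == original
```

For a downloaded file, read it in binary mode (for example `open(path, "rb").read()`) and pass the bytes to `gunzip_all`.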

Actions #12

Updated by Gal Salomon 12 months ago

I did the following:
-- the conditional jump happens in the decompress flow
-- I booted RGW in compression mode (4 different options: zstd, snappy, lz4, zlib)
-- I uploaded big objects, small objects, and a zero-size object
-- I ran s3select expressions on these objects
-- so far, nothing close to that has reproduced

It may be related to the runtime environment or the build environment(?).

Investigating further.

Actions #13

Updated by Gal Salomon 11 months ago

In order to investigate the valgrind detection, code that simulates an uninitialized value was added to rgw/s3select.
It is not clear why it was not detected in the teuthology run (it is detected in a local run).

https://github.com/ceph/ceph/pull/62689/files

https://pulpito.ceph.com/gsalomon-2025-04-06_12:46:48-rgw:verify-valgrind_issue_investigation-distro-default-smithi/

Checking the issue.

Actions #14

Updated by J. Eric Ivancich 10 months ago

  • Backport changed from squid to squid tentacle
Actions #15

Updated by Casey Bodley 10 months ago

again in https://qa-proxy.ceph.com/teuthology/cbodley-2025-05-19_22:28:25-rgw-wip-cbodley-testing-distro-default-smithi/8289665/teuthology.log

valgrind error: UninitCondition
RGWSelectObj_ObjStore_S3::run_s3select_on_csv(char const*, char const*, unsigned long)
RGWSelectObj_ObjStore_S3::csv_processing(ceph::buffer::v15_2_0::list&, long, long)
RGWGetObj_Decompress::handle_data(ceph::buffer::v15_2_0::list&, long, long)

Actions #17

Updated by Casey Bodley 8 months ago

also on tentacle: https://qa-proxy.ceph.com/teuthology/benhanokh-2025-07-01_16:00:03-rgw-dedup_backport-distro-default-smithi/8363950/teuthology.log

valgrind exception message: valgrind error: UninitCondition
RGWSelectObj_ObjStore_S3::run_s3select_on_csv(char const*, char const*, unsigned long)
RGWSelectObj_ObjStore_S3::csv_processing(ceph::buffer::v15_2_0::list&, long, long)
RGWGetObj_Decompress::handle_data(ceph::buffer::v15_2_0::list&, long, long)
Actions #18

Updated by J. Eric Ivancich 8 months ago

Just hit this again:

failure_reason: 'valgrind error: UninitCondition

  RGWSelectObj_ObjStore_S3::run_s3select_on_csv(char const*, char const*, unsigned
  long)

  RGWSelectObj_ObjStore_S3::csv_processing(ceph::buffer::v15_2_0::list&, long, long)

  RGWGetObj_Decompress::handle_data(ceph::buffer::v15_2_0::list&, long, long)'

found here: https://qa-proxy.ceph.com/teuthology/ivancich-2025-07-21_21:58:43-rgw-wip-eric-testing-1-distro-default-smithi/8401069/teuthology.log

Actions #19

Updated by J. Eric Ivancich 8 months ago

In an email exchange with @Gal Salomon he wrote:

This issue is "spinning": it does not appear consistently, which makes it difficult to fix.
(It appeared in some of my other PRs.)

Even when I tried to make it appear (I created an UninitCondition),
it was not detected.

It may be related to compiler flag options, so I will try to reproduce that.

Actions #20

Updated by J. Eric Ivancich 5 months ago

Hey @Gal Salomon , are there any updates on this?

Actions #21

Updated by Gal Salomon 5 months ago

J. Eric Ivancich wrote in #note-20:

Hey @Gal Salomon , are there any updates on this?

Hi Eric,
Yes, I have updates. I created CentOS and Ubuntu versions and ran s3-tests on both distros.
Valgrind did not produce the same report.
I'm not sure why this issue does not reproduce; maybe a runtime factor is missing, such as additional tests (which exist in teuthology but not in my tests).

I will change / add tests.

Actions #22

Updated by Gal Salomon 5 months ago

The attached valgrind log was produced by running valgrind on RGW while it ran the S3-tests (CentOS Stream). The log contains only "Conditional jump .." reports; it catches errors in other stacks, not the s3select stack.

Actions #23

Updated by Gal Salomon 5 months ago

Gal Salomon wrote in #note-22:

The attached valgrind log was produced by running valgrind on RGW while it ran the S3-tests (CentOS Stream). The log contains only "Conditional jump .." reports; it catches errors in other stacks, not the s3select stack.

Attaching an additional valgrind report, produced on Ubuntu.

Actions #27

Updated by Gal Salomon 4 months ago

@J. Eric Ivancich
The recent valgrind report (https://qa-proxy.ceph.com/teuthology/teuthology-2025-10-10_20:40:24-rgw-main-distro-default-smithi/8545757/remote/smithi016/log/valgrind/)
contains a lot of new reports, such as InvalidRead, and different s3select call stacks.

One example is the push_*::builder functions, which are called while parsing the SQL statement, not while processing the input.

The engine code has not changed for some time, and neither has the test code.
What is triggering these new reports? Is there a change in the runtime environment?

Actions #28

Updated by Gal Salomon 4 months ago

  • Pull request ID set to 66067
Actions #29

Updated by Casey Bodley 4 months ago

  • Status changed from New to Fix Under Review
Actions #30

Updated by Upkeep Bot 4 months ago

  • Status changed from Fix Under Review to Pending Backport
  • Merge Commit set to a46507ac4b9fcfc5cee038da0129470fee0bd545
  • Fixed In set to v20.3.0-4084-ga46507ac4b
  • Upkeep Timestamp set to 2025-11-11T08:58:37+00:00
Actions #31

Updated by Upkeep Bot 4 months ago

  • Copied to Backport #73786: squid: valgrind UninitCondition in RGWSelectObj_ObjStore_S3::run_s3select_on_csv added
Actions #32

Updated by Upkeep Bot 4 months ago

  • Copied to Backport #73787: tentacle: valgrind UninitCondition in RGWSelectObj_ObjStore_S3::run_s3select_on_csv added
Actions #33

Updated by Upkeep Bot 4 months ago

  • Tags (freeform) set to backport_processed