
rgw/s3select: limit memory usage on Parquet flow#59465

Merged
galsalomon66 merged 2 commits into ceph:main from galsalomon66:limit_mem_usage_on_parquet_flow
Oct 28, 2024

Conversation

@galsalomon66
Contributor

@galsalomon66 galsalomon66 commented Aug 27, 2024

a large-size Parquet object with few row-groups increases memory consumption.
the code change sets a fixed size for the reader buffer.

ceph/s3select#163
https://bugzilla.redhat.com/show_bug.cgi?id=2252403

attached are heaptrack results for processing a parquet SQL query (select *) with different parquet-reader buffer sizes.
the parquet object contains 2 row-groups, 1GB each.
no significant differences in latency were observed across the different parquet-reader buffer sizes.

heaptrack stack analyze -- parquet reader set to a small buffer (10MB)
Screenshot from 2024-09-04 11-52-36

heaptrack stack analyze -- parquet reader set to a big buffer (100MB)
Screenshot from 2024-09-04 11-57-02

another way to inspect this issue is to review the metrics that are part of the response.
the scanned-bytes metric shows how many bytes were fetched by the system;
in the case of a big row-group, the parquet-reader buffer size has an impact on that metric.

<Stats>
<BytesScanned>1624670</BytesScanned>
<BytesProcessed>1624670</BytesProcessed>
<BytesReturned>2</BytesReturned>
</Stats>
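The effect of the fixed-size buffer can be made concrete with a small sketch. This is not the actual RGW/s3select code; `plan_range_requests` and `kReaderBufferSize` are hypothetical names, shown only to illustrate the tradeoff: a capped buffer turns one large row-group read into a sequence of bounded range requests, so peak memory stays at the cap rather than growing with the row-group size.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative cap on the parquet reader buffer (assumed value, e.g. 10MB).
constexpr uint64_t kReaderBufferSize = 10ULL * 1024 * 1024;

// Returns the (offset, length) ranges a capped reader would fetch for one
// row-group: each request is at most buffer_size bytes.
std::vector<std::pair<uint64_t, uint64_t>>
plan_range_requests(uint64_t row_group_offset, uint64_t row_group_size,
                    uint64_t buffer_size = kReaderBufferSize) {
  std::vector<std::pair<uint64_t, uint64_t>> ranges;
  for (uint64_t done = 0; done < row_group_size; ) {
    const uint64_t len = std::min(buffer_size, row_group_size - done);
    ranges.emplace_back(row_group_offset + done, len);
    done += len;
  }
  return ranges;
}
```

With a 1GB row-group and a 10MB buffer this plans 103 bounded requests, the last one covering the 4MB remainder; with an unbounded buffer it would be a single 1GB allocation.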

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

@galsalomon66 galsalomon66 requested a review from a team as a code owner August 27, 2024 16:04
@galsalomon66 galsalomon66 changed the title rgw/s3select: limit memory usage on Parquet floe rgw/s3select: limit memory usage on Parquet flow Aug 27, 2024
@yuvalif
Contributor

yuvalif commented Aug 28, 2024

the list of modified files in the submodule does not match the files changed in the 2 relevant PRs on s3select?

@galsalomon66
Contributor Author

The list of modified files in the submodule does not match the files changed in the 2 relevant PRs on s3select?

yes,
there is a gap between upstream/s3select-last-merged-PR and current s3select-master.

the relevant PR for this issue is ceph/s3select#163

@galsalomon66 galsalomon66 force-pushed the limit_mem_usage_on_parquet_flow branch from 3082a7e to 810eb5a Compare September 2, 2024 17:22
@galsalomon66 galsalomon66 force-pushed the limit_mem_usage_on_parquet_flow branch 2 times, most recently from 30dd945 to 80238cc Compare September 3, 2024 13:52
Comment thread src/common/options/rgw.yaml.in Outdated
type: size
level: advanced
desc: the Maximum parquet buffer size, a limit to memory consumption for parquet reading operations.
default: 1_G
Contributor

this seems way too high for the default. rgw is willing to process rgw_max_concurrent_requests=1024 of these requests at once. normal ops like Put/GetObject only buffer up to ~16MB each

is this much buffering really necessary to get reasonable performance?

Contributor Author

this seems way too high for the default. rgw is willing to process rgw_max_concurrent_requests=1024 of these requests at once. normal ops like Put/GetObject only buffer up to ~16MB each

is this much buffering really necessary to get reasonable performance?

examining this issue.
as mentioned, the tradeoff is the number of calls to s3-storage (internal calls, s3select-engine -> RGW::range_request).

from a few tests i did there is a difference in the number of calls (obviously); i'm not sure about the total latencies.

taking into account the parallel calls, maybe it is worth considering a "more dynamic" system,
i.e. as the number of s3select requests increases, the max buffer size goes down.
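The "more dynamic" idea could look roughly like the following sketch. All names and values here (`kGlobalBudget`, the min/max clamps, `buffer_size_for`) are illustrative assumptions, not existing RGW options: the per-request buffer is a share of a global budget that shrinks as s3select concurrency grows.

```cpp
#include <algorithm>
#include <cstdint>

// Assumed global memory budget shared by all concurrent parquet readers.
constexpr uint64_t kGlobalBudget  = 1ULL << 30;   // 1GB total
constexpr uint64_t kMinBufferSize = 4ULL << 20;   // floor: 4MB per reader
constexpr uint64_t kMaxBufferSize = 128ULL << 20; // ceiling: 128MB per reader

// Per-request buffer size: an equal share of the budget, clamped so a
// single request cannot grab everything and a busy gateway still makes
// progress with a small-but-usable buffer.
uint64_t buffer_size_for(uint64_t active_s3select_requests) {
  const uint64_t n = std::max<uint64_t>(1, active_s3select_requests);
  return std::clamp(kGlobalBudget / n, kMinBufferSize, kMaxBufferSize);
}
```

Under these assumed values, one active request gets the 128MB ceiling, 16 requests get 64MB each, and 1024 requests fall to the 4MB floor, keeping total buffering bounded either way.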

Contributor

as mentioned, the tradeoff is the number of calls to s3-storage (internal calls, s3select-engine -> RGW::range_request).

that calls into RGWGetObj::execute() which is streaming 4MB rados objects to satisfy the requested range. requesting larger ranges won't reduce the number of osd requests
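The reviewer's point can be sketched numerically: since RGW streams fixed-size (4MB) rados objects to satisfy a range, the number of backend objects touched depends on the offsets and total bytes read, not on how those bytes are grouped into range requests. The helper below is an illustration under that assumption, not RGW code.

```cpp
#include <cstdint>

// Assumed rados stripe/object size used by RGW object layout (4MB).
constexpr uint64_t kStripeSize = 4ULL << 20;

// Number of 4MB rados objects a byte range [ofs, ofs+len) overlaps.
// Splitting the same range into several smaller requests cannot reduce
// this count; it can only re-read boundary objects.
uint64_t rados_objects_touched(uint64_t ofs, uint64_t len) {
  if (len == 0) return 0;
  const uint64_t first = ofs / kStripeSize;
  const uint64_t last  = (ofs + len - 1) / kStripeSize;
  return last - first + 1;
}
```

For example, reading 100MB from offset 0 touches 25 rados objects whether it is fetched as one 100MB range or ten 10MB ranges.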

@galsalomon66
Contributor Author

galsalomon66 commented Sep 4, 2024

@galsalomon66
Contributor Author

galsalomon66 commented Sep 5, 2024

(the test is on the commit that moves the parquet-reader-setup call location)

https://pulpito.ceph.com/gsalomon-2024-09-04_18:07:53-rgw:verify-limit_mem_usage_on_parquet_flow-distro-default-smithi/
all green

Comment thread src/rgw/rgw_s3select.cc
Comment thread src/rgw/rgw_s3select.cc Outdated
op_ret = -ERR_INVALID_REQUEST;
} else {
ldout(s->cct, 10) << "S3select: complete query with success " << dendl;
{//status per amount of processed data
Contributor

nit: please fix indentation

@galsalomon66 galsalomon66 force-pushed the limit_mem_usage_on_parquet_flow branch from db916da to 9442182 Compare September 8, 2024 13:26
@galsalomon66
Contributor Author

galsalomon66 commented Sep 9, 2024

https://pulpito.ceph.com/gsalomon-2024-09-09_06:57:48-rgw-limit_mem_usage_on_parquet_flow-distro-default-smithi/

one valgrind issue -- not related
https://qa-proxy.ceph.com/teuthology/gsalomon-2024-09-09_06:57:48-rgw-limit_mem_usage_on_parquet_flow-distro-default-smithi/7896212/remote/smithi104/log/valgrind/ceph.client.1.log.gz

<error>
  <unique>0xa3f1</unique>
  <tid>1</tid>
  <kind>Leak_DefinitelyLost</kind>
  <xwhat>
    <text>264 (8 direct, 256 indirect) bytes in 1 blocks are definitely lost in loss record 10 of 19</text>
    <leakedbytes>264</leakedbytes>
    <leakedblocks>1</leakedblocks>
  </xwhat>
  <stack>
    <frame>
      <ip>0x4846743</ip>
      <obj>/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so</obj>
      <fn>operator new[](unsigned long)</fn>
      <dir>/builddir/build/BUILD/valgrind-3.23.0/coregrind/m_replacemalloc</dir>
      <file>vg_replace_malloc.c</file>
      <line>729</line>
    </frame>
    <frame>
      <ip>0x5FC0CC7</ip>
      <obj>/usr/lib64/librados.so.2.0.0</obj>
      <fn>librados::v14_2_0::Rados::aio_create_completion()</fn>
    </frame>
    <frame>
      <ip>0xC9E297</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>rgw::notify::publish_commit(rgw::sal::Object*, unsigned long, std::chrono::time_point&lt;ceph::real_clock, std::chrono::duration&lt;unsigned long, std::ratio&lt;1l, 1000000000l&gt; &gt; &gt; const&amp;, std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const&amp;, std::__cxx11::basic
_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const&amp;, rgw::notify::reservation_t&amp;, DoutPrefixProvider const*)</fn>
    </frame>
    <frame>
      <ip>0xA0F528</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>rgw::sal::RadosNotification::publish_commit(DoutPrefixProvider const*, unsigned long, std::chrono::time_point&lt;ceph::real_clock, std::chrono::duration&lt;unsigned long, std::ratio&lt;1l, 1000000000l&gt; &gt; &gt; const&amp;, std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const&a
mp;, std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const&amp;)</fn>


other failures -- not related

2024-09-09T07:41:06.663 INFO:teuthology.orchestra.run.smithi002.stdout:ERROR:   py: InvocationError for command /home/ubuntu/cephtest/ragweed.client.0/.tox/py/bin/python -m pip install --exists-action w .tox/.tmp/package/1/ragweed-0.0.1.zip (exited with code 1)
2024-09-09T07:41:06.675 DEBUG:teuthology.orchestra.run:got remote process result: 1
2024-09-09T07:41:06.675 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_3752d3834a7b6cd13dc17dfaa6c2fd3658f3439a/teuthology/contextutil.py", line 30, in nested
    vars.append(enter())
  File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__ 
    return next(self.gen)
  File "/home/teuthworker/src/github.com_galsalomon66_ceph_94421829f7b81be633c5be5e50acc84fe1bd4f9f/qa/tasks/ragweed.py", line 261, in run_tests
    toxvenv_sh(ctx, remote, args, label="ragweed tests against rgw")
  File "/home/teuthworker/src/github.com_galsalomon66_ceph_94421829f7b81be633c5be5e50acc84fe1bd4f9f/qa/tasks/ragweed.py", line 230, in toxvenv_sh
    return remote.sh(['source', activate, run.Raw('&&')] + args, **kwargs) 
  File "/home/teuthworker/src/git.ceph.com_teuthology_3752d3834a7b6cd13dc17dfaa6c2fd3658f3439a/teuthology/orchestra/remote.py", line 97, in sh
    proc = self.run(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_teuthology_3752d3834a7b6cd13dc17dfaa6c2fd3658f3439a/teuthology/orchestra/remote.py", line 523, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_teuthology_3752d3834a7b6cd13dc17dfaa6c2fd3658f3439a/teuthology/orchestra/run.py", line 455, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_teuthology_3752d3834a7b6cd13dc17dfaa6c2fd3658f3439a/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_teuthology_3752d3834a7b6cd13dc17dfaa6c2fd3658f3439a/teuthology/orchestra/run.py", line 181, in _raise_for_status
    raise CommandFailedError(
teuthology.exceptions.CommandFailedError: Command failed (ragweed tests against rgw) on smithi002 with status 1: "source /home/ubuntu/cephtest/tox-venv/bin/activate && cd /home/ubuntu/cephtest/ragweed.client.0 && RAGWEED_CONF=/home/ubuntu/cephtest/archive/ragweed.client.0.conf RAGWEED_STAGES=prepare,check BOTO_CONFIG=/home/ubuntu/cephtest/boto.cfg tox --sitepackages -- -v -m 'not fails_on_rgw'"

@galsalomon66 galsalomon66 force-pushed the limit_mem_usage_on_parquet_flow branch 3 times, most recently from cde7ff1 to eda7555 Compare October 5, 2024 06:23
@galsalomon66
Contributor Author

RGW option per parquet read-buffer.
indentation.
alignment with updated s3select APIs.

removing the JSON format part, it is not aligned with the current s3-tests (will be done separately)

move the parquet-reader-setup call location.
editorial.

the returned-bytes metric was missing on the parquet flow.

Signed-off-by: Gal Salomon <gal.salomon@gmail.com>
@galsalomon66 galsalomon66 force-pushed the limit_mem_usage_on_parquet_flow branch from 76c404a to 597a702 Compare October 6, 2024 11:48
Comment thread src/rgw/rgw_s3select.cc
//initializing json processor
json_object::csv_definitions output_definition;
m_s3_json_object.set_json_query(&s3select_syntax,output_definition);
m_s3_json_object.set_json_query(&s3select_syntax, json);
Contributor

why did you move the csv definitions? it does not seem to be used before this line

Contributor Author

this change is related to another PR (JSON output format; the user may request a JSON format for the SQL result),
that PR requires a change to json_object::csv_definitions (the output-format definition).

for this PR it is not relevant (it just reduces the differences between the PRs)

Comment thread src/rgw/rgw_s3select.cc Outdated
}
append_in_callback += it.length();
ldout(s->cct, 10) << "S3select: part " << part_no++ << " it.length() = " << it.length() << dendl;
if ((ofs + len) > it.length()){
Contributor

is this related to the change in this pr?
when can this error case happen and what would be the outcome?

Contributor Author

this change relates to a Valgrind report (invalid read):
the ofs and len parameters, which are set by the callback, may be bigger than it.length()
and may cause an invalid read.

the change verifies that no invalid read will occur.

it happened only on teuthology.
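The guard described here can be sketched as a simple clamp. This mirrors the described bounds check in spirit only; `clamped_read_len` is a hypothetical helper for illustration, not the actual rgw_s3select.cc code.

```cpp
#include <algorithm>
#include <cstdint>

// The callback may hand over an (ofs, len) pair that overruns the buffer
// it refers to. Clamping the length to the buffer's actual size before
// any access guarantees the read stays in bounds (no invalid read).
uint64_t clamped_read_len(uint64_t ofs, uint64_t len, uint64_t buf_len) {
  if (ofs >= buf_len) {
    return 0;                          // nothing readable at this offset
  }
  return std::min(len, buf_len - ofs); // never read past the buffer end
}
```

So a request like ofs=40, len=20 against a 50-byte buffer is trimmed to 10 bytes instead of reading 10 bytes past the end.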

Contributor

but what is the root cause of the issue?
also, how is the change fixing the logical problem - assuming the invalid read is just a symptom of a logical error.

Contributor Author

i could not reproduce that locally,
thus could not find the root cause.

Contributor Author

Note for myself:
the following Valgrind report (https://pad.ceph.com/p/r.5e25eac194d94ee049df4c98c3d4819a)
was produced by the following teuthology run
https://qa-proxy.ceph.com/teuthology/gsalomon-2024-07-23_15:06:49-rgw:verify-albin_json_output_format-distro-default-smithi/7814019/remote/smithi060/log/valgrind/

(it contains a stack of invalid-read upon parquet-processing)

…ifies them if necessary(invalid read)

Signed-off-by: Gal Salomon <gal.salomon@gmail.com>
@galsalomon66
Contributor Author

@galsalomon66
Contributor Author

@galsalomon66
Contributor Author

jenkins test api

2 similar comments
@galsalomon66
Contributor Author

jenkins test api

@yuvalif
Contributor

yuvalif commented Oct 28, 2024

jenkins test api

@galsalomon66 galsalomon66 merged commit 20007ea into ceph:main Oct 28, 2024
VallariAg pushed a commit to VallariAg/ceph that referenced this pull request Mar 10, 2025
…rquet_flow

rgw/s3select: limit memory usage on Parquet flow

Resolves : rhbz#2252403
(cherry picked from commit 20007ea)
mkogan1 pushed a commit to mkogan1/ceph that referenced this pull request May 14, 2025
…rquet_flow

rgw/s3select: limit memory usage on Parquet flow

Resolves : rhbz#2252403
(cherry picked from commit 20007ea)

3 participants