
rgw/s3select: limit memory usage on Parquet flow#59465

Merged
galsalomon66 merged 2 commits into ceph:main from galsalomon66:limit_mem_usage_on_parquet_flow
Oct 28, 2024

Conversation

@galsalomon66
Contributor

@galsalomon66 galsalomon66 commented Aug 27, 2024

a large-size Parquet object with few row-groups increases memory consumption.
the code change sets a fixed size for the reader buffer.

ceph/s3select#163
https://bugzilla.redhat.com/show_bug.cgi?id=2252403

attached are heaptrack results for processing a parquet SQL query (select *) with different parquet-reader buffer sizes.
the parquet object contains 2 row-groups, 1GB each.
no significant differences in latency were observed across the different parquet-reader buffer sizes.

heaptrack stack analyze -- parquet reader set to a small buffer (10MB)
Screenshot from 2024-09-04 11-52-36

heaptrack stack analyze -- parquet reader set to a big buffer (100MB)
Screenshot from 2024-09-04 11-57-02

another way to inspect this issue is to review the metrics that are part of the response.
the scanned-bytes metric shows how many bytes were fetched by the system;
in the case of a big row-group, the parquet-reader buffer size has an impact on that metric.

<Stats>
<BytesScanned>1624670</BytesScanned>
<BytesProcessed>1624670</BytesProcessed>
<BytesReturned>2</BytesReturned>
</Stats>
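The effect of the fixed-size buffer can be made concrete with a small sketch. This is not the actual RGW/s3select code; `plan_range_requests` and `kReaderBufferSize` are hypothetical names, shown only to illustrate the tradeoff: a capped buffer turns one large row-group read into a sequence of bounded range requests, so peak memory stays at the cap rather than growing with the row-group size.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative cap on the parquet reader buffer (assumed value, e.g. 10MB).
constexpr uint64_t kReaderBufferSize = 10ULL * 1024 * 1024;

// Returns the (offset, length) ranges a capped reader would fetch for one
// row-group: each request is at most buffer_size bytes.
std::vector<std::pair<uint64_t, uint64_t>>
plan_range_requests(uint64_t row_group_offset, uint64_t row_group_size,
                    uint64_t buffer_size = kReaderBufferSize) {
  std::vector<std::pair<uint64_t, uint64_t>> ranges;
  for (uint64_t done = 0; done < row_group_size; ) {
    const uint64_t len = std::min(buffer_size, row_group_size - done);
    ranges.emplace_back(row_group_offset + done, len);
    done += len;
  }
  return ranges;
}
```

With a 1GB row-group and a 10MB buffer this plans 103 bounded requests, the last one covering the 4MB remainder; with an unbounded buffer it would be a single 1GB allocation.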

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

@galsalomon66 galsalomon66 requested a review from a team as a code owner August 27, 2024 16:04
@galsalomon66 galsalomon66 changed the title rgw/s3select: limit memory usage on Parquet floe rgw/s3select: limit memory usage on Parquet flow Aug 27, 2024
@yuvalif
Contributor

yuvalif commented Aug 28, 2024

the list of modified files in the submodule does not match the files changed in the 2 relevant PRs on s3select?

@galsalomon66
Contributor Author

The list of modified files in the submodule does not match the files changed in the 2 relevant PRs on s3select?

yes,
there is a gap between upstream/s3select-last-merged-PR and current s3select-master.

the relevant PR for this issue is ceph/s3select#163

@galsalomon66 galsalomon66 force-pushed the limit_mem_usage_on_parquet_flow branch from 3082a7e to 810eb5a Compare September 2, 2024 17:22
@galsalomon66 galsalomon66 force-pushed the limit_mem_usage_on_parquet_flow branch 2 times, most recently from 30dd945 to 80238cc Compare September 3, 2024 13:52
Comment thread src/common/options/rgw.yaml.in Outdated
type: size
level: advanced
desc: the Maximum parquet buffer size, a limit to memory consumption for parquet reading operations.
default: 1_G
Contributor

this seems way too high for the default. rgw is willing to process rgw_max_concurrent_requests=1024 of these requests at once. normal ops like Put/GetObject only buffer up to ~16MB each

is this much buffering really necessary to get reasonable performance?

Contributor Author

this seems way too high for the default. rgw is willing to process rgw_max_concurrent_requests=1024 of these requests at once. normal ops like Put/GetObject only buffer up to ~16MB each

is this much buffering really necessary to get reasonable performance?

examining this issue.
as mentioned, the tradeoff is the number of calls to s3-storage (internal calls, s3select-engine -> RGW::range_request).

from a few tests i did there is a difference in the number of calls (obviously); i'm not sure about the total latencies.

taking into account the parallel calls, maybe it is worth considering a "more dynamic" system,
i.e. as the number of s3select requests increases, the max buffer size goes down.
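The "more dynamic" idea could look roughly like the following sketch. All names and values here (`kGlobalBudget`, the min/max clamps, `buffer_size_for`) are illustrative assumptions, not existing RGW options: the per-request buffer is a share of a global budget that shrinks as s3select concurrency grows.

```cpp
#include <algorithm>
#include <cstdint>

// Assumed global memory budget shared by all concurrent parquet readers.
constexpr uint64_t kGlobalBudget  = 1ULL << 30;   // 1GB total
constexpr uint64_t kMinBufferSize = 4ULL << 20;   // floor: 4MB per reader
constexpr uint64_t kMaxBufferSize = 128ULL << 20; // ceiling: 128MB per reader

// Per-request buffer size: an equal share of the budget, clamped so a
// single request cannot grab everything and a busy gateway still makes
// progress with a small-but-usable buffer.
uint64_t buffer_size_for(uint64_t active_s3select_requests) {
  const uint64_t n = std::max<uint64_t>(1, active_s3select_requests);
  return std::clamp(kGlobalBudget / n, kMinBufferSize, kMaxBufferSize);
}
```

Under these assumed values, one active request gets the 128MB ceiling, 16 requests get 64MB each, and 1024 requests fall to the 4MB floor, keeping total buffering bounded either way.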

Contributor

as mentioned, the tradeoff is the number of calls to s3-storage (internal calls, s3select-engine -> RGW::range_request).

that calls into RGWGetObj::execute() which is streaming 4MB rados objects to satisfy the requested range. requesting larger ranges won't reduce the number of osd requests
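The reviewer's point can be sketched numerically: since RGW streams fixed-size (4MB) rados objects to satisfy a range, the number of backend objects touched depends on the offsets and total bytes read, not on how those bytes are grouped into range requests. The helper below is an illustration under that assumption, not RGW code.

```cpp
#include <cstdint>

// Assumed rados stripe/object size used by RGW object layout (4MB).
constexpr uint64_t kStripeSize = 4ULL << 20;

// Number of 4MB rados objects a byte range [ofs, ofs+len) overlaps.
// Splitting the same range into several smaller requests cannot reduce
// this count; it can only re-read boundary objects.
uint64_t rados_objects_touched(uint64_t ofs, uint64_t len) {
  if (len == 0) return 0;
  const uint64_t first = ofs / kStripeSize;
  const uint64_t last  = (ofs + len - 1) / kStripeSize;
  return last - first + 1;
}
```

For example, reading 100MB from offset 0 touches 25 rados objects whether it is fetched as one 100MB range or ten 10MB ranges.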

@galsalomon66
Contributor Author

galsalomon66 commented Sep 4, 2024

@galsalomon66
Contributor Author

galsalomon66 commented Sep 5, 2024

(the test is on the commit that moves the parquet-reader-setup call location)

https://pulpito.ceph.com/gsalomon-2024-09-04_18:07:53-rgw:verify-limit_mem_usage_on_parquet_flow-distro-default-smithi/
all green

Comment thread src/rgw/rgw_s3select.cc
Comment thread src/rgw/rgw_s3select.cc Outdated
op_ret = -ERR_INVALID_REQUEST;
} else {
ldout(s->cct, 10) << "S3select: complete query with success " << dendl;
{//status per amount of processed data
Contributor

nit: please fix indentation

@galsalomon66 galsalomon66 force-pushed the limit_mem_usage_on_parquet_flow branch from db916da to 9442182 Compare September 8, 2024 13:26
@galsalomon66
Contributor Author

galsalomon66 commented Sep 9, 2024

https://pulpito.ceph.com/gsalomon-2024-09-09_06:57:48-rgw-limit_mem_usage_on_parquet_flow-distro-default-smithi/

one valgrind issue -- not related
https://qa-proxy.ceph.com/teuthology/gsalomon-2024-09-09_06:57:48-rgw-limit_mem_usage_on_parquet_flow-distro-default-smithi/7896212/remote/smithi104/log/valgrind/ceph.client.1.log.gz

<error>
  <unique>0xa3f1</unique>
  <tid>1</tid>
  <kind>Leak_DefinitelyLost</kind>
  <xwhat>
    <text>264 (8 direct, 256 indirect) bytes in 1 blocks are definitely lost in loss record 10 of 19</text>
    <leakedbytes>264</leakedbytes>
    <leakedblocks>1</leakedblocks>
  </xwhat>
  <stack>
    <frame>
      <ip>0x4846743</ip>
      <obj>/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so</obj>
      <fn>operator new[](unsigned long)</fn>
      <dir>/builddir/build/BUILD/valgrind-3.23.0/coregrind/m_replacemalloc</dir>
      <file>vg_replace_malloc.c</file>
      <line>729</line>
    </frame>
    <frame>
      <ip>0x5FC0CC7</ip>
      <obj>/usr/lib64/librados.so.2.0.0</obj>
      <fn>librados::v14_2_0::Rados::aio_create_completion()</fn>
    </frame>
    <frame>
      <ip>0xC9E297</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>rgw::notify::publish_commit(rgw::sal::Object*, unsigned long, std::chrono::time_point&lt;ceph::real_clock, std::chrono::duration&lt;unsigned long, std::ratio&lt;1l, 1000000000l&gt; &gt; &gt; const&amp;, std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const&amp;, std::__cxx11::basic
_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const&amp;, rgw::notify::reservation_t&amp;, DoutPrefixProvider const*)</fn>
    </frame>
    <frame>
      <ip>0xA0F528</ip>
      <obj>/usr/bin/radosgw</obj>
      <fn>rgw::sal::RadosNotification::publish_commit(DoutPrefixProvider const*, unsigned long, std::chrono::time_point&lt;ceph::real_clock, std::chrono::duration&lt;unsigned long, std::ratio&lt;1l, 1000000000l&gt; &gt; &gt; const&amp;, std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const&a
mp;, std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const&amp;)</fn>


other failures -- not related

2024-09-09T07:41:06.663 INFO:teuthology.orchestra.run.smithi002.stdout:ERROR:   py: InvocationError for command /home/ubuntu/cephtest/ragweed.client.0/.tox/py/bin/python -m pip install --exists-action w .tox/.tmp/package/1/ragweed-0.0.1.zip (exited with code 1)
2024-09-09T07:41:06.675 DEBUG:teuthology.orchestra.run:got remote process result: 1
2024-09-09T07:41:06.675 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_3752d3834a7b6cd13dc17dfaa6c2fd3658f3439a/teuthology/contextutil.py", line 30, in nested
    vars.append(enter())
  File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__ 
    return next(self.gen)
  File "/home/teuthworker/src/github.com_galsalomon66_ceph_94421829f7b81be633c5be5e50acc84fe1bd4f9f/qa/tasks/ragweed.py", line 261, in run_tests
    toxvenv_sh(ctx, remote, args, label="ragweed tests against rgw")
  File "/home/teuthworker/src/github.com_galsalomon66_ceph_94421829f7b81be633c5be5e50acc84fe1bd4f9f/qa/tasks/ragweed.py", line 230, in toxvenv_sh
    return remote.sh(['source', activate, run.Raw('&&')] + args, **kwargs) 
  File "/home/teuthworker/src/git.ceph.com_teuthology_3752d3834a7b6cd13dc17dfaa6c2fd3658f3439a/teuthology/orchestra/remote.py", line 97, in sh
    proc = self.run(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_teuthology_3752d3834a7b6cd13dc17dfaa6c2fd3658f3439a/teuthology/orchestra/remote.py", line 523, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_teuthology_3752d3834a7b6cd13dc17dfaa6c2fd3658f3439a/teuthology/orchestra/run.py", line 455, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_teuthology_3752d3834a7b6cd13dc17dfaa6c2fd3658f3439a/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_teuthology_3752d3834a7b6cd13dc17dfaa6c2fd3658f3439a/teuthology/orchestra/run.py", line 181, in _raise_for_status
    raise CommandFailedError(
teuthology.exceptions.CommandFailedError: Command failed (ragweed tests against rgw) on smithi002 with status 1: "source /home/ubuntu/cephtest/tox-venv/bin/activate && cd /home/ubuntu/cephtest/ragweed.client.0 && RAGWEED_CONF=/home/ubuntu/cephtest/archive/ragweed.client.0.conf RAGWEED_STAGES=prepare,check BOTO_CONFIG=/home/ubuntu/cephtest/boto.cfg tox --sitepackages -- -v -m 'not fails_on_rgw'"

@galsalomon66 galsalomon66 force-pushed the limit_mem_usage_on_parquet_flow branch 3 times, most recently from cde7ff1 to eda7555 Compare October 5, 2024 06:23
@galsalomon66
Contributor Author

RGW option per parquet read-buffer.
indentation.
alignment with updated s3select APIs.

removing the JSON format part, it is not aligned with the current s3-tests (will be done separately)

move the parquet-reader-setup call location.
editorial.

the returned-bytes metric was missing on the parquet flow.

Signed-off-by: Gal Salomon <gal.salomon@gmail.com>
@galsalomon66 galsalomon66 force-pushed the limit_mem_usage_on_parquet_flow branch from 76c404a to 597a702 Compare October 6, 2024 11:48
Comment thread src/rgw/rgw_s3select.cc
//initializing json processor
json_object::csv_definitions output_definition;
m_s3_json_object.set_json_query(&s3select_syntax,output_definition);
m_s3_json_object.set_json_query(&s3select_syntax, json);
Contributor

why did you move the csv definitions? it does not seem to be used before this line

Contributor Author

this change is related to another PR (JSON output format; the user may request a JSON format for the SQL result),
that PR requires a change to json_object::csv_definitions (the output-format definition).

for this PR it is not relevant (it just reduces the differences between the PRs)

Comment thread src/rgw/rgw_s3select.cc Outdated
}
append_in_callback += it.length();
ldout(s->cct, 10) << "S3select: part " << part_no++ << " it.length() = " << it.length() << dendl;
if ((ofs + len) > it.length()){
Contributor

is this related to the change in this pr?
when can this error case happen and what would be the outcome?

Contributor Author

this change relates to a Valgrind report (invalid read):
the ofs and len parameters, which are set by the callback, may be bigger than it.length()
and may cause an invalid read.

the change verifies that no invalid read will occur.

it happened only on teuthology.
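The guard described here can be sketched as a simple clamp. This mirrors the described bounds check in spirit only; `clamped_read_len` is a hypothetical helper for illustration, not the actual rgw_s3select.cc code.

```cpp
#include <algorithm>
#include <cstdint>

// The callback may hand over an (ofs, len) pair that overruns the buffer
// it refers to. Clamping the length to the buffer's actual size before
// any access guarantees the read stays in bounds (no invalid read).
uint64_t clamped_read_len(uint64_t ofs, uint64_t len, uint64_t buf_len) {
  if (ofs >= buf_len) {
    return 0;                          // nothing readable at this offset
  }
  return std::min(len, buf_len - ofs); // never read past the buffer end
}
```

So a request like ofs=40, len=20 against a 50-byte buffer is trimmed to 10 bytes instead of reading 10 bytes past the end.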

Contributor

but what is the root cause of the issue?
also, how is the change fixing the logical problem - assuming the invalid read is just a symptom of a logical error.

Contributor Author

i could not reproduce that locally,
thus could not find the root cause.

Contributor Author

Note for myself:
the following Valgrind report (https://pad.ceph.com/p/r.5e25eac194d94ee049df4c98c3d4819a)
was produced by the following teuthology run
https://qa-proxy.ceph.com/teuthology/gsalomon-2024-07-23_15:06:49-rgw:verify-albin_json_output_format-distro-default-smithi/7814019/remote/smithi060/log/valgrind/

(it contains a stack of invalid-read upon parquet-processing)

…ifies them if necessary(invalid read)

Signed-off-by: Gal Salomon <gal.salomon@gmail.com>
@galsalomon66
Contributor Author

@galsalomon66
Contributor Author

@galsalomon66
Contributor Author

jenkins test api

2 similar comments
@galsalomon66
Contributor Author

jenkins test api

@yuvalif
Contributor

yuvalif commented Oct 28, 2024

jenkins test api

@galsalomon66 galsalomon66 merged commit 20007ea into ceph:main Oct 28, 2024
VallariAg pushed a commit to VallariAg/ceph that referenced this pull request Mar 10, 2025
…rquet_flow

rgw/s3select: limit memory usage on Parquet flow

Resolves : rhbz#2252403
(cherry picked from commit 20007ea)
mkogan1 pushed a commit to mkogan1/ceph that referenced this pull request May 14, 2025
…rquet_flow

rgw/s3select: limit memory usage on Parquet flow

Resolves : rhbz#2252403
(cherry picked from commit 20007ea)

3 participants