rgw/s3select: limit memory usage on Parquet flow #59465
galsalomon66 merged 2 commits into ceph:main
Conversation
the list of modified files in the submodule does not match the files changed in the 2 relevant PRs on s3select?

yes, the relevant PR for this issue is ceph/s3select#163
force-pushed from 3082a7e to 810eb5a
force-pushed from 30dd945 to 80238cc
| type: size | ||
| level: advanced | ||
| desc: maximum Parquet buffer size; limits memory consumption of Parquet reading operations. | ||
| default: 1_G |
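As a minimal sketch of what the new option expresses (the literal operators and the helper below are illustrative, not Ceph's actual option machinery): a size-typed option with a `1_G` default acts as a hard cap on whatever buffer size the Parquet reader would otherwise request.

```cpp
#include <algorithm>
#include <cstdint>

// Illustrative size literals mimicking Ceph's size-option notation (1_G, 16_M).
constexpr uint64_t operator""_G(unsigned long long v) { return v << 30; }
constexpr uint64_t operator""_M(unsigned long long v) { return v << 20; }

// Hypothetical helper: clamp the reader's requested allocation to the
// configured maximum (default 1_G, per the option above).
uint64_t reader_buffer_size(uint64_t requested, uint64_t max_buffer = 1_G) {
  return std::min(requested, max_buffer);
}
```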
this seems way too high for the default. rgw is willing to process rgw_max_concurrent_requests=1024 of these requests at once. normal ops like Put/GetObject only buffer up to ~16MB each
is this much buffering really necessary to get reasonable performance?
> this seems way too high for the default. rgw is willing to process rgw_max_concurrent_requests=1024 of these requests at once. normal ops like Put/GetObject only buffer up to ~16MB each
> is this much buffering really necessary to get reasonable performance?
examining this issue.
as mentioned, the tradeoff is the number of calls to s3-storage (internal calls: s3select-engine -> RGW::range_request).
from a few tests I did there is a difference in the number of calls (obviously); I'm not sure about the total latencies.
taking the parallel calls into account, it may be worth considering a "more dynamic" scheme,
i.e. as the number of s3select requests increases, the max-buffer-size goes down.
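The "more dynamic" idea above could be sketched as dividing a fixed memory budget by the number of in-flight s3select requests, clamped to a sane range (all names and the policy itself are hypothetical, not code from this PR):

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical dynamic policy: per-request buffer = budget / active
// requests, clamped to [min_sz, max_sz]. As concurrency grows, each
// request's Parquet read buffer shrinks.
uint64_t dynamic_buffer_size(uint64_t budget, uint32_t active_requests,
                             uint64_t min_sz, uint64_t max_sz) {
  if (active_requests == 0) active_requests = 1;  // avoid div-by-zero
  return std::clamp(budget / active_requests, min_sz, max_sz);
}
```

With a 1GB budget and 1024 concurrent requests (the rgw_max_concurrent_requests figure quoted above), each request would fall to the floor (e.g. 16MB) rather than 1GB each.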
> as mentioned, the tradeoff is the number of calls to s3-storage (internal calls: s3select-engine -> RGW::range_request)
that calls into RGWGetObj::execute() which is streaming 4MB rados objects to satisfy the requested range. requesting larger ranges won't reduce the number of osd requests
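The point about 4MB rados objects can be made concrete: the number of backing objects a range read touches depends only on the range endpoints and the fixed object size, so a larger caller-side buffer does not reduce OSD round trips (a sketch; the constant and function are illustrative):

```cpp
#include <cstdint>

// rados objects backing an RGW object are striped at a fixed size
// (4MB by default); a range read touches every object the range overlaps.
constexpr uint64_t kRadosObjSize = 4ull << 20;

uint64_t objects_touched(uint64_t ofs, uint64_t len) {
  if (len == 0) return 0;
  uint64_t first = ofs / kRadosObjSize;
  uint64_t last  = (ofs + len - 1) / kRadosObjSize;
  return last - first + 1;
}
```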
(the test is upon commit=
| op_ret = -ERR_INVALID_REQUEST; | ||
| } else { | ||
| ldout(s->cct, 10) << "S3select: complete query with success " << dendl; | ||
| {//status per amount of processed data |
nit: please fix indentation
force-pushed from db916da to 9442182
one valgrind issue -- not related. other failures -- not related.
force-pushed from cde7ff1 to eda7555
one failure - not related to this PR
RGW option per parquet read-buffer. indentation. alignment with s3select updated APIs. removing the JSON format part, it is not aligned with the current s3-tests (will be done separately). moving the parquet-reader-setup call location. editorial. the returned-bytes metric was missing on the parquet flow. Signed-off-by: Gal Salomon <gal.salomon@gmail.com>
force-pushed from 76c404a to 597a702
| //initializing json processor | ||
| json_object::csv_definitions output_definition; | ||
| m_s3_json_object.set_json_query(&s3select_syntax,output_definition); | ||
| m_s3_json_object.set_json_query(&s3select_syntax, json); |
why did you move the csv definitions? it does not seem to be used before this line.
this change is related to another PR (JSON output format; the user may request JSON format for the SQL result).
that PR requires a change to json_object::csv_definitions (the output format definition).
for this PR it is not relevant (it just reduces the differences between the PRs).
| } | ||
| append_in_callback += it.length(); | ||
| ldout(s->cct, 10) << "S3select: part " << part_no++ << " it.length() = " << it.length() << dendl; | ||
| if ((ofs + len) > it.length()){ |
is this related to the change in this pr?
when can this error case happen and what would be the outcome?
this change relates to a Valgrind report (invalid read),
meaning the ofs + len parameters, which are set by the callback, may be bigger than it.length()
and may cause an invalid read.
the change verifies that no invalid read will occur.
it happened only on teuthology.
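The defensive check described above amounts to clamping a callback-supplied (ofs, len) window to the actual buffer length before touching the memory. A minimal sketch (function and parameter names are hypothetical, not the PR's code):

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// Clamp a callback-supplied (ofs, len) window to the real buffer length
// so the copy can never read past the end of src (the invalid-read case
// Valgrind flagged).
size_t safe_copy(char* dst, const char* src, size_t src_len,
                 size_t ofs, size_t len) {
  if (ofs >= src_len) return 0;             // window starts past the end
  size_t n = std::min(len, src_len - ofs);  // trim window to what exists
  std::memcpy(dst, src + ofs, n);
  return n;
}
```

Note this guards the symptom; as the reviewer points out below, it does not by itself explain why the callback produced an out-of-range window.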
but what is the root cause of the issue?
also, how is the change fixing the underlying logical problem, assuming the invalid read is just a symptom of a logical error?
I could not reproduce it locally,
thus could not find the root cause.
Note for myself:
the following Valgrind report (https://pad.ceph.com/p/r.5e25eac194d94ee049df4c98c3d4819a)
was produced by the following teuthology run
https://qa-proxy.ceph.com/teuthology/gsalomon-2024-07-23_15:06:49-rgw:verify-albin_json_output_format-distro-default-smithi/7814019/remote/smithi060/log/valgrind/
(it contains a stack of invalid-read upon parquet-processing)
…ifies them if necessary (invalid read) Signed-off-by: Gal Salomon <gal.salomon@gmail.com>
https://pulpito.ceph.com/gsalomon-2024-10-22_07:50:43-rgw-limit_mem_usage_on_parquet_flow-distro-default-smithi/

jenkins test api
…rquet_flow rgw/s3select: limit memory usage on Parquet flow Resolves : rhbz#2252403 (cherry picked from commit 20007ea)
a large Parquet object with few row-groups will increase memory consumption.
the code change sets a fixed size for the reader buffer.
ceph/s3select#163
https://bugzilla.redhat.com/show_bug.cgi?id=2252403
attached are results of heaptrack on processing a Parquet SQL query (select *) with different parquet-reader-buffer-size values. the Parquet object contains 2 row-groups, 1GB each.
no significant differences in latency per different parquet-reader-buffer-size were observed.
heaptrack stack analysis -- parquet reader set to a small buffer (10MB)
heaptrack stack analysis -- parquet reader set to a big buffer (100MB)
another way to inspect this issue is reviewing the metrics that are part of the response:
the scanned-bytes metric shows how many bytes were fetched by the system;
in the case of a big row-group, the parquet-reader-buffer-size has an impact on that metric.
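One way to read the scanned-bytes difference (this is an assumption about the fetch pattern, not a statement of the actual reader's behavior): if each internal range request fetches a full reader-buffer-sized chunk, the reported bytes round up to a multiple of the buffer size, so a larger buffer inflates the metric when the data actually needed is small relative to it.

```cpp
#include <cstdint>

// Assumed model (illustrative only): every internal fetch reads a full
// buffer-sized chunk, so scanned bytes round up to a multiple of the
// configured parquet-reader-buffer-size.
uint64_t scanned_bytes(uint64_t needed, uint64_t buffer) {
  uint64_t chunks = (needed + buffer - 1) / buffer;  // ceil division
  return chunks * buffer;
}
```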