rgw/s3select: json integration: identifying new data-source and execute a JSON flow.#49411
rgw/s3select: json integration: identifying new data-source and execute a JSON flow.#49411
Conversation
src/rgw/rgw_s3select.cc
Outdated
| s3select_syntax.get_error_description().c_str(), | ||
| s3select_resource_id); | ||
| ldpp_dout(this, 10) << "s3-select query: wrong json format; {" << s3select_syntax.get_error_description() << "}" << dendl; | ||
| return -1; |
There was a problem hiding this comment.
please use posix error codes like -EINVAL
src/rgw/rgw_s3select.cc
Outdated
| if (s->obj_size == 0) { | ||
| status = run_s3select_on_json(m_sql_query.c_str(), nullptr, 0); | ||
| } else { | ||
| auto bl_len = bl.get_num_buffers(); |
There was a problem hiding this comment.
status is unchecked here, then we immediately go on to call status = run_s3select_on_json(...) below
do we need this special case for obj_size == 0 at all? it should be fine to loop over an empty bl.buffers()
There was a problem hiding this comment.
as for the obj_size == 0 i recall we needed a specific flow for CSV upon zero size object.
it will be re-check.
There was a problem hiding this comment.
statusis unchecked here, then we immediately go on to callstatus = run_s3select_on_json(...)belowdo we need this special case for
obj_size == 0at all? it should be fine to loop over an emptybl.buffers()
upon empty-object(zero size), the bl = buffer-list is empty, thus buffer-list loop does not execute, no call to s3select-function, which results with a wrong response.
if (s->obj_size == 0) { ... } returns the appropriate empty response.
| m_aws_response_handler.update_processed_size(it.length()); | ||
| status = run_s3select_on_json(m_sql_query.c_str(), &(it)[0], it.length()); | ||
| if(status<0) { | ||
| break; |
There was a problem hiding this comment.
shouldn't we return this error, instead of overwriting it with status = run_s3select_on_json(...) just below?
|
a good start with teuthology |
|
8 of 8 had passed with success |
|
1 dead |
|
1 dead |
|
1 dead |
a1825e0 to
d0b40ef
Compare
|
I don't think commit subject we intend to merge should have "WIP:" prefixed |
|
sure.
i will remove that.
…On Sun, Jan 1, 2023 at 8:34 PM Matt Benjamin ***@***.***> wrote:
I don't think commit subject we intend to merge should have "WIP:" prefixed
—
Reply to this email directly, view it on GitHub
<#49411 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AFYVMYTXRD4GHMDZNZL7EA3WQHE3XANCNFSM6AAAAAAS5JA7OU>
.
You are receiving this because you were assigned.Message ID:
***@***.***>
|
288eecd to
bd9a64f
Compare
7a771ef to
d5fe6d8
Compare
d5fe6d8 to
44a3446
Compare
|
the last teuthology run had failed and pointed to valgrind issues. |
1c32c2a to
8a531b5
Compare
|
7 had passed successfully |
|
all 8 jobs and s3select-tests had passed successfully. |
5c2e294 to
92b2354
Compare
|
http://pulpito.front.sepia.ceph.com/gsalomon-2023-01-29_15:19:28-rgw:verify-json_rgw_integration-distro-default-smithi/ |
fa57139 to
ed9785c
Compare
|
https://pulpito.ceph.com/gsalomon-2023-02-16_10:43:13-rgw:verify-json_rgw_integration-distro-default-smithi/ |
|
http://pulpito.front.sepia.ceph.com/gsalomon-2023-02-18_05:53:05-rgw:verify-json_rgw_integration-distro-default-smithi/ |
|
47 s3select tests had passed successfully |
76aa0ca to
506840d
Compare
|
http://pulpito.front.sepia.ceph.com/gsalomon-2023-02-26_12:32:11-rgw:verify-json_rgw_integration-distro-default-smithi/ |
dbd262b to
781ff4d
Compare
s3select submodule integrate the limit-operator into RGW. current changes is for CSV format integrate the limit-operator with CSV-flow, and JSON-flow. an update of s3select submodule a fix for the input-serialization-type selection debug functionality. fix for error handling adding the scan-range feature. i.e. enables the user to define the range of processing(text only) remove the temp variable(the s3select-layer handles the sql-result setting upon JSON flow) adding documentation. for JSON, SQL limit-operator, scan-range option. a fix for JSON test framework in s3select module investigate crash. replace len with it.length() editorial changes; non-initialized variables adding validation for offset&length. (it seems that the offset&length causes Invalid reads, in some cases) changes related to trino integration.(1) syntax issues related to trino-engine statement conbersions(s3select parser handles that) (2). (2). trino rejects(HIVE_CURSUR_ERROR) some of the s3select responses do to its size, the change is about control the size of the respond. update JSON documentation trino changes: response in paging (limitation on size), trino syntax issues. Signed-off-by: galsalomon66 <gal.salomon@gmail.com>
781ff4d to
9f20e75
Compare
|
i see that all of those failed jobs were using since rgw/verify looks clean, i think this just needs to run against the full rgw suite? once you rebase the suite-branch, the rgw suite at |
|
https://pulpito.ceph.com/gsalomon-2023-03-03_04:37:38-rgw:verify-json_rgw_integration-distro-default-smithi/ |
this is still just the rgw:verify subsuite. can you please run the rgw suite at --subset 1/3? |
|
@galsalomon66 can you please make sure that a tag gets added to the s3select repo corresponding to the commit our submodule now points to? |
|
@cbodley |
Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
doc/radosgw: Redd up s3select.rst as followup to #49411 Reviewed-by: Zac Dover <zac.dover@proton.me>
Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com> (cherry picked from commit 78b42ee)
Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com> (cherry picked from commit 78b42ee)
Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com> (cherry picked from commit 78b42ee)
…0384-to-reef reef: doc/radosgw: Redd up s3select.rst as followup to #49411 Reviewed-by: Cole Mitchell <cole.mitchell.ceph@gmail.com>
integration of JSON flow
integration of SQL limit operator
scan range: it enables the user to select a range(start --> end) for SQL processing.
thus, the user has the flexibility to process a specific range within the object.
the significant use of the scan-range option is running parallel requests on the same object.
the user must "know" how to merge the return results from the parallel requests (Trino is one example). (https://trino.io/docs/current/connector/hive.html).
Trino alignments (1) upon pushing down the Trino-engine converts the SQL-statements(different from AWS), the s3select-parser handles that. (2) Trino rejects(HIVE_CURSOR_ERROR) some of s3select responds due to its size. RGW/S3select limits the size of responses.
s3select submodule
Signed-off-by: Albin Antony aantony@redhat.com
Signed-off-by: galsalomon66 gal.salomon@gmail.com
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "pacific"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windows