Bug #65828 (closed)

radosgw process killed with "Out of memory" while executing query "select * from s3object limit 1" on a 12GB parquet file

Added by Gal Salomon about 2 years ago. Updated 6 months ago.

Status:
Resolved
Priority:
High
Assignee:
Target version:
-
% Done:
0%

Source:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
56834
Tags (freeform):
Fixed In:
v19.3.0-3238-g2f02bcef5d
Released In:
v20.2.0~2546
Upkeep Timestamp:
2025-11-01T01:18:22+00:00

Description

(copied from https://bugzilla.redhat.com/show_bug.cgi?id=2275323)

Description of problem:
radosgw process killed with "Out of memory" while executing query "select * from s3object limit 1" on a 12GB parquet file

[cephuser@ceph-hmaheswa-reef-x220k9-node6 ~]$ time aws s3api --endpoint-url http://10.0.211.33:80 select-object-content --bucket bkt1 --key file12GBparquet --expression-type 'SQL' --input-serialization '{"Parquet": {}, "CompressionType": "NONE"}' --output-serialization '{"CSV": {}}' --expression "select * from s3object limit 1;" /dev/stdout

("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))

real 0m5.769s
user 0m0.477s
sys 0m0.110s
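
For reference, a minimal boto3 sketch of the same request (illustrative only; the endpoint, bucket, and key are copied from the command above, and boto3 is an assumption, not part of the original reproduction):

import boto3

# Endpoint, bucket, and key are taken from the failing aws-cli command above.
s3 = boto3.client("s3", endpoint_url="http://10.0.211.33:80")

resp = s3.select_object_content(
    Bucket="bkt1",
    Key="file12GBparquet",
    ExpressionType="SQL",
    Expression="select * from s3object limit 1;",
    InputSerialization={"Parquet": {}, "CompressionType": "NONE"},
    OutputSerialization={"CSV": {}},
)

# The response payload is an event stream; print any returned records.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode(), end="")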

Snippet of the journalctl logs on the RGW node:

Out of memory: Killed process 970456 (radosgw) total-vm:7666032kB, anon-rss:2285168kB, file-rss:0kB, shmem-rss:0kB, UID:167 pgtables:7108kB oom_score_adj:0
A process of this unit has been killed by the OOM killer.

Actual results:
radosgw process killed because of "Out of memory" while trying to query just one row on a low-end cluster.

Expected results:
The query should execute fine on a low-end cluster as well.

Additional info:
The 11.95 GB Parquet file was downloaded from:
https://www.kaggle.com/datasets/aaronweymouth/nyc-rideshare-raw-data?select=rideshare_data.parquet

======
From the result we can observe that there is very little free memory; other processes (such as ceph-osd) are also consuming a significant amount of memory, and I suspect that radosgw, which is requesting even more memory, is the process being killed.

So I used another RGW node's IP as the endpoint URL, one where no other Ceph daemon is running. The query then executed fine, with radosgw memory utilization at 84%, as seen in the top output captured while the query was executing.

====
--- Additional comment from Gal Salomon on 2024-04-14 10:58:19 UTC ---

These findings imply that there isn't anything wrong with radosgw's behavior when processing a Parquet object;
it depends on machine sizing and workload.

This specific 12 GB Parquet file contains only 6 row groups (over 365M rows!),
so `select *` (extracting all columns) "forces" the reader to load a great amount of data at once.
====
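
To illustrate the row-group point (a sketch, not part of the original report): the layout can be inspected locally with pyarrow, here using the file name from the Kaggle link above as a placeholder. With only 6 row groups in a ~12 GB file, each row group holds roughly 12 GB / 6 ≈ 2 GB of compressed data, and extracting all columns means decompressing an entire row group at once, so even a `limit 1` query pays for a full row group.

import pyarrow.parquet as pq

# File name is a placeholder taken from the Kaggle dataset linked above.
pf = pq.ParquetFile("rideshare_data.parquet")
meta = pf.metadata

print("row groups:", meta.num_row_groups)
print("total rows:", meta.num_rows)

# Per-row-group sizes show how much data a `select *` must buffer at once.
for i in range(meta.num_row_groups):
    rg = meta.row_group(i)
    print(f"row group {i}: {rg.num_rows} rows, "
          f"{rg.total_byte_size / 2**30:.2f} GiB uncompressed")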

Actions #1

Updated by Casey Bodley almost 2 years ago

  • Priority changed from Normal to High
Actions #2

Updated by J. Eric Ivancich almost 2 years ago

Gal, can you add a PR to this tracker? Thanks!

Actions #3

Updated by Gal Salomon almost 2 years ago

  • Pull request ID set to 56834
Actions #4

Updated by Casey Bodley almost 2 years ago

  • Status changed from New to Resolved
Actions #5

Updated by Upkeep Bot 10 months ago

  • Merge Commit set to 2f02bcef5db24c8cec94a1e0084229fda2392b35
  • Fixed In set to v19.3.0-3238-g2f02bcef5db
  • Upkeep Timestamp set to 2025-07-11T11:09:02+00:00
Actions #6

Updated by Upkeep Bot 10 months ago

  • Fixed In changed from v19.3.0-3238-g2f02bcef5db to v19.3.0-3238-g2f02bcef5d
  • Upkeep Timestamp changed from 2025-07-11T11:09:02+00:00 to 2025-07-14T23:09:05+00:00
Actions #7

Updated by Upkeep Bot 6 months ago

  • Released In set to v20.2.0~2546
  • Upkeep Timestamp changed from 2025-07-14T23:09:05+00:00 to 2025-11-01T01:18:22+00:00