Bug #65828 (closed)

radosgw process killed with "Out of memory" while executing query "select * from s3object limit 1" on a 12GB parquet file

Added by Gal Salomon about 2 years ago. Updated 6 months ago.

Status:
Resolved
Priority:
High
Assignee:
Target version:
-
% Done:
0%

Source:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
56834
Tags (freeform):
Fixed In:
v19.3.0-3238-g2f02bcef5d
Released In:
v20.2.0~2546
Upkeep Timestamp:
2025-11-01T01:18:22+00:00

Description

(copied from https://bugzilla.redhat.com/show_bug.cgi?id=2275323)

Description of problem:
radosgw process killed with "Out of memory" while executing query "select * from s3object limit 1" on a 12GB parquet file

[cephuser@ceph-hmaheswa-reef-x220k9-node6 ~]$ time aws s3api --endpoint-url http://10.0.211.33:80 select-object-content --bucket bkt1 --key file12GBparquet --expression-type 'SQL' --input-serialization '{"Parquet": {}, "CompressionType": "NONE"}' --output-serialization '{"CSV": {}}' --expression "select * from s3object limit 1;" /dev/stdout

("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))

real 0m5.769s
user 0m0.477s
sys 0m0.110s
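
For reference, a minimal boto3 sketch of the same request (illustrative only; the endpoint, bucket, and key are copied from the command above, and boto3 is an assumption, not part of the original reproduction):

import boto3

# Endpoint, bucket, and key are taken from the failing aws-cli command above.
s3 = boto3.client("s3", endpoint_url="http://10.0.211.33:80")

resp = s3.select_object_content(
    Bucket="bkt1",
    Key="file12GBparquet",
    ExpressionType="SQL",
    Expression="select * from s3object limit 1;",
    InputSerialization={"Parquet": {}, "CompressionType": "NONE"},
    OutputSerialization={"CSV": {}},
)

# The response payload is an event stream; print any returned records.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode(), end="")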

Snippet of the journalctl logs on the RGW node:

Out of memory: Killed process 970456 (radosgw) total-vm:7666032kB, anon-rss:2285168kB, file-rss:0kB, shmem-rss:0kB, UID:167 pgtables:7108kB oom_score_adj:0
A process of this unit has been killed by the OOM killer.

Actual results:
radosgw process killed because of "Out of memory" while trying to query just one row on a low-end cluster.

Expected results:
The query should execute fine on a low-end cluster as well.

Additional info:
The 11.95 GB Parquet file was downloaded from:
https://www.kaggle.com/datasets/aaronweymouth/nyc-rideshare-raw-data?select=rideshare_data.parquet

======
From the result we can observe that there is very little free memory; other processes (such as ceph-osd) are also consuming a significant amount of memory, and I suspect that radosgw, which is requesting even more memory, is the process being killed.

So I used another RGW node's IP as the endpoint URL, one where no other Ceph daemon is running. The query then executed fine, with radosgw memory utilization at 84%, as seen in the top output captured while the query was executing.

====
--- Additional comment from Gal Salomon on 2024-04-14 10:58:19 UTC ---

These findings imply that there isn't anything wrong with radosgw's behavior when processing a Parquet object;
it depends on machine sizing and workload.

This specific 12 GB Parquet file contains only 6 row groups (over 365M rows!),
so `select *` (extracting all columns) "forces" the reader to load a great amount of data at once.
====
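
To illustrate the row-group point (a sketch, not part of the original report): the layout can be inspected locally with pyarrow, here using the file name from the Kaggle link above as a placeholder. With only 6 row groups in a ~12 GB file, each row group holds roughly 12 GB / 6 ≈ 2 GB of compressed data, and extracting all columns means decompressing an entire row group at once, so even a `limit 1` query pays for a full row group.

import pyarrow.parquet as pq

# File name is a placeholder taken from the Kaggle dataset linked above.
pf = pq.ParquetFile("rideshare_data.parquet")
meta = pf.metadata

print("row groups:", meta.num_row_groups)
print("total rows:", meta.num_rows)

# Per-row-group sizes show how much data a `select *` must buffer at once.
for i in range(meta.num_row_groups):
    rg = meta.row_group(i)
    print(f"row group {i}: {rg.num_rows} rows, "
          f"{rg.total_byte_size / 2**30:.2f} GiB uncompressed")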

Actions #1

Updated by Casey Bodley almost 2 years ago

  • Priority changed from Normal to High
Actions #2

Updated by J. Eric Ivancich almost 2 years ago

Gal, can you add a PR to this tracker? Thanks!

Actions #3

Updated by Gal Salomon almost 2 years ago

  • Pull request ID set to 56834
Actions #4

Updated by Casey Bodley almost 2 years ago

  • Status changed from New to Resolved
Actions #5

Updated by Upkeep Bot 10 months ago

  • Merge Commit set to 2f02bcef5db24c8cec94a1e0084229fda2392b35
  • Fixed In set to v19.3.0-3238-g2f02bcef5db
  • Upkeep Timestamp set to 2025-07-11T11:09:02+00:00
Actions #6

Updated by Upkeep Bot 10 months ago

  • Fixed In changed from v19.3.0-3238-g2f02bcef5db to v19.3.0-3238-g2f02bcef5d
  • Upkeep Timestamp changed from 2025-07-11T11:09:02+00:00 to 2025-07-14T23:09:05+00:00
Actions #7

Updated by Upkeep Bot 6 months ago

  • Released In set to v20.2.0~2546
  • Upkeep Timestamp changed from 2025-07-14T23:09:05+00:00 to 2025-11-01T01:18:22+00:00