Bug #74713

open

rgw/notifications: Persistent notification queue full even when queue is empty

Added by Krunal Chheda about 2 months ago. Updated 26 days ago.

Status: Pending Backport
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source: Community (dev)
Backport: squid, tentacle
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Tags (freeform): backport_processed
Fixed In: v20.3.0-5667-g5e89aff28c
Released In:
Upkeep Timestamp: 2026-02-26T14:52:33+00:00

Description

Per the persistent-notification flow, we first reserve and then commit.

During the reservation step, the code tracks active/pending reservations using urgent_data.reserved_size, which is incremented on every call to publish_reserve().

That same counter is decremented when the message is committed or aborted. However, the value being decremented does not match what was incremented:

On reserve, the counter is increased by the payload size plus an overhead:
const auto overhead = res_op.entries * QUEUE_ENTRY_OVERHEAD;

On commit/abort, the decrement does not include that overhead.

Over time, this mismatch causes urgent_data.reserved_size to continually grow, eventually triggering “queue full” even when there is still available capacity in the queue.
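
The drift described above can be reproduced with a small standalone model. This is only a sketch, not the Ceph code: ReservationCounter, release_payload_only(), release_with_overhead() and residue_after() are hypothetical names, and QUEUE_ENTRY_OVERHEAD is set to an assumed 10 bytes, the per-reservation growth reported in this ticket. The symmetric release is one possible way to balance the counter and is not necessarily what the merged change does.

```cpp
#include <cassert>
#include <cstdint>

// Assumed value for illustration only; the real constant is defined in the
// Ceph 2pc queue code. 10 bytes matches the growth per reservation reported
// in this ticket.
constexpr std::uint64_t QUEUE_ENTRY_OVERHEAD = 10;

// Hypothetical stand-in for the counter kept in urgent_data.reserved_size.
struct ReservationCounter {
  std::uint64_t reserved_size = 0;

  // Reserve: payload size plus a per-entry overhead is added.
  void reserve(std::uint64_t payload, std::uint64_t entries) {
    reserved_size += payload + entries * QUEUE_ENTRY_OVERHEAD;
  }

  // Release as described in the report: only the payload is subtracted,
  // so the overhead stays behind in the counter.
  void release_payload_only(std::uint64_t payload) {
    assert(reserved_size >= payload);
    reserved_size -= payload;
  }

  // Symmetric release: subtracts exactly what reserve() added.
  void release_with_overhead(std::uint64_t payload, std::uint64_t entries) {
    const std::uint64_t total = payload + entries * QUEUE_ENTRY_OVERHEAD;
    assert(reserved_size >= total);
    reserved_size -= total;
  }
};

// Runs `cycles` reserve/release pairs and returns what is left in the counter.
std::uint64_t residue_after(std::uint64_t cycles, bool symmetric) {
  ReservationCounter c;
  for (std::uint64_t i = 0; i < cycles; ++i) {
    c.reserve(100, 1);  // 100-byte payload, one entry
    if (symmetric) {
      c.release_with_overhead(100, 1);
    } else {
      c.release_payload_only(100);
    }
  }
  return c.reserved_size;
}
```

In this model, residue_after(1000, false) leaves 10000 bytes counted as reserved although nothing is outstanding, while residue_after(1000, true) returns to 0.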


Related issues (2 open, 0 closed)

Copied to rgw - Backport #75191: squid: rgw/notifications: Persistent notification queue full even when queue is empty (In Progress, Krunal Chheda)
Copied to rgw - Backport #75192: tentacle: rgw/notifications: Persistent notification queue full even when queue is empty (In Progress, Krunal Chheda)
Actions #1

Updated by Krunal Chheda about 2 months ago

  • Pull request ID set to 67169
Actions #2

Updated by Krunal Chheda about 2 months ago · Edited

Looking at log entries taken roughly one day apart:

2026-02-02T18:25:30.503+0000 7f61577f6700 20 <cls> /root/rpmbuild/ceph-19.2.1.squid_release/src/cls/2pc_queue/cls_2pc_queue.cc:209: INFO: cls_2pc_queue_reserve: current reservations: 5065376 (bytes)

2026-02-03T19:51:02.835+0000 7f61577f6700 20 <cls> /root/rpmbuild/ceph-19.2.1.squid_release/src/cls/2pc_queue/cls_2pc_queue.cc:209: INFO: cls_2pc_queue_reserve: current reservations: 5089826 (bytes)

We see that urgent_data.reserved_size keeps increasing. Ideally, the value should increase as we reserve and decrease as we commit or abort; instead it grows continuously, which highlights the problem.

The value increases by 10 bytes for every reservation, so once 12.8M operations have been performed on a bucket, urgent_data.reserved_size reaches 128M and further writes to the bucket are blocked until the topic is deleted and re-created.
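
The arithmetic behind these numbers can be checked with a short helper. This is a sketch under the figures quoted in this comment: a leak of 10 bytes per reservation and a limit taken as 128,000,000 bytes (the "128M" above read as a decimal value, which is what yields 12.8M operations). The function names are hypothetical and do not exist in Ceph.

```cpp
#include <cassert>
#include <cstdint>

// Number of further reservations until the leaked counter reaches the limit,
// rounded up. `current` is the present counter value with nothing outstanding.
inline std::uint64_t reservations_until_full(std::uint64_t limit_bytes,
                                             std::uint64_t current,
                                             std::uint64_t leak_per_op) {
  assert(leak_per_op > 0);
  if (current >= limit_bytes) {
    return 0;  // already at or over the limit
  }
  return (limit_bytes - current + leak_per_op - 1) / leak_per_op;
}

// Number of reservations implied by the growth between two log samples.
inline std::uint64_t reservations_between(std::uint64_t earlier,
                                          std::uint64_t later,
                                          std::uint64_t leak_per_op) {
  assert(leak_per_op > 0 && later >= earlier);
  return (later - earlier) / leak_per_op;
}
```

With these assumptions, reservations_until_full(128000000, 0, 10) gives 12,800,000, and the two log lines above (5065376 and 5089826 bytes) differ by 24450 bytes, which corresponds to 2445 reservations at 10 bytes each.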

Actions #3

Updated by Yuval Lifshitz about 1 month ago

  • Tags set to notifications
Actions #4

Updated by Upkeep Bot 26 days ago

  • Status changed from New to Pending Backport
  • Merge Commit set to 5e89aff28c6570f888de14fe56a8c05bbbf3d757
  • Fixed In set to v20.3.0-5667-g5e89aff28c
  • Upkeep Timestamp set to 2026-02-26T14:52:33+00:00
Actions #5

Updated by Upkeep Bot 26 days ago

  • Copied to Backport #75191: squid: rgw/notifications: Persistent notification queue full even when queue is empty added
Actions #6

Updated by Upkeep Bot 26 days ago

  • Copied to Backport #75192: tentacle: rgw/notifications: Persistent notification queue full even when queue is empty added
Actions #7

Updated by Upkeep Bot 26 days ago

  • Tags (freeform) set to backport_processed