Conversation
… on commit/abort. Signed-off-by: kchheda3 <kchheda3@bloomberg.net>
|
@kchheda3 thanks for the fix. |
@yuvalif depending on version we decide on trusting |
was thinking that we can bump the version of the urgent_data as part of 00ad83d adding a flag indicating that the queue was created with the fix in it - this would indicate that we can trust the reserved size in that queue. |
yeah i did thought about it originally, but really how do you know that someone has started using brand new notifications and they can rely on flag. |
isn't this all happening in an atomic cls call? i'm not sure where locking is necessary. if radosgws issue racing reservations on an un-upgraded queue, the first call will recalculate the size and update the flag atomically - the second call will see that flag and increment as usual |
agree that there should not be a race here. i think that the issue would be that in a mixed version cluster, old cls code writing to a new queue (or to an existing queue on which recalculation was performed) would still leak. |
… errors The urgent_data.reserved_size field was accumulating incorrect values over time due to a mismatch between what was added during reserve() and what was subtracted during commit()/abort(). This caused the reserved_size to grow unbounded, eventually hitting the queue capacity limit and returning ENOSPC errors even when the queue had plenty of actual space. solution: Add a one time self healing capability, where the reservation value is re calculated during the reserve and counter is updated with correct value. Signed-off-by: Krunal Chheda <kchheda3@bloomberg.net>
8a6008b to
7f4eaee
Compare
actually i have now updated the commit to only one time re calculate the value and persist with correct value in the queue. i originally thought it was rgw code, but then realized its cls code and writes will be atomic and serialize, so leveraging that i do the calculation first time and second callers will automatically get the correct value as @cbodley pointed. |
|
jenkins test make check |
|
just to make sure that we covered all cases:
|
|
jenkins test make check |
|
jenkins test make check |
|
Execution complete, tracker approved by @ivancich. tracker detail: https://tracker.ceph.com/issues/75124 |
|
jenkins test make check |
Fixes: https://tracker.ceph.com/issues/74713
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins test classic perfJenkins Job | Jenkins Job Definitionjenkins test crimson perfJenkins Job | Jenkins Job Definitionjenkins test signedJenkins Job | Jenkins Job Definitionjenkins test make checkJenkins Job | Jenkins Job Definitionjenkins test make check arm64Jenkins Job | Jenkins Job Definitionjenkins test submodulesJenkins Job | Jenkins Job Definitionjenkins test dashboardJenkins Job | Jenkins Job Definitionjenkins test dashboard cephadmJenkins Job | Jenkins Job Definitionjenkins test apiJenkins Job | Jenkins Job Definitionjenkins test docsReadTheDocs | Github Workflow Definitionjenkins test ceph-volume allJenkins Jobs | Jenkins Jobs Definitionjenkins test windowsJenkins Job | Jenkins Job Definitionjenkins test rook e2eJenkins Job | Jenkins Job DefinitionYou must only issue one Jenkins command per-comment. Jenkins does not understand
comments with more than one command.