os/bluestore: avoid premature onode release.#43770
Conversation
|
Looks good so far, still running QA hence marking WIP for now. |
ca2827e to
b3cc049
Compare
|
jenkins test make check |
This has passed my nightly test run which had been failing in 30-60 mins before. |
|
jenkins test make check |
aclamk
left a comment
There was a problem hiding this comment.
This is some excellent fix-coding.
I spent 1 day trying to poke holes in the solution, and I could not.
Minimalistic but successful. Love that.
The only (optional) improvement would be to sync-up comments with current implementation.
| @@ -3642,11 +3643,12 @@ void BlueStore::Onode::put() { | |||
| // should be the last action since Onode can be released | |||
| // at any point after this decrement | |||
| if (need_unpin) { | |||
There was a problem hiding this comment.
Lets update comments.
Onode can no longer be released at any point.
There was a problem hiding this comment.
There was a problem hiding this comment.
@aclamk - could you please elaborate on the second comment - I can find anything related under the link....
There was a problem hiding this comment.
@aclamk - and it looks like we can now remove that trick with an additional increment for pinned entries. This patch makes that safe. I'm inclined to submit that in a different patch though.
Just asking for your thought for now
There was a problem hiding this comment.
@ifed01 The comment I meant was "// remove will also decrement nref and delete Onode". And It seems I posted broken link, or rebase moved it or something...
In this new situation, there is no way Onode could be deleted there.
"// remove will decrement nref, but Onode will not be deleted yet (put_nref > 1)"
There was a problem hiding this comment.
@ifed01
Regarding removal of artificial reference counter for pinning.
I do not believe we can to that.
BUT:
I look forward to be wrong in this case!
There was a problem hiding this comment.
@ifed01 The comment I meant was "// remove will also decrement nref and delete Onode". And It seems I posted broken link, or rebase moved it or something... In this new situation, there is no way Onode could be deleted there. "// remove will decrement nref, but Onode will not be deleted yet (put_nref > 1)"
OK, thanks. Updated.
b3cc049 to
45e66ed
Compare
This was observed when onode's removal is followed by reading and the latter causes object release before the removal is finalized. The root cause is an improper 'pinned' state assessment in Onode::get More detailed overview is: At some point Onode::get() might face the case when nref == 2 and pinned = true which means parallel incomplete put is running on the onode - ref count is decremented but pinned state is still unmodified (and even lock hasn't been acquired yet). This might finally result in two puts racing over the same onode with nref == 2 which finally results in a premature onode release: // nref =3, pinned = 1 // Thread 1 Thread 2 // o->put() o->get() // --nref(n = 2, pinned=1) // nref++ (n=3, pinned = 1) // return // ... // o->put() // --nref(n = 2) // pinned = 0, // --nref(n = 1) // ocs->_unpin_and_rm(o) -> o->put() // ... // --nref(n = 0) // release o // o->c->get_onode_cache() // FAULT! // The suggested fix is to introduce additional atomic counter tracking running put() functions. And permit onode release when both regular nref and put_nref are both equal to zero. Fixes: https://tracker.ceph.com/issues/53002 Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
45e66ed to
96f0efe
Compare
|
Unrelated failures
|
This was observed when onode's removal is followed by reading
and the latter causes object release before the removal is finalized.
The root cause is an improper 'pinned' state assessment in Onode::get
More detailed overview is:
At some point Onode::get() might face the case when nref == 2 and pinned = true
which means parallel incomplete put is running on the onode - ref count is
decremented but pinned state is still unmodified (and even lock hasn't been
acquired yet).
This might finally result in two puts racing over the same onode with nref == 2
which finally results in a premature onode release:
// nref =3, pinned = 1
// Thread 1 Thread 2
// o->put() o->get()
// --nref(n = 2, pinned=1)
// nref++ (n=3, pinned = 1)
// return
// ...
// o->put()
// --nref(n = 2)
// pinned = 0,
// --nref(n = 1)
// ocs->_unpin_and_rm(o) -> o->put()
// ...
// --nref(n = 0)
// release o
// o->c->get_onode_cache()
// FAULT!
//
The suggested fix is to introduce additional atomic counter tracking
running put() functions. And permit onode release when both regular
nref and put_nref are both equal to zero.
Fixes: https://tracker.ceph.com/issues/53002
Signed-off-by: Igor Fedotov igor.fedotov@croit.io
Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume tox