Skip to content

[bluestore] Eliminate recursion from Onode::put()#47896

Closed
aclamk wants to merge 1 commit intoceph:mainfrom
aclamk:wip-aclamk-bs-unwind-onode-put
Closed

[bluestore] Eliminate recursion from Onode::put()#47896
aclamk wants to merge 1 commit intoceph:mainfrom
aclamk:wip-aclamk-bs-unwind-onode-put

Conversation

@aclamk
Copy link
Contributor

@aclamk aclamk commented Sep 1, 2022

Onode::put() removes self from onode_map.
This remove call causes us to call Onode::put() back again.
It requires the Onode::put() to employ clever strategies to handle this recursion.

Now we first extract self from onode_map, complete (logically) current put(),
and then let put() of extracted node run.

I think this change makes code more robust.

This work is in relation to
https://tracker.ceph.com/issues/56382
but I do not claim that it solves it.

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@aclamk aclamk requested a review from a team as a code owner September 1, 2022 11:27
@aclamk aclamk force-pushed the wip-aclamk-bs-unwind-onode-put branch from b890b4d to 3865a9a Compare September 1, 2022 11:28
Onode::put() removes self from onode_map.
This remove call causes us to call Onode::put() back again.
It requires the Onode::put() to employ clever strategies to handle this recursion.

Now we first extract self from onode_map, complete (logically) current put(),
and then let put() of extracted node run.

Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
@aclamk aclamk force-pushed the wip-aclamk-bs-unwind-onode-put branch from 3865a9a to 3edf033 Compare September 1, 2022 11:42
@aclamk aclamk requested a review from ifed01 September 1, 2022 11:46
@aclamk
Copy link
Contributor Author

aclamk commented Sep 1, 2022

@ifed01 Do you think we could benefit from this refactor?

@aclamk
Copy link
Contributor Author

aclamk commented Sep 1, 2022

jenkins test make check

@aclamk
Copy link
Contributor Author

aclamk commented Sep 5, 2022

jenkins test make check

@ifed01
Copy link
Contributor

ifed01 commented Sep 7, 2022

IMO this patch would fail in the following scenario (another thread calls get/put after the first one has decremented nref to 1):

T1                                    T2
nref = 2, exist=false         
put() {
  --nref (=1)                     
                                       get()
                                         ++nref(=2)
                                         <...>
                                       put() {
                                         --nref(=1)
                                         lock()
                                         onode_map._extract()
                                         unlock()
                                         put() {
                                            --nref(=0)
                                            delete (this)
                                         }
                                      }
  // the following code in T1 runs against already released onode...
  lock() 
  onode_map._extract()
  unlock()
  put() {
    --nref (=0)
    delete this
  }
}

@ifed01
Copy link
Contributor

ifed01 commented Sep 7, 2022

of extrac

@ifed01 Do you think we could benefit from this refactor?

Honestly I doubt that.
I can see two parts in your PR:

  1. Postponing last onode ref decrement using _extract/hold. Which is nice but effectively just takes second put out of the lock scope. Not sure it has any benefit by itself.
  2. Eliminating put_nref. This permits a thread to release an onode while another concurrent thread is still accessing this onode from put() method. And this looks unsafe. The idea behind put_nref is to protect onode for that interim state when onode reference is decremented but put method might still use the onode. Now it's gone and I can see no replacement for that....

@github-actions
Copy link

github-actions bot commented Feb 9, 2023

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@aclamk aclamk closed this Feb 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants