squid: mds/quiesce: agent: avoid a race condition with rapid db updates#56984

Merged
batrick merged 1 commit into squid from wip-lusov-squid-quiesce-agent-race
May 22, 2024

leonid-s-usov (Contributor) commented Apr 18, 2024

Backport

When new roots begin processing but haven't yet made it into the currently tracked set, there is a window in which the next update carrying the same roots treats them as new.

We fix it by simplifying the agent model and getting rid of the intermediate `working` set. Since we never add items to or remove items from the `current` roots collection, it's safe to update the `current` set directly from the `pending` set.

The race existed because `db_update()` relied on `current` to deduce which new roots go into `pending`, while the same new root could already have been seen and posted into the `working` set. This would lead to submitting the same new root twice. Without the `working` set, such a race isn't possible.
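
The interleaving described above can be illustrated with a small standalone sketch. This is not the QuiesceAgent code from the patch: `ThreeSetAgent`, `TwoSetAgent`, `begin_processing()`, `finish_processing()`, `submitted` and `submissions_after_rapid_updates()` are hypothetical names, roots are reduced to plain strings, and locking is left out. Only `db_update()`, `current`, `working` and `pending` are taken from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Roots = std::set<std::string>;

// Old model (simplified, hypothetical): pending -> working -> current.
struct ThreeSetAgent {
  Roots current, working, pending;
  std::vector<std::string> submitted;  // every root submitted, in order

  // New roots are deduced by comparing against `current` only.
  void db_update(const Roots& db_roots) {
    pending.clear();
    for (const auto& r : db_roots)
      if (!current.count(r)) pending.insert(r);
  }
  // Roots start processing but are not yet visible in `current`.
  void begin_processing() {
    working = std::move(pending);
    pending.clear();
    for (const auto& r : working) submitted.push_back(r);
  }
  // Only now do the processed roots become tracked.
  void finish_processing() {
    current.insert(working.begin(), working.end());
    working.clear();
  }
};

// New model (simplified, hypothetical): pending -> current, no `working`.
struct TwoSetAgent {
  Roots current, pending;
  std::vector<std::string> submitted;

  void db_update(const Roots& db_roots) {
    pending.clear();
    for (const auto& r : db_roots)
      if (!current.count(r)) pending.insert(r);
  }
  // Roots become tracked at the moment they start processing.
  void begin_processing() {
    for (const auto& r : pending) {
      assert(!current.count(r));  // db_update() already filtered these
      current.insert(r);
      submitted.push_back(r);
    }
    pending.clear();
  }
  void finish_processing() {}  // nothing left to promote
};

// Two db updates with the same root; the second arrives mid-processing.
template <class Agent>
std::size_t submissions_after_rapid_updates() {
  Agent agent;
  agent.db_update({"/a"});
  agent.begin_processing();
  agent.db_update({"/a"});  // rapid second update
  agent.finish_processing();
  agent.begin_processing();
  agent.finish_processing();
  return agent.submitted.size();
}
```

With this interleaving, `submissions_after_rapid_updates<ThreeSetAgent>()` records the root `/a` twice, while `submissions_after_rapid_updates<TwoSetAgent>()` records it once, because the second `db_update()` already finds the root in `current`.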

Fixes: https://tracker.ceph.com/issues/65570
Original-Issue: https://tracker.ceph.com/issues/65545
Original-PR: #56956
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
(cherry picked from commit 2a3faf1)

github-actions bot added the cephfs (Ceph File System) and tests labels Apr 18, 2024
leonid-s-usov requested a review from a team April 18, 2024 13:20
leonid-s-usov changed the title from "mds/quiesce: agent: avoid a race condition with rapid db updates" to "squid: mds/quiesce: agent: avoid a race condition with rapid db updates" Apr 18, 2024

leonid-s-usov (Contributor, Author) commented:

jenkins test make check

leonid-s-usov (Contributor, Author) commented:

jenkins test windows

leonid-s-usov (Contributor, Author) commented:

jenkins test api

leonid-s-usov requested a review from batrick May 1, 2024 15:55
batrick added this to the v19.1.0 milestone May 17, 2024

batrick (Member) commented May 17, 2024

jenkins test api

batrick (Member) commented May 17, 2024

jenkins test windows

batrick (Member) commented May 17, 2024

This PR is under test in https://tracker.ceph.com/issues/66101.

batrick modified the milestones: v19.1.0, v19.1.1 May 17, 2024

batrick (Member) left a comment

batrick merged commit 36c460d into squid May 22, 2024
batrick deleted the wip-lusov-squid-quiesce-agent-race branch May 22, 2024 18:04