mds: abort fragment/export when quiesced#57059

Merged
batrick merged 6 commits into ceph:main from
batrick:i65603
May 1, 2024

@batrick
Member

@batrick batrick commented Apr 23, 2024

Fixes: https://tracker.ceph.com/issues/65603

Still working on a clearer commit message and maybe a dedicated test.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
  • Component impact
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • No doc update is appropriate
  • Tests (select at least one)

@github-actions github-actions bot added the cephfs Ceph File System label Apr 23, 2024
@batrick
Member Author

batrick commented Apr 24, 2024

I've constructed a unit test to reproduce the issue. I'll finish this up tomorrow.

@github-actions github-actions bot added the tests label Apr 24, 2024
@batrick batrick force-pushed the i65603 branch 2 times, most recently from 41ba94d to 3b57d8c Compare April 25, 2024 01:32
@batrick batrick closed this Apr 25, 2024
@batrick batrick deleted the i65603 branch April 25, 2024 01:32
@batrick batrick restored the i65603 branch April 25, 2024 01:32
@batrick batrick reopened this Apr 25, 2024
@batrick batrick requested a review from leonid-s-usov April 25, 2024 01:33
@batrick batrick changed the title mds: pass bypassfreezing to parent auth pin req mds: abort fragment/export when quiesced Apr 25, 2024
@batrick
Member Author

batrick commented Apr 25, 2024

This PR is under test in https://tracker.ceph.com/issues/65661.

Contributor

@leonid-s-usov leonid-s-usov left a comment

Looks great! Canceling the fragmentation due to a quiesce is a good solution, and thanks for the test! Does it cover both fragmenting and exporting?

As for the is_wrlocked(quiescelock), I'd prefer to avoid exposing that implementation detail. Isn't that check equivalent to asking whether it's quiesced? Being xlocked is the only reason for a local wrlock to fail, isn't it?

@batrick batrick force-pushed the i65603 branch 2 times, most recently from ef2d139 to bc4e33c Compare April 29, 2024 15:16
@batrick
Member Author

batrick commented Apr 29, 2024

Fixed issue observed in

/teuthology/leonidus-2024-04-26_10:54:14-fs-wip-lusov-quiescer-fixes-distro-default-smithi/7674683/teuthology.log

@batrick
Member Author

batrick commented Apr 29, 2024

jenkins test make check

batrick added a commit to batrick/ceph that referenced this pull request Apr 29, 2024
* refs/pull/57059/head:
	mds: do not try fragmenting or exporting a quiesced directory
	mds: pass bypassfreezing to parent auth pin req
	qa: add quiesce tests during fragmentation
	qa: translate empty output from rank_tell to empty dict
@batrick
Member Author

batrick commented Apr 29, 2024

jenkins test windows

@batrick
Member Author

batrick commented Apr 29, 2024

jenkins test make check

batrick added 2 commits April 29, 2024 16:59
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Not all commands return JSON, like `dirfrag split`.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
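
A minimal sketch of the idea behind the commit above, assuming the tell output arrives as a plain string. The name `parse_tell_output` is hypothetical and chosen for illustration only; this is not the actual qa helper changed in this PR. It shows how empty output from a command that returns no JSON can be mapped to an empty dict instead of raising a decode error.

```python
import json
from typing import Any


def parse_tell_output(raw: str) -> Any:
    """Decode command output as JSON, treating empty output as {}.

    Hypothetical helper for illustration: some commands print nothing
    on success, and json.loads("") would raise JSONDecodeError.
    """
    text = raw.strip()
    if not text:
        # No output at all: hand callers an empty dict so they can
        # still use dict-style access without special-casing.
        return {}
    return json.loads(text)
```

With this shape, callers that expect a JSON object can use the result uniformly whether or not the command produced any output.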
batrick added 2 commits April 29, 2024 17:08
Reproduces: https://tracker.ceph.com/issues/65603
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Credit to Leonid for first noticing this.

Fixes: https://tracker.ceph.com/issues/65603
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
batrick added 2 commits April 29, 2024 17:08
This is an optimization to obviate repeated calls to acquire_locks.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
And handle inode becoming quiesced after op is created.

Fixes: https://tracker.ceph.com/issues/65603
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
@batrick
Member Author

batrick commented Apr 29, 2024

This PR is under test in https://tracker.ceph.com/issues/65694.

@batrick
Member Author

batrick commented Apr 30, 2024

jenkins test make check

@batrick
Member Author

batrick commented Apr 30, 2024

jenkins test api

@batrick
Member Author

batrick commented Apr 30, 2024

@batrick
Member Author

batrick commented Apr 30, 2024

jenkins test api

@batrick
Member Author

batrick commented Apr 30, 2024

jenkins test make check

@batrick
Member Author

batrick commented Apr 30, 2024

https://tracker.ceph.com/issues/65716

Not a blocker to merge this in my opinion; it is a related but different type of failure.

@batrick
Member Author

batrick commented Apr 30, 2024

jenkins test make check arm64

@batrick
Member Author

batrick commented Apr 30, 2024

jenkins test api

@batrick
Member Author

batrick commented Apr 30, 2024

jenkins test make check arm64

@batrick batrick merged commit 10acaf6 into ceph:main May 1, 2024
@batrick batrick deleted the i65603 branch May 1, 2024 12:58